III) Units and Coordinate Systems

bonnarel at alinda.u-strasbg.fr
Wed Oct 29 08:49:19 PDT 2008


Follow-up of my last Sunday compilations


>
> The team had a mailing list, some teleconfs, partial side-meeting in
> Garching and Trieste.
>
> Recently we had a very hot discussion on several aspects, which I try to
> tidy up here. The DM1 session (and part of DM2) of this interop will
> feature various presentations showing where we are on these questions.
>
>       I ) Models "dissemination", usage, and suitability for
> user/developer/data-provider needs. Are the models too complex?
>
>
>      II ) Formats and vocabularies
>         Do we need utypes and UCDs? Is a model transportable in VOTable?
> Is JSON an alternative or complementary to XML?
>
>      III ) Units and Coordinate systems
>             Do we force data providers to use only one, or do we allow
> various systems by providing an accurate description using STC and a
> (coming) Units data model...
>
>
>       IV ) How are metadata related to data and Data access layer...
>
> I will post now 4 emails with the best part of the discussion on these
> subjects...
>
> Cheers
> François
>
III) Units and Coordinate systems

--------------------------------------------------------------------------------
*Step1 FB answers to Fabien
--------------------------------------------------------------------------------
> 1.3 Stop using STC. I am not saying that STC is bad or useless, but I
> believe it is not needed for characterization. The simple way to avoid
> using it is to fix in the standard which reference frame should be
> used for each characterization parameter. The same can be done for
> units. If a fixed unit is defined in the standard, there is no need to
> even specify it in the serialization (it is implicit). For example I
> would suggest to define sky directions in ICRS in degrees.
> The usual reaction to this idea is that people who own data recorded
> in another native reference frame and unit feel discriminated. But it
> has to be clear that we speak here about the characterization
> metadata only. It doesn't mean that the real data need to be
> converted from their native unit (for this STC would be perfect), but
> only that descriptors such as the central position should be given in
> a common reference frame. Having a single unit and reference frame
> allows software developers to concentrate on only one instead of being
> forced to parse and convert all of them. Furthermore, if a data
> provider owning observations in a non-standard reference frame (such
> as planetary observations) is unable to convert its own data to a
> standard one, he cannot expect all client VO software to do that for him!

----> All this discussion leads us back to the beginning of the VO. Do
----> we want to reopen this, and with which arguments?
I have at least one counterexample to what you say.
    Assume an astronomer using a catalog where positions are in
Galactic coordinates (an optical PN catalog). He wants to find radio
catalogues of PN. Probably most of them are also natively in Galactic
coordinates. Should we force all catalog servers to convert to ICRS to
do these comparisons?
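For what it is worth, the conversion at stake is a fixed rotation that fits in a few lines. Here is a minimal sketch in plain Python (no VO library assumed; the function name is illustrative, and the constants are the standard J2000 ICRS/Galactic pole values):

```python
import math

# Standard J2000-based ICRS<->Galactic constants: ICRS coordinates of the
# North Galactic Pole, and galactic longitude of the north celestial pole.
RA_NGP = math.radians(192.85948)
DEC_NGP = math.radians(27.12825)
L_NCP = math.radians(122.93192)

def galactic_to_icrs(l_deg, b_deg):
    """Convert galactic (l, b) to ICRS (ra, dec), all in degrees."""
    l, b = math.radians(l_deg), math.radians(b_deg)
    sin_dec = (math.sin(DEC_NGP) * math.sin(b)
               + math.cos(DEC_NGP) * math.cos(b) * math.cos(L_NCP - l))
    dec = math.asin(sin_dec)
    ra = RA_NGP + math.atan2(
        math.cos(b) * math.sin(L_NCP - l),
        math.cos(DEC_NGP) * math.sin(b)
        - math.sin(DEC_NGP) * math.cos(b) * math.cos(L_NCP - l))
    return math.degrees(ra) % 360.0, math.degrees(dec)

# Sanity check: the galactic centre (l=0, b=0) lands near RA 266.4, Dec -28.9.
ra, dec = galactic_to_icrs(0.0, 0.0)
```

The sketch only covers one frame pair; the debate below is precisely about how many such pairs clients or servers should be expected to support.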
----------------------------------------------------------------------------------
*Step 2 Anita answers to Fabien
--------------------------------------------------------------------------------
On coordinate systems and units, other models (Registry, Cone Search)
have mandated ICRF, search radius in decimal degrees, etc., which has
been successful to varying degrees (some would argue even these are too
limiting).  One of the motivations of Char was that these other models
did not contain enough information to pass data on to packages for
calibration, analysis, etc. (or not conveniently).  To follow on from
Francois' example, converting individual positions in and out of
Galactic coordinates is tedious; converting a large region which might
be a simple box in Galactic is far worse.  Moreover, it is impossible
for planet- and Solar-centred coordinate systems.  A similar argument
applies to frequency v. wavelength; my radio data has a single spectral
resolution of 1 kHz between 4.7-7.8 GHz in frequency units, but if I
try and express it in wavelengths it changes by almost a factor of 2
across the band.  Using homogenised units and hence approximate values
is OK for crudely finding data but Char is supposed to do more than that.
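Anita's factor-of-two point is easy to verify with a quick plain-Python sketch (values taken from her message): a fixed 1 kHz frequency channel maps into a wavelength width that is not even constant across the 4.7-7.8 GHz band.

```python
C = 299_792_458.0  # speed of light, m/s

def dlambda(nu_hz, dnu_hz):
    """Wavelength width corresponding to a frequency width dnu at nu:
    |d(lambda)| = c * dnu / nu**2."""
    return C * dnu_hz / nu_hz ** 2

lo, hi, dnu = 4.7e9, 7.8e9, 1.0e3

# The wavelength itself spans hi/lo ~ 1.66x across the band...
span = (C / lo) / (C / hi)

# ...and the wavelength width of the fixed 1 kHz channel varies even
# faster, as (hi/lo)**2 ~ 2.75x, so no single wavelength resolution
# describes the data set accurately.
ratio = dlambda(lo, dnu) / dlambda(hi, dnu)
```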
---------------------------------------------------------------------------------
*Step 3 Fabien's defence
--------------------------------------------------------------------------------
> ----> All this discussion leads us back to the beginning of the VO.
> ----> Do we want to reopen this, and with which arguments?

I was not yet there! Sorry for the repetition... but I still don't see
any good argument against that.

> I have at least one counterexample to what you say.
>    Assume an astronomer using a catalog where positions are in
> Galactic coordinates (an optical PN catalog). He wants to find radio
> catalogues of PN.
> Probably most of them are also natively in Galactic coordinates.
> Should we force all catalog servers to convert to ICRS to do these
> comparisons?

Yes, if they want to be VO compatible. Imagine you then have another
catalog in ecliptic coordinates, and another one in equatorial 1950,
and that the user also wants to compare that with spectra taken by a
satellite in a strange reference frame and time frame. What do you do
then? Assuming that the scientist uses a VO tool, he will expect that
everything works automatically; this means that all the VO tools need
to be able to understand every possible reference frame in the world,
which means implementing the full STC (not only parsing, but also the
coordinate conversion code dealing with e.g. relativistic frames,
etc.), which is either impossible or extremely complicated.
My solution fixes the problem in a simple way: the data provider makes
the conversion once and then it's over: it's easier, and even much
better for the clients in terms of performance.
--------------------------------------------------------------------------------
*Step 4 Fabien answers Anita
--------------------------------------------------------------------------------
> On coordinates systems and units, other models (Registry, Cone 
> Search) have mandated ICRF, search radius in decimal degrees etc. 
> which has been successful to varying degrees (some would argue even 
> these are too limiting).

Indeed, the fact that the standard strictly defined that the query
regions for SIA have to be expressed in ICRS made it easy to implement.
And it never forbade anyone from looking for data in galactic coordinates.

>    One of the motivations of Char was that these other models
> did not contain enough information to pass data on to packages for 
> calibration, analysis etc.  (or not conveniently).

This is an important point: in my opinion, it is not the goal of
characterization to describe the native file format of all data sets,
or which parameters to use with a given reduction tool. I think the
goal is rather to describe in a generic way the scientific content of a
data set, so that tools can know what is potentially inside without
even having to open it.

> To follow on from
> Francois' example, converting individual positions in and out of 
> Galactic coordinates is tedious; converting a large region which 
> might be a simple box in Galactic is far worse.

If it's tedious for the data provider, you can imagine how it is for
the poor software developer who needs to implement every possible
conversion in the world!

> Moreover, it is impossible
> for planet- and Solar-centred coordinate systems.

It is possible to describe them if the observer position is clearly 
specified. Observer position is as important as the date in these cases.

>    A similar argument
> applies to frequency v. wavelength; my radio data has a single
> spectral resolution of 1 kHz between 4.7-7.8 GHz in frequency units,
> but if I try and express it in wavelengths it changes by almost a
> factor of 2 across the band.  Using homogenised units and hence
> approximate values is OK for crudely finding data but Char is
> supposed to do more than that.

Well, that's again the question. I think high-level characterization
descriptors should not try to do more than that. Going more in depth is
the problem of level 4, but that is another problem.
---------------------------------------------------------------------------------
*Step 5 Anita to Fabien
---------------------------------------------------------------------------------
>
> Indeed, the fact that the standard strictly defined that the query
> regions for SIA have to be expressed in ICRS made it easy to implement.
> And it never forbade anyone from looking for data in galactic coordinates.
Just from finding it?

> This is an important point: to my opinion, this is not the goal of 
> characterization to describe the native file format of all data sets, 
> and which parameters to use for a given reduction tool. I think the 
> goal is more to describe in a generic way the scientific content of a 
> data set, so that tools can know what is potentially inside without 
> having to even open  it.

I agree with you, even more strongly than some of my colleagues; I
would like to see all data published in a reasonably calibrated,
instrument-independent format, but that is not always the case.

>> To follow on from Francois' example, converting individual positions 
>> in and out of Galactic coordinates is tedious; converting a large 
>> region which might be a simple box in Galactic is far worse.
>
> If it's tedious for the data provider, you can imagine how it is for
> the poor software developer who needs to implement every possible
> conversion in the world!

But it is already done.  There are libraries for every required conversion
- some even at enough precision for today's milliarcsecond-accuracy
instruments (e.g. as used inside STILTS, VizieR).

>> Moreover, it is impossible
>> for planet- and Solar-centred coordinate systems.
>
> It is possible to describe them if the observer position is clearly 
> specified. Observer position is as important as the date in these 
> cases.

Agreed - and I think that is in STC.  To my mind the problem with STC
is not in the concept but in the implementation, and my impression is
that people have been hung up on trying to create a thing
with beautiful internal logic and aesthetic completeness, not ad-hoc
usefulness (with room for extensibility).  Perhaps for STC we need a
large number of use cases to see what really is useful.


>>  A similar argument
>> applies to frequency v. wavelength; my radio data has a single
>> spectral resolution of 1 kHz between 4.7-7.8 GHz in frequency units,
>> but if I try and express it in wavelengths it changes by almost a
>> factor of 2 across the band.  Using homogenised units and hence
>> approximate values is OK for crudely finding data but Char is
>> supposed to do more than that.
>
> Well, that's again the question. I think high level characterization 
> descriptors should not try to do more than that. Going more in depth 
> is the problem of the level 4, but this is another problem.

Fabien, I appreciate what you are trying to do, but maybe it is more
appropriate for the Registry than for Char. I want to know the spectral
resolution of a data set to better than a factor of 2.  The libraries
for all these conversions exist; what the VO needs to provide is a
framework to implement them.
--------------------------------------------------------------------------------
*Step 6 Fabien to Anita (tennis game is going on)
-------------------------------------------------------------------------------
>>> To follow on from Francois' example, converting individual 
>>> positions in and out of Galactic coordinates is tedious; converting 
>>> a large region which might be a simple box in Galactic is far worse.
>>
>> If it's tedious for the data provider, you can imagine how it is for
>> the poor software developer who needs to implement every possible
>> conversion in the world!
>
> But it is already done.  There are libraries for every required 
> conversion - some even at enough precision for today's milli-arcsec 
> accuracy instruments (e.g. as used inside STILTS, Vizier).

Well, I am still waiting for a C++ implementation of STC... And even if
there are very good libraries (and there is actually the excellent AST
lib in C), it is already quite complicated for a developer just to
integrate and use one, especially when most applications make no use
of such complexity for their core purpose.

>>> Moreover, it is impossible
>>> for planet- and Solar-centred coordinate systems.
>>
>> It is possible to describe them if the observer position is clearly 
>> specified. Observer position is as important as the date in these 
>> cases.
>
> Agreed - and I think that is in STC.

Yes it's in STC.

> To my mind the problem with STC is
> not in the concept but in the implementation, and my impression is
> that people have been hung up on trying to create a
> thing with beautiful internal logic and aesthetic completeness, not
> ad-hoc usefulness (with room for extensibility).  Perhaps for STC we
> need a large number of use cases to see what really is useful.

I really think the STC concepts are not bad. They are complicated because
reality is complicated! And an implementation will therefore also
always be complicated. The only one I am familiar with which does
something close to that is again the AST lib
(http://www.starlink.ac.uk/~dsb/ast/ast.html). It has an excellent
design but it is still non-trivial to use. Using this in software
like VirGO would be like using a bazooka to kill a mosquito...

>
>>>  A similar argument
>>> applies to frequency v. wavelength; my radio data has a single
>>> spectral resolution of 1 kHz between 4.7-7.8 GHz in frequency units,
>>> but if I try and express it in wavelengths it changes by almost a
>>> factor of 2 across the band.  Using homogenised units and hence
>>> approximate values is OK for crudely finding data but Char is
>>> supposed to do more than that.
>>
>> Well, that's again the question. I think high level characterization 
>> descriptors should not try to do more than that. Going more in depth 
>> is the problem of the level 4, but this is another problem.
>
> Fabien, I appreciate what you are trying to do but maybe it is more 
> appropriate for the Registry than for Char. I want to know the 
> spectral resolution of a data set to better than a factor of 2.  The 
> libaries for all these conversions exist, what the VO needs to 
> provide is a framework to implement them.

The max resolution (or bounding box) would be enough for a VO
application to select or reject a spectrum as a possible candidate. For
further investigation of a spectrum, a more in-depth characterization
is indeed needed, but that is already level 4 charac, and that is a
more complicated problem. I would say that at the current status, your
VO application would just download the spectrum and display it...
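The select-or-reject step Fabien describes can be sketched in a few lines: keep a spectrum as a candidate only if its coarse spectral bounding box and maximum resolution do not rule it out. (The record fields and values below are hypothetical illustrations, not an actual Char serialization.)

```python
# Hypothetical high-level characterization records: each spectrum
# advertises its spectral bounding box (GHz) and best resolution.
candidates = [
    {"id": "spec-1", "band_ghz": (4.7, 7.8), "max_res": 3_000_000},
    {"id": "spec-2", "band_ghz": (0.1, 0.3), "max_res": 500},
]

def may_contain(spec, lo_ghz, hi_ghz, min_res):
    """Keep a spectrum as a *candidate* if its bounding box overlaps the
    requested band and its best resolution is at least min_res.
    This can only reject, never confirm - confirming needs deeper
    (level 4) characterization."""
    b_lo, b_hi = spec["band_ghz"]
    return b_lo <= hi_ghz and b_hi >= lo_ghz and spec["max_res"] >= min_res

# e.g. Anita's OH-line search between 1.6 and 6.1 GHz:
selected = [s["id"] for s in candidates if may_contain(s, 1.6, 6.1, 1000)]
```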
-------------------------------------------------------------------------------
*Step 8 Anita to Fabien
------------------------------------------------------------------------------
>
> Well, I am still waiting for a C++ implementation of STC... And even
> if there are very good libraries (and there is actually the excellent
> AST lib in C), it is already quite complicated for a developer just
> to integrate and use one, especially when most applications make no
> use of such complexity for their core purpose.

Maybe that is the priority then - to implement AST (which I believe is
what STILTS does).

Regarding C++, quite a lot is presumably in aips++, btw.

>
> The max resolution (or bounding box) would be enough for a VO
> application to select or reject a spectrum as a possible candidate.
> For further investigation of a spectrum, a more in-depth
> characterization is indeed needed, but this is already level 4
> charac, and this is a more complicated problem. I would say at the
> current status, your VO application would just download the spectrum
> and display it...
>

If I only wanted one spectrum I would not use the VO!

What you are describing is registry-level searches, and there is a lot
to be said for getting the registry working better. But suppose I
want to make Zeeman-splitting measurements of OH lines anywhere between
1.6 and 6.1 GHz... I can use the Registry to find potential data
between 18 and 5 cm wavelength with a spectral resolution better than
some value which would be adequate at one end of the wavelength range
but not the other.
However, if that brings me back a list of 10000 spectra, I want to be
able to narrow it down a bit more!  I want to do things which I can't
do already.
--------------------------------------------------------------------------------
*Step 9 Peter Skoda comments
-------------------------------------------------------------------------------
>>>> Moreover, it is impossible
>>>> for planet- and Solar-centred coordinate systems.

Still there is the issue of TARGETNAME - our stellar experience shows that
the position of an object cannot fully describe the spectrum - we have
spectroscopic binaries (e.g. Sirius A and B, Mizar (two pairs of
binaries), etc.) and we simply have to NAME the object.
Some interesting discoveries may happen as well - someone will report the
star as having emission and later someone else claims it does not have
emission. And you can then start the detective investigation of whether
this was a physical effect or the people swapped the IDs of the stars ;-)

So TARGETNAME is required not only for planets but for stars as well,
despite having POS. No one supports this in VO clients!


>>> It is possible to describe them if the observer position is clearly 
>>> specified. Observer position is as important as the date in these 
>>> cases.

>> not in the concept but in the implimentation, and my impression with 
>> that has been that people have been hung up on trying to create a 
>> thing with beautiful internal logic and aesthetic completeness, not 
>> ad-hoc usefulness (with room for extensibility).  Perhaps for STC we 
>> need a large number of use cases to see what really is useful

I suppose the experts should share their "cookbooks" in different fields of
astronomy - perhaps the ESA workshop in December could start this
discussion - I know that the exoplanet-hunting people are very reluctant
to give away any information - they even require faking the observation
time in the FITS headers! For the Kepler mission there are requirements to
add artificial noise to the photometry data before processing and delivery
to the world - only the closed club of the PIs themselves will be able to
see the real data!
A serious discussion is ongoing about how to distort the data so as not to
allow the discovery of the planet while still being somehow useful for e.g.
asteroseismology.

>
> I really think STC concepts are not bad. There are complicated 
> because the reality is complicated! And an implementation will also 
> therefore always be complicated.

Perfectly said! To start with something we have to make an abstraction
and tell the tool developers: do this! And when it works and is
regularly used by 90% of scientists, we can go a step forward and satisfy
the needs of the rest (e.g. the ultra-high time precision for
exoplanets...).


> a software like VirGO would be like using a bazooka to kill a mosquito..

>>>>  A similar argument
>>>> applies to frequency v. wavelength; my radio data has a single
>>>> spectral resolution of 1 kHz between 4.7-7.8 GHz in frequency
>>>> units, but if I try and express it in wavelengths it changes by
>>>> almost a factor of 2 across the band.  Using homogenised units and
>>>> hence approximate values is OK for crudely finding data but Char
>>>> is supposed to do more than that.

The same issue arises with the nonlinear dispersion of echelle spectra -
for Fourier disentangling you need to rebin them in radial velocities -
most people do it several times - once during the reduction, rebinning to
equidistant lambda, and then again after getting it from archives, before
disentangling... The original IRAF header describing the nonlinear
polynomials is much better to maintain from the reduction. But the
incompatibility of the IRAF WCS with Paper III (and the STC) is an
obstacle to interoperability.



>>>
>>> Well, that's again the question. I think high level 
>>> characterization descriptors should not try to do more than that. 
>>> Going more in depth is the problem of the level 4, but this is 
>>> another problem.

I agree fully! I have to define exact requirements in SSA params, see the
charac results in a clever way to decide what is useful for me, and use
that to select given spectra.
That's all. Another issue is when I want to call some web service to
postprocess my spectra (e.g. on the grid). Then I need the full charac,
which the archive server passes to the GRID service.


>> resolution of a data set to better than a factor of 2.

HAHA - the spectral resolution as a single number is a chimera!
I tell everyone that it is nonsense to say it's R=13420 - you can say it's
about 10-15000 - it depends on the processing - removal of the instrumental
profile; you have to do a complicated measurement with a laser comp to
really measure it - it depends on the pointing, slit orientation, etc...
It changes in echelle spectra from order to order, depends on binning...
Most instruments give only a rough estimate at the reference point.

I would personally only use the resolution to distinguish between
SLOAN and UVES, stating it to the order of 1000 (e.g. less than 5000 or
more than 40000).

> The max resolution (or bounding box) would be enough for a VO 
> application to select or reject a spectrum as a possible candidate.
Exactly

>    For further
> investigation of a spectrum, a more in depth characterization is 
> indeed needed, but this is already level 4 charac, and this is a more 
> complicated problem.

My words !

> I would say at the current status, your VO application would just
> download the spectrum and display it...

And make a cutout around a given line, as mine is able to ;-)
-------------------------------------------------------------------------------
*Step 10 Anita answers Peter
-------------------------------------------------------------------------------
>
>>> resolution of a data set to better than a factor of 2.
>
> HAHA - the spectral resolution as a single number is a chimera!
> I tell everyone that it is nonsense to say it's R=13420 - you can say
> it's about 10-15000 - it depends on the processing - removal of the
> instrumental profile; you have to do a complicated measurement with a
> laser comp to really measure it
> - it depends on the pointing, slit orientation, etc...
> It changes in echelle spectra from order to order, depends on binning...
> Most instruments give only a rough estimate at the reference point.
>
> I would personally only use the resolution to distinguish between
> SLOAN and UVES, stating it to the order of 1000 (e.g. less than 5000
> or more than 40000).

Not in all domains; in radio, most correlators provide very precise
spectral resolution, usually expressed in km/s or Hz, e.g. a channel width
of 0.5 kHz at 1.5 GHz (3 000 000 as a ratio) or 0.5 km/s...
>
>> The max resolution (or bounding box) would be enough for a VO 
>> application to select or reject a spectrum as a possible candidate.
> Exactly

Not if you are looking for e.g. polarization measurements or fine
velocity structure or complicated line ID's in crowded sub-mm
spectra... Sorry, but not everyone uses optical telescopes!

The various uses, and the different conventions in different regimes,
are why Char has 4 levels (not just 1 and 4).


----------------------------------------------------------------------------------
*Step 11 Arnold enters the dance
----------------------------------------------------------------------------------
> 1.3 Stop using STC. I am not saying that STC is bad or useless, but I
> believe it is not needed for characterization. The simple way to
> avoid using it is to fix in the standard which reference frame should
> be used for each characterization parameter. The same can be done for
> units. If a fixed unit is defined in the standard, there is no need
> to even specify it in the serialization (it is implicit). For example
> I would suggest to define sky directions in ICRS in degrees.
> The usual reaction to this idea is that people who own data recorded
> in another native reference frame and unit feel discriminated. But it
> has to be clear that we speak here about the characterization
> metadata only. It doesn't mean that the real data need to be
> converted from their native unit (for this STC would be perfect), but
> only that descriptors such as the central position should be given in
> a common reference frame. Having a single unit and reference frame
> allows software developers to concentrate on only one instead of being
> forced to parse and convert all of them. Furthermore, if a data
> provider owning observations in a non-standard reference frame (such
> as planetary observations) is unable to convert its own data to a
> standard one, he cannot expect all client VO software to do that for him!

Of course, we can say: everybody should use ICRS, TT, the optical Doppler
definition, epoch 2000, etc. - but people won't do it, largely because
it does not make sense for everybody's data.
So the effect is: everybody will use whatever they like and we are
back to the messy situation that the VO is trying to take care of.
And the VO becomes a pointless exercise; we might as well throw in the
towel and give up.
The most fundamental requirement of VO interaction is that one must be 
explicit about all one's metadata. That is the only hope we have for 
understanding each other.
And, let me say it once again, STC IS NOT COMPLICATED.
It only becomes complicated if one wants to implement every possible 
combination of metadata.
It is perfectly OK for a service (or a client) to say: sorry, I can't 
handle this, because I only understand ICRS. At least it will 
understand what it can and cannot handle when these things are 
explicitly stated. Being silent about them only fosters uncertainty and 
chaos.
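Arnold's rule can be sketched in a few lines: a client declares which frames it understands and refuses anything undeclared or unsupported, instead of silently guessing. (The dictionary keys and function below are hypothetical illustrations, not an actual VO serialization.)

```python
SUPPORTED_FRAMES = {"ICRS"}  # this hypothetical client only understands ICRS

def check_position_metadata(meta):
    """Accept a position record only if its reference frame is explicitly
    declared AND supported; never silently assume a default frame."""
    frame = meta.get("frame")
    if frame is None:
        # Being silent about the frame is exactly the ambiguity Arnold warns about.
        raise ValueError("no reference frame declared: ambiguous metadata")
    if frame not in SUPPORTED_FRAMES:
        # Refusing explicitly is fine; misinterpreting would be chaos.
        raise ValueError(f"frame {frame!r} declared but not supported here")
    return meta["ra_deg"], meta["dec_deg"]

# Explicit and supported: accepted.
ra, dec = check_position_metadata(
    {"frame": "ICRS", "ra_deg": 266.4, "dec_deg": -28.9})
```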
--------------------------------------------------------------------------------
*Step 12 Anita answers to Arnold
--------------------------------------------------------------------------------

> The most fundamental requirement of VO interaction is that one must 
> be explicit about all one's metadata. That is the only hope we have 
> for understanding each other. And, let me say it once again, STC IS 
> NOT COMPLICATED. It only becomes complicated if one wants to implement 
> every possible combination of metadata. It is perfectly OK for a 
> service (or a
> client) to say: sorry, I can't handle this, because I only understand 
> ICRS. At least it will understand what it can and cannot handle when 
> these things are explicitly stated. Being silent about them only 
> fosters uncertainty and chaos.

For once I agree with Arnold!  A significant fraction of SIAP queries
used to fail or confuse the user because the search radius was not
interpreted in the 'mandated' and 'implicit' units of decimal degrees,
to give a specific example.
-------------------------------------------------------------------------------
*Step 13 Igor's comments on STC and char level 4
-------------------------------------------------------------------------------
2) STC and Char must be connected as they are. And, moreover, we have
to find a way to interconnect with SimDB, at least to use the same
"object repository", if I can call it that (Gerard, I know you
won't like this statement).

3) We do need to provide a real example of full level-4
characterisation, at least for a simple case, by the Baltimore meeting
(e.g. a 1D spectrum, let's say from Sloan, which will already be quite
tricky). If there are volunteers for the technical implementation (e.g.
XML), contact me -- I'll give more details.
-------------------------------------------------------------------------------
*Step 14 the 2 hats' guy (Alberto) point of view
-------------------------------------------------------------------------------
  I'd like to repeat here one sentence from Fabien:
> if a data provider owning observations in a non-standard reference
> frame (such as planetary observations) is unable to convert its own
> data to a standard one, he cannot expect all client VO software to
> do that for him!
How can that sentence be disputed on technical and pragmatic grounds?

And please do not get confused by that: the CHARACTERISATION aspects
should be expressed in the standard-defined reference frame, NOT the
data themselves.

Not everything is possible in a given ref. frame?
I think overall that having ONE standard to cover all possibilities is
not an efficient way to proceed. Take the HTML example: it does not
cover everything one wants to do with the web, but it was (and is) a
simple standard that anybody could pick up quickly and easily. Later,
driven by experience and demand - not by philosophical disputes -
applets, JavaScript, and other technologies were added to the
simpleminded and still valid HTML.
Similarly, in the VO we should go for simpleminded, efficient approaches,
and should not try to solve the universe at once.

Do I really need a single DAL protocol, based on a set of
omni-comprehensive models (chardm, stc, etc.), that can return data
both for solar system objects and AGN at once?
I very much doubt it...

The proof is in the pudding, I repeat, the proof is in the pudding...
-----------------------------------------------------------------------------
*Step 15 Fabien answering Arnold
-----------------------------------------------------------------------------
> Of course, we can say: everybody should use
> ICRS, TT, the optical Doppler definition, epoch 2000, etc. - but
> people won't do it, largely because it does not make sense for
> everybody's data.
> So the effect is: everybody will use whatever they like and we are
> back to the messy situation that the VO is trying to take care of.
> And the VO becomes a pointless exercise; we might as well throw in
> the towel and give up.
> The most fundamental requirement of VO interaction is that one must
> be explicit about all one's metadata. That is the only hope we have
> for understanding each other.

As a software engineer, I can tell you that your last comments
("everybody will use whatever they like", "messy situation") are a good
summary of the current situation. You say one must be explicit about
metadata; I say one must be unambiguous and simple so that it can be
used. For this I just wanted to point out that the simplest solution is
to adopt a single reference frame and unit. Maybe I am being naive, but
I don't see where the madness is in this statement.

> And, let me say it once again, STC IS NOT COMPLICATED.
> It only becomes complicated if one wants to implement every possible 
> combination of metadata.
> It is perfectly OK for a service (or a client) to say: sorry, I can't 
> handle this, because I only understand ICRS. At least it will 
> understand what it can and cannot handle when these things are 
> explicitly stated. Being silent about them only fosters uncertainty 
> and chaos.

If a client can say that, it means that we will never achieve full
interoperability. Actually, this is pretty much what the current
situation is. With my simple proposal, this would obviously be fixed.
Whether the idea is acceptable for philosophical reasons or not is
another debate.
----------------------------------------------------------------------------
*Step 16 Carlos Rodrigo comments on the coord sys and units discussion
-----------------------------------------------------------------------------
>
> As a software engineer, I can tell you that your last comments 
> ("everybody will use whatever they like", "messy situation") are a 
> good summary of the current situation. You say one must be explicit 
> about metadata, I say one must be non ambiguous and simple so that it 
> can be used. For this I just wanted to point out that the simplest 
> solution is to adopt a single reference frame and unit. Maybe I am 
> being naive, but I don't see where the madness is in this statement.

Whenever somebody asks me what the VO is, I answer something like: the 
VO is essentially a standardization project, an effort to make all 
data centers (and the corresponding applications) speak "a common 
language" so that they can understand each other. I explain that 
before the VO it was a mess, because everyone used their own 
standards, their own formats, their own way to produce their data (and 
their own ways to search for them). In that situation it is almost 
impossible to work with data from different sources in an automatic 
way...

Then comes the VO: "the solution"! We just agree on a common data 
format, a common way to describe the data and a common way to query for 
data. And then we tell people to use those new standards. If they do, 
all their data will be compatible, comparable, and so on.

I know this is a somewhat simplified, or naive, description of the VO 
and that, unfortunately, the real VO is not so nice yet.

But the main idea still stands: we agree on a common "language", that 
is, common ways to describe data and communicate (standards), so that, 
if data providers adopt those standards, the future of astronomical 
data will be much better.

So I don't think that Fabien's idea is mad at all.

We tell people to use VOTables, to put certain fields in an SSA 
response, to use one of the few available protocols... to do many 
things in a compatible way. We could tell people: if you want your data 
to be VO compatible, you must express it in this particular reference 
system.

I don't know if we need to do that, and I don't really know whether, in 
this case, it is a good idea. But I don't think that it is mad at all: 
it is what we do every day: telling people that, if they want their 
data to be VO compatible, they cannot offer it in any way they desire: 
they must restrict themselves to one of the options defined by the VO.

And, in most cases, that means transforming the original data into the 
form required by the VO. This would be just another transformation.

Personally, I am usually of the opinion that the less restrictive, the 
better. I dislike the idea of some "enlightened" authority telling me 
how I must describe my data.
But, for the sake of compatibility, we usually define some aspects of 
how to describe the data, don't we?

I don't find it mad to ask people to write their data in a particular 
reference system (or in systems easy to transform to a given one). 
Actually, it would clearly improve compatibility among data. Some of 
you point out that transforming some existing archives to another 
reference system would be difficult, as they have little maintenance.
That would be a problem.

As far as units are concerned, I don't see the need to restrict to 
particular ones. I actually think that an approach like the one used by 
VOSpec could be enough: "you use the units you want and then tell how 
to transform them to a fundamental unit for that physical quantity". It 
could be implemented (for instance in a FIELD) like:

unit="ERG/CM2/S/A;ML-1T-3;1E+7"

(this is the VOSpec approach but condensed in a "one sentence" 
expression). Then you can use the unit you want and any application can 
compare and transform easily from one unit to another.
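As a sketch of how such a condensed annotation could be used in practice: the function names and the second annotation string below are illustrative assumptions, not part of VOSpec or any published standard. The idea is that the third field gives the scale factor to a fundamental (SI) unit for that dimensional signature, so conversion is pure arithmetic:

```python
def parse_unit(annotation: str):
    """Split a 'label;dimensions;scale' annotation like
    'ERG/CM2/S/A;ML-1T-3;1E+7' into its three parts."""
    label, dims, scale = annotation.split(";")
    return label, dims, float(scale)

def convert(value: float, unit_from: str, unit_to: str) -> float:
    """Convert a value between two units of the same physical quantity.

    Each annotation carries a scale factor to a common fundamental unit,
    so conversion is: scale up to the fundamental unit, scale back down.
    """
    _, dims_from, scale_from = parse_unit(unit_from)
    _, dims_to, scale_to = parse_unit(unit_to)
    if dims_from != dims_to:
        raise ValueError("incompatible dimensions: %s vs %s"
                         % (dims_from, dims_to))
    return value * scale_from / scale_to

# 1 erg/cm2/s/A is 1e7 W/m2/m (hence the 1E+7 above); a hypothetical
# W/m2/nm annotation would carry 1E+9, so the conversion gives 0.01.
flux = convert(1.0, "ERG/CM2/S/A;ML-1T-3;1E+7", "W/M2/NM;ML-1T-3;1E+9")
```

Any application holding only these annotations can then compare quantities from different services without knowing anything about the labels themselves.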

Thus, in the case of units, I don't think that we need to restrict to 
particular ones. I wonder if any "simple" solution like that is 
possible when talking about reference systems...
-------------------------------------------------------------------------------
*Step 17    Fabien answers Carlos Rodrigo
-------------------------------------------------------------------------------
> As far as units are concerned, I don't see the need to restrict to 
> particular ones. I actually think that an approach like the one used 
> by VOSpec could be enough: "you use the units you want and then tell 
> how to transform them to a fundamental unit for that physical 
> quantity". It could be implemented (for instance in a FIELD) like:
>
> unit="ERG/CM2/S/A;ML-1T-3;1E+7"
>
> (this is the VOSpec approach but condensed in a "one sentence" 
> expression). Then you can use the unit you want and any application 
> can compare and transform easily from one unit to another.
>
> Thus, in the case of units, I don't think that we need to restrict to 
> particular ones. I wonder if any "simple" solution like that is 
> possible when talking about reference systems...

Well, I think the problem with units is exactly the same as for the 
reference frame. The example you give is complicated enough to parse to 
give headaches to many engineers. Of course there are already libraries 
for doing unit conversions, but why can't the data provider use them 
once, instead of forcing all the developers working on VO tools to 
integrate them in their code?

Another important point is that many developers are software engineers 
and don't necessarily know much about astronomical units and reference 
frames. Forcing them to implement that requires them to understand 
these complicated subjects in detail, which takes time and is error 
prone. Fixing the unit and reference frames lets each category of 
people concentrate on what they do best:
   - developers can code applications from clear and simple specifications
   - astronomers and technicians working in data centers can do the 
units and frame conversions
   - astronomers can provide their high-level requirements and use 
the tools for doing science
----------------------------------------------------------------------------------------
*Step 18 Carlos to Fabien
-----------------------------------------------------------------------------------------
The problem with units, in your idea, is that we would have to decide 
which is the "good unit" for each physical quantity. I don't think, at 
first sight, that this is a good idea.

It is true that, for me, when I develop an application working with VO 
data, it would be most comfortable to receive, for instance, 
wavelengths always in Angstroms and flux densities always in 
erg/s/cm2/A. But is it fair and necessary to request that?

First, maybe other people, working in other domains of astronomy, 
prefer to have wavelengths always in microns (or we could even require 
them to be in meters, in the International System), or they could 
prefer flux densities in Janskys...

In fact, the transformation between different reference systems can be 
much trickier, and maybe it could be justified to require a given one 
(I don't have a clear opinion). But I think that the transformation 
between different units is not such a difficult problem, and it does 
not require deep knowledge of the physics involved. It should be just 
arithmetic. Thus, I would not require one particular unit as mandatory 
in a protocol.
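The arithmetic is mostly simple, though a conversion can need more than a scale factor. A sketch of the flux-density case raised above, converting per-wavelength flux to Janskys via F_nu = F_lambda * lambda^2 / c (function name and the Vega-like example value are illustrative):

```python
C_CM_PER_S = 2.99792458e10   # speed of light [cm/s]
JY_IN_CGS = 1e-23            # 1 Jansky [erg/s/cm2/Hz]

def flam_to_jansky(f_lambda: float, wavelength_angstrom: float) -> float:
    """Convert F_lambda [erg/s/cm2/A] at a given wavelength [A] to F_nu [Jy].

    Note this needs the wavelength of each data point, not just the unit
    labels: per-wavelength and per-frequency densities differ by lambda^2/c.
    """
    lam_cm = wavelength_angstrom * 1e-8           # Angstrom -> cm
    f_lambda_cgs = f_lambda * 1e8                 # per A -> per cm
    f_nu = f_lambda_cgs * lam_cm**2 / C_CM_PER_S  # erg/s/cm2/Hz
    return f_nu / JY_IN_CGS

# A Vega-like flux of ~3.6e-9 erg/s/cm2/A at 5500 A comes out near 3600 Jy.
flux_jy = flam_to_jansky(3.6e-9, 5500.0)
```

This is why a single mandatory flux unit is awkward for collections spanning a wide frequency range: the factor between the two conventions changes across the band.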

I agree with you that the simpler we make things the better, and that 
it's good for everyone to concentrate on what they do best, while 
trying to find some kind of equilibrium.
------------------------------------------------------------------------------
*Step 19  Fabien to Carlos
-------------------------------------------------------------------------------
> The problem with units, in your idea, is that we would have to decide
> which is the "good unit" for each physical quantity. I don't think, 
> at first sight, that this is a good idea.

What I meant is that we just need to decide which one to use for 
interoperability purposes. Such an arbitrary choice has, in theory, no 
impact on user experience. Users should not even be aware that only one 
unit was chosen, because the client VO application should then convert 
it to the one it wants to display the data with (according to the taste 
of the user, if the client has such a feature).

> It is true that, for me, when I develop an application working with 
> VO data, it would be most comfortable to receive, for instance, 
> wavelengths always in Angstroms and flux densities always in 
> erg/s/cm2/A. But is it fair and necessary to request that?

There is nothing to lose, but a lot to gain in simplicity in the 
definition and implementation of the standard.
-------------------------------------------------------------------------------
*Step 20 Anita answers to Carlos
-------------------------------------------------------------------------------
> The problem with units, in your idea, is that we would have to decide 
> which is the "good unit" for each physical quantity. I don't think, 
> at first sight, that this is a good idea.
>
> It is true that, for me, when I develop an application working with 
> VO data, it would be most comfortable to receive, for instance, 
> wavelengths always in Angstroms and flux densities always in 
> erg/s/cm2/A. But is it fair and necessary to request that?

Indeed not, for how would I represent flux density in Jy? It is easy to 
do for a particular data instance, but either silly or imprecise for a 
collection spanning a wide frequency range.  And if I have a simple box 
survey region in Galactic coordinates, then translating that into an 
STC polygon in Equatorial coordinates is at least as difficult as 
translating a cone search around a single point into Galactic 
coordinates (a circle is a circle is a circle...).  In these examples, 
the Location and Bounds can be translated into a point and a box, 
respectively, in any coordinates, in an inclusive sense (so that there 
might be a lot of empty space in inappropriate coordinates), but 
finer-grained levels should be allowed to use the appropriate coordinates.
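The cone-search case can be made concrete. Since a small circle on the sphere stays a circle under rotation, a client only needs to rotate the search centre and keep the radius. A sketch, using the standard J2000 equatorial-to-Galactic rotation matrix (function name is illustrative):

```python
import math

# Rotation from ICRS/J2000 equatorial to Galactic coordinates
# (rows are the Galactic x, y, z axes in equatorial coordinates).
EQ_TO_GAL = [
    [-0.0548755604, -0.8734370902, -0.4838350155],
    [ 0.4941094279, -0.4448296300,  0.7469822445],
    [-0.8676661490, -0.1980763734,  0.4559837762],
]

def galactic_to_icrs(l_deg: float, b_deg: float):
    """Rotate a Galactic (l, b) direction into ICRS (ra, dec), in degrees.

    A cone search region transforms by moving its centre like this and
    keeping its radius: a circle is a circle in any spherical frame.
    """
    l, b = math.radians(l_deg), math.radians(b_deg)
    v = (math.cos(b) * math.cos(l), math.cos(b) * math.sin(l), math.sin(b))
    # The matrix is orthogonal, so its inverse is its transpose.
    x = sum(EQ_TO_GAL[i][0] * v[i] for i in range(3))
    y = sum(EQ_TO_GAL[i][1] * v[i] for i in range(3))
    z = sum(EQ_TO_GAL[i][2] * v[i] for i in range(3))
    ra = math.degrees(math.atan2(y, x)) % 360.0
    dec = math.degrees(math.asin(z))
    return ra, dec

# The Galactic centre (l=0, b=0) lands near RA 266.40 deg, Dec -28.94 deg.
```

A box in Galactic coordinates has no such shortcut: its edges are not great-circle-aligned in the equatorial frame, which is exactly the STC-polygon difficulty described above.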

In fact, as you all know(?), there is a group led by Pedro Osuna and 
Jesus Salgado working on a far more complex issue: photometry, 
including magnitudes... if we can cope with that, physical units should 
be a piece of cake!

....
> In fact, the transformation between different reference systems could 
> be much more tricky and maybe it could be justified to require a 
> given one (I don't have a clear opinion). But I think that the 
> transformation between different units is not such a difficult 
> problem and it is not necessary a deep knowledge of the physics 
> involved. It should be just arithmetics.

This is what the Units model is supposed to enable.  We (me, Mireille et
al.) are working on this; see http://www.ivoa.net/internal/IVOA/UnitsDesc
and I hope that a draft will be presented at the Interop at the end of 
this month.  There are some complications (ambiguities in unit labels, 
prefixes, etc.) to iron out; it is semantics as well as arithmetic, 
but doable.
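A toy illustration of why label parsing is semantics as well as arithmetic: SI prefixes overlap with unit symbols, so one label can parse in more than one way. The prefix and unit tables here are a tiny, deliberately incomplete sample, not the actual Units model:

```python
PREFIXES = {"m": 1e-3, "k": 1e3, "c": 1e-2, "d": 1e-1}
BASE_UNITS = {"m", "s", "as", "in", "min", "pc"}

def parses(label: str):
    """Return every (prefix_factor, base_unit) reading of a unit label."""
    readings = []
    if label in BASE_UNITS:                       # bare unit, factor 1
        readings.append((1.0, label))
    for prefix, factor in PREFIXES.items():       # prefixed readings
        if label.startswith(prefix) and label[len(prefix):] in BASE_UNITS:
            readings.append((factor, label[len(prefix):]))
    return readings

# "mas" reads unambiguously as milli-arcsecond in this sample, but "min"
# is genuinely ambiguous: the unit "minute", or "m" + "in" (milli-inch).
```

Resolving such ambiguities needs conventions (which prefixed forms are allowed for which units), which is exactly what a Units model has to pin down before the arithmetic can be automated.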

Somewhere along the line, the VO does have to do unit conversions.  
Hence, we need a model for conventions to use internally, and we need to 
employ (usually pre-existing) libraries.  Having done all that, we can 
encourage data providers to use our preferred units in their metadata 
by providing conversions in the VO tools.  This should not force 
everything into the same units, I think, but it might at least 
homogenise the labels.
However, we will still need to recognise all kinds of units in other 
situations, for example in processing database queries intelligently.
And if we have to do it at all, we might as well allow it wherever it 
is the simplest or most accurate solution.
------------------------------------------------------------------------------
*Step 21 Arnold to Alberto
-------------------------------------------------------------------------------
>
> I'm writing here wearing my new ESO hat...
>
> ...
> I'd like to repeat here one sentence from Fabien:
> > if a data provider owning observations in a non-standard reference 
> > frame (such as planetary observations) is unable to convert its own 
> > data to a standard one, he cannot expect all client VO software 
> > to do that for him!
> How can that sentence be disputed on technical and pragmatic grounds?

Very simple: one person's standard frame is another person's
outlandish custom frame.
Planetary frames are not non-standard for planetary scientists.

But let me repeat again (and again and again): there is nothing wrong
with a service (or a client) saying: sorry, can't handle this.
------------------------------------------------------------------------------
*Step 22 Arnold to Fabien
------------------------------------------------------------------------------
> As a software engineer, I can tell you that your last comments 
> ("everybody will use whatever they like", "messy situation") are a 
> good summary of the current situation. You say one must be explicit 
> about metadata; I say one must be unambiguous and simple so that it 
> can be used. That is why I wanted to point out that the simplest 
> solution is to adopt a single reference frame and unit. Maybe I am 
> being naive, but I don't see where the madness is in this statement.

The point is that ICRS (just to take spatial coordinates as an example, 
but the same holds true for Doppler velocities) makes sense for many 
astronomical data, but definitely not for all. It would be 
counter-productive to force, say, a Galactic survey into ICRS.
(continued below)

>
> > And, let me say it once again, STC IS NOT COMPLICATED.
> > It only becomes complicated if one wants to implement every 
> > possible combination of metadata.
> > It is perfectly OK for a service (or a client) to say: sorry, I 
> > can't handle this, because I only understand ICRS. At least it will 
> > understand what it can and cannot handle when these things are 
> > explicitly stated. Being silent about them only fosters uncertainty 
> > and chaos.
>
> If a client can say that, it means that we will never achieve full 
> interoperability. Actually, this is pretty much the current 
> situation. With my simple proposal, this would obviously be fixed.
> Whether the idea is acceptable for philosophical reasons or not is 
> another debate.

And your proposal will not change that: people are highly unlikely to 
transform that Galactic survey to ICRS, so it still will not be 
available. Whereas, if the metadata are properly specified (and 
present), a client looking for data in l and b will happily be able to 
use data presented in Galactic coordinates.
And when clients and services get around to integrating 
transformations, things will be nicely interoperable.
My point is that trying to force a one-size-fits-all solution is 
counter-productive, since some data will never become available, 
whereas providing full and complete metadata maintains the potential 
for interoperability: maybe not right now, but at a later date.

Put differently:
The key to interoperability is providing complete metadata, not 
forcing a single frame. The former makes all data potentially 
available; the latter will exclude certain data forever.
------------------------------------------------------------------------------
*Step 23 Fabien to Arnold
------------------------------------------------------------------------------
> But let me repeat again (and again and again): there is nothing wrong 
> with a service (or a client) saying: sorry, can't handle this.

Well, I thought the goal of the VO was to allow interoperability for 
all astronomical data, and especially to allow astronomers from one 
field to use data from other fields without effort.

Something I didn't say in my proposal (but I will come back to it in
Baltimore) is that even data providers who cannot (because it's too 
complicated or because they don't have the resources) provide 
characterization metadata using strictly defined reference frames and 
units can still benefit from my proposed standard, by providing at 
least the other metadata following the same format. These data without 
standardized characterization will just not be usable/searchable with 
tools relying on it. It is however important that they don't use the 
same keywords to describe other quantities.

A bit unrelated: I would also like to report something that Alberto 
pointed out (privately).
I understand perfectly that some of you really want to use reference 
frames and units which are more natural for working on your data sets. 
I also understand that there is a real need for a complete model based 
on STC, units and other concepts for describing the true complexity of 
astronomical data sets. This is a very useful task which has many 
applications in advanced processing tasks such as pipeline design, 
image stacking, etc.
However, the current scope of my proposal is limited to a much 
narrower usage: namely, to propose a serialization allowing true 
interoperability between data discovery applications such as viewers, 
browsers and search engines.
-------------------------------------------------------------------------------
*Step 24 That's all folks
-------------------------------------------------------------------------------






More information about the dm mailing list