Standardising units and formats (and ref frames?) in transmission
Anita M. S. Richards
a.m.s.richards at manchester.ac.uk
Mon May 18 02:58:28 PDT 2009
On Mon, 18 May 2009, Mark Taylor wrote:
> I agree that it would be very nice for data consumers if only a very few,
> very well-defined units and formats ever appeared in data coming over
> the wire. But in some cases it will put a considerable burden on
> data providers. Since we do not have the power of obligation over
> data providers, simply decreeing "time in VOTables will always be
> represented as ISO-8601 TT" (or whatever we come up with) will most
> likely have the effect that many data providers simply fail to comply,
> either providing VOTables for which this is not the case, or just
> giving up on VOTables/VO-blessed data formats altogether.
>
> In practice I think that this sort of rule does work,
> and is a good thing, for protocols where the quantities in question
> have protocol-specific semantics; for instance it's quite right
> that the RA and Dec specified in Cone search, SIA and SSA queries,
> and in the RA/Dec columns which they return, must be ICRS in
> decimal degrees, rather than permitting any angular description
> as long as it's described. However, for situations in which
> the data is being passed through, e.g. rules for TAP responses,
> or simply for what is allowed in any VOTable sitting on someone's
> disk, I don't think it's feasible (read: will not be obeyed) to
> decree that only a small set of formats is permissible for certain
> data types.
>
> I'm all in favour of encouraging data providers to provide position/
> time/whatever in certain standardised formats that we can decide on,
> I just don't think that uptake will be 100% (or even 90%), whether
> or not we call it a requirement.
>
I agree with Mark, but I would go further. There are two types of use
case: one is simply finding data, the other is handling it.
For finding data, a certain imprecision is allowable, as long as it errs
on the side of being inclusive. Hence, converting coordinates roughly,
inside the VO (or by data providers filling in a registration), is OK _as
long as it is understood_, and as long as the user can get the data back
in the native coordinates.
Why imprecision? Because very accurate conversion will have a
computational overhead, for example for every row of a huge table, or
regridding large images, and because some data only make sense in certain
coordinate systems.
For example, I publish data in Galactic coordinates. If someone wants to
search in RA and Dec then a rough conversion is quite quick, as long as
they don't mind getting back a region which may be slightly larger than
they asked for. But, we have had repeated complaints from users in the
past who want to search Galactic plane surveys in Galactic coordinates,
since a simple box (from an image or catalogue) will give them what they
want, but in RA and Dec it is horrid. So, either we allow a search to be
passed through in Galactic coordinates, or we convert the search box for
the user.
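
To illustrate what "convert the search box for the user" could look like, here is a minimal sketch, not a recommendation for any particular service. It samples the edge of a Galactic box, converts each point to RA/Dec, and returns a padded RA/Dec box that errs on the inclusive side. The function names, the number of samples and the padding are my own assumptions; the pole and node angles are the usual J2000 values; and the sketch assumes the box does not straddle RA = 0 or contain a celestial pole.

```python
import math

# Orientation of the Galactic frame in J2000 equatorial coordinates (degrees):
# RA and Dec of the north Galactic pole, and Galactic longitude of the
# north celestial pole.
RA_NGP, DEC_NGP, L_NCP = 192.85948, 27.12825, 122.93192


def galactic_to_equatorial(l_deg, b_deg):
    """Convert Galactic (l, b) to equatorial (RA, Dec), all in degrees."""
    b = math.radians(b_deg)
    dec_ngp = math.radians(DEC_NGP)
    dl = math.radians(L_NCP - l_deg)
    sin_dec = (math.sin(dec_ngp) * math.sin(b)
               + math.cos(dec_ngp) * math.cos(b) * math.cos(dl))
    y = math.cos(b) * math.sin(dl)
    x = (math.cos(dec_ngp) * math.sin(b)
         - math.sin(dec_ngp) * math.cos(b) * math.cos(dl))
    ra = (RA_NGP + math.degrees(math.atan2(y, x))) % 360.0
    dec = math.degrees(math.asin(max(-1.0, min(1.0, sin_dec))))
    return ra, dec


def inclusive_radec_box(l_min, l_max, b_min, b_max, steps=20, pad_deg=0.05):
    """Return (ra_min, ra_max, dec_min, dec_max) enclosing a Galactic box.

    Hypothetical helper: samples the four edges of the box and pads the
    result, so the RA/Dec region is deliberately a little too large.
    """
    edge = []
    for i in range(steps + 1):
        f = i / steps
        l = l_min + f * (l_max - l_min)
        b = b_min + f * (b_max - b_min)
        edge += [(l, b_min), (l, b_max), (l_min, b), (l_max, b)]
    points = [galactic_to_equatorial(l, b) for l, b in edge]
    ras = [p[0] for p in points]
    decs = [p[1] for p in points]
    return (min(ras) - pad_deg, max(ras) + pad_deg,
            min(decs) - pad_deg, max(decs) + pad_deg)


# A 2 x 2 degree box on the Galactic plane.
print(inclusive_radec_box(10.0, 12.0, -1.0, 1.0))
```

The RA/Dec box that comes back is noticeably larger than the Galactic box it encloses, which is the price of the rough conversion; the user still needs to be told that this has happened.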
Or, I want data with a certain spectral resolution which I specify in
wavelength units, but the data are in frequency units with a non-linear
conversion - i.e., the spectral resolution at one end of the bandpass is
different from that at the other, if the units are changed.
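
A small worked example of that non-linearity, as a sketch only. It uses the first-order relation |d(lambda)| = c |d(nu)| / nu^2 to turn a fixed channel width in frequency into the equivalent width in wavelength at each end of a band. The 1-2 GHz band and the 1 MHz channel are made-up numbers, and the function name is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def wavelength_width(freq_hz, dfreq_hz):
    """Wavelength interval (m) matching a small frequency interval.

    First-order approximation: |d(lambda)| = c * |d(nu)| / nu**2.
    """
    return C * dfreq_hz / freq_hz ** 2


# Assumed example: 1 MHz channels across a 1-2 GHz band.
channel_hz = 1.0e6
low_edge = wavelength_width(1.0e9, channel_hz)
high_edge = wavelength_width(2.0e9, channel_hz)

print(f"channel width at 1 GHz: {low_edge * 1e3:.4f} mm")
print(f"channel width at 2 GHz: {high_edge * 1e3:.4f} mm")
print(f"ratio: {low_edge / high_edge:.1f}")
```

Channels that are identical in frequency differ by a factor of four in wavelength across this band, so a request for "a certain resolution in wavelength" has no single answer for such data.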
My suggestion would be that we should support a limited number of the
commonest alternatives fully - my impression is that this is a modified
80-20 problem, i.e. we can satisfy maybe 50% of the users with a single
flavour of coordinates, and another 30% or even more by supporting just
one or two alternatives (Spatial: RA/Dec, Galactic, Healpix; Spectral:
wavelength, frequency, energy...).
The default should be that the user gets the data back in its native
coordinates (regardless of the search coordinates). If they want to
perform a coordinate conversion then they should have the tools to do it,
but this should be explicitly requested and they should be aware of the
native units in case re-gridding introduces errors (e.g. converting a data
cube with a frequency axis to wavelength is often a very bad idea).
Even where coordinate conversion is linear, e.g. milliarcsec to decimal
degrees of arc, or wavelengths in the X-ray regime to metres, we have to
make sure that all the stages involved have adequate precision. I still
gnash my teeth about once a month because an astronomical software
package either thinks that 0.999999876579 is the same as 1.0, or that
9.999998766e-01 is the same as 0.999999876579, in a situation where it
ain't.
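
The two failure modes above can be reproduced in a few lines. This is only a sketch using the Python standard library; the tolerances and the 9-decimal output format are assumptions standing in for whatever a given package does internally.

```python
import math

value = 0.999999876579

# Failure 1: a comparison tolerance that is too loose merges the value with 1.0.
loose_match = math.isclose(value, 1.0, rel_tol=1e-6)
strict_match = math.isclose(value, 1.0, rel_tol=1e-9)

# Failure 2: writing the value with too few digits and reading it back.
written = format(value, ".9e")
reread = float(written)

# Writing the shortest representation that round-trips avoids the loss.
safe = float(repr(value))

print(loose_match, strict_match)
print(written, reread == value)
print(safe == value)
```

Whether a difference of about 1.2e-7 matters depends on the quantity, which is exactly why each stage of a conversion chain needs to carry more digits than the final use requires.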
The other issue which arises from Mark's very true observation that data
providers will not perform what they see as arduous, useless or even
damaging conversions, is that whatever our standards for units are, we
should always label units. That way, if I search for data in a radius of
0.1 degrees and get nothing back, it makes it easier for someone to
establish whether that is because the data provider interprets search
radii in arcsec. Happens regularly!
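
As a sketch of what "always label units" buys you, here is a tiny hypothetical Angle type that refuses to exist without a unit. The class, the unit names and the conversion table are all my own assumptions, not part of any VO standard.

```python
from dataclasses import dataclass

# Size of each supported unit in degrees (assumed set of labels).
DEGREES_PER_UNIT = {
    "deg": 1.0,
    "arcmin": 1.0 / 60.0,
    "arcsec": 1.0 / 3600.0,
    "mas": 1.0 / 3.6e6,
}


@dataclass(frozen=True)
class Angle:
    """An angular value that always carries its unit label."""
    value: float
    unit: str

    def __post_init__(self):
        if self.unit not in DEGREES_PER_UNIT:
            raise ValueError(f"unknown angular unit: {self.unit!r}")

    def to(self, unit):
        """Return the same angle expressed in another unit."""
        degrees = self.value * DEGREES_PER_UNIT[self.unit]
        return Angle(degrees / DEGREES_PER_UNIT[unit], unit)


# The user asks for 0.1 degrees; a service working in arcsec can convert
# explicitly instead of silently treating the number as 0.1 arcsec.
requested = Angle(0.1, "deg")
print(requested.to("arcsec"))

# What the unlabelled number becomes if it is misread as arcsec.
misread = Angle(0.1, "arcsec")
print(misread.to("deg"))
```

With the label attached, the mismatch is a factor of 3600 that anyone can spot; without it, the only symptom is an empty result.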
all the best
Anita
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Dr. A.M.S. Richards, UK ARC Node/AstroGrid,
Jodrell Bank Centre for Astrophysics, Alan Turing Building,
University of Manchester, M13 9PL
+44 (0)161 275 4124
and
MERLIN/VLBI National Facility, Jodrell Bank Observatory,
Cheshire SK11 9DL, U.K. +44 (0)1477 571321 (tel) 571618 (fax)
"Socialism or barbarism?" Rosa Luxemburg (1915)