Standardising units and formats (and ref frames?) in transmission

Anita M. S. Richards a.m.s.richards at manchester.ac.uk
Mon May 18 02:58:28 PDT 2009


On Mon, 18 May 2009, Mark Taylor wrote:

> I agree that it would be very nice for data consumers if only a very few,
> very well-defined units and formats ever appeared in data coming over
> the wire.  But in some cases it will put a considerable burden on
> data providers.  Since we do not have the power of obligation over
> data providers, simply decreeing "time in VOTables will always be
> represented as ISO-8601 TT" (or whatever we come up with) will most
> likely have the effect that many data providers simply fail to comply,
> either providing VOTables for which this is not the case, or just
> giving up on VOTables/VO-blessed data formats altogether.
>
> In practice I think that this sort of rule does work,
> and is a good thing, for protocols where the quantities in question
> have protocol-specific semantics; for instance it's quite right
> that the RA and Dec specified in Cone search, SIA and SSA queries,
> and in the RA/Dec columns which they return, must be ICRS in
> decimal degrees, rather than permitting any angular description
> as long as it's described.  However, for situations in which
> the data is being passed through, e.g. rules for TAP responses,
> or simply for what is allowed in any VOTable sitting on someone's
> disk, I don't think it's feasible (read: will not be obeyed) to
> decree that only a small set of formats is permissible for certain
> data types.
>
> I'm all in favour of encouraging data providers to provide position/
> time/whatever in certain standardised formats that we can decide on,
> I just don't think that uptake will be 100% (or even 90%), whether
> or not we call it a requirement.
>

I agree with Mark, but I would go further.  There are two types of use 
case: one is simply finding data, and the other is handling it.

For finding data, a certain imprecision is allowable, as long as it errs 
on the side of being inclusive.  Hence, converting coordinates roughly, 
inside the VO (or by data providers filling in a registration), is OK _as 
long as it is understood_, and as long as the user can get the data back 
in the native coordinates.

Why imprecision? Because very accurate conversion will have a 
computational overhead, for example for every row of a huge table, or 
regridding large images, and because some data only make sense in certain 
coordinate systems.

For example, I publish data in Galactic coordinates.  If someone wants to 
search in RA and Dec then a rough conversion is quite quick, as long as 
they don't mind getting back a region which may be slightly larger than 
they asked for.  But, we have had repeated complaints from users in the 
past who want to search Galactic plane surveys in Galactic coordinates, 
since a simple box (from an image or catalogue) will give them what they 
want, but in RA and Dec it is horrid.  So, either we allow a search to be 
passed through in Galactic coordinates, or we convert the search box for 
the user.
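
As a rough sketch of "converting the search box for the user", the Python
below turns a Galactic (l, b) box into a padded RA/Dec bounding box.  The
function names, the sampling density and the 0.1 degree padding are all
assumptions made for illustration; the pole constants are the commonly
quoted J2000 values.  The padding is a crude safety margin rather than a
proven bound, and boxes crossing RA = 0 or containing a celestial pole are
not handled.

```python
import math

# Commonly quoted J2000 values, in degrees: RA and Dec of the north Galactic
# pole, and the Galactic longitude of the north celestial pole.
RA_NGP, DEC_NGP, L_NCP = 192.85948, 27.12825, 122.93192


def galactic_to_equatorial(l_deg, b_deg):
    """Convert Galactic (l, b) to J2000 (RA, Dec); all angles in degrees."""
    b = math.radians(b_deg)
    dec_g = math.radians(DEC_NGP)
    dl = math.radians(L_NCP - l_deg)
    sin_dec = (math.sin(b) * math.sin(dec_g)
               + math.cos(b) * math.cos(dec_g) * math.cos(dl))
    y = math.cos(b) * math.sin(dl)
    x = (math.sin(b) * math.cos(dec_g)
         - math.cos(b) * math.sin(dec_g) * math.cos(dl))
    ra = (RA_NGP + math.degrees(math.atan2(y, x))) % 360.0
    # Clamp to guard against rounding slightly outside [-1, 1].
    dec = math.degrees(math.asin(max(-1.0, min(1.0, sin_dec))))
    return ra, dec


def padded_radec_box(l_min, l_max, b_min, b_max, pad_deg=0.1, n=5):
    """Rough RA/Dec bounding box for a Galactic box, erring on the large side.

    Samples an (n+1) x (n+1) grid over the Galactic box, takes the extremes
    in RA and Dec, and pads each side by pad_deg coordinate degrees.
    """
    points = [
        galactic_to_equatorial(l_min + (l_max - l_min) * i / n,
                               b_min + (b_max - b_min) * j / n)
        for i in range(n + 1)
        for j in range(n + 1)
    ]
    ras = [p[0] for p in points]
    decs = [p[1] for p in points]
    return (min(ras) - pad_deg, max(ras) + pad_deg,
            min(decs) - pad_deg, max(decs) + pad_deg)


# A one-degree box next to the Galactic centre, expressed in RA/Dec.
print(padded_radec_box(0.0, 1.0, -0.5, 0.5))
```

The returned region is deliberately larger than the original box, which is
acceptable for finding data but not for handling it.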

Or, I want data with a certain spectral resolution which I specify in 
wavelength units, but the data are in frequency units with a non-linear 
conversion - i.e., the spectral resolution at one end of the bandpass is 
different from that at the other, if the units are changed.
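
To put a number on this, the sketch below uses the narrow-channel relation
|d(lambda)| = c / nu^2 * |d(nu)|.  The 1-2 GHz band and the uniform 1 MHz
channel width are made-up values chosen only for illustration, and the
function name is hypothetical.

```python
C = 299_792_458.0  # speed of light in m/s


def channel_width_in_wavelength(freq_hz, dfreq_hz):
    """Approximate wavelength width (m) of a channel dfreq_hz wide at freq_hz.

    Uses |d(lambda)| = c / nu**2 * |d(nu)|, valid for narrow channels.
    """
    return C / freq_hz**2 * dfreq_hz


# Same 1 MHz channel width at both ends of an assumed 1-2 GHz band.
low_end = channel_width_in_wavelength(1.0e9, 1.0e6)
high_end = channel_width_in_wavelength(2.0e9, 1.0e6)

print("width at 1 GHz (m):", low_end)
print("width at 2 GHz (m):", high_end)
print("ratio:", low_end / high_end)
```

Channels that are uniform in frequency differ by about a factor of four in
wavelength width across this band, so a single resolution figure quoted in
wavelength units cannot describe the whole data set.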

My suggestion would be that we should support a limited number of the 
commonest alternatives fully - my impression is that this is a modified 
80-20 problem, i.e. we can satisfy maybe 50% of the users with a single 
flavour of coordinates, and another 30% or even more by supporting just 
one or two alternatives (Spatial: RA/Dec, Galactic, Healpix; Spectral: 
wavelength, frequency, energy...).

The default should be that the user gets the data back in its native 
coordinates (regardless of the search coordinates).  If they want to 
perform a coordinate conversion then they should have the tools to do it, 
but this should be explicitly requested and they should be aware of the 
native units in case re-gridding introduces errors (e.g. converting a data 
cube with a frequency axis to wavelength is often a very bad idea).

Even where coordinate conversion is linear, e.g. milliarcsec to decimal 
degrees of arc, or wavelengths in the X-ray regime to metres, we have to 
make sure that all the stages involved have adequate precision.  I still 
gnash my teeth about once a month because an astronomical software 
package either thinks that 0.999999876579 is the same as 1.0, or that 
9.999998766e-01 is the same as 0.999999876579, in a situation where it 
ain't.
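
The two failure modes can be reproduced in a few lines of Python.  This is
only an illustration using the numbers above; the tolerance of 1e-6 and the
"%.9e" format are assumptions standing in for whatever a given package
happens to use.

```python
import math

x = 0.999999876579

# Failure mode 1: a loose tolerance treats x as equal to 1.0.
print(x == 1.0)                             # False
print(math.isclose(x, 1.0, rel_tol=1e-6))   # True

# Failure mode 2: writing x with a fixed number of digits loses information.
text = "%.9e" % x
print(text)                                 # 9.999998766e-01
y = float(text)
print(y == x)                               # False
print(abs(x - y))                           # about 2.1e-11

# repr() keeps enough digits for an exact round trip of a Python float.
print(float(repr(x)) == x)                  # True
```

Whether a difference of this size matters depends on the quantity and its
units, which is why every stage of a conversion chain has to be checked.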

The other issue which arises from Mark's very true observation that data 
providers will not perform what they see as arduous, useless or even 
damaging conversions, is that whatever our standards for units are, we 
should always label units.  That way, if I search for data in a radius of 
0.1 degrees and get nothing back, it makes it easier for someone to 
establish whether that is because the data provider interprets search 
radii in arcsec.  Happens regularly!
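
A minimal sketch of why the label matters: the same bare number 0.1 means
very different things depending on the unit attached to it.  The unit
strings and the function name below are illustrative assumptions, not taken
from any particular VO standard or service.

```python
# Conversion factors from a labelled angular unit to degrees.
TO_DEGREES = {
    "deg": 1.0,
    "arcmin": 1.0 / 60.0,
    "arcsec": 1.0 / 3600.0,
    "mas": 1.0 / 3.6e6,
}


def radius_in_degrees(value, unit):
    """Return a search radius in degrees, refusing unlabelled or unknown units."""
    if unit not in TO_DEGREES:
        raise ValueError("unknown or missing angular unit: %r" % (unit,))
    return value * TO_DEGREES[unit]


as_degrees = radius_in_degrees(0.1, "deg")
as_arcsec = radius_in_degrees(0.1, "arcsec")

print("0.1 deg    =", as_degrees, "deg")
print("0.1 arcsec =", as_arcsec, "deg")
print("ratio of radii:", as_degrees / as_arcsec)
```

A service that silently reads 0.1 as arcsec searches a radius 3600 times
smaller than the user intended, so an empty result is unsurprising; with the
unit carried alongside the value, the mismatch is at least visible.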

all the best

Anita

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Dr. A.M.S. Richards, UK ARC Node/AstroGrid,
Jodrell Bank Centre for Astrophysics, Alan Turing Building, 
University of Manchester, M13 9PL
+44 (0)161 275 4124
and
MERLIN/VLBI National Facility, Jodrell Bank Observatory, 
Cheshire SK11 9DL, U.K. +44 (0)1477 571321 (tel) 571618 (fax)

"Socialism or barbarism?" Rosa Luxemburg (1915)


