VOUnits RFC

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Wed Jul 31 01:48:27 PDT 2013


Hi Norman, hi others,

On Tue, Jul 30, 2013 at 03:10:18PM +0100, Norman Gray wrote:
> > VOTable unit strings (in the end, at least).   Without the scale
> > factors, quite a few of my unit strings will become invalid, at least
> > until we'd have the great unit translation and I'd have my data
> > providers' strange units pushed in there.  It will come as no
> > surprise that I don't like that.
> 
> Are VOTables often _stored_ rather than generated on the fly?  That
> is, are you presenting an archival problem (stored VOTables will
> stop being valid) or a behaviour problem (generated VOTables will
> change the syntax of unit strings)?

Well, the unit strings come from somewhere, and in my case that's the
metadata descriptions.  And generated or not, I'll need to give the
units in some way, and I'd really like to give them in a way that
unit parsers can understand.  I don't have an issue with migrating
them, as long as there's something to migrate to.

Deferring unit validity (as opposed to well-formedness, which I'd
define to mean here "parseable with the top-level grammar with
arbitrary unit strings") to the "application level" and at the same
time disallowing scale factors make that migration goal murky at
best.

> Is this actually a problem?  Do VOTable parsers actually try to
> parse the unit strings?  If so, they're presumably going to have to
> be pretty tolerant, if they have to cope with the mish-mash of
> units you've listed in your ADQL query.  If they're tolerant, then
> they can tolerate a change of mandated unit syntax.

Ah, but that was exactly my use case: For any unit I encounter in a
valid VOTable, I want to be able to break it down to scale factor
plus SI unit.
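
As a rough illustration of that use case, here is a minimal Python
sketch. It is not taken from the VOUnits draft: the tables PREFIXES and
KNOWN, the helper resolve() and the function to_si() are all made up for
this example, and the grammar it accepts (optional numeric scale factor,
atoms joined by ".", at most one "/", integer powers written "**n") is
far smaller than anything a real parser would need.

```python
import re

# Hypothetical, deliberately tiny tables; a real implementation needs many more.
PREFIXES = {"k": 1e3, "M": 1e6, "c": 1e-2, "m": 1e-3, "u": 1e-6}
# known symbol -> (factor to SI, exponents of SI base units)
KNOWN = {
    "m": (1.0, {"m": 1}),
    "s": (1.0, {"s": 1}),
    "g": (1e-3, {"kg": 1}),
    "Hz": (1.0, {"s": -1}),
    "Jy": (1e-26, {"kg": 1, "s": -2}),
}

NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")
ATOM = re.compile(r"([A-Za-z]+)(?:\*\*([+-]?\d+))?")


def resolve(symbol):
    """Look up a symbol, trying it bare first, then as prefix + unit."""
    if symbol in KNOWN:
        return KNOWN[symbol]
    for prefix, factor in PREFIXES.items():
        rest = symbol[len(prefix):]
        if symbol.startswith(prefix) and rest in KNOWN:
            base_factor, dims = KNOWN[rest]
            return factor * base_factor, dims
    raise ValueError(f"unknown unit: {symbol!r}")


def to_si(unit):
    """Return (scale, dims): one `unit` equals scale times the SI base units in dims."""
    scale = 1.0
    m = NUMBER.match(unit)
    if m:  # leading numeric scale factor, e.g. "1e-3mJy"
        scale = float(m.group(0))
        unit = unit[m.end():].strip()
    dims = {}
    numerator, _, denominator = unit.partition("/")
    for part, sign in ((numerator, 1), (denominator, -1)):
        for atom in filter(None, part.split(".")):
            m = ATOM.fullmatch(atom)
            if m is None:
                raise ValueError(f"cannot parse {atom!r}")
            factor, base = resolve(m.group(1))
            power = sign * int(m.group(2) or 1)
            scale *= factor ** power
            for sym, p in base.items():
                dims[sym] = dims.get(sym, 0) + p * power
    # Drop base units whose exponents cancelled out.
    return scale, {sym: p for sym, p in dims.items() if p}
```

With these tables, to_si("km/s") gives (1000.0, {'m': 1, 's': -1}),
while an unknown atomic unit such as to_si("jupMass") raises ValueError,
which is exactly the point where the breakdown stops working.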

A simple example where that's necessary: SED building.  If you can't
parse the unit strings of both the flux and the spectral coordinate,
that just won't work.  True, the data models try to work around this
by limiting the units allowed there ("application level"), but,
really, I'd much prefer if VOUnit were enough to build a tool that
can do this regardless of whether we're talking about spectra,
images, or camboodles.

Or think of units in VOTables uploaded to TAP services -- there, I'd
at least like to be able to decide if the units roughly match and
give warnings if they don't.  And, of course, I'm dreaming of a
CAST_UNIT(col, dest_unit) function in future ADQL versions, for which
you'd have to know col's unit.  This is not hard when VOUnit
satisfies my use case, nigh impossible if it doesn't.
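
To make the compatibility check and the wished-for conversion concrete,
here is a small Python sketch. It assumes each unit has already been
broken down into a (scale factor, SI base unit exponents) pair; the
function cast_factor() and the three example constants are hypothetical
and only hint at what something like CAST_UNIT would have to compute.

```python
def cast_factor(src, dest):
    """Return f such that value_in_dest = f * value_in_src.

    Both arguments are (scale, dims) pairs, where dims maps SI base
    unit symbols to integer exponents.  Raises ValueError when the
    dimensions differ, i.e. when the units do not match at all.
    """
    (src_scale, src_dims), (dest_scale, dest_dims) = src, dest
    if src_dims != dest_dims:
        raise ValueError(f"incompatible units: {src_dims} vs {dest_dims}")
    return src_scale / dest_scale


# Hand-written decompositions, assumed for this example only.
KM_PER_S = (1e3, {"m": 1, "s": -1})
M_PER_S = (1.0, {"m": 1, "s": -1})
JY = (1e-26, {"kg": 1, "s": -2})

print(cast_factor(KM_PER_S, M_PER_S))  # 1000.0
```

A service could call cast_factor() on the unit of an uploaded column and
the unit it expects, and turn the ValueError (for instance for JY
against M_PER_S) into a warning.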

That right now unit attribute values in VOTables cannot reliably be
parsed is sad and poses a big problem for the VO's promise of
easily bringing together data from different sources, but I'd like to
fix that to the extent possible, which is why I really don't like
arbitrary atomic units.

But see below.

> > http://dc.zah.uni-heidelberg.de/__system__/adql/query/form?__nevow_form__=genForm&query=select%20distinct%20unit%20from%20rr.table_column%20where%20unit%20like%20%27%25.%25%27&_TIMEOUT=5&_FORMAT=HTML&submit=Go
> 
> Urghh.  I presume that list has been case-folded in some way, since

Yuck! Bug!  Stomp, stomp, stomp.  Fixed data due in a few minutes.


> Permitting 'unknown unit' strings is a sort of loose provenance,
> yes, but that's not the motivation.

But it wouldn't be necessary and thus wouldn't ruin my use case if we
had suitable provenance...


> If it is the case (as you argue, Markus, above) that it is an
> important use-case to be able to convert a FITS file to a VOTable
> (that is, moving the FITS file's TSCALn to a numerical prefix),
> then I'm rather persuaded that it's necessary to include a
> numerical prefix.  We could ensure interoperability by demanding a
> very simple form for the prefix, such as /[0-9]\.[0-9]+e[-+][0-9]+/

It's not so much about conversion as about keeping VOTables a
superset of FITS binary tables.

I'd be happy with this, except I don't think the complexity of the RE
matters much, and thus

[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?

wouldn't hurt anyone and reduces surprising validity problems
significantly.
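
To show the difference between the two patterns, here is a short Python
comparison. The names SIMPLE, LIBERAL and classify() are made up for
this example; fullmatch is used so the whole string has to be a scale
factor, and re.ASCII restricts \d to 0-9.

```python
import re

# The strict form proposed in the quoted mail.
SIMPLE = re.compile(r"[0-9]\.[0-9]+e[-+][0-9]+")
# The more liberal form suggested above.
LIBERAL = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?", re.ASCII)


def classify(text):
    """Return (matches_simple, matches_liberal) for a candidate scale factor."""
    return (SIMPLE.fullmatch(text) is not None,
            LIBERAL.fullmatch(text) is not None)


for candidate in ["1.0e+3", "1e3", "1000", ".5", "1.5E-17", "10.0e+3", "e3"]:
    print(candidate, classify(candidate))
```

Of these candidates, only "1.0e+3" passes the strict form; the liberal
form accepts everything except "e3", so common spellings such as "1000"
or "1e3" do not turn into validity errors.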

> However:
> 
>   * I think it would be good to include language in the spec that
>   deprecates this in most cases, as OGIP does, for example; and

If you absolutely must; however, I'd still much rather have easily
computable units, so I'd prefer to deprecate the unknown units instead.

>   * I think it's still necessary to permit 'unknown units' to deal
>   with the 'jupMass' case.

This discussion has shown that there's probably no way around it.
It's clearly destroying my use cases, but since...

(a) I currently have nothing to offer that would cover the use cases
behind this, and
(b) here's to hoping that the data providers will like the automatic
convertibility well enough that they'll restrict themselves to "known"
units whenever humanly feasible...

> Would people here agree about the importance of this use-case, and
> this as the resolution?

...you'd have my agreement.

Cheers,

         Markus


