VOUnits RFC

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Tue Jul 30 01:24:32 PDT 2013


Dear Fellow unit enthusiasts,

Nice discussion, and a big thanks to all dropping in.  As the
troublemaker who started it, I feel compelled to reply to some of the
mails.  Sorry for breaking the threading, but I think that's preferable
to lots of small mails.

So, first, Norman in <83AB7956-24D4-411A-994D-60028AD9BEC6 at astro.gla.ac.uk>
> And the specification _is_ concerned with the FITS syntax because (as I've
> stressed above), it's a goal that any VOUnits-compatible units string would
> also be a syntactically valid FITS unit string (and the same for CDS and
> nearly so for OGIP).  Permitting a scaling factor would break that.

As you've pointed out elsewhere, it's not an intersection anyway, so
another slight breakage wouldn't hurt, would it?

Then, replying to Tom's <51F681A6.40804 at nasa.gov>
> In VOTables (and elsewhere) we don't have, AFAIK, a comparable scaling
> capability nor is it likely any time soon.  Since I perceive that astronomers
> are oft enamored of non-SI units, we'd be requiring wholesale rescaling of
> values in tables for tables to be able to use this convention.  I don't see
> that happening.
> 
> A point of clarification: I'm not positive I follow where the rescaling would
> be necessary.  Do you mean that at present VOTables can use "1.9x10+27kg" as a
> unit string (because they use CDS-format unit strings), but couldn't if there
> was an immediate switch to VOUnits strings, and therefore that the content of
> the VOTable would have to be scaled when it's generated?

That, at least, is 50% of what I am worried about.  You see, as far
as I understood things, VOUnits is supposed to say what's allowed in
VOTable unit strings (in the end, at least).   Without the scale
factors, quite a few of my unit strings will become invalid, at least
until we'd have the great unit translation and I'd have my data
providers' strange units pushed in there.  It will come as no
surprise that I don't like that.

Norman lists VOTable units strings as a use case and then adds:

> Other places where you might want a unit string are:
> 
>   * in a structured comment in a RDBMS or other schema, documenting a column;
>   * in a request to a web service (SOAP or otherwise), indicating the desired
> units of the result; or
>  * in an annotation (RDFa-style) to a number in a web page; et cetera.

plus registry metadata; that's not very different from the
"structured comment" thing, but you suddenly have these things in an
RDBMS with RegTAP.  This, by the way, lets you assess where we're
coming from in terms of units declared to the registry:

http://dc.zah.uni-heidelberg.de/__system__/adql/query/form?__nevow_form__=genForm&query=select%20distinct%20unit%20from%20rr.table_column%20where%20unit%20like%20%27%25.%25%27&_TIMEOUT=5&_FORMAT=HTML&submit=Go

> These seem to leave us with two alternatives for VOUnits:
> 
>   1. permit numerical scale-factors, and thus units of "1.9e27kg" (or whatever
> f.p. syntax we choose); or
> 
>   2. forbid numerical scale-factors, but permit 'unrecognised units', such as
> 'jupMass'.
> 
>
> Option (1) means that we effectively smuggle a TSCALn behaviour into the unit
> string.
> 
> Option (1) also breaks consistency with FITS unit strings.

...but maintains VOTable's capability to represent everything that
FITS binary tables can, which would otherwise get lost.  I'm pretty
sure I know what I prefer...

> The problem with (1) is that this loses the information that this is a
> 'jupiter mass', and leaves it as being some apparently random scaling factor.
> That's not a problem if the data is going into a pipeline and nowhere else,
> but it could be a problem in some of the other cases.  If I found this
> 1.9e27kg as a unit column in a structured comment, I'd probably want to
> strangle someone.  If I want my results in units of jupiter masses, and so

Well, to figure out how to convert the value to kg, it's perfect, so
you'd have little reason to strangle someone.  I think what bugs you
is that *provenance* is lost.  That is regrettable, true, but I'm
pretty sure overloading units with a part of provenance is making it
unsuitable for both.

This is also what I'd say to Rob's statement from
<64294334-B5A6-495D-9459-698436CBBCEA at noao.edu>  (where I'd like to
stress that I agree there's a problem worth solving, it's just that
VOUnits is the wrong place):

> A more fundamental issue is that often measurements are calibrated in
> terms of other measurements.  Quoting something as 1.5 jupMass might
> not just be a handy way to provide a sense of scale, but it might be
> that as measurements are refined of what the mass of Jupiter actually
> is, that the number quoted (in the table or what have you) ought be
> adjusted to suit.  Examples abound such as the Hubble constant, etc.

Going on to Rob's second mail:

On Mon, Jul 29, 2013 at 08:30:31AM -0700, Rob Seaman wrote:
> On Jul 29, 2013, at 7:52 AM, Tom McGlynn <Thomas.A.McGlynn at nasa.gov> wrote:
> > In practice when reading these our software will read
> >   1.2
> > and
> >   1.2345678901234567891234567801233e33
> > with equal facility
> 
> So either an arbitrary precision library must be used or the
> handling of units must permit scale factors only as opaque
> literals?
> 
> > and whether the second really has vastly more precision than the
> > first is unknowable and unaddressed by this standard.

The question of precision is, I would argue, outside the scope of
units -- I grant you we should have had a quantity data model ages
ago, but alas we don't have one, and trying to shoehorn this into units
will break units while not actually answering natural questions like
"what's the error on this, and what kind of error is this".

The question of determining equality is an interesting one, though.
If the floating point prefixes were the only thing holding this up,
I'd say that's a heavy blow.  However, we don't actually say how to
compare units in the current draft, and so I'd claim making that
comparison "harder" is a weak argument.

In the end, as long as I can (where applicable) compute the factor
between two unit strings reliably (and preferably without having to
refer to network resources), you can make up your mind whether
1.000001 is close enough to unity.
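
As a minimal sketch of what computing the factor between two unit
strings could look like when only the leading scale factors differ:
the helper names split_scale and relative_factor, and the literal
grammar in the regular expression, are assumptions made for
illustration and are not taken from the VOUnits draft.  Units whose
non-numeric parts differ are simply rejected here.

```python
import math
import re

# Assumed grammar: plain ASCII decimal literal with optional exponent.
_SCALE = re.compile(r"(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?", re.ASCII)


def split_scale(unit_string):
    """Return (scale, rest); scale is 1.0 when no leading literal is present."""
    match = _SCALE.match(unit_string)
    if match is None:
        return 1.0, unit_string
    return float(match.group(0)), unit_string[match.end():]


def relative_factor(unit_a, unit_b):
    """Factor f such that a value x in unit_a equals f * x in unit_b.

    Hypothetical helper: only handles the case where the parts after
    the scale factor are identical strings.
    """
    scale_a, rest_a = split_scale(unit_a)
    scale_b, rest_b = split_scale(unit_b)
    if rest_a != rest_b:
        raise ValueError("cannot relate %r and %r in this sketch" % (unit_a, unit_b))
    return scale_a / scale_b


# Whether 1.000001 counts as unity is left to the caller's tolerance.
factor = relative_factor("1.000001kg", "kg")
loose = math.isclose(factor, 1.0, rel_tol=1e-5)
strict = math.isclose(factor, 1.0, rel_tol=1e-9)
```

No network access is needed for this, and the decision about "close
enough" stays with the application (loose versus strict above).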


> I don't disagree with the notion of borrowing from earlier
> standards, but there are implications.  Still haven't heard
> comments on embedding the scale factors other than as prefixes, as
> denominators (not an unknown usage), etc.

Doesn't help expressivity, complicates the standard: I'd say let's not.

> Are digits forbidden in unit names?  Are hex or other non-decimal
> bases permitted in scale factors?

Ah, come on.  Basically all formal languages developed in the last 30
years and in measurable usage agree on what floating-point literals
look like.  Let's just follow them.
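
To make "just follow them" concrete, here is a sketch of the
conventional decimal literal as a regular expression.  The pattern and
the name is_scale_literal are assumptions for illustration; the draft
would have to fix the actual grammar.  Under this assumption, hex
notation, CDS-style "1.9x10+27", and non-ASCII digits are all rejected.

```python
import re

# Assumed grammar: digits with optional fraction, optional exponent.
# re.ASCII keeps \d from matching non-ASCII digits.
FLOAT_LITERAL = re.compile(r"(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?", re.ASCII)


def is_scale_literal(text):
    """True if the whole string is a scale factor under the assumed grammar."""
    return FLOAT_LITERAL.fullmatch(text) is not None


accepted = [s for s in ["1.2", "1.9e27", ".5", "1E-3"] if is_scale_literal(s)]
rejected = [s for s in ["0x1F", "1.9x10+27", "", "\u0661\u0662"]
            if not is_scale_literal(s)]
```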

And on Rob's series of questions in
<20130729122733.GD10303 at ari.uni-heidelberg.de>, as far as I think
they are relevant for use cases that have been put forward:

>  Might then inputs differ from outputs?

I don't think it's up to VOUnits to tell applications how to process
what's described.

> What about non-ASCII?

In Float literals?

> How do we truncate if a user supplies more digits than fit in the 
> required precision?

Given we're not even saying this for our values (in VOTable), I'd
argue we can safely leave that unspecified here, too.

> In that case can they themselves be expressed recursively as a FP literal?

Interesting thought, but why would you even want such a thing?  All
that's proposed is a single, simple, standard FP-literal at the start
of the unit string.  No further changes, big benefit.  Which I guess
would answer the remaining questions, insofar as they concern the
scaling factor rather than VOUnits in general.



So... May I try to suggest a compromise?

  How about if we say: VOUnits allows a single floating point scaling
  factor at the start of the unit string.  For serializing into FITS,
  the scaling factor must be split off into a TSCALn card (or absorbed
  into the value in the unfortunate event the value is a card value).
  For serializing from FITS binary tables, TSCALn cards SHOULD be
  preserved into the unit strings rather than being baked into the
  values.
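
A rough sketch of how that round trip could work, with the FITS header
modelled as a plain dict rather than through any FITS library.  The
function names, the literal grammar, and the choice to ignore TZEROn
are all assumptions for illustration only.

```python
import re

# Assumed grammar for the leading scale factor (not from the draft).
_SCALE = re.compile(r"(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?", re.ASCII)


def split_scale(unit_string):
    """Return (scale, rest); scale is 1.0 when no leading literal is present."""
    match = _SCALE.match(unit_string)
    if match is None:
        return 1.0, unit_string
    return float(match.group(0)), unit_string[match.end():]


def to_fits_cards(vounit, column):
    """Split a scaled unit string into TUNITn and, if needed, TSCALn."""
    scale, unit = split_scale(vounit)
    cards = {"TUNIT%d" % column: unit}
    if scale != 1.0:
        cards["TSCAL%d" % column] = scale
    return cards


def from_fits_cards(cards, column):
    """Fold TSCALn back into the unit string instead of rescaling values.

    TZEROn is not handled in this sketch.
    """
    unit = cards.get("TUNIT%d" % column, "")
    scale = cards.get("TSCAL%d" % column, 1.0)
    if scale == 1.0:
        return unit
    return repr(float(scale)) + unit


cards = to_fits_cards("1.9e27kg", 3)
round_tripped = from_fits_cards(cards, 3)
```

The string coming back need not be character-identical to the input
(Python writes the exponent with an explicit sign), but it parses to
the same scale factor and unit.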

And then on to Quantity and Provenance data models.  The phenomena
they describe deserve to be done right, not implicitly in unit
strings.

Cheers,

           Markus



