VOUnits RFC

Norman Gray norman at astro.gla.ac.uk
Thu Aug 1 06:59:41 PDT 2013


Greetings, all.

This is a compendium reply to important points in Arnold's, Rob's and Rick's messages; a reply to Markus is in a separate message.

On 2013 Jul 30, at 15:58, Arnold Rots wrote:

> There are two ways custom units can be (intended to be) used:
> 
> As a handy, well-known unit (handier than standard SI units) - like
> solar mass, earth mass, speed of light, etc.
> Or as a handy scalar whose exact value is not (yet) known - like
> the Hubble constant.
> 
> So, in addition to the question what the scale factor is (or, more
> precisely, how the custom unit is to be converted to SI units),
> there is the question whether the author intended to use the custom
> unit as an SI surrogate or whether (s)he merely wants to provide
> a ratio (like: this planet's mass is 5.23 Jupiters and I don't care
> what that is in kg).

That's an interesting distinction, but I think it's too complicated for a unit-string specification to think about -- for me, that's firmly in the application layer.  It's the sort of thing which might be in a discussion about Quantities, where there would be plenty of pointy-brackets to play with!

> This further muddies the waters and leads me to prefer that only
> units from Norman's tables be allowed.
> If that is not feasible, then creative units should be quoted and not
> linked to SI units.

It sounds as if there is indeed a case for quoting units, so that someone could write 

    kW/'martianDay'

to prevent the denominator being parsed as the milli-'artianDay'.  However, I don't think it's necessary to require that _any_ unknown unit be quoted, if only because that would require any unit writer to have memorised the complete set of what is and is not a known unit for the syntax intended.

Remember that both of the Unity library implementations allow the application to ask 'is everything in this parsed unit a recognised one?', and to investigate (or simply object or warn) if not, and I would hope that this would be the same for any implementations in other languages.

> If that's not good enough, then we need an explicit mechanism to define
> a custom unit in terms of allowed units, like 'H' = 75 km/s/Mpc

Defining other units would be necessary, I think, but (a) couldn't feasibly be done _within the unit string_, and (b) would be a natural facility for a library implementation to provide.
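
As a rough illustration of what such a library facility might look like, here is a minimal Python sketch based on Arnold's 'H' = 75 km/s/Mpc example.  The names CUSTOM_UNITS, define_unit and expand are hypothetical and are not part of Unity or any other existing library; the definition lives in the application, not in the unit string.

```python
from typing import Dict, Tuple

# Hypothetical registry: custom symbol -> (scale factor, unit string in allowed units).
CUSTOM_UNITS: Dict[str, Tuple[float, str]] = {}


def define_unit(symbol: str, factor: float, unit_string: str) -> None:
    """Register a custom unit as `factor` times an allowed unit string."""
    CUSTOM_UNITS[symbol] = (factor, unit_string)


def expand(value: float, symbol: str) -> Tuple[float, str]:
    """Re-express a value given in a custom unit in terms of allowed units."""
    factor, unit_string = CUSTOM_UNITS[symbol]
    return value * factor, unit_string


# Arnold's example: 'H' = 75 km/s/Mpc
define_unit("H", 75.0, "km/s/Mpc")
print(expand(2.0, "H"))
```

An undefined symbol raises KeyError here; a real library would presumably report that more gracefully.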

----

Rob:

> An enjoyable, productive discussion.

And further proof that most WG discussion pops up at PR time rather than before...

>> ...as is local solar day.
> 
> Time-of-day is not the same kind of unit as duration, but is rather a type of phase.

LST is, yes, but I was thinking of some unit like J/LocalSolarDay as a less whimsical version of Arnold's MyWeight unit, as a unit which wouldn't be straightforwardly reducible to a scalefactor * base-unit.

>> All of this does of course leave the question of how one communicates the meaning of these units.  Marco sketched a mechanism based on the VOTable LINK element: that's a nice approach which would work in that context, and which would tie in neatly to the mention of QUDT in an earlier message of this thread.  But it also leaves the problem of 'odd' units where it belongs, at an application level rather than in an IVOA Recommendation.
> 
> A complementary notion to compiling lists of useful terms was the notion of using the registry to manage stream-dependent metadata. Which is to say that one of the applications (whether registry-based or some other paradigm) can be a mechanism for curating such master lists and for keeping track of evolving scale factors, zero points, etc.  The Olson TZ database is one example of such.
> 
> Maintaining a list of units - not just defining it - would in particular be a useful role that IVOA could serve for astronomy as well as broader physical sciences.

Yes: this would be useful, and since (in this framework) the canonical name of a unit is a URL, the lookup/resolution mechanism is immediate.  For example, there's <http://qudt.org/vocab/unit#Ampere> in the example I quoted earlier in the thread, and <http://bitbucket.org/nxg/unity/ns/unit#Jansky> for the Jansky, which isn't amongst the QUDT units (hmm: that URL isn't dereferenceable right now, since it was a placeholder when I first wrote that file -- I should have a think about that -- it should probably be at an ivoa.net URL).

The QUDT framework seems a useful one, and I see no reason to reinvent this.  We should work with the QUDT people to see if there's a good way to arrange this.

>> Incidentally (and finally), I should point out that the unit 'MyWeight' would be parsed, according to the VOUnits spec, as the mega-'yWeight', which is pretty clearly undesirable (the consequence of this hadn't fully struck me before).   There are two alternatives: (i) allow units to be quoted (thus <'jupMass'/hr> or <'MyWeight'/'USD(Au)'>), or (ii) forbid prefixes on all unknown units.  Option (i) has the advantage of highlighting that a unit is 'unknown', but right now, I'm inclined to invert the prescription of the spec and go for (ii).
> 
> I don't think you can ultimately avoid a quoting requirement or some other way of resolving ambiguous specifications.  Before the application recognizes that a unit is undefined / user-defined it has to parse it at least provisionally.  Presumably the rules would avoid mega-yocto-Weight as an option, but it would be trivial for a user to choose a name similar enough to a known unit as to render the ambiguity intrinsic independent of whatever set of rules.

Indeed -- see the note on this above.  I propose changing the standard text to say something like the following (using [...] to quote unit strings):

  * 'known units' are spotted first (thus 'Pa' is the pascal, and never the peta-year)
  * SI prefixes are spotted whether or not the resulting base unit is a 'known unit' (so a [MyWeight] is a mega-yWeight, but a [jupMass] parses to the corresponding unknown unit)
  * _except that_ if a unit is in quotes '...', then no prefix is searched for, and it is an 'unknown unit' (therefore ['MyWeight'] is always a unit 'MyWeight' with no prefix, and ['kg'] is an _unknown_ unit called the 'kg' -- see below).
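
The three rules above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the unit and prefix tables are small made-up subsets rather than the VOUnits tables, the names ParsedUnit and parse_unit are hypothetical, and the function handles a single unit token, not a full unit expression.

```python
from typing import NamedTuple, Optional

# Illustrative subsets only; the real tables are in the VOUnits document.
KNOWN_UNITS = {"m", "g", "s", "a", "h", "J", "Pa", "arcsec"}
SI_PREFIXES = {"da": 1e1, "k": 1e3, "M": 1e6, "m": 1e-3,
               "h": 1e2, "P": 1e15, "y": 1e-24}


class ParsedUnit(NamedTuple):
    prefix: Optional[str]  # SI prefix symbol, or None
    base: str              # unit symbol with any prefix removed
    known: bool            # True if base is a 'known unit'


def parse_unit(token: str) -> ParsedUnit:
    # Rule 3: a quoted unit is never prefixed and is always 'unknown'.
    if len(token) >= 3 and token[0] == "'" and token[-1] == "'":
        return ParsedUnit(None, token[1:-1], False)
    # Rule 1: known units are spotted first.
    if token in KNOWN_UNITS:
        return ParsedUnit(None, token, True)
    # Rule 2: look for a prefix whether or not the remainder is known.
    # Longer prefixes are tried first so that 'da' is not read as 'd'.
    for prefix in sorted(SI_PREFIXES, key=len, reverse=True):
        if token.startswith(prefix) and len(token) > len(prefix):
            base = token[len(prefix):]
            return ParsedUnit(prefix, base, base in KNOWN_UNITS)
    # No prefix found: an unprefixed unknown unit.
    return ParsedUnit(None, token, False)


for t in ["Pa", "MyWeight", "jupMass", "'MyWeight'", "'kg'", "kg"]:
    print(t, parse_unit(t))
```

With these tables, [Pa] comes out as the known pascal, [MyWeight] as prefix 'M' on the unknown 'yWeight', [jupMass] as an unprefixed unknown unit, and ['kg'] as an unknown unit called 'kg', whereas unquoted [kg] is prefix 'k' on the known 'g'.  An application can answer 'is everything recognised?' by checking the `known` field of each parsed token.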

Does anyone have any objections to this refinement?

> There's the related question of overloading of unit symbols (and even names) that has been touched upon previously.  Which is to say that it isn't just the combination of units that can be ambiguous, but the individual units themselves.  Those of us in the leap second wars need only nominate the diverse meanings of "second".

If the unit in question is an 'unknown unit' then yes, this will always be the case, and the application level will have to resolve it in whatever way is appropriate.

But for 'known units', the document implies, I think, that there is no ambiguity.  The unit in the denominator of [arcsec/ha] parses to centuries unequivocally and unambiguously -- that's the point of this being a known unit.  The rule above, however, would mean that [arcsec/'ha'] would have an _unknown_ unit in the denominator.

I'm not sure I have persuasive cases where that facility would be useful, but it does provide a general escape.  [J/s] would be joules per second under whatever definition of the second QUDT has settled on (which I think is the SI second), but [J/'s'] would not be that.  That might be useful if someone wants to use some different definition of the second for some mad reason, and it would require them to communicate what this 's' unit was in some other out-of-band way.

Marco and Rick's [km/s/h] example might also illustrate this.  Rick says:

> 		- "h" could be
> 			- "hour", generic context
> 			- "Planck's constant", generic physical context, not often combined with a velocity, but you never know ….
> 			- "Hubble factor", probably based on the standard "h_100" scaling, cosmological context

Now, [h] is in fact the 'known unit' of the hour, so this would be parseable _only_ as Hour (were you thinking of [hr], Marco?).  But if you wanted to use this unit in a different sense for some reason, then writing [km/s/'h'] would 'escape' the h and it would have this ambiguity again.

I don't suggest this as a particularly important positive feature of the rules above (essentially all of the time, units like ['MyWeight'] would be quoted only because the user is aware of, or wants to highlight, an unknown unit, or wants to avoid its being erroneously prefixed), but it's a harmless consequence of keeping these rules simple.

> As far as unknown units (and their prefixes), astronomy in particular is chock full of them.  It would be an interesting exercise to trace all the innumerable astrophysical units back to their originating papers.  Even now-familiar terms like "parsec" were once unknown, yet astronomers soon after started referring to kiloparsecs and megaparsecs.


So is that an argument that all 'unknown units' should be permitted to have SI prefixes?  (I'd agree)

----

Rick lists a number of questions that should be answerable.  Here are my suggestions:

> 	- the above formal grammar can be found at …

This should be in the VOUnits document (I'll add a regexp for floats).

> 	- applications MUST handle unknown_unit_string and scalefactors without bombing

Yes.  I think we should prescribe a single permitted form for scalefactors.  Part of the motivation, remember, is to make it easy for a data provider to reassure themself that they've written a unit string which is readable anywhere.  A single permitted form makes it easier to 'be conservative in what you write', even if parsers are subsequently liberal in what they read.

Markus suggests:

[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?

I'd simplify that to

[+-]?\d+(\.\d+)?(e[+-]?\d+)?

or possibly even

[+-]?\d+\.\d+(e[+-]\d+)?
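
To make the differences between these three candidates concrete, here is a small Python sketch that runs each one against a handful of sample scalefactors.  The sample strings and the labels in CANDIDATES are chosen purely for illustration, and a whole-string match is assumed.

```python
import re

# The three candidate patterns quoted above, with illustrative labels.
CANDIDATES = {
    "Markus": r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?",
    "simplified": r"[+-]?\d+(\.\d+)?(e[+-]?\d+)?",
    "strictest": r"[+-]?\d+\.\d+(e[+-]\d+)?",
}

# Sample scalefactor strings (made up for this comparison).
SAMPLES = ["1", "1.", ".5", "1.5", "1.5E3", "1.5e3", "1.5e+3"]


def accepted(pattern, samples=SAMPLES):
    """Return the samples that the pattern matches in their entirety."""
    rx = re.compile(pattern)
    return [s for s in samples if rx.fullmatch(s)]


for name, pattern in CANDIDATES.items():
    print(f"{name:10s} {accepted(pattern)}")
```

On these samples, Markus's pattern accepts all seven; the first simplification rejects '1.', '.5' and the upper-case exponent; the last accepts only '1.5' and '1.5e+3', since it requires both a fractional part and an explicit exponent sign.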

> 	- applications SHOULD attempt to look up unrecognized unknown_unit at some knowable vo resource for such info

I would loosen this to MAY.  After all, a unit conversion library might feel it's useful to convert [kjupMass/hr] to roughly [0.278 jupMass/s] (a factor of 1000/3600), and if so it doesn't have to look anything up.  Also, a lookup suggests that such an application would have to be online, which we probably wouldn't want to suggest.

> 		- a minimal unit vocabulary service will be provided by IVOA at … via ….
> 			- the corresponding translated scalefactor and standardized units for further numerical manipulations will be provided in the form ….
> 			- this is the way to load your units parser with this information up front: ….

I think we can provide 'known units' in the form of <http://bitbucket.org/nxg/unity/ns/unit#Jansky>, as above.  Regarding a general service which answers 'what might this symbol mean?', I don't know.  It would be interesting, but ... I'm going to say 'application layer' again!

> 	- VOTable unit metadata SHOULD be encapsulated in the document via …

Do we want to head into VOTable territory?  Shouldn't this be left to the Apps WG (Tom -- do you have an opinion here?)

> 	- applications SHOULD attempt to keep track of unit metadata when encountered and pass it on with little or no manipulation/alteration

A good thing to point out, yes.  One conversion that's probably always legitimate is changing syntax, so 

% ./unity -ifits -ocds 'kg m s-1'
kg.m/s

So maybe applications should be permitted/encouraged to silently normalise unit syntax if they can do so.  The only problem there is dealing with a unit (eg [erg]) which is 'known' in one syntax but unknown in another; however since this would most often be in the direction of 'normalisation' to VOUnits, which has the union of the known units, it wouldn't be a problem in practice.

All the best,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK


