VOUnits RFC

Rob Seaman seaman at noao.edu
Tue Jul 30 08:10:19 PDT 2013


Hi Norman,

An enjoyable, productive discussion.

> ...as is local solar day.

Time-of-day is not the same kind of unit as duration, but is rather a type of phase.  Resolving phase requires either a convention (e.g., "degrees east of north", "radians CCW", "degrees past top dead center", ...) or explicit metadata.  For "local" that's typically timezone as an offset in hours from UTC.  This is not a scale factor, but a type of zero point.  Are these to be supported as well?
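
To make the scale-factor versus zero-point distinction concrete, here is a minimal Python sketch.  The function names and the -7 hour offset (PDT) are assumptions chosen for illustration; nothing here is taken from the VOUnits draft.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_HOUR = 3600.0


def hours_to_seconds(hours: float) -> float:
    """Duration: changing units is a pure multiplication by a scale factor."""
    return hours * SECONDS_PER_HOUR


def local_to_utc(local_naive: datetime, utc_offset_hours: float) -> datetime:
    """Time-of-day: resolving 'local' needs a zero point (the UTC offset).

    The offset is explicit metadata supplied by the caller; it shifts the
    reading and does not rescale it.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return local_naive.replace(tzinfo=tz).astimezone(timezone.utc)


if __name__ == "__main__":
    print(hours_to_seconds(2.0))  # 7200.0
    # Hypothetical local reading, interpreted as PDT (UTC-7).
    print(local_to_utc(datetime(2013, 7, 30, 8, 10, 19), -7))
    # 2013-07-30 15:10:19+00:00
```

A units grammar that only carries scale factors can express the first function but not the second, which is the gap the question above is asking about.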

> When I was implementing the VOUnit stuff to check the grammars and design, I realised, somewhat to my surprise, that all that a list of 'known units' adds is disambiguation plus a bit of extra metadata.  Syntactically, 'known' or 'not-known' makes very little difference.

This was Andrew Main's point regarding timescales.

> That means that the question of what to do with unknown units gets punted to the application which is calling the parser, which is probably in much the best place to decide on the best course of action.  That action may consist of "I don't recognise this unit, so you FAIL", but it doesn't have to be.

Some will recall that vocabularies, including some discussion of units, arose early in the history of the VOEvent WG.  One concern about moving everything to the application layer is that any application needing units is going to have to do the sorts of things that have been described.  Which is to say that for serendipity-based discovery such as VOEvent, all [meta]data may be "odd" [meta]data.

> All of this does of course leave the question of how one communicates the meaning of these units.  Marco sketched a mechanism based on the VOTable LINK element: that's a nice approach which would work in that context, and which would tie in neatly to the mention of QUDT in an earlier message of this thread.  But it also leaves the problem of 'odd' units where it belongs, at an application level rather than in an IVOA Recommendation.

A complementary notion to compiling lists of useful terms was the notion of using the registry to manage stream-dependent metadata.  Which is to say that one of the applications (whether registry-based or some other paradigm) can be a mechanism for curating such master lists and for keeping track of evolving scale factors, zero points, etc.  The Olson TZ database is one example of such.

Maintaining a list of units - not just defining it - would in particular be a useful role that IVOA could serve for astronomy as well as broader physical sciences.

> Incidentally (and finally), I should point out that the unit 'MyWeight' would be parsed, according to the VOUnits spec, as the mega-'yWeight', which is pretty clearly undesirable (the consequence of this hadn't fully struck me before).   There are two alternatives: (i) allow units to be quoted (thus <'jupMass'/hr> or <'MyWeight'/'USD(Au)'>), or (ii) forbid prefixes on all unknown units.  Option (i) has the advantage of highlighting that a unit is 'unknown', but right now, I'm inclined to invert the prescription of the spec and go for (ii).

I don't think you can ultimately avoid a quoting requirement or some other way of resolving ambiguous specifications.  Before the application can recognize that a unit is undefined or user-defined, it has to parse it at least provisionally.  Presumably the rules would avoid mega-yocto-Weight as an option, but it would be trivial for a user to choose a name similar enough to a known unit that the ambiguity is intrinsic, whatever the set of rules.
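
A rough sketch of what "parse it at least provisionally" might look like, in Python.  The prefix and unit tables are deliberately tiny, hypothetical subsets, and candidate_readings is an invented name; this is not the VOUnits grammar, only an illustration of how one symbol can yield several readings.

```python
# Hypothetical, abbreviated tables; real lists of prefixes and units are longer.
PREFIXES = {"M": 1e6, "k": 1e3, "m": 1e-3, "y": 1e-24}
KNOWN_UNITS = {"m", "g", "s", "pc", "yr"}


def candidate_readings(symbol):
    """Return every (prefix, base, base_is_known) reading of a unit symbol.

    The first entry is always the unprefixed reading.  Choosing among the
    readings is left to the caller, i.e. to the application.
    """
    readings = [(None, symbol, symbol in KNOWN_UNITS)]
    for prefix in PREFIXES:
        if symbol.startswith(prefix) and len(symbol) > len(prefix):
            base = symbol[len(prefix):]
            readings.append((prefix, base, base in KNOWN_UNITS))
    return readings


if __name__ == "__main__":
    for sym in ("Mpc", "MyWeight", "yr"):
        print(sym, candidate_readings(sym))
```

With these tables, "Mpc" has one reading with a known base (mega-pc), while "MyWeight" has two readings and neither base is known, so no rule over the known-unit list alone can tell the parser which one the user meant.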

There's the related question of overloading of unit symbols (and even names) that has been touched upon previously.  Which is to say that it isn't just the combination of units that can be ambiguous, but the individual units themselves.  Those of us in the leap second wars need only nominate the diverse meanings of "second".

As far as unknown units (and their prefixes) go, astronomy in particular is chock full of them.  It would be an interesting exercise to trace all the innumerable astrophysical units back to their originating papers.  Even now-familiar terms like "parsec" were once unknown, yet astronomers soon afterward started referring to kiloparsecs and megaparsecs.

...and need I point out that the parsec was implicitly redefined within the past year:

	http://phys.org/news/2012-09-iau-votes-redefine-astronomical-constant.html

Which is all to say that it will perhaps be most efficient to recognize the curation complications up front.

Rob
--
	"The poets made all the words, and therefore language is the archives of history, and, if we must say it, a sort of tomb of the muses. For, though the origin of most of our words is forgotten, each word was at first a stroke of genius, and obtained currency, because for the moment it symbolized the world to the first speaker and to the hearer. The etymologist finds the deadest word to have been once a brilliant picture. Language is fossil poetry. As the limestone of the continent consists of infinite masses of the shells of animalcules, so language is made up of images, or tropes, which now, in their secondary use, have long ceased to remind us of their poetic origin."  - Ralph Waldo Emerson

