VOUnits: _another_ version, based on implementation feedback

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Oct 28 02:46:24 PDT 2013


Dear Semantics WG,

Let me first thank Norman for doing the heavy lifting on this.
VOUnits has become a really useful document by now.

Having said that, I'd like to draw attention to a few issues -- I'm
fine with all but one of them as they stand, but I think the decision
to leave them as they are needs to be made consciously.

On Fri, Oct 25, 2013 at 06:29:22PM +0100, Norman Gray wrote:
> Markus has a couple of other suggestions or issues which I didn't
> feel able to take an independent decision on; I'll let him raise
> those if he feels so inclined.

So, here goes:

>>>>>>> Adding known units

In section 2.4, there's:

  Future versions of this specification may add to the set of known units.

This means that adding new known units requires a new version (with a
new major?  a new minor?).  I'm ok with this, but given that the
well-formedness of a VOUnit string doesn't depend on known units any
more, I suspect our future selves will be grateful to us if we had a
more lightweight process here (I've not investigated the UCD
acceptance process in a long while, but might that not be a precedent
to look at here?)

As for me, I'm ok with new documents, too (but maybe a clarification
that adding units will only bump the minor would be helpful).



>>>>>>>>> Atomic units

On p. 32, it says:

  STRING   a non-empty sequence of letters [a-zA-Z]+

STRING, to save you the lookup, is used for both "atomic" units and
function names.  I'm fine with this, but note that it excludes the
underscore (so, no M_Jup with this).  Please protest now if you want
underscores.  Me, I can live without, and maybe it's a good idea to
reserve it for future use.

I also see a slight contradiction with Table 6 that offers a "?" as a
placeholder for unknown units.

This contradiction is resolved right now by stipulating that some
higher processing level should be removing ? before passing things on
to the VOUnits processor.  This is probably the least
specification-intensive way of resolving the contradiction, but it
also goes quite a bit against my sense of specification aesthetics.

A simple alternative could be to let

  STRING -> [A-Za-z]+|\?

An alternative I'd like even better would be to ditch the question
mark altogether and just say 

  Unit authors SHOULD write "unknown" when a quantity is known to
  have a unit but that unit cannot be determined.

This would work just fine with the way we're specifying units right
now.
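
To make the three cases concrete, here is a minimal sketch in Python.
The two patterns are copied from the productions quoted above; the
helper is_string and the sample strings are my own illustration and
not part of the draft.

```python
import re

# STRING as quoted from the draft: letters only.
STRING_CURRENT = re.compile(r"[a-zA-Z]+")
# The alternative production: letters, or a lone question mark.
STRING_WITH_QMARK = re.compile(r"[A-Za-z]+|\?")


def is_string(pattern, text):
    """Return True if the whole of text matches the token pattern."""
    return pattern.fullmatch(text) is not None


for candidate in ["Jy", "M_Jup", "?", "unknown"]:
    print(candidate,
          is_string(STRING_CURRENT, candidate),
          is_string(STRING_WITH_QMARK, candidate))
```

With either pattern, M_Jup is rejected because of the underscore and
"unknown" passes as an ordinary STRING; only the second pattern
accepts a bare "?".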

But again, I can live with what's there.



>>>>>>>> SI prefixes on unknown units

The unit production on p. 39 allows

  unit: STRING QUOTED_STRING

which is there to allow prefixes on quoted units, like
M'jupiterMass'.  I'm fairly opposed to that; as Norman writes in
2.12.1, "this is not often likely to be a good idea," and I could
find stronger language about it.  

Does *anyone* actually want this?  If yes, so be it, but if not,
let's not do it.  It complicates the already ungraceful quoted units
business even more.



>>>>>>>>> Quoted function names

The VOUnits grammar on p. 39 has this:

  function_application: STRING OPEN_P complete_expression CLOSE_P
    | QUOTED_STRING OPEN_P complete_expression CLOSE_P
    
This is the only place at which I'd *really* like to see a change --
allowing quoted strings as function names is an IMHO unnecessary
complication.  Quoted strings were introduced to avoid the expansion
of SI prefixes, and of course SI prefixes are not allowed for
function names anyway.

The point for quoted function names then appears to be that authors
may want to use known units as function names, as in 'km'(adu/s);
however, that is not actually required, as the km in km(adu/s), by
the grammar, must be a function name anyway.

Whether it's a good idea to allow arbitrary function names is of
course yet another matter.  Do we really want km(adu/s) and
km.(adu/s) both to be well-formed but with completely different
semantics?  Shouldn't log, ln, exp, and sqrt be good enough for
anyone?
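
As a rough illustration of why the quotes buy nothing, here is a
sketch in Python that decides between "function name" and "unit" by
looking one token ahead.  It is not the VOUnits grammar: the token
pattern is a simplification of mine (it silently skips anything it
does not recognise), the dot is read as a product, and
classify_strings is a hypothetical helper.

```python
import re

# Simplified tokens: a run of letters, or one of a few punctuation marks.
TOKEN = re.compile(r"[a-zA-Z]+|[().*/]")


def classify_strings(expr):
    """Label each letter-run as 'function' or 'unit' using one token of lookahead."""
    tokens = TOKEN.findall(expr)
    labels = []
    for i, tok in enumerate(tokens):
        if tok.isalpha():
            # A STRING directly followed by "(" can only be a function name.
            is_func = i + 1 < len(tokens) and tokens[i + 1] == "("
            labels.append((tok, "function" if is_func else "unit"))
    return labels


print(classify_strings("km(adu/s)"))
print(classify_strings("km.(adu/s)"))
```

In the first expression km comes out as a function name without any
quoting; in the second, the dot intervenes and km is an ordinary unit.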

That aside: I'd really, really like to see quoted function names go
away.


Finally, from the latest changes,

>>>>>>> The deka situation

> \item Clarified that the ambiguity in \unit{dadu} should remain
>     unresolved, and the correct behaviour unspecified (is it
>     deci-\texttt{adu} or deka-\texttt{du}?).

Ouch.  It always hurts to have to keep something unspecified, but in
the presence of unknown units it's *really* hard to prescribe
sensible behaviour here.  What a mess.  I really, really wish whoever
came up with the "da" prefix (the only two-letter SI prefix) hadn't
done so.
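
To see where the trouble comes from, here is a small Python sketch
that lists every way a string can be split into an SI prefix plus a
remainder.  The prefix list is the standard SI set with "u" standing
in for micro, which is my assumption about the spelling; prefix_splits
is a hypothetical helper, and nothing here checks whether the
remainder is a known unit.

```python
# SI prefix symbols, with "u" assumed for micro; "da" is the only two-letter one.
SI_PREFIXES = ["Y", "Z", "E", "P", "T", "G", "M", "k", "h", "da",
               "d", "c", "m", "u", "n", "p", "f", "a", "z", "y"]


def prefix_splits(unit, prefixes=SI_PREFIXES):
    """Return all (prefix, remainder) pairs with a non-empty remainder."""
    return [(p, unit[len(p):])
            for p in prefixes
            if unit.startswith(p) and len(unit) > len(p)]


print(prefix_splits("dadu"))
print(prefix_splits("dm"))
```

For dadu this yields both a deka-du and a deci-adu reading, whereas dm
has only one split; with unknown units allowed, nothing in the string
itself says which reading was meant.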

May I suggest changing:

  We can think of no cases where the ambiguity is plausible enough
  that resolving it is worth the specification effort, so we deem the
  parse of da.* to be unspecified.

to:

  In the light of this ambiguity, we leave the parse of da.*
  unspecified.  This means that unit authors SHOULD NOT apply the
  deci-prefix to units starting with "a" and SHOULD NOT apply the
  deka-prefix at all.

I'm sure a lot of Austrians will hate me for that -- when I was last
there, people would ask for "10 deka[sc. gram] of that Tiroler goat
cheese" --, but maybe this affront could be an incentive for an
Austrian contribution to the VO.

Cheers,

          Markus

