VOUnits: _another_ version, based on implementation feedback

Norman Gray norman at astro.gla.ac.uk
Tue Dec 24 11:33:07 PST 2013


Francois, hello.

Thanks for these remarks.  I've incorporated some changes in the latest (last...?) version of the RFC document.

On 2013 Nov 6, at 17:12, Francois Ochsenbein <Francois.Ochsenbein at astro.unistra.fr> wrote:

> * allowing both "unrecognised" units and "quoted" units (by the way, why the
>  single quote('), and not the double quote(") more common for a citation?):
>  isn't there some contradiction ? At least in validation procedures, allowing
>  only explicitly _known units_ and _quoted units_  would produce more reliable
>  documents, assuming that _quoted units_ are defined somewhere in the document
>  (e.g. a VOTable) which makes use of such non-standard units.

There's no deep reason for the single quotes, rather than double.  Myself, I tend to use single quotes for 'scare-quotes' rather than quotations of text.  Mostly, though, they seem marginally less messy, and one of the main objections to the quoting innovation was that it made things hard to read.

Originally, there was no syntactic marking of unknown units at all, so that 

    observations/jovianDay

was an acceptable unit string, which is only cluttered by changing it to 

    'observations'/'jovianDay'

However this causes problems with, for example

    martianDay

which is required to be parsed as the milli-'artianDay', and it was for this reason that the on-list discussion rather reluctantly conceded that the quotes in 'martianDay' were necessary in order to disambiguate this.

Thus the quoting is an 'as necessary' feature, and it's expected that a validator knows the list of 'known units' beforehand.
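
To make the ambiguity concrete, here is a minimal sketch in Python of how a parser might classify a single unit symbol.  It is not the code of the actual parsing library: the function name parse_symbol, the tiny KNOWN_UNITS and PREFIXES tables, and the order of the checks (whole symbol first, then prefix plus known unit, then prefix plus unknown unit) are all assumptions made for illustration.

```python
# Illustrative subsets only (assumption): the real lists of known units
# and prefixes are the ones given in the VOUnits document.
KNOWN_UNITS = {"m", "s", "g", "yr", "d", "min", "Hz", "J", "K"}
PREFIXES = ["da", "m", "k", "M", "G", "u", "n", "c", "d"]  # longest first


def parse_symbol(text):
    """Classify one unit symbol as (prefix, base, is_known).

    Hypothetical helper: a sketch of the disambiguation described
    above, not the behaviour of any real library.
    """
    # A quoted symbol is taken whole: no prefix is split off and, in
    # this sketch, no lookup in the known-units table is attempted.
    if len(text) >= 2 and text[0] == "'" and text[-1] == "'":
        return (None, text[1:-1], False)

    # A symbol that is itself a known unit wins outright ("min", "m").
    if text in KNOWN_UNITS:
        return (None, text, True)

    # Next, try prefix + known unit ("mm" -> milli-metre).
    for prefix in PREFIXES:
        rest = text[len(prefix):]
        if text.startswith(prefix) and rest in KNOWN_UNITS:
            return (prefix, rest, True)

    # Otherwise a leading prefix character is still read as a prefix,
    # leaving an unknown base unit ("martianDay" -> milli-"artianDay").
    for prefix in PREFIXES:
        rest = text[len(prefix):]
        if text.startswith(prefix) and rest:
            return (prefix, rest, False)

    # No prefix character at the front: an unknown unit, taken whole.
    return (None, text, False)


for symbol in ["jovianDay", "martianDay", "'martianDay'", "mm", "min"]:
    print(symbol, "->", parse_symbol(symbol))
```

Under these assumptions 'jovianDay' survives unquoted because no prefix starts with 'j', whereas 'martianDay' only comes through whole when it is quoted.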

> * still about _quoted units_: while not explicitely specified in the document
>  I imagine these can be combined in expressions like m'MoonMass'/yr, as it looks
>  to be possible from the grammar ? The usage of  _quoted units_ becomes quite
>  useful to represent some "natural" unit in the case of some modelisations
>  (e.g. gravitational potential in a galaxy)

That's correct.  In that particular case, mMoonMass must be parsed as the milli-MoonMass because a unit string can start with at most one prefix, but MoonMass/yr would be the Mega-oonMass, for the same reason, so 'MoonMass'/yr would be required.

> * About the units listed in Table 2: I tend to agree with Arnold that the abbreviations
>  "Ba" and "ta" proposed for the Besselian and tropical years look strange -- if such
>  units are required, "Byr"  and "tyr" would likely reach a better consensus;

I think that Arnold's concern was that he was torn between reluctance to include these units in the 'known units' list and reluctance to discard them completely.  They're currently in the document as known-but-deprecated in FITS, and unknown in VOUnits.

Since they're in the FITS spec as 'Ba' and 'ta', I'd be reluctant to _change_ their name if they were included in the VOUnits list.

> * still about Table 2, the "B" for "Byte" looks also quite unusual; I understand
>  that the authors wish to allow units like "MB/s" or "MiB/s", but recommending
>  "B" alone as meaning "byte" looks bizarre (capitalized unit symbols refer to
>  human names like Joule, Kelvin, Herz, etc). I would feel more comfortable if "byte"
>  would be recommended for byte unit, and saying that multiples of "bytes" can be
>  written "B" instead of "byte" (in other terms, multiples of "B" are "bytes",
>  while sub-multiples are "Bell"). Maybe "B" alone (without prefix) just be forbidden?

The problem here is that we're trying to be as compatible as possible with network standards which aren't always consistent.  Section 2.5 discusses this.  The document has VOUnits accept both 'byte' and 'B' for bytes, preferring 'byte'.  I agree that most people would tend to write "1234 bytes" rather than "1234B" -- we certainly don't discourage that.

The theoretical confusion with the Bel is unfortunate, but I think practically unresolvable.  We've semi-resolved it in the document by declaring that 'dB' is an unprefixed and unprefixable known unit, namely the decibel.  Since submultiples of the byte appear only _very_ rarely, and any other multiples of the Bel are possibly even rarer, we/I decided simply not to worry about it very much!
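
As a rough illustration of that semi-resolution, the following Python sketch treats 'dB' as a special case before any prefix handling, and reads every other prefixed 'B' as bytes.  The function name interpret, the short prefix list, and the choice to raise ValueError for a prefixed 'dB' are assumptions of this sketch, not something the document specifies.

```python
# Illustrative prefix list only (assumption), longest first so that the
# binary prefix "Mi" is tried before "M".
PREFIXES = ["Ki", "Mi", "Gi", "k", "M", "G", "m", "d"]

BYTE_SYMBOLS = {"byte", "B"}   # both accepted for bytes, 'byte' preferred
UNPREFIXABLE = {"dB": "decibel"}


def interpret(symbol):
    """Return (prefix, unit_name) for a byte or decibel symbol.

    Hypothetical helper for illustration; returns None for symbols that
    this sketch does not cover.
    """
    # 'dB' is looked up whole, so it is never read as deci-byte.
    if symbol in UNPREFIXABLE:
        return (None, UNPREFIXABLE[symbol])

    if symbol in BYTE_SYMBOLS:
        return (None, "byte")

    for prefix in PREFIXES:
        rest = symbol[len(prefix):]
        if not symbol.startswith(prefix):
            continue
        if rest in BYTE_SYMBOLS:
            return (prefix, "byte")
        if rest in UNPREFIXABLE:
            # Assumption: a parser rejects a prefix on an unprefixable
            # unit; the error type is this sketch's choice.
            raise ValueError(f"{rest!r} cannot take a prefix: {symbol!r}")

    return None


for symbol in ["dB", "B", "MB", "MiB", "kbyte"]:
    print(symbol, "->", interpret(symbol))
```

The cost of this choice is that a genuine deci-byte cannot be written as 'dB', which is the rarely needed case mentioned above.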

> * some wide-spread physical constants like the speed-of-light (c), Planck's (h)
>  Boltzmann's (k), or gravitation (G) constants -- not talking about pi -- are
>  frequently used in units (e.g. MeV/c2 for masses); the document says a few words
>  about their usage for transformations (section 3.3), but are these constants forbidden
>  in units ? Note that c,or k are unambiguous, but h is commonly used for the cosmological
>  factor, and G is collapsing with Gauss.

It's true that we haven't made any mention of such constants.  This is because, when I was writing the grammars and parsing library, I realised that I had little idea of where they fitted in to a discussion of _units_.  They are things which might appear as scalings in a discussion of _quantities_ -- that is, the combination of number and scale which represents a measured thing -- but which wouldn't naturally appear at the rather primitive level we were concerned with.

As you point out, they bring their own set of complications when they collide with the symbols for existing units.

It's for a broadly similar reason that 'Sun' has disappeared.  If this has a place it's in an expression such as "M_Sun", and either that's a new unknown unit with a five-character symbol, or else we have to think about the syntactic role of the underscore as indicating something like 'of', and... my head hurts.

Thanks for looking over the document so carefully.  The (final, pleeeeease) RFC is just about to start, so we may have time for a few more clarifications then.

Best wishes,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK


