VOUnits update: Empty/missing units

Norman Gray norman.gray at glasgow.ac.uk
Mon Dec 13 15:59:26 CET 2021


Greetings, all.

(I'm a little late to this discussion -- it turns out I had unsubscribed from the list)



The starting point of my puzzlement here is that the VOUnits spec currently forbids an empty string as a units string, and says:

> An empty unit string positively indicates that the corresponding quantity is dimensionless. Since an empty string does not conform to the grammars below, this also must be checked for before unit-parsing starts.

I don't think this is satisfactory text.

The document mandates a two-step check: first check for an empty string and, only if the string is non-empty, do the parse.  That's ugly, and I'd quite like to adjust the grammar (in each of the published cases) so that it permits the 'dimensionless' case (whether it's "" or "1"), so that in turn parse("") isn't a syntax error, and it can return a suitable value as the result of the parse.  I'm not sure when or why we (or I?) decided that "" shouldn't be a valid units string -- I don't think it was a good decision.
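A minimal sketch of the difference, in Python.  The toy grammar here (a unit symbol with an optional integer power) merely stands in for the real VOUnits grammar, and the names DIMENSIONLESS, parse_two_step and make_parser are invented for the illustration.  The marker is left as a parameter, since whether it should be "" or "1" is exactly the open question.

```python
import re

# Hypothetical value returned for the 'dimensionless' case.
DIMENSIONLESS = ("", 0)

# Toy grammar for illustration only (NOT the VOUnits grammar):
# a unit symbol with an optional integer power, e.g. "m", "s-1".
_TOY_UNIT = r"(?P<sym>[A-Za-z]+)(?P<pow>-?\d+)?"


def _unit_from_match(m):
    return (m.group("sym"), int(m.group("pow") or 1))


def parse_two_step(s):
    """Current arrangement: "" must be special-cased before parsing."""
    if s == "":
        return DIMENSIONLESS
    m = re.fullmatch(_TOY_UNIT, s)
    if m is None:
        raise ValueError(f"not a valid unit string: {s!r}")
    return _unit_from_match(m)


def make_parser(marker):
    """Adjusted arrangement: the marker is one more alternative in the grammar."""
    grammar = re.compile(f"(?P<marker>{re.escape(marker)})|{_TOY_UNIT}")

    def parse(s):
        m = grammar.fullmatch(s)
        if m is None:
            raise ValueError(f"not a valid unit string: {s!r}")
        if m.group("sym") is None:
            # The marker alternative matched, so the parse itself
            # yields the dimensionless result.
            return DIMENSIONLESS
        return _unit_from_match(m)

    return parse
```

With make_parser("1"), the string "1" parses to DIMENSIONLESS and "" is an ordinary syntax error; with make_parser(""), the empty string parses successfully instead.  Either way there is a single parse step.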

I'm somewhat inclined to unilaterally add "" as a valid unit string to all of the syntaxes (or at least to the VOUnits one).  But when I think I might go ahead and do that, I stop myself, and wonder 'is "" really the right thing here?', and I'd like to ventilate the question on this list.

I think there are a couple of inter-related issues here, each of which is rather nit-picking, but each of which is irritating to the author of a document such as this.

  1. Do people _know_ that an empty string 'positively indicates that the corresponding quantity is dimensionless'?  I think the answer is 'very probably not', and that, in contrast, "" is the sort of thing that might reasonably appear in a units column by accident or by default.

  2. If I were filling in metadata about a table, without reading the document (but what sort of barbarian would ever do that...?), it might seem to me reasonable to drop in an empty string to mean 'nothing known, or I don't care', or a system might default to an empty string in this case.  Thus "1" has the _virtue_ of being slightly non-obvious, enough to prompt whatever is validating an input of "" to ask 'do you mean unitless or don't-care?'

Regarding 2, Mark earlier said:

> I don't think it's a problem that the VOUnit syntax does not
> permit the empty string.  In most contexts (e.g. TAP_SCHEMA.columns
> UCD column or ucd attribute in VOSI-tables) you can just omit to
> supply a unit (null value).  IMHO reasonable software interpreting unit
> strings will/should anyway treat an empty string as "don't try to make
> sense of this as a unit" (without necessarily taking a stand on whether
> it's dimensionless or unsuitable for units or the author hasn't
> thought about units) rather than attempting to parse it against
> a given grammar.

That's very true for the pragmatic case of plotting data -- the program has been presented with (meta)data, and _something_ has to go on the screen.

I'm thinking, though, of other potential uses of information about units and dimensions, such as metadata validation, or doing simple unit conversions of data, where distinguishing 'dimensionless' from 'unknown' might be useful.  That's speculation, of course, rather than an explicit use-case demand.
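As a purely illustrative sketch of the unit-conversion case: the function below assumes that "1" marks 'dimensionless' and that NULL or "" means 'unknown', and the table TO_METRES and the function to_metres are hypothetical names invented here, not part of any existing library.

```python
# Hypothetical conversion factors to metres, for illustration only.
TO_METRES = {"m": 1.0, "km": 1000.0, "cm": 0.01}


def to_metres(value, unit):
    """Convert value to metres, treating 'dimensionless' and 'unknown' differently."""
    if unit is None or unit == "":
        # Nothing is known about the unit: decline to convert, but don't complain.
        return None
    if unit == "1":
        # Positively dimensionless: asking for a length is a genuine error.
        raise ValueError("a dimensionless quantity cannot be converted to a length")
    return value * TO_METRES[unit]
```

The point is only that the two cases call for different behaviour: 'unknown' is a gap in the metadata, whereas 'dimensionless' makes the request itself wrong.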

More broadly, if we're talking about unit strings, it seems strange not to have _some_ way of positively indicating 'this is dimensionless', and "" seems inadequate for this, for the reasons above.

On that point, Mark goes on to say:

> I take the point about distinguishing dimensionless quantities
> from "unit not applicable" and "author hasn't thought about units".
> I don't think that using the empty string as distinct from null is
> suitable for this (or anything else), since those two inevitably
> get confused with each other (as above).  Given that, allowing "1" in
> the syntax for that purpose is probably reasonable for metadata authors
> who want to make the point that a quantity really is dimensionless.

Like Mark, I don't really think that "" is a suitable marker, because of its confusability.  If so, and if "1" is an adequate alternative, then the VOUnits wording above should be changed to read 'A unit string "1" positively indicates...'.

And I agree that it's probably going to be rare for metadata authors to be so careful as to mark this information.  But if they are feeling careful enough to want to record this, it would be good to be clear about how best they should do so.

So, to be clear, I think my proposal is:

  * Have the VOUnits 'explicitly dimensionless' marker become "1" rather than "" (on the grounds that the latter is too easily confusable with NULL/don't know), for the benefit of those metadata authors who wish to explicitly mark this.

  * Adjust the grammars (only VOUnits, or all of them?) to remove the two-step parse, by permitting parse(<dimensionless marker>) to produce an appropriate valid result.

Finally, and as Markus said in the first message in this thread, is there any case for making this a three-way distinction: dimensionless / unknown / not-a-quantity (e.g. a name)?  That would be very easy to do in this revision of the document, and would mean that 'the units field must be non-NULL' would become a reasonable validation requirement.  But this might be too much detail to hope metadata authors will supply.
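A rough sketch of what such a validation rule might look like.  Only "1" has been discussed in this thread as a marker; the spellings "?" and "-" below are placeholders invented for the sketch, as are the names UnitStatus, MARKERS and validate_unit_field.

```python
from enum import Enum


class UnitStatus(Enum):
    DIMENSIONLESS = "dimensionless"
    UNKNOWN = "unknown"
    NOT_A_QUANTITY = "not-a-quantity"
    HAS_UNIT = "has-unit"


# Placeholder spellings: "?" and "-" are invented for this sketch.
MARKERS = {
    "1": UnitStatus.DIMENSIONLESS,
    "?": UnitStatus.UNKNOWN,
    "-": UnitStatus.NOT_A_QUANTITY,
}


def validate_unit_field(unit):
    """Classify a units field, rejecting NULL and "" outright."""
    if unit is None or unit == "":
        raise ValueError("units field must be non-NULL: use an explicit marker")
    # Anything that is not a marker would go on to the real unit parser.
    return MARKERS.get(unit, UnitStatus.HAS_UNIT)
```

With explicit markers for all three cases, a missing or empty units field can only mean that the author supplied nothing, which is what makes the non-NULL requirement checkable.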

I doubt that changing the acceptability or otherwise of "" as a units string will have any practical impact, given whatever uptake there appears to be of the VOUnits specification to date.

Best wishes,

Norman


-- 
Norman Gray  :  https://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

