VOUnits update: Empty/missing units

Norman Gray norman.gray at glasgow.ac.uk
Wed Dec 8 14:33:08 CET 2021


Greetings, all.

(I'm a little late to this discussion -- it turns out I had unsubscribed from the list.)

Mark said:

> I was talking about an example case where there is a dimensionless
> column having the name "count".  A plotting tool like topcat writes
> the label "<column-name> / <units-text>" (if <units-text> is provided)
> along the axis, so if the unit text is set to "1" the user will see
> "count / 1".  If the unit text is left blank the user will just see
> "count", which I'd say is a better outcome.

But wouldn't that also fall foul of the case where a user provides "" rather than NULL for the unit string? In that case, the plotting tool would have to check specifically for the empty string, and so write '<column-name>' rather than '<column-name> / '. The only difference I'm suggesting is that the tool checks for "1" rather than "", on the grounds that "1" is marginally easier to read and talk about than "" (and NULL would still mean 'nothing known').
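To make the comparison concrete, here is a minimal Python sketch of the labelling logic a plotting tool might use. The function name `axis_label` and the constant `DIMENSIONLESS` are hypothetical, and "1" as the marker reflects the proposal in this message rather than current VOUnits wording. Whichever marker is chosen, handling it is a single string comparison.

```python
from typing import Optional

# Hypothetical explicit 'dimensionless' marker, as proposed in this message
# (not what the VOUnits spec currently says).
DIMENSIONLESS = "1"


def axis_label(column_name: str, unit: Optional[str]) -> str:
    """Build a '<column-name> / <units-text>' axis label.

    None (NULL) means 'nothing known'. The dimensionless marker and the
    empty string both get no suffix, so 'count' is not shown as 'count / 1'.
    """
    if unit is None or unit == "" or unit == DIMENSIONLESS:
        return column_name
    return f"{column_name} / {unit}"


print(axis_label("count", "1"))   # count
print(axis_label("count", None))  # count
print(axis_label("flux", "mJy"))  # flux / mJy
```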

At present, of course, the VOUnits grammars do not accept an empty string as a unit string, and the spec says

> An empty unit string positively indicates that the corresponding quantity is dimensionless. Since an empty string does not conform to the grammars below, this also must be checked for before unit-parsing starts.

So I think there are a few interrelated issues here, each of which is rather nit-picking, but each of which is irritating to the author of a document such as this.

1. Do people _know_ that an empty string 'positively indicates that the corresponding quantity is dimensionless'?

2. If I were filling in metadata about a table without reading the document (but what sort of barbarian would ever do that...?), it might seem reasonable to me to drop in an empty string to mean 'nothing known, or I don't care'; or a system might default to an empty string in this case. Thus "" is slightly ambiguous, enough to prompt whatever is validating that input to ask 'is that unitless or don't-care?'

3. The document mandates a two-step check: first check for an empty string, and only if the string isn't empty, do the parse. That's a bit of a wart, and I'd quite like to adjust each of the published grammars so that it permits the 'dimensionless' case (whether that's "" or "1"), so that in turn parse("") isn't a syntax error but instead returns a suitable value as the result of the parse.
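For illustration, the two-step check described in point 3 might look like the following minimal Python sketch. `parse_with_grammar` is a hypothetical stand-in for a real grammar-based unit parser, and `interpret_unit` and `UnitSyntaxError` are likewise illustrative names; the point is only that the empty string has to be intercepted before the parser ever sees it.

```python
from typing import Optional


class UnitSyntaxError(ValueError):
    """Raised when a unit string does not match the grammar."""


def parse_with_grammar(unit: str) -> str:
    # Hypothetical stand-in for a real grammar-based parser. Like the
    # current VOUnits grammars, it has no production for the empty string.
    if unit == "":
        raise UnitSyntaxError("empty string does not match the grammar")
    return unit  # a real parser would return a parsed unit object


def interpret_unit(unit: Optional[str]) -> Optional[str]:
    """Two-step interpretation, as currently specified.

    Returns None for NULL ('nothing known'), the string "dimensionless"
    for "", and otherwise whatever the grammar-based parser returns.
    """
    if unit is None:
        return None                   # NULL: nothing known
    if unit == "":
        return "dimensionless"        # step 1: handled outside the grammar
    return parse_with_grammar(unit)   # step 2: ordinary parse
```

Any caller that forgets step 1 and passes "" straight to the parser gets a syntax error, which is the wart being complained about.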

Regarding 2, Mark earlier said:

> I don't think it's a problem that the VOUnit syntax does not
> permit the empty string.  In most contexts (e.g. TAP_SCHEMA.columns
> UCD column or ucd attribute in VOSI-tables) you can just omit to
> supply a unit (null value).  IMHO reasonable software interpreting unit
> strings will/should anyway treat an empty string as "don't try to make
> sense of this as a unit" (without necessarily taking a stand on whether
> it's dimensionless or unsuitable for units or the author hasn't
> thought about units) rather than attempting to parse it against
> a given grammar.

That's very true for the pragmatic case of plotting data -- the program has been presented with (meta)data, and _something_ has to go on the screen.

I'm thinking, though, of other potential uses of information about units and dimensions, such as metadata validation, or doing simple unit conversions of data, where distinguishing 'dimensionless' from 'unknown' might be useful.  That's speculation, of course, rather than an explicit use-case demand.
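As a toy illustration of why the distinction could matter for conversions, this Python sketch works out the unit of the product of two quantities. It assumes, hypothetically, that "1" is the explicit dimensionless marker and that None stands for an unknown unit: only the former can be dropped safely, while an unknown factor makes the result unknown. The function name `product_unit` is illustrative, and the result is not simplified (so "m" times "m" gives "m.m").

```python
from typing import Optional

DIMENSIONLESS = "1"  # hypothetical explicit marker, as proposed in this message


def product_unit(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Unit of the product of two quantities, or None if it can't be known.

    None means 'unit unknown'. Only an explicitly dimensionless factor can
    be dropped. '.' is used as the multiplication operator, as in unit
    strings such as 'kg.m'. No simplification is attempted.
    """
    if a is None or b is None:
        return None          # unknown in, unknown out
    if a == DIMENSIONLESS:
        return b             # multiplying by a pure number keeps the unit
    if b == DIMENSIONLESS:
        return a
    return f"{a}.{b}"
```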

Mark goes on to say:

> I take the point about distinguishing dimensionless quantities
> from "unit not applicable" and "author hasn't thought about units".
> I don't think that using the empty string as distinct from null is
> suitable for this (or anything else), since those two inevitably
> get confused with each other (as above).  Given that, allowing "1" in
> the syntax for that purpose is probably reasonable for metadata authors
> who want to make the point that a quantity really is dimensionless.

Like Mark, I don't really think that "" is a suitable marker, because it is so easily confused with NULL. If that's right, and if "1" is an adequate alternative, then the VOUnits wording above should be changed to read 'A unit string "1" positively indicates...'.

And I agree that it's probably going to be rare for metadata authors to be so careful as to mark this information.  But if they are feeling careful enough to want to record this, it would be good to be clear about how best they should do so.

And I would still like to change the grammars so that parsing "1" or "" doesn't produce a parse exception.  I'm sort-of inclined to do that anyway, in the Unity library, but would be uncomfortable doing so independently of this forum.

So I think my proposal is slightly adjusted to:

  * Have the VOUnits 'explicitly dimensionless' marker become "1" rather than "" (on the grounds that the latter is too easily confused with NULL/don't know), for the benefit of those metadata authors who wish to mark this explicitly.

  * Adjust the grammars to remove the two-step parse, by permitting parse(<dimensionless marker>) to produce an appropriate valid result.
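The second proposal can be sketched in Python with a deliberately tiny, hypothetical grammar, far smaller than the real VOUnits grammars, in which "1" is simply one alternative of the top-level rule. The names `Unit` and `parse` are illustrative. Because the marker is part of the grammar, `parse("1")` returns an ordinary value and callers need no separate pre-check.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Toy grammar (hypothetical, not the VOUnits grammar):
#   unit_string ::= "1" | symbol ("." symbol)*
#   symbol      ::= [A-Za-z]+
_SYMBOL = re.compile(r"[A-Za-z]+")


@dataclass(frozen=True)
class Unit:
    # An empty tuple of factors represents the dimensionless unit.
    factors: Tuple[str, ...]

    @property
    def is_dimensionless(self) -> bool:
        return not self.factors


def parse(unit_string: str) -> Unit:
    """Single-step parse: the dimensionless marker is a grammar production."""
    if unit_string == "1":           # first alternative of unit_string
        return Unit(())
    factors = unit_string.split(".")  # second alternative: symbol("."symbol)*
    for factor in factors:
        if not _SYMBOL.fullmatch(factor):
            raise ValueError(f"syntax error in unit string {unit_string!r}")
    return Unit(tuple(factors))
```

In this sketch "" is still a syntax error; if the forum preferred "" as the marker, the first alternative would change accordingly.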

Finally, and as Markus said in the first message in this thread, is there any case for adding a third possibility: dimensionless / unknown / not-a-quantity (e.g. a name)? That would be very easy to do in this revision of the document, and would mean that 'the units field must be non-NULL' would become a reasonable validation requirement. But this is heading into the territory of unimportant distinctions.
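If such a three-way distinction were adopted, a validator might look something like this Python sketch. Only "1" for dimensionless has been proposed in this thread; the spellings "<unknown>" and "<not-a-quantity>" below are placeholders invented for illustration, as are the names `UnitKind`, `MARKERS` and `classify`. With every case expressible by an explicit marker, rejecting NULL becomes a simple rule.

```python
from enum import Enum
from typing import Optional


class UnitKind(Enum):
    DIMENSIONLESS = "dimensionless"
    UNKNOWN = "unknown"
    NOT_A_QUANTITY = "not-a-quantity"  # e.g. a column holding names
    PHYSICAL = "physical"              # an ordinary unit such as "m"


# Hypothetical marker strings: the thread proposes only "1" for
# dimensionless; the other two spellings are placeholders.
MARKERS = {
    "1": UnitKind.DIMENSIONLESS,
    "<unknown>": UnitKind.UNKNOWN,
    "<not-a-quantity>": UnitKind.NOT_A_QUANTITY,
}


def classify(unit: Optional[str]) -> UnitKind:
    """Classify a units field, enforcing 'the units field must be non-NULL'."""
    if unit is None:
        raise ValueError("units field must be non-NULL")
    # Anything that is not a marker is treated as an ordinary unit string;
    # a real validator would go on to parse it.
    return MARKERS.get(unit, UnitKind.PHYSICAL)
```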

Best wishes,

Norman


-- 
Norman Gray  :  https://www.astro.gla.ac.uk/users/norman/it/
Research IT Coordinator, School of Physics and Astronomy
(Autumn 2021: I expect to be on-campus Mondays and Thursdays)

