VOUnits: _another_ version, based on implementation feedback

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Tue Nov 5 04:35:52 PST 2013


Dear List,

I'm going to be brief (by the standards of this thread), I promise.  So:

On Tue, Nov 05, 2013 at 11:08:17AM +0000, Norman Gray wrote:
>   * so we must, I think, allow prefixes on those quoted units (or else we have
>     to write "'martianDay'" for those units, but remember to drop the quotes and
>     write "kmartianDay" when we talk about 1000s of them (and then how do we
>     parse the unit of 1000 days on Io, the "kioDay"?)).

That's the one I'd dispute.  Quoted units aren't independent of the
prefixes, they're there *exactly* because of the prefixes.

Now, the kioDay (or, equivalently, the dafternoon -- deci-afternoon
or deka-fternoon) is just a repetition of the horrible deka-mess and
should be resolved in a similar way. In this case, I'd say the binary
prefixes should only be allowed on a controlled vocabulary (probably
bit and byte -- anything else?).  We'll need that anyway, just as
with deka.
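
As a rough illustration of such a controlled vocabulary, here is a
minimal Python sketch.  The prefix spellings follow IEC usage; the set
of allowed base units (just bit and byte) and the function name are
assumptions made up for the example, not anything the draft specifies:

```python
# Hypothetical sketch: binary prefixes are honoured only on a fixed
# vocabulary of base units.  Names and vocabulary are assumptions.
BINARY_PREFIXES = ("Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi")
BINARY_BASES = frozenset({"bit", "byte"})


def split_binary_prefix(token):
    """Return (prefix, base) if token is a binary-prefixed unit from the
    controlled vocabulary, else None."""
    for prefix in BINARY_PREFIXES:
        base = token[len(prefix):]
        if token.startswith(prefix) and base in BINARY_BASES:
            return prefix, base
    return None
```

With this, Mibyte splits into Mi and byte, while something like MioDay
is never offered a binary prefix and falls through to the ordinary
single-character prefix handling.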

If that's fixed and all prefixes are just a single char, you'll save
all the complicated explanations of what the STRING might or might
not be by just saying, wherever you introduce quoted units:

  Unknown units with SI prefixes must not be quoted.

> But as Rick says, the motivation here is to ensure robust parsing of
> unusual units.  If we forbid prefixes on quoted units, then we're saying
> that quoted units are very significantly different from unquoted but
> unknown ones.  That means that we forbid for example M'jupiterMass' --
> that looks pretty harmless to me, and so forbidding it doesn't sound
> like a great idea.

See, that's where we disagree.  To me, M'jupiterMass' is quite a bit
of a disgrace, and I really think there's no use case for that.
MjupiterMass is more in line with everything else and works exactly
as well, once we've dealt with two-letter prefixes (which we must do
anyway).
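
To make the rule above concrete, here is a minimal Python sketch of
how a parser might treat quoted and unquoted unit tokens under it.
This is not the VOUnits grammar; the function name, the tiny
known-unit table and the decision to ignore two-letter prefixes
entirely are assumptions for the sake of illustration:

```python
# Hypothetical sketch: quoted tokens are taken literally and never get
# a prefix; unquoted unknown tokens lose one single-character SI prefix.
SI_PREFIXES = frozenset("yzafpnumcdhkMGTPEZY")  # "da" deliberately left out
KNOWN_UNITS = frozenset({"m", "s", "g", "Hz"})  # illustrative subset only


def split_unit(token):
    """Return (prefix, base, quoted) for a single unit token."""
    if len(token) >= 2 and token[0] == "'" and token[-1] == "'":
        # Quoted: the whole content is the unit name, no prefix parsing.
        return ("", token[1:-1], True)
    if token in KNOWN_UNITS:
        return ("", token, False)
    if len(token) > 1 and token[0] in SI_PREFIXES:
        # Unquoted and not a known unit: the first character is read as
        # a prefix.
        return (token[0], token[1:], False)
    return ("", token, False)
```

Here 'martianDay' stays a martianDay, kmartianDay is kilo-martianDay,
MjupiterMass is mega-jupiterMass, and an unquoted martianDay comes out
as milli-artianDay, which is the whole reason for having the quotes.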

So: I stand by my "*do* you *really* have to do this?" emotions.

> Quoted function names
> 
> I _think_ that the idea of quoted function names was introduced (by me?) largely
> out of symmetry with the quoted units.  I can't (come to think of it) think of
> any reason why we'd want to distinguish 
> 
> log(Hz)
> 
> from 
> 
> 'log'(Hz)

Hm -- may I offer:

Python 2.7.3 (default, Jan  2 2013, 16:53:07) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> log
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'log' is not defined
>>> 'log'
'log'

> How about the following:
> 
> > \section{Indicating dimensionless and unknown units}
> >
> > This specification reserves the unit \texttt{UNKNOWN}, which may not
> > appear in a VOUnits unit-string except as discussed here.  A unit-string
> > consisting of the string \texttt{UNKNOWN}, alone, indicates that a
> > quantity has unknown units.  This string should be recognised
> > case-insensitively by an application, as a separate step before attempting
> > any VOUnits parsing.

Can we drop the thing with case-insensitivity and just say it's
"unknown"?  And again, I'd make very clear that this indicates the
case in which "it is known that there should be a unit but due to
some unspecified mishap it got lost".

I've vowed to speak out against case-insensitivity wherever it crops
up; too many times I've made weird hacks to make something case
insensitive, and the hoops I had to jump through in RegTAP sealed the
case for me.

Here, it's particularly insidious: "Unit strings are case-sensitive,
except when they're not."  Ugh.

> > A unit string consisting of the string \texttt{-}, alone, indicates that a
> > quantity is dimensionless.

That I'd consider a major change, and one that has been discussed
before and was clearly not liked very much.

I'd agree to a grammar change that accepts the empty string
(actually, I'd like that a lot).

The "-" I really don't like.
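
Taken together, what I'm suggesting for the special cases (an exact,
lowercase "unknown" and the empty string for dimensionless) would
amount to something like the following Python sketch.  The function
name and the return labels are made up for the example; nothing like
this is in the draft:

```python
# Hypothetical sketch of the special-case check that would run before
# the actual unit grammar.  Comparisons are exact, i.e. case-sensitive.
def classify_unit_string(unit_string):
    """Return 'dimensionless', 'unknown', or 'parse' (hand the string on
    to the regular VOUnits parser)."""
    if unit_string == "":
        return "dimensionless"
    if unit_string == "unknown":
        return "unknown"
    return "parse"
```

In this reading, neither "UNKNOWN" nor "-" gets special treatment;
both simply go to the regular parser like any other unit string.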
 
> The current document takes its list of known units from the list in
> src/grammar/known-units.csv at <https://bitbucket.org/nxg/unity>, and one way of
> updating the list of units would be to declare that this file has some normative
> value.

That's about what I had in mind.  But I shun the effort of specifying
how these changes would be done this late in the standardization
process...

Cheers,

         Markus



