VOUnits: _another_ version, based on implementation feedback

Norman Gray norman at astro.gla.ac.uk
Tue Nov 5 06:04:29 PST 2013


Markus and co, hello.

On 2013 Nov 5, at 12:35, Markus Demleitner wrote:

> On Tue, Nov 05, 2013 at 11:08:17AM +0000, Norman Gray wrote:
>>  * so we must, I think, allow prefixes on those quoted units (or else we have
>>    to write "'martianDay'" for those units, but remember to drop the quotes and
>>    write "kmartianDay" when we talk about 1000s of them (and then how do we
>>    parse the unit of 1000 days on Io, the "kioDay"?)).
> 
> That's the one I'd dispute.  Quoted units aren't independent of the
> prefixes, they're there *exactly* because of the prefixes.

True...

> Now, the kioDay (or, equivalently, the dafternoon -- deci-afternoon
> or deka-fternoon) is just a repetition of the horrible deka-mess and
> should be resolved in a similar way. In this case, I'd say the binary
> prefixes should only be allowed on a controlled vocabulary (probably
> bit and byte -- anything else?).  We'll need that anyway, just as
> with deka.

That's an interesting suggestion (possibly _slightly_ fiddly to implement, but that's my problem).  If we permit binary prefixes only on a nominated set of units, and declare 'da' to be firmly deprecated and unspecified, then ...

> [...] all prefixes are just a single char, you'll save
> all the complicated explanations what the STRING might or might not
> be by just saying, wherever you introduce quoted units: 
> 
>  Unknown units with SI prefixes must not be quoted.

Nice and simple.  I'm nearly persuaded.

My only qualification, now, is to wonder whether this puts some burden on the writer of a units string, who has to remember to quote 'problematic' unknown units (i.e., those beginning with a (now) single-character SI prefix), but remember _not_ to quote them if they want to include a prefix.

I'm struggling to think of the circumstances in which this is a real practical problem for a system, but mention it in case something occurs to someone here.
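
To make the rule under discussion concrete, here is a minimal Python sketch, assuming the proposal as stated above: quoted units never take a prefix, binary prefixes are honoured only on a nominated set of units, and any other unknown unit has a leading single-character SI prefix split off. The function name `split_unit`, the tiny `KNOWN_UNITS` set and the prefix tables are hypothetical stand-ins for illustration, not the VOUnits grammar or any existing parser.

```python
# Illustrative only: every name and table here is an assumption, not the spec.
SI_PREFIXES = set("yzafpnumcdhkMGTPEZY")   # single-character prefixes; 'da' left out
BINARY_PREFIXES = {"Ki", "Mi", "Gi", "Ti", "Pi", "Ei"}
BINARY_OK = {"bit", "byte"}                # controlled vocabulary for binary prefixes
KNOWN_UNITS = {"m", "s", "g", "Hz", "bit", "byte"}  # stand-in for the real list


def split_unit(text):
    """Return (prefix, base, quoted) for a single unit symbol."""
    # A quoted unit is taken verbatim: no prefix is ever split off it.
    if len(text) >= 3 and text[0] == "'" and text[-1] == "'":
        return ("", text[1:-1], True)
    # A known unit on its own wins over any prefix reading ("m" is metre).
    if text in KNOWN_UNITS:
        return ("", text, False)
    # Binary prefixes are honoured only on the nominated units.
    if text[:2] in BINARY_PREFIXES and text[2:] in BINARY_OK:
        return (text[:2], text[2:], False)
    # Otherwise a leading single-character SI prefix is split off,
    # whether or not the remainder is a known unit.
    if len(text) > 1 and text[0] in SI_PREFIXES:
        return (text[0], text[1:], False)
    return ("", text, False)
```

With these tables, an unquoted `martianDay` comes back as milli-"artianDay", which is the case quoting is meant to prevent, while `kmartianDay` splits as kilo-"martianDay" and `'martianDay'` is left whole. That is the burden on the writer mentioned above: quote the bare unit, but drop the quotes to add a prefix.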

>> But as Rick says, the motivation here is to ensure robust parsing of
>> unusual units.  If we forbid prefixes on quoted units, then we're saying
>> that quoted units are very significantly different from unquoted but
>> unknown ones.  That means that we forbid for example M'jupiterMass' --
>> that looks pretty harmless to me, and so forbidding it doesn't sound
>> like a great idea.
> 
> See, that's where we disagree.  To me, M'jupiterMass' is quite a bit
> of a disgrace, and I really think there's no use case for that.

To me, the goodness of the symmetry of M'jupiterMass' is greater than the badness of the ugly quotes being there, but this might now be reducing to aesthetics.

>> Quoted function names
>> 
>> I _think_ that the idea of quoted function names was introduced (by me?) largely
>> out of symmetry with the quoted units.  I can't (come to think of it) think of
>> any reason why we'd want to distinguish 
>> 
>> log(Hz)
>> 
>> from 
>> 
>> 'log'(Hz)
> 
> Hm -- may I offer:
> 
> Python 2.7.3 (default, Jan  2 2013, 16:53:07) 
> [GCC 4.7.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> log
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
> NameError: name 'log' is not defined
> >>> 'log'
> 'log'

Are you in _favour_ of allowing quoted function names?!

>>> \section{Indicating dimensionless and unknown units}
>>> 
>>> This specification reserves the unit \texttt{UNKNOWN}, which may not
>>> appear in a VOUnits unit-string except as discussed here.  A unit-string
>>> consisting of the string \texttt{UNKNOWN}, alone, indicates that a
>>> quantity has unknown units.  This string should be recognised
>>> case-insensitively by an application, as a separate step before attempting
>>> any VOUnits parsing.
> 
> Can we drop the thing with case-insensitivity and just say it's
> "unknown"?  And again, I'd make very clear that this indicates the
> case in which "it is known that there should be a unit but due to
> some unspecified mishap it got lost".
> 
> I've vowed to speak out against case-insensitivity wherever it crops
> up; too many times I've made weird hacks to make something case
> insensitive, and the hoops I had to jump through in RegTAP sealed the case
> for me.

In general, I'm not a fan of case insensitivity either, but I put that in at the last moment, because I could imagine someone trying to remember whether the magic string was "unknown" or "UNKNOWN" (I'd prefer the latter, if we're distinguishing), remembering the wrong way, and ending up with a correctly parseable unit called 'unknown'.


>>> A unit string consisting of the string \texttt{-}, alone, indicates that a
>>> quantity is dimensionless.
> 
> That I'd consider a major change, and one that'd been discussed
> before and clearly not liked very much.
> 
> I'd agree to a grammar change that accepts the empty string
> (actually, I'd like that a lot).
> 
> The "-" I really don't like.

The problem with an empty string is that it's not clear that someone has deliberately stated "this quantity is dimensionless", as opposed to "I forgot to put in the units", or "I couldn't be bothered", or "I don't know what the units are and didn't know I was supposed to write 'UNKNOWN'", all of which are plausible interpretations of "", if one were to find it in a file.

To be specific, I'd like to suggest:

  * adding '-' (or maybe 'dimensionless' or '000' or something else) to the grammar as a top-level production for 'input', so that the string "-" is a validly-parsed unit string meaning dimensionless.

  * indicating that 'UNKNOWN' (case-insensitive or not) is _not_ a valid unit string, but is covered by this specification to the extent that it should be spotted by an application before any parse.
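
The two bullets above could be sketched roughly as follows. This is only an illustration under stated assumptions: `classify_unit_string` and its return labels are hypothetical, the case-insensitive match for UNKNOWN is the variant still being debated, and "-" is handled here before the parse purely to keep the sketch short (in the suggestion it would be a production in the grammar itself).

```python
def classify_unit_string(s):
    """Triage a unit string before parsing; returns a label, not a parsed unit."""
    # Spot the reserved word before any VOUnits parsing happens.
    # Case-insensitive matching is an assumption; it is still under discussion.
    if s.upper() == "UNKNOWN":
        return "unknown"
    # "-" alone is a deliberate statement that the quantity is dimensionless.
    # (In the proposal this would be a grammar production, not a pre-check.)
    if s == "-":
        return "dimensionless"
    # An empty string says nothing about intent, so it is reported separately.
    if s == "":
        return "unspecified"
    # Anything else goes on to the real parser (not shown).
    return "parse"
```

The point of the separate "unspecified" label is that an empty string cannot be told apart from a forgotten unit, whereas "-" and UNKNOWN each record a deliberate choice.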

>> The current document takes its list of known units from the list in
>> src/grammar/known-units.csv at <https://bitbucket.org/nxg/unity>, and one way of
>> updating the list of units would be to declare that this file has some normative
>> value.
> 
> That's about what I had in mind.  But I shun the effort of specifying
> how these changes would be done that late in the standardization
> process...

Perhaps we should take this to the interop list, and discuss it in the context of the process for updating UCD words.

All the best,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK


