VOUnits: _another_ version, based on implementation feedback
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Tue Nov 5 04:35:52 PST 2013
Dear List,
I'm going to be brief (by the standards of this thread), I promise. So:
On Tue, Nov 05, 2013 at 11:08:17AM +0000, Norman Gray wrote:
> * so we must, I think, allow prefixes on those quoted units (or else we have
> to write "'martianDay'" for those units, but remember to drop the quotes and
> write "kmartianDay" when we talk about 1000s of them (and then how do we
> parse the unit of 1000 days on Io, the "kioDay"?)).
That's the one I'd dispute. Quoted units aren't independent of the
prefixes, they're there *exactly* because of the prefixes.
Now, the kioDay (or, equivalently, the dafternoon -- deci-afternoon
or deka-fternoon) is just a repetition of the horrible deka-mess and
should be resolved in a similar way. In this case, I'd say the binary
prefixes should only be allowed on a controlled vocabulary (probably
bit and byte -- anything else?). We'll need that anyway, just as
with deka.
If that's fixed and all prefixes are just a single char, you'll save
all the complicated explanations of what the STRING might or might not
be by just saying, wherever you introduce quoted units:
Unknown units with SI prefixes must not be quoted.
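For illustration, here is a rough Python sketch of how a parser could split off prefixes under this rule. It assumes that two-character prefixes (deka "da" and the binary "Ki", "Mi", ...) are confined to controlled vocabularies. The vocabularies, the known-unit subset and the function name split_prefix are hypothetical stand-ins, not anything taken from the VOUnits draft.

```python
# Hypothetical sketch: SI prefixes are single characters, while the
# two-character prefixes (deka "da", binary "Ki", "Mi", ...) are only
# allowed on small controlled vocabularies. All lists are illustrative.

# Single-character SI prefixes ("u" standing in for micro).
SI_PREFIXES = set("yzafpnumcdhkMGTPEZY")

# Assumption: made-up controlled vocabularies.
DEKA_UNITS = {"m", "g", "s"}
BINARY_PREFIXES = {"Ki", "Mi", "Gi", "Ti", "Pi", "Ei", "Zi", "Yi"}
BINARY_UNITS = {"bit", "byte"}

# Assumption: a tiny stand-in for the real list of known units.
KNOWN_UNITS = {"m", "g", "s", "min", "d", "a", "Pa", "bit", "byte"}


def split_prefix(unit):
    """Return (prefix, base) for a single unit symbol."""
    # Quoted units are taken literally and never carry a prefix.
    if len(unit) >= 2 and unit[0] == "'" and unit[-1] == "'":
        return "", unit[1:-1]
    # Known units win over any prefix reading ("Pa" is not peta-year).
    if unit in KNOWN_UNITS:
        return "", unit
    # Two-character prefixes apply only to their controlled vocabularies.
    if unit.startswith("da") and unit[2:] in DEKA_UNITS:
        return "da", unit[2:]
    if unit[:2] in BINARY_PREFIXES and unit[2:] in BINARY_UNITS:
        return unit[:2], unit[2:]
    # Everything else: at most one single-character SI prefix.
    if len(unit) > 1 and unit[0] in SI_PREFIXES:
        return unit[0], unit[1:]
    return "", unit


print(split_prefix("MjupiterMass"))   # ('M', 'jupiterMass')
print(split_prefix("'martianDay'"))   # ('', 'martianDay')
print(split_prefix("dafternoon"))     # ('d', 'afternoon'), i.e. deci-afternoon
print(split_prefix("Kisecond"))       # ('', 'Kisecond'): no binary prefix here
```

Checking known units first and confining "da" to its vocabulary is what makes "dafternoon" come out as deci-afternoon without any further rules.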
> But as Rick says, the motivation here is to ensure robust parsing of
> unusual units. If we forbid prefixes on quoted units, then we're saying
> that quoted units are very significantly different from unquoted but
> unknown ones. That means that we forbid for example M'jupiterMass' --
> that looks pretty harmless to me, and so forbidding it doesn't sound
> like a great idea.
See, that's where we disagree. To me, M'jupiterMass' is quite a
disgrace, and I really think there's no use case for that.
MjupiterMass is more in line with everything else and works exactly
as well, once we've dealt with two-letter prefixes (which we must do
anyway).
So: I stand by my "*do* you *really* have to do this?" emotions.
> Quoted function names
>
> I _think_ that the idea of quoted function names was introduced (by me?) largely
> out of symmetry with the quoted units. I can't (come to think of it) think of
> any reason why we'd want to distinguish
>
> log(Hz)
>
> from
>
> 'log'(Hz)
Hm -- may I offer:
Python 2.7.3 (default, Jan 2 2013, 16:53:07)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> log
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'log' is not defined
>>> 'log'
'log'
> How about the following:
>
> > \section{Indicating dimensionless and unknown units}
> >
> > This specification reserves the unit \texttt{UNKNOWN}, which may not
> > appear in a VOUnits unit-string except as discussed here. A unit-string
> > consisting of the string \texttt{UNKNOWN}, alone, indicates that a
> > quantity has unknown units. This string should be recognised
> > case-insensitively by an application, as a separate step before attempting
> > any VOUnits parsing.
Can we drop the thing with case-insensitivity and just say it's
"unknown"? And again, I'd make very clear that this indicates the
case in which "it is known that there should be a unit but due to
some unspecified mishap it got lost".
I've vowed to speak out against case-insensitivity wherever it crops
up; too many times I've made weird hacks to make something
case-insensitive, and the hoops I had to jump through in RegTAP sealed
the case for me.
Here, it's particularly insidious: "Unit strings are case-sensitive,
except when they're not." Ugh.
> > A unit string consisting of the string \texttt{-}, alone, indicates that a
> > quantity is dimensionless.
That I'd consider a major change, and one that has been discussed
before and was clearly not liked very much.
I'd agree to a grammar change that accepts the empty string
(actually, I'd like that a lot).
The "-" I really don't like.
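For illustration, the pre-parse step from the quoted draft could look like this minimal Python sketch under the alternatives suggested above: an exact, case-sensitive match on "unknown", and the empty string for dimensionless quantities. The function name classify_unit_string and its return labels are hypothetical.

```python
# Hypothetical pre-parse check: the reserved word is matched exactly
# (no case-folding), the empty string means "dimensionless", and the
# "-" convention is deliberately not special-cased.

def classify_unit_string(s):
    """Return "unknown", "dimensionless" or "parse" for a unit string."""
    if s == "unknown":
        # There should be a unit, but it got lost somewhere.
        return "unknown"
    if s == "":
        return "dimensionless"
    # Everything else, including "UNKNOWN" and "-", goes to the
    # ordinary VOUnits grammar, which may well reject it.
    return "parse"


print(classify_unit_string("unknown"))  # unknown
print(classify_unit_string(""))         # dimensionless
print(classify_unit_string("UNKNOWN"))  # parse
```

Because the match is exact, there is no "case-sensitive, except when it isn't" rule for implementers to remember.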
> The current document takes its list of known units from the list in
> src/grammar/known-units.csv at <https://bitbucket.org/nxg/unity>, and one way of
> updating the list of units would be to declare that this file has some normative
> value.
That's about what I had in mind. But I shy away from the effort of
specifying how such changes would be made this late in the
standardization process...
Cheers,
Markus