VOUnits RFC

Norman Gray norman at astro.gla.ac.uk
Mon Jul 29 05:00:39 PDT 2013


Rick and Markus, hello.

On 2013 Jul 29, at 09:31, Frederic V. Hessman wrote:

> 
> On 29 Jul 2013, at 09:41, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> 
>> On Fri, Jul 26, 2013 at 12:12:34PM +0200, Frederic V. Hessman wrote:
>>> On 26 Jul 2013, at 11:25, Norman Gray <norman at astro.gla.ac.uk> wrote:

>>>> That's an argument I hadn't thought of.  It's certainly a
>>>> hygienically consistent position (so I stand beside you on that),
>>>> but from a practical point of view, I suspect that data providers
>>>> really do want to head columns 'jupMass', and I'm not sure it's
>>>> up to us to say they oughtn't.
>> 
>> "head columns" in the sense of "naming" them?  They'd be welcome to
>> do that.  Have jupMass in the unit string?  I'm pretty sure we should
>> talk them out of that.  I see basically two purposes of defining a
>> grammar for machine readable unit strings:
>> 
>> (1) let clients bring values in data retrieved from multiple sources
>> to common units without user intervention
>> 
>> (2) let clients present query forms (or similar artifacts) to the
>> user in units convenient to them and convert to whatever the service
>> expects the values in on the fly.

No, the purpose of the unit grammars (plural) is to provide the _first step_ in either of those activities, and to establish limited consensus for the second step.

That is (step one of this process), given a character string which purports to describe a 'unit', how should this be turned into a sequence of (multiplier, base-unit, power) tuples, with a scale factor in front?  For that step, the only 'known units' question is whether or not one should regard 'Pa' as the Pascal rather than 10^15 'a' (whatever an 'a' is).  That is, some units would be parsed 'wrongly' if you didn't know to special-case them.
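As a rough illustration of step one, here is a minimal Python sketch.  It is not the VOUnits grammar: it assumes a much-simplified syntax ('.' for products, '/' for division, '**' for powers, and an optional leading number separated by a space as the scale factor), and the names SI_PREFIXES, KNOWN_WHOLE, split_prefix and parse_unit are all invented for this example.  The prefix table is deliberately a small subset.

```python
import re

# Illustrative subset of SI prefixes, stored as powers of ten (assumption).
SI_PREFIXES = {"P": 15, "G": 9, "M": 6, "k": 3, "m": -3, "u": -6, "n": -9}

# Symbols that must be special-cased so they are not split into
# prefix + unit, e.g. 'Pa' is the pascal, not 10^15 'a' (assumed list).
KNOWN_WHOLE = {"Pa", "mas"}

TERM = re.compile(r"^([A-Za-z_]+)(?:\*\*([+-]?\d+))?$")


def split_prefix(symbol):
    """Return (power-of-ten multiplier, base-unit) for one unit symbol."""
    if symbol in KNOWN_WHOLE:
        return 0, symbol
    if len(symbol) > 1 and symbol[0] in SI_PREFIXES:
        return SI_PREFIXES[symbol[0]], symbol[1:]
    return 0, symbol


def parse_unit(text):
    """Parse a unit string into (scale, [(multiplier, base, power), ...])."""
    head, sep, rest = text.strip().partition(" ")
    scale, body = (float(head), rest) if sep else (1.0, head)
    terms = []
    sign = 1
    for token in re.split(r"([./])", body):
        if token == ".":
            sign = 1          # next term multiplies
        elif token == "/":
            sign = -1         # next term divides
        else:
            match = TERM.match(token)
            if match is None:
                raise ValueError(f"cannot parse unit term {token!r}")
            symbol, power = match.group(1), int(match.group(2) or 1)
            prefix, base = split_prefix(symbol)
            terms.append((prefix, base, sign * power))
    return scale, terms
```

Note that parse_unit("jupMass/hr") succeeds under these assumptions even though neither symbol appears in any table: the parse is purely syntactic, and deciding whether 'jupMass' means anything is left to a later step.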

Step two is a subsequent, and independent, step in which we discover that the IVOA has (we assert) decided that 'a' should be interpreted as specifically the year rather than the are, and that 'B' is a byte rather than a bel.

That same step allows us to say that the 'erg' is deprecated or unknown (syntax-dependent), that 'mas' oughtn't to have SI prefixes, and so on.
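A minimal Python sketch of what step two might look like, kept separate from any parsing.  The registry contents mirror only the examples mentioned above; the table layout, the field names, the assumption that 'erg' may take prefixes, and the function check_term are all hypothetical, and a real syntax would decide for itself whether 'erg' counts as deprecated or simply unknown.

```python
# Hypothetical registry of agreed interpretations (illustrative only).
UNIT_REGISTRY = {
    "a":   {"meaning": "year",           "prefixable": True,  "deprecated": False},
    "B":   {"meaning": "byte",           "prefixable": True,  "deprecated": False},
    "erg": {"meaning": "erg",            "prefixable": True,  "deprecated": True},
    "mas": {"meaning": "milliarcsecond", "prefixable": False, "deprecated": False},
}


def check_term(prefix_exp, base):
    """Return a list of complaints about one parsed (prefix, base) pair.

    prefix_exp is the power of ten contributed by an SI prefix (0 if none).
    An empty list means the unit is recognised and unobjectionable.
    """
    entry = UNIT_REGISTRY.get(base)
    if entry is None:
        return [f"unknown unit {base!r}"]
    problems = []
    if entry["deprecated"]:
        problems.append(f"{base!r} is deprecated")
    if prefix_exp != 0 and not entry["prefixable"]:
        problems.append(f"{base!r} should not take an SI prefix")
    return problems
```

For instance, check_term(0, "jupMass") reports an unknown unit and check_term(-3, "mas") objects to the prefix, while check_term(0, "a") returns an empty list.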

At that point, an application is free to do the conversions you describe (useful things to do!).  It will, however, presumably do them only if it first checks that all of the units in the string it has parsed are in fact recognised ones.  If not, it can signal an error, or do something more flexible (for example, it could convert "jupMass/hr" into 1/3600 'jupMass'/s without knowing what a 'jupMass' is).
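The "more flexible" option can be sketched in a few lines of Python.  This assumes the unit string has already been parsed into (symbol, power) pairs; the conversion table TO_BASE and the function normalise are invented for illustration, and unrecognised symbols are simply passed through untouched.

```python
from fractions import Fraction

# Hypothetical table: unit symbol -> (exact factor, replacement symbol).
TO_BASE = {"hr": (Fraction(3600), "s"), "min": (Fraction(60), "s")}


def normalise(terms):
    """Rewrite known units in terms of base units, leaving others alone.

    terms is a list of (symbol, power) pairs; the result is
    (scale, new_terms), with scale kept as an exact Fraction.
    """
    scale = Fraction(1)
    out = []
    for symbol, power in terms:
        if symbol in TO_BASE:
            factor, target = TO_BASE[symbol]
            scale *= factor ** power       # e.g. 3600 ** -1 for '/hr'
            out.append((target, power))
        else:
            out.append((symbol, power))    # unknown unit: carry it along
    return scale, out


scale, units = normalise([("jupMass", 1), ("hr", -1)])
# scale is Fraction(1, 3600); units is [("jupMass", 1), ("s", -1)]
```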

Rick:

> Your user has a table with data having the units "mas" and "M_Jupiter" : do you help her to get this data in or out of the VO universe so that it can be processed further or not?  I'd say you do - this is the whole point of VO - unless such a task is so complicated that it's unmanageable.

Indeed.  The full task of writing a comprehensive conversion service may indeed be nearly unmanageable (this is why numerous people have approached this and then backed off, and we haven't had any real progress on this over the duration of the IVOA), but the basic task of simply making some sense of the units -- in a 'good enough' way -- is quite easy.

If we identify a simple and independent element of a task, and solve it well, then we can build cleverness on top of that later, and I think Rick's proposed collection of extensions is excellent:

> 	- always keep track of standard units if you do anything with the data - here are the standard units…..;
> 	- always be able to parse complex units in terms of standard units - here are the rules…..;
> 	- always be prepared to get a scale factor with your units - here are the rules…..;
> 	- if you don't understand a unit, ask someone who does - here is someone who knows…...;
> 	- if possible, keep track of unit metadata - you may not need it, but someone else down the road may.

Moving on...

>>> <skos:Concept rdf:about="vou:units#jupiterMass">
>>> 	<skos:prefLabel>M_jupiter</skos:prefLabel>
>>> 	<skos:definition>1.89813e27 kg</skos:definition>
>>> 	<skos:altLabel>jupMass</skos:altLabel>
>>> 	<skos:altLabel>Mjup</skos:altLabel>
>>> 	<skos:altLabel>M_jup</skos:altLabel>
>>> 	<skos:altLabel xml:lang="en">jupiter mass</skos:altLabel>
>>> 	<skos:altLabel xml:lang="en">jupiter masses</skos:altLabel>
>>> 	<skos:related rdf:resource="iau93:#jupiter"/>
>>> 	<skos:scopeNote xml:lang="en">Case is not important.</skos:scopeNote>
>>> </skos:Concept>
>>> 
>>> That way, your favourite unit parser could always simply ask
>>> vou:units for help.  Maybe I'm slightly misusing skos:definition,
>>> but it works just fine.
>> 
>> While I'd like such a resource, building basic VOUnits mechanism on
>> it opens a whole new can of worms -- who's going to maintain it?

This is actually closer than you think.

The Unity Java library exposes URLs for each of the units it knows about (the C library doesn't because implementing things in C is even less fun than doing so in Java, but it could be added).

These URLs are those of <http://www.qudt.org/>, which is a very comprehensive collection of the messy information about dimensions, definitions, names and so on, which is at the heart of any manipulation of units.  If anyone wants to do stuff with units, we should just get behind the QUDT effort and keep it in one place, rather than reinventing this stuff badly.

Now, using that QUDT information isn't trivial (and this is why I haven't advertised this before), but all the information is there.

For example:

qudt-unit:Ampere   rdf:type   rdfs:Resource
qudt-unit:Ampere   rdf:type   owl:Thing
qudt-unit:Ampere   rdf:type   qudt:Unit
qudt-unit:Ampere   rdf:type   qudt:ElectricCurrentUnit
qudt-unit:Ampere   rdf:type   qudt:ElectricityAndMagnetismUnit
qudt-unit:Ampere   rdf:type   qudt:SIBaseUnit
qudt-unit:Ampere   rdf:type   qudt:SIUnit
qudt-unit:Ampere   rdf:type   qudt:PhysicalUnit
qudt-unit:Ampere   rdf:type   qudt:ScienceAndEngineeringUnit
qudt-unit:Ampere   rdf:type   qudt:BaseUnit
qudt-unit:Ampere   rdfs:label   "Ampere"^^<http://www.w3.org/2001/XMLSchema#string>
qudt-unit:Ampere   skos:exactMatch   <http://dbpedia.org/resource/Ampere>
qudt-unit:Ampere   qudt:quantityKind   qudt-quantity:ElectricCurrent
qudt-unit:Ampere   qudt:symbol   "A"^^<http://www.w3.org/2001/XMLSchema#string>
qudt-unit:Ampere   qudt:literal   "A"^^<http://www.w3.org/2001/XMLSchema#string>
qudt-unit:Ampere   qudt:uneceCommonCode   "AMP"^^<http://www.w3.org/2001/XMLSchema#string>
qudt-unit:Ampere   qudt:abbreviation   "A"^^<http://www.w3.org/2001/XMLSchema#string>
qudt-unit:Ampere   qudt:conversionOffset   "0.0"^^<http://www.w3.org/2001/XMLSchema#double>
qudt-unit:Ampere   qudt:conversionMultiplier   "1"^^<http://www.w3.org/2001/XMLSchema#double>
qudt-unit:Ampere   qudt:code   "0050"^^<http://www.w3.org/2001/XMLSchema#string>
qudt-unit:SystemOfUnits_CGS-ESU   qudt:systemAllowedUnit   qudt-unit:Ampere
qudt-unit:SystemOfUnits_CGS-ESU   qudt:systemUnit   qudt-unit:Ampere
qudt-unit:SystemOfUnits_SI   qudt:systemBaseUnit   qudt-unit:Ampere
qudt-unit:SystemOfUnits_SI   qudt:systemDefinedUnit   qudt-unit:Ampere
qudt-unit:SystemOfUnits_SI   qudt:systemUnit   qudt-unit:Ampere
qudt-unit:SystemOfUnits_SI   qudt:systemCoherentUnit   qudt-unit:Ampere
qudt-unit:SystemOfUnits_Planck   qudt:systemAllowedUnit   qudt-unit:Ampere
qudt-unit:SystemOfUnits_Planck   qudt:systemUnit   qudt-unit:Ampere
qudt-unit:SystemOfUnits_CGS-EMU   qudt:systemAllowedUnit   qudt-unit:Ampere
qudt-unit:SystemOfUnits_CGS-EMU   qudt:systemUnit   qudt-unit:Ampere
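To give a feel for how a listing like this might be consumed, here is a minimal Python sketch that reads whitespace-separated "subject predicate object" lines in the layout shown above and pulls out a couple of properties.  It is emphatically not an RDF parser (a real application would use a proper RDF library against the QUDT data), and the function names properties_of and literal_value are invented for this example.

```python
SAMPLE = '''
qudt-unit:Ampere   rdf:type   qudt:SIBaseUnit
qudt-unit:Ampere   qudt:symbol   "A"^^<http://www.w3.org/2001/XMLSchema#string>
qudt-unit:Ampere   qudt:conversionMultiplier   "1"^^<http://www.w3.org/2001/XMLSchema#double>
qudt-unit:SystemOfUnits_SI   qudt:systemBaseUnit   qudt-unit:Ampere
'''


def properties_of(subject, dump):
    """Collect predicate -> [objects] for lines whose subject matches."""
    props = {}
    for line in dump.splitlines():
        parts = line.split(None, 2)   # subject, predicate, rest of line
        if len(parts) == 3 and parts[0] == subject:
            props.setdefault(parts[1], []).append(parts[2].strip())
    return props


def literal_value(obj):
    """Strip the quotes and ^^datatype suffix from a typed literal."""
    if obj.startswith('"'):
        return obj[1:].split('"', 1)[0]
    return obj


ampere = properties_of("qudt-unit:Ampere", SAMPLE)
symbol = literal_value(ampere["qudt:symbol"][0])
multiplier = float(literal_value(ampere["qudt:conversionMultiplier"][0]))
```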


>> To me, this seems a high price to pay to solve a problem 80% of which
>> is solved by allowing arbitrary scale factors.  The remaining 20%
>> (telling the user that 1.89813e27 kg really is meant to mean "mass of
>> Jupiter assumed here") are interesting, true, but IMHO it's fine if this
>> kind of -- human-oriented -- information is in the human-oriented
>> pieces of metadata, i.e., the column name and its description.
> 
> No, we shouldn't give up the units metadata without a fight, because that information, once gone, is gone forever (well, until your software asks a human in a pop-up window).


The issue here may be about which 80% is important.

I take it that the reason why we're discussing the parsing of strings like "mm.s**-2" rather than "-3/m/1:0/s/-2" (something like that would avoid many problems) is because we expect that the strings we're discussing will be basically the human-readable ones.  Having an explicit unit-string grammar means that data providers can write the human-readable things in the confidence that the result will _additionally_ be reliably machine-readable.  Or, where it's not fully machine-readable (because someone wants to use 'jupMass'), that it is at least partially machine-readable, and that that partial readability is unambiguous.

----

This conversation is not, I think, really about whether or not we should permit non-round scale-factors (that's merely a minor edit to the grammars), but I can't neatly characterise what it _is_ actually about.

All the best,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK


