VOUnits RFC
Norman Gray
norman at astro.gla.ac.uk
Mon Jul 29 11:45:12 PDT 2013
Tom, hello.
[a long one, I'm afraid!]
First, parenthetically, a response to Marco:
On 2013 Jul 29, at 14:17, Marco Molinaro wrote:
> Markus wish seems to be adding or changing the scale-factor with something
> like CDSFLOAT (already allowed) with [e|E] instead of [10x].
> Since scale-factor has already FLOAT and CDFLOAT in it I don't see the
> problem.
In grammar-hacking terms it's a simple change. However, the FITS units spec permits only round powers of ten. The OGIP spec does permit FLOAT, but its text effectively deprecates having this scale-factor present at all, and suggests that if it has to be there it should be a round power of ten. So the 'FLOAT' in the OGIP grammar is slightly deceptive, and it's really only the CDS spec that is fully comfortable with scale factors in that position.
Thus a unit string like "25.4mm" is non-conformant in FITS and deprecated in OGIP.
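As a rough illustration of the 'round power of ten' restriction, here is a minimal Python sketch. It tests only the numeric value of a candidate scale factor, and does not attempt the actual FITS, OGIP or CDS scale-factor syntax; the function name is my own invention and is not taken from any of the specs.

```python
from decimal import Decimal

def is_round_power_of_ten(text):
    """Return True if the decimal number in `text` is exactly 10**n for integer n.

    Hypothetical helper: checks the value only, not the spelling that any
    particular unit-string syntax would require.
    """
    value = Decimal(text)
    if value <= 0:
        return False
    # After normalisation, a power of ten has the single coefficient digit 1.
    return value.normalize().as_tuple().digits == (1,)

for factor in ("1000", "0.01", "25.4", "1.9e27"):
    print(factor, is_round_power_of_ten(factor))
```

On this test, the 25.4 in "25.4mm" and the 1.9e27 of a jupiter mass both fail, while 1000 and 0.01 pass.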
Back to Tom:
On 2013 Jul 29, at 15:52, Tom McGlynn wrote:
> I've been a little confused by this discussion in terms of what the various choices imply. From my perspective what matters is that I be able to properly write VOTables (and such) in a way that users can understand what is in them... Suppose I have a table of planets and with columns that are intended to be the mass of the planet in Jupiters or Earths. How do I write that table?
>
> If I understand it, I can't do that in the scenario where I am not allowed to specify a general floating point factor in the units, because I can only get units to within the nearest factor of 10. That seems like a really big loss. We don't want to force people to rewrite the values in tables just so they can fit in the units framework.
>
> In FITS in addition to having the TUNITn columns we have the TSCALn columns. So in FITS I've no problem with writing a table with columns that have units of the mass of Jupiter. The TUNITn='kg' and TSCALn=1.9e27.
That's true, but you have to put that scaling factor in the FITS header, because the FITS unit strings don't permit a (non-power-of-ten) scaling factor (as you know). The VOUnits specification is concerned exclusively with the content of the TUNITn card.
And the specification _is_ concerned with the FITS syntax because (as I've stressed above), it's a goal that any VOUnits-compatible units string would also be a syntactically valid FITS unit string (and the same for CDS and nearly so for OGIP). Permitting a scaling factor would break that.
(The document describes the VOUnits prescriptions as "the intersection of the syntaxes and the union of the 'known units'".)
I'll argue for this compatibility as an important property for general interoperability, but I'm not wedded to it, and if there ends up being a consensus against it that's fine by me.
> In VOTables (and elsewhere) we don't have, AFAIK, a comparable scaling capability nor is it likely any time soon. Since I perceive that astronomers are oft enamored of non-SI units, we'd be requiring wholesale rescaling of values in tables for tables to be able to use this convention. I don't see that happening.
A point of clarification: I'm not positive I follow where the rescaling would be necessary. Do you mean that at present VOTables can use "1.9x10+27kg" as a unit string (because they use CDS-format unit strings), but couldn't if there was an immediate switch to VOUnits strings, and therefore that the content of the VOTable would have to be scaled when it's generated?
> The argument against this is the anticipated complexity both to the reader and in the standard specification if this scaling factor is allowed. In the previous version a scaling factor was described only in the last paragraph of 2.7, which rather promiscuously suggests 3 separate formats in a very short time with limited discussion of these.
Which document are you referring to with 'the previous version'? The 20120522 and 20111216 versions (oh, my heavens...) of the VOUnits spec don't have a section 2.7, so I'm a bit confused.
> None of these formats manifestly allow what I need for my my M_J columns. However a string
> 1.9e27 kg
> could do so and the comparable syntax (especially if the exponential is optional) would cover all comparable cases very nicely.
Putting aside for the moment Rob's list of parsing problems (without discounting them), let's focus on the question of how one might indicate that a thing is 1.5 jupiter masses.
*** Use cases...
In a FITS table, you'd write 1.5 in the table cell, and set TUNITn='kg' and TSCALn=1.9e27.
In a FITS header card, there's no TUNITn to fall back on, so you'd have to write (presuming this convention)
MYMASS = 1.5 // [jupMass] jupiter masses
or
MYMASS = 2.85e27 // [kg] 1.5 jupiter masses
You couldn't/shouldn't write
MYMASS = 1.5 // [1.9e27kg] jupiter masses
...because that's not a valid FITS unit string.
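To make the arithmetic in these FITS cases concrete, here is a minimal Python sketch. It assumes the usual FITS linear scaling (physical value = TZEROn + TSCALn * stored value) with TZEROn = 0. The function name and the constant are illustrative only, and the jupiter mass is the rounded figure used in this message.

```python
import math

# Rounded jupiter mass in kg, as used in this message (an assumption, not a
# definitive value).
JUPITER_MASS_KG = 1.9e27

def physical_value(stored, tscal=1.0, tzero=0.0):
    """Apply FITS-style linear scaling to a stored table value."""
    return tzero + tscal * stored

# The table cell holds 1.5, with TUNITn='kg' and TSCALn=1.9e27.
mass_kg = physical_value(1.5, tscal=JUPITER_MASS_KG)
print(f"{mass_kg:.3g} kg")

# This is the value the second header-card form writes out directly.
print(math.isclose(mass_kg, 2.85e27))
```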
In a VOTable, which you note doesn't have a TUNITn analogue, you could use either the analogue of the second of these, or put "1.9x10+27kg" as the unit, since the format of unit strings in VOTable is mandated to be the CDS spec. The CDS spec, incidentally, recognises solMass as a unit, but not jupMass.
Other places where you might want a unit string are:
* in a structured comment in a RDBMS or other schema, documenting a column;
* in a request to a web service (SOAP or otherwise), indicating the desired units of the result; or
* in an annotation (RDFa-style) to a number in a web page; et cetera.
OK: use-cases complete (yes?).
*** So, alternatives
These seem to leave us with two alternatives for VOUnits:
1. permit numerical scale-factors, and thus units of "1.9e27kg" (or whatever f.p. syntax we choose); or
2. forbid numerical scale-factors, but permit 'unrecognised units', such as 'jupMass'.
(3. expand the set of known units to include all possibilities -- this is surely a non-starter)
This is still supposing we sidestep Rob's list of problems by some suitably vague language about rounding, ... *mumble*.
Option (1) means that we effectively smuggle a TSCALn behaviour into the unit string.
Option (1) also breaks consistency with FITS unit strings.
The problem with (1) is that it loses the information that this is a 'jupiter mass', and leaves an apparently random scaling factor. That's not a problem if the data is going into a pipeline and nowhere else, but it could be a problem in some of the other cases. If I found this 1.9e27kg as the unit of a column in a structured comment, I'd probably want to strangle someone. And if I want my results in units of jupiter masses, and so declare units of "1.9x10+27kg", then I'm going to get different numbers from someone who requests results in units of "1.8986x10+27kg". I don't think that's a good thing -- the higher-level information about the unit is lost.
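A small Python sketch of that last point, using the two scale factors quoted above; the function and variable names are made up for illustration:

```python
def in_scaled_unit(mass_kg, scale_kg):
    """Express a mass in kg as a multiple of an arbitrary scale factor."""
    return mass_kg / scale_kg

mass_kg = 2.85e27  # the same physical quantity in both requests

# Two people who both believe they are asking for 'jupiter masses'.
rounded = in_scaled_unit(mass_kg, 1.9e27)
precise = in_scaled_unit(mass_kg, 1.8986e27)

print(rounded, precise)
print("relative difference:", abs(precise - rounded) / rounded)
```

The two results differ in the fourth significant figure, and nothing in either unit string says that both were meant to be the same unit.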
The problem with (2) is that it fails if the receiving application doesn't know what a 'jupMass' is (though if it _does_ know what a 'jupMass' is, then it can presumably do better than in case 1).
I think (2) is a problem, but I don't think it's a bad problem. That is, I more and more confidently agree with Rick, in his message of this morning, that if you lose the link between this unit and the mass of jupiter, you've lost meaning that might be important down the line. OK, there's a problem if the receiver doesn't recognise 'jupMass', but there are mitigation strategies (some listed by Rick), one of which is to look at the documentation, and discover "ah, _this_ exoplanet database recognises the non-standard unit 'jupMass', so I can use that in my requests to it".
*** And finally...
I end up in favour of (2), because it leaves the problem of 'odd' units at the 'application' level (as it were), where I think it belongs.
Two final remarks:
* As I've noted earlier in this thread, the Unity library implements this spec by parsing anything that looks like a unit string. The difference between 'kg' and 'jupMass' is that the library has some extra information about grammes, such as dimensions, a definition string and some usage constraints; but it simply preserves the string 'jupMass' for the other case.
* At present, 'anything that looks like a unit string' means any string matching /[a-zA-Z%]+/, so 'M_J' would currently cause a lexical error. I'm not _particularly_ inclined to change that, but could be easily persuaded.
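The behaviour described in these two remarks can be sketched in a few lines of Python. This is not the Unity library's code or API: only the regular expression comes from the text above, and the table of known units and the function name are hypothetical. The toy table also ignores SI prefixes, which a real parser would split off.

```python
import re

# Pattern quoted above; everything else in this sketch is hypothetical.
UNIT_TOKEN = re.compile(r"[a-zA-Z%]+")

# Toy stand-in for the extra information a library holds about known units.
KNOWN_UNITS = {"kg": "mass", "m": "length", "s": "time"}

def classify(symbol):
    """Classify a single unit symbol as known, unrecognised, or ill-formed."""
    if UNIT_TOKEN.fullmatch(symbol) is None:
        return "lexical error"
    if symbol in KNOWN_UNITS:
        return "known"
    # Well-formed but not in the table: keep the string, attach no meaning.
    return "unrecognised"

for symbol in ("kg", "jupMass", "M_J"):
    print(symbol, "->", classify(symbol))
```

Here 'kg' comes out as known, 'jupMass' as unrecognised but still accepted, and 'M_J' as a lexical error because of the underscore.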
Does all this help?
All the best,
Norman
--
Norman Gray : http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK