VOUnits RFC
Tom McGlynn
Thomas.A.McGlynn at nasa.gov
Mon Jul 29 07:52:22 PDT 2013
I'm responding to Rob's message mostly as a convenient place to get
into the discussion but I will respond a little bit specifically to
his points at the end. Apologies if I'm simply reiterating arguments
that others have made.
I've been a little confused by this discussion in terms of what the
various choices imply. From my perspective what matters is that I be
able to properly write VOTables (and such) in a way that users can
understand what is in them. Suppose I have a table of planets
with columns that are intended to be the mass of the planet in
Jupiters or Earths. How do I write that table?
If I understand correctly, I can't do that in the scenario where I am not
allowed to specify a general floating point factor in the units,
because I can only get units to within the nearest factor of 10. That
seems like a really big loss. We don't want to force people to
rewrite the values in tables just so they can fit in the units framework.
In FITS, in addition to the TUNITn keywords, we have the TSCALn
keywords. So in FITS I've no problem writing a table with columns
that have units of the mass of Jupiter: TUNITn='kg' and
TSCALn=1.9e27.
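As a rough illustration of the FITS convention relied on here: a reader recovers the physical value as TZEROn + TSCALn × stored value, so a column stored in Jupiter masses with TSCALn=1.9e27 and TUNITn='kg' comes out in kilograms. This sketch assumes TZEROn is left at its default of 0; the helper name `physical_values` and the sample masses are hypothetical.

```python
def physical_values(stored, tscal=1.0, tzero=0.0):
    """Apply the FITS column scaling rule: physical = TZEROn + TSCALn * stored."""
    return [tzero + tscal * value for value in stored]


# Hypothetical column stored in Jupiter masses, using the header values from the
# example above: TUNITn='kg', TSCALn=1.9e27 (TZEROn left at its default of 0).
TSCAL = 1.9e27
TUNIT = "kg"

stored_mj = [1.0, 0.5, 2.0]
masses = physical_values(stored_mj, tscal=TSCAL)
for mj, kg in zip(stored_mj, masses):
    print(f"{mj} M_J -> {kg:.3g} {TUNIT}")
```

The stored numbers never change; only the header says how to interpret them, which is exactly the capability missing from VOTable.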
In VOTables (and elsewhere) we don't have, AFAIK, a comparable scaling
capability, nor is one likely any time soon. Since I perceive that
astronomers are oft enamored of non-SI units, we'd be requiring
wholesale rescaling of the values in tables before those tables could use
this convention. I don't see that happening.
The argument against this is the anticipated complexity, both for the
reader and in the standard specification, if this scaling factor is
allowed. In the previous version a scaling factor was described only
in the last paragraph of 2.7, which rather promiscuously suggests three
separate formats in a very short space, with limited discussion of
them. None of these formats manifestly allows what I need for my
M_J columns. However, a string
1.9e27 kg
could do so, and the same syntax (especially if the exponent
is optional) would cover all comparable cases very nicely. I think
the standard would be better served by allowing only a standard integer
or floating-point prefix (using the notation of section 4.2.4 of the
FITS standard, but allowing either case for the exponent letters e and d).
This seems simpler than the previous suggestion and makes the
transformation to/from FITS much easier, since we can just put the
prefix in the FITS TSCALn and the following string in TUNITn.
The discussion in the text might note the relationship with the TSCALn
keywords in FITS.
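A minimal sketch of how such a prefix might map onto FITS, assuming a grammar modeled loosely on the FITS floating-point notation (optional sign, digits with an optional decimal point, optional exponent introduced by e, E, d or D) and an optional blank between scale and unit, as the P.S. below proposes. The function names are hypothetical, and a unit string with no numeric prefix is given a scale of 1.

```python
import re

# Assumed grammar for the numeric prefix (not taken from any published standard):
# optional sign, a mantissa such as "1", "1.", "1.9" or ".5", then an optional
# exponent letter e/E/d/D that must be followed by an optional sign and digits.
_PREFIX = re.compile(
    r"""\s*
        (?P<scale>[+-]?(?:\d+\.?\d*|\.\d+)(?:[eEdD][+-]?\d+)?)
        \s*
        (?P<unit>.*?)
        \s*$""",
    re.VERBOSE,
)


def parse_scaled_unit(text):
    """Split e.g. '1.9e27 kg' into (1.9e27, 'kg'); no prefix means scale 1.0."""
    match = _PREFIX.match(text)
    if match is None:
        return 1.0, text.strip()
    # Python's float() does not accept a d/D exponent, so normalize it to e.
    literal = match.group("scale").replace("d", "e").replace("D", "e")
    return float(literal), match.group("unit")


def to_fits(text):
    """Map a scaled unit string onto TSCALn/TUNITn values."""
    scale, unit = parse_scaled_unit(text)
    return {"TSCAL": scale, "TUNIT": unit}


def from_fits(tscal, tunit):
    """Rebuild a scaled unit string from TSCALn/TUNITn values."""
    if tscal == 1.0:
        return tunit
    return f"{tscal!r} {tunit}"


print(to_fits("1.9e27 kg"))
print(from_fits(1.9e27, "kg"))
```

Going back from FITS uses `repr`, so the scale survives a round trip through a double; a real serializer might prefer a fixed or shorter format.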
This addresses most of Rob's questions regarding the actual format
limitations, but what about his questions about whether we support
single/double precision, what limits we need to impose, ... and how
these are to be interpreted? The standard should be completely
oblivious to these.
In practice, when reading these, our software will read
1.2
and
1.2345678901234567891234567801233e33
with equal facility and whether the second really has vastly more
precision than the first is unknowable and unaddressed by this standard.
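To illustrate why the standard can stay silent on precision: a typical reader hands both literals to its native double-precision parser, and any digits beyond what a double holds are rounded away without comment, while a reader that cares can keep the exact decimal instead. This sketch uses Python's `float` and `decimal.Decimal`; the helpers `read_scale` and `read_scale_exact` are hypothetical.

```python
from decimal import Decimal


def _normalize(literal):
    # Treat a FITS-style d/D exponent as e, which both parsers below accept.
    return literal.replace("d", "e").replace("D", "e")


def read_scale(literal):
    """Read a scale literal as an IEEE 754 double, as most software would."""
    return float(_normalize(literal))


def read_scale_exact(literal):
    """Read a scale literal exactly, keeping every digit that was written."""
    return Decimal(_normalize(literal))


short = "1.2"
long_ = "1.2345678901234567891234567801233e33"

for text in (short, long_):
    approx = read_scale(text)
    exact = read_scale_exact(text)
    # Decimal/float comparison is exact, so this reports whether the double
    # reproduces the written value; it does for neither literal.
    print(text, approx, exact == approx)
```

Both readers accept both strings; which one to use is a choice for the software, not for the units standard.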
Note that throughout I'm trying to be largely consistent with what
happens in FITS, with the single exception that we allow lowercase
e/d. I take FITS consistency as a critical driver.
Tom
P.S. In my example above, I put an embedded blank between the scale
and unit. My quick perusal of the standard suggests that this is
currently illegal, but I think allowing a blank there would make
things clearer, though that's not essential.
Since d is a valid unit (for days), its use as an exponent marker would
be disambiguated by having a following +/- or digit.
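The disambiguation rule can be sketched as a small scanner: after reading the mantissa, a following e/E/d/D is consumed as an exponent marker only when an optional sign and at least one digit come next; otherwise the letter is left to begin the unit, so a trailing d still means days. The helper name `split_scale` and the exact mantissa pattern are assumptions for illustration, and this version treats a marker followed by a bare sign with no digits as part of the unit rather than as an exponent.

```python
import re

# Assumed mantissa: optional sign, then digits with an optional point, or ".digits".
_MANTISSA = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")
# An exponent marker counts only if an optional sign and digits follow it.
_EXPONENT = re.compile(r"[eEdD][+-]?\d+")


def split_scale(text):
    """Return (scale_literal, unit), or (None, unit) when there is no numeric prefix."""
    text = text.strip()
    mantissa = _MANTISSA.match(text)
    if mantissa is None:
        return None, text
    end = mantissa.end()
    exponent = _EXPONENT.match(text, end)
    if exponent is not None:
        end = exponent.end()
    # Whatever follows (after an optional blank) is the unit string.
    return text[:end], text[end:].strip()


for example in ("1.9d27 kg", "1.9d27kg", "1.5d", "2 d", "3d-1", "5eV", "kg"):
    print(repr(example), "->", split_scale(example))
```

Note the consequence for a string like `3d-1`: under this rule it reads as the pure number 0.3, not as anything involving days.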
Rob Seaman wrote:
> On Jul 29, 2013, at 5:27 AM, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
>
>> Well, in that case: Unless someone dislikes them very heavily, let's
>> just allow arbitrary floats as prefixes, preferably in the standard
>> floating-point literal form basically every programming language out
>> there uses, shan't we? Simple, quick, does the job, not network
>> access required to parse units, and a very happy me. Perfect!
>
> "Perfect" may be a bit strong. If you actually mean floating-point, is it required of applications to support single or double precision or do they all have to handle arbitrary precision? If some narrowed subset of scientific notation, does IVOA attempt to support both "e" and "d" exponents? (And/or "**" or "^" or even "x"?) Capitalization? Is one style preferred, but others accepted? Ought an application to standardize on output? Might then inputs differ from outputs? What about non-ASCII? How do we truncate if a user supplies more digits than fit in the required precision? FP equality test? (Is this then a special case of a general units equality test?) Can exponents have decimal points? (How about exponents denoting dimensionality?) In that case can they themselves be expressed recursively as a FP literal? How might a special value be denoted or recognized, e.g., pi or a mole (which is a fundamental SI unit)? Does it have to be spelled-out in that case? Are there routines to recognize and convert? How about converting exponents back and forth to SI prefixes? Can there be multiple prefixes? Can prefixes occur in the denominator? Or must they use a negative exponent? Must prefixes actually be prefixed or can they be embedded later in a string? When parsing a units string should an application prefer lumping or splitting to distinguish a units-prefix from a number expressing a value in those units? Every quantity would otherwise become 1.0 units-with-a-prefix ;-)
>
> I'm not saying they shouldn't be there, but including numeric prefixes raises many additional questions and those questions have implications for the pure units issues.
>
> Rob
>
More information about the semantics
mailing list