VOUnits RFC

Mon Jul 29 07:52:22 PDT 2013

I'm responding to Rob's message mostly as a convenient place to get 
into the discussion but I will respond a little bit specifically to 
his points at the end.  Apologies if I'm simply reiterating arguments 
that others have made.

I've been a little confused by this discussion in terms of what the 
various choices imply.  From my perspective what matters is that I be 
able to properly write VOTables (and such) in a way that users can 
understand what is in them...  Suppose I have a table of planets 
with columns that are intended to be the mass of the planet in 
Jupiters or Earths.  How do I write that table?

If I understand it, I can't do that in the scenario where I am not 
allowed to specify a general floating point factor in the units, 
because I can only get units to within the nearest factor of 10.  That 
seems like a really big loss.  We don't want to force people to 
rewrite the values in tables just so they can fit in the units framework.

In FITS, in addition to the TUNITn keywords, we have the TSCALn 
keywords.  So in FITS I've no problem with writing a table with columns 
that have units of the mass of Jupiter: I set TUNITn='kg' and 
TSCALn=1.9e27.
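
As a rough sketch of that FITS convention (plain Python, no FITS library; 
the dictionary and function names are hypothetical and only the TSCALn 
scaling is modelled), the stored numbers stay in Jupiter masses and the 
scale converts them to the declared unit:

```python
# Hypothetical column metadata mirroring the FITS TUNITn / TSCALn keywords.
column = {"TUNIT": "kg", "TSCAL": 1.9e27}

def physical_value(stored, col):
    """Scale a stored table value into the unit named by TUNIT."""
    return col["TSCAL"] * stored

# Values as written in the table, in units of Jupiter's mass.
stored_masses = [1.0, 0.5, 2.0]

# Values a reader would obtain in kg after applying the scale.
masses_kg = [physical_value(m, column) for m in stored_masses]

for stored, kg in zip(stored_masses, masses_kg):
    print(f"{stored} M_J = {kg:.3e} {column['TUNIT']}")
```

The table author never rewrites the values; only the reader applies the 
factor.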

In VOTables (and elsewhere) we don't have, AFAIK, a comparable scaling 
capability nor is it likely any time soon.  Since I perceive that 
astronomers are oft enamored of non-SI units, we'd be requiring 
wholesale rescaling of values in tables for tables to be able to use 
this convention.  I don't see that happening.

The argument against this is the anticipated complexity both to the 
reader and in the standard specification if this scaling factor is 
allowed.  In the previous version a scaling factor was described only 
in the last paragraph of 2.7, which rather promiscuously suggests 3 
separate formats in a very short time with limited discussion of 
these.  None of these formats manifestly allows what I need for my 
M_J columns.  However a string
     1.9e27 kg
could do so and the comparable syntax (especially if the exponential 
is optional) would cover all comparable cases very nicely.  I think 
the standard would be better served by allowing only a standard integer 
or floating-point prefix (using the notation in the FITS standard 4.2.4 
but allowing either case of e and d).  This seems simpler than the previous 
suggestion and makes the transformation to/from FITS much easier since 
we can just use the prefix in the FITS TSCALn and the following string 
in TUNITn.
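
A minimal sketch of how such a string might be split into the two pieces, 
assuming a grammar of "optional numeric prefix, optional blanks, unit 
string".  The regular expression and the function name are illustrative 
assumptions, not taken from the VOUnits draft or the FITS standard:

```python
import re

# Assumed grammar: an optional integer or floating-point prefix whose exponent
# letter may be e, E, d or D, then optional blanks, then the unit string.
_SCALED_UNIT = re.compile(
    r"""^\s*
        (?P<scale>[+-]?(?:\d+\.?\d*|\.\d+)(?:[eEdD][+-]?\d+)?)?
        \s*
        (?P<unit>.*?)\s*$""",
    re.VERBOSE,
)

def split_scaled_unit(text):
    """Return (scale, unit); the scale defaults to 1.0 when no prefix is given."""
    m = _SCALED_UNIT.match(text)
    scale_text, unit = m.group("scale"), m.group("unit")
    if scale_text is None:
        return 1.0, unit
    # Python's float() rejects 'd' exponents, so normalise them to 'e' first.
    return float(re.sub(r"[dD]", "e", scale_text)), unit

for s in ["1.9e27 kg", "5.97D24kg", "kg", "2d"]:
    print(s, "->", split_scaled_unit(s))
```

The first element of the result is what would go into TSCALn and the 
second into TUNITn.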

The discussion in the text might note the relationship with the TSCALn 
keywords in FITS.

This addresses most of Rob's questions regarding the actual format 
limitations, but what about his questions about whether we support 
single/double, what limits we need to make, and how these are to 
be interpreted?  The standard should be completely oblivious to these. 
In practice when reading these our software will read
    1.2
and
    1.2345678901234567891234567801233e33
with equal facility and whether the second really has vastly more 
precision than the first is unknowable and unaddressed by this standard.
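
For illustration only, here is how one common reader (Python's built-in 
float(), chosen here as an assumed example of "our software") handles the 
two strings.  Both parse without complaint, and the extra digits of the 
second are simply rounded to whatever the reader's native type holds:

```python
texts = ["1.2", "1.2345678901234567891234567801233e33"]

# float() accepts any number of mantissa digits and rounds to a double.
values = [float(t) for t in texts]

for text, value in zip(texts, values):
    # repr() shows the shortest string that round-trips to the same double.
    print(f"{text!r} -> {value!r}")
```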

Note that throughout I'm trying to be largely consistent with what 
happens in FITS with the single exception that we allow lower case 
e/d.  I take that as being a critical driver.

	Tom

P.S., in my example above, I put an embedded blank between the scale 
and unit.  My quick perusal of the standard suggests that this is 
currently illegal.  I think allowing blanks there at least would make 
things clearer, though that's not essential.

Since d is a valid unit (for days), its use as an exponent would be 
disambiguated by having a following +/- or digit.
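
A small sketch of that lookahead rule, under the assumption that the text 
starts with a mantissa and that a sign must itself be followed by a digit. 
The pattern and the function name are hypothetical:

```python
import re

# 'd' or 'D' right after the mantissa counts as an exponent marker only when
# an optional sign and at least one digit follow; otherwise it starts the unit.
_EXPONENT_D = re.compile(r"^(?P<mantissa>\d+(?:\.\d*)?)[dD](?P<exp>[+-]?\d+)")

def classify_d(text):
    """Return ('exponent', value) or ('unit', None) for a leading 'NNNd...' string."""
    m = _EXPONENT_D.match(text)
    if m:
        return "exponent", float(f"{m.group('mantissa')}e{m.group('exp')}")
    return "unit", None

for s in ["1d3", "1.5d-2", "2d", "3d m"]:
    print(s, "->", classify_d(s))
```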

Rob Seaman wrote:
> On Jul 29, 2013, at 5:27 AM, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
>
>> Well, in that case: Unless someone dislikes them very heavily, let's
>> just allow arbitrary floats as prefixes, preferably in the standard
>> floating-point literal form basically every programming language out
>> there uses, shan't we?  Simple, quick, does the job, not network
>> access required to parse units, and a very happy me.  Perfect!
>
> "Perfect" may be a bit strong.  If you actually mean floating-point, is it required of applications to support single or double precision or do they all have to handle arbitrary precision?  If some narrowed subset of scientific notation, does IVOA attempt to support both "e" and "d" exponents?  (And/or "**" or "^" or even "x"?)  Capitalization?  Is one style preferred, but others accepted?  Ought an application to standardize on output?  Might then inputs differ from outputs?  What about non-ASCII?  How do we truncate if a user supplies more digits than fit in the required precision?  FP equality test?  (Is this then a special case of a general units equality test?)  Can exponents have decimal points?  (How about exponents denoting dimensionality?)  In that case can they themselves be expressed recursively as a FP literal?  How might a special value be denoted or recognized, e.g., pi or a mole (which is a fundamental SI unit)?  Does it have to be spelled-out in that case?  
> Are there routines to recognize and convert?  How about converting exponents back and forth to SI prefixes?  Can there be multiple prefixes?  Can prefixes occur in the denominator?  Or must they use a negative exponent?  Must prefixes actually be prefixed or can they be embedded later in a string?  When parsing a units string should an application prefer lumping or splitting to distinguish a units-prefix from a number expressing a value in those units?  Every quantity would otherwise become 1.0 units-with-a-prefix ;-)
>
> I'm not saying they shouldn't be there, but including numeric prefixes raises many additional questions and those questions have implications for the pure units issues.
>
> Rob
>