VOunits draft
Rob Seaman
seaman at noao.edu
Sun May 24 12:32:29 PDT 2009
Hi Anita,
> There are two issues here. Firstly, we need to recognise unit
> strings attached to published data. My impression is that rather a
> lot of SI prefixes are in common use, from milli to Tera for Hz for
> example.
One imagines all the SI prefixes are used between one set of measures
or another. But a matrix with base units (e.g., meters) on one axis and
prefixes on the other would be sparsely filled in real world usage.
Even if the matrix were jam packed, however, there aren't so many
prefixes that it wouldn't be perfectly practical to compile a flat
list of a few thousand entries.
This is especially true since the abbreviations generating the
corresponding unit labels are quite idiosyncratic. It is that usage
we're trying to recognize and replicate, not to popularize a new
scheme of nomenclature.
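As a rough size check on the flat-list idea, the full prefix-by-unit matrix can be enumerated mechanically. The sketch below is only an illustration: the base-unit list is a small made-up sample, the helper name build_flat_list is hypothetical, and "u" stands in for the micro sign. It uses the 20 SI prefixes defined as of 2009. A vetted vocabulary would be a hand-pruned subset of this upper bound, not the raw cross product.

```python
from itertools import product

# The 20 SI prefixes as of 2009 (symbol -> power of ten).
# Assumption: "u" is used as an ASCII stand-in for the micro sign.
SI_PREFIXES = {
    "Y": 24, "Z": 21, "E": 18, "P": 15, "T": 12, "G": 9, "M": 6, "k": 3,
    "h": 2, "da": 1, "d": -1, "c": -2, "m": -3, "u": -6, "n": -9,
    "p": -12, "f": -15, "a": -18, "z": -21, "y": -24,
}

# Illustrative sample only, not a proposed list of base units.
BASE_UNITS = ["m", "g", "s", "Hz", "Jy", "J", "W"]


def build_flat_list(prefixes, bases):
    """Return {label: (prefix_symbol, base_unit)} for every combination."""
    table = {base: ("", base) for base in bases}  # unprefixed forms
    for prefix, base in product(prefixes, bases):
        label = prefix + base
        if label in table:
            # Two different readings of one label: needs a human decision.
            raise ValueError(f"ambiguous label {label!r}")
        table[label] = (prefix, base)
    return table


FLAT = build_flat_list(SI_PREFIXES, BASE_UNITS)
print(len(FLAT))  # 7 base units * (20 prefixes + 1 unprefixed) = 147
```

Even with a few hundred base units the table stays in the low thousands of entries, and the ambiguity check is where the idiosyncratic cases would surface for review.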
Users can always provide, or applications can require, the full names of
units - millimeter instead of mm. For that matter, what do you intend
to do with instances like cc as an alias for a milliliter? If you're
not careful, the VO will take over the entire bailiwick of the
Chemical Rubber Company :-)
> Secondly, one of the things which came up from the initial attempt
> to get use cases was that users often need data in relatively short
> floating point numbers with SI prefixes rather than with huge (or
> tiny) exponents, since labelling axes on a plot or tabulating
> results as 9.87 to 345.6 nJy is often much more convenient, and
> intuitive for the human reader to visualise, than 9.78e-9 to
> 3.456e-7 or 0.00000000987 to 0.0000003456 Jy.
But what is the use case here? Are we talking about generating plots
labeled in nJy from some table that contains a column with units of
nJy? Or will VO compliance require that some user who desires nJy has
to load a table in units of Jy with extremely small values just to
create a plot rescaled to nJy?
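To make the second reading concrete: rescaling for display can be a purely presentational step applied when the axis label is generated, independent of the units stored in the table. A minimal sketch, assuming a hypothetical helper format_with_prefix (not part of any existing VO tool) that picks the nearest power-of-1000 prefix, again with "u" standing in for the micro sign:

```python
import math

# Power-of-1000 prefixes only; "u" is an ASCII stand-in for the micro sign.
_PREFIX_BY_EXP = {
    -24: "y", -21: "z", -18: "a", -15: "f", -12: "p", -9: "n", -6: "u",
    -3: "m", 0: "", 3: "k", 6: "M", 9: "G", 12: "T", 15: "P", 18: "E",
    21: "Z", 24: "Y",
}


def format_with_prefix(value, unit):
    """Format value (given in the unprefixed unit) with an SI prefix."""
    if value == 0:
        return f"0 {unit}"
    # Largest multiple of 3 not exceeding the decimal exponent of value.
    exp = 3 * math.floor(math.log10(abs(value)) / 3)
    exp = max(-24, min(24, exp))  # clamp to the available prefixes
    scaled = value / 10.0 ** exp
    return f"{scaled:g} {_PREFIX_BY_EXP[exp]}{unit}"


print(format_with_prefix(9.87e-9, "Jy"))   # 9.87 nJy
print(format_with_prefix(3.456e-7, "Jy"))  # 345.6 nJy
```

Nothing here touches the stored column, which is the distinction the question above is trying to draw out.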
> Handling data internally using SI prefixes also helps to avoid
> possible loss of precision - see my previous rant - although really
> that should be fixed by making all tools format numbers sensibly,
> but you often don't find out until you try and pass nJy through a
> package written when 100 mJy was the depths of sensitivity...
Indeed, but this seems a numerical computing issue, not a
representation issue. There is no reason that such scaling has to be
quantized to powers of ten.
> We need to make sure that we recognise SI prefixes to avoid Mcm, etc.
And the most reliable way to recognize correct usage is to have a
vetted vocabulary rather than generating it on the fly.
> 'Decibels' does illustrate the point made by Paddy Leahy I think,
> that we should be able to parse the whole prefix (deci) as well as
> the abbreviation. And if for a few units we have to have special
> rules like 'don't convert to centibels', that is no big deal.
What it suggests to me is that the rules are too complex (and possibly
too expensive) to implement at runtime. Rather, the goal should be to
parse an expression against a static list of all viable combinations
of prefix and base unit. That will be hard enough to get correct.
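A sketch of what matching against a static list might look like, under stated assumptions: the vocabulary entries, the full-name table, and the function name recognize are all hypothetical samples, not a proposed standard. The point is that special cases such as decibels need no runtime rule; "dB" is simply in the list and "cB" is not.

```python
# Hypothetical vetted vocabulary: label -> (prefix name, base unit name).
# Sample entries only; a real list would be compiled and reviewed by hand.
VETTED = {
    "m": ("", "meter"),
    "mm": ("milli", "meter"),
    "cm": ("centi", "meter"),
    "km": ("kilo", "meter"),
    "Hz": ("", "hertz"),
    "MHz": ("mega", "hertz"),
    "THz": ("tera", "hertz"),
    "Jy": ("", "jansky"),
    "mJy": ("milli", "jansky"),
    "nJy": ("nano", "jansky"),
    "dB": ("deci", "bel"),
}

# Hypothetical table of full names that map onto vetted labels.
FULL_NAMES = {
    "millimeter": "mm",
    "decibel": "dB",
}


def recognize(label):
    """Return (prefix, base) for a vetted label or full name, else None."""
    key = FULL_NAMES.get(label, label)
    return VETTED.get(key)


print(recognize("nJy"))      # ('nano', 'jansky')
print(recognize("decibel"))  # ('deci', 'bel')
print(recognize("Mcm"))      # None: not in the vetted list
print(recognize("cB"))       # None: centibels were never listed
```

Lookup is a single dictionary access, so the cost question moves from runtime to the one-time effort of vetting the list.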
> either by using existing libraries or by writing our own code for an
> SI parser (but surely there is one already?).
If there is such a parser we might adopt it. If there isn't, the IVOA
likely isn't the appropriate body to create one.
Rob