VOunits draft

Rob Seaman seaman at noao.edu
Sun May 24 12:32:29 PDT 2009


Hi Anita,

> There are two issues here.  Firstly, we need to recognise unit
> strings attached to published data.  My impression is that rather a
> lot of SI prefixes are in common use, from milli to tera for Hz for
> example.

One imagines all the SI prefixes are used with one set of measures
or another.  But a matrix with base units (e.g., meters) on one axis and
prefixes on the other would be sparsely filled in real-world usage.
Even if the matrix were jam-packed, however, there aren't so many
prefixes that it wouldn't be perfectly practical to compile a flat
list of a few thousand entries.
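
To put a rough number on that, here is a minimal sketch of the fully
populated matrix.  The prefix table holds the 20 SI prefixes plus the
unprefixed case; "u" is an ASCII stand-in for micro, and BASE_UNITS and
full_matrix are hypothetical names with a small sample of units chosen
only for illustration, not a proposed vocabulary.

```python
from itertools import product

# The 20 SI prefixes, plus "" for the unprefixed unit.
# "u" is an ASCII stand-in for the micro sign (an assumption of this sketch).
SI_PREFIXES = {
    "": 1e0,
    "da": 1e1, "h": 1e2, "k": 1e3, "M": 1e6, "G": 1e9, "T": 1e12,
    "P": 1e15, "E": 1e18, "Z": 1e21, "Y": 1e24,
    "d": 1e-1, "c": 1e-2, "m": 1e-3, "u": 1e-6, "n": 1e-9, "p": 1e-12,
    "f": 1e-15, "a": 1e-18, "z": 1e-21, "y": 1e-24,
}

# Hypothetical sample of base unit symbols, for illustration only.
BASE_UNITS = ["m", "s", "g", "Hz", "Jy", "J", "W", "K", "eV", "pc"]


def full_matrix(prefixes, units):
    """Return every prefix + unit label mapped to (scale factor, base unit)."""
    return {
        prefix + unit: (factor, unit)
        for (prefix, factor), unit in product(prefixes.items(), units)
    }


matrix = full_matrix(SI_PREFIXES, BASE_UNITS)
print(len(matrix))  # 21 prefix slots times 10 sample units
```

With on the order of a hundred base units the same product stays in the
low thousands, and a vetted list would be a pruned subset of it.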

This is especially true since the abbreviations generating the  
corresponding unit labels are quite idiosyncratic.  It is that usage  
we're trying to recognize and replicate, not to popularize a new  
scheme of nomenclature.

Users can always provide, or applications require, the full names of
units: millimeter instead of mm.  For that matter, what do you intend
to do with instances like cc as an alias for a milliliter?  If you're  
not careful, the VO will take over the entire bailiwick of the  
Chemical Rubber Company :-)

> Secondly, one of the things which came up from the initial attempt  
> to get use cases was that users often need data in relatively short  
> floating point numbers with SI prefixes rather than with huge (or  
> tiny) exponents, since labelling axes on a plot or tabulating  
> results as 9.87 to 345.6 nJy is often much more convenient, and  
> intuitive for the human reader to visualise, than 9.87e-9 to
> 3.456e-7 or 0.00000000987 to 0.0000003456 Jy.

But what is the use case here?  Are we talking about generating plots  
labeled in nJy from some table that contains a column with units of  
nJy?  Or will VO compliance require that some user who desires nJy has  
to load a table in units of Jy with extremely small values just to  
create a plot rescaled to nJy?

> Handling data internally using SI prefixes also helps to avoid  
> possible loss of precision - see my previous rant - although really  
> that should be fixed by making all tools format numbers sensibly,  
> but you often don't find out until you try and pass nJy through a  
> package written when 100 mJy was the depths of sensitivity...

Indeed, but this seems a numerical computing issue, not a  
representation issue.  There is no reason that such scaling has to be  
quantized to powers of ten.
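
As a minimal sketch of that point, values can be held relative to an
arbitrary scale factor rather than an SI prefix.  The function names and
the choice of the largest magnitude as the scale are assumptions made
for illustration; any nonzero factor would do.

```python
def to_scaled(values, scale):
    """Store values relative to an arbitrary nonzero scale factor."""
    return [v / scale for v in values]


def from_scaled(stored, scale):
    """Recover the original values from the stored ones."""
    return [s * scale for s in stored]


flux_jy = [9.87e-9, 3.456e-7]            # the range quoted above, in Jy
scale = max(abs(v) for v in flux_jy)     # not a power of ten
stored = to_scaled(flux_jy, scale)       # values of order unity
restored = from_scaled(stored, scale)    # back in Jy, to rounding error
```

The stored numbers are of order unity whatever the scale is, so nothing
here depends on the factor being 1e-9 rather than 3.456e-7.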

> We need to make sure that we recognise SI prefixes to avoid Mcm, etc.

And the most reliable way to recognize correct usage is to have a  
vetted vocabulary rather than generating it on the fly.

> 'Decibels' does illustrate the point made by Paddy Leahy I think,
> that we should be able to parse the whole prefix (deci) as well as
> the abbreviation.  And if for a few units we have to have special
> rules like 'don't convert to centibels', that is no big deal.

What it suggests to me is that the rules are too complex (and possibly  
too expensive) to implement at runtime.  Rather, the goal should be to  
parse an expression against a static list of all viable combinations  
of prefix and base unit.  That will be hard enough to get correct.
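
A minimal sketch of what matching against such a static list might look
like follows.  The entries in VETTED_UNITS are a hypothetical handful
picked from the examples in this thread, not a proposed vocabulary, and
parse_unit is an illustrative name.

```python
# Hypothetical hand-vetted vocabulary: label -> (scale factor, base unit).
# Only combinations known to be in real use are listed.
VETTED_UNITS = {
    "m": (1.0, "m"),
    "mm": (1e-3, "m"),
    "cm": (1e-2, "m"),
    "km": (1e3, "m"),
    "Hz": (1.0, "Hz"),
    "mHz": (1e-3, "Hz"),
    "THz": (1e12, "Hz"),
    "Jy": (1.0, "Jy"),
    "mJy": (1e-3, "Jy"),
    "nJy": (1e-9, "Jy"),
    "dB": (1.0, "dB"),    # kept whole; there is simply no "cB" entry
    "ml": (1e-3, "l"),
    "cc": (1e-3, "l"),    # alias: one cubic centimeter is one milliliter
}


def parse_unit(label):
    """Look a label up in the vetted list; reject anything not listed."""
    try:
        return VETTED_UNITS[label]
    except KeyError:
        raise ValueError(f"unrecognized unit label: {label!r}") from None
```

Under this scheme "Mcm" and "cB" are rejected because nobody listed
them, with no special-case rule needed for either.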

> either by using existing libraries or by writing our own code for an  
> SI parser (but surely there is one already?).

If there is such a parser we might adopt it.  If there isn't, the IVOA  
likely isn't the appropriate body to create one.

Rob


