VOUnits RFC

Norman Gray norman at astro.gla.ac.uk
Thu Aug 1 06:59:43 PDT 2013


Markus, hello.

This response is to Markus's and Marco's emails.

On 2013 Jul 31, at 09:48, Markus Demleitner wrote:

> Hi Norman, hi others,
> 
> On Tue, Jul 30, 2013 at 03:10:18PM +0100, Norman Gray wrote:

> Deferring unit validity (as opposed to well-formedness, which I'd
> define to mean here "parseable with the top-level grammar with
> arbitrary unit strings") to the "application level" and at the same
> time disallowing scale factors make that migration goal murky at
> best.

The question of unit validity is perfectly well defined in this document (and straightforwardly implemented).  It's the question of _what to do_ with an invalid unit that's deferred.  'What to do' could be 'object furiously', but it doesn't have to be, and in particular it doesn't have to be that in order to have a useful service that can parse units successfully _and_ which retains the option of not objecting to unknown units.

For example, an application could write:

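#include <stdio.h>   /* fprintf, stderr */
/* The Unity header declaring UnitExpression and the unity_* calls is assumed to be included. */

/* Parse str as a VOUnits string; return NULL unless it parses and passes every check. */
UnitExpression* my_parse_vounitstring(const char* str)
{
    UnitExpression* rval = unity_parse_string(str, UNITY_SYNTAX_VOUNITS);
    if (rval != NULL) {
        /* The string is well-formed; now apply the validity checks. */
        if (! unity_check_expression(rval, UNITY_SYNTAX_VOUNITS, UNITY_CHECK_ALL)) {
            fprintf(stderr, "bad units; you are a bad person\n");
            /* A real application would release the parsed expression before dropping the pointer. */
            rval = NULL;
        }
    }
    return rval;
}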

The function my_parse_vounitstring parses the unit string and fails noisily unless every unit in it is known, is not deprecated, and carries no inappropriate SI prefix.

I don't see where there's anything murky here.

[skipping...]

> Or think of units in VOTables uploaded to TAP services -- there, I'd
> at least like to be able to decide if the units roughly match and
> give warnings if they don't.  And, of course, I'm dreaming of a
> CAST_UNIT(col, dest_unit) function in future ADQL versions, for which
> you'd have to know col's unit.  This is not hard when VOUnit
> satisfies my use case, nigh impossible if it doesn't.

I think this would be pretty easy to layer on top of the Unity implementation.  And, as I illustrated in the other message, you could even do it partially, if some of the units were unknown.

> That right now unit attribute values in VOTables cannot reliably be
> parsed is sad and poses a big problem for the VO's promise of
> bringing together easily data from different sources, but I'd like to
> fix that to the extent possible, which is why I really don't like
> arbitrary atomic units.

Fine -- and if that's what your application wants to do, then it can wrap the parsing function as illustrated above, and simply fail at parse time if the unit isn't all-known.  Yes, the VOTable will still have the offending unit in it, but (a) that's presumably because the data provider felt it was necessary to use this 'unknown unit' in order to express what they need to express, and (b) it might very well be that other users of this table will not have such strict requirements, and will be perfectly happy parsing [jupMass/h] and doing something sensible with it.
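To make the strict-versus-lenient distinction concrete, here is a minimal, self-contained sketch. It does not use the Unity API: the table of 'known' symbols, the UnknownUnitPolicy enum and the toy_* functions are all hypothetical names invented for illustration, the expression is assumed to be already split into unit symbols, and SI prefixes are ignored. The point is only that 'how many unknown units are there?' is a fact about the expression, while 'accept or reject?' is a policy the application picks.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a real validity check: a toy list of symbols
   this sketch treats as "known". A real application would ask its units
   library instead. */
static const char *const toy_known_units[] = { "m", "s", "h", "kg", "Jy", "Hz" };

/* What the application chooses to do about unknown units (hypothetical). */
typedef enum { POLICY_STRICT, POLICY_LENIENT } UnknownUnitPolicy;

/* True if sym appears in the toy table above. */
bool toy_is_known(const char *sym)
{
    size_t n = sizeof toy_known_units / sizeof toy_known_units[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(sym, toy_known_units[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Count the unknown symbols in an already-split unit expression. */
int toy_count_unknown(const char *const syms[], size_t nsyms)
{
    int unknown = 0;
    for (size_t i = 0; i < nsyms; i++) {
        if (!toy_is_known(syms[i])) {
            unknown++;
        }
    }
    return unknown;
}

/* Apply the chosen policy: strict callers reject any unknown unit,
   lenient callers warn and carry on. */
bool toy_accept(const char *const syms[], size_t nsyms, UnknownUnitPolicy policy)
{
    int unknown = toy_count_unknown(syms, nsyms);
    if (unknown == 0) {
        return true;
    }
    if (policy == POLICY_STRICT) {
        fprintf(stderr, "rejecting: %d unknown unit(s)\n", unknown);
        return false;
    }
    fprintf(stderr, "warning: %d unknown unit(s), continuing\n", unknown);
    return true;
}
```

With the symbols of jupMass/h as input, a caller using POLICY_STRICT would refuse the expression, while a caller using POLICY_LENIENT would get a warning and could still work with the part it understands.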

>>> http://dc.zah.uni-heidelberg.de/__system__/adql/query/form?__nevow_form__=genForm&query=select%20distinct%20unit%20from%20rr.table_column%20where%20unit%20like%20%27%25.%25%27&_TIMEOUT=5&_FORMAT=HTML&submit=Go
>> 
>> Urghh.  I presume that list has been case-folded in some way, since
> 
> Yuck! Bug!  Stomp, stomp, stomp.  Fixed data due in a few minutes.

I'm still seeing all lowercase with that query...

>> Permitting 'unknown unit' strings is a sort of loose provenance,
>> yes, but that's not the motivation.
> 
> But it wouldn't be necessary and thus wouldn't ruin my use case if we
> had suitable provenance...

So this is a use-case for a future provenance effort.

>> However:
>> 
>>  * I think it would be good to include language in the spec that
>>  deprecates this in most cases, as OGIP does, for example; and
> 
> If you absolutely must; however, I'd still much more like easily
> computable units, so I'd much rather deprecate the unknown units.

I think there's no problem in deprecating unknown units -- in the mild sense of advising people writing unit strings that it's obviously better if they use only known units.  However, 'deprecation' carries the implication 'but if you feel it's necessary to do X, then you go right ahead'.

Marco responds to this same point with:

> Here's where I don't agree with Markus. I think we can live with both, and
> the below (b) point will be the one that will define in the future which is
> the best solution.
> Maybe data providers will go for full-SI units, maybe jupMass will be
> highly used for exoplanets; excluding now one of these two capabilities
> does not make sense to me.


This is looking dangerously like consensus!

It might be that the remaining points of discussion are to do with the language in the spec.  I'll make some revisions and make another version available.

All the best,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK


