SODA, section 4.3

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Fri Nov 11 09:34:47 CET 2016


Hi Pat,

On Thu, Nov 10, 2016 at 08:56:22AM -0800, Patrick Dowler wrote:
> Comments inline...
> 
> On 10 November 2016 at 01:56, Markus Demleitner
> <msdemlei at ari.uni-heidelberg.de> wrote:
> > Hi Pat,
> >
> > On Wed, Nov 09, 2016 at 12:17:18PM -0800, Patrick Dowler wrote:
> >> The reason intervals are treated that way is consistency.  My reading
> >
> > Well, we're going to be inconsistent anyway -- there's no way to
> > reconcile our ugly hacks on CIRCLE and POLYGON with the semantics
> > intended by VOTable in (not only, I claim) my reading.
> 
> Well, the VOTable xsd says that value in MIN/MAX is any string, so
> interpretation is completely up to the parser or application. I
> don't see any reason that  any of the attributes of the enclosing
> PARAM should be ignored when interpreting it.  I think it is clear
> they are intended to be used, and not just datatype. We are just
> taking this to the logical conclusion when  an xtype is specified.

The reason to ignore them is very simple: Implementation sanity.  I
don't need to know the unit, or the utype, or indeed anything except
datatype to interpret MIN and MAX.  I retrieve the value parser
associated with the datatype, parse @value, and can immediately see
whether any value for a field or param is or is not ok.  Of course,
the same goes for VALUES/@null -- it would suck if the rules for
MAX/@value were different from the ones for VALUES/@null.

If a VOTable processor needed to hedge against special rules for
whatever other attributes FIELD or PARAM might have, the
implementation would become a nightmare.  No, if you want to have
something that *necessitates* a custom value parser, then you must
define a datatype, not an xtype.

Put a bit more mathematically, the question is: what is the signature
of a get_value_parser function in VOTable?  Do we want it to be

get_value_parser(datatype) -> callable

or should it be

get_value_parser(datatype, arraysize, xtype, ...) -> callable.

I argue that the first is not only much more desirable but actually
completely sufficient. Of course, things break down for the
array-abusing CIRCLE and POLYGON; but that's because the modelling
for them is wrong in the first place; it should be cleaned up when we
have a good model for spherical regions.
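
To make the first signature concrete, here is a minimal Python sketch.
It is not taken from any existing VOTable library: the names
VALUE_PARSERS, get_value_parser and value_in_range are hypothetical,
only a handful of datatypes are covered, and bounds are assumed to be
inclusive.  The point is merely that the parser is looked up by
datatype alone and that the same callable serves MIN/@value,
MAX/@value and VALUES/@null.

```python
# Hypothetical sketch: value parsers are keyed on datatype and nothing else.
VALUE_PARSERS = {
    "short": int,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "char": str,
}


def get_value_parser(datatype):
    """Return the callable that parses one literal of the given datatype."""
    try:
        return VALUE_PARSERS[datatype]
    except KeyError:
        raise ValueError(f"no value parser for datatype {datatype!r}")


def value_in_range(datatype, value, min_literal=None, max_literal=None):
    """Check one already-parsed value against MIN/@value and MAX/@value.

    Both bounds are treated as inclusive (an assumption of this sketch).
    """
    parse = get_value_parser(datatype)
    if min_literal is not None and value < parse(min_literal):
        return False
    if max_literal is not None and value > parse(max_literal):
        return False
    return True
```

With this, value_in_range("double", 5e-7, "3e-7", "8e-7") needs no
knowledge of unit, utype, arraysize or xtype.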

Conversely, saying

    <PARAM name="BAND" unit="m" ucd="em.wl"
      datatype="double" arraysize="2"
      xtype="interval" value="">
      <DESCRIPTION>The wavelength intervals to be extracted</DESCRIPTION>
      <VALUES>
        <MIN value="3e-7"/>
        <MAX value="8e-7"/>
      </VALUES>
    </PARAM>

is *at least* as expressive as your
<MAX value="3e-7 8e-7"/>, I'd argue a lot more intuitive, and in
particular it's consistent with the other conceivable uses, both
within SODA metadata declarations and in normal VOTables.  I'm
mentioning

    <PARAM name="POL" ucd="meta.code;phys.polarization"
      datatype="char" arraysize="*" value="">
      <DESCRIPTION>Polarization states to be extracted.</DESCRIPTION>
      <VALUES>
        <OPTION>I</OPTION>
        <OPTION>V</OPTION>
      </VALUES>
    </PARAM>

-- you certainly wouldn't want <OPTION>I V</OPTION> here, even though
there's arraysize="*", right?

And of course it's consistent with, say

    <PARAM name="ATTENUATION" 
      datatype="double" value="">
      <DESCRIPTION>A factor to dampen everything with</DESCRIPTION>
      <VALUES>
        <MIN value="1e-10"/>
        <MAX value="1"/>
      </VALUES>
    </PARAM>

or other scalar parameters or table rows.

Finally, I'd argue that <MAX value="3e-7 8e-7"/> is positively
confusing, even if one buys that you'll have one value per array
element.  The (IMHO plausible) guess that array element 0 is bounded
by 3e-7 and array element 1 is bounded by 8e-7 is, of course, wrong.

*Both* are bounded by 3e-7 downwards and by 8e-7 upwards.  That's why
an array is an acceptable representation (it's homogeneous), and
obscuring that fact is something we'll regret later.
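
As a small illustration of that reading, here is a hedged Python
sketch (interval_within_limits is a hypothetical name, and inclusive
bounds are an assumption): every element of the interval array is
checked against the same scalar MIN and MAX, rather than element 0
against one number and element 1 against another.

```python
def interval_within_limits(interval, min_literal, max_literal):
    """Return True if every element of the interval lies within the limits.

    min_literal and max_literal are the scalar MIN/@value and MAX/@value
    strings of a datatype="double" PARAM; both bounds are taken as inclusive.
    """
    lower = float(min_literal)
    upper = float(max_literal)
    # The array is homogeneous: the same limits apply to each element.
    return all(lower <= element <= upper for element in interval)
```

For the BAND example, [4e-7, 6e-7] passes, while [2e-7, 6e-7] and
[4e-7, 9e-7] are both rejected.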

> > Hm... no.  Admittedly, VOTable is a bit hazy here, which is why we
> > *might* just get away with what we do to VALUES for CIRCLE and
> > POLYGON. But even talking about minimum and maximum really precludes
> > using array literals (as they are not orderable preserving
> > arithmetic).  Language like "The domain may therefore be defined as a
> > single interval" (VOTable 1.3, p. 16) reinforces this notion.
> 
> That was undoubtedly written before xtype was introduced in VOTable-1.2
> so I'd suggest that the full implications of xtype were not apparent.

Well, perhaps, but as argued above at least *I* don't think xtypes
should have any implications for MIN/MAX, and I'd say they actually
have none.  And hence I'm severely unhappy to, by gentlemen's
agreement, simply re-interpret the standard language when I really
see no good reason to.

> Still, if we do this with circle and polygon then we can do it with
> interval and I think that means xtype usage dictates interpreting values
> in MAX. VOTable-2.0?

My opinion is, again, that we (really) shouldn't be doing it for
circle and polygon either.  Until the rest of the VO can tell us how
to sanely do geometries, we dare do an emergency hack here and plead
forgiveness from the VOTable implementors.

Interval modelling with arrays and xtypes, on the other hand, is
sane, and we can confidently say: Dear VOTable crowd, thanks for
providing us with the facilities to properly model what we need.

My bottom line, I guess, is: We should not complicate standards in
order to accommodate emergency hacks.

          -- Markus

