VO-DML specification document

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Thu May 8 02:32:26 PDT 2014


Dear DM list,

As this touches one of my continuing concerns -- the lack of usable
STC declarations in most modern VOTables --, let me comment on a few
of François' points:

On Wed, May 07, 2014 at 06:38:46PM +0200, François Bonnarel wrote:
>           +  A large majority of the huge number of columns available in
>             the VO (those of the catalogs) are not associated with a model
>             attribute. Probably many can have one. It has started with
>             PhotDM ones for SED bulding and lot can be associated with STC
>             or others. But as long as we add models the number of VO-DML
>             GROUPS will increase for very partial matching

Well -- this is exactly why I'm in this game.  I claim we need proper
markup of STC instances, *including* relations of what's the
derivative of what, what system(s) are in use, what epoch the
position is for, etc.  We actually needed that at least five years
ago, for as many of our tables and columns as we possibly can.

We need that to allow current clients to do precision astrometry, and
we need that for our tables to make sense 10 or 20 years from now.

We could kinda do this already using
http://www.ivoa.net/Documents/Notes/VOTableSTC/
-- but STC's XML schema is, well, suboptimal, and the sheer number of
utypes is frightening.  Not quite as bad as with Characterisation, I
think, but it's still thousands of them.

The proposed VO-DML serialisation is a sanitation of this.  And given
the complexity of what's represented, it's simply impossible to get
by using the FIELD's attributes, not to mention cases when a FIELD is
part of two instances (e.g., the epoch of observation, which will
usually (but not always) be pertinent for all coordinate sets (e.g.,
various reductions, both ICRS and galactic, etc.) in a table).

Additionally, the fact that all information on the relation between a
DM and the data are concentrated in one group *really* helps coming
up with clean APIs -- the DM code doesn't have to see the full
VOTable and gather pieces of data from all around the document.
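
To make the two-instances case concrete, here is a minimal Python
sketch.  It is an illustration only: the table fragment is hand-written,
the column names (epoch, ra_alt, dec_alt) are made up, the system PARAMs
are left out, and the use of a FIELDref for the epoch is an assumption
modelled on the utype-based groups shown further down, not the proposed
VO-DML serialisation.  Real VOTables also carry an XML namespace, which
is ignored here.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two coordinate sets sharing one "epoch" column.
TABLE_XML = """
<TABLE name="demo">
  <GROUP utype="stc:CatalogEntryLocation">
    <FIELDref ref="epoch" utype="stc:AstroCoords.Position2D.Epoch"/>
    <FIELDref ref="raj2000" utype="stc:AstroCoords.Position2D.Value2.C1"/>
    <FIELDref ref="dej2000" utype="stc:AstroCoords.Position2D.Value2.C2"/>
  </GROUP>
  <GROUP utype="stc:CatalogEntryLocation">
    <FIELDref ref="epoch" utype="stc:AstroCoords.Position2D.Epoch"/>
    <FIELDref ref="ra_alt" utype="stc:AstroCoords.Position2D.Value2.C1"/>
    <FIELDref ref="dec_alt" utype="stc:AstroCoords.Position2D.Value2.C2"/>
  </GROUP>
</TABLE>
"""


def column_roles(table):
    """Map each referenced column to its (group number, role) pairs.

    Only the GROUP children are inspected; the FIELD declarations are
    never touched.
    """
    roles = {}
    for number, group in enumerate(table.findall("GROUP")):
        for ref in group.findall("FIELDref"):
            roles.setdefault(ref.get("ref"), []).append(
                (number, ref.get("utype")))
    return roles


roles = column_roles(ET.fromstring(TABLE_XML))
# The shared column ends up with one entry per instance it belongs to.
print(roles["epoch"])
print(roles["raj2000"])
```

The epoch column ends up with one role per group, which a single
attribute on the FIELD itself could not express.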

>           + Astronomical catalogs are (or will be) distributed with TAP.
>             Tap provides TABLES where the number of columns is variable,
>             dependant of the Actual ADL querry sent. This will  imply that
>             either the VO-DML-groups are also dependant of the QUERY (and
>             not unique for a service implementong a model) OR
>             (alternativly) that the VO-DML GROUPS contain some empty (or
>             absent) FIELDS.

Well, you mark up what you have.  I've implemented that, and I have
to say it's surprisingly straightforward.  Consider the following, for a
table with these columns:
http://dc.g-vo.org/__system__/dc_tables/show/tableinfo/arihip.main

$ curl -s -FLANG=ADQL -FREQUEST=doQuery -FQUERY="select top 1 raj2000, dej2000 from arihip.main" http://dc.g-vo.org/tap/sync | xmlstarlet fo

[...]
      <GROUP utype="stc:CatalogEntryLocation">
        <PARAM arraysize="*" datatype="char" name="CoordFlavor" 
          utype="stc:AstroCoordSystem.SpaceFrame.CoordFlavor" 
          value="SPHERICAL"/>
        <PARAM arraysize="*" datatype="char" name="CoordRefFrame" 
          utype="stc:AstroCoordSystem.SpaceFrame.CoordRefFrame" 
          value="ICRS"/>
        <PARAM arraysize="*" datatype="char" name="Epoch" 
          utype="stc:AstroCoords.Position2D.Epoch" value="2000.0"/>
        <PARAM arraysize="*" datatype="char" name="yearDef" 
          utype="stc:AstroCoords.Position2D.Epoch.yearDef" value="J"/>
        <PARAM arraysize="*" datatype="char" name="URI" 
          utype="stc:DataModel.URI" value="http://www.ivoa.net/xml/STC/stc-v1.30.xsd"/>
        <FIELDref ref="raj2000" utype="stc:AstroCoords.Position2D.Value2.C1"/>
        <FIELDref ref="dej2000" utype="stc:AstroCoords.Position2D.Value2.C2"/>
      </GROUP>
[...]

-- i.e., just RA and DEC of one coordinate system.  Now add the
derivatives:

$ curl -s -FLANG=ADQL -FREQUEST=doQuery -FQUERY="select top 1 raj2000, dej2000, pmra, pmde from arihip.main" http://dc.g-vo.org/tap/sync | xmlstarlet fo

[...]
      <GROUP utype="stc:CatalogEntryLocation">
        [... same as above]
        <FIELDref ref="raj2000" utype="stc:AstroCoords.Position2D.Value2.C1"/>
        <FIELDref ref="dej2000" utype="stc:AstroCoords.Position2D.Value2.C2"/>
        <FIELDref ref="pmra" utype="stc:AstroCoords.Velocity2D.Value2.C1"/>
        <FIELDref ref="pmde" utype="stc:AstroCoords.Velocity2D.Value2.C2"/>
      </GROUP>
[...]

And now have a partial second coordinate system:

$ curl -s -FLANG=ADQL -FREQUEST=doQuery -FQUERY="select top 1 raj2000, dej2000, pmra, pmde, raLTP, pmraLTP, err_pmraLTP, err_deLTP from arihip.main" http://dc.g-vo.org/tap/sync | xmlstarlet fo

[...]
      <GROUP utype="stc:CatalogEntryLocation">
        [... system info...]
        <FIELDref ref="raj2000" utype="stc:AstroCoords.Position2D.Value2.C1"/>
        <FIELDref ref="dej2000" utype="stc:AstroCoords.Position2D.Value2.C2"/>
        <FIELDref ref="pmra" utype="stc:AstroCoords.Velocity2D.Value2.C1"/>
        <FIELDref ref="pmde" utype="stc:AstroCoords.Velocity2D.Value2.C2"/>
      </GROUP>
      <GROUP utype="stc:CatalogEntryLocation">
        [... system info again...]
        <FIELDref ref="err_deltp" utype="stc:AstroCoords.Position2D.Error2.C2"/>
        <FIELDref ref="raltp" utype="stc:AstroCoords.Position2D.Value2.C1"/>
        <FIELDref ref="err_pmraltp" utype="stc:AstroCoords.Velocity2D.Error2.C1"/>
        <FIELDref ref="pmraltp" utype="stc:AstroCoords.Velocity2D.Value2.C1"/>
      </GROUP>
[...]

Trust me, we've thought hard in the Tiger Team about how to communicate
something like this (and that's not rocket science so far, that's the
*absolute minimum* any useful formalism has to be able to express)
with plain utypes.  It just doesn't work unless you come up with
a really complicated utype language (on which there's no string
matching either).  And, as I cannot resist throwing in here,

$  python -c "import this" | egrep -i "complicated|simple"
Simple is better than complex.
Complex is better than complicated.

Conversely, the insanely long utypes in my examples make this stuff
hard to read for humans, and potentially error-prone for computers.
Furthermore, most of what's in them is repetitive, and while I don't
consider DRY a religion, such repetition tends to be a marker for
where you're not doing it right.  And this comes from the author of
the STC-in-VOTable note, which is where these utypes originate.

In the Tiger Team, I've fought hard to keep the hierarchical utypes
nevertheless, exactly because I figured everyone was already used to
them and doing away with them would be a political hazard.

I had to eventually give in -- flat strings and hierarchical
structure don't mix, and the choice between using XML or an ad-hoc
grammar on flat strings to encode structure is really a no-brainer,
in particular if the flat strings cannot really do grouping.  In
hindsight I have to say my fellow tigers were right in stubbornly
insisting that we not cheat and not pretend existing practice was
sane.

Which means, with a bit of STC refactoring and the move to a standard
serialisation ("VO-DML"), the above groups will look *much* better
and make *much* more sense and won't be significantly -- if at all --
harder to handle in software either.


Finally, let me indulge in some vision-like material to conclude:

What I, as an implementor, would most like about VO-DML and a standard
serialisation: No more special code for each data model-like thing --
I implement DM serialisation and deserialisation once in my VOTable
code, and for anything that may come my way, I at least immediately
have a nicely deserialised object in my hand that I can serialise
right away again without any ambiguities.  No more SDM this way, SSA
that way, STC still differently, Photometry slightly varying again,
not to speak of Datalink.  It'd just be one piece of code doing what's
essentially the same thing for all of them in the first place.
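
As a rough illustration of the "implement it once" idea, the following
Python sketch round-trips instances of two unrelated models through the
same pair of functions.  Everything in it is an assumption made for the
example: it uses utype-annotated GROUPs of the current kind as a
stand-in (it is not the proposed VO-DML serialisation), the Instance
class and both function names are hypothetical, and the "demo:" model
is invented.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Model-agnostic container (hypothetical, for illustration only)."""
    utype: str
    params: dict = field(default_factory=dict)   # role -> literal value
    columns: dict = field(default_factory=dict)  # role -> FIELD ID


def deserialise(group):
    """Build an Instance from a GROUP element, whatever the model."""
    return Instance(
        utype=group.get("utype"),
        params={p.get("utype"): p.get("value")
                for p in group.findall("PARAM")},
        columns={r.get("utype"): r.get("ref")
                 for r in group.findall("FIELDref")},
    )


def serialise(instance):
    """Turn an Instance back into a GROUP element."""
    group = ET.Element("GROUP", utype=instance.utype)
    for role, value in instance.params.items():
        ET.SubElement(group, "PARAM", name=role.split(".")[-1],
                      datatype="char", arraysize="*",
                      utype=role, value=value)
    for role, ref in instance.columns.items():
        ET.SubElement(group, "FIELDref", ref=ref, utype=role)
    return group


# One instance using utypes from the examples above ...
stc = Instance(
    "stc:CatalogEntryLocation",
    {"stc:AstroCoordSystem.SpaceFrame.CoordRefFrame": "ICRS"},
    {"stc:AstroCoords.Position2D.Value2.C1": "raj2000",
     "stc:AstroCoords.Position2D.Value2.C2": "dej2000"},
)
# ... and one from an invented model, handled by the very same code.
other = Instance(
    "demo:Measurement",
    {"demo:Measurement.unitSystem": "SI"},
    {"demo:Measurement.value": "flux"},
)

for instance in (stc, other):
    assert deserialise(serialise(instance)) == instance
print(ET.tostring(serialise(stc), encoding="unicode"))
```

Neither function knows anything about STC or the invented model; the
only model-specific parts are the strings passed in.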

Wouldn't that be great?  Well, I claim the Tiger Team has brought us
within reach of that.

<fanfare> <curtain slowly comes down> <fade out fanfare> <spot off>

Cheers,

        Markus

