VO-DML specification document

Laurino, Omar olaurino at cfa.harvard.edu
Fri May 9 04:16:40 PDT 2014


Hi Francois,

from your email it is not clear whether you read the mapping document or
not.

I am glad you don’t have any technical objections to the VODML
specification that was the subject of this thread. As mentioned by Gerard
and in the document itself, the introduction needs to be substantially
reviewed, so we’ll take your editorial suggestions into account to further
improve those introductory sections.

As I said, it is hard to reply to your comments on the mapping document
since it is not clear whether you read it or not. From what you write it
appears you didn’t, because that document, along with the public Tiger Team
minutes, the Current Usages Note, and the requirements we collected in
Urbana and Sao Paulo with public discussions, contains counterarguments to
your arguments.

In particular, there are now a number of reference implementations and
prototypes that show that the mapping document:
 - simplifies the interpretation and interoperability of serializations and
applications. You now have a means to allow people to publish to the VO
without having to read the VO standard documents!
 - is backward compatible, allowing *all* of the current usages to be still
valid in a transitional period or for implementing local requirements.
 - meets the requirements it was supposed to meet.

To be perfectly sure you grasp the meaning of the second point: CDS
services and Aladin can keep using the UTYPEs they are using now. The Tiger
Team’s specification is perfectly fine with that. If you want to add one
more level of interoperability, we can provide a point-and-click graphical
user interface that produces the metadata section you need to add to your
service responses. You can make this extra section a mere pointer to the
existing PARAMs and FIELDs, and you are all set. You mentioned that most of
your tables do not have many annotations, so you can probably automate the
process very easily.

Your suggestion to keep the current lack of standardization for UTYPEs
means throwing away two years of work under the Exec mandate and going back
to the pre-Tiger Team state. Why would the Exec ask a Tiger Team to find a
solution if there was no problem to solve?

You also seem to forget that this work was started in order to overcome
some issues in some of the work for the IVOA science priorities. In
particular, there is a general lack of interoperability when building SEDs,
with applications required to ask the user to “import” data coming from the
VO into VO applications. We have been stuck with these issues for three
years, and the solution itself is stuck because you don’t seem to take this
into account in your objections. Now SED is not a priority anymore, but
this problem is coming back with Cubes and Time Series.

Some of your arguments actually support the Tiger Team’s proposal of
matching the local schemata to a global, implementation-independent one. As
Markus has shown, you can indeed apply this to TAP, and it’s not rocket
science. So I am not going to argue with those points.

A broader comment on some of your more general points:

“Simple String Matching”. You claim that with the Tiger Team's proposal you
cannot just compare a single string to get the information you need.
This is a very old point that we have discussed before, but I am willing to
try once more to explain why it’s wrong.
The use of standard libraries is only one of the counterarguments, but
there are plenty of others that we have made in the past and that you are
ignoring in reviving this point.

In particular, you seem to forget that Simple String Matching is something
you cannot do even now with the current standards and the lack of a
standard for UTYPEs.

The most trivial example is that you cannot parse a VOTable by matching
one, static string.

Also, the PhotDM standard requires GROUPs and FIELDrefs, as the spec points
to a Note by Sebastien Derriere after a EuroVO ICE meeting in 2010 in which
the use of GROUPs and FIELDrefs is deemed beneficial and necessary. The
Tiger Team’s proposal is basically generalizing and standardizing that note
and the PhotDM serialization strategies so that all serializations and
models are interoperable through a single specification.

Refer to Sebastien’s presentation in Naples 2011 for the features and the
benefits of this approach.

Admittedly, in order to generalize his note and PhotDM, and include other
Notes and standards (Markus’ note about STC, SimDM) under a single
framework along with all other models, we needed to leverage a standard
VOTable feature: nested GROUPs. Technically, this is far from being a
revolution, but the result is powerful and fixes a number of issues we have
been stuck with for years.

For a real-world example of this I will use one of the production
implementations at CDS.
Consider this snippet from one of your VizieR production services:

<GROUP ID="gsed" name="_sed" ucd="phot" utype="spec:PhotometryPoint">
      <DESCRIPTION>The SED group is made of 4 columns: mean frequency,
flux, flux error, and filter designation</DESCRIPTION>
      <FIELDref ref="sed_freq"
utype="photdm:PhotometryFilter.SpectralAxis.Coverage.Location.Value"/>
      <FIELDref ref="sed_flux" utype="spec:PhotometryPoint"/>
      <FIELDref ref="sed_eflux" utype="spec:PhotometryPointError"/>
      <FIELDref ref="sed_filter"
utype="photdm:PhotometryFilter.identifier"/>
 </GROUP>
[…]
<FIELD ID="sed_freq" name="_sed_freq" ucd="em.freq" unit="GHz"
datatype="double" width="10" precision="E6">
<FIELD ID="sed_flux" name="_sed_flux" ucd="phot.flux.density" unit="Jy"
datatype="float" width="9" precision="E3">
<FIELD ID="sed_eflux" name="_sed_eflux" ucd="stat.error;phot.flux.density"
unit="Jy" datatype="float" width="8" precision="E2">
<FIELD ID="sed_filter" name="_sed_filter" ucd="meta.id;instr.filter"
unit="" datatype="char" width="32" arraysize="32*">

[I omitted the descriptions for clarity]

I will assume that the “spec:” UTYPEs were defined in some standard. [They
are not, as far as I know, but it’s hard to tell because there are several
thousand UTYPEs defined in many documents and their versions, and there is
no mechanism to know what “spec:” is pointing to. These are all issues
fixed by VODML and the mapping strategy we suggested, by the way.]

How can you get to the FIELD with a single string match? You can’t: you
need to match one string, parse the parent element according to the VOTable
spec (more conditional string matching, if you prefer), find the “ref”
attribute, match one more string, and find the FIELD. Even then, you haven’t
accomplished much, because you need much more information in order
to get to the data. In general, I don't think this Turing Machine approach
to VOTable is useful, and it's certainly not robust.
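
The lookup just described can be sketched in a few lines with a standard XML
library. This is only an illustration under stated assumptions, not part of
any specification: the fragment is a simplified, namespace-free version of
the snippet above (real VOTables declare an XML namespace), the FIELD
elements are self-closed so that the fragment is well-formed, and the helper
name fields_for_utype is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified fragment modelled on the snippet above (assumption: no XML
# namespace, FIELDs self-closed, width/precision attributes dropped).
VOTABLE = """
<TABLE>
  <GROUP ID="gsed" name="_sed" ucd="phot" utype="spec:PhotometryPoint">
    <FIELDref ref="sed_freq"
      utype="photdm:PhotometryFilter.SpectralAxis.Coverage.Location.Value"/>
    <FIELDref ref="sed_flux" utype="spec:PhotometryPoint"/>
    <FIELDref ref="sed_eflux" utype="spec:PhotometryPointError"/>
    <FIELDref ref="sed_filter" utype="photdm:PhotometryFilter.identifier"/>
  </GROUP>
  <FIELD ID="sed_freq" name="_sed_freq" ucd="em.freq" unit="GHz"
    datatype="double"/>
  <FIELD ID="sed_flux" name="_sed_flux" ucd="phot.flux.density" unit="Jy"
    datatype="float"/>
  <FIELD ID="sed_eflux" name="_sed_eflux" ucd="stat.error;phot.flux.density"
    unit="Jy" datatype="float"/>
  <FIELD ID="sed_filter" name="_sed_filter" ucd="meta.id;instr.filter"
    datatype="char" arraysize="32*"/>
</TABLE>
"""


def fields_for_utype(table, utype):
    """Return (column index, FIELD element) pairs referenced with `utype`.

    Hypothetical helper: it matches the utype on a FIELDref, reads its
    "ref" attribute, and then matches that value against the FIELD IDs.
    """
    fields = table.findall("FIELD")
    by_id = {f.get("ID"): (i, f) for i, f in enumerate(fields)}
    hits = []
    for group in table.iter("GROUP"):          # also visits nested GROUPs
        for ref in group.findall("FIELDref"):
            if ref.get("utype") == utype:      # first string match
                target = by_id.get(ref.get("ref"))  # second string match
                if target is not None:
                    hits.append(target)
    return hits


table = ET.fromstring(VOTABLE)
for index, field in fields_for_utype(table, "spec:PhotometryPointError"):
    print(index, field.get("name"), field.get("unit"))
# prints: 2 _sed_eflux Jy
```

Even in this toy case the code has to walk the element hierarchy and perform
two separate matches before it knows which column holds the flux error, and
it still knows nothing about how that column relates to the other three.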

You can also refer to the Current Usages document and find that
applications already need to work around the old UTYPEs, parse them, and
make assumptions about them, because simple string matching is utterly naive
and, in real life, it just doesn't work.

This statement, in particular:

> For many applications the proposed mechanism will make the recognition of
> model attributes associated with FIELDS in our table a much more complex
> and heavy process than the current one. Instead of simple string matching
> recognition it will require development of a complex hierarchical structure
> which has to be fully created and filled from VOTABLE parsing and explored
> for recognition. The objection to my point is that standard libraries could
> do it for the developer, but application developers may want to avoid using
> these and it may also not be useful (see c for details)


Apart from being wrong (the implementations and the rich literature on the
Tiger Team proposal show that the process is actually simplified and
standardized), this statement calls VOTable itself into question.

In fact, VOTable is "a complex hierarchical structure which has to be fully
created and filled from VOTABLE parsing and explored for recognition". If a
developer doesn’t want to use standard libraries, they are free to knock
themselves out and reimplement a VOTable parser, for which they need to
read the specs. The same applies to FITS, to XML, to JSON, etc., and it
applies to VODML as well. I don’t see the problem.

Cheers,

Omar.



On Wed, May 7, 2014 at 12:38 PM, François Bonnarel <
francois.bonnarel at astro.unistra.fr> wrote:

>  Hi all,
> Since the last TCG teleconf and the email from Gerard below, there has
> been a lot of discussion around VO-DML, and there are some aspects which
> give me some concerns:
>
>    -     I think many people are still mixing two aspects which were to
>    be separated into two different drafts, according to the conclusions of our
>    controversial discussion held in Hawaii, as summarized by Jesus here:
>
>
> http://wiki.ivoa.net/internal/IVOA/PlenarySessionsSep2013/DM_Closing_Hawaii2013_JSalgado.pdf (slides 6 and 7)
>
> I see that a lot of work has been done for the update of the first draft
> "VO-DML: A Data Modeling Language for the VO" but nothing new has been done
> about the second draft "Mapping of Complex Data Models". The title of this
> second draft reflected the difficulties that appear when using a VO-DML
> description to map the models into VOTable, making an extensive but
> seriously modified usage of the utype attribute. However, I can read
> several sentences which show that for many people nothing has changed since
> the time when the introduction of the VO-DML draft was first written.
>    Here is a quotation from the abstract of the vo-dml document "VO-DML a
> consistent modelling language for IVOA data models"
>
> "Arguably the most important use case for VO-DML is the UTYPE
> specification [2]
> which uses it to provide a translational semantics for VOTable annotations.
> These annotations allow one to explicitly describe how instances of types
> from a
> data model are stored in the VOTable."
>
>   and the whole introduction still emphasizes that the main use case for
> the VO-DML modelling language is utype specification in VOTABLE. It also
> implies that the VO-DML GROUP mechanism is THE (unique) way to do the
> mapping.
>
>     Last but not least, I see nothing like a "Mapping of Complex Data
> Models" document in the repository.
>
>     I think this is not in the spirit of the decision taken in Hawaii. The
> title chosen in Hawaii, reflecting the discussion held there, was unambiguous
> in not pre-deciding any particular solution.
>
>    -      The mechanism proposed so far to map the models into VOTABLE
>    presents several severe issues which I would like to develop.
>
>      It must be clear to everybody that, apart from this, most of the
> effort made in building a consistent modelling language for IVOA looks
> very promising to me. Having a description language with xml serialization
> allows sharing diagrams and models built with different modelling
> software and helps in generating interoperable documentation and
> code. This is real progress and I appreciate the effort made by Mark (and
> now Arnold) to map various models in the work done by Gerard, Omar and
> others. For me this is the core of "VO-DML a consistent modelling language
> for IVOA data models" and this is progress. Probably we still have things to
> discuss (the utype attribute stuff and the ivoa datatypes among
> others) but I see no objection to going forward along this path towards
> recommendation
>
>
>    -       So let's talk about "Mapping of Complex data models"
>
>       I see three severe issues in adopting the mapping of VO-DML
> structures to VOTABLE
>
>
>    1.        For many applications the proposed mechanism will make the
>    recognition of model attributes associated with FIELDS in our table a much
>    more complex and heavy process than the current one. Instead of simple
>    string matching recognition it will require development of a complex
>    hierarchical structure which has to be fully created and filled from
>    VOTABLE parsing and explored for recognition. The objection to my point is
>    that standard libraries could do it for the developer, but application
>    developers may want to avoid using these and it may also not be useful
>    (see c for details)
>    2. For probably more than 90% of tables exchanged in the VO the
>    application of this mapping seems to be simply impossible (or at least
>    awkward).
>     -  A large majority of the huge number of columns available in the VO
>       (those of the catalogs) are not associated with a model attribute. Probably
>       many can have one. It has started with PhotDM ones for SED building and a lot
>       can be associated with STC or others. But as we add models the
>       number of VO-DML GROUPS will increase for very partial matching
>     - Astronomical catalogs are (or will be) distributed with TAP. TAP
>       provides TABLES where the number of columns is variable, dependent on the
>       actual ADQL query sent. This will imply that either the VO-DML GROUPS are
>       also dependent on the QUERY (and not unique for a service implementing a
>       model) OR (alternatively) that the VO-DML GROUPS contain some empty (or
>       absent) FIELDS.
>    3. In many actual and current VO use cases it is more or less
>    useless.
>
>     -   Why? It is well accepted that IVOA datamodels are not (in
>       general) internal datamodels of servers and archives. It is a model for
>       interaction of the archive with the outside world.
>       - What is the situation for applications (desktop client
>       applications I mean)? I think the assumption of VO-DML to VOTABLE mapping
>       is that applications will contain a full implementation of the IVOA data
>       model (this can probably be done by preparing the IVOA model classes when
>       creating/modifying the application code or by creating them dynamically
>       when reading the VOTABLE). Then the parsing of the VOTABLE allows one to
>       populate the objects with values contained in the columns and rows of the
>       VOTABLE. So everything in the IVOA model is mapped one to one to the
>       application model
>       - I don't say that this cannot be used and is not useful in some
>       cases. I say it's not useful in general. Because, as already discussed in
>       the past, applications use model attributes (known through current-style
>       utypes) as roles to know what to do with the content of the column.
>       - Let me now try to formalize this a little bit. Let's call it
>       "current life model mapping mechanism". I am proposing a formalization
>       (that means everything SEEMS to work as if it was built like this, although
>       I know it is actually NOT TRUE and is more dispersed in the code)
>
>             - Application has its own model (its classes and methods,
> let's say in java)
>             - There is the IVOA model described somewhere. Each attribute
> and class has its fully qualified name (a.b.c.d ...). This is the
> current-style utype.
>              - The application implements a VOTABLE parser. When it runs
> and reads a given VOTABLE the current-style utypes are recognized. An action
> is driven by this recognition, which is basically to either populate the
> objects of the Application model with values taken from the VOTABLE cells
> or to launch Application model methods implied by the occurrence of this
> IVOA DM attribute. This is also a kind of mapping but it is not one to one and
> may be incomplete. Several (incomplete) IVOA data models can be used in the
> same VOTABLE document. I think all the current
> VO applications work like this and I assume that in the future a majority
> of application developers would like to work like this again. For example
> that would be the case of developers maintaining existing software and
> eager to connect their application nearly "as is" to the VO world. I don't
> say that nobody would like to use direct implementation of IVOA datamodels
> in their applications but I claim that not all of them will want to do it
> or need to do it.
>
>     ---------------------------------------------
>
> To conclude this mail, I would say I wrote all this to
> (re)open the discussion on the second slot of Jesus' Hawaii summary:
> "Mapping of Complex data models"
>     I see an alternative to the so-far-VO-DML proposed mapping mechanism :
>
>    - Let's keep the utypes as java-like fully qualified names (a.b.c.d)
>    more or less as it is now.
>    - This allows existing and future applications to work according to
>    my little "current life model mapping mechanism"
>    - This doesn't forbid populating an application model structure
>    exactly mapping the IVOA model. The fully qualified name can be decomposed
>    to find out which class and which member the FIELD is associated with. So
>    WE POINT FROM the FIELDS to the model and NOT the REVERSE way.
>    - Transporting the structure of the model in the VOTABLE will not be
>    forbidden (mechanism probably rather similar to the one proposed so far,
>    utype usage apart), but will not be mandatory and in many cases not
>    useful or not possible (eg TAP)
>
> Best regards
> François
> On 24/04/2014 14:31, Gerard Lemson wrote:
>
> Dear data modelers
>
> After urging from the DM chairs, I would like to direct your attention to
> the VO-DML page on the IVOA wiki: http://wiki.ivoa.net/twiki/bin/view/IVOA/VODML
>
> There you will find links to the VO-DML specification document and the
> associated technical xml schema  and schematron files.
> Notice that the document is one of the three documents that came from the
> UTYPEs Tiger Team. This one, in particular, describes how to express Data
> Models in a standard, machine-readable way.
> Please read the comment at the start of the spec file for information on
> which parts are not quite done (mainly, some paragraphs in the intro still
> have to be added).
>
> The core part can be commented on freely.
>
> The wiki page needs some updating with links to, and descriptions of, some
> reference implementations.
> Much code has been available since Heidelberg (and before) as part of the
> prototyping effort mandated after that Interop. Stable code includes a bunch
> of XSLT scripts for validation, java code generation, and hypertext
> documentation generation which includes DM figures and cross-references
> between elements. For two UML tools (Magic Draw CE 12.1 and Modelio) there
> are also scripts available to generate VO-DML documents from a properly
> designed UML representation. Pointers to the code and to the documentation
> are or will be available on the aforementioned wiki page asap.
>
> Implementations related to the mapping of VO-DML to VOTable, like the
> "VO-DML Mapper" (http://gavo.mpa-garching.mpg.de/dev/vodml-mapper/) created
> for helping users and data providers through a point-and-click interface and
> the photometry service prototype presented in Hawaii are not included, since
> they target a different document. But the vo-dml mapper in particular shows
> how one can make use of the machine readable DM documents at runtime and
> might be seen as another proof of concept implementation also for this
> specification.
>
> We thank those that, in the past year, have sent comments to the editors
> directly, but we would urge members to address comments directly to the dm
> mailing list.
>
> Best regards
>
> Gerard Lemson
>
>
>


-- 
Omar Laurino
Smithsonian Astrophysical Observatory
Harvard-Smithsonian Center for Astrophysics
100 Acorn Park Dr. R-377 MS-81
02140 Cambridge, MA
(617) 495-7227