Extending data models: the vo-dml view

Fri Nov 8 05:20:46 PST 2013

Dear All.
I missed the discussions in Hawaii regarding various data model issues and
have not been involved in the thick of the discussions regarding ImageDM.
But I wanted to share with you some work I did, after some prodding by
Omar/Markus, as a reaction to Ray Plante's question on what it means,
*technically*, for "ImageDM to extend ObsCore" and to the discussion that
followed that.

When approaching this question from the point of view of VO-DML [1], the
answer is very simple and already hinted at by Omar in his reply.
First, the VO-DML version of ImageDM (i.e. the VO-DML/XML doc, which imho
should be the fundamental representation of any IVOA data model) should
'import' the ObsCore data model. VO-DML defines an 'import' concept for
this, that identifies the VO-DML/XML representation of the target model. 
Second, types in ImageDM may *use* types from ObsCore in their definition.
This usage may come in two forms. A type in ImageDM may 'extend' a type in
ObsCore, in the sense of usual object-oriented inheritance. Or a type from
ObsCore may be assigned as the 'datatype' of a 'role' (Attribute or
Reference; Collection better not!) defined on a StructuredType (i.e.
ObjectType or DataType) in ImageDM. In VO-DML one uses the utype (i.e.
model-name ':' vodml-id, [2]) of the target type to assign the datatype to a
role or the supertype in an extends definition.
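
To make the utype mechanics concrete, here is a minimal Python sketch. It is
not taken from the VO-DML spec or its tools. It assumes only the
'model-name:vodml-id' form described above; the Utype class, the resolve
helper and the ImageDM type named 'Image' are hypothetical names invented for
illustration, and the import table reuses the ObsCore VO-DML/XML URL from
volute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utype:
    """A utype of the form model-name ':' vodml-id."""
    prefix: str     # name of the model the element lives in, e.g. "obs"
    vodml_id: str   # identifier of the element inside that model

    @classmethod
    def parse(cls, text):
        # Split on the first ':' only; everything after it is the vodml-id.
        prefix, sep, vodml_id = text.partition(":")
        if not sep or not prefix or not vodml_id:
            raise ValueError(f"not a 'model-name:vodml-id' utype: {text!r}")
        return cls(prefix, vodml_id)

    def __str__(self):
        return f"{self.prefix}:{self.vodml_id}"


def resolve(utype, imports):
    """Return the URL of the imported model that the utype points into."""
    try:
        return imports[utype.prefix]
    except KeyError:
        raise LookupError(f"model {utype.prefix!r} is not imported") from None


# Imports declared by a hypothetical ImageDM model: prefix -> VO-DML/XML doc.
IMAGEDM_IMPORTS = {
    "obs": "https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/"
           "models/obscore/ObsCore.vo-dml.xml",
}

# A hypothetical ImageDM type that extends a type from ObsCore.
image_type = {"name": "Image", "extends": Utype.parse("obs:Observation")}

if __name__ == "__main__":
    print(image_type["extends"], "->",
          resolve(image_type["extends"], IMAGEDM_IMPORTS))
```

The point of the sketch is only that a utype can be resolved once the prefix
is tied to an imported model; a utype whose prefix is not imported (for
instance 'char:Characterization' here) is rejected.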

Precisely *which* types should be defined in ImageDM and *which* types they
should use from ObsCore and *how* is to be decided in the data modeling
effort itself of course, and I have not attempted that (Mark has a draft). 
But the technical issue should NOT be a problem.

The work I *did* do is to provide a first attempt at a proper VO-DML/XML
representation of ObsCore.
This work "merely" involves translating the existing ObsCore model [3] into
VO-DML/XML.
Enough issues show up in that effort to warrant further discussion (and
work!), and they could keep us busy for a while as well.

To facilitate my work I started with my personal favorite UML modeling tool
(MagicDraw CE 12.1), and used an XSLT script to translate the XMI format in
which that tool stores models into VO-DML/XML.
Just to repeat myself from many presentations at IVOA: This approach speeds
up creation of the model once one has the XSLT script. It also automatically
produces a graphical representation of the model, which is more insightful
than an XML file can be.
But the tool is not required. One can always create the VO-DML/XML by hand
(not much different from writing XML schemas). 
A validation tool exists on volute (not particularly well documented yet).
The XSLT script I use works for the XMI version 2.1 representation of UML
2.1 models created by MD CE 12.1. 
Similar scripts can be written for other versions of UML/XMI produced by
other modeling tools.
Laurent and I once created a first version for UML 2.1.1+XMI 2.1 produced by
a tool Mireille was using, and I had a version for the UML 2.2+XMI 2.1
produced by MagicDraw CE 16.5. Note, those scripts are NOT up-to-date for
producing VO-DML.

As the basis of the translation of ObsCore I used the UML diagram in Fig.2 of
[3]. Some choices had to be made as to the precise interpretation of the
boxes and lines in that diagram as VO-DML concepts. This should clearly be
discussed further. I also tried using the list of utypes in the ObsTAP table
as inspiration for the vodml-id identifiers of the model elements. I.e.
those do NOT depend solely on generation from the model. 
But the big issue, related in an obvious manner to Ray's question, is that
ObsCore uses (extends) the Characterization data model. It does so by the
link between the char:Characterization type and the obs:Observation type
(Note, after discussion with Omar I chose 'obs' as the name for the ObsCore
model that is to be used as the prefix in utypes. This is NOT following the
ObsCore spec, which uses 'obscore' as written in the caption of its Table 7.
This can easily be changed of course). 
The consequence of this for the current formal approach is that we need a
VO-DML representation of the Characterization data model. Luckily (for me) I
had created a first draft of such a representation (based on the CharDM XSD)
during the tiger team days.
As is well known of course, CharDM uses STC, and I even bit the bullet and
made a VO-DML representation of that model (using its XML schema as main
source).

I do not claim those translations are satisfactory. In fact I (and others)
have urged in the past that the original modelers MUST get involved in this
effort. But now there are three models, in UML *and* VO-DML/XML form,
representing STC, Characterisation and ObsCore.
They are gathered in the obvious folders in the dm/vo-dml/models area in
volute:
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models.

The VO-DML files are
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/stc/STC.vo-dml.xml
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/characterization/Characterization.vo-dml.xml
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/obscore/ObsCore.vo-dml.xml

These are generated from the following (MD 12.1 readable) XMI files:
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/stc/STC.xml
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/characterization/Characterization.xml
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/obscore/ObsCore_MD12_1.xml

HTML files are produced that link between the models where necessary. They
are in:
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/stc/STC.html
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/characterization/Characterization.html
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/obscore/ObsCore.html

I would appreciate discussions on the translation and I would hope that this
discussion gets some moderation from the DM chairs and participation of the
"owners" of the original models.

I have actually performed one more exercise, for I was interested to see how
the new VO-DML models could be used with the proposed new UTYPE annotation
mechanism. To this end I created a VOTable that has a <TABLE> (without
<DATA>!) representing (through its list of <FIELD>s) the ObsTAP table as
defined in Appendix B (this may not be the official ObsTAP table, which I
believe is defined in Table 7 of the ObsCore spec, but it has clear
overlap). This <TABLE> is
annotated with <GROUP>s representing, through their utype-s, elements from
the ObsCore/VO-DML data model. 
The result can be found in
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/models/obscore/ObsTap.votable.xml
The annotation is not complete, but it is a valid UTYPE annotation (NB
according to the utype annotation validator I wrote during the utypes tiger
team and updated recently).
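
For readers who want a feel for the shape of such an annotation, here is a
small Python sketch that builds a <TABLE> without <DATA>, two <FIELD>s and
one <GROUP> whose <FIELDref>s point back at the fields. It is an illustration
only and not a copy of ObsTap.votable.xml: the FIELDref utype strings are
made-up placeholders, the VOTable XML namespace is omitted for brevity, and
the dangling_refs check is a hypothetical helper, not the validator mentioned
above.

```python
import xml.etree.ElementTree as ET


def build_annotated_table():
    """Build a VOTABLE with one TABLE (no DATA) annotated by one GROUP."""
    votable = ET.Element("VOTABLE", version="1.3")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", name="ObsTAP")

    # The GROUP carries the utype of a model type; each FIELDref carries
    # the utype of a role and points at a FIELD through its ID.
    group = ET.SubElement(table, "GROUP", utype="obs:Observation")
    # NOTE: the two FIELDref utypes below are placeholders, not real vodml-ids.
    ET.SubElement(group, "FIELDref", ref="s_ra", utype="obs:PLACEHOLDER.ra")
    ET.SubElement(group, "FIELDref", ref="s_dec", utype="obs:PLACEHOLDER.dec")

    # The columns themselves; there is deliberately no DATA element.
    ET.SubElement(table, "FIELD", ID="s_ra", name="s_ra",
                  datatype="double", unit="deg")
    ET.SubElement(table, "FIELD", ID="s_dec", name="s_dec",
                  datatype="double", unit="deg")
    return votable


def dangling_refs(votable):
    """Return FIELDref targets that do not match the ID of any FIELD."""
    field_ids = {f.get("ID") for f in votable.iter("FIELD")}
    return [r.get("ref") for r in votable.iter("FIELDref")
            if r.get("ref") not in field_ids]


if __name__ == "__main__":
    doc = build_annotated_table()
    print(ET.tostring(doc, encoding="unicode"))
    print("dangling FIELDrefs:", dangling_refs(doc))
```

A real annotation nests further <GROUP>s inside this one to follow the
structure of the model.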

What one will notice immediately is that in particular the stc: concepts in
the table require a deep hierarchy for their full annotation. This is NOT an
unavoidable consequence of VO-DML, or of the UTYPE spec.
It is simply a consequence of the structure of the models themselves and can
already be observed in pure XML serialisations (i.e. following the XML
schemas) of say a characterisation "axis" (I can show examples if so
desired).
Moreover the models have quite some redundancy that is hidden in the ObsTAP
table because it removes most of the available model elements. See for
example my comments in the VOTable on the number of units one could use in
the hierarchy towards the s_ra column. 

I had hoped that the ImageDM effort would follow an approach similar to the
one described above: 
create a VO-DML/XML model that if desired imports ObsCore.vo-dml.xml (and
maybe others) and uses some of its types. This will require translating the
models ObsCore depends on, for which a first stab is available in volute. 
But in VO-DML we now have a simple, formal language in which to express
these models all in the same way and with a formal way to "extend"/link
them. 
I personally think that the other models on which ImageDM will depend
directly or indirectly (see how STC concepts show up all over the ObsTAP
table) must be revisited and likely restructured, possibly following
normalisation approaches one can inherit from relational modeling.
But in my opinion this is a good thing and long overdue.

Cheers
Gerard

[1] Current draft of the VO-DML spec can be found at
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/doc/VO-DML-WD-v1.0.docx/.pdf
[2] Current draft of the UTYPE spec can be found at
https://volute.googlecode.com/svn/trunk/projects/dm/vo-dml/doc/UTYPEs-WD-v1.0.docx/.pdf
[3] ObsCore spec can be found at
http://www.ivoa.net/documents/ObsCore/index.html