Data Model Landscape

Francois Bonnarel bonnarel at alinda.u-strasbg.fr
Wed Jun 23 00:08:35 PDT 2004


This is a joint mail by Mireille Louys and F. Bonnarel.
People who are not familiar with Strasbourg should know
that we share the same office, and that it is difficult for either
of us to put forward an idea in the DM field that has not been
discussed with the other!

Where are we in the current IVOA data model landscape?
In this short text we want to address three questions:
    A) What are the different DM components of our data model efforts?
    B) Where are we for each of these components?
    C) Can we use them together and how?

A) The most basic DM components are probably STC and the data containers
(by container we mean VOTable and Quantity)
   
    At a higher level we have "Observation" which mainly consists of 
Characterization and Provenance with a couple of other smaller components.

    The Resource metadata model (presently implicit) is at a higher level than
Observation in some sense, but describes metadata in a coarser way.

    Is there also room for an implicit DM component for the Query data model?

    We also have more dedicated (specialized) data models: The Spectrum one, and
the Radio one.

    On top of all this, we can consider the domain datamodel proposed by
 Gerard Lemson and Pat Dowler.

     Martin (Hill) asked for small data models. We considered the point, but we
think that if we really split Observation into Characterization and Provenance
(as was suggested by Jonathan), what we will get is a set of "small" data
models almost in his sense. We do not agree with Martin about limiting the
scope to the bandpass only.
We think the IVOA needs at least a full Characterization model, one that
encompasses images, data cubes, spectra, etc.
    

B) Status
    a) STC has a UML description and an XML serialization. It is probably the
most complete component.

    b) Quantity has an XML serialization, but no real UML description.
     It is both a model for high-level descriptions of datasets and a
transport container for them.
     At a lower level (Basic Quantity) it can be seen as a competitor to VOTable
for catalogues.
     VOTable has an XML serialization (and a lot of applications) but no UML
description.

     Basic Quantity/Frame and VOTable Param/Field are very similar. They
contain the same attributes or elements (value, unit, ucd, datatype, etc.).
     Quantity allows more datatypes than VOTable, but this is not a conceptual
difference.
     We can see the whole Quantity DM and VOTable as two alternative paths
starting from the same basic component: towards tables (relational paradigm)
for VOTable, or towards higher-level classes (by aggregation) for Quantity.
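
As an illustration only, the "same basic component, two paths" idea can be
sketched in Python. The class names BasicItem, Table and Aggregate are
hypothetical and do not come from the Quantity or VOTable documents, and the
UCD strings and numbers are placeholder values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class BasicItem:
    """Hypothetical common core of a Basic Quantity and a VOTable Param/Field."""
    name: str
    unit: str
    ucd: str
    datatype: str
    value: Optional[Any] = None  # set for a single value, None for a column


@dataclass
class Table:
    """VOTable-like path: items describe columns, rows hold the values."""
    fields: List[BasicItem]
    rows: List[List[Any]] = field(default_factory=list)


@dataclass
class Aggregate:
    """Quantity-like path: items are aggregated into a higher-level class."""
    name: str
    members: Dict[str, BasicItem] = field(default_factory=dict)


# The same two descriptions (placeholder UCDs) feed both paths.
ra = BasicItem("ra", "deg", "POS_EQ_RA_MAIN", "double")
dec = BasicItem("dec", "deg", "POS_EQ_DEC_MAIN", "double")

catalogue = Table(fields=[ra, dec], rows=[[10.68, 41.27], [83.82, -5.39]])

position = Aggregate(
    name="position",
    members={
        "ra": BasicItem("ra", "deg", "POS_EQ_RA_MAIN", "double", 10.68),
        "dec": BasicItem("dec", "deg", "POS_EQ_DEC_MAIN", "double", 41.27),
    },
)
```

In both cases the unit, ucd and datatype travel with the item; only the way
the items are grouped differs.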


    c) "Observation" has a textual description and UML diagrams, but no XML
serialization; even the attributes are missing. It has two main DM components,
Characterization and Provenance.
     We will probably split Observation into two efforts, starting with the
completion of Characterization (as Jonathan proposed, because it is
more urgent for use by the DAL WG).
     We would just like to stress the need for the "bounds" class,
described in the document but not in the UML diagram, as a way to delimit
coarsely the relevant region, in between the too vague location and the
fully detailed support (described with STC).
      Some other concepts like Curator or Data Collection are probably common
with Resource Metadata (see d), and some missing practical items (data format,
compression, packaging of missing science and calibration data) have to be
formalized: see the mail posted by one of us (FB) to the list two weeks ago.
 

    d) Registry VO resource metadata should be based on a data model.
      We have to make explicit the underlying model found in the IVOA
Recommendation. For that too we need Coverage, Dataformats, Provenance,
Curator and DataCollection, as in the Observation case
(which is supposed to be the model of a "single" observation).
     In fact, VO resource and Observation should have a lot of common
concepts, and these should be clearly brought out.



     e) Dedicated data models: spectra and radio.
The Spectrum DM is more or less complete and will be implemented soon in SSA.
The radio one has been more or less mapped to Observation, but it needs
specific extensions.

      f) The domain model has a clean UML description, but no direct use case
in the other Working groups (see more on that one in C). Among other things
it offers a framework to start modeling simulated data and make it interact
properly with "Observation".



C) What are the possible relationships between all these datamodel components? 

When we try to use low-level DM components in high-level ones (e.g.,
STC or BasicQuantity in Characterization), we can probably use the low-level
DM component as an attribute type in the high-level one, as was proposed, if we
remember correctly, by Gerard Lemson. For example,
Characterization.coverage.[].support is of type "STC.region",
while Characterization.coverage.[].bounds.? is of type "BasicQuantity".
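
A minimal Python sketch of this "low-level component as attribute type" idea
follows. It is an illustration under assumptions, not the actual STC or
Quantity definitions: Region only stands in for STC.region, the lower/upper
attributes stand in for the unspecified "bounds.?", and coarse_bounds is a
hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    """Stand-in for STC.region: a polygon in a named coordinate frame."""
    frame: str
    vertices: List[Tuple[float, float]]


@dataclass
class BasicQuantity:
    """Stand-in for BasicQuantity: a value with its unit and ucd."""
    value: float
    unit: str
    ucd: str = ""


@dataclass
class Bounds:
    """Coarse limits along one coordinate; attribute names are assumptions."""
    lower: BasicQuantity
    upper: BasicQuantity


@dataclass
class CoverageAxis:
    axis: str
    support: Region        # typed by the low-level STC component
    bounds: List[Bounds]   # typed by the low-level Quantity component


@dataclass
class Characterization:
    coverage: List[CoverageAxis]


def coarse_bounds(region: Region, unit: str) -> List[Bounds]:
    """Derive one lower/upper pair per coordinate from the detailed support."""
    return [
        Bounds(BasicQuantity(min(coords), unit), BasicQuantity(max(coords), unit))
        for coords in zip(*region.vertices)
    ]


support = Region("ICRS", [(10.0, 40.0), (11.0, 40.0), (11.0, 41.5)])
spatial = CoverageAxis("spatial", support, coarse_bounds(support, "deg"))
char = Characterization(coverage=[spatial])
```

Here the bounds are simply the box enclosing the support, which matches the
role of "bounds" as a coarse delimiter between location and support.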

When a DM component needs to be completed by another one and the two are
consistent, we can probably derive (inherit) a class in that DM component
from classes in the other one. For example, if we use
"StandardQuantity" as a real container for transferring Observation data, the
"Metadata" class may inherit from the whole "Observation" class.
   On the other hand, if we want to describe the logical organisation of the
dataset (is it an array or a cube, what are the axes, etc.), the actual Data
Set class in "Observation" can inherit from "StandardQuantity" (or more
exactly from the standard "frame").
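
The second case can be sketched as follows, again as an illustration only.
Frame is a hypothetical stand-in for the standard "frame" of StandardQuantity,
and the attribute and method names (axes, shape, ndim, is_cube) are assumptions
made for the example.

```python
from typing import List


class Frame:
    """Hypothetical stand-in for the StandardQuantity "frame"."""

    def __init__(self, axes: List[str], shape: List[int]) -> None:
        if len(axes) != len(shape):
            raise ValueError("one length is needed per axis")
        self.axes = axes
        self.shape = shape

    def ndim(self) -> int:
        """Number of axes in the logical organisation of the data."""
        return len(self.axes)


class DataSet(Frame):
    """Observation's Data Set class, inheriting its layout from Frame."""

    def __init__(self, axes: List[str], shape: List[int], title: str) -> None:
        super().__init__(axes, shape)
        self.title = title  # Observation-side information added on top

    def is_cube(self) -> bool:
        return self.ndim() == 3


cube = DataSet(["ra", "dec", "wavelength"], [512, 512, 100], "example cube")
```

The Data Set answers "is it an array or a cube, what are the axes" through
what it inherits, and adds only what is specific to Observation.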


   When the DM components are less consistent, generally because they were
developed at different times in the IVOA's history or by Working Groups with
different concerns, we probably have to flag attributes and classes in one
component with "DM_hyperlinks, ExtendedUtypes or ExtendedXpath" from the other
one.
   This may be the case between the Spectrum data model and the IVOA Observation
data model, but probably also between the implicit "VO resource Metadata"
data model and "Characterization" or "Provenance".
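
One possible reading of such links, sketched in Python under assumptions:
each attribute of one model carries a dotted path into the other model, and a
small resolver follows that path. The link table, the paths and the sample
values are all hypothetical; no actual utype or XPath syntax is implied.

```python
from typing import Any, Dict

# Hypothetical link table: Spectrum attribute -> dotted path in Observation.
SPECTRUM_TO_OBSERVATION: Dict[str, str] = {
    "spectral_range": "Characterization.coverage.spectral.bounds",
    "creator": "Provenance.creator",
}


def resolve(model: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path through nested dictionaries."""
    node: Any = model
    for key in path.split("."):
        node = node[key]
    return node


# Placeholder Observation instance, represented as nested dictionaries.
observation = {
    "Characterization": {
        "coverage": {"spectral": {"bounds": (3.0e-7, 9.0e-7)}},
    },
    "Provenance": {"creator": "example-pipeline"},
}

linked = {
    name: resolve(observation, path)
    for name, path in SPECTRUM_TO_OBSERVATION.items()
}
```

Neither model has to change its own structure; only the link table needs to
be maintained.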

   Finally, a specific component could also be mapped onto a more general one
using the "view" mechanism (implemented in XML with XSLT?) described by Gerard
Lemson for the case of the Observation data model and the Domain data model.
This is to be kept in mind, although we do not presently see how it could work
in practice.
In the case of the domain model, this gives a broader view, well suited to
generalizing DMs or at least to organizing and scoping the efforts.

   All of this shows that we have many ways to make our various DM components
"collaborate" or interact. That is a good reason to develop further the
DM components we already have, without waiting for a speculative
"grand-unification".

   We think, as Jonathan proposed, that the most important thing is to make
progress on Characterization, because we need it for SIAP and probably also
for modelling Resource Metadata and Queries. The natural second step is
probably to make explicit the implicit data models in these fields (provide a
textual description, UML and a serialization for each of them).

   Not far behind that, we should consider "Provenance". We have use cases
for it. (Some of them are shared with a discussion started in DAL on
SIAP extensions.)

   
   Mireille Louys and François Bonnarel (ULP and CDS) 

=====================================================================
Francois   Bonnarel               Observatoire Astronomique de Strasbourg
CDS (Centre de donnees          11, rue de l'Universite
astronomiques de Strasbourg)    F--67000 Strasbourg (France)

Tel: +33-(0)3 90 24 24 11       WWW: http://cdsweb.u-strasbg.fr/people/fb.html
Fax: +33-(0)3 90 24 24 25       E-mail: bonnarel at astro.u-strasbg.fr
---------------------------------------------------------------------


