[CATALOGUE]Starting Data Model Subgroup

Tue Jul 27 04:18:45 PDT 2004

> > VOTable is the XML serialization of whatever it
> > is that the Catalogue DM group come up with, so isn't this more a case of
> > reverse engineering?
>
> No - the other working groups have made it clear they have requirements
> for a standard model for astronomical source catalogs. The VOTable
> is a serialization of a simple table, but an astronomical catalog
> is more than a table - it will have extra standard metadata linking
> the sources to their parent observations and extraction algorithms,
> for instance. One output of the Catalogue data model effort will
> certainly be a more formal statement of the VOTable model, but another
> output will be recommendations for ways to serialize this extra metadata
> in VOTable (particular PARAM and FIELD values for certain things, for
> example). And yet another output will be an XML schema for those who
> prefer to use generic XML, although in the particular case of catalogs
> I hope that a VOTable-based serialization will be the preferred
> approach. But just saying "write a VOTable" is not a sufficient spec.
> I hope the CDS folks can say a little about how a Vizier README is
> converted to VOTable, and others can comment on how pipeline-generated
> catalogs should be recorded, and what extra metadata (wavelet scales,
> data characterization like wavelength band, etc) are appropriate.
>

I fully agree with Jonathan here, but would like to add some comments.

I think one thing that is often not realized is that the DM WG needs to
provide models for the metadata describing the contents of a data product,
as well as models for the data themselves. For example, the Observation
model is mainly a model for the metadata describing the results of an
observation. This is more than describing how the data are stored and/or
formatted. The latter may be done using the Quantity model, I guess.

Secondly, it still seems that people confuse the act of defining a data
model with that of defining representations/serializations of that data
model for a particular runtime environment in which one wants to deal with
instances of the data model, be that messaging (XML), a Java virtual
machine or a relational database. Defining such serializations is, or
should be, part of the DM WG's tasks.
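
To make the distinction concrete, here is a minimal sketch in Python. The
Source entity, its attributes, and the element and table names are all
hypothetical and not taken from any IVOA document. The point is only that
one model definition can have several serializations, each suited to a
different environment.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical data model entity: one catalogue source."""
    name: str
    ra_deg: float
    dec_deg: float


def to_xml(source: Source) -> str:
    """One serialization of the model: generic XML for messaging."""
    elem = ET.Element("source", name=source.name)
    ET.SubElement(elem, "ra").text = repr(source.ra_deg)
    ET.SubElement(elem, "dec").text = repr(source.dec_deg)
    return ET.tostring(elem, encoding="unicode")


def to_row(conn: sqlite3.Connection, source: Source) -> None:
    """Another serialization of the same model: a relational table row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS source "
        "(name TEXT, ra_deg REAL, dec_deg REAL)"
    )
    conn.execute(
        "INSERT INTO source VALUES (?, ?, ?)",
        (source.name, source.ra_deg, source.dec_deg),
    )
```

Neither to_xml nor to_row changes what a Source is; they only decide how
an instance is written down in a given environment.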

In the data modeling effort it *is* extremely useful to look at existing
data models, even those only implicitly represented in particular
serializations, if only to see which concepts, entities, attributes and
relationships others have already thought of and which should therefore
probably be incorporated into the IVOA data model. One cannot, however,
insist in advance that the data model itself be tied to some existing
representation, as this may be unsuitable for representations that must
work in a different environment.

Even when we interpret some of the comments in the context of defining a
serialization, I think we should not predefine *how* exactly to use the
results of existing efforts. For example, I see no a priori reason why we
should follow Roy's suggestion to "use inheritance". Inheritance is only
one way in which the results of the VOTable/conesearch efforts can be
reused. Data modeling languages allow many different types of relations
between entities, and in fact inheritance is the one most often abused.
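
As an illustration only, the sketch below contrasts two ways a catalogue
model could reuse a simple table model. All class and attribute names are
hypothetical; this is not a proposal for the actual Catalogue model. With
inheritance a catalogue *is a* table; with composition it *has a* table
and carries the extra metadata alongside it.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical stand-in for a simple table model."""
    columns: list
    rows: list = field(default_factory=list)

    def add_row(self, *values):
        if len(values) != len(self.columns):
            raise ValueError("row length does not match column count")
        self.rows.append(values)


@dataclass
class InheritedCatalogue(Table):
    """Reuse by inheritance: the catalogue is itself a Table."""
    parent_observation: str = ""


@dataclass
class ComposedCatalogue:
    """Reuse by composition: the catalogue holds a Table."""
    table: Table
    parent_observation: str
    extraction_algorithm: str
```

The composed form leaves the catalogue free to swap in a different table
representation later, whereas the inherited form ties it to everything
Table exposes.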

Cheers

Gerard