data modeling issues

Thu Feb 28 09:19:36 PST 2008

Hi
Below is a longish list of issues that we have been discussing.
I'd like to send these off to the dm and theory mailing lists.
It may not be complete; in particular, I have no more time to spend on
discussing the issue of "level of normalisation" in the SNAP model at the end.
Maybe one of you can fill that in.

Please send your comments on whether these represent a fair summary of what
we have discussed, and please propose additions or changes.
Also, can we send this email in one go, or is it too long?

Thanks

Gerard

Dear colleagues

In the theory group we have been discussing the SNAP data model in a small
group and have come across a couple of "issues" that we want to present to
the larger community.
Some of these are aimed at theorists who hopefully will use the model in
some form, others at the other data modellers in the DM working group who
were not involved in these discussions.
We would like to start discussions on these, though we may already make
some decisions to keep us going until these come to completion.

1. UML as normative data model representation
We had already agreed in the past (Victoria) that a UML model would be the
normative representation of the SNAP data model. It is also the agreed-upon
(Cambridge 2003) form in which DM WG data models should be represented. We
want to push this so far that the UML should be complete, in the sense that
all other required products can be derived from it. This includes, amongst
others:
- all elements must have descriptions
- all attributes must have datatypes and multiplicities (0..1 or 1) 
- more ... 
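
The completeness requirements listed above could be checked mechanically.
Below is a minimal sketch in Python, assuming a simple in-memory stand-in
for the UML elements. The class names (UmlClass, Attribute) and the function
completeness_problems are hypothetical and are not part of the SNAP model or
of any tool; a real check would work on the XMI document instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-ins for UML elements; not the SNAP model itself.
ALLOWED_MULTIPLICITIES = {"0..1", "1"}


@dataclass
class Attribute:
    name: str
    datatype: Optional[str] = None
    multiplicity: Optional[str] = None
    description: str = ""


@dataclass
class UmlClass:
    name: str
    description: str = ""
    attributes: List[Attribute] = field(default_factory=list)


def completeness_problems(classes):
    """Return a list of readable problems; an empty list means complete."""
    problems = []
    for cls in classes:
        # Every element must have a description.
        if not cls.description.strip():
            problems.append(f"{cls.name}: missing description")
        for attr in cls.attributes:
            where = f"{cls.name}.{attr.name}"
            if not attr.description.strip():
                problems.append(f"{where}: missing description")
            # Every attribute must have a datatype and a multiplicity.
            if not attr.datatype:
                problems.append(f"{where}: missing datatype")
            if attr.multiplicity not in ALLOWED_MULTIPLICITIES:
                problems.append(f"{where}: multiplicity must be 0..1 or 1")
    return problems
```

Running such a check before generating any derived product would make
"complete" a testable property of the model rather than a convention.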

2. Subset of UML syntax 
To make the entrance to UML as simple as possible, and to restrict the
possible modeling choices as far as we reasonably can, we want to settle on
a subset of the UML modeling elements. UML2 allows one to define a so-called
profile, which formalises these choices and can be used by tools (such as
MagicDraw) to adjust the environment. Such a profile can include stereotypes
for detailing certain syntax types further, standard tags to be added to
elements for application or other purposes, or, for example, a standard set
of primitive datatypes to be used by us all.

3. XMI as standard serialisation of the UML document
We propose to use the XMI (XML Metadata Interchange,
http://www.omg.org/technology/documents/formal/xmi.htm) serialization of UML
as the standard representation of our UML data models. So far we have been
using the community version of MagicDraw (14.0) and the XMI it generates, so
that we can all actually work on the diagrams. It is doubtful whether other
tools will be able to use these documents directly, even though that was
XMI's intended goal.

4. Standardized and automated mapping from UML to XML schema.
In Cambridge (2003) we (the DM WG) decided that at least an XML schema
should also accompany the UML diagram as the product of a data modeling
effort in the IVOA. The use of this is obvious (?). We propose that such a
schema should be generated from the UML automatically.
This requires a set of mapping rules from the proposed subset of UML to XML
schema.
This set of rules can be implemented in an XSLT script that, working on the
XMI representation, can generate the appropriate XSD files. This may be
generalised to other representations such as a relational model, Java
classes etc.
There are some open issues with this, in particular how to map shared
associations/references. We can bring these up during the discussion.
It will also implicitly determine which style of XML schema we write.
The Registry and VOTable groups have settled on a suggestion for XML schema
style that was originally derived from precisely such a mapping. STC's
schema, on the other hand, deviates from that style.
Of course, it was not precisely derived from a UML model either.
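
To give an idea of what such mapping rules could look like, here is a
minimal sketch. It is written in Python rather than the XSLT proposed above,
and it works on a small dictionary instead of the XMI document, purely to
keep the example short. The rules themselves (class to complexType,
attribute to element, multiplicity 0..1 to minOccurs="0"), the PRIMITIVES
table and the function names are assumptions for illustration, not agreed
rules. The open issue of shared associations/references is not addressed.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)

# Hypothetical mapping of primitive UML datatypes to XSD built-in types.
PRIMITIVES = {
    "string": "xs:string",
    "integer": "xs:integer",
    "real": "xs:double",
    "boolean": "xs:boolean",
}


def class_to_complex_type(name, attributes):
    """Map one class to an xs:complexType.

    attributes is a list of (name, datatype, multiplicity) tuples,
    with multiplicity either "0..1" or "1".
    """
    ctype = ET.Element(f"{{{XS}}}complexType", {"name": name})
    seq = ET.SubElement(ctype, f"{{{XS}}}sequence")
    for attr_name, datatype, multiplicity in attributes:
        ET.SubElement(seq, f"{{{XS}}}element", {
            "name": attr_name,
            "type": PRIMITIVES[datatype],
            # An optional attribute becomes an optional element.
            "minOccurs": "0" if multiplicity == "0..1" else "1",
            "maxOccurs": "1",
        })
    return ctype


def model_to_schema(classes):
    """Map a dict {class name: attributes} to an XSD document string."""
    schema = ET.Element(f"{{{XS}}}schema")
    for name, attributes in classes.items():
        schema.append(class_to_complex_type(name, attributes))
    return ET.tostring(schema, encoding="unicode")


if __name__ == "__main__":
    # "Example" and its attributes are made up for this illustration.
    print(model_to_schema({
        "Example": [("label", "string", "1"), ("size", "real", "0..1")],
    }))
```

Even in this small form the sketch already fixes a schema style (named
complex types, attributes mapped to child elements), which illustrates why
the choice of mapping rules and the choice of schema style cannot be
separated.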

5. Repository for storing these results
Currently we store SNAP products, such as XMI documents generated by
MagicDraw and XML schemas ("generated by hand" from the UML), on the theory
SNAP DM wiki pages. There has been some discussion of moving this to a
proper source repository. We have started using one provided by Google and
already used by Norman Gray in his semantics project. There was some
discussion on repositories independent of the IVOA during the TCG telecon
the other day, but I missed the conclusions.

The above issues were mainly aimed at the community at large (dm, registry,
ivoa). Now some issues with the SNAP data model itself.

6. Normalisation
...

7. Need for semantic vocabularies
...

8. more?

Best regards

Gerard Lemson for
Laurent Bourges, Norman Gray, Rick Wagner