[utypes] Wild edit towards more principled definitions

Gerard Lemson lemson at MPA-Garching.MPG.DE
Fri May 18 03:08:26 PDT 2012


Dear Omar 

 

My apologies for jumping into this utype discussion so late. 

Markus had pointed me to this thread on UTYPEs, which I had seen, but not
read in detail. 

 

I want to give some comments from both Laurent Bourges and me.

It's also a preview of points I will make in my presentation in the UTYPEs
session.

I CC the DM mailing list as I think it deserves a broader context.

 

Both Markus and you do seem to realize the need for a more abstract, common
language for describing data models. 

We agree with that. We also agree with you that Markus' proposal for "data
model as DAG" is too limited to describe real data models.

We disagree with you though that such a language must still be developed.  

 

Instead extensive work in this direction has been performed already.

You acknowledge "the theory guys" (i.e. us) have done work on this, but
misrepresent our approach.

(Btw, the girls who were leading the DM WG when we started working on this
have been made aware of our effort from the beginning, and Patrizia Manzato
has also contributed to the theory work.)

 

The work I refer to is VO-URP, which stands for 'Virtual Observatory - UML
Representation Pipeline' and is a spin-off of the theory simulation data
model effort.

VO-URP was developed to simplify data modeling efforts like the one we did
for simulations, but it is independent of any theory-related concepts.

First of all, it defines what we call the .vo-urp representation of a data
model.

This can be seen as a domain-specific language for data modeling within the
VO context.

It basically is an abstract data modeling language like the one both you and
Markus are looking for.

 

From this representation, VO-URP automates the derivation of alternative
representations of the model.

Such representations are generally necessary to work with a model in
particular contexts.

Some of these accompany the SimDM specification, notably XML schemas and a
UTYPEs list (which are more or less mandatory for IVOA DM efforts).

The automation was very helpful during the SimDM effort as it greatly
simplified forwarding data model changes to the other representations.
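
To make the idea of deriving several representations from one model
description concrete, here is a minimal Python sketch. It is not taken from
VO-URP: the MODEL dictionary, the function names derive_utypes and derive_xsd,
and the "prefix:Class.attribute" identifier syntax are all assumptions made
for illustration only.

```python
# Hypothetical toy model: class name -> list of (attribute name, datatype).
# Neither this structure nor the identifier syntax is VO-URP's real format.
MODEL = {
    "name": "ToyDM",
    "classes": {
        "Simulation": [("name", "string"), ("created", "datetime")],
        "Snapshot": [("time", "real")],
    },
}

# Assumed mapping from the toy datatypes to XML Schema built-in types.
XSD_TYPES = {"string": "xs:string", "datetime": "xs:dateTime", "real": "xs:double"}


def derive_utypes(model):
    """Return one identifier per class and one per attribute."""
    prefix = model["name"]
    utypes = []
    for cls, attrs in model["classes"].items():
        utypes.append(f"{prefix}:{cls}")
        for attr, _datatype in attrs:
            utypes.append(f"{prefix}:{cls}.{attr}")
    return utypes


def derive_xsd(model):
    """Return a minimal schema fragment as text: one complexType per class."""
    lines = []
    for cls, attrs in model["classes"].items():
        lines.append(f'<xs:complexType name="{cls}">')
        lines.append("  <xs:sequence>")
        for attr, datatype in attrs:
            xsd_type = XSD_TYPES[datatype]
            lines.append(f'    <xs:element name="{attr}" type="{xsd_type}"/>')
        lines.append("  </xs:sequence>")
        lines.append("</xs:complexType>")
    return "\n".join(lines)


if __name__ == "__main__":
    print("\n".join(derive_utypes(MODEL)))
    print(derive_xsd(MODEL))
```

Because both functions read the same MODEL, a change to the model (say, a
renamed attribute) shows up in every derived representation on the next run.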

 

VO-URP was also extremely helpful in creating a reference implementation of
the model.

http://galformod.mpa-garching.mpg.de/dev/SimDM-browser/

is a web application implemented using generated Java classes that allows
one to browse a database designed according to the (automated) relational
mapping of SimDM. This mapping is represented using a (generated)
TAP_SCHEMA, and the web app allows one to query the database with SQL (full
TAP support can be added).

The webapp allows up- and downloading XML documents following the
(generated) XML schema representation of SimDM.

Documentation of the data model is available in the form of a (generated)
cross-linked HTML file that contains a full list of UTYPEs identifying each
element in the data model and, as an extra, a clickable model diagram
produced using GraphViz from generated 'dot' files.
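
As an illustration of what generating a 'dot' file from a model can look
like, here is a small Python sketch. It is an assumption-laden toy, not
VO-URP's generator: the function name model_to_dot and the example class and
reference names are invented for this example.

```python
def model_to_dot(classes, references):
    """Return GraphViz DOT text for a model.

    classes    -- iterable of class names (hypothetical input format)
    references -- iterable of (source class, target class, label) tuples
    """
    lines = ["digraph model {", "  node [shape=box];"]
    for cls in classes:
        lines.append(f'  "{cls}";')
    for src, dst, label in references:
        lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    dot_text = model_to_dot(
        ["Simulation", "Snapshot"],
        [("Simulation", "Snapshot", "snapshots")],
    )
    print(dot_text)
```

Saved to a file such as model.dot, the output can be rendered with the
GraphViz command line, for example `dot -Tsvg model.dot -o model.svg`.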

 

VO-URP has been worked on mainly by Laurent and me, but with lots of
feedback and use by the “theory people” in France, San Diego and Italy. It
has its own GoogleCode project at http://vo-urp.googlecode.com.

VO-URP has been presented at many interops, either explicitly (starting in
Trieste, 2008), or as part of presentations on SimDM. It has also been
explicitly mentioned in the SimDM spec itself.

 

All of this we thought would be of interest to the data modeling efforts in
the IVOA in general.

But though we have asked for contributions from the VO community, VO-URP has
mainly been ignored outside the SimDM effort.

Only the part of VO-URP dealing with UTYPEs, a BNF defining UTYPEs in terms
of data model constructs, has been used in Mireille's first drafts of the
UTYPE document. That document pulls it out of context, so much so that the
core concepts underlying VO-URP were apparently lost on subsequent editors.

 

We think that the VO-URP project contains most features required for a
domain specific data modeling language.

It contains proven, workable, implemented mappings to other useful
representations.

We produce UTYPEs, but think (with you and Markus and likely others) that a
simple grammar to produce a list of words is not sufficient for a proper
(re)use of data models in general.

And we'd love to finally be able to further develop VO-URP with those
concepts in the context of the IVOA DM WG.

 

Looking forward to discussing all of this with you all in Urbana.

 

Best regards

 

Gerard Lemson

also for Laurent Bourges

 

 

From: utypes-bounces at ivoa.net [mailto:utypes-bounces at ivoa.net] On Behalf Of
Omar Laurino
Sent: 20 December 2011 16:27
To: Mireille Louys
Cc: utypes at ivoa.net
Subject: Re: [utypes] Wild edit towards more principled definitions

 

Hi Markus,

 

Again, thank you for moving this effort forward.

 

I like your formal approach very much, I think we need it. And I like your
efforts to abstract the data model description.

 

However, I have some concerns:

1. Your formalization (and the previous draft, to some extent) starts from
the XML schema to define the DM graph and then the Utypes. However, I think
that the XSD should be a final product of the Data Modeling effort. 

 

I think this for several reasons. First of all, you need to be sure that the
XSD is properly representing your Data Model, and this looks more like a
"trial and error" process to me [write the XML schema, see what the graph
looks like, then go back to the XSD and iterate]. In other words, I think
the Data Model comes first, and then you produce all the data products (you
need the Data Model at least in your mind to describe it using XSD, right?).
To be honest, I even think that the DM description document should be
written automatically as a stub, and then the human should simply add the
human-relevant information. So, I would go the other way, reversing the
algorithm to go from the Graph to the XSD.

 

I think this was the approach of the Theory guys, by the way: they start
from the Data Model UML description (in XMI) and then use it to derive all
the "products" (XSD schema, utypes, documentation). I don't think this
approach can work for a standard specification, though, because it assumes
the use of tools (XMI compliant Data Modeling tools) that are hardly
interoperable. I still think the approach is great, but it is an approach
that *uses* a standard definition of Data Models and Utypes, rather than
being the standard itself. It can be considered a reference implementation that
validates the standard employing a particular set of tools, though. In other
terms, we still need an interoperable description for IVOA Data Models
(which is a subspace of the infinite data modeling space).

 

2. It seems to me that your model doesn't capture instantiation and
inheritance. In your sample XSD you define both TimeAxisType and
SpectralAxisType, even though SpectralAxis and TimeAxis are two instances of
the same class: your algorithm reflects this, if I am not mistaken. I am
sure you can complicate your algorithm to adapt, but this requires modeling
the Data Model itself, not its representation, so that you work in a "meta
Data Model" space. Also, in your schema there is no room for extensions,
which is quite a hot topic. 

 

3. In your description, you can drop the prefix without losing any
information. Also, let's say I have got two Data Models: they both employ a
SpectralAxis and a TimeAxis to describe the very same information. They will
have two different Utypes, which means that they point to different
concepts! Unless you ignore the prefix, which, again, makes it redundant.
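
Points 2 and 3 can be illustrated with a small Python sketch. Everything in
it is hypothetical: the Attribute record, the two toy models with prefixes
"a" and "b", and the "prefix:role" identifier syntax are assumptions made for
this example, not part of any draft. It shows that prefixed path strings
from two models never match, while recording the class each attribute
instantiates lets a client find every Axis regardless of model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    role: str       # name of the attribute in its model, e.g. "timeAxis"
    type_name: str  # class that the attribute is an instance of


# Two hypothetical models that describe the same kind of information.
MODEL_A = {
    "prefix": "a",
    "attributes": [
        Attribute("timeAxis", "Axis"),
        Attribute("spectralAxis", "Axis"),
    ],
}
MODEL_B = {
    "prefix": "b",
    "attributes": [Attribute("timeAxis", "Axis")],
}


def prefixed_utypes(model):
    """Identifiers built from prefix and role only (type is not recorded)."""
    return {f'{model["prefix"]}:{a.role}' for a in model["attributes"]}


def instances_of(model, type_name):
    """Roles whose declared class is type_name, independent of the prefix."""
    return [a.role for a in model["attributes"] if a.type_name == type_name]


if __name__ == "__main__":
    # The prefixed identifiers of the two models share no element.
    print(prefixed_utypes(MODEL_A) & prefixed_utypes(MODEL_B))
    # Looking at the declared class finds the axes in both models.
    print(instances_of(MODEL_A, "Axis"), instances_of(MODEL_B, "Axis"))
```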

 

So, I think your formalization works for building the Utype paths, but this
is only part of the deal. We need to define a meta space in which the Data
Models live, we need to provide mechanisms for allowing the clients to spot
instances of known classes in the datasets, and to allow data models to
correctly include and/or extend classes defined in other models (which is
what Data Models already do, in a way that is neither standard nor formal).

 

Another couple of minor concerns: we still don't include the DataTypes in
the picture. Again, I think this is due to the fact that the document
focuses on building the Utype path, so we need to take a step back and fit
the path in the bigger picture of Data Model instances serialization. Also,
there is still no room for versions. You make reference to it, but you seem
to offload the problem.

 

That said, I would like to send you guys the draft I have been working on
ASAP. I'm validating it by building a reference implementation in the form
of a Java library, which helps the design a lot.

 

Here is, very briefly, the big picture: I am trying to address the three
main problems we have in order to make utypes *much more* useful (they are
useful already) and to match our requirements:

1) Data Model design specification (data modeling of the IVOA Data Model
space, reusable components in terms of inheritance and instantiation,
collections of objects, versioning).

 

2) Data Model description (utypes, automatic generation of code, etc.)

 

3) Standardization of abstract serialization strategies based on utypes. The
serialization must allow clients to spot the instances of the objects they
know and to build a tree representation of those they don't know.
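
The "tree representation" mentioned in point 3 can be sketched in a few
lines of Python. This is only an illustration under assumed conventions: it
supposes that a serialization delivers flat pairs of dotted paths and values,
and the function name build_tree and the example paths are invented here.

```python
def build_tree(fields):
    """Turn {"a.b.c": value} style pairs into nested dictionaries.

    Assumes no path is both a leaf and a parent of another path.
    """
    tree = {}
    for path, value in fields.items():
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            # Create the intermediate node the first time it is seen.
            node = node.setdefault(part, {})
        node[leaf] = value
    return tree


if __name__ == "__main__":
    flat = {
        "char.timeAxis.unit": "s",
        "char.timeAxis.ucd": "time",
        "target.name": "M31",
    }
    print(build_tree(flat))
```

A client that does not know the "char" object can still walk the nested
structure generically, which is the fallback behaviour described above.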

 

The prototype library looks in pretty good shape. As soon as I've got
something useful to show I will send you my results.

 

Cheers,

 

Omar.

 

 

 

On Mon, Dec 19, 2011 at 11:34 AM, Omar Laurino
<olaurino at head.cfa.harvard.edu> wrote:

Hi Markus, All,

 

I wanted to come up with a draft last week, but I didn't make it. Also I
haven't had time to review your draft. I will work on the Utypes document
today and provide you with some feedback.

 

Thanks for your contribution!

 

Cheers,

 

Omar.

 

On Wed, Dec 14, 2011 at 11:52 AM, Mireille Louys <mireille.louys at unistra.fr>
wrote:

Hello, Markus, all,

Thanks for pushing this effort forward.
I'll have a close look next week.
I currently have deep teaching commitments.

Best wishes, Mireille

Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:

Dear colleagues,

As those of you who followed my talk in Pune
(http://docs.g-vo.org/talks/2011-pune-utypes.pdf) may have feared (or
hoped; there were some approving gestures back at the talk), I've now
spent some quality time rewriting the chapters 2-4 for the utype
specification.

...



If, on the other hand, you think I'm mad and you'll never agree to
such extensive changes to the utypes document (or to such a
dumbing-down of data modelling, or whatever), *please* by all means
speak up now.  I'll not be cross.  Promised.  You'll be saving me a
lot of work.

Finally, of course, if you think only parts of what I've tried are
dead wrong, you're of course welcome to complain (or even fix), too.

Cheers,

         Markus


-- 
Mireille Louys, assistant professor at  UDS: ENSPS, Laboratoire ICube et CDS
Observatoire de Strasbourg
mail to: mireille.louys at unistra.fr
Tel: +33 3 68 85 24 34
Address 1: CDS/Observatoire de Strasbourg
11, rue de l'Université
67000 STRASBOURG




_______________________________________________
utypes mailing list
utypes at ivoa.net
http://www.ivoa.net/mailman/listinfo/utypes

 

 
