[SimDB] theory plans

Thu May 15 01:31:27 PDT 2008

Dear colleagues

We have split the former SNAP effort of the theory interest group into two
separate efforts:
SimDB (= Simulation Database) and SimDAP (= Simulation Data Access Protocol).

This mail deals with SimDB only:
what it is supposed to be, what its current state is, what further work
is needed, and how that work should be organised.

1. SimDB is a specification for a Simulation (meta-)Database (could be
called Simulation Registry, -Portal)

2. SimDB is an online service offering query capabilities to a database
containing meta data describing the results of simulations and their
post-processing, as well as the codes used to produce them.
Currently the simulations are still supposed to be those that produce a
representation of 3+1D space (possibly with reduced spatial dimensions
through assumptions of symmetry). This is open for discussion.

3. A SimDB also contains information about web services giving access to the
simulation results themselves. The more detailed specification of such
services is the goal of the SimDAP-specification.

4. SimDB is based on a (logical) data model, fully specified in UML. 

5. From the UML data model we derive physical models for use in their
respective contexts:
 - The "public tables" of a relational data model, for implementing the
   database in an RDBMS so that SQL (or ADQL) queries can be easily
   implemented.
 - XML schema, defining valid XML documents containing SimDB meta data
   descriptions.
 - UTYPEs for the elements of the model.

6. We present XSLT scripts that derive these physical models directly from
the UML model according to predefined mapping rules.

7. We also derive Java classes with JPA and JAXB annotations to make it easy
to implement a SimDB from the specification.

8. We suggest an implementation path to transform an existing relational
database to SimDB. 
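
As a rough illustration of points 5 and 6, the sketch below derives a
relational table definition and a list of UTYPE-like identifiers from a toy
logical model. It is written in Python purely for illustration and stands in
for the XSLT scripts; the class name, attribute names, type mapping and the
UTYPE format are all assumptions and are not taken from the SimDB UML model.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    datatype: str  # logical type: "string", "integer" or "real"


@dataclass
class ObjectType:
    name: str
    attributes: list = field(default_factory=list)


# Assumed mapping rule: logical data type -> SQL column type.
SQL_TYPES = {
    "string": "VARCHAR(255)",
    "integer": "INTEGER",
    "real": "DOUBLE PRECISION",
}


def to_ddl(obj):
    """One table per object type, one column per attribute, plus a surrogate key."""
    columns = ["  id INTEGER PRIMARY KEY"]
    columns += [f"  {a.name} {SQL_TYPES[a.datatype]}" for a in obj.attributes]
    return f"CREATE TABLE {obj.name} (\n" + ",\n".join(columns) + "\n);"


def to_utypes(obj, model="SimDB"):
    """One identifier per attribute; the 'Model:Class.attribute' form is an assumption."""
    return [f"{model}:{obj.name}.{a.name}" for a in obj.attributes]


# Hypothetical object type, used only to exercise the two mapping rules.
simulation = ObjectType(
    "Simulation",
    [Attribute("name", "string"), Attribute("numberOfParticles", "integer")],
)

print(to_ddl(simulation))
print(to_utypes(simulation))
```

The point of the sketch is only that both physical representations come from
the same logical description, so they cannot drift apart.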

We think this effort has evolved far enough that it can be moved onto the
recommendation track.
Some issues still have to be resolved that require input from a number of
working groups, among which are:

 -- We need to confront scientists with our ideas and extract feedback. 
    This includes defining use cases.
   => This is a task for the TIG

 -- We need to iron out the wrinkles from the data model and check its
relevance. 
   => We need input from domain experts (i.e. theorists) to see whether
their results fit in the model.
   => We need input from the DM WG (i.e. people with data modelling
experience) on various aspects of the data model.

 -- Should a SimDB be stand alone, or should it be possible to have
relations between SimDB/Resource-s in different SimDB-s?
    The answer to this has repercussions for the XML and RDB
representations. It may have repercussions for the functioning of
    a SimDB in case it is supposed to be stand alone (mirroring through
harvesting of external resources may be required).
   => We would like to discuss these issues with the Registry WG who have
experience with similar issues.

 -- Should a SimDB be read-only, i.e. represent the work of the groups
publishing their own results?
    Or can external parties register their results in a nearby SimDB? 
    I.e., is SimDB S*AP like, or Registry like (an extension registry?), or
something in between?
   => We'd like to discuss this with Registry and DAL WGs.

 -- We need a common vocabulary for certain attributes in the model that
refer to astrophysical concepts and are likely prime targets for queries. We
want to use existing vocabularies where possible, but may need to define new
ones.
   => We would like to discuss this with the Semantics WG.

 -- We need a way to express formally that SimDB services offer a particular
ADQL query interface. 
   => We think we need to discuss these issues with Registry and DAL(TAP)
WGs.

 -- SimDB is a service that ADQL queries can be sent to and may(?) be
thought of as a TAP service. 
    However, we'd rather not wait until the TAP specification is done before
continuing. 
   => We would like to discuss the issues arising from this within the
context of TAP and ADQL, i.e. we need discussions with DAL and VOQL. 
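
To make the last two points a little more concrete, the sketch below shows
the kind of SQL-like query a client might send to the public tables of a
SimDB. It uses Python's built-in sqlite3 module in place of a real ADQL/TAP
service, and the table, columns and rows are hypothetical, made up only for
this example.

```python
import sqlite3

# In-memory database standing in for a SimDB's public tables (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Simulation ("
    "id INTEGER PRIMARY KEY, name TEXT, numberOfParticles INTEGER)"
)

# Hypothetical entries; not real simulations.
conn.executemany(
    "INSERT INTO Simulation (name, numberOfParticles) VALUES (?, ?)",
    [("runA", 1000), ("runB", 1000000)],
)

# Select the names of all simulations with at least 10000 particles.
query = (
    "SELECT name FROM Simulation "
    "WHERE numberOfParticles >= ? ORDER BY name"
)
large_runs = [row[0] for row in conn.execute(query, (10000,))]
print(large_runs)

conn.close()
```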

We suggest forming a focus group to tackle these and other issues. This
group should contain members from the mentioned WGs, together with the
current developers of SimDB ("members" from the TIG).

It is unclear what the formal organisation should be, as there is not a
single WG that could most obviously be given responsibility for the further
development of this standard and the theory INTEREST group can not move a
working draft through the recommendation track. Might it be possible for
cross-WG focus groups like the one proposed here to get such responsibility?
Or should interest groups get this responsibility after all? Note that
similar questions may come up based on the cross-WG focus group on UTYPE
proposed by Mireille Louys.

We expect to discuss all these issues in Trieste, but would like to get
input on this from now on. Could this discussion be held on the theory
group's mailing list with the subject as above?

Thanks and best regards

Gerard Lemson 
for Herve Wozniak, Mireille Louys and members of the SNAP "tiger team".

PS
We have seen that our approach using UML as source for scripts producing
other representations of the model is working, viable and completely
independent of theory. We think it can be of use to other data modelling
efforts and suggest that the DM working group could start efforts to come up
with a set of META-specifications on:
      - a UML profile defining a domain specific language for logical data
models in the IVOA.
      - mapping rules to transform these logical models to physical models.