Advances on Simulation Data Access Layer (SimDAL)

David Languignon david.languignon at obspm.fr
Wed May 7 04:14:04 PDT 2014


Dear colleagues,

Several discussions about SimDAL have taken place over the last few weeks,
by email and following a meeting in Paris with Carlos and Marco.
To prepare for the next InterOp and to inform anyone interested in the
development of SimDAL, the conclusions are summarized below.


We have general agreement on:
1 - The division of SimDAL into 3 parts:
       * Simulation Registry
           - search for metadata on projects/protocols/services
           - access SimDM serializations of projects/protocols
       * Simulation Search & Discovery
           - search for simulations satisfying some constraints
           - retrieve quantities for the matching simulations
           - provide metadata on the queryable axes for constraints
             (like VOTable range, values, etc.). Axes are mapped
             to protocols found in the Simulation Registry.
           - provide SimDM serializations of the experiment package
       * Data Access (Cutout)

2 - Simulation Registry :
       - Based on descriptions of services using protocol.xml and
         project.xml
       - REST Interface
       - Similar to the Registry Interface, with some additional
         functionality such as pagination
       - Optional externalization of collections for large XML files
         (e.g. input parameters and statistical summaries)

3 - Data Access
       - Multi-space cutout
       - No specific data format
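The Simulation Registry item above mentions pagination but does not yet fix its parameters. As a rough illustration only, the Python sketch below shows how a client could walk through a paginated result; `fetch_page`, its `offset`/`limit` arguments and the `ivo://example.org/...` identifiers are hypothetical placeholders, and the "registry" is an in-memory list rather than a real REST call.

```python
from typing import Callable, Iterator, List

# Hypothetical registry content; a real client would issue REST requests.
PROTOCOLS = [f"ivo://example.org/protocol/{i}" for i in range(7)]


def fetch_page(offset: int, limit: int) -> List[str]:
    """Simulated page request: return at most `limit` records from `offset`."""
    return PROTOCOLS[offset:offset + limit]


def iter_records(fetch: Callable[[int, int], List[str]], limit: int) -> Iterator[str]:
    """Yield every record, requesting pages until a short (or empty) page arrives."""
    offset = 0
    while True:
        page = fetch(offset, limit)
        yield from page
        if len(page) < limit:
            break
        offset += limit
```

For example, `for ivoid in iter_records(fetch_page, limit=3): print(ivoid)` issues three simulated requests (3, 3 and 1 records).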


For Simulation Search & Discovery, we have agreed on the broad lines but
still need to decide how to do discovery on both input parameters and
statistical summaries, and what the input/output contract of the service
should be.

Possible solutions are:

a. Define 2 atomic services, each with a well-defined task/perimeter. One
will search for experiments based on input parameters, the other will
search for datasets based on properties (the search will be based on the
statistical summaries of these properties).
These 2 services will be composed client-side to find the experiments
in the intersection of the 2 result sets.
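To make the client-side composition of option a. concrete, here is a minimal Python sketch under stated assumptions: the two atomic services are simulated by in-memory functions (`search_experiments`, `search_datasets`), constraints are inclusive ranges, and the parameter/property names (`box_size`, `mass_max`) and identifiers are invented for the example.

```python
from typing import Dict, Set, Tuple

Range = Tuple[float, float]

# Hypothetical data behind the two atomic services.
EXPERIMENTS = {  # experiment id -> input parameter settings
    "exp1": {"box_size": 100.0},
    "exp2": {"box_size": 500.0},
    "exp3": {"box_size": 500.0},
}
DATASETS = {  # dataset id -> (experiment id, statistical summary values)
    "ds1": ("exp1", {"mass_max": 1e14}),
    "ds2": ("exp2", {"mass_max": 5e15}),
    "ds3": ("exp3", {"mass_max": 2e15}),
}


def search_experiments(constraints: Dict[str, Range]) -> Set[str]:
    """Service 1: experiments whose input parameters lie in the given ranges."""
    return {
        exp_id for exp_id, params in EXPERIMENTS.items()
        if all(lo <= params[name] <= hi for name, (lo, hi) in constraints.items())
    }


def search_datasets(constraints: Dict[str, Range]) -> Dict[str, str]:
    """Service 2: matching datasets, mapped to the experiment they belong to."""
    return {
        ds_id: exp_id for ds_id, (exp_id, summary) in DATASETS.items()
        if all(lo <= summary[name] <= hi for name, (lo, hi) in constraints.items())
    }


def compose(param_constraints: Dict[str, Range],
            property_constraints: Dict[str, Range]) -> Set[str]:
    """Client side: keep the experiments that appear in both result sets."""
    by_params = search_experiments(param_constraints)
    by_properties = set(search_datasets(property_constraints).values())
    return by_params & by_properties
```

With these data, `compose({"box_size": (400.0, 600.0)}, {"mass_max": (3e15, 1e16)})` keeps only `exp2`: `exp3` passes the parameter search, but none of its datasets passes the property search.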


b. Define a "macro" service which does it all in one: you pass input
parameter and property constraints to it, and it searches for matching
experiments (see a. for pros and cons).


c. (Gerard's suggestion) Consider defining a simplified, denormalized
version of SimDM, designed for a set of specific use cases. Each of
these cases may be represented as a "view" on the original model,
representing a single "table", which we can access using simpler queries
like the one we are looking for in case 1 above.
We could define ExperimentalCharacterizedDataSets as a join between
experiment, dataset and statisticalsummary (plus the parameter tables).
Say something like:
    CREATE VIEW ExperimentalCharacterizedDataSets AS
    SELECT ip.name, p.value, s.*
      FROM experiment e, parametersetting p, inputparameter ip,
           dataset d, statisticalsummary s
     WHERE d.experimentId = e.id
       AND s.datasetId = d.id
       AND p.experimentId = e.id
       AND ip.id = p.parameterId;
They would have to define the table and annotate it in the VO-DML/UTYPE
manner, but could query it in the simplified way, as in type 1.
That is, pretty much what we seem to have been discussing in the past.
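The view idea of option c. can be tried out with SQLite from Python. This sketch assumes a drastically reduced, hypothetical schema (only the columns the join needs, plus invented `property`, `minval` and `maxval` columns in `statisticalsummary`); it is not the SimDM relational mapping. It also adds `e.name` to the select list so that results identify the experiment.

```python
import sqlite3

# Hypothetical minimal schema; column names beyond the join keys are invented.
SCHEMA = """
CREATE TABLE experiment (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE inputparameter (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE parametersetting (experimentId INTEGER, parameterId INTEGER, value REAL);
CREATE TABLE dataset (id INTEGER PRIMARY KEY, experimentId INTEGER);
CREATE TABLE statisticalsummary (datasetId INTEGER, property TEXT,
                                 minval REAL, maxval REAL);

CREATE VIEW ExperimentalCharacterizedDataSets AS
SELECT e.name AS experiment_name, ip.name AS parameter,
       p.value AS param_value, s.*
  FROM experiment e, parametersetting p, inputparameter ip,
       dataset d, statisticalsummary s
 WHERE d.experimentId = e.id AND s.datasetId = d.id
   AND p.experimentId = e.id AND ip.id = p.parameterId;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO experiment VALUES (?, ?)", [(1, "run_A"), (2, "run_B")])
conn.execute("INSERT INTO inputparameter VALUES (1, 'box_size')")
conn.executemany("INSERT INTO parametersetting VALUES (?, ?, ?)",
                 [(1, 1, 100.0), (2, 1, 500.0)])
conn.executemany("INSERT INTO dataset VALUES (?, ?)", [(10, 1), (20, 2)])
conn.executemany("INSERT INTO statisticalsummary VALUES (?, ?, ?, ?)",
                 [(10, "halo_mass", 1e10, 1e14), (20, "halo_mass", 1e11, 5e15)])

# The simple single-"table" query that the view makes possible.
rows = conn.execute(
    "SELECT DISTINCT experiment_name FROM ExperimentalCharacterizedDataSets "
    "WHERE parameter = 'box_size' AND param_value > 200 "
    "AND property = 'halo_mass' AND maxval > 1e15"
).fetchall()
```

Each row of this view pairs one parameter setting with one summary, so a constraint on two different input parameters at once would need a self-join or a pivoted (one column per parameter) view.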


d. Define a macro service which just knows "axes", without caring whether
they are input parameters, properties, etc. It just knows "elements" that
are defined in a system of these axes.
It describes these axes itself, including utypes pointing to protocol.xml.
These axes serve as arguments to the 2 main parameters of the service:
- view/select = which element components (axes) I want the service to return
- filters = which constraints apply to the components (axes)
Thus, experiment_publisherdid and dataset_publisherdid can be specific
axes, making it possible to search on any criterion needed in our use
cases: input parameters, properties, experiments, datasets, etc.
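The self-description in option d. could, for instance, reuse VOTable FIELD and VALUES/MIN/MAX elements, in the spirit of the "VOTable range, values" mentioned for the Search & Discovery part. The sketch below is one possible shape, not an agreed format: the `Axis` class, the axis names other than experiment_publisherdid and dataset_publisherdid, and the `protocol.xml#...` utype strings are all hypothetical, and the output is a bare TABLE fragment rather than a complete VOTable document.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Axis:
    name: str
    datatype: str                    # VOTable datatype, e.g. "double" or "char"
    utype: str                       # hypothetical pointer into protocol.xml
    minimum: Optional[float] = None  # optional advertised range
    maximum: Optional[float] = None


# Hypothetical axes; only the two publisher DID axis names come from the email.
AXES = [
    Axis("experiment_publisherdid", "char", "protocol.xml#Experiment.publisherDID"),
    Axis("dataset_publisherdid", "char", "protocol.xml#DataSet.publisherDID"),
    Axis("box_size", "double", "protocol.xml#InputParameter.box_size", 10.0, 1000.0),
    Axis("halo_mass_max", "double", "protocol.xml#Property.halo_mass", 1e10, 1e16),
]


def describe(axes: List[Axis]) -> str:
    """Serialize the axes as VOTable-style FIELD elements with optional ranges."""
    table = ET.Element("TABLE", name="axes")
    for axis in axes:
        attrs = {"name": axis.name, "datatype": axis.datatype, "utype": axis.utype}
        if axis.datatype == "char":
            attrs["arraysize"] = "*"  # variable-length string
        field = ET.SubElement(table, "FIELD", attrs)
        if axis.minimum is not None and axis.maximum is not None:
            values = ET.SubElement(field, "VALUES")
            ET.SubElement(values, "MIN", value=str(axis.minimum))
            ET.SubElement(values, "MAX", value=str(axis.maximum))
    return ET.tostring(table, encoding="unicode")
```

A client reading this description learns both which names it may pass to "view" and "filters" and which ranges are meaningful for each axis.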
We can see this as a "virtual table" whose columns are described in a
way similar to the S3 self-description, except that here we do not
self-describe the service parameters, but rather the system's axes,
which can be passed to the predefined "view" and "filters" service
parameters. It can also be seen as an SQL "select *view* where *filters*"
on a flat table.
This could later be extended with a VO-DML/UTYPE self-description, and
seems quite easy to integrate into the existing SVO architecture.
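The "select *view* where *filters*" reading of option d. can be sketched in a few lines of Python. Assumptions (none of them agreed yet): the virtual table is a list of dictionaries with one key per axis, a filter is either an inclusive (min, max) tuple or an exact value, and the axis names and identifiers are invented for the example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical flat "virtual table": one row per dataset, one column per axis.
ROWS: List[Row] = [
    {"experiment_publisherdid": "exp1", "dataset_publisherdid": "exp1/snap_010",
     "box_size": 100.0, "halo_mass_max": 1e14},
    {"experiment_publisherdid": "exp2", "dataset_publisherdid": "exp2/snap_010",
     "box_size": 500.0, "halo_mass_max": 5e15},
    {"experiment_publisherdid": "exp2", "dataset_publisherdid": "exp2/snap_020",
     "box_size": 500.0, "halo_mass_max": 8e15},
]
AXIS_NAMES = set(ROWS[0])  # the axes the service would self-describe


def matches(value: Any, constraint: Any) -> bool:
    """A tuple is an inclusive (min, max) range; anything else means equality."""
    if isinstance(constraint, tuple):
        low, high = constraint
        return low <= value <= high
    return value == constraint


def query(view: List[str], filters: Dict[str, Any]) -> List[Row]:
    """Rough equivalent of SELECT <view> WHERE <filters> on the flat table."""
    unknown = (set(view) | set(filters)) - AXIS_NAMES
    if unknown:
        raise ValueError(f"unknown axes: {sorted(unknown)}")
    return [
        {axis: row[axis] for axis in view}
        for row in ROWS
        if all(matches(row[axis], c) for axis, c in filters.items())
    ]
```

Because input parameters, properties and publisher DIDs are all just axes here, the same `query` call covers parameter-only, property-only and combined searches, e.g. `query(["dataset_publisherdid"], {"box_size": (400.0, 600.0), "halo_mass_max": (6e15, 1e16)})`.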


The details of the service contract/API and of parameter passing are
still to be discussed.


VO-DML has a lot of potential but may be too new to be used in version
1.0 of SimDAL if we want a WD by the end of this year. This approach
may be implemented in SimDAL 2.0.



--
Franck & David
