New ProvenanceDM working draft released

Laurino, Omar olaurino at cfa.harvard.edu
Fri Oct 13 16:04:26 CEST 2017


Francois,

On Fri, Oct 13, 2017 at 1:46 AM, François Bonnarel <
francois.bonnarel at astro.unistra.fr> wrote:

> That's the point. We had the impression that VO-DML mapping into VOTable
> will not help us to define a TAP schema for Provenance. The version
> announced recently by Laurent, as far as I understand, seems to
> confirm that.
>
>
We are happy to have more people participating in the process so we can get
more done in the same amount of time. About three people for the past six
months have been working (very, very part time) on data models (STC, Cube,
TimeSeries, plus I know a couple of other people worked on CAOM as well),
their serializations for a number of different cases, the mapping
specification and document, along with several iterations of prototype
implementations, tools, documentation, and demo software. Actually, I think
that's a lot of stuff accomplished given the incredibly limited resources.

The more, the merrier.

That being said, Gerard indeed has been working on the VODML-TAP mapping
for years (again, part time and along a whole bunch of other things), and
he actually presented something at a couple of Interops and an ADASS, so
maybe he should chime in on this topic.


>
> If ProvenanceDM has a valid VODML representation, VOTable representation
> of its instances will be defined by the standard mapping mechanism, without
> requiring an ad-hoc serialization definition.
>
> If we could find a standard way to go from "vo-dml xml" representation of
> any IVOA data model to TAP schema that would be better than ad hoc
> serialisation.
>

I think this is where we are not on the same page, really, because I am not
even sure I understand your point. However I think/hope we agree on the
goals.

PROV-VOTable is a serialization for *instances* of the model: it shouldn't
be ad-hoc and defined in the model, it should be standardized, because it
makes for a better architecture, reducing the burden on both producers and
consumers.

So PROV-VOTable is an exchange mechanism for representing a Provenance
Database, which by itself has a number of applications. It is not supposed
to represent the result of a (Prov)TAP query, which is a different
conversation.

ProvTAP is not a serialization, but a representation of the model with a
different (kind of) schema. Mapping VODML to ProvTAP should be easier,
because VODML maps directly to the relational model, so Types and Roles in
VODML directly map to Entities and Relationships in a relational schema,
with some complications typical of the Object Relational Mapping domain,
like inheritance, but nothing that wasn't solved in industry decades ago.
In this case instances are just rows in the tables. It's certainly
useful, though, to standardize this mapping to make it more software-friendly.
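
To make the relational reading concrete, here is a minimal sketch using
Python's built-in sqlite3 module. The type names, columns, and the
"joined table" treatment of inheritance are all assumptions chosen for
illustration; this is not the actual ProvenanceDM or ProvTAP schema.

```python
import sqlite3

# Hypothetical, simplified model: NOT the actual ProvenanceDM/ProvTAP schema.
SCHEMA = """
CREATE TABLE activity (                -- one object type -> one table
    id   TEXT PRIMARY KEY,
    name TEXT                          -- one attribute -> one column
);
CREATE TABLE entity (
    id           TEXT PRIMARY KEY,
    name         TEXT,
    generated_by TEXT REFERENCES activity(id)  -- one role -> foreign key
);
CREATE TABLE collection (              -- subtype of entity, joined-table style
    id   TEXT PRIMARY KEY REFERENCES entity(id),
    size INTEGER
);
"""


def build_demo_db():
    """Create an in-memory database; each instance is just a row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO activity VALUES ('act1', 'calibration')")
    conn.execute(
        "INSERT INTO entity VALUES ('ent1', 'calibrated image set', 'act1')"
    )
    # The subtype row shares its primary key with the supertype row.
    conn.execute("INSERT INTO collection VALUES ('ent1', 12)")
    conn.commit()
    return conn


def collections_with_activity(conn):
    """Rebuild subtype instances by joining subtype, supertype, and role."""
    query = """
        SELECT e.name, c.size, a.name
        FROM collection AS c
        JOIN entity AS e ON e.id = c.id
        JOIN activity AS a ON a.id = e.generated_by
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(collections_with_activity(build_demo_db()))
```

Joined-table inheritance is only one of the usual ORM strategies; a single
table with a discriminator column would serve the same illustration.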

While I did quickly read the Provenance WD I certainly didn't dive into the
details, so I might well be missing something.


> We would be interested to see that. Seems to be a bit of hard work however.
>
>
The mapping is not that hard, what's hard is modeling STC in such a way
that simple things are simple (in any syntax) and complex things are
possible (in any syntax).

You'll see the examples in Santiago, and I am packaging them in such a way
that they can be browsed independently from the Interop talks, annotated,
and commented.

I think the complexity of VODML and its mappings is more of an assumption
used to explain the complexity of the requirements on the models. STC is
really complex; we are striving to simplify it for everybody. The current
serialization, which was really simple to implement (the same level of
complexity as any XML serialization, including VOTable 1.3), is helping to
drive the feedback loop back into the models, improving them a lot.

The syntax is certainly perfectible, as are the models, and, as is usually
true for syntax, it is subject to heavy bike-shedding, so that's going to be
fun. But we are getting there, and the complexity of the mapping
implementations was never a factor in my experience.


> Again "as far as I understand it", VO-DML mapping into VOTABLE provides a
> set of TAGS in an xml schema extension of VOTABLE defined to describe
> instances of a model. At some point it integrates references to elements of
> a classical VOTable where the actual data/metadata values are stored.
>

This definition would also apply to the note Sebastien wrote and presented
in Naples, where GROUPs, FIELDrefs, and optionally PARAM-refs are used to
describe what's in the FIELDs and PARAMs themselves.

>
> I said "complex" because by looking at the examples it took me a while to
> understand what the structure is really doing and I was reluctant to start
> writing such a structure for Provenance.


Some indirection is introduced by the fact that we want current VOTables to
remain unchanged for backward compatibility if that's what providers want:
in this case you would just *add* annotations that old clients can just
ignore. That's quite a feature if you think about it. The price to pay is
the level of indirection in the new annotation.
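
A minimal sketch of that trade-off, using Python's standard
xml.etree.ElementTree. It uses the existing VOTable GROUP and FIELDref
elements as a stand-in for the annotation; the actual Mapping WD syntax is
not reproduced here. The "ex:" utypes and column IDs are hypothetical, and
the XML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Toy document: the GROUP block is the added annotation, the rest is a
# plain table. All utype values and IDs below are made up for illustration.
VOTABLE = """<VOTABLE>
 <RESOURCE>
  <TABLE>
   <GROUP utype="ex:SkyPosition">
    <FIELDref ref="col_ra" utype="ex:SkyPosition.longitude"/>
    <FIELDref ref="col_dec" utype="ex:SkyPosition.latitude"/>
   </GROUP>
   <FIELD ID="col_ra" name="ra" datatype="double"/>
   <FIELD ID="col_dec" name="dec" datatype="double"/>
   <DATA><TABLEDATA>
    <TR><TD>10.5</TD><TD>-3.2</TD></TR>
   </TABLEDATA></DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""


def read_rows(root):
    """'Old' client: reads FIELDs and data, never looks at the annotation."""
    table = root.find("./RESOURCE/TABLE")
    names = [field.get("name") for field in table.findall("FIELD")]
    rows = []
    for tr in table.iter("TR"):
        values = [float(td.text) for td in tr.findall("TD")]
        rows.append(dict(zip(names, values)))
    return rows


def read_annotation(root):
    """'New' client: follows each ref (the indirection) to a column name."""
    table = root.find("./RESOURCE/TABLE")
    by_id = {field.get("ID"): field.get("name") for field in table.findall("FIELD")}
    return {
        ref.get("utype"): by_id[ref.get("ref")]
        for ref in table.iter("FIELDref")
    }


if __name__ == "__main__":
    root = ET.fromstring(VOTABLE)
    print(read_rows(root))
    print(read_annotation(root))
```

The old-style reader keeps working whether or not the GROUP block is
present; only the annotation-aware reader pays for the extra lookup.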

This is why I am working on examples that do not use that kind of
indirection, because for humans that's distracting. Machines don't really
care about that. I think examples have circulated with too much indirection
and that's unfortunate. I agree.

Along with the examples and the modeling, I am also working on
documentation on what clients need in order to make sense of the data, in
the context of STC, and implementations that can help data providers
annotate their tables. That's a lot of work for a single person who is
paid to do other stuff!


> I said "complex" because by looking at the examples it took me a while to
> understand what the structure is really doing and I was reluctant to start
> writing such a structure for Provenance.
>
> Moreover, this doesn't help me to build this classical VOTable itself. I
> don't think the object/relational mapping of a data model is currently done
> in the proposed spec. This is a real missing point.
>

Two answers here. First, ORM is indeed in the current spec. Is something
missing/wrong/hard-to-implement? Please let us know!

Secondly, that's where collaboration is key. It's as if we were building
HTML5, a complex standard that needs to be implemented by people with
different skills in different domains for different use cases, and while
you build it you can't ask questions on Stack Overflow, because nothing is
there yet: we are building it from scratch!

[And actually, if you compare the specs, VODML and the mapping document are
orders of magnitude simpler/shorter than HTML5, so we shouldn't even need
the sheer amount of collaboration that it took to get HTML5 right.]

Especially given the limited resources we have, we can't really afford to
work in isolation and only a few weeks before the Interops, with the
interaction during the Interop being limited to contemplating the
complexity of the world.


> The variation in responses is almost infinite. The number and content of
> possible involved columns is very large. So the model VO-DML mapping to the
> VOTable response must be done dynamically, which probably forbids easy usage
> of public libraries.
>

This point has been made several times, and I believe Markus has responded
several times that to him this is not a big deal and he's implemented that
logic in his services. I don't want to speak for anybody, so I'll leave it
at that.


> While in our approach the "TAP schema" representation of the model
> contains everything to describe the model and identify it in the response
> as long as we consider it is a valid relational transcription of the vo-dml
> xml representation of the model. Can we standardize that?
>

Yes! Your contributions to the work we've done so far will be invaluable!

Thanks,

Omar.


> PROV-VOTable is nothing else than the transcription in VOTable of the
>> tables and relationships of the *vodml-xml model*.
>> [...]
>> PROV-VOTable format can allow input/output operations of a whole project
>> provenance metadata in a database system.
>
>
> Note my own edit in bold face.
>
> It's the same goal as your PROV-VOTable, really, just standardized for all
> data models rather than ad-hoc for a single model.
>
> We are definitely going to discuss suggestions on the simplification of
> the syntax for the Mapping WD, so if you have any such suggestions please
> send them our way!
>
> By currently focusing on important data models like STC, Cube, and
> TimeSeries (and Provenance asap) we can finally see everything coming
> together. With such implementation experience, and actual serializations of
> actual models to look at and compare, we can then go back to discussing
> the syntax more pragmatically.
>
> Best,
>
> Omar.


-- 
Omar Laurino
Smithsonian Astrophysical Observatory
Harvard-Smithsonian Center for Astrophysics
100 Acorn Park Dr. R-377 MS-81
02140 Cambridge, MA
(617) 495-7227