New ProvenanceDM working draft released

François Bonnarel francois.bonnarel at astro.unistra.fr
Fri Oct 13 23:28:42 CEST 2017


Hi Omar, all


On 13/10/2017 at 16:04, Laurino, Omar wrote:
> Francois,
>
> On Fri, Oct 13, 2017 at 1:46 AM, François Bonnarel 
> <francois.bonnarel at astro.unistra.fr> wrote:
>
>     That's the point. We had the impression that VO-DML mapping into
>     VOTable will not help us to define a TAP schema for Provenance.
>     The version announced recently by Laurent, as far as I
>     understand, seems to confirm that.
>
>
>
> We are happy to have more people participating in the process so we 
> can get more done in the same amount of time. About three people for 
> the past six months have been working (very, very part time) on data 
> models (STC, Cube, TimeSeries, plus I know a couple of other people 
> worked on CAOM as well), their serializations for a number of 
> different cases, the mapping specification and document, along with 
> several iterations of prototype implementations, tools, documentation, 
> and demo software. Actually, I think that's a lot of stuff 
> accomplished given the incredibly limited resources.
Sure. But we have been a few more than that. Provenance is a collective 
effort of at least six people trying to stay consistent with other DM work.
Another group here in Strasbourg has been working on TimeSeries 
representation, trying to merge ideas from Jiri and structures defined by 
Cube DM and STC. This work will be presented in Santiago.
For those reasons we are looking carefully at the problem of mapping 
models into relational structures. This led us to have a look at the 
VO-DML mapping.
>
> The more, the merrier.
>
OK
> That being said, Gerard indeed has been working on the VODML-TAP 
> mapping for years (again, part time and along a whole bunch of other 
> things), and he actually presented something at a couple of Interops 
> and an ADASS, so maybe he should chime in on this topic.
That, I didn't remember, and I apologize. But as far as I understand, the 
current effort on VO-DML mapping into VOTable has not tackled this in 
recent years. I am probably ready to contribute.
>
>>
>>     If ProvenanceDM has a valid VODML representation, VOTable
>>     representation of its instances will be defined by the standard
>>     mapping mechanism, without requiring an ad-hoc serialization
>>     definition.
>     If we could find a standard way to go from "vo-dml xml"
>     representation of any IVOA data model to TAP schema that would be
>     better than ad hoc serialisation.
>
>
> I think this is where we are not on the same page, really, because I 
> am not even sure I understand your point. However I think/hope we 
> agree on the goals.
>
> PROV-VOTable is a serialization for *instances* of the model: it 
> shouldn't be ad-hoc and defined in the model, it should be 
> standardized, because it makes for a better architecture, reducing the 
> burden on both producers and consumers.
>
> So PROV-VOTable is an exchange mechanism for representing a Provenance 
> Database, which by itself has a number of applications. It is not 
> supposed to represent the result of a (Prov)TAP query, which is a 
> different conversation.
Almost agreed. The only point is that the standard definition is done if 
we have a standard generation of the TAP schema, because we want to reuse 
the table and column definitions of this TAP schema for the PROV-VOTable 
organisation into TABLEs, PARAMs and FIELDs.
>
> ProvTAP is not a serialization, but a representation of the model with 
> a different (kind of) schema. Mapping VODML to ProvTAP should be 
> easier, because VODML maps directly to the relational model, so Types 
> and Roles in VODML directly map to Entities and Relationships in a 
> relational schema, with some complications typical of the Object 
> Relational Mapping domain, like inheritance, but nothing that wasn't 
> solved in industry decades ago. In this case instances are just rows 
> in the tables. It's certainly useful to standardize this mapping 
> to make it more software-friendly.
I probably agree.

Something below
>
> While I did quickly read the Provenance WD I certainly didn't dive 
> into the details, so I might well be missing something.
>
>     We would be interested to see that. Seems to be a bit of hard work
>     however.
>>
>>
>
> The mapping is not that hard, what's hard is modeling STC in such a 
> way that simple things are simple (in any syntax) and complex things 
> are possible (in any syntax).
>
> You'll see the examples in Santiago, and I am packaging them in such a 
> way that they can be browsed independently from the Interop talks, 
> annotated, and commented.
>
> I think the complexity of VODML and its mappings is more of an 
> assumption used to explain the complexity of the requirements on the 
> models. STC is really complex, we are striving to simplify it for 
> everybody. The current serialization, which was really simple to 
> implement (the same level of complexity of any XML serialization, 
> including VOTable 1.3), is helping to drive the feedback loop back 
> into the models, improving them a lot.
>
> The syntax is certainly perfectible and, as is usually true for 
> syntax, subject to heavy bike-shedding, so that's going to be fun. The 
> models are perfectible too, but we are getting there, and the 
> complexity of the mapping implementations was never a factor in my 
> experience.
>
>     Again "as far as I understand it", VO-DML mapping into VOTABLE
>     provides a set of TAGS in an xml schema extension of VOTABLE
>     defined to describe instances of a model. At some point it
>     integrates references to elements of a classical VOTable where the
>     actual data/metadata values are stored.
>
>
> This definition would also apply to the note Sebastien wrote and 
> presented in Naples, where GROUPs, FIELDrefs, and optionally 
> PARAM-refs are used to describe what's in the FIELDs and PARAMs 
> themselves.
>
>
>     I said "complex" because by looking at the examples it took me a
>     while to understand what the structure is really doing and I was
>     reluctant to start writing such a structure for Provenance.
>
>
> Some indirection is introduced by the fact that we want current 
> VOTables to remain unchanged for backward compatibility if that's what 
> providers want: in this case you would just *add* annotations that old 
> clients can just ignore. That's quite a feature if you think about it. 
> The price to pay is the level of indirection in the new annotation.
>
> This is why I am working on examples that do not use that kind of 
> indirection, because for humans that's distracting. Machines don't 
> really care about that. I think examples have circulated with too much 
> indirection and that's unfortunate. I agree.
>
> Along with the examples and the modeling, I am also working on 
> documentation on what clients need in order to make sense of the data, 
> in the context of STC, and implementations that can help data 
> providers annotating their tables. That's a lot of work for a single 
> person who is paid to do other stuff!
>
>     I said "complex" because by looking at the examples it took me a
>     while to understand what the structure is really doing and I was
>     reluctant to start writing such a structure for Provenance.
>
>     Moreover, this doesn't help me to build this classical VOTable
>     itself. I don't think the object/relational mapping of a data
>     model is currently done in the proposed spec. This is a real
>     missing point.
>
>
> Two answers here. First, ORM is indeed in the current spec. Is 
> something missing/wrong/hard-to-implement? Please let us know!
>
> Secondly, that's where collaboration is key. It's as if we were 
> building HTML5, a complex standard that needs to be implemented by 
> people with different skills in different domains for different use 
> cases, and while you build it you can't ask questions on Stack 
> Overflow, because nothing is there yet: we are building it from scratch!
>
> [And actually, if you compare the specs, VODML and the mapping 
> document are orders of magnitude simpler/shorter than HTML5, so we 
> shouldn't even need the sheer amount of collaboration that it took to 
> get HTML5 right.]
>
> Especially given the limited resources we have we can't really afford 
> working in isolation and just a few weeks before the Interops, with 
> the interaction during the Interop being limited to contemplating the 
> complexity of the world.
>
>     The variation in responses is almost infinite. The number and content of
>     possible involved columns is very large. So the model VO-DML
>     mapping to the VOTable response must be done dynamically, which 
>     probably forbids easy usage of public libraries.
>
>
> This point has been made several times, and I believe Markus has 
> responded several times that to him this is not a big deal and he's 
> implemented that logic in his services. I don't want to speak for 
> anybody, so I'll leave it at that.
>
>     In our approach, by contrast, the "TAP schema" representation of the
>     model contains everything needed to describe the model and identify
>     it in the response, as long as we consider it a valid relational
>     transcription of the vo-dml xml representation of the model. Can
>     we standardize that?
>
>
> Yes! Your contributions to the work we've done so far will be invaluable!
OK for a discussion on this ORM point.

Cheers
François
> Thanks,
>
> Omar.
>
>
>>         PROV-VOTable is nothing else than the transcription in
>>         VOTable of the tables and relationships of the *vodml-xml model*.
>>         [...]
>>         PROV-VOTable format can allow input/output operations of a
>>         whole project provenance metadata in a database system.
>>
>>
>>     Note my own edit in bold face.
>>
>>     It's the same goal as your PROV-VOTable, really, just
>>     standardized for all data models rather than ad-hoc for a single
>>     model.
>>
>>     We are definitely going to discuss suggestions on the
>>     simplification of the syntax for the Mapping WD, so if you have
>>     any such suggestions please send them our way!
>>
>>     By currently focusing on important data models like STC, Cube,
>>     and TimeSeries (and Provenance asap) we can finally see
>>     everything coming together. With such implementation experience,
>>     actual serializations of actual models to look at and compare we
>>     can then go back to discuss the syntax more pragmatically.
>>
>>     Best,
>>
>>     Omar.
>>
>>     -- 
>>     Omar Laurino
>>     Smithsonian Astrophysical Observatory
>>     Harvard-Smithsonian Center for Astrophysics
>>     100 Acorn Park Dr. R-377 MS-81
>>     02140 Cambridge, MA
>>     (617) 495-7227
>
>
>
>
> -- 
> Omar Laurino
> Smithsonian Astrophysical Observatory
> Harvard-Smithsonian Center for Astrophysics
> 100 Acorn Park Dr. R-377 MS-81
> 02140 Cambridge, MA
> (617) 495-7227
