IVOA Provenance DM -RFC- answers to comments

Ole Streicher ole at aip.de
Mon Nov 5 09:57:23 CET 2018


Hi all,

I have shortened the text to the points on which I have comments:

On 04.11.2018 20:26, Mireille LOUYS wrote:
> On 29/10/2018 at 17:26, Markus Demleitner wrote:
>> TL;DR: let's only have the core model in 1.0.  We can always add 
>> extensions in 1.1.

> we need the ActivityDescription class and Parameter class to be able 
> to search for some specific processing type on the data. Activity is 
> only the process launched for the computation. It does not hold the 
> details of the methods, because those details are factorised in the
> ActivityDescription class.

We could move (only) the ActivityDescription as a prov:Plan into the
core model.

Parameters are generally questioned as being special thingies that carry
only configuration. Entities already may have the properties needed for
Parameters.

There is an ongoing discussion about Parameters; however, it has been
stalled for two weeks, since nobody has responded to the latest
criticism of this concept.

>> So, let me plead again: Are the shortcuts *really* so valuable to 
>> you that it's worth burdening our implementors with them?

> The wasDerivedFrom relation is a straightforward link when you want 
> to list the progenitor entities for one/some datasets. In the
> Triplestore implementation for instance it really speeds up the 
> search. In the relational DB it avoids table joins.

The problem here is always that a user or a client does not know whether
wasDerivedFrom was used in the specific case. So they always have to
query via wasGeneratedBy+used as well. This means that, *in addition*
to the (still required) table joins, you need to query another link.
That is much more complicated than without wasDerivedFrom.
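
As a rough sketch of this point in Python with sqlite3 (the table and
column names are made up for illustration and are not taken from the
draft): a client that wants all progenitors of an entity has to combine
the join over wasGeneratedBy+used with the wasDerivedFrom lookup, because
it cannot know which of the two was recorded.

```python
import sqlite3

# Hypothetical relational layout; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE used (activity TEXT, entity TEXT);
CREATE TABLE was_generated_by (entity TEXT, activity TEXT);
CREATE TABLE was_derived_from (derived TEXT, source TEXT);

-- img1: only the full path via an Activity was recorded
INSERT INTO used VALUES ('act1', 'raw1');
INSERT INTO was_generated_by VALUES ('img1', 'act1');

-- img2: both the full path and the shortcut were recorded
INSERT INTO used VALUES ('act2', 'raw2');
INSERT INTO was_generated_by VALUES ('img2', 'act2');
INSERT INTO was_derived_from VALUES ('img2', 'raw2');

-- img3: only the shortcut was recorded
INSERT INTO was_derived_from VALUES ('img3', 'raw3');
""")


def progenitors(entity):
    """Return all progenitor entities, whichever relation recorded them."""
    # Path 1: the join over wasGeneratedBy + used (always required).
    via_activity = conn.execute(
        "SELECT u.entity FROM was_generated_by g "
        "JOIN used u ON u.activity = g.activity "
        "WHERE g.entity = ?",
        (entity,),
    ).fetchall()
    # Path 2: the wasDerivedFrom shortcut (an additional query).
    via_shortcut = conn.execute(
        "SELECT source FROM was_derived_from WHERE derived = ?",
        (entity,),
    ).fetchall()
    return {row[0] for row in via_activity} | {row[0] for row in via_shortcut}
```

Querying only the shortcut would miss the progenitor of img1, and
querying only the join would miss that of img3, so the client ends up
running both queries in every case.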

>> (m) Sect 2.1.4 Activity -- what's the rationale for making 
>> startTime and endTime mandatory?

> The time stamps are a way to check the order/sequence of Activities 
> and chain them.

That is also possible if they are optional. Moreover, the order of
Activities (and their chaining) should be determined by their *logical*
order (i.e., which Activity used an Entity generated by another
Activity) and not by their temporal order.
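
A minimal sketch of what I mean by logical order, in Python (the
activity and entity names are invented, and graphlib requires
Python 3.9 or later): the sequence of Activities follows from the used
and wasGeneratedBy relations alone, without any time stamps.

```python
from graphlib import TopologicalSorter

# Hypothetical provenance records; all names are made up for illustration.
# used: activity -> entities it used
used = {
    "extract": ["stacked"],
    "calibrate": ["raw"],
    "stack": ["calibrated"],
}
# wasGeneratedBy: entity -> activity that generated it
was_generated_by = {
    "calibrated": "calibrate",
    "stacked": "stack",
    "catalogue": "extract",
}


def logical_order(used, was_generated_by):
    """Order activities so that each comes after the activities
    that generated the entities it used."""
    predecessors = {}
    for activity, entities in used.items():
        predecessors[activity] = {
            was_generated_by[entity]
            for entity in entities
            if entity in was_generated_by  # e.g. 'raw' has no recorded origin
        }
    return list(TopologicalSorter(predecessors).static_order())


order = logical_order(used, was_generated_by)
```

Here order comes out as calibrate, stack, extract, although no startTime
or endTime is recorded anywhere.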

> 'wasInformedBy' is only an optional relation and is not required. 
> These can be null if this has not been recorded.

The question is why we *need* it. What additional information does it
provide (compared to Markus' proposal to put an additional Entity here)?

> It allows searching for long or short activities and reorganizing
> some of the re-computing steps, for instance.

What is the use case for an external user to search for long or short
activities? Reorganizing computing steps is (internal) workflow
management, not provenance. And, again, these attributes do not need to
be mandatory for that.

>> (o) Sect. 2.1.5 Used/@time, WasGeneratedBy/@time -- are there 
>> really important use cases in which these couldn't be replaced by 
>> the activity's startTime and endTime (operationally, not 
> conceptually)?

> Agreed to store the main time information in Activity
> StartTime/EndTime. When we are interested in an Entity/Data, we can
> check the time details of the associated dataset, given for
> instance by the Obscore view, which already contains the date of
> creation etc.

prov:Entity already has a prov:generatedAtTime attribute.

>> I've also done some very minor changes that I hope are 
>> uncontroversial in rev. 5209.

> Please refrain from editing the document. We have a dedicated editor 
> for it and the document is open for review and comments. It will not 
> speed up the process if everyone changes the document according to 
> her/his own view. Consensus first, and then updates by the editor is 
> more efficient.

Well, the current PR is not based on a consensus in the working group.

Best regards

Ole


