IVOA Provenance DM -RFC- answers to comments

Mathieu Servillat mathieu.servillat at obspm.fr
Mon Nov 5 15:16:28 CET 2018


Hi Markus,

Thanks for your comments. As Mireille answered, I would follow most of your
advice, which would clarify the standard.

However, one important point: the IVOA core model is just the W3C core
model, so on its own it would not stand as an IVOA recommendation; at most it
would be a note stating that the IVOA should restrict its provenance to W3C
provenance... The "flesh" of the model in the astronomy domain is in the
extended model (Descriptions, Configuration and Context of activities). We
simply go through the door opened by the W3C model, using the "extensibility
points" defined in the W3C PROV DM (e.g. specifying type and role
attributes), and link this core structure to relevant provenance information
within the IVOA concepts.
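To make the idea of extensibility points concrete, here is a minimal Python sketch, assuming a simple in-memory representation. Only the name ActivityDescription comes from the draft; the field names, the "description" attribute used as the link, and the type vocabulary are hypothetical placeholders, not the IVOA model itself. A plain W3C Activity stays valid, while the extended information hangs off its attributes and can be used to search for a processing type.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ActivityDescription:
    """Extended model: describes a kind of processing once, for many runs."""
    id: str
    name: str
    type: str  # hypothetical vocabulary, e.g. "calibration"


@dataclass
class Activity:
    """Core (W3C) model: one execution of a process."""
    id: str
    # PROV records accept extra attributes; the "description" key used as
    # the link to the extended model is a hypothetical choice.
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Used:
    activity: str
    entity: str
    role: Optional[str] = None  # prov:role says how the entity was used


def activities_of_type(activities: List[Activity],
                       descriptions: Dict[str, ActivityDescription],
                       processing_type: str) -> List[str]:
    """Return the ids of activities whose description has the given type."""
    found = []
    for act in activities:
        desc = descriptions.get(act.attributes.get("description", ""))
        if desc is not None and desc.type == processing_type:
            found.append(act.id)
    return found


descriptions = {
    "desc:flat": ActivityDescription("desc:flat", "apply_flatfield", "calibration"),
    "desc:stack": ActivityDescription("desc:stack", "stack_images", "combination"),
}
activities = [
    Activity("act:1", {"description": "desc:flat"}),
    Activity("act:2", {"description": "desc:stack"}),
    Activity("act:3"),  # plain W3C activity, no extension at all
]
used = [Used("act:1", "img:raw", role="science"),
        Used("act:1", "img:flat", role="flatfield")]

print(activities_of_type(activities, descriptions, "calibration"))  # ['act:1']
```

The design choice illustrated is that the core records never need to change: a client that ignores the "description" attribute still sees valid W3C provenance.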

As for the wasInformedBy and wasDerivedFrom relations: they are not just
shortcuts, and they are not presented as such in the document.
"wasInformedBy" can be seen, for example, as a signal that triggers another
activity (or any other form of signal; signals are extremely common in our
instruments and on the web in general). "wasDerivedFrom" is a general
relation that can also express specialization and revisions. Moreover,
following used+wasGeneratedBy alone does not make it clear that an entity is
derived from another (in particular if we include many configuration
parameters as general entities, hence the proposal to handle configuration
parameters separately). For example, consider an activity that applies a
flatfield to a science image: the output image is derived from the input
science image, but not from the flatfield image. Furthermore, we could redo
the processing with a regenerated flatfield; in that case, this second output
image is also a revision of the first output image, which can be stored as a
wasDerivedFrom relation of type "revision". This is not developed in the
current model, but these extensions are included in the W3C PROV DM and may
become relevant to us in future versions.
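The flatfield example can be sketched in a few lines of Python, assuming relations stored as plain tuples; all identifiers are hypothetical. The revision is tagged with prov:type "prov:Revision", which is how the W3C PROV DM expresses a revision as a kind of derivation. The sketch shows that the inputs of the generating activity (used+wasGeneratedBy) include the flatfield, whereas the explicit wasDerivedFrom records do not.

```python
# Hypothetical provenance records for the flatfield example.
used = [("act:flat1", "img:science"), ("act:flat1", "img:flat_v1"),
        ("act:flat2", "img:science"), ("act:flat2", "img:flat_v2")]
was_generated_by = [("img:out1", "act:flat1"), ("img:out2", "act:flat2")]

# Explicit derivations: (generated entity, source entity, optional prov:type).
was_derived_from = [
    ("img:out1", "img:science", None),
    ("img:out2", "img:science", None),
    ("img:out2", "img:out1", "prov:Revision"),  # reprocessed with a new flat
]


def inputs_via_activity(entity):
    """All entities used by the generating activity (a superset of derivation)."""
    activities = {act for ent, act in was_generated_by if ent == entity}
    return {ent for act, ent in used if act in activities}


def derived_from(entity):
    """Sources explicitly recorded with wasDerivedFrom."""
    return {src for gen, src, _ in was_derived_from if gen == entity}


def revision_of(entity):
    """Sources of which this entity is recorded as a revision."""
    return {src for gen, src, kind in was_derived_from
            if gen == entity and kind == "prov:Revision"}


print(sorted(inputs_via_activity("img:out1")))  # ['img:flat_v1', 'img:science']
print(sorted(derived_from("img:out1")))         # ['img:science']
print(sorted(revision_of("img:out2")))          # ['img:out1']
```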

I agree with you that some sentences need to be more normative (e.g. on the
possibility of storing PROV serializations in FITS, or on some optional
attributes...). This is partly because we still need to check with the DM
group what is relevant and what is not. For example, should we really fix
the way to transport provenance in FITS files?

Cheers,
Mathieu


On Mon, 5 Nov 2018 at 10:04, Ole Streicher <ole at aip.de> wrote:

> Hi all,
>
> I have shortened the text to the points I have comments on:
>
> On 04.11.2018 20:26, Mireille LOUYS wrote:
> > On 29/10/2018 at 17:26, Markus Demleitner wrote:
> >> TL;DR: let's only have the core model in 1.0.  We can always add
> >> extensions in 1.1.
>
> > We need the ActivityDescription class and the Parameter class to be
> > able to search for a specific processing type in the data. Activity is
> > only the process launched for the computation. It does not hold the
> > details of the methods, because those details are factorised in the
> > ActivityDescription class.
>
> We could move (only) the ActivityDescription as a prov:Plan into the
> core model.
>
> There is general doubt about treating Parameters as special thingies that
> carry only configuration: Entities may already have the properties needed
> for Parameters.
>
> There is an ongoing discussion about Parameters; however, it has been
> stalled for two weeks, since nobody has responded to the latest criticism
> of this concept.
>
> >> So, let me plead again: Are the shortcuts *really* so valuable to
> >> you that it's worth burdening our implementors with them?
>
> > The wasDerivedFrom relation is a straightforward link when you want
> > to list the progenitor entities of one or more datasets. In the
> > triplestore implementation, for instance, it really speeds up the
> > search. In the relational DB, it avoids table joins.
>
> The problem here is that a user or a client never knows whether
> wasDerivedFrom was used in a specific case, so they always have to query
> via wasGeneratedBy+used as well. This means that, *in addition* to the
> (still required) table joins, you need to query another link. That is much
> more complicated than without wasDerivedFrom.
>
> >> (m) Sect 2.1.4 Activity -- what's the rationale for making
> >> startTime and endTime mandatory?
>
> > The time stamps are a way to check the order/sequence of Activities
> > and chain them.
>
> That is also possible if they are optional. And, the order of Activities
> (and chaining them) should be determined by their *logical* order (so,
> which Activity used an Entity generated by another Activity) and not by
> their temporal order.
>
> > 'wasInformedBy' is only an optional relation, not a required one.
> > These can be null if they have not been recorded.
>
> The question is why we *need* it. What additional information does it
> carry (compared with Markus' proposal to put an additional Entity here)?
>
> > It allows one to search for long or short activities and, for
> > instance, to reorganize some of the recomputing steps.
>
> What is the use case for an external user to search for long or short
> activities? Reorganizing computing steps is (internal) workflow, not
> provenance. And, again, it does not require these attributes to be
> mandatory.
>
> >> (o) Sect. 2.1.5 Used/@time, WasGeneratedBy/@time -- are there
> >> really important use cases in which these couldn't be replaced by
> >> the activity's startTime and endTime (operationally, not
> >> conceptually)?
>
> > Agreed: we store the main time information in the Activity
> > startTime/endTime. When we are interested in an Entity/dataset, we can
> > check the corresponding time details on the associated dataset, given
> > for instance by the ObsCore view, which already contains the creation
> > date, etc.
>
> prov:Entity already has a prov:generatedAtTime attribute.
>
> >> I've also done some very minor changes that I hope are
> >> uncontroversial in rev. 5209.
>
> > Please refrain from editing the document. We have a dedicated editor
> > for it, and the document is open for review and comments. It will not
> > speed up the process if everyone changes the document according to
> > their own view. Consensus first, then updates by the editor: that is
> > more efficient.
>
> Well, the current PR is not based on a consensus in the working group.
>
> Best regards
>
> Ole
>
>
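Regarding the point above that the order of Activities can be recovered from their logical dependencies rather than from timestamps, here is a minimal sketch using Python's standard graphlib module (Python 3.9+). The records and identifiers are hypothetical; no startTime or endTime is needed to chain the activities, only which activity used an entity generated by another.

```python
from graphlib import TopologicalSorter

# Hypothetical records without any timestamps.
used = [("act:calibrate", "img:raw"), ("act:calibrate", "img:flat"),
        ("act:stack", "img:calibrated")]
was_generated_by = [("img:flat", "act:make_flat"),
                    ("img:calibrated", "act:calibrate"),
                    ("img:stacked", "act:stack")]


def activity_dependencies(used, was_generated_by):
    """Map each activity to the activities that generated the entities it used."""
    generator = {entity: act for entity, act in was_generated_by}
    deps = {act: set() for _, act in was_generated_by}
    for act, entity in used:
        deps.setdefault(act, set())
        if entity in generator:  # raw inputs have no recorded generator
            deps[act].add(generator[entity])
    return deps


order = list(TopologicalSorter(activity_dependencies(used, was_generated_by)).static_order())
print(order)  # ['act:make_flat', 'act:calibrate', 'act:stack']
```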

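The concern raised in the quoted discussion, that a client cannot know whether wasDerivedFrom was recorded, means a generic progenitor search has to follow both paths. A hedged sketch of such a traversal follows; the function and the records (one derivation recorded without any activity) are hypothetical.

```python
def progenitors(entity, was_derived_from, used, was_generated_by):
    """Return all ancestors of an entity, following BOTH wasDerivedFrom and
    the used+wasGeneratedBy path, since either may be missing for a given link."""
    generated_by = {}
    for ent, act in was_generated_by:
        generated_by.setdefault(ent, set()).add(act)
    inputs = {}
    for act, ent in used:
        inputs.setdefault(act, set()).add(ent)
    derived = {}
    for ent, src in was_derived_from:
        derived.setdefault(ent, set()).add(src)

    seen, stack = set(), [entity]
    while stack:  # iterative depth-first traversal
        current = stack.pop()
        parents = set(derived.get(current, set()))
        for act in generated_by.get(current, set()):
            parents |= inputs.get(act, set())
        for parent in parents - seen:
            seen.add(parent)
            stack.append(parent)
    return seen


# Hypothetical records: the catalogue link exists only as wasDerivedFrom,
# the image link only as used+wasGeneratedBy.
was_derived_from = [("cat:sources", "img:out")]
used = [("act:flat", "img:science"), ("act:flat", "img:flat")]
was_generated_by = [("img:out", "act:flat")]

print(sorted(progenitors("cat:sources", was_derived_from, used, was_generated_by)))
# ['img:flat', 'img:out', 'img:science']
```

Querying only one of the two paths would miss part of the ancestry in this example, which is the extra cost being discussed.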
-- 
Dr. Mathieu Servillat
Laboratoire Univers et Théories, Building 18, Office 221
Observatoire de Paris-Meudon
5 place Jules Janssen
92195 Meudon, France
Tel. +33 1 45 07 78 62
--

