IVOA Provenance DM -RFC- answers to comments

Tue Nov 6 14:11:43 CET 2018

Hi all,

I need to refocus the discussion, as it is becoming unnecessarily complex and
confusing. A derivation is a derivation; there is no need to extract more
meaning out of it (for the W3C definition, see
https://www.w3.org/TR/prov-dm/#term-Derivation). A generic client should
answer the question: does this entity have a wasDerivedFrom relation?
Yes or no, and which ones if they exist. Its meaning comes from the people or
project that recorded the relation. A dedicated client for a project may
make more sense out of a given derivation; a generic client does not need to.
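As a minimal sketch of that yes/no question, assuming a hypothetical in-memory list of wasDerivedFrom records (not the IVOA serialization or any real service API), a generic client only has to filter on the generated entity and report what it finds, without interpreting the derivation:

```python
from typing import NamedTuple


class WasDerivedFrom(NamedTuple):
    """One derivation record: generated_entity was derived from used_entity."""
    generated_entity: str
    used_entity: str


def derivations_of(entity_id: str, relations: list[WasDerivedFrom]) -> list[str]:
    """Return the entities that entity_id was derived from (empty list means "no")."""
    return [r.used_entity for r in relations if r.generated_entity == entity_id]


# Hypothetical entity identifiers, for illustration only.
records = [
    WasDerivedFrom("dl1_run42", "dl0_run42"),
    WasDerivedFrom("dl2_run42", "dl1_run42"),
]

sources = derivations_of("dl1_run42", records)
print("yes" if sources else "no", sources)  # yes ['dl0_run42']
```

Whatever meaning a project attaches to a given derivation stays outside this function; a project-specific client could add that layer on top.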

Here is the paragraph on derivation in the PR:
"Note that the \class{WasDerivedFrom} relation cannot always automatically be
inferred from following existing \class{WasGeneratedBy} and \class{Used}
relations alone.
If there is more than one input and more than one output to an activity, it is
not clear which entity was derived from which. Only by specifying
the descriptions and roles accordingly, or by adding a \class{WasDerivedFrom}
relation, this direct derivation becomes known."
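To make the PR's point concrete, here is a small sketch, assuming Used and WasGeneratedBy are simplified to hypothetical (entity_id, activity_id) tuples. A naive inference can only pair every output of an activity with every input:

```python
from itertools import product

# Simplified stand-ins for the Used and WasGeneratedBy relations:
# each record is (entity_id, activity_id). Identifiers are hypothetical.
Relation = tuple[str, str]


def candidate_derivations(activity_id: str,
                          used: list[Relation],
                          generated: list[Relation]) -> list[tuple[str, str]]:
    """Return every (generated_entity, used_entity) pair that a naive
    inference from Used + WasGeneratedBy would propose for one activity."""
    inputs = [e for e, a in used if a == activity_id]
    outputs = [e for e, a in generated if a == activity_id]
    return list(product(outputs, inputs))


# One input, one output: the only candidate is the actual derivation.
simple = candidate_derivations(
    "convert", used=[("raw.fits", "convert")], generated=[("image.fits", "convert")])
print(simple)  # [('image.fits', 'raw.fits')]

# Two inputs, two outputs: four candidates, and these relations alone
# do not say which of them are real derivations.
ambiguous = candidate_derivations(
    "calibrate",
    used=[("science_raw.fits", "calibrate"), ("flatfield.fits", "calibrate")],
    generated=[("science_cal.fits", "calibrate"), ("report.txt", "calibrate")],
)
print(len(ambiguous))  # 4
```

In the second case, following the flatfield example discussed in this thread, only ('science_cal.fits', 'science_raw.fits') would be a derivation. Nothing in the tuples distinguishes it from the other three pairs, so that knowledge has to come from roles, descriptions, or an explicit wasDerivedFrom.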

Here is a preliminary diagram of what the calibration data flow for CTA
could look like:
https://banshee.obspm.fr/index.php/s/BRuf26L1sdX085u
Please let me have derivations, at least between data levels (e.g. DL0 to
DL1), so that I don't have to dig through all the complex relations to find
the main progenitors. Also, I don't want the parameters, descriptions,
context or other side entities of my activities to be exposed automatically
as progenitors. Used+wasGeneratedBy does not always mean wasDerivedFrom.
The precise derivations can be explained textually in the descriptions, but
the derivation relation helps to automatically find the relevant provenance
information in the mass of provenance data.
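A toy sketch of the difference, assuming a hypothetical three-level chain (DL0 to DL1 to DL2) with made-up entity and activity names: the same breadth-first walk returns every side entity when it follows wasGeneratedBy then used, but only the main progenitors when it follows wasDerivedFrom.

```python
from collections import deque

# Hypothetical toy provenance graph for two processing steps (DL0 -> DL1 -> DL2).
# wasGeneratedBy: entity -> activity that generated it
was_generated_by = {"dl1_events": "calibrate", "dl2_events": "reconstruct"}
# used: activity -> every entity it used, including side entities
used = {
    "calibrate": ["dl0_events", "calib_coefficients", "config_params"],
    "reconstruct": ["dl1_events", "reco_model"],
}
# wasDerivedFrom: entity -> entities it was explicitly recorded as derived from
was_derived_from = {"dl1_events": ["dl0_events"], "dl2_events": ["dl1_events"]}


def ancestors(entity, parents_of):
    """Breadth-first walk returning every ancestor reachable through parents_of."""
    seen, order = set(), []
    queue = deque(parents_of(entity))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(parents_of(current))
    return order


def via_used(entity):
    """Parents obtained by following wasGeneratedBy, then used."""
    activity = was_generated_by.get(entity)
    return used.get(activity, []) if activity else []


def via_derivation(entity):
    """Parents obtained from explicit wasDerivedFrom relations only."""
    return was_derived_from.get(entity, [])


print(ancestors("dl2_events", via_used))
# ['dl1_events', 'reco_model', 'dl0_events', 'calib_coefficients', 'config_params']
print(ancestors("dl2_events", via_derivation))
# ['dl1_events', 'dl0_events']
```

The traversal code is identical in both calls; only the relation it follows changes, which is why an explicitly recorded derivation is enough to filter out parameters and other side entities.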

Here is the working group page with the discussions. Probably not
everything is captured in the minutes, but it gives a good idea of the
topics discussed, including derivation at times. Sorry if the draft does not
reflect all those discussions, for obvious reasons, but the paragraph in the
PR did not come out of nowhere.
http://wiki.ivoa.net/twiki/bin/view/IVOA/ObservationProvenanceDataModel

I would also like to discuss a more important topic: what is relevant
provenance information? The W3C structure allows anyone to store a huge
mass of provenance *data*; however, only part of it is relevant provenance
*information*. The proposed extended model for the astronomy domain aims at
guiding projects to store the information that is relevant in astronomy.
But this is not sufficient: a project should then select precisely the
provenance information relevant to its application, i.e. perhaps not
everything should be recorded, just the minimum relevant information.

Cheers,
Mathieu

On Tue, 6 Nov 2018 at 10:53, Ole Streicher <ole at aip.de> wrote:

> On 06.11.2018 01:13, Mathieu Servillat wrote:
> > On Mon, 5 Nov 2018 at 16:45, Ole Streicher <ole at aip.de
> >     > However, one important point: the IVOA core model is just the W3C
> >     > core model, so this would not stand as an IVOA recommendation,
> >     > maybe a note stating that the IVOA should restrict its provenance
> >     > to W3C provenance...
> >
> >     What is the problem with that? If we can state that our data model is
> >     just the general one, then we did a good job, didn't we?
> >
> > you seem to easily forget all the discussions we had on the need to
> > include descriptions, configuration and context for our use cases. This
> > information is simply *relevant* to assess the quality, reliability and
> > usefulness of entities in several use cases. This should be acknowledged
> > and respected.
>
> I don't; however, a number of the discussions are not settled yet: quite
> a few points in my RFC comments have remained unanswered for two weeks,
> for example how to handle Context and Configuration (they are basically
> roles, not inherent properties of an Entity).
>
> You can't ignore these points and then just argue that I "forgot" the
> discussion.
>
> >     Even among the authors, it is therefore unclear what the use of
> >     wasDerivedFrom is:
> >
> > seriously... Is this constructive? I wrote that they are not *just*
> > shortcuts. Why do you want to see this as an opposition? This shortcut
> > is simply not obvious, and can carry more information than just a
> > shortcut.
> >
> >     * Mireille sees them as a shortcut to speed up the search,
> >
> >     * You see them as additional information
> >
> >     When even we do not agree here, what should a potential client
> >     software assume?
> >
> > Sorry, this is not relevant. A client software for what objective?
>
> A generic client. The idea of Provenance as an IVOA standard is that
> anyone can take the standard and by just using it write a client (or a
> query) for any compliant server. So, if we have alternative ways to
> express the same thing, a generic client/query needs to handle all
> cases, since it does not know what is actually used. You can't rely on a
> "wasDerivedFrom" shortcut, since it may be implemented by CDS, but not
> in MuseWISE.
>
> > to make sense of a single wasDerivedFrom relation? The provenance of an
> > entity is not conveyed by the wasDerivedFrom relation alone. What is
> > extracted is a graph, which contains different relations to other
> > entities (and activities and agents). If this graph contains
> > wasDerivedFrom relations, then they bring additional information to
> > assess the quality/reliability/usefulness of something: maybe a shortcut
> > to the main progenitor, maybe a derivation from another entity, which
> > may be further explored... There is a user and probably a project behind
> > all this.
>
> There currently seem to be at least three different purposes of
> "wasDerivedFrom":
>
> 1) /just/ as a shortcut in an existing entity-activity-entity graph,
> 2) when the activity is unknown,
> 3) providing additional information beyond the role in the "used" relation
>
> The first point is just a (questionable) implementation optimization,
> as Mireille pointed out.
>
> The second may be replaced by an ad-hoc Activity and used/wasGeneratedBy
> relations, as suggested by Markus D.
>
> For the third, you didn't bring an example yet; and it may be that it is
> not so important to have it *in the first version* of the standard.
>
> >     > Moreover, it is not clear by just following used+wasGeneratedBy
> >     > that an entity is derived from another (in particular if we include
> >     > many configuration parameters as general entities, hence the
> >     > proposition to separate the handling of configuration parameters).
> >     > For example, for an activity that applies a flatfield to a science
> >     > image: the output image is derived from the input science image,
> >     > but not from the flatfield image.
> >
> >     This example shows IMO that it is not so simple: if you find some
> >     artefacts on the final image, you may want to ask "what was the
> >     progenitor" to get this, but the artefact may come from either the
> >     flatfield or the science image. So, to investigate the artefact,
> >     both are progenitors. If you really want to have the science
> >     progenitor, you can always ask for the "used" relation with the
> >     corresponding role and don't need that shortcut.
> >
> > not bad, but not to the point. The derivation brings additional
> > provenance information, not possible to carry by a used relation (that
> > may even not exist). You can twist the examples if you want, but the
> > argument is still here.
>
> It was *your* example, not mine. It is something that is already carried
> by the role of the "used" relation, and not a use case that requires
> wasDerivedFrom.
>
> >     There are use cases where we would need them; but before we introduce
> >     them into our model, we should make clear to *ourselves* what they are
> >     and when they should be used. In the meantime, we could probably live
> >     without them.
> >
> > sure, I remember deep discussions on wasDerivedFrom, accumulating
> > examples and external references, and we couldn't get rid of it because
> > it brings information that cannot be carried by the other relations.
>
> It would be great if you could point to these examples. The suggestion
> was brought up by Markus, and he should get a comprehensive answer,
> not just "it didn't work for us".
>
> And the point here is: we could start with a simple model, gather
> experience on how far we (and others) *really* get, and then add the
> required extensions.
>
> For example, what is currently distributed as provenance (in FITS
> files, e.g. from ESO) is usually just the input files and values, and the
> recipe name and version of the last processing step. Obviously this
> already helps the majority of people to understand how the file was
> generated. If we implemented *just that*, we would already serve a large
> part of our community, and this would be *simple*.
>
> And then we can think about which of our use cases are *really* relevant
> for the standard. For example, Mireille's "re-organize some computing
> steps" is an internal requirement of CDS and could be supported by their
> internal structures. Why should this be a relevant use case for an IVOA
> provenance standard?
>
> Having a simple standard first (like our core model, or even just
> activities, entities, used, wasGeneratedBy) would be easy to adopt on
> both the server and client side. We could therefore soon have
> implementations, on both the server and client side, from outside the
> working group as well. It would also help people get used to the
> terminology: just the four words "Entity", "Activity", "used" and
> "wasGeneratedBy" already help to understand most of the provenance, and
> they are used unambiguously in such a simple model.
>
> That would be a typical "80/20 solution": it solves 80 percent of the
> use cases with only 20 percent of the effort/complexity.
>
> > This is about the same for the other parts of the model, which was
> > progressively acknowledged by the group to reach the current PR.
>
> Many parts of the PR were *not* acknowledged "by the group". You
> keep treating us as if we were not part of the group. To make that
> clear: both Anastasia and I are part of the provenance working group,
> and our arguments should be respected as well (and not ignored, as you
> did over the last two weeks). You can't speak "for the group" when
> announcing or discussing the PR.
>
> Best regards
>
> Ole
>

-- 
Dr. Mathieu Servillat
Laboratoire Univers et Théories, Building 18, Office 221
Observatoire de Paris-Meudon
5 place Jules Janssen
92195 Meudon, France
Tel. +33 1 45 07 78 62