IVOA Provenance DM -RFC- answers to comments

Ole Streicher ole at aip.de
Tue Nov 6 10:53:17 CET 2018


On 06.11.2018 01:13, Mathieu Servillat wrote:
> On Mon, 5 Nov 2018 at 16:45, Ole Streicher <ole at aip.de
>     > However, one important point: the IVOA core model is just the W3C core
>     > model, so this would not stand as an IVOA recommendation, maybe a note
>     > stating that the IVOA should restrict its provenance to W3C
>     > provenance...
> 
>     What is the problem with that? If we can state that our data model is
>     just the general one, then we did a good job, didn't we?
> 
> you seem to forget easily all the discussions we had on the necessity to
> include descriptions, configuration and context for our use cases. This
> information is simply *relevant* to assess the quality, reliability and
> usefulness of entities in several use cases. This should be acknowledged
> and respected. 

I don't; however, a number of the discussions are not settled yet: quite
a few points in my RFC comments have remained unanswered for two weeks,
for example how to handle Context and Configuration (they are basically
roles, not inherent properties of an Entity).

You can't ignore these points and then just argue that I "forgot" the
discussion.

>     Even among the authors, it is therefore unclear what the use of
>     wasDerivedFrom is:
> 
> seriously... Is this constructive? I wrote that they are not *just*
> shortcuts. Why do you want to see this as an opposition? This shortcut
> is simply not obvious, and can carry more information than just a shortcut.
> 
>     * Mireille sees them as a shortcut to speed up the search,
> 
>     * You see this as additional information
> 
>     When even we do not agree here, what should a potential client software
>     assume?
> 
> Sorry, this is not relevant. A client software for what objective?

A generic client. The idea of Provenance as an IVOA standard is that
anyone can take the standard and by just using it write a client (or a
query) for any compliant server. So, if we have alternative ways to
express the same thing, a generic client/query needs to handle all
cases, since it does not know what is actually used. You can't rely on a
"wasDerivedFrom" shortcut, since it may be implemented by CDS, but not
in MuseWISE.
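
To illustrate the burden on a generic client, here is a minimal sketch
in Python. The tuple-based graph representation, the file names and the
function name `progenitors` are assumptions made up for illustration;
none of them come from the PR. The client has to check both the shortcut
and the full used/wasGeneratedBy path, since it cannot know which one a
given server provides:

```python
def progenitors(relations, entity):
    """Return the entities that `entity` may originate from.

    `relations` is a list of (kind, subject, object) tuples, a
    hypothetical in-memory stand-in for a server response.
    """
    found = set()
    for kind, subj, obj in relations:
        # Variant A: the server stores the wasDerivedFrom shortcut.
        if kind == "wasDerivedFrom" and subj == entity:
            found.add(obj)
        # Variant B: the server stores only the full path
        # entity -wasGeneratedBy-> activity -used-> entity.
        if kind == "wasGeneratedBy" and subj == entity:
            for kind2, subj2, obj2 in relations:
                if kind2 == "used" and subj2 == obj:
                    found.add(obj2)
    return found


# Two hypothetical servers describing the same processing step.
server_with_shortcut = [("wasDerivedFrom", "reduced.fits", "raw.fits")]
server_without_shortcut = [
    ("wasGeneratedBy", "reduced.fits", "reduce"),
    ("used", "reduce", "raw.fits"),
]
```

Both inputs yield the same progenitor set, but only because the client
implements both branches.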

> to make sense of a single wasDerivedFrom relation? The provenance of an
> entity is not transported by just the wasDerivedFrom relation. What is
> extracted is a graph, which contains different relations to other
> entities (and activities and agents). If this graph contains
> wasDerivedFrom relations, then it brings additional information to
> assess the quality/reliability/usefulness of something: maybe a shortcut
> to the main progenitor, maybe a derivation from another entity, which
> may be further explored... There is a user and probably a project behind
> all this.

There currently seem to be at least three different purposes of
"wasDerivedFrom":

1) /just/ as a shortcut in an existing entity-activity-entity graph,
2) when the activity is unknown,
3) providing additional information beyond the role in the "used" relation

The first point is just a (questionable) implementation optimization,
as Mireille pointed out.

The second may be replaced by an ad-hoc Activity and used/wasGeneratedBy
relations, as suggested by Markus D.
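
A minimal sketch of that replacement, in Python. The tuple layout and
the placeholder activity identifier are assumptions for illustration
only; how such an ad-hoc Activity would actually be named is not defined
anywhere in this thread:

```python
def expand_derivation(derived, source):
    """Replace 'derived wasDerivedFrom source' by an ad-hoc Activity."""
    # The activity identifier is a made-up placeholder; it only states
    # that some unknown process turned `source` into `derived`.
    activity = f"unknown-activity:{source}->{derived}"
    return [
        ("used", activity, source),
        ("wasGeneratedBy", derived, activity),
    ]
```

The result uses only the two core relations, so a client that knows
nothing about wasDerivedFrom can still follow the path.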

For the third, you have not given an example yet; and it may be that it
is not so important to have it *in the first version* of the standard.

>     > Moreover, it is not clear by just following used+wasGeneratedBy that
>     > an entity is derived from another (in particular if we include many
>     > configuration parameters as general entities, hence the proposition to
>     > separate the handling of configuration parameters). For example, for
>     > an activity that applies a flatfield to a science image: the output
>     > image is derived from the input science image, but not from the
>     > flatfield image.
> 
>     This example shows IMO that it is not so simple: If you find some
>     artefacts on the final image, you may want to ask "what was the
>     progenitor" to get this -- but the artefact may come from either the
>     flatfield, or the science image. So, to investigate the artefact both
>     are progenitors. If you really want to have the science progenitor, you
>     can always ask for the "used" relation with the corresponding role and
>     don't need that shortcut.
> 
> not bad, but not to the point. The derivation brings additional
> provenance information, which cannot be carried by a used relation (which
> may not even exist). You can twist the examples if you want, but the
> argument is still here.

It was *your* example, not mine. It is something that is already carried
by the role of the "used" relation, and not a use case that requires
wasDerivedFrom.
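
As a sketch of what "carried by the role" means for the flatfield
example (Python; the role names "science" and "flatfield", the file
names and the function `inputs` are assumptions for illustration):

```python
# (activity, entity, role): one row per "used" relation
used = [
    ("apply_flatfield", "science_raw.fits", "science"),
    ("apply_flatfield", "flat.fits", "flatfield"),
]
# (entity, activity): one row per "wasGeneratedBy" relation
was_generated_by = [("science_flat.fits", "apply_flatfield")]


def inputs(entity, role=None):
    """Entities used to generate `entity`, optionally filtered by role."""
    activities = {a for e, a in was_generated_by if e == entity}
    return sorted(
        e for a, e, r in used
        if a in activities and (role is None or r == role)
    )
```

Calling `inputs("science_flat.fits")` returns both inputs, which is what
is needed to investigate an artefact; adding `role="science"` narrows
the result to the science progenitor without any wasDerivedFrom
relation.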

>     There are use cases where we would need them; but before we introduce
>     them into our model, we should make clear to *ourselves* what they are
>     and when they should be used. And in the meantime, we could probably
>     live without them.
> 
> sure, I remember deep discussions on wasDerivedFrom, accumulating
> examples and external references, and we couldn't get rid of it because
> it brings information that cannot be carried by the other relations.

It would be great if you could point to these examples. The suggestion
was brought up by Markus, and he should get a comprehensive answer;
not just an "it didn't work for us".

And the point here is: we could start with a simple model, gather
experience of how far we (and others) *really* get with it, and then add
the required extensions.

For example, what is currently distributed as provenance (in the FITS
files, e.g. from ESO) is usually just input files and values, recipe
name and version of the last processing step. Obviously this already
helps the majority of people to understand how the file was generated.
If we implemented *just that*, we would already serve a huge part of
our community, and this would be *simple*.
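
A rough sketch of such a "last processing step" record in Python. The
class and field names and the example values are hypothetical; they do
not correspond to actual FITS keywords or to anything in the PR:

```python
from dataclasses import dataclass, field


@dataclass
class LastStepProvenance:
    """Provenance of the last processing step only (hypothetical layout)."""
    recipe_name: str
    recipe_version: str
    input_files: list = field(default_factory=list)
    input_values: dict = field(default_factory=dict)

    def summary(self):
        # One human-readable line: which recipe ran on which files.
        files = ", ".join(self.input_files)
        return f"{self.recipe_name} {self.recipe_version} <- {files}"


# Made-up example record.
record = LastStepProvenance(
    recipe_name="example_recipe",
    recipe_version="1.0",
    input_files=["raw_001.fits", "bias.fits"],
    input_values={"threshold": 3.0},
)
```

In the terms of the model, this is one Activity (the recipe run) with
its "used" inputs, attached to the Entity it generated.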

And then we can think of which of our use cases are *really* relevant
for the standard. For example, Mireille's "re-organize some computing
steps" is an internal requirement of CDS and could be supported by their
internal structures. Why should this be a relevant use case for an IVOA
provenance standard?

Having a simple standard first (like our core model, or even just
activities, entities, used, wasGeneratedBy) would be easy to adopt on
both server and client side. We could therefore soon have
implementations of both servers and clients, and not only from the
working group. It would also help people to get used to the terminology
-- just the four words "Entity", "Activity", "used" and "wasGeneratedBy"
already help to understand most of the provenance, and they are
unambiguously used in such a simple model.
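
To give an idea of how little a client needs with just these four terms,
here is a sketch in Python that walks back through several processing
steps. The dictionaries, file names, activity names and the function
`lineage` are made up for illustration:

```python
used = {  # Activity -> Entities it used
    "bias_subtract": ["raw.fits", "bias.fits"],
    "flatfield": ["debiased.fits", "flat.fits"],
}
was_generated_by = {  # Entity -> Activity that generated it
    "debiased.fits": "bias_subtract",
    "final.fits": "flatfield",
}


def lineage(entity):
    """Return all (entity, activity, input) steps behind `entity`."""
    activity = was_generated_by.get(entity)
    if activity is None:
        return []  # no recorded origin: a raw or external entity
    steps = []
    for source in used.get(activity, []):
        steps.append((entity, activity, source))
        steps.extend(lineage(source))  # follow each input further back
    return steps
```

`lineage("final.fits")` lists every step back to the raw and calibration
files. The sketch assumes the graph has no cycles.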

That would be a typical "80/20 solution": it solves 80 percent of the
use cases, but with only 20 percent of the effort/complexity.

> This is about the same for the other parts of the model, which was
> progressively acknowledged by the group to reach the current PR.

Many parts of the PR were *not* acknowledged "by the group". You
keep treating us as if we were not part of the group. To make that
clear: Both Anastasia and I are part of the provenance working group,
and our arguments should be respected as well (and not ignored, as you
have done for the last two weeks). You can't speak "for the group" when
announcing or discussing the PR.

Best regards

Ole

