Astro-WISE perspective on provenance, Was: WasDerivedFrom vs. WasGeneratedBy

Kristin Riebe kriebe at aip.de
Tue Nov 7 09:21:41 CET 2017


Dear Hugo,

> There is one preference I have though. In Astro-WISE there is no real 
> difference between a workflow to create a new data product and the 
> provenance of an existing data product. A to-be created data product is 
> just like a created one without having the make() method called 
> (recursively if necessary). So what I'd like is a mechanism that 
> (somehow) supports this workflow-provenance duality. For example that 
> you could easily reuse the provenance of an existing data product to 
> create a new data product (after changing a parameter or so).
> 
> (If it were up to me, I would not use past tense like 'wasDerivedFrom', 
> 'wasGeneratedBy' and 'used', but nouns like 'progenitor', 'generator', 
> 'dependency', that way the same terminology can be used for provenance 
> as well as workflows. But this is just cosmetics and philosophy.)

We adopted the terminology of the W3C Provenance Data Model, which
states explicitly that it is meant to describe the past; hence the
past tense.
Using verb forms rather than nouns also makes it easier to distinguish
relations from objects. But yeah, it's a matter of taste.

To create a new data product based on the provenance of an existing
entity, one could build 'templates' by defining the appropriate
description classes. If another product needs to be created, the same
description classes can be reused, and only the activities/entities and
the parameter values need to be adjusted.
Mathieu is doing something similar with his UWS OPUS implementation,
using ActivityDescriptions to define which parameters are available etc.
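
To make the template idea a bit more concrete, here is a minimal Python
sketch. It is not taken from the data model or from OPUS: the classes
ActivityDescription and Activity are reduced to a few illustrative
fields, and the helper rerun_with() is a made-up name. The point is only
that the description is shared while parameter values and inputs change.

```python
from dataclasses import dataclass


@dataclass
class ActivityDescription:
    """Template: names a processing step and the parameters it accepts."""
    name: str
    parameter_defaults: dict


@dataclass
class Activity:
    """One concrete (past or planned) execution of a described step."""
    description: ActivityDescription
    parameters: dict
    used: list  # identifiers of the input entities


def rerun_with(activity, used=None, **changes):
    """Hypothetical helper: build a new Activity from an existing one,
    reusing its description and overriding selected parameter values."""
    unknown = set(changes) - set(activity.description.parameter_defaults)
    if unknown:
        raise ValueError(f"parameters not in description: {sorted(unknown)}")
    return Activity(
        description=activity.description,
        parameters={**activity.parameters, **changes},
        used=list(activity.used if used is None else used),
    )
```

For example, rerun_with(old_activity, sigma_clip=2.5) would give a new
activity with the same description and inputs but one changed parameter,
while the provenance of the existing product stays untouched.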

>     Hmmm... maybe we should have one of the next provenance work group
>     meetings in the Netherlands. :-)
> 
> That would be great. We are not that active in the IVOA at the moment, 
> so on the one hand such a meeting would be a good opportunity to get us 
> more involved, but on the other hand might make it hard to create 
> momentum to actually organize it.

We wouldn't need much: just a meeting room for one or two days and
Wi-Fi. We are usually only about 6 to 10 people. Would you have time
to join a meeting at the end of November or beginning of December?

> What I ultimately would like is to have the tools be able to 
> combine/split entities/activities automatically. E.g. that zoomed out 
> you'd only see the major branches of the provenance graph, and that 
> branches split into smaller and smaller activities and entities if you 
> zoom in. (Where this 'zooming' and 'splitting/combining' would not just 
> be a representational thing, but actually represents how the system 
> works internally.) Some day I'll write this down, it doesn't have to be 
> hard :-).

Yes, we also wanted that. Nice to see that other people agree. :-)
We introduced ActivityFlow specifically for this: to be able to 'hide'
part of the provenance inside one big activity-like thing (e.g. a
pipeline activityFlow with its steps as sub-activities). The
input/output of an activityFlow could be derived from the input/output
of its sub-activities. So as soon as the relation between an activity
and its activityFlow is defined, all other relations for the
activityFlow could be generated automatically. We had some discussions
about introducing a 'viewLevel', 'detailLevel' or similar, because the
sub-activities of an activityFlow may themselves be activityFlows, and
it would be convenient to have a means of extracting only the
activityFlows at the uppermost level or at the most detailed level.
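
As a rough illustration of deriving the relations of an activityFlow
from its sub-activities, here is a small Python sketch. The rule it uses
is an assumption, not something fixed in the model: an entity counts as
input of the flow if some sub-activity used it and none generated it,
and as output if some sub-activity generated it and none used it, so
intermediate products stay hidden. The Activity class and the function
name flow_interface() are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    used: set       # identifiers of entities this step consumed
    generated: set  # identifiers of entities this step produced


def flow_interface(sub_activities):
    """Return (inputs, outputs) of a flow made of the given steps.

    Assumed rule: entities that are both generated and used inside the
    flow are treated as internal and do not appear in either set.
    """
    used = set().union(*(a.used for a in sub_activities))
    generated = set().union(*(a.generated for a in sub_activities))
    return used - generated, generated - used


steps = [
    Activity("calibrate", used={"raw", "bias"}, generated={"calibrated"}),
    Activity("stack", used={"calibrated"}, generated={"coadd"}),
]
inputs, outputs = flow_interface(steps)
# inputs contains "raw" and "bias", outputs contains "coadd";
# "calibrated" is internal to the flow.
```

Whether intermediate products should also be listed as outputs of the
flow is a design choice; listing them would simply mean returning the
full generated set instead of generated - used.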

But we got stuck on defining level 0 and the direction in which the
integer should increase, because you may want to define more detailed
activities or more combined activityFlows later on.
However, we are becoming stricter now anyway, so it would probably be
best to require that activities are defined first, and that they
represent the most detailed level, with no further sub-activities to be
defined later. These are then the level-0 items.
An activityFlow constructed from these activities then has level 1; if
several such level-1 activityFlows are combined into another
activityFlow, it gets level 2, etc. Then at least nothing can have a
level < 0, and activityFlows with the same level have roughly the same
degree of detail.
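
The level rule could be sketched in Python as follows. This is only an
illustration under assumptions: the classes are stripped down to a name
and a list of steps, the function names are made up, and for a flow that
mixes steps of different levels the sketch takes one more than the
highest level among its steps, which is not something we have agreed on.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str


@dataclass
class ActivityFlow:
    name: str
    steps: list = field(default_factory=list)


def level(node):
    """Plain activities are level 0; a flow is one level above its
    highest-level step (assumption for flows with mixed levels)."""
    if isinstance(node, ActivityFlow):
        if not node.steps:
            raise ValueError(f"activityFlow {node.name!r} has no steps")
        return 1 + max(level(step) for step in node.steps)
    return 0


def nodes_at_level(root, wanted):
    """Collect every activity or flow below (and including) root
    whose level equals `wanted`."""
    found = [root] if level(root) == wanted else []
    for step in getattr(root, "steps", []):
        found.extend(nodes_at_level(step, wanted))
    return found
```

With this, extracting the most detailed view is nodes_at_level(root, 0),
and the uppermost view is just the root itself, at level(root).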

Cheers,
Kristin

-- 
-------------------------------------------------------
Dr. Kristin Riebe
Press and Public Outreach,
Web development

Email: kriebe at aip.de, webmaster at aip.de
Phone: +49 331 7499-377
Room:  Bib/3
-------------------------------------------------------
Leibniz-Institut für Astrophysik Potsdam (AIP)
An der Sternwarte 16, D-14482 Potsdam
Vorstand: Prof. Dr. Matthias Steinmetz, Matthias Winker
Stiftung bürgerlichen Rechts
Stiftungsverzeichnis Brandenburg: 26 742-00/7026
-------------------------------------------------------

