<div dir="ltr">Dear Kristin, DM,<div><br></div><div>Some replies to your questions, showing my general view on provenance (and data models); maybe it is useful to you as another perspective. They are not really applicable to the Provenance DM draft directly; I might get back to that later.</div><div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 23, 2017 at 9:04 PM, Kristin Riebe <span dir="ltr">&lt;<a href="mailto:kriebe@aip.de" target="_blank">kriebe@aip.de</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Hugo, DM,<br>
<br>
thanks a lot for your use case and explanations! It's so great that people from different projects are joining in the discussion. That's really helpful.<br></blockquote><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">I'm curious and I'd like to make use of your experience and ask some more questions:<br></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
What does the provenance look like when you retrieve it via your tools? I.e. for a given processed image, using your tools and Astrowise, what does the user get? Just a list of entities? Or parameters for the activities?<br>
It's all stored in a database, right? But users don't do direct database queries, do they?<br></blockquote><div><br></div><div>The provenance is an integral part of the system (Astro-WISE), so normally there is no specific action a user takes to 'get' the provenance. The default interface is through Python: every data product corresponds to a Python object with (lazy) properties that refer to its dependencies. Also, the activity is implicit in the entity: each Python object that represents a data product has a make() method that (re)creates the product's data by (re)processing its dependencies. All parameters are also properties of the object.</div><div><br></div><div>The other main interface we have is a web-based database viewer that also links all objects to their dependencies through normal HTML hyperlinks. Users can also enter free-form SQL there, chaining dependencies through table joins. (Normally, users would at most modify automatically generated SQL rather than type it from scratch.)</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Would it be useful for you to exchange the retrieved provenance metadata with other tools/services? What kind of exchange format would you prefer? (E.g. one of the W3C serialisation formats PROV-JSON etc. or would you prefer something else?)<br></blockquote><div><br></div><div>That is actually why I'm on this list :-). I've written some crappy XML serializations for some proof-of-concept work (using SAMP), but that was not sufficient and didn't follow any real standard. So I'm here to learn more.</div><div><br></div><div>There is one preference I have, though. In Astro-WISE there is no real difference between a workflow to create a new data product and the provenance of an existing data product. A to-be-created data product is just like an existing one whose make() method has not yet been called (recursively, if necessary). So what I'd like is a mechanism that (somehow) supports this workflow-provenance duality: for example, being able to easily reuse the provenance of an existing data product to create a new one (after changing a parameter or so).</div><div><br></div><div>(If it were up to me, I would not use past-tense terms like 'wasDerivedFrom', 'wasGeneratedBy' and 'used', but nouns like 'progenitor', 'generator' and 'dependency'; that way the same terminology could be used for provenance as well as for workflows. But this is just cosmetics and philosophy.)</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hmmm... maybe we should have one of the next provenance work group meetings in the Netherlands. :-)<br></blockquote><div><br></div><div>That would be great. 
We are not that active in the IVOA at the moment, so on the one hand such a meeting would be a good opportunity to get us more involved; on the other hand, that same inactivity might make it hard to build the momentum needed to actually organize it.<br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
One more question about one of your points:<br>
You are saying "There are no unimportant activities." and I get your point here. Would you say the same for entities?<br>
Or are there activities for which the intermediate entities are unimportant?<br>
For example, imagine a pipeline where you want to mention the substeps and all their parameters explicitly, but the intermediate image is not stored (permanently), so it does not make much sense to create an entity for it. How do you model this?</blockquote><div><br></div><div>What we did with KiDS (and other data in Astro-WISE) was to combine steps if we didn't want to keep the intermediate data. For example, we have a 'ReducedScienceFrame' that is created by a single activity, which takes as input the RawScienceFrame and all relevant calibration data (MasterFlatFrame, BiasFrame, IlluminationCorrection, etc.). That is, the activities are scoped such that we always want to keep the resulting entities (and we do).</div><div><br></div><div>My personal opinion is that this approach of combining steps is a mistake, precisely because of the provenance. I'd prefer to have separate activities and entities for all the intermediate steps, with those objects stored in the database, but to store the actual pixels only when desirable. The pixels can be (re)generated whenever necessary because the full provenance is available: storing pixels simply becomes a useful optimization.</div><div><br></div><div>What I would ultimately like is for the tools to be able to combine and split entities and activities automatically. For example, zoomed out you would only see the major branches of the provenance graph, and those branches would split into smaller and smaller activities and entities as you zoom in. (This 'zooming' and 'splitting/combining' would not just be a matter of representation, but would actually reflect how the system works internally.) Some day I'll write this down; it doesn't have to be hard :-).</div><div><br></div><div>Cheers,</div><div><br></div></div></div></div></div>
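<div><br></div><div>The approach preferred above, with every intermediate step recorded as an entity and pixels stored only as an optimization, can be illustrated with a minimal Python sketch. It assumes a toy in-memory model: the class DataProduct, its get_data() and with_parameter() methods, the keep_data flag, the intermediate 'DebiasedFrame' and the toy processing step (summing the inputs and adding an offset parameter) are hypothetical illustrations, not the Astro-WISE API. Only the make() method and the other frame names come from the text.</div>
<pre><code class="language-python">import copy


class DataProduct:
    """Toy entity whose provenance (dependencies and parameters) is always
    kept, while its actual data ("pixels") is only an optional cache.

    Hypothetical illustration of the idea above; not the Astro-WISE API.
    """

    def __init__(self, name, dependencies=(), offset=0.0, data=None,
                 keep_data=True):
        self.name = name
        self.dependencies = list(dependencies)  # progenitor entities
        self.offset = offset        # a processing parameter (hypothetical)
        self.keep_data = keep_data  # store the pixels, or only the provenance?
        self._data = data           # cached pixels; None if not stored
        self.make_calls = 0         # how often the data was (re)processed

    def make(self):
        """(Re)create the data by (re)processing the dependencies."""
        self.make_calls += 1
        inputs = [dep.get_data() for dep in self.dependencies]  # recursive
        # Toy "activity": element-wise sum of the inputs plus the offset.
        result = [sum(values) + self.offset for values in zip(*inputs)]
        if self.keep_data:
            self._data = result
        return result

    def get_data(self):
        """Return the stored pixels, or regenerate them from provenance."""
        if self._data is not None:
            return self._data
        return self.make()

    def with_parameter(self, name, **changes):
        """Reuse this provenance as the workflow for a new, unmade product."""
        new = copy.copy(self)
        new.name = name
        new.dependencies = list(self.dependencies)
        new._data = None
        new.make_calls = 0
        for key, value in changes.items():
            setattr(new, key, value)
        return new


raw = DataProduct("RawScienceFrame", data=[10.0, 12.0, 14.0])
bias = DataProduct("BiasFrame", data=[-1.0, -1.0, -1.0])
# Intermediate entity: its provenance is recorded, its pixels are not stored.
debiased = DataProduct("DebiasedFrame", [raw, bias], keep_data=False)
reduced = DataProduct("ReducedScienceFrame", [debiased], offset=0.5)
print(reduced.get_data())  # [9.5, 11.5, 13.5]

# Workflow/provenance duality: copy the provenance, change one parameter.
variant = reduced.with_parameter("ReducedScienceFrame_v2", offset=1.0)
print(variant.get_data())  # [10.0, 12.0, 14.0]
</code></pre>
<div>Because the intermediate DebiasedFrame keeps its dependencies and parameters, its pixels can be dropped and regenerated on demand, and with_parameter() reuses the provenance of an existing product as the workflow for a new one. In Astro-WISE these objects live in a database; the sketch keeps them in memory for brevity.</div>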