Necessity of ActivityDescription [was: IVOA Provenance DM -RFC- answers to comments]

Laurent Michel laurent.michel at astro.unistra.fr
Wed Nov 28 16:22:56 CET 2018


Hello,

On 26/11/2018 at 13:20, Ole Streicher wrote:
> Hi Markus,
> 
> On 26.11.18 13:01, Markus Demleitner wrote:
>> On Sun, Nov 04, 2018 at 08:26:09PM +0100, Mireille LOUYS wrote:
>>>>> http://wiki.ivoa.net/twiki/bin/view/IVOA/ProvenanceRFC
>>>> I've posted the following to the Wiki, but I thought having it on the
>>>> list might be more conducive to discussions, so here's what my
>>>> thoughts were while reviewing this.
>>>>
>>>> TL;DR: let's only have the core model in 1.0.  We can always add
>>>> extensions in 1.1.
>>> we need the ActivityDescription class and Parameter class to be able to
>>> search for some specific processing type on the data.
>>> Activity is only the process launched for the computation.
>>> It does not hold the details of the methods, because those details are
>>> factorised in the ActivityDescription class.
>>
>> You mean "Find me all source extractions being done on the images of
>> this data collection"?  That *does* sound like a fairly basic thing to
>> want to do, yes, and from what I see in the current Activity model, it
>> would, indeed, seem to be impossible just with what's there.
> 
> I do not see the point here. We do not (yet) have a common vocabulary for
> activities (or activity descriptions), so to find all source
> extractions, you need some domain knowledge about the activity -- like
> its name, input and output roles.

You are touching on a general issue here.
We cannot index pieces of software (or Activities) by what they can do.
When we look for a specific function in source code, we either explore the documentation (string matching) or the code and comments themselves
(string matching again).
I guess this general issue won't be solved by Provenance.

We can expect a provenance DB to be able to answer either of these 2 requests:

1- If I know the activity names:
     Give me all activities named 'SourceExtractor' involved in the processing of my image collection
2- If I'm discovering the activities:
     Give me all activities with a description matching i/source\s+extract.*/ involved in the processing of my image collection

But the following request will never be supported, whatever the model is:

1- Give me all activities doing source extraction

The model can define a coarse-grained vocabulary (calibration, filtering, ...), but the exact description of what an activity is
doing will always require free text, and therefore fuzzy queries (RegExp, Elasticsearch, ...).

Laurent
> 
> But if you have the name, you can simply query "give me all activities
> which have [an activity description with] this name and where my images
> were used as 'input'".
> 
> These vocabularies are not planned for (the first version of)
> VO-Provenance, and I guess they would take quite some time to develop,
> given that there are so many possible specialised activities out there. Just
> have a look at the ESO pipelines; it seems difficult, if not impossible, to
> classify them in a way that would let your query be done without domain
> knowledge.
> 
>> I'm sure the W3C has a plan for this -- do you know what it is?  Can't
>> we just follow them or is there a use case we have they don't?
> 
> W3C does not deal with queries. They try to describe what is there, and
> they assume a domain-specific model on top.
> 
> Cheers
> 
> Ole
> 

-- 
jesuischarlie/Tunis/Paris/Bruxelles/Berlin

Laurent Michel
SSC XMM-Newton
Tel : +33 (0)3 68 85 24 37
Fax : +33 (0)3 68 85 24 32
Université de Strasbourg <http://www.unistra.fr>
Observatoire Astronomique
11 Rue de l'Université
F - 67200 Strasbourg


