about #calibration (VEP-006) : ----> IMPORTANT for DataLink EXTENDED USAGE

Wed Oct 13 17:48:37 CEST 2021

Hi all, Markus,

As I said, I am ready to let VEP-006 go, because it is trying to solve a
real use case, if nobody in the TCG is reluctant about it.
But I am confident that the other use case (applied) will come up very
soon, if it is not already present somewhere in some DataLink response
and simply not noticed by our "radars"
(there is no way to check at the registry level which terms are used in
DataLink services or capabilities).
So I will have given my warning: by not trying to find a consistent and
global solution now, we may run into problems in the near future.
However, I cannot keep silent when I read some inconsistencies in the
rationale. See below.
On 11/10/2021 at 14:19, Markus Demleitner wrote:
> François,
>
> On Mon, Oct 11, 2021 at 09:58:42AM +0200, BONNAREL FRANCOIS wrote:
>>> On Fri, Oct 08, 2021 at 07:06:31PM +0200, BONNAREL FRANCOIS wrote:
>>>> On 07/10/2021 at 15:24, Markus Demleitner wrote:
>>>>> Based on this, could you then explain as clearly and concisely as you
>>>>> can why VEP-006 impedes that use case?
>>>> A user discovers a calibrated image (HST, ESO, etc.). With DataLink
>>>> (#this or #preview) she has a look at the image and wants to see what the
>>>> uncalibrated data and the flat field looked like, to understand some of the
>>>> features. DataLink provides a link to the #progenitor and also (by some
>>>> record whose semantics can no longer be #calibration or #flat) to
>>>> the flat field, etc. used to calibrate this progenitor.
>>> ...but for this use case there is no need to distinguish between what
>>> you call a progenitor (i.e., non-calibration part of provenance) and
>>> calibration files applied.  Right?
>> Of course it's needed to make this distinction. Even to obtain the right
>> caption for the display.
> A datalink client will obviously take the caption from datalink's
> description field, no?  I frankly cannot see what role the semantics
> field could have in this.
>
> What else are you thinking of?  As I said, it helps to use the "A
> user wants... the computer does... using..." template when stating
> such things so other people can follow.
>
>> Not to mention possible reprocessing
> I think we all agree that datalink metadata is *far* too weak to
> support this; I suspect even full provenance will not usually let a
> computer work out a reprocessing chain by itself.  You know, workflow
> engines are the complex beasts they are for a reason.  So, datalink
> may help selecting artefacts mentioned in a provenance instance, but
> for that, a "Part-of-Provenance" concept is enough.  Agreed?

About the two answers above:

If displaying the "description" is enough for "applied", why is it not
the case for "applicable" (the VEP-006 definition of #calibration)?

What more should the client do in the case of applicable than for
applied?

In general you have no idea anyway what kind of processing you will
really apply with these calibration data.

You write that reprocessing is too complex for DataLink in the case of
"already applied", but I imagine it is exactly the same for applicable.

So if "description" is enough, why don't we follow Laurent, Ada and many
others before them and relax the definition to allow both use cases?

It is not my preferred solution, as you know, but it would be more
consistent with what you are writing there.

Cheers

François

>>> Plus: A client can already do that, no?  If you think not: What do
>>> you see missing?
>> Which semantics term would be able to drive that? Progenitor alone is
>> not: this, at least, has been discussed extensively (see references below)
> I do not see why it would not be.  "A user wants to debug a data
> product.  The computer takes all #progenitor links and displays them
> together with their descriptions, offering to download them for
> inspection or possible use with a full description of the provenance."
>
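
The template sentence above can be sketched in a few lines. This is only
an illustration, assuming the links response has already been parsed into
plain dictionaries; the rows, the URLs and the helper name
links_with_semantics are made up for the example and do not come from any
real service. It matches the semantics term exactly, whereas a real client
would probably also want to accept narrower terms from the vocabulary.

```python
# Hypothetical, already-parsed rows of a DataLink links response.
# Only the "semantics" and "description" columns matter for this sketch.
ROWS = [
    {"access_url": "http://example.org/raw.fits",
     "semantics": "#progenitor",
     "description": "Raw exposure"},
    {"access_url": "http://example.org/flat.fits",
     "semantics": "#progenitor",
     "description": "Flat field used in the reduction"},
    {"access_url": "http://example.org/prev.png",
     "semantics": "#preview",
     "description": "Quick-look image"},
]


def links_with_semantics(rows, term):
    """Return (description, access_url) pairs for rows whose semantics equals term."""
    return [(r["description"], r["access_url"])
            for r in rows if r["semantics"] == term]


# The caption shown to the user is taken from the description column.
for caption, url in links_with_semantics(ROWS, "#progenitor"):
    print(f"{caption}: {url}")
```

In this sketch the semantics column is used only to select rows, and the
caption comes from the description column.
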
>>>> Client software is intended to display all these images (science and
>>>> calibration) together for checking and comparison. Moreover an advanced
>>>> version could propose some kind of reprocessing of the progenitor.
>>> Not that that has any relationship to VEP-006 at all, but we have
>>> provenance for a detailed description of how the various pieces of
>>> the provenance chain play together; we certainly do not want to
>>> re-model that in the datalink vocabulary.  It's been complicated
>>> enough to do that modelling once.
>> Of course it is a very interesting use case of DataLink to provide a link
>> towards a full (or last step) IVOA provenance record.
> Yes.  But that doesn't mean we have to re-build provenance in
> datalink.  On the contrary: we can have a nice, clean separation of
> concerns, where datalink says how to get things and provenance says
> how they fit together.
>
>> What #calibration-applied provides is a kind of  "poor-lady" provenance
>> which only links used datasets without any insight on the activity and
>> agents involved
>>
>> DataLink in itself has a poor but efficient way to characterize the
>> relationship between the #this item and the target of the link
> Yes -- it's enough to filter links, which is what we want in datalink
> semantics.  And VEP-006 plus the current state does exactly this.
>
>>> Second, the current #progenitor is clear that if there were any
>>> "Calibration applied" links, they would be covered by its concept; see
>>> its description: "data resources that were used to create this
>>> dataset (e.g. input raw data)".  You may not like the concept or its
>>> label, but we have VEP-009 to discuss that.
>> Let's go back to VEP-009
> Sure, but can we *please* do that outside of the VEP-006 discussion?
> I'm very sure we're not yet proficient enough in this kind of
> discussion that we can have multiple of them at the same time.  And
> I think you still have not argued why VEP-006 and VEP-009 could not
> be treated separately, i.e., how my elaboration of how we still have
> all reasonable options even after accepting VEP-006.
>
>> Some references
>>
>> Paul Harrison May the 5th
>>
>> Mireille, March the 23rd
>>
>> Stephane Erard March the 25th
>>
> All these persons have been at meetings in the meantime, and (at
> least that's what I took away from these meetings) they were
> satisfied that their concerns were taken into account in the current
> form of VEP-006.
>
> Paul, Mireille, and Stéphane: If I'm misrepresenting you, please
> correct me.
>
>> Not to mention the solution Pat proposed to me in a private email (see my
>> email last Monday for details). I have some concerns about it, but this is
>> the part I fully agree with:
>>
>> Recursive usage of DataLink to provide both science data and
>> calibration-used data
>>
>> #progenitor link followed by #this link to get science data
>>
>> #progenitor link followed by #calibration to get calibration data associated
>> with these raw science data
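
The recursive usage described above could look roughly like the following
sketch. Everything in it is hypothetical: the in-memory DOCS table stands
in for real DataLink services, and the "links_id" key marking a row whose
target is itself another links document is an assumption made only for
this example.

```python
# Hypothetical stand-in for DataLink services: each dataset ID maps to the
# rows of its links document. A row with "links_id" points to another
# links document instead of a file (an assumption of this sketch).
DOCS = {
    "reduced-1": [
        {"semantics": "#this", "target": "reduced-1.fits"},
        {"semantics": "#progenitor", "links_id": "raw-1"},
    ],
    "raw-1": [
        {"semantics": "#this", "target": "raw-1.fits"},
        {"semantics": "#calibration", "target": "flat-1.fits"},
        {"semantics": "#calibration", "target": "bias-1.fits"},
    ],
}


def targets(doc_id, term):
    """File targets in one links document whose semantics equals term."""
    return [r["target"] for r in DOCS[doc_id]
            if r["semantics"] == term and "target" in r]


def science_and_calibration(doc_id):
    """Follow each #progenitor link one level down and split what is found."""
    science, calibration = [], []
    for row in DOCS[doc_id]:
        if row["semantics"] == "#progenitor" and "links_id" in row:
            science += targets(row["links_id"], "#this")
            calibration += targets(row["links_id"], "#calibration")
    return science, calibration


print(science_and_calibration("reduced-1"))
```

The sketch follows only one level of #progenitor links; a deeper chain
would need a proper recursion with a guard against cycles.
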
> While I don't believe this belongs into a discussion of VEP-006, this
> is one reason why I'm rather skeptical of your VEP-009: With current
> #progenitor, the link from the reduced to the raw datalink document
> would reasonably be #progenitor.  With VEP-009, that is quite
> certainly no longer the case (unless your definition of "science
> data" took a surprising turn later).  But let's discuss that with
> VEP-009.
>
>> The consequence of this is that #progenitor items themselves are science data
> No, in that case it would be a mix of "science data" and all kinds of
> other things that went into the reduction.  Which, mind you, is fine,
> and I think it's the way such things should be done.  It's just not
> within the concept your description in VEP-009 seems to try to
> define.
>
>>> If you disagree on this assessment: How would VEP-006 influence this
>>> deliberation?
>> VEP-006 is not proposing new terms; it's changing the definition of old terms
>> in such a way that calibration-applied is now forbidden.
> Well, the concept "pre-VEP-006-#calibration minus
> post-VEP-006-#calibration" apparently is not populated in the current
> VO, and as I argued two mails back, when members of that
> concept come around, all the options are still around whether or not
> we accept VEP-006; so, again, I don't see why we can't at least get
> VEP-006 off the table.
>
>
>>> If not, François, can you at least agree to: "I think VEP-006 is
>>> wrong, but I'll not veto it"?
>> Exactly this: if nobody is interested, I give up. But I think we will encounter
>> consistency issues in the near future if we don't discuss the consequences of
>> this major change of definition for #calibration.
> Can you speculate what consistency issues you expect to see?  Because
> you see, the only reason VEP-006 is there is that without it there's
> the problem that current #progenitor and #calibration have a nonempty
> intersection and a nonempty difference, which is really bad in a
> formal vocabulary of this sort.
>
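
The set-theoretic point above can be made concrete with a toy sketch. The
file names and their assignment to the two concepts are invented for
illustration, and the sets reflect only one reading of the definitions
discussed in this thread (pre-VEP-006 #calibration covering applied and
applicable files, post-VEP-006 #calibration covering applicable ones only).

```python
# Toy link targets, invented for illustration only.
raw = "raw.fits"
flat_applied = "flat-used.fits"        # calibration already applied to #this
flat_applicable = "flat-usable.fits"   # calibration that could be applied

# One reading of the concepts discussed in the thread:
progenitor = {raw, flat_applied}                     # used to create #this
calibration_pre = {flat_applied, flat_applicable}    # before VEP-006
calibration_post = {flat_applicable}                 # after VEP-006

print(progenitor & calibration_pre)    # overlap before VEP-006
print(calibration_pre - progenitor)    # calibration that is no progenitor
print(progenitor & calibration_post)   # overlap after VEP-006
```

Under these assumptions the two concepts overlap before VEP-006 and are
disjoint afterwards.
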
> Now, since the only point of the VEP is to fix an inconsistency, it
> would defeat the purpose if new ones came up.
>
>
> Anyway, now that Françoise has chimed in -- what do we do?
>
> For me, it would still be helpful to see what problem you, François,
> are trying to solve or you, Françoise, see as well.  And please try
> to be as concrete as possible and to limit things to VEP-006.  And if
> you feel that is *really* impossible, at least make a strong and
> reproducible case for why we have to solve the question of the
> "Part-of-Provenance" concept together with it.
>
>                -- Markus