about #calibration (VEP-006) : ----> IMPORTANT for DataLInk EXTENDED USAGE

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Oct 11 14:19:08 CEST 2021


François,

On Mon, Oct 11, 2021 at 09:58:42AM +0200, BONNAREL FRANCOIS wrote:
> > On Fri, Oct 08, 2021 at 07:06:31PM +0200, BONNAREL FRANCOIS wrote:
> > > Le 07/10/2021 à 15:24, Markus Demleitner a écrit :
> > > > Based on this, could you then explain as clearly and concisely as you
> > > > can why VEP-006 impedes that use case?
> > > A user discovers a calibrated image (HST, ESO, etc...). With DataLink
> > > (#this or #preview) she has a look at the image and wants to see how the
> > > uncalibrated data and the flat field looked, to understand some of the
> > > features. DataLink provides a link to the #progenitor and also (by some
> > > record the semantics of which can no longer be #calibration or #flat) to
> > > the flat field, etc... used to calibrate this progenitor.
> > ...but for this use case there is no need to distinguish between what
> > you call a progenitor (i.e., non-calibration part of provenance) and
> > calibration files applied.  Right?
> 
> Of course it's needed to make this distinction. Even to obtain the right
> caption for the display.

A datalink client will obviously take the caption from datalink's
description field, no?  I frankly cannot see what role the semantics
field could have in this.

What else are you thinking of?  As I said, it helps to use the "A
user wants... the computer does... using..." template when stating
such things so other people can follow.

> Not to speak about possible reprocessing

I think we all agree that datalink metadata is *far* too weak to
support this; I suspect even full provenance will not usually let a
computer work out a reprocessing chain by itself.  You know, workflow
engines are the complex beasts they are for a reason.  So, datalink
may help selecting artefacts mentioned in a provenance instance, but
for that, a "Part-of-Provenance" concept is enough.  Agreed?

> > Plus: A client can already do that, no?  If you think not: What do
> > you see missing?
> What would be the semantics term able to drive that? Progenitor alone is
> not: this, at least, has been discussed extensively (see below references)

I do not see why it would not be.  "A user wants to debug a data
product.  The computer takes all #progenitor links and displays them
together with their descriptions, offering to download them for
inspection or possible use with a full description of the provenance."
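
To make that concrete, here is a minimal sketch in Python.  It is
only an illustration under assumptions: the rows, their descriptions,
and the example.org URLs are made up, and a real client would parse
the datalink response rather than use a literal list.  The point is
that the selection uses the semantics column and the caption comes
from the description column.

```python
# Hypothetical rows of a datalink response, reduced to three of its columns.
LINKS = [
    {"semantics": "#this", "description": "Calibrated image",
     "access_url": "http://example.org/cal.fits"},
    {"semantics": "#preview", "description": "Preview of the image",
     "access_url": "http://example.org/cal.png"},
    {"semantics": "#progenitor", "description": "Raw exposure",
     "access_url": "http://example.org/raw.fits"},
    {"semantics": "#progenitor", "description": "Flat field applied",
     "access_url": "http://example.org/flat.fits"},
]


def progenitor_captions(links):
    """Return (description, access_url) pairs for all #progenitor links."""
    return [
        (row["description"], row["access_url"])
        for row in links
        if row["semantics"] == "#progenitor"
    ]


# Display each progenitor with its caption, taken from the description.
for caption, url in progenitor_captions(LINKS):
    print(f"{caption}: {url}")
```

Nothing here needs a semantics term finer than #progenitor to tell the
raw exposure from the flat field; the descriptions do that for the
human reader.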

> > > Client software is intended to display all these images (science and
> > > calibration) together for checking and comparison. Moreover an advanced
> > > version could propose some kind of reprocessing of progenitor.
> > Not that that has any relationship to VEP-006 at all, but we have
> > provenance for a detailed description of how the various pieces of
> > the provenance chain play together; we certainly do not want to
> > re-model that in the datalink vocabulary.  It's been complicated
> > enough to do that modelling once.
> 
> Of course it is a very interesting use case of DataLink to provide a link
> towards a full (or last step) ivoa provenance record.

Yes.  But that doesn't mean we have to re-build provenance in
datalink.  On the contrary: we can have a nice, clean separation of
concerns, where datalink says how to get things and provenance says
how they fit together.

> What #calibration-applied provides is a kind of  "poor-lady" provenance
> which only links used datasets without any insight on the activity and
> agents involved
> 
> DataLink in itself has a poor but efficient way to characterize relationship
> between #this item and the target of the link

Yes -- it's enough to filter links, which is what we want in datalink
semantics.  And VEP-006 plus the current state does exactly this.

> > Second, the current #progenitor is clear that if there were any
> > "Calibration applied" links, they would be covered by its concept; see
> > its description: "data resources that were used to create this
> > dataset (e.g. input raw data)".  You may not like the concept or its
> > label, but we have VEP-009 to discuss that.
> 
> Let's go back to VEP-009

Sure, but can we *please* do that outside of the VEP-006 discussion?
I'm very sure we're not yet proficient enough in this kind of
discussion that we can have multiple of them at the same time.  And
I think you still have not argued why VEP-006 and VEP-009 could not
be treated separately, i.e., what is wrong with my elaboration of how
we still have all reasonable options even after accepting VEP-006.

> Some references
> 
> Paul Harrison May the 5th
> 
> Mireille , March the 23rd
> 
> Stephane Erard March the 25th
> 

All these persons have been at meetings in the meantime, and (at
least that's what I took away from these meetings) they were
satisfied that their concerns were taken into account in the current
form of VEP-006.

Paul, Mireille, and Stéphane: If I'm misrepresenting you, please
correct me.

> Not to speak about the solution Pat proposed to me in a private email (see my
> email last Monday for details). I have some concerns about it but this is
> the part I fully agree with
> 
> Recursive usage of DataLink to provide both science data and
> calibration-used data
> 
> #progenitor link followed by #this link to get science data
> 
> #progenitor link followed by #calibration to get calibration data associated
> to these raw science data

While I don't believe this belongs in a discussion of VEP-006, this
is one reason why I'm rather skeptical of your VEP-009: With current
#progenitor, the link from the reduced to the raw datalink document
would reasonably be #progenitor.  With VEP-009, that is quite
certainly no longer the case (unless your definition of "science
data" took a surprising turn later).  But let's discuss that with
VEP-009.

> The consequence of this is that #progenitor itself are science data

No, in that case it would be a mix of "science data" and all kinds of
other things that went into the reduction.  Which, mind you, is fine,
and I think it's the way such things should be done.  It's just not
within the concept your description in VEP-009 seems to try to
define.

> > If you disagree on this assessment: How would VEP-006 influence this
> > deliberation?
> VEP-006 is not proposing new terms it's changing the definition of old terms
> in a sense that calibration-applied is now forbidden.

Well, the concept "pre-VEP-006-#calibration minus
post-VEP-006-#calibration" apparently is not populated in the current
VO, and as I argued two mails back, when members of that
concept come around, all the options are still around whether or not
we accept VEP-006; so, again, I don't see why we can't at least get
VEP-006 off the table.


> > If not, François, can you at least agree to: "I think VEP-006 is
> > wrong, but I'll not veto it"?
> 
> Exactly this: if nobody is interested I give up. But I think we will encounter
> consistency issues in a near future if we don't discuss the consequences of
> this major change of definition for #calibration.

Can you speculate what consistency issues you expect to see?  Because
you see, the only reason VEP-006 is there is that without it there's
the problem that current #progenitor and #calibration have a nonempty
intersection and a nonempty difference, which is really bad in a
formal vocabulary of this sort.  
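
A toy illustration of that overlap, as a sketch in Python with sets.
The member names are hypothetical examples I made up for this mail,
not terms from the vocabulary; the only thing taken from the
discussion is that calibration data already applied falls under
#progenitor and that VEP-006 takes it out of #calibration.

```python
# Hypothetical kinds of links, grouped by the concept covering them today.
progenitor = {"raw exposure", "flat field applied", "bias frame applied"}
calibration = {"flat field applied", "bias frame applied",
               "flat field still to be applied"}

# Current state: the concepts overlap, but neither contains the other.
overlap = progenitor & calibration
only_calibration = calibration - progenitor
only_progenitor = progenitor - calibration
print("overlap:", sorted(overlap))
print("only #calibration:", sorted(only_calibration))
print("only #progenitor:", sorted(only_progenitor))

# With VEP-006, already-applied calibration data is no longer #calibration,
# so the two concepts become disjoint.
calibration_after = calibration - progenitor
print("disjoint after VEP-006:", calibration_after.isdisjoint(progenitor))
```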

Now, since the only point of the VEP is to fix an inconsistency, it
would defeat the purpose if new ones came up.


Anyway, now that Françoise has chimed in -- what do we do?

For me, it would still be helpful to see what problem you, François,
are trying to solve, or which one you, Françoise, see as well.  And
please try to be as concrete as possible and to limit things to
VEP-006.  And if you feel that is *really* impossible, at least make a
strong and reproducible case for why we have to solve the question of
the "Part-of-Provenance" concept together with it.

              -- Markus

