VEP6: blurry definition for the term #calibration

Paul Harrison paul.harrison at manchester.ac.uk
Sat Mar 27 13:01:23 CET 2021


Hi Markus,

> On 2021-03-26, at 15:30, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> 
> Hi Paul,
> 
> On Fri, Mar 26, 2021 at 02:21:07PM +0000, Paul Harrison wrote:
>> here are my thoughts on this, which derive from my definition of the terms.
>> 
>> * Fundamentally a “calibrator” is not a progenitor, but it is a
>> modifier of the progenitor(s) to create the progeny
> 
> Hm -- that's certainly not what our current definition says:
> 
>  data resources that were used to create this dataset (e.g. input
>  raw data)
> 
….
> 
> But in general: When considering VEPs (or assigning terms), don't
> look at the term literals ("what's behind the hash").  They, in the
> end, talk to the computer.

Ok, but I think the slight problem is that progenitor and calibrator are already fairly precise
technical terms in English, and that is going to drive an individual's ‘natural’ interpretation quite strongly.

I think that the sentiment I expressed was really the same as Mireille's:

-8<-----
One more reason:
#progenitor should be reserved to designate the data being transformed through the various steps of a pipeline.
This applies to the data stream...
Calibration, configuration, and parameter sets have a distinct nature with respect to the data processing.
The two categories should not be mixed, in my view.
-8<-----

However, from now on let’s just consider them “classifier terms” and not worry about their exact definition in natural English for the rest of the email.

> So, you're basically proposing in addition to the four options at the
> foot of
> 
>  http://mail.ivoa.net/pipermail/semantics/2021-March/002778.html
> 
> to have 
> 
> (e) declare that #progenitor and #calibration are disjoint by
> changing #progenitor's definition to "it's only... hm... 'science
> data'", and any sort of calibration data is outside of the provenance
> chain.
> 

No. I would go for a modification of your option (b) and add another child of #progenitor, perhaps #antecedent (though in natural English
I think the two are virtually exact synonyms). It would express that the file is a direct, “less processed data” #progenitor, in the sense of my distinction that #calibration is a modifier of the progenitor rather than a “direct ancestor”, so that #calibration and #antecedent are disjoint.

I think that is a fairly ‘backwards compatible’ change. 
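To make the proposed hierarchy concrete, here is a minimal Python sketch. It assumes a simplified tree in which #calibration (with children such as #flat, #bias and #dark) and the tentatively named #antecedent both sit directly below #progenitor; the term table and the helper names `is_narrower_or_equal` and `disjoint` are illustrative only, not the published vocabulary or any existing library. It shows why the change should be largely backwards compatible: a client that selects everything at or below #progenitor still matches both kinds of link, while the two siblings do not overlap.

```python
# Illustrative sketch only: a simplified, ASSUMED term hierarchy,
# not the published IVOA datalink/core vocabulary.
# Each term maps to its broader (parent) term; None marks the root.
BROADER = {
    "#progenitor": None,
    "#antecedent": "#progenitor",   # proposed child (tentative name)
    "#calibration": "#progenitor",
    "#flat": "#calibration",
    "#bias": "#calibration",
    "#dark": "#calibration",
}


def is_narrower_or_equal(term: str, ancestor: str) -> bool:
    """Return True if `term` equals `ancestor` or lies below it."""
    current = term
    while current is not None:
        if current == ancestor:
            return True
        current = BROADER[current]
    return False


def disjoint(a: str, b: str) -> bool:
    """In a single-parent tree, the subtrees of two terms share no
    members exactly when neither term lies below the other."""
    return not (is_narrower_or_equal(a, b) or is_narrower_or_equal(b, a))


if __name__ == "__main__":
    # A client filtering on #progenitor still sees both kinds of link.
    terms = ["#antecedent", "#flat", "#calibration"]
    print([t for t in terms if is_narrower_or_equal(t, "#progenitor")])
    print(disjoint("#antecedent", "#calibration"))  # True
    print(disjoint("#flat", "#calibration"))        # False
```

Existing queries on #progenitor keep working unchanged; only clients that want to separate "direct ancestors" from calibration data need to know about the new term.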

>> * the distinction between raw and calibrated data is made with
>> another piece of metadata.
> 
> I, frankly, don't think that distinction is even possible, as one
> analysis' calibrated data is the next analysis' raw data (if you, for
> the sake of this argument, identify "raw" with "earlier in the
> processing chain" and "calibrated" with "later in the processing
> chain").

Well, agreed, but I think the part of the VEP6 wording that prompted this assertion is the suggestion:

-8<---
 This VEP tries to make it clear that the
  "has been used" interpretation is for #progenitor, whereas #calibration
  is for "can be used".
-8<---

I think it is better that the link tagging reflects the “kind” of the data resource rather than its state, and it is the mixing of the
two that is causing problems. 


> 
>> I think that this is sufficient, because, as has been pointed out
>> earlier in this thread, really redoing the calibration for most
>> modern instrumentation would require more detailed provenance
>> anyway.
> 
> But that is not what the use case is here.  The use case is to
> filter out rows you're not interested in in various situations (I've
> called the situations "use" vs. "debug", and I'm still convinced
> these *are* different situations).
> 

I still think that the basic classification you want in this use case is between #antecedent and
#calibration. The data provider could be offering an alternative #calibration that was not actually used to create #this; whether it was actually used needs some orthogonal metadata.

Of course it is possible to have #alternative-calibration as a tag, but then you run into unwanted repetition of child categories (#alternative-flat etc.), which makes it an ugly way to go.
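The combination of a "kind" term with orthogonal usage metadata can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `was_used` field is a hypothetical flag standing in for the orthogonal metadata (it is not a DataLink column), the URLs are placeholders, the term hierarchy is simplified, and mapping the filters to the "use" and "debug" situations is only one possible reading.

```python
from dataclasses import dataclass

# ASSUMED, simplified hierarchy for illustration (term -> broader term).
BROADER = {
    "#progenitor": None,
    "#antecedent": "#progenitor",   # proposed child (tentative name)
    "#calibration": "#progenitor",
    "#flat": "#calibration",
}


def is_narrower_or_equal(term: str, ancestor: str) -> bool:
    """Return True if `term` equals `ancestor` or lies below it."""
    current = term
    while current is not None:
        if current == ancestor:
            return True
        current = BROADER[current]
    return False


@dataclass
class Link:
    access_url: str
    semantics: str   # the "kind" of the resource
    was_used: bool   # HYPOTHETICAL orthogonal flag, not part of DataLink


def select(links: list, term: str, used_only: bool) -> list:
    """Keep links tagged with `term` or a narrower term; optionally drop
    resources that were offered but not actually used to create #this."""
    return [
        link for link in links
        if is_narrower_or_equal(link.semantics, term)
        and (link.was_used or not used_only)
    ]


links = [
    Link("https://example.org/raw.fits", "#antecedent", True),
    Link("https://example.org/flat-used.fits", "#flat", True),
    Link("https://example.org/flat-alt.fits", "#flat", False),  # alternative
]

# A possible "use" filter: only the calibration actually applied.
print([link.access_url for link in select(links, "#calibration", True)])
# A possible "debug" filter: every calibration on offer.
print([link.access_url for link in select(links, "#calibration", False)])
```

The alternative flat is simply another #flat row with the flag unset, so no #alternative-flat term is needed.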

I will write this up as an alternative VEP.

Cheers,
	Paul.




More information about the semantics mailing list