VEP-006: Discussion summary

Wed Jun 16 12:47:52 CEST 2021

Dear all,

I think I never publicly gave my opinion on this VEP, although I
probably discussed it in private.
I will do so now, wearing my DataLink author's hat.

When a use case requires modifying or adding terms in the "semantics"
field vocabulary, I think it's wise to adopt the following attitude, for
compatibility purposes:

1 ) minimize term discrepancy
2 ) minimize term definition changes
3 ) minimize tree re-arranging (because clients may select links on a
tree head element rather than on single terms)
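
To illustrate point 3, here is a minimal Python sketch of a client that
selects links by a tree head element. The tree excerpt is a simplified
assumption built only from the terms named in this message, and the
names WIDER, is_within and select_links as well as the example.org URLs
are hypothetical, not part of any DataLink library.

```python
# Simplified, assumed excerpt of the vocabulary tree: term -> wider term.
WIDER = {
    "#calibration": None,
    "#bias": "#calibration",
    "#dark": "#calibration",
    "#flat": "#calibration",
    "#progenitor": None,
    "#auxiliary": None,
    "#weight": "#auxiliary",
    "#noise": "#auxiliary",
    "#error": "#auxiliary",
}


def is_within(term, head, wider):
    """Return True if term is head itself or lies below head in the tree."""
    while term is not None:
        if term == head:
            return True
        term = wider.get(term)
    return False


def select_links(links, head, wider=WIDER):
    """Keep the links whose semantics fall under the given head term."""
    return [link for link in links if is_within(link["semantics"], head, wider)]


# Hypothetical link rows, reduced to the two fields used here.
links = [
    {"access_url": "http://example.org/dark.fits", "semantics": "#dark"},
    {"access_url": "http://example.org/raw.fits", "semantics": "#progenitor"},
    {"access_url": "http://example.org/weight.fits", "semantics": "#weight"},
]

calib = select_links(links, "#calibration")
aux_before = select_links(links, "#auxiliary")

# Re-arranging the tree: move #calibration below #auxiliary.
rearranged = {**WIDER, "#calibration": "#auxiliary"}
aux_after = select_links(links, "#auxiliary", rearranged)
```

With the tree as assumed, selecting on #auxiliary returns only the
weight map; after the move, the same client query also returns the
dark frame, although neither the service nor the client has changed.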

For a new need, the best thing is to create a new term (and even a new
branch).

For these reasons I don't agree with Markus's proposal, and I also
partially disagree with Mireille's counter-proposal (for reason 3 above),
although I follow most of her argumentation against the VEP proposal as
it is now.

1 ) The current definition of #calibration (and its child elements) is
unambiguous, I think. They currently read "resource used to calibrate
the primary data", "used to subtract the detector offset level" (bias),
"used to subtract the accumulated detector dark current" (dark), "used
to calibrate variations in detector sensitivity" (flat).
To me this means that the link's target HAS been used to calibrate
#this. And I think the use case for that is quality checking, as
Mireille and Paul already pointed out.

2 ) I don't think the proposal to merge calibration stuff that has
already been used into #progenitor is valid, for two reasons:
       a ) This is not the usage I know for this term. I think it's used
for datasets "rawer" than #this. The current definition may be a little
bit ambiguous and could be slightly modified: "data resources that were
used to create this dataset (e.g. input raw data)" --> the example
should become the definition.
       b ) I don't think we can find anybody in the VO who used
#progenitor for calibration files ... simply because we have had the
#calibration branch beside it from the beginning!

3 ) Pushing calibration stuff inside #auxiliary (which is currently too
loosely defined: "auxiliary resources"): I don't think it's a good idea,
because the child terms (weight, noise, error) look like byproducts of
the processing.

4 ) For all these reasons I think the need expressed by Markus
(calibration stuff which CAN BE APPLIED to #this) needs a new branch,
something like #calibration-applicable (or #applicable-calibration). In
addition, we could refine the definition of #progenitor (to bind it more
closely to rawer science data) and maybe that of #auxiliary.

5 ) The main drawback of this proposal is that we will have to repeat
the calibration branch children in parallel in calibration-applicable
(e.g. #flat and #flat-applicable, etc.).
      I see another possible solution following Ada Nebot's thoughts
(https://github.com/ivoa-std/DataLink/issues/44): "applied" or
"applicable" is a "relationship" to #this, while "calibration" belongs
to "information". If we COULD allow combining terms, we COULD avoid
duplicating the branch. Just create #applicable and #applied terms and
then write
               #calibration;#applied, #dark;#applicable etc ....
       In addition, for upgrade compatibility, we could imagine that
#calibration terms are "#applied" by default, and then make the addition
of #applicable (when appropriate) MANDATORY.
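
A minimal Python sketch of how a client might read such combined
values, assuming the ";" separator written above and the "#applied by
default" rule. Combined terms are only an idea at this stage, so the
function parse_semantics and the two term sets are hypothetical and do
not describe any existing DataLink behaviour.

```python
# Hypothetical "relationship" terms from the idea above.
RELATIONSHIPS = {"#applied", "#applicable"}
# Calibration branch as named in this message.
CALIBRATION_TERMS = {"#calibration", "#bias", "#dark", "#flat"}


def parse_semantics(value):
    """Split a combined value into (information, relationship).

    The relationship is None for terms outside the calibration branch
    when no relationship is given; calibration terms without an
    explicit relationship default to "#applied".
    """
    information = None
    relationship = None
    for part in (p.strip() for p in value.split(";")):
        if not part:
            continue
        if part in RELATIONSHIPS:
            if relationship is not None:
                raise ValueError("more than one relationship term: " + value)
            relationship = part
        else:
            if information is not None:
                raise ValueError("more than one information term: " + value)
            information = part
    if information is None:
        raise ValueError("no information term: " + value)
    if relationship is None and information in CALIBRATION_TERMS:
        relationship = "#applied"  # default, for upgrade compatibility
    return information, relationship
```

Under these assumptions an existing service that only writes
"#calibration" keeps its current meaning (already applied), and only
services offering applicable calibration data need to add a second
term.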

Cheers
François

On 10/06/2021 at 17:54, Mireille LOUYS wrote:
> Hi everyone,
>
> Sorry for this late answer.
>
> As shown in the last paragraph, as discussed in examples on the list,
> and at the Interop splinter meeting recently, it seems we should have
> a split between two notions and not settle for an intermediate fuzzy
> annotation that would be difficult to change afterwards.
>
> Confusion today for some datalink services would mean confusion
> tomorrow, when we reuse existing examples to build new services.
>
> So better clarify *now* the split between progenitor and calibration.
>
> This is the datalink semantics label tree I would go for:
>
> #data
>     #progenitor  data used to create #this
>
>     #auxiliary  data used to facilitate the interpretation of #this:
>     for understanding data quality or reprocessing, etc.
>         #calibration-applied
>         #calibration-applicable
>
> and it would solve the problem of mixing #progenitor and #calibration
> labels, which is a situation we want to avoid.
>
> Best, Mireille
>
> On 08/06/2021 at 10:05, Markus Demleitner wrote:
>> Dear Semantics community,
>>
>> At the interop, we had a side meeting on VEP-006 (#calibration
>> definition).  I *think* we reached a sufficient consensus here, as
>> usual with some reservations.  I have tried to summarise the
>> discussion in VEP-006,
>> https://volute.g-vo.org/svn/trunk/projects/semantics/veps/VEP-006.txt
>>
>> I'm also reproducing it below.  Do people feel their contributions
>> sufficiently considered and represented?  If not, what changes would
>> you like to see (direct commits to Volute cordially invited)?
>>
>> Thanks,
>>
>>               Markus
>>
>> And here's VEP-006 as of rev. 5976:
>>
>> Vocabulary: http://www.ivoa.net/rdf/datalink/core
>> Author: Markus Demleitner <msdemlei at ari.uni-heidelberg.de>
>> Date: 2020-09-09
>>
>> Term: #calibration
>> Action: Modification
>> Label: Applicable Calibration
>> Description: Data products that can be used to remove instrumental
>>    signatures from #this.  Note that the calibration steps such data
>>    products feed have not been applied to #this yet.   To link
>>    calibration data already reflected in #this, use #progenitor.
>> Used-in: 
>> http://dc.g-vo.org/kapteyn/q/dl/dlmeta?ID=ivo%3A//org.gavo.dc/~%3Fkapteyn/data/fits/POT015_000317.fits
>>
>> Term: #bias
>> Action: Modification
>> Description: Data products that can be used to remove detector offset
>>    levels from #this.
>>
>> Term: #dark
>> Action: Modification
>> Description: Data products that can be used to remove detector dark
>>    current from #this.
>>
>> Term: #flat
>> Action: Modification
>> Description: Data products that can be used to remove the signature of
>>    non-homogeneous detector sensitivity from #this.
>>
>> Rationale:
>>    In a discussion on the semantics mailing list (see
>> http://mail.ivoa.net/pipermail/semantics/2020-June/002735.html
>>    and follow-ups) it was found that the existing descriptions of
>>    #calibration and its narrower terms are ambiguous; "resource used
>>    to calibrate" could mean both "resource that has been used" or
>>    "resource that can be used".  This VEP tries to make it clear that 
>> the
>>    "has been used" interpretation is for #progenitor, wheras 
>> #calibration
>>    is for "can be used".
>>
>> Discussion:
>>    On the Semantics mailing list
>> (http://mail.ivoa.net/pipermail/semantics/2021-March/002774.html and
>>    followups), concerns were brought forward that excluding calibration
>>    data already applied would unnecessarily complicate the vocabulary;
>>    the temporal aspect ("has been applied" vs. "can be applied") should,
>>    if possible, be kept out of it.  Against that it was put forward that
>>    doing this would leave parts of #calibration within #progenitor (the
>>    "has been applied" part), other parts essentially in what some people
>>    suggested is #auxiliary (the "can be applied" part).  This violates
>>    the conditions for keeping the concepts organised in a tree, which
>>    was considered undesirable.
>>
>>    On the other hand, it was recognised that being able to trace
>>    "science data" (as opposed to auxiliary resources like calibration
>>    data) through the provenance chain is valuable.  A method proposed
>>    to effect this, given that with VEP-006 #calibration is not
>>    available for this,
>>    could be to narrow the definition of #progenitor to "less calibrated
>>    science data".  But even if this step is not taken and #progenitor
>>    remains "anything upstream in the provenance chain", a new term
>>    #calibration-applied would seem useful (an example given was: when
>>    fusing 50 images, people want to tell those apart from, for instance,
>>    a master PSF that also went into the fusion).  Parties having use for
>>    such a concept are encouraged to author a VEP for it.
>>
>>    In the end, after a side meeting at the May 2021 Interop, consensus
>>    was found that #calibration should certainly not contain elements
>>    both in and outside of #progenitor; it was agreed that, if we
>>    started again today, we would call the VEP-006 #calibration
>>    something like #calibration-applicable.  However, given the label is
>>    there, and that the level of detail below #calibration (with #bias,
>>    #dark, and #flat) probably mainly is useful (as far as datalink with
>>    its focus on actionable semantics is concerned) when a client wants
>>    to semi-automatically perform the calibration itself, it was decided
>>    that #calibration is kept with its label changed to "Applicable
>>    Calibration" and a corresponding definition.
>>
>>    As we sharpen the definition of #auxiliary ("resources aiding the
>>    scientific exploitation of #this"), #calibration should probably
>>    become a child of it.  This, however, would be part of a VEP on
>>    #auxiliary.
>