VEP-006: Discussion summary

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Thu Jun 17 10:38:59 CEST 2021


Dear colleagues,

On Wed, Jun 16, 2021 at 12:47:52PM +0200, BONNAREL FRANCOIS wrote:
> 1 ) minimize term discrepancy
> 2 ) minimize term definition changes
> 3 ) minimize tree re-arranging (because clients may select links on tree
> head element instead of simple terms)

I totally agree with these principles, but of course my conclusions
are substantially different.  Which again shows that semantics is
hard.  To make progress nevertheless, I'd strongly suggest taking
one step at a time -- and only when we actually have a concrete use
for a concept, together with clear pragmatics ("what should a client
do with this term?").  Pat's discussion of #calibration and children
in the sibling of this mail is a good example of an approach
informed by pragmatics.

> The best thing for a new need is to create a new term (and even a new
> branch)
> 
> For these reasons I don't agree with Markus proposal and I also partially
> disagree with Mireille counter-proposal (for reason 3 above) although I
> follow most of her argumentation against the VEP proposal as it is now.

> 1 ) The current definition of #calibration (and child elements) is
> unambiguous I think.  They currently read "resource used to calibrate the

Well, we wouldn't be here if it were this unambiguous.  And, as Pat
has pointed out, the existence of #bias and friends would suggest
that the "can be used" (rather than "has been used") is pragmatically
preferable, because...

> to calibrate this. And I think the use-case for that is quality checking as
> Mireille an Paul already enhanced it.

...when doing "quality checking" (I'd prefer the label
"Debugging"), it's a human looking at things anyway, and they can
read and understand the descriptions (which the computer can't).

On the other hand, teaching a computer how to do the calibration of
raw products for a specific data collection looks like a realistic
use case, and that makes use of #flat and friends -- indeed, it is
probably hopeless without them.
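
To make these pragmatics a bit more concrete, here is a minimal
sketch of how a client might pick calibration files out of a datalink
response.  The rows, the URLs, and the helper names are made up for
illustration, and the hard-coded assumption that #bias, #dark, and
#flat are the children of #calibration should be checked against the
published vocabulary.

```python
# Hypothetical, hard-coded excerpt of the vocabulary: child -> parent.
# A real client would read this from the published vocabulary instead.
WIDER = {
    "#bias": "#calibration",
    "#dark": "#calibration",
    "#flat": "#calibration",
}


def is_a(term, ancestor):
    """Return True if term is ancestor or sits below it in WIDER."""
    while term is not None:
        if term == ancestor:
            return True
        term = WIDER.get(term)
    return False


def select_calibration(rows):
    """Return the rows whose semantics is #calibration or narrower."""
    return [row for row in rows if is_a(row["semantics"], "#calibration")]


# Made-up datalink rows; only the two columns used here are shown.
rows = [
    {"semantics": "#this", "access_url": "http://example.org/raw.fits"},
    {"semantics": "#flat", "access_url": "http://example.org/flat.fits"},
    {"semantics": "#bias", "access_url": "http://example.org/bias.fits"},
    {"semantics": "#preview", "access_url": "http://example.org/prev.png"},
]

for row in select_calibration(rows):
    print(row["semantics"], row["access_url"])
```

The point of the sketch is only that the machine needs nothing beyond
the semantics column to find the files; a human-readable description
is not involved.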

This I only understood during this discussion; this means that I am
now rather convinced that VEP-006 is the right thing to do (over
sticking existing #calibration into "earlier in the provenance
chain").

I hence retract my earlier statement that I don't really care either
way.

> 2 ) I don't think the proposal to merge calibration stuff already used
> inside progenitor is valid for two reasons
>       a ) this is not the usage I know for this term. I think it's used for
> "rawer" dataset than #this. The current definition may be a little bit

Well, it's not what the current definition says, so by your principle
(2) I'd say let's avoid that.  But more to the point: We can decide
on VEP-006 without having to solve the #progenitor question -- and
hence we should simply do it.

>       b ) I don't think we can find anybody in the VO who used #progenitor
> for calibration files ... just because we had #calibration branch beside
> from the beginning !!!

Well, I, for one, think #progenitor not only is (by its current
definition) but also should be the identifier for the concept
"earlier in the provenance chain".  Also, I wouldn't do too much
exegesis of the existing hierarchy -- it was created when we
understood a lot less both about datalink and the applications of the
semantics column.

But again: Let's keep this discussion out of the VEP-006
deliberations; it's orthogonal, and mixing it all up will mean we'll
never get anywhere.

> 3 ) push calibration stuff inside #auxiliary (which is currently too loosely
> defined "auxiliary resources") . I don't think it's a good idea, because the
> child terms (weight, noise, error) look like byproducts of the processing

I personally believe #auxiliary should end up as "Things useful for
scientific exploitation of #this".  But that's not clear yet, and
that's why I've kept #calibration out of #auxiliary for VEP-006.  It
might or might not go there later, depending on how we end up
defining #auxiliary -- and again, we'll never get anywhere if we try
to solve all of these questions in one go.

> 4 ) For all these reasons I think the need expressed by Markus (calibration
> stuff which CAN BE APPLIED to #this) needs a new branch. Something like
> #calibration-applicable (or #applicable-calibration). IN addition we could
> refine the definition of #progenitor (to bind it more closely to rawer
> science data) and maybe of #auxiliary

This could be one way to go forward, but again it's orthogonal to
VEP-006, and in the interest of getting anywhere, we should keep the
discussion out of VEP-006 review.

> 5 ) the main drawback of this proposal is that we will have to repeat the
> calibration branch children in parralel into calibration-applicable (eg :
> #flat and #flat-applicable, etc...)

If we did it this way: What would be the pragmatics of the existing #flat and
friends that would then be in "earlier in the provenance chain"?  In
a debugging session, how would a client use the extra semantics of
"Flat field applied" (over just: it's some file used in debugging,
and the description (plus probably pipeline documentation) tells the
debugging person what it is)?

It's this lack of pragmatics of "Flat field applied" that convinced
me that current #calibration shouldn't be put into "earlier in the
provenance chain".


After all this, let me try to state what I think we have a consensus
on, and how that ought to help delineate what's part of VEP-006 and
what's not:

* There are concepts "earlier in the provenance chain", "something
  useful for scientific exploitation", "calibration applied",
  "calibration applicable", and "rawer science product".
* "calibration applicable" is a subset of "something useful for
  scientific exploitation"
* "calibration applied" is a subset of "earlier in the provenance
  chain", as is "rawer science product"
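
As a rough sketch, the two subset statements above can be written
down as a small mapping from each concept to the concept containing
it.  The names are the concept names from the list, not datalink
identifiers, and the helper function is purely illustrative.

```python
# Concept -> the broader concept it is a subset of, as stated above.
# These are concepts, not (yet) datalink identifiers.
SUBSET_OF = {
    "calibration applicable": "something useful for scientific exploitation",
    "calibration applied": "earlier in the provenance chain",
    "rawer science product": "earlier in the provenance chain",
}


def broader_concepts(concept):
    """Return all concepts containing the given one, nearest first."""
    result = []
    while concept in SUBSET_OF:
        concept = SUBSET_OF[concept]
        result.append(concept)
    return result


print(broader_concepts("calibration applied"))
print(broader_concepts("calibration applicable"))
```

Nothing in this mapping says which existing identifier stands for
which concept.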

Does everyone agree up to here?

If so, all that's left to do is writing up proper definitions for
these concepts and mapping them to the existing datalink identifiers.
I suspect (or perhaps: hope) that this is what we've been
discussing all this time.

All that VEP-006 does is provide such a refined definition for the
concept "calibration applicable", and again, to make some progress
I'd ask everyone to try hard and keep it that way if we can.  I'm
happy to mention everything else in the VEP's discussion (feel free
to commit amendments to volute), but that should not hold up VEP-006
as such.

Seeing where #progenitor, #auxiliary, and perhaps *#rawer,
*#calibration-applicable (where the * means "identifier doesn't
exist yet") fit and whether to have them at all should be the subject
of other VEPs.  Stride too far in one step and you'll trip.


One last point: That the identifiers of these concepts
("#calibration") appear to be "speaking" may make it a bit harder
here (though I still think it's a good idea *in the VO* in general).
But for this particular conundrum I'd ask all of you to imagine we
had opaque identifiers like, say, wikidata's.  There, they use
identifiers of the form Q1044693 (which happens to mean Strasbourg
University) -- imagine #calibration looked like #datalink-3201; if
you'd then accept VEP-006, then please try hard to overcome your
reservations, considering that in actual presentation what people
would see is "Calibration applicable" (the label) and the definition
we'll be giving (cf., again, things like
http://dc.g-vo.org/kapteyn/q/dl/dlmeta?ID=ivo%3A//org.gavo.dc/~%3Fkapteyn/data/fits/POT015_000317.fits).
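
A minimal sketch of that separation between identifier and
presentation follows; the term record is hypothetical, and
"#datalink-3201" is the made-up opaque identifier from the paragraph
above, not an existing term.

```python
# Hypothetical term record keyed by an opaque identifier.
TERMS = {
    "#datalink-3201": {
        "label": "Calibration applicable",
        "description": "(definition as given in the vocabulary)",
    },
}


def display_name(identifier):
    """Return the human-readable label, falling back to the identifier."""
    term = TERMS.get(identifier)
    return term["label"] if term else identifier


print(display_name("#datalink-3201"))
```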

Thanks,

         Markus

PS: I can't resist mentioning that even wikidata has some speaking
identifiers; Q42 happens to be Douglas Adams...

