VEP6: blurry definition for the term #calibration

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Fri Mar 26 16:30:28 CET 2021


Hi Paul,

On Fri, Mar 26, 2021 at 02:21:07PM +0000, Paul Harrison wrote:
> here are my thoughts on this, and derive from my definition of the terms.
> 
> * Fundamentally a “calibrator” is not a progenitor, but it is a
> modifier of the progenitor(s) to create the progeny

Hm -- that's certainly not what our current definition says:

  data resources that were used to create this dataset (e.g. input
  raw data)

...would obviously include anything that contributed, including, say,
dark frames for CCD images that included them, or, say, a set of
evolutionary tracks *as well as* images in different filters for a
stellar population analysis.

I'm not too hung up on the current definition (others, who perhaps
already use it, may be, but I'll let them speak for themselves), so
I'm happy to replace it with something else.  Do you have a
suggestion that perhaps could be made into a VEP?

(Disclaimer: I feel the current definition is nicely logical and
easily testable, so I'd be lying if I said I didn't like it).

But in general: When considering VEPs (or assigning terms), don't
look at the term literals ("what's behind the hash").  They, in the
end, talk to the computer.

Look at the labels and the definitions to figure out what things mean
in our semantics, because these are there for human consumption.
Yeah, we should definitely avoid using totally inappropriate term
literals, but I'd say the current definition of #progenitor is
totally defensible based on how people actually speak.

> this rules out any solutions that;
>    - point to calibrator files as “#progenitor”
>    - have #calibration as a child of #progenitor in the vocabulary.

...assuming we change the definition of #progenitor, which I'm not
yet convinced we should.

> the consequences of this are
> 
> * #calibration links do not make the distinction between “can be
> used” and “was used” - either could be true.

So, you're basically proposing in addition to the four options at the
foot of

  http://mail.ivoa.net/pipermail/semantics/2021-March/002778.html

to have 

(e) declare that #progenitor and #calibration are disjoint by
changing #progenitor's definition to "it's only... hm... 'science
data'", and any sort of calibration data is outside of the provenance
chain.

Would that reflect your sentiment?  And of the (a)-(d) in my original
mail, could you perhaps assign pain levels
(https://blog.g-vo.org/building-consensus/#scale) to them?

If (e) is (stays?) your favourite: In my example above with the
stellar population catalogue that uses both stellar tracks and a
bunch of images: would the stellar tracks be a #progenitor or
a #calibration or perhaps still something else?

I also note that if we go for (e) things will break hard as soon as
we do add a concept for "Upstream in Provenance" (which I think
#progenitor is now) or "Files making #this usable" (which I think
would cover Mireille's PSF as well as the VEP-006 meaning of
#calibration), because then we'll somehow have to get rid of
#calibration (see my mail from 11:54 today).

> * the distinction between raw and calibrated data is made with
> another piece of metadata.

I, frankly, don't think that distinction is even possible, as one
analysis' calibrated data is the next analysis' raw data (if you, for
the sake of this argument, identify "raw" with "earlier in the
processing chain" and "calibrated" with "later in the processing
chain").

> I think that this is sufficient, because, as has been pointed out
> earlier in this thread, in order to really redo calibration with
> most modern instrumentation would require more detailed provenance
> anyway.

But that is not what the use case is here.  The use case is to
filter out rows you're not interested in, in various situations (I've
called the situations "use" vs. "debug", and I'm still convinced
these *are* different situations).
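
As a rough illustration of that filtering use case, here is a minimal
Python sketch.  The rows, the example.org URLs, the function name
filter_rows and, in particular, the mapping of "use" and "debug" to
sets of terms are assumptions made for illustration only; which terms
belong to which situation is exactly what this thread is trying to
settle.

```python
# Hypothetical datalink rows; only the "semantics" column matters here.
ROWS = [
    {"access_url": "http://example.org/img.fits", "semantics": "#this"},
    {"access_url": "http://example.org/dark.fits", "semantics": "#progenitor"},
    {"access_url": "http://example.org/flat.fits", "semantics": "#calibration"},
    {"access_url": "http://example.org/thumb.png", "semantics": "#preview"},
]

# Assumed mapping from situation to the terms a client wants to keep.
# "use": files that make #this usable; "debug": upstream in provenance.
WANTED = {
    "use": {"#this", "#calibration"},
    "debug": {"#this", "#progenitor"},
}


def filter_rows(rows, situation):
    """Return only the rows whose semantics fit the given situation."""
    wanted = WANTED[situation]
    return [row for row in rows if row["semantics"] in wanted]


if __name__ == "__main__":
    for situation in ("use", "debug"):
        kept = [row["semantics"] for row in filter_rows(ROWS, situation)]
        print(situation, kept)
```

A real client would probably also want to match narrower terms of the
ones it asks for, which a flat set lookup like this does not do.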

> This does mean some change to
> https://www.ivoa.net/rdf/datalink/core/2020-09-02/datalink.html
> <https://www.ivoa.net/rdf/datalink/core/2020-09-02/datalink.html>
> but different wording to that proposed in VEP6

I'm really grateful for suggestions and submissions -- my current
role as VEP author and, hopefully, moderator is not too comfortable
(although I do admit that I'd find two of the outcomes terrible either
way).

Oh, and: While this may be a case where you really want version-sharp
references to the vocabulary, in general please make it a habit of
citing these vocabularies by their vocabulary URI, which in general is
not what you're redirected to.

I'll try to make this a bit more prominent on these pages, but they
already give the vocabulary URI, which in this case is
http://www.ivoa.net/rdf/datalink/core.  Only that will point future
readers to whatever the vocabulary is then (and that's the reason for
my reservation above: In this case, we may be talking about a
concrete *version* of the vocabulary -- which normally we shouldn't).
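
To make the difference concrete, here is a small Python sketch that
builds and splits term references based on the vocabulary URI rather
than on a dated, versioned page.  The helper names term_uri and
split_term_uri are made up for this example, and it assumes that a
full term URI is simply the vocabulary URI, a hash, and the term.

```python
from urllib.parse import urldefrag

# The version-independent vocabulary URI quoted above.
VOCAB_URI = "http://www.ivoa.net/rdf/datalink/core"


def term_uri(term, vocab_uri=VOCAB_URI):
    """Build a full term URI from a term such as "#calibration"."""
    return vocab_uri + "#" + term.lstrip("#")


def split_term_uri(uri):
    """Split a full term URI into (vocabulary URI, bare term)."""
    vocab, fragment = urldefrag(uri)
    return vocab, fragment


if __name__ == "__main__":
    full = term_uri("#calibration")
    print(full)
    print(split_term_uri(full))
```

Because nothing in such a reference names a date, it keeps pointing
at whatever the vocabulary is at the time it is resolved.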

          -- Markus

