VEP6: blurry definition for the term #calibration
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Fri Mar 26 11:54:29 CET 2021
Hi Stéphane, Dear Semantics WG,
On Thu, Mar 25, 2021 at 10:36:58AM +0100, Stéphane Erard wrote:
> The difference between #calibration and #progenitor in this context
> (calibration data for raw vs calibrated) seems unnecessarily
> complicated to me, and possibly misleading.
Admittedly, (semi-)formal semantics is sometimes a bit tedious, but
that's because computers are tedious, and what we're doing here is
explaining things to computers.
So, the question we answer with VEP-006 (or an alternative) is, in
practical terms: should
datalink_result.bysemantics("#progenitor")
in pyVO (with https://github.com/astropy/pyvo/pull/241 applied)
return #calibration links or not? Since we're talking to computers,
the answer can't (usefully) be "maybe".
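To make the question concrete, here is a minimal sketch that does not use pyVO at all and is not meant to mirror its actual implementation. It assumes a toy vocabulary stored as a child-to-parent mapping. The mappings `DISJOINT` and `NESTED`, the functions `ancestors` and `by_semantics`, and the file names are all hypothetical. It selects links whose term is the requested one or any narrower term. Whether a #calibration link comes back for a #progenitor query then depends only on where #calibration sits in the tree:

```python
from __future__ import annotations

from typing import Iterable, Optional

# Hypothetical toy models: each term maps to its parent (None for roots).
# These are NOT the official Datalink vocabulary, just an illustration.
DISJOINT = {"#progenitor": None, "#calibration": None, "#flat": "#calibration"}
NESTED = {"#progenitor": None, "#calibration": "#progenitor", "#flat": "#calibration"}


def ancestors(term: Optional[str], parent: dict[str, Optional[str]]) -> list[str]:
    """Return the term itself followed by all its ancestors, walking up the tree."""
    chain = []
    while term is not None:
        chain.append(term)
        term = parent[term]
    return chain


def by_semantics(links: Iterable[tuple[str, str]], wanted: str,
                 parent: dict[str, Optional[str]]) -> list[str]:
    """Select URLs whose semantics is `wanted` or any term narrower than it."""
    return [url for url, sem in links if wanted in ancestors(sem, parent)]


links = [("raw.fits", "#progenitor"), ("flat.fits", "#flat")]
print(by_semantics(links, "#progenitor", DISJOINT))  # ['raw.fits']
print(by_semantics(links, "#progenitor", NESTED))    # ['raw.fits', 'flat.fits']
```

The same query gives a definite, but different, answer under each tree. A "maybe" answer has no place in this model.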
This has a mathematical background. As explained in the Vocabularies
spec (shameless plug: It's in RFC, review now!), our terms correspond
to concepts, that is, subsets of our universe of discourse (well,
actually, these subsets are called "extensions"; cf.
https://ivoa.net/documents/Vocabularies/20210114/PR-Vocabularies-2.0-20210114.html#tth_sEc5.2.4).
Datalink is a tree-like vocabulary, and that means that concepts
either need to be disjoint, or one needs to be a subset of the other.
Hence, the root of the matter is to figure out whether
* #calibration is disjoint from #progenitor or
* #calibration ⊂ #progenitor
Failing both, we'll have to scrap one of the concepts, since I'm sure
#calibration ⊃ #progenitor is not an option.
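The tree condition can be checked mechanically. If each concept's extension is treated as a set, a tree-shaped vocabulary requires every pair of extensions to be either disjoint or nested. Mathematicians call such a family of sets laminar. The sketch below uses made-up extensions, and the file identifiers and the function name `tree_violations` are hypothetical. The "a #calibration file is sometimes a #progenitor and sometimes not" situation shows up as a violation:

```python
from __future__ import annotations

from itertools import combinations


def tree_violations(extensions: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return concept pairs whose extensions are neither disjoint nor nested."""
    bad = []
    for (a, ext_a), (b, ext_b) in combinations(extensions.items(), 2):
        if ext_a.isdisjoint(ext_b) or ext_a <= ext_b or ext_b <= ext_a:
            continue  # disjoint, a ⊆ b, or b ⊆ a: compatible with a tree
        bad.append((a, b))
    return bad


# Hypothetical extensions; the file identifiers are invented.
disjoint = {"#progenitor": {"raw1"}, "#calibration": {"flat1", "bias1"}}
nested = {"#progenitor": {"raw1", "flat1"}, "#calibration": {"flat1"}}
# flat1 was used (a progenitor), bias1 merely could be used: overlap without nesting.
blurry = {"#progenitor": {"raw1", "flat1"}, "#calibration": {"flat1", "bias1"}}

print(tree_violations(disjoint))  # []
print(tree_violations(nested))    # []
print(tree_violations(blurry))    # [('#progenitor', '#calibration')]
```

Only the first two layouts correspond to the two admissible answers above.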
> In any case, I would certainly reserve #progenitor to identify
> calibrated products used to build a derived product. If this also
Did you mean "uncalibrated products used to build #this"? If that is
true, that's a possibility, but we would then have to fix
#progenitor's definition (anyone up for a VEP?).
> I would also certainly expect the calibration process to be
> complex, instrument / experiment dependent, evolving with time,
> with inclusion of extra steps and alternative
Right -- but that's not in Datalink's purview any more, that's hard-core
Provenance.
> In short, I think datalink alone cannot always provide all the
> information needed to describe the actual calibration process, and
> I wouldn’t rely on that.
Exactly.
> When accessing calibrated data, what I really need is a detailed
> description of the calibration, and this goes beyond a list of
> calibration files in the general case. Listing the calibration
Right. It would be an interesting exercise to use ProvDM to annotate
a Datalink response with that extra information, but that's far
beyond our current question (though something I'd consider exceedingly
useful as a way to furnish things with provenance information without
changing them).
> Therefore, I don’t see any compelling reason to use anything else
> than #calibration in both cases, as the concept « can be used » vs
> « was used » is given by the status of the file (raw vs
> calibrated). Using different tags when a calibration file has been
> applied or not (eg for calibrated and raw data files) is not
> helpful, but is not a show-stopper to me either. But please don’t
> mix observations with calibration files under the same #progenitor
> tag - that would become difficult to entangle.
I understand (and half-heartedly support) this use case; but the
non-destructive (to the vocabulary) way to deal with this (unless we
want to defer it to ProvDM annotation) is to define terms that are
children of #progenitor that make this distinction. I'm happy to
assist in writing a VEP to do that.
But saying a #calibration file sometimes is a #progenitor and
sometimes not is subverting the whole scheme, and that's a big step
to take.
So, the situation as I see it is that we'll have to decide between
one of the following options:
(a) We keep things as they are and we just forget about datalink
semantics being a tree. You'll understand that I'd be seriously
unhappy with that outcome.
(b) We make #calibration a child of #progenitor ("#calibration
⊂ #progenitor"). That's a fine solution, except I'd ask the
proponents of that to convince Pat, who has, in effect, proposed
VEP-006.
(c) We accept VEP-006, perhaps with some fixes to labels or
definitions (I'm totally open to suggestions); we can then have
additional terms to tell apart "science data" and "calibration files"
below #progenitor.
(d) We deprecate #calibration and children, saying the concepts
cannot be properly defined (and it'd take quite a bit of reasoning to
wear down my resistance against that).
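Option (c) can be sketched in the same spirit. This sketch assumes two hypothetical child terms below #progenitor. The names `#science-input` and `#calibration-input` are invented here and are not Datalink terms, and so are the helpers `narrower` and `select`. It also assumes that option (c) keeps #calibration outside the #progenitor subtree:

```python
from __future__ import annotations

# Hypothetical vocabulary for option (c); the two child terms are invented
# for illustration and are not part of the Datalink vocabulary.
PARENT = {
    "#progenitor": None,
    "#science-input": "#progenitor",
    "#calibration-input": "#progenitor",
    "#calibration": None,  # assumed to stay outside the #progenitor subtree
}


def narrower(term: str, parent: dict[str, str | None]) -> set[str]:
    """Return the term plus every term below it in the tree."""
    found = {term}
    changed = True
    while changed:  # keep sweeping until no new descendants turn up
        changed = False
        for child, par in parent.items():
            if par in found and child not in found:
                found.add(child)
                changed = True
    return found


links = [
    ("raw.fits", "#science-input"),
    ("flat.fits", "#calibration-input"),
    ("spare_bias.fits", "#calibration"),
]


def select(wanted: str) -> list[str]:
    """Return URLs whose term is `wanted` or narrower."""
    terms = narrower(wanted, PARENT)
    return [url for url, sem in links if sem in terms]


print(select("#progenitor"))         # ['raw.fits', 'flat.fits']
print(select("#calibration-input"))  # ['flat.fits']
print(select("#calibration"))        # ['spare_bias.fits']
```

Because the new terms sit below #progenitor, a client asking for #progenitor sees both kinds of input unchanged. A client that cares about the distinction asks for the narrower term. This is what makes the route non-destructive to the vocabulary.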
I think that's about it -- or have I forgotten some additional option?
I'd be grateful if people could voice their preferences...
Thanks,
Markus