About the Plea Re: about #calibration (VEP-006) : ----> IMPORTANT for DataLInk EXTENDED USAGE
BONNAREL FRANCOIS
francois.bonnarel at astro.unistra.fr
Wed Oct 13 19:01:52 CEST 2021
Hi again,

To repeat: if the TCG has no concerns about the consequences of VEP-006,
I will not block it.

Still, I would prefer that the rationale (or plea) not be based on
incorrect statements, such as the one I read below.

On 12/10/2021 at 09:16, Markus Demleitner wrote:
>
> GENERAL PLEA
> ============
>
> Perhaps we should have sent around summaries of the off-mailing list
> discussions we had on VEP-006; this might have saved some cycles in
> these discussions. Anyway, if you are considering entering the fray,
> please carefully read the following exposition to avoid unnecessary
> repetitions of arguments, and in particular make sure you state where
> you disagree and what your dissenting position is.
>
> You see, VEP-006 isn't a matter of taste, it's fixing a bug. A bug
> that's currently not biting us, but only because a certain class of
> links hasn't been used yet.
There was no "bug" in #calibration, only perhaps a too-loose definition
of #progenitor (the reason I proposed VEP-009), which could encompass
#calibration at first reading.

But nobody would have wanted to use #progenitor for "calibration
applied", because #calibration existed! The previous #calibration
definition used the past tense.

Most of what Markus writes below is interesting and clarifies things,
except that it only proposes a solution for the "applicable" use case.

VEP-006 rationale: we have a new use case, "calibration-applicable". I
call it rather "new" because it relates to contexts where the services
distribute raw data, which I think was less common 7 years ago than it
is now.

VEP-006: redefine #calibration to match #calibration-applicable.

VEP-006 consequences: I'm confident that use cases for
calibration-applied will be implemented very soon (from private
discussions, not public at the moment). They could use #calibration
today. After VEP-006 is adopted, they no longer can.

It would have been better to leave #calibration as it was and to create
something new for calibration-applicable. There are some difficulties
with that; see my email from last week.

If we accept the change of definition in VEP-006 (which I already did
last week), what do we do for calibration-applied?

Extensive discussion has shown that there is great reluctance in our
groups to use a global #progenitor for that.

As far as I understand it, the "complex solution" proposed by Markus is
close to what I called "1) the duplicated tree solution" in my email of
last Monday (the 4th).

It has the drawback of duplicating each child term of #calibration
under #calibration-applied (or whatever we call it).

Do we want to start a new VEP for that now? Or do we look at the other
possible solutions I listed in that email?

Regards
François
>
>
> Theoretical Background
> ----------------------
>
> Our formal vocabularies (in the case of Datalink, an RDF properties
> vocabulary) are, mathematically, graphs of concepts. A concept is a
> subset of the universe of discourse, which in the case of Datalink is
> the Cartesian product of datasets × (URI resources) [1]; said a bit
> less abstractly: A datalink document assigns labels to pairs of
> pubDIDs and generic URIs.
>
> That Datalink concepts are relations rather than sets of things makes
> things look a bit tricky, but don't let that distract you. Think of
> animal taxonomies if you're confused: The Alpacas are a subset of the
> Camels, which are a subset of the Mammals. Yes, the world usually
> isn't structured like that, but we're building *models* here to make
> *computers* interact with the world in useful ways. Semantics is
> useful only insofar as it does this: let computers do useful things.
>
> Anyway, within this graph of the Datalink vocabulary, the main
> relationship is rdfs:subPropertyOf. Basically, this relationship
> means that if A is a subproperty of B, then A is a subset of B.
> This, in particular, means that A cannot have elements that are not
> elements of B.
>
>
> The Calibration Problem
> -----------------------
>
> #calibration, as defined pre-VEP-006, covers all kinds of files that
> can somehow be used for calibration. This, in particular, can
> concern files coming with (relatively) raw data -- the classical
> example is a raw CCD frame that comes with flats, bias frames, and
> whatever else; but note that today's real cases tend to be a lot
> trickier, and it's usually nowhere near as easy any more to tell
> "science" from "calibration" data -- on the one hand, and similar
> files people may want to attach to the reduced data to aid in
> debugging on the other.
>
> Meanwhile, we have a top concept #progenitor, that, despite its
> current identifier and label, really is "Part-of-Provenance". We may
> want to discuss whether it's a good idea to have the identifier
> #progenitor for this, and I think I agree the label "Progenitor"
> ought to be changed, but that's a different discussion. The concept
> "Part-of-Provenance" is there, and I don't think anyone disputes that
> it's useful.
>
> As soon as we have this concept, pre-VEP-006 #calibration is a
> problem, because parts of it belong to Part-of-Provenance (although
> nobody has yet spotted any of that in the wild), and other parts
> do not.
>
> As argued above, we can't have that.
>
>
> Separation of Concerns
> ----------------------
>
> The obvious solution is to split up the current concept. This is
> what VEP-006 does, taking away anything that's part of the
> provenance.
>
> One could do it the other way round, taking out all that's *not* part
> of provenance, but there are two reasons why that's rather clearly
> less desirable:
>
> (a) there are links in the wild matching VEP-006's definitions, but
> none that don't.
>
> (b) #calibration has subproperties #bias, #flat, and #dark. It is
> conceivable that, with a bit of care, that semantics is marginally
> enough to enable the "use data" use case for a certain class of raw
> data ("harmless CCD frames", say). The use case of the
> Part-of-Provenance concept is debugging, and hence there's always a
> human figuring out what is what. For them, inspecting the
> descriptions is easy, and hence there's no remotely plausible
> scenario where the subproperties might come in handy.
>
> There's a third option: deprecating #calibration and inventing
> something else.
>
> But that's really it. We'll simply have to choose one of these
> three options, or we'll knowingly keep a potentially harmful
> bug in the vocabulary.
>
>
> Blocking anything?
> ------------------
>
> Based on these considerations, I'd say VEP-006 is obvious. Again, if
> you disagree, please state clearly what part of this derivation you
> disagree with. If there really is an error in this derivation, I
> can't fix it if you just say "I don't believe you" or "I feel things
> should be different".
>
> In particular, VEP-006 explicitly leaves open the question of what to
> do with calibration-type Part-of-Provenance links once somebody wants
> to have them, in contrast to what François seems to occasionally
> imply.
>
> If they come along, we can do any of
>
> * Stick them into Part-of-Provenance (whatever this will then be
> labeled as)
> * Create a child of that Part-of-Provenance concept somehow trying to
> define what exactly makes calibration data calibration data if we
> find a use case where that's necessary
> * Stick the whole concept somewhere entirely different because when
> we actually understand why someone creates such links we notice
> that's not about debugging at all but about... well, I don't know,
> but if it happens, VEP-006 will certainly not be our problem.
>
> So... Feel free to discuss on. But please do it on the basis of what
> was already worked out regarding VEP-006 in the past 13 months
> (phewy!), and please do not ignore that whatever we do must be
> consistent with RDF and the wider world of semantics.
>
> Thanks,
>
> Markus
>
>
> [1] ok, this is a bit of a simplification, but bear with me here;
> making this more careful wouldn't change the conclusions.
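
The subset argument from the "Theoretical Background" and "The Calibration
Problem" sections quoted above can be sketched as a toy model in Python.
This is only an illustration under simplifying assumptions: concepts are
plain sets of (pubDID, URI) pairs, rdfs:subPropertyOf is read as set
inclusion, and every identifier, URL, and function name below is made up
for the example rather than taken from the Datalink vocabulary or any
real service.

```python
# Toy model (all identifiers and URLs are hypothetical):
# a Datalink concept is a set of (pubDID, URI) pairs.
APPLIED = ("ivo://example/reduced-1", "http://example.org/flat-used.fits")
APPLICABLE = ("ivo://example/raw-1", "http://example.org/flat-to-apply.fits")

# "Part-of-Provenance" (#progenitor): links to data that went into a dataset.
part_of_provenance = {APPLIED}

# Pre-VEP-006 #calibration mixes applied and applicable calibration links.
calibration_pre = {APPLIED, APPLICABLE}


def is_subproperty(a, b):
    """rdfs:subPropertyOf read as set inclusion: every pair in a is in b."""
    return a <= b


def straddles(a, b):
    """True if a has pairs both inside and outside b."""
    return bool(a & b) and bool(a - b)


# VEP-006 as described in the quoted text: take the provenance part out.
calibration_post = calibration_pre - part_of_provenance

print(straddles(calibration_pre, part_of_provenance))        # True
print(straddles(calibration_post, part_of_provenance))       # False
print(is_subproperty(calibration_post, part_of_provenance))  # False
```

In this sketch the redefined concept keeps only the "applicable" link, so
the "applied" link still needs a home elsewhere, which is the open question
discussed in the reply above.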