Vocabulary construction principles [was: #calibration (VEP-006)]

Thu Oct 21 11:38:35 CEST 2021

Hi Markus,
On 18/10/2021 at 15:13, Markus Demleitner wrote:
> Dear François,
>
> As this is hard-core semantics pretty unrelated to datalink
> specifically, I'm taking this thread off of DAL.
>
> I will also combine back two of your mails because they both
> basically deal with a very fundamental question: How and why do we
> construct our vocabularies?
>
> On Wed, Oct 13, 2021 at 07:01:52PM +0200, BONNAREL FRANCOIS wrote:
>>> You see, VEP-006 isn't a matter of taste, it's fixing a bug.  A bug
>>> that's currently not biting us, but only because a certain class of
>>> links hasn't been used yet.
>> There was no "bug" for #calibration only maybe a too loose definition for
>> #progenitor (the reason why I proposed #VEP-009) which could encompass
>> #calibration at first reading.
> No, it's not #progenitor's definition that is at fault here, it is
> the pre-VEP-006 #calibration concept, as it comprised links with two
> very different uses: "using data" and "debugging data".

VEP-006 discussion is over. That's a compromise.

But I still think #progenitor has to be revisited. That's the VEP-009
discussion.

>
> We're building our vocabularies not to have some model of reality, or
> even to mimic some person's (or persons') conceptions.  We're
> building them to enable the computer to do things.  In the case of
> datalink, the computer should be able to pick the proper links
> depending on whether you're "using" or "debugging" the data.
On this, I agree.
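
A minimal sketch of the link selection described above, assuming a client holds a table that says which use case each term serves. The table `USE_CASE`, the function `pick_links`, and the example URLs are hypothetical; the assignment of #auxiliary to "using" and #progenitor to "debugging" is the one suggested in this thread, not something read from the vocabulary.

```python
# Hypothetical table: the use case each term serves, as suggested in this
# thread (not taken from datalink/core itself).
USE_CASE = {
    "#auxiliary": "using",
    "#progenitor": "debugging",
}


def pick_links(rows, mode):
    """Return the URLs of the links whose term serves the given mode."""
    return [url for url, term in rows if USE_CASE.get(term) == mode]


# Made-up link rows: (access URL, semantics term).
rows = [
    ("http://example.org/mask", "#auxiliary"),
    ("http://example.org/raw", "#progenitor"),
]

using_links = pick_links(rows, "using")
debugging_links = pick_links(rows, "debugging")
```

A term that covers both use cases has no single entry it could take in such a table, which is the kind of ambiguity the quoted text is about.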
>
> There's no basic difference to, say, reference frames: If you want
> the computer to automatically transform between frames, you can't
> have Galactic and ICRS in one concept (that the computer sees).  And
> it would be unwise to have two concepts (ICRS and J2000, say) that almost
> mean the same thing, except sometimes, where the computer can't tell
> when that sometimes is.
>
> Back in VEP-006's vicinity, it still seems to me that we already have
> concepts corresponding to the two use cases mentioned above in
> datalink/core with #auxiliary and #progenitor; but even if...
>
>> But nobody would have liked to use #progenitor for "calibration applied"
>> because #calibration existed   !!!
> ...we didn't as you're claiming here, these concepts "exist" for
> datalink by virtue of their pragmatics, and we would eventually have
> to define them if datalink/core is to be useful.
>
> Now, pre-VEP-006 #calibration simply has parts of both of these
> concepts.  *That* is the bug, and it cannot be fixed by cleverly
> trying to write definitions or re-defining #progenitor or much
> anything else.  You can only either scrap #calibration entirely -- or
> take out the links corresponding to one of the two use cases.
>
> If you want, the bug is on the level of pragmatics, not of semantics.
>
>> VEP-006 consequences : I'm confident that use cases for calibration-applied
>> will be implemented very soon (private discussions / not public at the
>> moment). They could use #calibration. After VEP-006 is adopted they cannot
>> anymore.
> Yes, and it's good that we caught the problem in time before they did
> that.  If they had, when would a computer have shown #calibration
> links in the future?  When using?  When debugging?  In both cases?
> In neither?

I definitely prefer two terms personally. It's just that this could have been
done the other way (more conservatively).

That's what I gave up as a compromise on this.

>
>> IF we admit the change of definition in VEP-006 (which I did already last
>> week)  what do we do for calibration-applied ?
> Well: Define a new concept once it's clear why a computer would need
> it and what it will do with it.
>
>> Extensive discussion has shown that there is great reluctance in our groups
>> to use a global #progenitor for that.
> So we need to figure out where that reluctance comes from.  Preparing
> for the VEP-009 discussion (but let's have VEP-007 before that), it
> would already be useful if you could state what exactly it is you
> don't like about #progenitor: Is it with the whole concept
> "Part-of-Provenance" (and its pragmatics "show when debugging"), is
> it the label "Progenitor" that itches you, or is it the identifier
> #progenitor?

It would be strange to have an identifier different from the label in
that context.

The basic "pragmatics" of distinguishing progenitors (science data) from
calibration is to allow all VO clients to sort these links into
different categories automatically.

Then each client (or human user) can do various things.

By the way, in the full IVOA provenance model, the activity producing,
let's say, the exposed dataset is "using" other entities. This "usage"
relationship has a "type" (see page 26 of the spec).

It clearly distinguishes the "Main" type from the "Calibration" type.

In DataLink we don't have activities yet, so we simply bypass the
activity and link an exposed dataset directly to the "entities" used by
an "unknown activity" to produce it.

Progenitor is similar to the "Main" type of usage. And Calibration is 
obviously #calibration_applied (or whatever we call it in the future)
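
A minimal sketch of the automatic sorting mentioned above, assuming link rows that carry a semantics term. The function `sort_links`, the example URLs, and the term #calibration_applied (which does not exist in the vocabulary at this point) are hypothetical.

```python
from collections import defaultdict


def sort_links(rows):
    """Group link URLs by their semantics term, keeping input order."""
    groups = defaultdict(list)
    for url, term in rows:
        groups[term].append(url)
    return dict(groups)


# Made-up link rows: (access URL, semantics term).
links = [
    ("http://example.org/raw/1", "#progenitor"),
    ("http://example.org/flat/7", "#calibration_applied"),
    ("http://example.org/raw/2", "#progenitor"),
]

sorted_links = sort_links(links)
```

What a client or a human user then does with each group is left open, as stated above.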

>
>> Do we want to go to a new VEP for that now? Or do we have a look at the
>> other possible solutions I listed in that email?
> Well, let's follow the rules (if we find, a year or so down the road,
> that the rules actually create serious problems, we can still revisit
> them, but give them a year, ok?): Create concepts when they're
> needed.  You see, if we invent concepts out of thin air and without
> consumers, I'm rather sure we will only create more cases like
> #calibration and #progenitor that cause a lot of headache later.

OK, but we have to fix #progenitor anyway, even if we don't yet have
#calibration-applied (see VEP-009).

The idea is to follow the end of your PLEA posted last week

> Blocking anything?
> ------------------
>
> Based on these considerations, I'd say VEP-006 is obvious.  Again, if
> you disagree, please state clearly what part of this derivation you
> disagree with.  If there really is an error in this derivation, I
> can't fix it if you just say "I don't believe you" or "I feel things
> should be different".
>
> In particular, VEP-006 explicitly leaves open the question of what to
> do with calibration-type Part-of-Provenance links once somebody wants
> to have them, in contrast to what François seems to occasionally
> imply.
>
> If they come along, we can do either of
>
> * Stick them into Part-of-Provenance (whatever this will then be
>    labeled as)
> * Create a child of that Part-of-Provenance concept somehow trying to
>    define what exactly makes calibration data calibration data if we
>    find a use case where that's necessary
> * Stick the whole concept somewhere entirely different because when
>    we actually understand why someone creates such links we notice
>    that's not about debugging at all but about... well, I don't know,
>    but if it happens, VEP-006 will certainly not be our problem.
>
> So... Feel free to discuss on.  But please do it on the basis of what
> was already worked out regarding VEP-006 in the past 13 months
> (phewy!), and please do not ignore that whatever we do must be
> consistent with RDF and the wider world of semantics.
The discussion below is now secondary

Cheers

François

>
> Which brings me to your other mail:
>
> On Wed, 13 Oct 2021 17:48:37 +0200 BONNAREL FRANCOIS wrote:
>> If the "description" display is enough for "applied" why is it not the case
>> for "applicable" (VEP-006 definition for #calibration) ?
> I give you that it's not clear if #calibration would make it to a
> term if it were proposed today (when would a computer have to tell such
> links apart from other data necessary for using data?).
>
> My gut feeling would be that #auxiliary would have a good chance of
> being good enough.  But I think I can also see a "cut the crap" mode
> where you just fold in anything that's clearly "calibration" as in
> "data not specifically observed for this dataset" (nb you're welcome
> to come up with a better definition -- I've found it really hard to
> reproducibly say what's "science" and what's "calibration" data in
> modern experiments).   "Cut the crap" sounds like pragmatics I could
> find convincing.
>
> I'm pretty sure there's no actual case for #dark, #flat, and #bias --
> that level of description quite certainly isn't enough to let a
> computer confidently reduce raw images of modern instruments.  So,
> I'd not expect these to make it through a VEP, and I wish they
> weren't in datalink/core.
>
>> You write that reprocessing is too complex for datalink in the case of
>> "already applied" but I imagine it's exactly the same for applicable.
> Not quite the same (see below).  But that's not really the question
> here: the concepts do exist, and before we deprecate them, we can as
> well give them the meaning that has the "more likely" pragmatics.
> What I think *might* make them useful is that someone writes a mass
> reducer for raw data *of a certain instrument*, and for that specific
> case having a somewhat more precise annotation might help that
> particular mass reducer.  Also note we don't have any ProvDM
> annotation (yet?) that could confer further information in the case
> of raw data.
>
> For reduced data, you certainly cannot build a "mass debugger";
> there's always a human working on the document, I'd say, and that
> human can read the description.  Plus, for a finer-grained and
> hopefully machine-readable description there's ProvDM.
>
> So: Yes, there's no really strong case for #calibration, but since
> it's there and we just have to decide for the computer, VEP-006 at
> least defines it so it covers the less implausible use case.
>
>             -- Markus