Vocabulary construction principles [was: #calibration (VEP-006)]

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Oct 18 15:13:58 CEST 2021


Dear François,

As this is hard-core semantics pretty unrelated to datalink
specifically, I'm taking this thread off of DAL.

I will also combine back two of your mails because they both
basically deal with a very fundamental question: How and why do we
construct our vocabularies?

On Wed, Oct 13, 2021 at 07:01:52PM +0200, BONNAREL FRANCOIS wrote:
> > You see, VEP-006 isn't a matter of taste, it's fixing a bug.  A bug
> > that's currently not biting us, but only because a certain class of
> > links hasn't been used yet.
> 
> There was no "bug" for #calibration only maybe a too loose definition for
> #progenitor (the reason why I proposed #VEP-009) which could encompass
> #calibration at first reading.

No, it's not #progenitor's definition that is at fault here, it is
the pre-VEP-006 #calibration concept, as it comprised links with two
very different uses: "using data" and "debugging data".

We're building our vocabularies not to have some model of reality, or
even to mimic some person's (or persons') conceptions.  We're
building them to enable the computer to do things.  In the case of
datalink, the computer should be able to pick the proper links
depending on whether you're "using" or "debugging" the data.
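
To make that concrete, here is a minimal sketch in Python of what such
a client-side choice might look like.  The PRAGMATICS table, the
pick_links function, and the example URLs are all made up for
illustration and are not part of datalink; the two modes assume that
#auxiliary stands for "using" and #progenitor for "debugging":

```python
# Hypothetical table: which mode each datalink/core term is shown in.
# This is an assumption for illustration, not anything standardised.
PRAGMATICS = {
    "#auxiliary": "using",
    "#progenitor": "debugging",
}


def pick_links(links, mode):
    """Return the access URLs a client would offer in the given mode.

    links is a list of (semantics, access_url) pairs; terms without
    an entry in PRAGMATICS are never selected.
    """
    return [url for semantics, url in links
            if PRAGMATICS.get(semantics) == mode]


# Made-up links for one dataset.
links = [
    ("#auxiliary", "http://example.org/weights.fits"),
    ("#progenitor", "http://example.org/raw.fits"),
    ("#calibration", "http://example.org/flat.fits"),
]

print(pick_links(links, "using"))
print(pick_links(links, "debugging"))
```

In this sketch the #calibration link is offered in neither mode,
because a concept that mixes both uses cannot be given a single entry
in such a table.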

There's no basic difference to, say, reference frames: If you want
the computer to automatically transform between frames, you can't
have Galactic and ICRS in one concept (that the computer sees).  And
it would be unwise to have two concepts (ICRS and J2000, say) that almost
mean the same thing, except sometimes, where the computer can't tell
when that sometimes is.

Back in VEP-006's vicinity, it still seems to me that we already have
concepts corresponding to the two use cases mentioned above in
datalink/core with #auxiliary and #progenitor; but even if...

> But nobody would have liked to use #progenitor for "calibration applied"
> because #calibration existed   !!!

...we didn't, as you're claiming here, these concepts "exist" for
datalink by virtue of their pragmatics, and we would eventually have
to define them if datalink/core is to be useful.

Now, pre-VEP-006 #calibration simply has parts of both of these
concepts.  *That* is the bug, and it cannot be fixed by cleverly
trying to write definitions or re-defining #progenitor or much
anything else.  You can only either scrap #calibration entirely -- or
take out the links corresponding to one of the two use cases.

If you want, the bug is on the level of pragmatics, not of semantics.

> VEP-006 consequences : I'm confident that use cases for calibration-applied
> will be implemented very soon (private discussions / not public at the
> moment). They could use #calibration. After VEP-006 is adopted they cannot
> anymore.

Yes, and it's good that we caught the problem in time before they did
that.  If they had, when would a computer have shown #calibration
links in the future?  When using?  When debugging?  In both cases?
In neither?

> IF we admit the change of definition in VEP-006 (which I did already last
> week)  what do we do for calibration-applied ?

Well: Define a new concept once it's clear why a computer would need
it and what it will do with it.

> Extensive discussion has shown that there is great reluctancy in our groups
> to use a global #progenitor for that.

So we need to figure out where that reluctance comes from.  Preparing
for the VEP-009 discussion (but let's have VEP-007 before that), it
would already be useful if you could state what exactly it is you
don't like about #progenitor: Is it with the whole concept
"Part-of-Provenance" (and its pragmatics "show when debugging"), is
it the label "Progenitor" that itches you, or is it the identifier
#progenitor?

> Do we want to go to a new VEP for that now ? Or do we have a look to the
> other possible solutions I listed in that email ?

Well, let's follow the rules (if we find, a year or so down the road,
that the rules actually create serious problems, we can still revisit
them, but give them a year, ok?): Create concepts when they're
needed.  You see, if we invent concepts out of thin air and without
consumers, I'm rather sure we will only create more cases like
#calibration and #progenitor that cause a lot of headaches later.

Which brings me to your other mail:

On Wed, 13 Oct 2021 17:48:37 +0200 BONNAREL FRANCOIS wrote:
> If the "description" display is enough for "applied" why is it not the case
> for "applicable" (VEP-006 definition for #calibration) ?

I grant you that it's not clear whether #calibration would make it to
a term if it were proposed today (when would a computer have to tell
such links apart from other data necessary for using data?).

My gut feeling would be that #auxiliary would have a good chance of
being good enough.  But I think I can also see a "cut the crap" mode
where you just fold in anything that's clearly "calibration" as in
"data not specifically observed for this dataset" (nb you're welcome
to come up with a better definition -- I've found it really hard to
reproducibly say what's "science" and what's "calibration" data in
modern experiments).   "Cut the crap" sounds like pragmatics I could
find convincing.

I'm pretty sure there's no actual case for #dark, #flat, and #bias --
that level of description quite certainly isn't enough to let a
computer confidently reduce raw images of modern instruments.  So,
I'd not expect these to make it through a VEP, and I wish they
weren't in datalink/core.

> You write that reprocessing is too complex for datalink in the case of
> "already applied" but I imagine it's excatly the same for applicable.

Not quite the same (see below).  But that's not really the question
here: the concepts do exist, and before we deprecate them, we might as
well give them the meaning that has the "more likely" pragmatics.
What I think *might* make them useful is that someone writes a mass
reducer for raw data *of a certain instrument*, and for that specific
case having a somewhat more precise annotation might help that
particular mass reducer.  Also note we don't have any ProvDM
annotation (yet?) that could confer further information in the case
of raw data.

For reduced data, you certainly cannot build a "mass debugger";
there's always a human working on the document, I'd say, and that
human can read the description.  Plus, for a finer-grained and
hopefully machine-readable description there's ProvDM.

So: Yes, there's no really strong case for #calibration, but since
it's there and we just have to decide for the computer, VEP-006 at
least defines it so it covers the less implausible use case.

           -- Markus

