about #calibration (VEP-006): ----> IMPORTANT for DataLink EXTENDED USAGE

Mon Oct 4 16:24:21 CEST 2021

Markus, all,

I am widening the audience of this discussion to the DAL mailing list
because I think it is really important for extending the usage of
DataLink. Few people have really discussed this VEP, and I don't think
all those who did supported Markus's point of view. So DataLink
implementors and users really have to participate. There is surely a way
to make changes in the "semantics" vocabulary that will encompass all
points of view. To achieve that, we have to widen the perspective.

The "semantics" FIELD in the DataLink response qualifies the
relationship between the item identified by the ID value (however this
item has been discovered) and the target of the link.
It is intended to help (DataLink client) software decide which actions
to take.

General statement: I think IVOA vocabularies are not lists of isolated
terms.
This is the case not only for UCDs, with their standardized rules for
writing and combination, but also for simple IVOA lists of terms.
And inside simpler vocabularies like the DataLink "semantics" one, it
holds not only within a tree but also between trees. Terms have
to be consistent, and changes should not break anything without
proposing a solution.
Vocabularies behave like "systems" with internal relationships and
interactions. Similarities and differences between use cases have to be
taken into account.

That's why I am paying attention to semantics, and especially to the
DataLink vocabularies, also because I think this has an impact on radio
astronomy services (in which I have a particular interest).

I will answer some of Markus's points below, and after that I will
extend the discussion.

Despite what you wrote to me in a very long and very harsh private
email, I don't want to block anything, Markus. I just want to widen the
discussion to all the people who should have a look.

On 17/09/2021 at 11:29, Markus Demleitner wrote:
> François,
>
> On Thu, Sep 16, 2021 at 12:07:29PM +0200, BONNAREL FRANCOIS wrote:
>> On 15/09/2021 at 16:50, Markus Demleitner wrote:
>> In VEP-006 the new definition moves from "use case A" to "use case B"
>> (calibration stuff we want to apply to #this) and leaves "use case A" orphaned!!
> Perhaps, but that's easily solved once it actually turns up: We'll
> just add another term (in case there are good reasons that
> re-labeling #progenitor as "Part-of-Provenance" won't work, that is:
> otherwise we don't even need a new term).
>
> But for now, nobody wants to publish such datalinks, and so there's
> really no reason to delay VEP-006 because of this concern, and anyway
> it's largely unrelated.

#calibration has existed since 2014, and apparently nobody tried or
managed to implement it before you implemented it in GAVO one year ago,
with the meaning "calibration applicable".

Where I differ from your point of view is that we have to leave open the
use case where #calibration is "already applied".

I think a great many services only publish calibrated data in their
ObsTAP, SIA, or other DAL services; that is the most usual use case.
Giving access to progenitors and calibration material is interesting for
at least two reasons: quality checking, and reprocessing with the same
material and other software.

In that case we are not facing calibration applicable, but calibration
applied.

The fact that nobody has used it YET doesn't mean we don't have to take
it into consideration with the same attention we pay to "applicable".
Objectively, the two use cases exist, and I know several groups in radio
astronomy who are considering giving access to progenitors and related
material.

>
>> So my proposal to modify VEP-006 and tackle both use cases. Can we combine
>> terms in the semantics field ?
>>
>> Can we have a single #calibration branch for calibration stuff and combine
>> it with a relationship term like "#applied", #applicable ?
> As explained several times before: No.  We simply cannot have a
> concept that is partly Part-of-Provenance (a concept I insist is
> useful) and partly not: That would simply break the semantics of
> rdfs:subPropertyOf.
>
> And as usual there's nothing as practical as a good theory, as what
> you'd do then...
Yes, this would probably be the consequence if we wanted to add a head
term above calibration. But up to now calibration has itself been a head
term, whatever meaning we gave to it.
>> Instead of having #calibration_applicable and #calibration_applied (and
>> children) as terms to check in the vocabulary list for the client, we would
>> have #calibration;#applied and #calibration;#applicable. And there the client
>> has to check a combination of two terms available in the vocabulary list.
> ...will immediately blow the tree structure that's really the only
> actual application for the semantics field that we have at this
> point (at least as far as I can see).
>
> Very concretely: Where would your #applicable sit in the trees I'm
> showing in the datalinks at
> <http://dc.g-vo.org/static/datalinks.shtml>?

Well, UCDs work like this and they still build trees. The main term
would still be calibration, and it would be the only one used for tree
building.

But as you will see below, I now propose another solution.

>
> You'd be breaking the main use case to annotate links that in 10
> years of datalink nobody has found reason to create -- that's
> definitely not a good deal.

Well, 2014-2021 is 7 years, not ten.

And from 2014 to 2020 the other kind of link (calibration applicable)
was apparently not implemented either.

And I don't know of any other implementations.

This doesn't prove that these TWO use cases will not be important in the
future, just that people have priorities and cannot build everything at
the same time.

>
>
>> Is that something that developers of clients could admit ?
> That's, by the way, mainly a DAL (and perhaps Apps) thing, and it
> hasn't found traction there either when it was proposed.  For good
> reasons: As I'm arguing above, it's totally unclear what the
> semantics of a semantics column used in such a way would be.  How
> would clients use such annotation?
>
>
> Sigh... I know I'm sounding like a broken record, but: Let's solve
> the problems we actually have *now*.  Try to build some grand
> description of the works, try to solve many problems at once, and
> we'll never get anywhere.

That's not MANY problems. #calibration had, and still has, two possible
meanings for astronomers and hence two possible behaviors for client
software.

How do we solve that?

>
> So:
>
> (a) What problem we *actually have* with #calibration and children (in
> *existing* datalink documents) is *not* solved with VEP-006?
>
> (b) Is there some *computer* operation that was previously possible
> that is made impossible by VEP-006?
>
> If the answers to (a) and (b) are None (or "None, but I have all these
> other ideas that we could also discuss at the same time"), then let's
> please just move on.
This is the point where I strongly disagree. Let's look at the
vocabulary with a wider perspective.
>
> You know, this is really a minor change for a term we likely
> wouldn't even have if we hadn't just taken semantics out of thin air
> when we started datalink (and instead had added them as we went).
>
> And we've been discussing it now for more than a year (date on
> VEP-006: 2020-09-09).  Granted, there have been two improvements in
> the meantime, so it wasn't all wasted time.
>
> But going back to deeply disrupting proposals one year into the
> process, proposals on top that were discussed and rejected multiple
> times in the past, obviously don't solve the problem we're trying to
> solve and on top attempt to solve a (different) problem we don't even
> have at this time is, excuse me for being blunt, frustrating.
Apart from me, at least two or three people objected to VEP-006 as it is
currently stated.
>
> Come on, François, give your heart a shove and just make your peace
> with VEP-006.  It's sane, doesn't damage anything, and solves an actual
> problem we've inherited from the olden days.
>
> And if people really start pushing out datalinks that are
> "Calibration applied", let's quarrel on whether or not we need to fix
> anything around #progenitor.  Then.  Not now.  VEP-006 simply is
> totally unconnected with that discussion.
>
>
> Thanks,
>
>          Markus
>
> PS: Incidentally, please edit the subject lines to at least not quote
> the wrong VEP (as here, where François had VEP-007).  This will help
> later when people browse the mailing list archives.

What kind of solutions can we find for the two-use-cases issue?

Note that there are new items here (marked with a leading ++).

1 ) the duplicated tree solution:

     calibration (as redefined in current VEP-006) with children
Dark, flat, bias, etc. Note that I have moved on this point by giving up
the "-applicable" suffix!

     and calibration-applied (as defined before) with children
Dark-applied, flat-applied, bias-applied, etc.

     These are really two parallel sequences in which we have to
duplicate every sub-term of calibration (for example, "photstandard" may
come). It is not very elegant, and it suggests that some kind of
combination is active behind the scenes.

2 ) the UCD-like combination solution: as explained in my previous email

     Although it could work, I admit it is a major evolution of
DataLink which may have other consequences to be considered carefully.
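
A minimal sketch of what a client would have to do with such combined
values, assuming the ";" separator used in my previous email. The term
sets and the function name parse_combined are hypothetical; "#applied"
and "#applicable" are not in the current vocabulary.

```python
# Hypothetical term sets: "#applied" and "#applicable" are proposals from
# this thread, not terms of the current DataLink semantics vocabulary.
MAIN_TERMS = {"#calibration", "#bias", "#dark", "#flat"}
RELATION_TERMS = {"#applied", "#applicable"}


def parse_combined(semantics):
    """Split a 'main;relation' value and validate each part.

    Returns (main, relation); relation is None when no second term is given.
    """
    parts = [part.strip() for part in semantics.split(";")]
    if len(parts) > 2:
        raise ValueError(f"expected at most two terms: {semantics!r}")
    main = parts[0]
    relation = parts[1] if len(parts) == 2 else None
    # Only the main term would be used for tree building.
    if main not in MAIN_TERMS:
        raise ValueError(f"unknown main term: {main!r}")
    if relation is not None and relation not in RELATION_TERMS:
        raise ValueError(f"unknown relationship term: {relation!r}")
    return main, relation
```

The point of the sketch is that the client checks two terms instead of
one, while the tree is still built from the first term only.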

++ 3 ) the "relaxed" or fuzzy solution.

     Some people suggested (in private, or on the list but a long
time ago) that #calibration and its children should be valid for both
use cases (calibration material applied or applicable).

     The argument for that is that DataLink should not care about the
calibration status of datasets; this is not what it was intended to do.

     I'm personally reluctant, because client software would have
to use some other information to know what to do with this #calibration
material. In the most general case, we don't know where the client can
pick up this information.

++ 4 ) the "two columns" solution (also after some private discussion):

     calibration (intended as applicable) and calibration-applied are
the two relationship terms, so they are the only semantics terms.

     #bias, #dark, #flat (and other future calibration data) are
rather terms giving the intrinsic type of these calibration data.

     They are actually special types of observations done in a
specific way.

     So they could be described as such using the new
"content_qualifier" column currently discussed for DataLink 1.1.

     So, to fully describe the link, we need both the "semantics"
FIELD and the "content_qualifier" one.
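
A rough sketch of how a client might read the two columns together,
under the assumptions of this solution. The rows, URLs, and the function
name classify are made up for illustration, and "#calibration-applied"
is only the proposed term.

```python
# Relationship terms carried by the "semantics" FIELD under solution 4.
# "#calibration-applied" is a proposed term, not an existing one.
RELATIONS = {
    "#calibration": "applicable",
    "#calibration-applied": "applied",
}

# Made-up DataLink rows, reduced to the columns of interest here.
LINKS = [
    {"access_url": "http://example.org/flat.fits",
     "semantics": "#calibration", "content_qualifier": "#flat"},
    {"access_url": "http://example.org/dark.fits",
     "semantics": "#calibration-applied", "content_qualifier": "#dark"},
    {"access_url": "http://example.org/preview.png",
     "semantics": "#preview", "content_qualifier": None},
]


def classify(row):
    """Return (status, data_type) for a calibration link, else None."""
    status = RELATIONS.get(row["semantics"])
    if status is None:
        return None  # not a calibration link at all
    return status, row["content_qualifier"]


for row in LINKS:
    print(row["access_url"], classify(row))
```

The relationship (applied or applicable) and the nature of the data
(flat, dark, bias) are then read independently, so no sub-term has to be
duplicated.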

++ 5 ) In a private discussion, Pat suggested that we adopt the
definition of VEP-006 and describe the two possible use cases in a
usage pattern document.

     Calibration (applicable) would be straightforward, and
calibration applied would be rendered by a recursive usage of DataLink.

     The #progenitor link in the DataLink response of the calibrated
data itself links to .... a DataLink document .....

     ....which further links to the progenitor itself (#this) and to
the #calibration data attached to it (#calibration applicable to the
progenitor !!)

     To me, although this seems to work when we are sure that the only
#calibration data attached to the #progenitor are those that have
already been applied (and are still applicable),

     it may be ambiguous in some use cases.
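
The recursive pattern can be sketched as follows. This is only an
illustration: the DOCUMENTS dictionary is a made-up in-memory stand-in
for fetching and parsing DataLink responses, and the URLs and the
function name applied_calibrations are hypothetical.

```python
# Media type of a DataLink response, used here to recognise a link whose
# target is itself a DataLink document.
DATALINK_TYPE = "application/x-votable+xml;content=datalink"

# Made-up stand-in for remote DataLink documents, keyed by their URL.
DOCUMENTS = {
    "http://example.org/links?ID=calibrated": [
        {"semantics": "#this",
         "access_url": "http://example.org/calibrated.fits",
         "content_type": "application/fits"},
        {"semantics": "#progenitor",
         "access_url": "http://example.org/links?ID=raw",
         "content_type": DATALINK_TYPE},
    ],
    "http://example.org/links?ID=raw": [
        {"semantics": "#this",
         "access_url": "http://example.org/raw.fits",
         "content_type": "application/fits"},
        {"semantics": "#calibration",
         "access_url": "http://example.org/flat.fits",
         "content_type": "application/fits"},
    ],
}


def applied_calibrations(doc_url, documents):
    """Collect #calibration links found one level down, via #progenitor.

    The client has to ASSUME these were applied; nothing in the nested
    document says so, which is the ambiguity mentioned above.
    """
    found = []
    for row in documents[doc_url]:
        if (row["semantics"] == "#progenitor"
                and row["content_type"] == DATALINK_TYPE):
            nested = documents.get(row["access_url"], [])
            found.extend(r["access_url"] for r in nested
                         if r["semantics"] == "#calibration")
    return found
```

Calling applied_calibrations on the calibrated dataset's document
returns the flat attached to the raw data, but the same flat would be
returned whether or not it was actually used in the reduction.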

Best regards

François