VEP-009

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Mar 21 08:51:48 CET 2022


Hi,

[limiting the distribution to semantics, as registry is unconcerned
by this]

On Fri, Mar 18, 2022 at 05:17:06PM +0100, BONNAREL FRANCOIS wrote:
> That's a good summary of what we discussed last Monday
> My 2 additional cents. The use case I'm considering also comes from recent
> discussions within ESCAPE about VO integration of gamma data.
> 
> When we consider a so called DL5 dataset (a gamma source spectrum, or a
> gamma ray map) we know that this has been produced by complex processing of
> event lists with the appropriate "Instrument Response function" (IRF).
> 
> In a DataLink context could we use #progenitor for both ?
> 
> I think not because the event list comes from the observation and will be
> specific to these DL5 datasets.
> 
> On the other side the IRF would be common to plenty of sources or  DL5
> datasets as far as I understand. So they require to be accessed differently.

Hm -- why would a datalink client use different access methods
depending on whether or not some artefact is shared between different
observations?  I mean, in all likelihood the access would be provided
by a simple URL in either case, no?  And even if different access
modalities were necessary, how would that relate to the semantics
column, which, at least so far, has nothing to say about access?

> A smart client would have to take these two "ancestors" of our DL5 dataset
> in a very different way.

Which different ways?  As far as I can see, both links would simply be
used when users try to figure out an oddity in the reduced data
they're seeing (the "debug use case") -- in which case they'll need
all progenitors.  In general, as far as I can see, nobody has yet
brought forward a (datalink) use case where a machine could work out
that someone needs one thing but not the other.

> My suggestion for the IRF semantic term is to simply use a new term #irf

Frankly: I shudder to think how many terms we'd end up with if we go
to that level of detail.  But the first question, as usual, is: Why
would *a machine* need to tell IRFs apart from, say, the background
simulations in neutrino observations?

> About the "sorting out" issue of #progenitor, #calibration-applied and #irf
> for display purposes ...
> 
> Having different semantics terms for those different concepts would allow
> the client  to display them in different sections of the tool.

Sure -- but why would it want to?  As far as I can see, displaying "all
items I have that help you debug the data set" in one place is what
a client actually *should* do, so I'd say making it disperse these
items has the smell of a bug.

> Descriptions (even if they are well filled) would never allow to separate
> automatically such things.

True.  This statement perhaps is an opportunity to put my request for
a proper case for separating "science" and "calibration" data in
another form.

You see, if you want to automate something, you have to give an
algorithm for how to get from a source state (in this case: a
datalink VOTable) to a desired target state (in this case: a
presentation of the links more easily digestible for a science user).

Our vocabularies let us give such an algorithm -- "anything that's
derived from this dataset goes in bin 1, anything it's derived from
goes to bin 2, and stuff you need to make sense of it goes to bin 3.
Bin 1 has label suchandsuch... organise the bins in a tree..." -- in
a declarative form.  But this only works if the data providers, when
annotating their datalink tables, basically put themselves in the
shoes of a machine doing this classification and assign the semantics
based on what they find the algorithm's result should be.

Hence, we define the algorithm *for them*.
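
To make the binning recipe a bit more concrete, here is a minimal
sketch of what a client might do with the semantics column.  The bin
labels, the WIDER table, and the #irf entry are all assumptions made
up for illustration; none of this is taken from an actual vocabulary
or an existing client.

```python
# Hypothetical excerpt of a vocabulary: term -> its wider term
# (None for top-level terms).  "#irf" is NOT an existing term; it is
# here only to show what happens to a narrower concept.
WIDER = {
    "#derivation": None,
    "#progenitor": None,
    "#documentation": None,
    "#irf": "#progenitor",
}

# The declarative part: which top-level concept feeds which bin.
BIN_FOR_TERM = {
    "#derivation": "bin 1: derived from this dataset",
    "#progenitor": "bin 2: this dataset is derived from",
    "#documentation": "bin 3: helps to make sense of this dataset",
}


def bin_for(term):
    """Walk up the wider-term chain until a term with a bin is found."""
    while term is not None:
        if term in BIN_FOR_TERM:
            return BIN_FOR_TERM[term]
        term = WIDER.get(term)
    return "other"


def sort_links(rows):
    """Group (access_url, semantics) pairs by bin label."""
    bins = {}
    for access_url, semantics in rows:
        bins.setdefault(bin_for(semantics), []).append(access_url)
    return bins


links = [
    ("http://example.org/events", "#progenitor"),
    ("http://example.org/irf", "#irf"),
    ("http://example.org/catalogue", "#derivation"),
]
sorted_links = sort_links(links)
```

In this sketch a link tagged with the made-up narrower term lands in
the same bin as any other progenitor unless the client has a concrete
reason to treat it differently, and that reason is exactly the "if
clause" a proposal for a new term would have to spell out.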

What this means is that when you give or change a concept, you should
be able to give something like an if clause that conceptually can be
executed by a sufficiently sophisticated machine that would tell it
which bin to put the link in.  Since in the end the recipe is being
executed by a human, a certain amount of handwaving is permissible,
but I'm sure you cannot assume "science data" as such has an
interoperable meaning, and hence you just have to explain what that
is and how data providers can tell whether something is that or
rather "non-science data".

Again, I think the best way to come up with this if statement is to
figure out exactly *why* you would want to put different sorts of
progenitors into different bins.

          -- Markus

