[Heig] vocabulary update: proposal for dataproduct_type update for high energy data : event-list definition and event-bundle

Tue Apr 29 15:27:35 CEST 2025

Hello Markus,

Thank you Mireille and Markus for your proposals and discussion.

Here, I will give some input in the context of the HEIG... inline in the text
[...]

On 25/04/2025 at 09:29, Markus Demleitner via heig wrote:
> Dear Mireille,
>
> Thanks for your VEP.
>
> On Thu, Apr 24, 2025 at 06:16:36PM +0200, Mireille Louys via semantics wrote:
>> • Propose definitions for a product-type *event-bundle:* An event-bundle dataset
>> is a complex object containing an event-list and multiple files or
>> other substructures that are products necessary to analyse the event-list.
>> Data in an event-bundle may thus be used to produce higher level data
>> products such as images or spectra.
> I think the definition is reasonably clear and applicable in
> practice.  Before merging this, however, I'd have a few requests for
> clarification:
>
> (1) used-in: I really, *really* would like to see actual, published
> data here (always, in all VEPs; it's a pain if we go into all the
> trouble of defining a concept and then nobody's ever using it in
> practice).  I see that CSC on http://cda.cfa.harvard.edu/csctap (or
> http://cda.cfa.harvard.edu/csc21tap [1]) has an obscore table.  It
> would really be excellent if they could mark up their event bundles
> with the new term, such that we could say:
>
>    used-in: dataset ivo://csc.harvard.edu/scsr2?some-obs-id on http://cda.cfa.harvard.edu/csctap
>
> That would help me maintain a clear conscience when setting up the
> new term[2].
In the VHE domain, CTAO, KM3NeT and SWGO wish to publish their current
and future data in the VO. The currently running experiments MAGIC and
HESS are working toward publishing their legacy data in the VO.
It is also in this context that such extensions are of interest to us.
> (2) Relationship: That's an operational field, i.e., I need to create
> an RDF triple from this.  The question thus is: is #event-list wider
> than #event-bundle or is it the other way round?  I could conjure up
> arguments for both, so, as usual, I'd approach the question from the
> user side: If I'm looking for #event-bundle, do I want to see
> #event-list, too?  If I'm looking for #event-list, do I want to see
> #event-bundle, too?  Whatever ought to encompass the other is the
> wider term.
An event-bundle is, e.g., an event-list plus a response (a set of IRFs),
possibly with provenance, a readme, etc.
In this context, one would need to use DataLink to access all the files.
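
To make the user-side question in (2) concrete, here is a minimal Python
sketch of how a wider/narrower relation changes what a search returns.
The vocabulary fragment, the rows, and the function names are all
hypothetical, and the direction chosen (event-bundle narrower than
event-list) is only an assumption for illustration, not a decision.

```python
# Hypothetical vocabulary fragment: each term maps to its direct narrower terms.
# ASSUMPTION: event-bundle is treated as narrower than event-list here,
# purely to illustrate the question; the opposite choice is also possible.
NARROWER = {
    "event-list": ["event-bundle"],
    "event-bundle": [],
}


def expand(term, narrower=NARROWER):
    """Return the term followed by all transitively narrower terms."""
    found = [term]
    queue = [term]
    while queue:
        current = queue.pop(0)
        for child in narrower.get(current, []):
            if child not in found:  # also guards against cycles
                found.append(child)
                queue.append(child)
    return found


def search(rows, term):
    """Select rows whose dataproduct_type is the term or a narrower one."""
    wanted = set(expand(term))
    return [row for row in rows if row["dataproduct_type"] in wanted]


# Made-up rows standing in for an ObsCore-like table.
rows = [
    {"obs_id": "obs-1", "dataproduct_type": "event-list"},
    {"obs_id": "obs-2", "dataproduct_type": "event-bundle"},
    {"obs_id": "obs-3", "dataproduct_type": "image"},
]

ids_for_list = [row["obs_id"] for row in search(rows, "event-list")]
ids_for_bundle = [row["obs_id"] for row in search(rows, "event-bundle")]
```

Under this assumption, a search for event-list also returns the bundle,
while a search for event-bundle returns only the bundle. Swapping the
two entries in NARROWER gives the opposite behaviour.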

> (3) Rationale: If the answer to both of the two questions in the
> preceding paragraph is "Yes", then it turns out the concepts are
> identical (A ⊂ B and B ⊂ A implies A = B), and hence you really don't
> want a new concept but augment #event-list to be something like,
> say, "Event list, possibly augmented with ancillary information".
> This points to an issue with your rationale: It basically argues that
> there's something you would like to say.
Thank you for asking.

First, our IRFs are not just accessory: they are mandatory for doing
physics with HE data (photons and neutrinos); for data analysis
specialists, this means forward-folding likelihood analysis in the
Poissonian regime. Without them, there is no point in publishing to
the VO.
Then, in terms of semantics, "ancillary" is pretty vague. These are
response files, or Instrument Response Files, a very specific type of
data. They should also be considered data, even though some, but not
all, come from simulations.
In addition, there are use cases where these IRFs are used without an
event-list, namely simulations. Also, one set of IRFs can serve many
observations, so a data producer could use a dedicated ObsCore entry
for it.

This is why we really wish that the "response" (the IRFs) have their own
existence, with clear descriptors. In this way, one can safely create
bundles, which will be the set of data to download for >99.9% of
the users.
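
As a rough illustration of this structure, here is a minimal Python
sketch in which a response is an object in its own right and is shared
by several bundles. All class names, identifiers and file names are
hypothetical (they come from no existing service or data model), and
the three IRF file names are only placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """A set of IRFs, described as a dataset of its own."""
    response_id: str
    irf_files: tuple  # placeholder file names, not a prescribed layout


@dataclass(frozen=True)
class EventList:
    """The event-list of one observation."""
    obs_id: str
    events_file: str


@dataclass(frozen=True)
class EventBundle:
    """An event-list plus its response, plus optional extras."""
    event_list: EventList
    response: Response
    extras: tuple = ()  # e.g. provenance, readme

    def files(self):
        """All files a user would download for this bundle."""
        return (
            self.event_list.events_file,
            *self.response.irf_files,
            *self.extras,
        )


# One response (set of IRFs) serving two observations.
shared = Response("irf-set-A", ("aeff.fits", "edisp.fits", "psf.fits"))
bundles = [
    EventBundle(EventList(f"obs-{i}", f"events-{i}.fits"), shared, ("README",))
    for i in (1, 2)
]
```

The response can also be listed without any event-list, which covers
the simulation use case mentioned above.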

I hope this helps to clarify things, and to continue the discussion if
you are not yet convinced...

> An aphorism I'm bringing up rather often these days is: "In protocol
> design, don't think about what you want to say.  Think about what
> others want to listen to."  Hence, it'd be really great if the
> rationale said why someone would want to look for #event-bundle
> *rather than* #event-list (or for #event-list rather than
> #event-bundle, if the former is the narrower term).  Could you
> provide that information in the Rationale section?

The note associated with the creation of the HEIG gives such a rationale,
but it may need to be developed a bit more. Thank you for the suggestion.

Best regards,
Bruno

> Thanks,
>
>               Markus
>
>
> [1] Regrettably, the CSC TAP services seem to be mildly broken at the
> moment.  Coming in with http, they issue https redirects which
> confuse TOPCAT; CXC folks: if you really need the forced redirects
> (see
> <https://blog.tfiu.de/foced-https-redirects-considered-harmful.html>  for
> a better alternative) then please update your registry records to
> point to the https URIs.  Even with https, however, I'm getting a
> "cscrel2.dbo.obscore not found" error from TOPCAT when running
>
>    select top 30 * from ivoa.obscore where dataproduct_type='event-list'
>
> It would be great if you could fix that (and a regular run of stilts
> taplint is good practice anyway)
>
> [2] You see,
> <https://ivoa.net/documents/Vocabularies/20230206/REC-Vocabularies-2.1.html#tth_sEcC.2>,
> while not exactly normative, is clear on:
>
>    In particular, ensure [...] resources mentioned in Used-in can be
>    reached and reflect the proposed term [...]
>
-- 

                          Bruno Khelifi
                   Physicist at CNRS (laboratory APC, Paris)
       Phone: +33.1.57.27.61.58 - Fax: +33.1.57.27.60.71
               APC, IN2P3/CNRS - Universite de Paris Cite