[Heig] [EXTERNAL] [BULK] Re: vocabulary update: proposal for dataproduct_type update for high energy data : event-list definition and event-bundle

BONNAREL FRANCOIS gmail francois.bonnarel at gmail.com
Thu May 15 11:48:53 CEST 2025


Dear all,
The discussion on this topic continued on GitHub:
https://github.com/ivoa/HighEnergyObsCoreExt/issues/5

I posted my own summary there, which I copy below.
Cheers
François

> I think we all agree that we have to define new terms for the 
> instrument response functions.
>
> From the discussion I see three different possible ways to integrate 
> these terms into vocabularies.
>
> Besides this, it is also important to distinguish the creation/extension 
> of vocabularies from the FIELDs where we use them.
>
> And we have to find a solution which works for two different 
> exposition strategies of the irf:
>
> - directly as observation in ObsCore
> - via DataLink, attached to a primary science dataproduct in ObsCore 
>   (event-list, spectrum, etc.)
>
> 1 ) One of the suggestions I have read so far (Bruno) is to refine the 
> #calibration branch of "DataLink core" 
> (https://www.ivoa.net/rdf/datalink/core/2022-01-27/datalink.html) to 
> add your new terms for irf . Let's take the #psf term for example : if 
> we set this term in the "semantics" FIELD of the DataLink response 
> that means that we are facing the "#psf of the primary item in 
> ObsCore". But I understood (from Ian) that sometimes you want to 
> discover irf directly, as observations. And in the current status of 
> ObsCore/SSA/SIA/DataLink it's not possible to use 
> https://www.ivoa.net/rdf/datalink/core/2022-01-27/datalink.html 
> outside the "semantics" FIELD of the DataLink response. Changing that 
> would be cumbersome.
>
> In addition we could imagine in DataLink that a psf (or any other irf) 
> link could be used for material required to calibrate (process) 
> the primary item (child of #calibration) or, on the other hand, for 
> material already used to calibrate the dataset discovered in ObsCore 
> (currently this would be a child of #progenitor, although I personally 
> don't like it). So this would require duplicating all the 
> different irf terms: one psf-for-calibration and one 
> psf-used-to-calibrate, etc.
>
> ----> Probably we want to avoid that
>
> 2 ) Data Product type 
> (https://www.ivoa.net/rdf/product-type/2024-05-19/product-type.html) : 
> This vocabulary originally extracted from ObsCore is intended to be 
> used in the dataproduct_type FIELD (in a future version of ObsCore) and 
> can also be used elsewhere. It can be used in the registry and in the 
> content_qualifier FIELD of DataLink.
> Considering that IRF are "special" observations, it's possible to 
> create a new "irf" branch in this vocabulary with all irf children.
> It could be used either in ObsCore dataproduct_type or in DataLink 
> content_qualifier depending on the irf "exposition" strategy.
> For the DataLink exposition strategy, this kind of combination 
> (completed by content_type for the media-type/format) is what we 
> experimented with the DataLink core #coderived and #counterpart terms 
> in this context (example):
>
> primary item: an astronomical source in a catalog
> link1: semantics=#coderived, content_qualifier=#time_series, 
>        content_type=text/csv
> link2: semantics=#counterpart, content_qualifier=#image, 
>        content_type=application/fits
>
> However there is a question: should all the types of irf be 
> potentially exposed in ObsCore as "observations" ? If some of them are 
> excluded it would be difficult to mark them as irrelevant : we don't 
> have ways to exclude some terms from a vocabulary in a given context.
> It's one reason why I think there is also a third solution to manage 
> this "irf type" vocabulary.
>
> 3 ) create a new "irf type" IVOA vocabulary.
> It can be used in "content_qualifier" FIELD of DataLink in combination 
> with "semantics=#calibration" (or any other appropriate term from 
> https://www.ivoa.net/rdf/datalink/core/2022-01-27/datalink.html)
> This is perfectly valid because content_qualifier is not REQUIRED to 
> be taken from "Data Product type" vocabulary. It's only the default ! 
> And the combination between semantics , content_qualifier and 
> content_type works as in 2)
> The drawback is that in that case we cannot use these new irf terms 
> directly in the dataproduct_type FIELD of ObsCore, if we want to 
> expose them there.
> The compromise could be to create a single new term in 
> https://www.ivoa.net/rdf/product-type/2024-05-19/product-type.html : #irf
> and, on finding it, the user/software has to look in another 
> ObsCore field (maybe dataproduct_subtype ???) for the accurate term 
> (taken from this independent irf vocabulary) describing the irf.
>
> ------------------------------------------------------------------------
>
> In my personal opinion, solution 1 should be rejected. It's not 
> consistent with what we did when we created #counterpart, #coderived 
> and the content_qualifier FIELD.
> Solutions 2 and 3 can be discussed further. I personally prefer 3 
> because IRF may be either images, or spectra, or whatever, so that what 
> distinguishes them from classical observation results is the object 
> which is observed: "something in the sky" versus "measuring a response 
> of the instrument". So I am reluctant to add them to the data product 
> type vocabulary, where so far the distinction was "what axis 
> is observed, what axis is sampled".
>
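
A minimal Python sketch of how a client could use the combination of semantics, content_qualifier and content_type described in options 2 and 3. The rows are hand-written dictionaries, not the output of a real service. The `#psf` and `#rmf` qualifiers, the `#event-list` value and the example.org URLs are assumptions for illustration only, since the irf vocabulary is still to be defined.

```python
# Hand-written DataLink-like rows; every value here is illustrative.
links = [
    {"semantics": "#this", "content_qualifier": "#event-list",
     "content_type": "application/fits",
     "access_url": "https://example.org/evt.fits"},
    {"semantics": "#calibration", "content_qualifier": "#psf",  # hypothetical irf term
     "content_type": "application/fits",
     "access_url": "https://example.org/psf.fits"},
    {"semantics": "#calibration", "content_qualifier": "#rmf",  # hypothetical irf term
     "content_type": "application/fits",
     "access_url": "https://example.org/rmf.fits"},
    {"semantics": "#preview", "content_qualifier": "#image",
     "content_type": "image/png",
     "access_url": "https://example.org/preview.png"},
]


def select_links(rows, semantics, qualifiers=None):
    """Return rows with the given semantics, optionally restricted
    to a set of content_qualifier values."""
    return [
        r for r in rows
        if r["semantics"] == semantics
        and (qualifiers is None or r["content_qualifier"] in qualifiers)
    ]


# All calibration material attached to the primary item.
irf_links = select_links(links, "#calibration")
# Only the psf, identified by the content_qualifier.
psf_links = select_links(links, "#calibration", {"#psf"})
```

In this sketch the semantics value says how the link relates to the primary item, and the content_qualifier says what kind of irf it is, so no psf-specific term is needed in the semantics vocabulary.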



Le 07/05/2025 à 00:20, BONNAREL FRANCOIS gmail a écrit :
> Dear all,
>    I think I agree with Markus that we should avoid multiple #this in 
> general.
>    But with one single exception: different formats for the same product.
>    Examples:
>      - DataLink allows a single line in ObsCore for an image which we 
>        can deliver in FITS, JPEG, PNG, etc.
>      - DataLink allows a single line in ObsCore for a spectrum which we 
>        can deliver in VOTable, FITS, CSV, etc.
>    In those cases we will have in the DataLink table:
>      record 1: ID = ivo://aaa/bbb
>                semantics: #this
>                content-type: application/fits
>                content-qualifier: image
>                description: "XXX survey image Target so and so in FITS format"
>      record 2: ID = ivo://aaa/bbb (same as above)
>                semantics: #this
>                content-type: image/jpeg
>                content-qualifier: image
>                description: "XXX survey image Target so and so in JPEG format"
>      ....
>     The examples given by Laurent show that a data producer can choose 
> different strategies according to the mode of usage it expects from users.
>           strategy 1: the bundle is exposed in ObsCore, and then in 
> DataLink the row with semantics #this can only be the whole bundle
>           strategy 2: the spectrum (or event-list, or whatever 
> science data) is exposed in ObsCore, and then #this is this very 
> spectrum, or event-list, etc. Other material you want to link to 
> the spectrum must have different semantics (#calibration, #auxiliary ...)
>
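
A minimal Python sketch of the single exception described above, where several #this records share one ID and differ only in format. The rows are hand-written dictionaries and the preference order is an assumption made for illustration, not something DataLink prescribes.

```python
# Two #this records for the same product, differing only in format.
records = [
    {"ID": "ivo://aaa/bbb", "semantics": "#this",
     "content_type": "application/fits", "content_qualifier": "image"},
    {"ID": "ivo://aaa/bbb", "semantics": "#this",
     "content_type": "image/jpeg", "content_qualifier": "image"},
]


def pick_this(rows, preferred_types):
    """Return the #this row matching the earliest entry of
    preferred_types, or None if no #this row has a listed format."""
    candidates = [r for r in rows if r["semantics"] == "#this"]
    for ctype in preferred_types:
        for row in candidates:
            if row["content_type"] == ctype:
                return row
    return None


# A quick-look client might prefer JPEG over FITS.
chosen = pick_this(records, ["image/jpeg", "application/fits"])
```

Because the rows describe the same product, the client only has to choose a format and never has to guess which #this is the real one.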
>      Answer to your last question below
>
>
> Le 05/05/2025 à 22:45, Jaffe, Tess (GSFC-6601) via heig a écrit :
>>
>> I can add what our use case was.
>>
>> There was a local team building a web tool for x-ray spectra viewing 
>> and quick line identification, so I asked them to use our SSA for 
>> it. We then added a datalink service descriptor to our SSA results 
>> and served the response matrices in the datalink table result.  (The 
>> SSA result at the time had the link to the FITS file.)  It worked 
>> internally with their tool for a while but the tool didn't make it to 
>> production. So that was our use case.
>>
>> Now we are working more on our ObsTAP and its datalinks.  But 
>> precisely the question of which products get their own row in our 
>> ObsCore table is what we're working on now.  Presumably, whatever we 
>> decide, this hypothetical tool could work with as long as the service 
>> is performant. But from the client point of view, being given a 
>> tarball is more annoying to code for. I would prefer listing the 
>> spectrum product alone in ObsCore and give each ancillary file its 
>> own row in the DataLink result table.  I.e., list a spectrum in 
>> ObsCore and return additional rows in the DataLink result for each of 
>> the files that would constitute the bundle.  What would the bundle as 
>> a tarball facilitate?
>
> We discussed examples in the context of CTA and XMM. I think 
> "event-bundle" records in ObsCore may be completed by DataLink records 
> #this with content-type=application/OGIP or 
> content-type=application/GADF (or later VODF)
>
> In that case dedicated software can directly process these 
> "standard" formats of "bundles".
>
>
> Of course another valid strategy is the one you suggest : product type 
> in ObsCore : event-list , spectrum and then several records with 
> #calibration for the different IRF or ARF
>
> An extended vocabulary for IRF/ARF types (TBD) may be used in the 
> content-qualifier FIELD in each of those rows for better identification.
>
> This is probably better suited to ad hoc processing.
>
> Cheers
>
> François
>
>
>>
>>
>>
>> ------------------------------------------------------------------------
>> *From:* heig <heig-bounces at ivoa.net> on behalf of Mireille Louys via 
>> heig <heig at ivoa.net>
>> *Sent:* Monday, May 5, 2025 1:50 PM
>> *To:* heig at ivoa.net <heig at ivoa.net>
>> *Subject:* Re: [Heig] [EXTERNAL] [BULK] Re: vocabulary update: 
>> proposal for dataproduct_type update for high energy data : 
>> event-list definition and event-bundle
>>
>>
>>
>>
>> Hello,
>>
>> Thanks Tess and Laurent for these examples .
>>
>> This was a proposal during the HE workshop last week to bring some 
>> examples
>>
>> showing an Obscore data discovery ( obscore entry here ) and exploring
>> various settings of data link scenarios .
>>
>> Any volunteer for  examples from other archives ?
>>
>> Best , Mireille
>>
>>
>> Le 05/05/2025 à 18:14, Laurent Michel via heig a écrit :
>> > Hello,
>> >
>> >
>> >
>> > Le 05/05/2025 à 08:56, Markus Demleitner via semantics a écrit :
>> >> Hi Tess,
>> >>
>> >> On Wed, Apr 30, 2025 at 01:48:29PM +0000, Jaffe, Tess (GSFC-6601) via
>> >> semantics wrote:
>> >>> Possibly dumb question:
>> >>
>> >> Not at all; you're touching a topic that has been discussed quite a
>> >> few times now without a satisfying result yet: What does it mean if
>> >> there are multiple items with the same semantics in Datalink? Is it
>> >> "all of them together give the thing" or is it "they are
>> >> alternatives"?  Or yet something else?
>> >>
>> >> Previous rounds suggested that the interpretation will probably
>> >> depend on the concept, but the details turned out to be fairly messy.
>> >>
>> >>
>> >>> If an ObsCore table lists an event-bundle as a separate row with
>> >>> its own product_type, and the access_url follows best practice
>> >>> specifying a datalink that will return the bundle, what should the
>> >>> DataLink result include as #this?  We are actively putting this
>> >>> together now at HEASARC.  If the product type is simply a spectrum,
>> >>
>> >> That's excellent news!
>> >>
>> >>> our datalink result has the spectrum file as #this and the response
>> >>> matrices, background, etc. as related products in the same result
>> >>> table.  If the product itself is a bundle, what is the #this? Do
>> >>> we have to provide a tarball or something?  Or are there multiple
>> >>> #this with different dataproduct_subtypes?  The latter doesn't
>> >>> sound right to me.
>> >>
>> >> Given my preamble, I'd avoid multi-#this.  The ideal solution would
>> >> IMHO be a standard archive if the HEIG can commit to such a thing.
>> >> Failing that, I think a tar archive of the individual components
>> >> would be the second best thing.  CADC does something like this,
>> >> although the other way round: They're handing out everything tarred
>> >> together as a #package.  Offering the components individually,
>> >> possibly as #progenitor-s, would help cases when people really only
>> >> want to fetch a single part.
>> >
>> > I agree that having multiple #this is confusing (which one is the
>> > right one?).
>> > In my understanding #this must match the product_type as in the
>> > Obscore record.
>> >
>> > If a spectrum bundle is exposed in a separate row, we should have
>> > something like this:
>> >
>> > Obscore row:
>> > -----------
>> > - product_type=spectrum-bundle (tbd)
>> > - access_format=application/x-votable+xml;content=datalink
>> >
>> > Datalink response:
>> > -----------------
>> > - link #1
>> >   - semantics=#this
>> >   - content_qualifier=spectrum-bundle (TBD)
>> >   - content_type=application/tar+gzip
>> >   - description="spectrum file + preview + ARF + RMF + Background
>> > spectrum"
>> >
>> >
>> > If the spectrum is exposed in a separate row:
>> >
>> > Obscore row:
>> > -----------
>> > - product_type=spectrum
>> > - access_format=application/x-votable+xml;content=datalink
>> >
>> > Datalink response:
>> > -----------------
>> > - link #1
>> >   - semantics=#this
>> >   - content_qualifier=spectrum
>> >   - content_type=application/fits
>> >   - description="spectrum file"
>> > - link #2
>> >   - semantics=#package
>> >   - content_qualifier=spectrum-bundle (TBD)
>> >   - content_type=application/tar+gzip
>> >   - description="spectrum file + preview + ARF + RMF + Background
>> > spectrum"
>> >
>> > I do not believe we are able to design a standard HEIG archive because
>> > this is too mission/tool specific.
>> > Do we really need the archive content to be machine readable?
>> > Anyway, individual files can be exposed with adapted semantics.
>> >
>> > Laurent
>> >
>> >
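
A minimal Python sketch of the rule stated above, that #this must match the product type of the ObsCore record. The rows are hand-written dictionaries. `dataproduct_type` is the ObsCore column name, and `spectrum-bundle` is the still-TBD term from the example, used here only as a placeholder.

```python
def this_matches_product_type(obscore_row, datalink_rows):
    """True if at least one #this row exists and every #this row has a
    content_qualifier equal to the ObsCore dataproduct_type."""
    this_rows = [r for r in datalink_rows if r["semantics"] == "#this"]
    return bool(this_rows) and all(
        r["content_qualifier"] == obscore_row["dataproduct_type"]
        for r in this_rows
    )


# Second layout from the example: the spectrum has its own ObsCore row.
spectrum_row = {"dataproduct_type": "spectrum"}
spectrum_links = [
    {"semantics": "#this", "content_qualifier": "spectrum",
     "content_type": "application/fits"},
    {"semantics": "#package", "content_qualifier": "spectrum-bundle",
     "content_type": "application/tar+gzip"},
]

ok_spectrum = this_matches_product_type(spectrum_row, spectrum_links)

# The same links would not fit an ObsCore row that exposes the bundle.
bundle_row = {"dataproduct_type": "spectrum-bundle"}
ok_bundle = this_matches_product_type(bundle_row, spectrum_links)
```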
>> >>
>> >> But at least for a prototype (and, if that works fine, perhaps also
>> >> as a long-term practice), I think nobody would be terribly confused
>> >> if #this were just the time series or spectrum, in particular if a
>> >> content-qualifier would let machines figure out what it is they'll
>> >> get.
>> >>
>> >> I'm not too happy with #progenitor for the individual components,
>> >> though.  Perhaps datalink/core should have a concept #component with
>> >> the definition "for datasets where #this is composed of multiple
>> >> individual artefacts, #component rows offer access to individual
>> >> artefacts.  Use local-semantics to consistently mark up the roles of
>> >> the components." or so.
>> >>
>> >> In the end, I think we need to see what will help clients consuming
>> >> this.  Do we have software that we could use to try that out? What
>> >> do people use to work with these #event-bundle-s?
>> >>
>> >>           -- Markus
>> >>
>> >
>> > --
>> > English version: http://www.deepl.com/translator
>>
>> --
>> --
>> Mireille Louys, MCF (Assistant Professor)
>> Centre de données Astronomiques (CDS)       Equipe Images, ICube
>> Observatoire de Strasbourg                  Telecom Physique Strasbourg
>> 11, rue de l' Université                    300, Bd Sebastien Brandt 
>> CS 10413
>> F-67000 Strasbourg                          F-67412  Illkirch Cedex
>>
>> --
>> heig mailing list
>> heig at ivoa.net
>> http://mail.ivoa.net/mailman/listinfo/heig
>>
>