[Heig] Post running meeting thoughts
Dr. Ian N. Evans
ievans at cfa.harvard.edu
Mon Mar 2 23:01:48 CET 2026
Dear Francois,
You are of course entitled to your opinion. However, what you are arguing is in my view inconsistent with the wording of the ObsCore Recommendation, Version 1.1.
As stated in section “1. Introduction”, of the Recommendation “... The ability to pose a single scientific query to multiple archives simultaneously is a fundamental use case for the Virtual Observatory. Providing a simple standard protocol such as the one described in this document increases the chances that a majority of the data providers in astronomy will be able to implement the protocol, thus allowing data discovery for almost all archived astronomical observations.” That is exactly what we are proposing here, with the scientific data products required for high-energy astrophysics.
Further, under section “2. Use cases”, the Recommendation states “Support any type of science data products (image, cube, spectrum, time series, instrumental data, etc.).” All of our data products satisfy this definition (and in fact instrument responses are a perfect example of “instrumental data”).
But you say “By science data we mean data where we can detect some information of interest coming from the sky.” Sorry, but YOU don’t get to tell US what constitutes OUR “science data”. Section “3.3.3. Observation and Observation Dataset” of the Recommendation states “exactly what comprises an “observation” is not well defined within astronomy and is left up to the data provider to define for their data.” for a reason. Science data products vary dramatically from waveband to waveband, and even within a waveband from instrument to instrument depending on the physical mechanism used by the detector. We consider instrument responses to be “science data” and very much part of the “observation dataset”.
Further down, section “3.3.3. Observation and Observation Dataset” of the Recommendation states “Two different approaches can be followed for exposing the instrumental data from an observation. One can either expose the individual science data products resulting from the observation, all sharing the same obs_id, or one can “package” the data products and expose the package as a single complex instrumental data product. ... Which approach is best depends upon the anticipated scientific usage and is up to the data provider to determine.” Again this is sensibly up to the data provider because the data provider is the one with the understanding of how the provider’s science users access and use their data.
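As a rough illustration of the two approaches quoted above, here is a minimal Python sketch. The product records, the example URLs, and the helper names expose_individually and expose_as_package are all hypothetical. Only obs_id, dataproduct_type, and access_url are ObsCore column names; the "response-function" value is a placeholder, and "event-bundle" is borrowed from earlier in this thread rather than from any agreed vocabulary.

```python
from collections import defaultdict

# Hypothetical products from two observations (illustrative values only).
PRODUCTS = [
    {"obs_id": "obs-1", "dataproduct_type": "event",
     "access_url": "https://example.org/obs-1/evt"},
    {"obs_id": "obs-1", "dataproduct_type": "response-function",  # placeholder term
     "access_url": "https://example.org/obs-1/arf"},
    {"obs_id": "obs-1", "dataproduct_type": "response-function",  # placeholder term
     "access_url": "https://example.org/obs-1/rmf"},
    {"obs_id": "obs-2", "dataproduct_type": "event",
     "access_url": "https://example.org/obs-2/evt"},
]


def expose_individually(products):
    """Approach 1: one table row per product; rows share the same obs_id."""
    return [dict(p) for p in products]


def expose_as_package(products, bundle_type="event-bundle"):
    """Approach 2: one row per obs_id, with the member products packaged."""
    grouped = defaultdict(list)
    for p in products:
        grouped[p["obs_id"]].append(p["access_url"])
    return [
        {"obs_id": obs_id, "dataproduct_type": bundle_type, "members": urls}
        for obs_id, urls in sorted(grouped.items())
    ]


if __name__ == "__main__":
    print(len(expose_individually(PRODUCTS)), "individual rows")
    for row in expose_as_package(PRODUCTS):
        print(row["obs_id"], row["dataproduct_type"], len(row["members"]))
```

Either function starts from the same underlying products; the difference is only in how many rows the provider chooses to publish.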
You further posit that “If we don't do this and extend the domain of ObsCore too much we force it to become something else and to lose universality.” On what basis do you make that assumption? For Chandra data, for example, our instrument responses all map to a specific spatial, spectral, and temporal coverage region on the sky. The use cases in Appendix A of the HEA ObsCore Extension almost all comprise queries that are based on sky geometry, spectral, or temporal coverage, with a few others based on obs_id.
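To make the kind of coverage-based query concrete, here is a minimal Python sketch that only builds an ADQL string; it does not contact any service. The helper name build_obscore_query and the "response-function" value are hypothetical. The table name ivoa.obscore and the columns s_region, t_min, t_max, dataproduct_type, obs_id, obs_publisher_did, and access_url are taken from ObsCore 1.1.

```python
def build_obscore_query(ra_deg, dec_deg, t_start_mjd, t_stop_mjd, product_type):
    """Return ADQL selecting ObsCore rows of one product type whose sky
    footprint contains a position and whose time span overlaps a range."""
    if not (0.0 <= ra_deg < 360.0 and -90.0 <= dec_deg <= 90.0):
        raise ValueError("position out of range")
    if t_stop_mjd < t_start_mjd:
        raise ValueError("time range is reversed")
    # Double any single quote so the value stays inside the ADQL literal.
    safe_type = product_type.replace("'", "''")
    return (
        "SELECT obs_id, obs_publisher_did, dataproduct_type, access_url "
        "FROM ivoa.obscore "
        f"WHERE dataproduct_type = '{safe_type}' "
        f"AND CONTAINS(POINT('ICRS', {ra_deg!r}, {dec_deg!r}), s_region) = 1 "
        f"AND t_min <= {t_stop_mjd!r} AND t_max >= {t_start_mjd!r}"
    )


if __name__ == "__main__":
    # "response-function" is a placeholder term, not an agreed vocabulary entry.
    print(build_obscore_query(83.63, 22.01, 59000.0, 59001.0, "response-function"))
```

The point of the sketch is that the same spatial and temporal constraints apply whether the row describes an event list or an instrument response for that observation.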
You commented “When we designed ObsCore the intention was to design a data model and an associated tap table to expose science data.”, and that is great. However, I doubt very much that the design team included representation from the full range of wavebands or complete representation of different types of experiments, facilities, or missions, and as a result the inputs that went into building the standard (for example, what constitutes “science data”) would have been incomplete. You did an amazing job given the inputs that you had! But standards evolve with time as they become more complete, or they wither and die. ObsCore is currently evolving based on needs from radio, timing, and high-energy astrophysics, and this should be celebrated because it means that the standard is not withering and dying.
Sorry, but we need to take full advantage of the flexibility provided by the ObsCore Recommendation as written to serve our science users our science data, based on our familiarity with how our science users want to access and use those data. If the IVOA is unwilling to support the needs of high-energy astrophysics, or at least of this very large HEA data provider, then I want to hear that stated directly and clearly by the IVOA Exec.
Thanks,
—Ian
> On Feb 24, 2026, at 08:43, BONNAREL FRANCOIS gmail via heig <heig at ivoa.net> wrote:
>
> Dear Bruno, dear Ian, all
>
> We come back to this.
>
> There is no doubt for us that VO should provide ways to expose such things as "background images" or in your case, Bruno, background rate.
>
> Our concern is about forcing ObsCore to be the way to expose such datasets.
>
> When we designed ObsCore the intention was to design a data model and an associated TAP table to expose science data.
>
> By science data we mean data where we can detect some information of interest coming from the sky.
>
> If we don't do this and extend the domain of ObsCore too much, we force it to become something else and to lose universality.
>
> So according to this general definition we don't think response functions belong to the ObsCore domain. Advanced data products are another issue; we won't discuss them today.
>
> Of course there are plenty of ways to expose those data and relate them to science data. The VO must for sure improve their description and access modes.
>
> DataLink is the minimal method to make response data accessible and relate them to the relevant science data, but it has the drawback of being a "two-step" process. If direct access to response data is required in a one-step process, we suggest exploring the solution of defining the DataLink response table as a TAP table in order to allow JOINs with the ObsCore science data table.
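A minimal sketch of the JOIN idea, using Python's sqlite3 module as a stand-in for a TAP service. The table names obscore and datalink, the identifiers, and the URLs are hypothetical. The columns obs_publisher_did, obs_id, and dataproduct_type come from ObsCore, and ID, access_url, semantics, and content_qualifier come from DataLink; how a real service would publish a DataLink-like table is an assumption here.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with toy ObsCore and DataLink-like tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE obscore (obs_publisher_did TEXT, obs_id TEXT,
                              dataproduct_type TEXT);
        CREATE TABLE datalink (ID TEXT, access_url TEXT, semantics TEXT,
                               content_qualifier TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO obscore VALUES (?, ?, ?)",
        [
            ("ivo://example.org/data?obs-1/evt", "obs-1", "event"),
            ("ivo://example.org/data?obs-2/img", "obs-2", "image"),
        ],
    )
    conn.executemany(
        "INSERT INTO datalink VALUES (?, ?, ?, ?)",
        [
            ("ivo://example.org/data?obs-1/evt", "https://example.org/obs-1/arf", "#calibration", "arf"),
            ("ivo://example.org/data?obs-1/evt", "https://example.org/obs-1/rmf", "#calibration", "rmf"),
            ("ivo://example.org/data?obs-1/evt", "https://example.org/obs-1/evt", "#this", None),
            ("ivo://example.org/data?obs-2/img", "https://example.org/obs-2/psf", "#calibration", "psf"),
        ],
    )
    return conn


def responses_for_science_products(conn, product_type):
    """One-step query: science rows of a given type joined to their
    calibration links on obs_publisher_did = ID."""
    sql = (
        "SELECT o.obs_id, o.obs_publisher_did, d.content_qualifier, d.access_url "
        "FROM obscore AS o JOIN datalink AS d ON d.ID = o.obs_publisher_did "
        "WHERE o.dataproduct_type = ? AND d.semantics = '#calibration' "
        "ORDER BY o.obs_id, d.content_qualifier"
    )
    return conn.execute(sql, (product_type,)).fetchall()


if __name__ == "__main__":
    for row in responses_for_science_products(build_demo_db(), "event"):
        print(row)
```

In this arrangement the ObsCore table lists only the science products, and the response files are reached through the join rather than as ObsCore rows of their own.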
>
> But it is true that the description provided by DataLink is rather poor.
>
> So, alternatively, when needed, different tables may be defined to describe response function datasets and provide pointers to them if necessary.
>
> A table with UCDs on most of the columns (existing UCDs or new ones to define) would already provide a lot of interoperability between services providing response data.
>
> Moreover, defining "response function data models" may provide more flexible and accurate descriptions and access methods. Data models may be embedded in VOTables and mapped to columns using utypes or Mango+Mivot.
>
> We think some sections of the HeiG note should be revised in these directions.
> We are ready to help to do that.
>
> François with Mireille
>
>
>
> On 07/02/2026 at 19:15, Bruno Khelifi via heig wrote:
>> Hi all,
>>
>> About "Background images and pixel masks are not response-function data products", maybe this is the case for X-rays. I won't discuss it.
>>
>> As a reminder, the term `background` is very generic and can be used for everything. In gamma-ray astronomy, it comes from cosmic rays (not from broken pixels, which are handled much earlier, during the raw data processing). In the GeV, TeV, and PeV ranges, the background rate is without any doubt an IRF!
>> In contrast to X-rays, 3D analyses are routinely performed. For these, the counts are compared with the predicted counts, which are the sum of those associated with gamma rays and those associated with the background rate, i.e. events wrongly classified as gamma rays (see our notes). The background rate cannot be estimated from the data themselves, because in the Galactic plane there are gamma rays everywhere in the field of view (i.e. one cannot use 'OFF' regions). As a reminder, the Fermi bubbles and the eROSITA bubbles extend to very high latitudes. Also, one cannot use simulations of cosmic rays to estimate the background, because the resources required would be far too large and because the simulations reproduce reality poorly (as many studies over decades have shown). We use a complex pipeline that takes data as input, iteratively creates exclusion masks in 3D, generates rate templates in a hypercube ([X,Y] or theta, an atmospheric quality observable, the optical efficiency of our instruments, zenith angles, azimuth angles (because of the geomagnetic effect on the extensive air showers), and reconstructed energy), curates the data to handle empty and low-statistics bins, and interpolates this hypercube template to compute the observation-wise background rate.
>>
>> For the neutrino telescopes, real data are also used. A specific pipeline is also used to compute the background rate.
>>
>> So, one should without any doubt keep the background rate as a data product!
>>
>> Best,
>> Bruno
>>
>>
>> On 04/02/2026 at 20:38, Dr. Ian N. Evans via heig wrote:
>>> Dear Francois,
>>>
>>> I consider the arf, rmf, and psf to be response-function data products. Background images and pixel masks are not response-function data products - they are determined directly from the observation event list similarly to a total counts image. Bad pixel is a region data product, but it’s something of a gray area since it’s a combination of known bad pixel regions plus bad pixel regions derived directly from the observation event list.
>>>
>>> For the Chandra Source Catalog (CSC) prototype, at least initially we plan to expose all of the data products directly to demonstrate that the extension provides the flexibility that we need. However in production, we likely would not expose all of the data products individually but rather combine some of them with the event lists as event bundles (at least for the individual observation full-field data product set). We would want to expose the individual observation event lists individually, but might choose for example to construct an event bundle that exposes (at least) the event list, bad pixel regions, aspect histogram, and possible aspect solution as a bundle since there is very little use for the latter 3 types of data product without the event list.
>>>
>>> While tying associated and derived data products to an event list in an event bundle seems sensible for individual observations, our experience is that this isn’t appropriate for the CSC advanced data products. Since CSC 2.0 was released we have had millions of catalog data product downloads and surveyed our user base as to data product usage.
>>>
>>> The typical usage patterns for the CSC advanced data products are different from the typical usage patterns for individual X-ray observation data.
>>>
>>> For the latter the user typically downloads the event list and ancillary data products (such as responses or other data products that can be used to build responses) as a set, and then performs data analysis steps directly on the event list using the ancillary data products, often after applying spatial/spectral/temporal filters to the data. Event bundles facilitate this usage.
>>>
>>> For the CSC advanced data products the usage patterns are quite different. Many (most) of these advanced data products are derived from multiple (in some cases hundreds) observations. Typically the users aren’t interested in performing data analysis steps on the event lists themselves, and often aren’t interested in knowing which observation(s) they are derived from (at least not from the perspective of having to perform a data query). They just want (e.g.) all the spectra (or light curves, or photometry MPDFs, or ...) in a certain region of the sky, or in a given time range, etc. And given the data volume that’s all they want. Maybe they’ll come back later and ask for a subset of additional data products after they’ve performed some preliminary analyses on those data products, but they don’t want those up front.
>>>
>>> Based on these usage patterns, I think we will likely want to expose the remaining CSC data products individually.
>>>
>>> Thanks,
>>> —Ian
>>>
>>>> On Jan 27, 2026, at 09:52, BONNAREL FRANCOIS gmail via heig <heig at ivoa.net> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> After the meeting last week, I was still thinking about what the Chandra prototype could look like.
>>>>
>>>> For the Paris HESS prototype, I have had the idea for a couple of years now.
>>>>
>>>> Trying to understand what the CSC data products could be, I went back to Ian's Malta interop presentation.
>>>>
>>>> I copy/paste here one of the slides where some of these products are described.
>>>>
>>>> Before trying to define dataproduct_type vocabulary terms for those products, I am wondering if we really need to expose all this data directly in an ObsTAP service.
>>>>
>>>> For example, background images, psf, pixel mask, bad pixel regions, and ARF belong to the "response functions" category, if I'm not mistaken.
>>>>
>>>> They probably are attached to a photon event list or an image or ....
>>>>
>>>> Including all this in the main ObsCore table will overload it very heterogeneously. Some of these response functions will be similar to what we get in other domains (psf); some will be very different and specific to X-ray.
>>>>
>>>> I understood that the spatial, spectral, and time characterization of these specific products could be borrowed from the observation they are associated with. That's OK, but is it useful?
>>>>
>>>> For accessing these response functions I can imagine four solutions, all of which have the advantage of letting the ObsTAP service stay focused on measurements obtained from the sky at whatever calib level.
>>>>
>>>> 1 ) the photon event list and response functions are gathered together in the same tar or archive file (or MEF), which is typed as an event-bundle. Direct access to this bundle from the ObsTAP access_url is then easy. It's the client's task to figure out what to do with the content of the bundle.
>>>>
>>>> 2 ) the various response material is kept as a set of individual products. All are associated with an event list, an image, or a spectrum. In that case ObsTAP points to a DataLink response which lists all these different products. The semantics FIELD gives calibration or response function. The content_qualifier FIELD gives the precise nature of the product.
>>>>
>>>> 3 ) the DataLink response content may be organized as a TAP table. It's then possible to query the ObsTAP table and the DataLink-like table at the same time by a join on ObsCore/obs_publisher_did-DataLink/ID
>>>>
>>>> 4 ) if we need a more detailed description of the response products to help discover and select them, we could imagine creating a specific "response product" table following a specific data model, as proposed by Mireille in her Gorlitz presentation. This would allow us to attach, e.g.:
>>>>
>>>> - time range to a psf or
>>>>
>>>> - specific release date and description to an arf or a bad pixel map
>>>>
>>>> -....
>>>>
>>>> A natural join on obs_publisher_did in both tables will allow those tables to be queried at the same time with selection criteria from both.
>>>>
>>>> Cheers
>>>>
>>>> François
>>>>
>>>>
>>>> <ulnvpjC4zXtPT0zn.png>
>>>>
>>>> --
>>>> heig mailing list
>>>> heig at ivoa.net
>>>> http://mail.ivoa.net/mailman/listinfo/heig
>>>
>>> —
>>>
>>> Dr. Ian Evans
>>> Astrophysicist
>>> Chandra X-ray Center
>>> Center for Astrophysics | Harvard & Smithsonian
>>>
>>> Office: (617) 496 7846 | Cell: (617) 699 5152
>>> 60 Garden Street | MS 81 | Cambridge, MA 02138
>>>
>> --
>>
>> Bruno Khelifi
>> Physicist at CNRS (laboratory APC, Paris)
>> Phone: +33.1.57.27.61.58 - Fax: +33.1.57.27.60.71
>> APC, IN2P3/CNRS - Universite de Paris Cite
>>
>>
>
—
Dr. Ian Evans
Astrophysicist
Chandra X-ray Center
Center for Astrophysics | Harvard & Smithsonian
Office: (617) 496 7846 | Cell: (617) 699 5152
60 Garden Street | MS 81 | Cambridge, MA 02138


cfa.harvard.edu <http://cfa.harvard.edu/> | Facebook <http://cfa.harvard.edu/facebook> | Twitter <http://cfa.harvard.edu/twitter> | YouTube <http://cfa.harvard.edu/youtube> | Newsletter <http://cfa.harvard.edu/newsletter>