[Heig] HEIG Note discussion: Obscore and dataproduct_type and various classes of datasets in High energy data

Thu Oct 2 21:54:08 CEST 2025

Hi Mireille,

Sorry it has taken me so long to get back to you.  Hopefully these thoughts will still be useful.

I have some concerns about your data product type figure, in that it appears to separate what to me are tightly-related concepts and I'm not convinced that is appropriate.

Part of the issue is that datasets such as "pdf" and "draws" (plural BTW) can be derived from many different types of data products.  For example, for the Chandra Source Catalog we have a "pdf" product that is the probability distribution of the aperture photometry for an X-ray detection or source.  However, one could equally construct a "pdf" data product that is the probability distribution of a different kind of measure.  For example, the "psf" response-function is in fact a "pdf" because it is the probability distribution that (using the simplest possible terminology) a photon from a point source on the sky will be detected at a specific location on the detector.  Similarly, an "edisp" response-function is also a "pdf" because it gives the probability that an incident particle with energy E will be detected in channel PHA on the detector (here I'm using the soft X-ray paradigm, but it's similar for other HEA wavebands).

In a vocabulary, you might think of #pdf as being a parent data product type to those #response-function types, such as #psf, #edisp, ..., that are probability density functions.
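
To make the parent/child idea concrete, here is a minimal sketch of how such a vocabulary fragment could be represented and queried.  The term names are the ones used in this message; the structure (a term may have more than one broader term) and the helper names `BROADER`, `is_a` and `narrower` are assumptions for illustration only, not part of any published IVOA vocabulary.

```python
# Hypothetical vocabulary fragment: each term maps to its set of broader terms.
# "psf" and "edisp" are response-functions that are also probability distributions.
BROADER = {
    "pdf": set(),
    "draws": set(),
    "response-function": set(),
    "psf": {"pdf", "response-function"},
    "edisp": {"pdf", "response-function"},
}


def is_a(term, ancestor):
    """True if `term` is `ancestor` or has it as a (transitive) broader term."""
    if term == ancestor:
        return True
    return any(is_a(parent, ancestor) for parent in BROADER.get(term, ()))


def narrower(ancestor):
    """All terms strictly below `ancestor`, sorted by name."""
    return sorted(t for t in BROADER if t != ancestor and is_a(t, ancestor))


pdf_like = narrower("pdf")  # terms a search for "#pdf" could also match
```

With this layout a search for #pdf would also pick up #psf and #edisp, while a search for #response-function would not pick up a catalog "pdf" product.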

More generally, the datasets that you have labeled "Explanatory" could in principle be linked to any kind of dataset, including calibrations, response-functions, etc. and not just observation datasets.  While the ones we want to use with ObsCore are specifically derived directly from one or more observation datasets, the connection with response-functions (even in this context) should not be ignored.

> ObsDataset defined in Obscore is the result of an observation process, using the observing configuration.
> Response  are computed from various Obsconfig parameters.
> Explanatory datasets are also computed from statistical analysis on the observed data.

I don't think these statements are entirely accurate when applied to HEA.

- ObsDataset may be the result of the observation process, but the result of the observation process is a dataset of observables, which might be pixel positions, DN, voltages, ... but typically are not physical quantities such as world coordinates, flux, etc. until calibrations have been applied.  As described below, the mappings between observables and physical quantities are much more complex for HEA and may not always be sensibly performed by the data provider.

- Responses may in some cases be determined from various Obsconfig parameters, but this is not always the case.  They *can* depend on the circumstances of the individual observation (e.g., the actual "psf" for a ground-based facility may not be computable a priori and the "bkg" may not be determinable from the observation configuration).  Often they also depend on the actual observation.

- Explanatory datasets certainly may be computed from statistical analysis of the observables (which will typically require application of the response-functions), but this is not always the case and, as described above, that description may simplify the relationship too much.

I've also thought more about the relationship between "response-functions" and "calibrations".

One thing I realized is that although in HEA we often colloquially refer to "response-functions" as "calibrations" they are rather different beasts from many other kinds of calibrations.  (Perhaps part of the issue here is that some people seem to associate certain notions about data products and their uses with the term "calibration" that may not be applicable to "response-functions" in the HEA regime.)

Typically, calibrations map "observables" to "physical measures".  For example, one can map an observable such as pixel position on the detector to a physical location on the sky, or map an observable such as DN accumulated in a detector pixel to a flux.  Many HEA "response-functions" do the opposite, i.e., they provide a "pdf" that maps physical measures to observables.  For example, they provide the probability that an incident particle with a specific energy (the physical measure) will be detected with a given PHA (which is the observable; again I'm using a soft X-ray detector example here).  In this sense, they are "decalibrations" rather than "calibrations".  This is typically done in HEA because the transformations are not invertible in the Poisson regime and one has to instead use a forward fitting approach to analyze the data.
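
As an illustration of the forward fitting idea, here is a minimal sketch in plain Python.  Everything in it is a toy assumption: the 3x3 redistribution matrix `RMF`, the model `shape`, the `observed` counts and the grid of normalizations are made-up numbers, and a simple grid search stands in for whatever fitting engine a real analysis package would use.  The point is only the direction of the mapping: the model (physical measure) is folded through the response into channel space (observable) and compared with the counts there using a Poisson likelihood.

```python
import math

# Toy redistribution matrix (made-up values): RMF[i][j] = P(channel j | energy bin i).
# Each row is a probability distribution over detector channels.
RMF = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.3, 0.7],
]


def fold(model_counts, rmf):
    """Fold expected counts per energy bin into expected counts per channel."""
    n_chan = len(rmf[0])
    return [sum(model_counts[i] * rmf[i][j] for i in range(len(rmf)))
            for j in range(n_chan)]


def poisson_loglike(observed, expected):
    """Poisson log-likelihood of the observed channel counts given the expectation."""
    return sum(n * math.log(m) - m - math.lgamma(n + 1)
               for n, m in zip(observed, expected))


def fit_norm(observed, shape, rmf, norms):
    """Grid search for the normalization that maximizes the Poisson likelihood."""
    return max(norms, key=lambda a: poisson_loglike(
        observed, fold([a * s for s in shape], rmf)))


shape = [3.0, 2.0, 1.0]   # assumed model: expected counts per energy bin at norm 1
observed = [5, 4, 2]      # assumed observed counts per channel
folded = fold(shape, RMF)
best_norm = fit_norm(observed, shape, RMF, [0.5 + 0.1 * k for k in range(30)])
```

Note that the observed counts are never "calibrated" into energy space here; the comparison happens entirely in channel space, which is why the response has to travel with the observation dataset.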

Because they are "decalibrations", "response-functions" behave and are used very differently from "calibrations".  In other wavebands, one typically applies the calibrations to the observables to derive physical measures and then the "calibrated" data are typically used from then on for analysis.  ObsCore happily quantifies spatial/spectral/temporal/polarization axes in terms of physical measures and units even though the actual observation data may have units of pixels and DN (for example).

However, that doesn't necessarily work for HEA because one often may need to apply the response-functions during data analysis.  In some cases response-functions can be computed a priori, but in other cases they can depend on data selection (spatial/spectral/temporal), binning etc., which may depend on the science being performed.  This is why response-functions (or ancillary data products needed to compute response-functions) are tightly linked to the individual observation datasets.  In one sense, this is somewhat similar to the issue of radio interferometry, where the field of view and resolution depend on the weightings applied to the different baselines.  Weight the shorter baselines higher and you get a larger field of view and poorer spatial resolution; conversely, weighting the longer baselines higher yields better spatial resolution but a smaller field of view.

Like radio, we need to use reasonable values for the physical measures of individual observation datasets to make them findable in ObsCore.  All of the data products can be characterized by some kind of mean pointing and a field of view (which for some detectors may be 4pi steradians), an energy range (which may be bounded by the capabilities of the telescope/instrument/detector), a set of time intervals during which useful data are being acquired, and possibly polarization states.  Queries will almost always constrain one or more of these parameters to identify datasets of interest, and also provide additional constraints to select the types of data products that are of interest.  For ObsCore "advanced data products", users will often not be interested in the original observation datasets, but will nevertheless use similar constraints to identify the data products of interest (as the example use cases in the draft HEA ObsCore document demonstrate).
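
A typical discovery query of this kind could be sketched as follows.  The column names (`s_ra`, `s_dec`, `em_min`, `em_max`, `t_min`, `t_max`, `dataproduct_type`, `obs_publisher_did`, `access_url`) and the `ivoa.obscore` table are standard ObsCore; the helper names `kev_to_m` and `build_query`, the target position, and the energy and time ranges are assumptions for illustration.  ObsCore expresses the spectral axis as wavelength in meters, so an X-ray energy band has to be converted (and its bounds swapped) before it can be used as a constraint.  The product type shown is the existing 'event' value; one of the data product types under discussion here would be substituted in the same place.

```python
HC_EV_M = 1.239841984e-6  # h*c in eV*m


def kev_to_m(energy_kev):
    """Convert a photon energy in keV to a wavelength in meters."""
    return HC_EV_M / (energy_kev * 1000.0)


def build_query(ra, dec, radius_deg, e_lo_kev, e_hi_kev,
                t_start_mjd, t_stop_mjd, product_type):
    """Build an ADQL query with spatial, spectral, temporal and type constraints."""
    lam_min = kev_to_m(e_hi_kev)  # highest energy -> shortest wavelength
    lam_max = kev_to_m(e_lo_kev)  # lowest energy -> longest wavelength
    return (
        "SELECT obs_publisher_did, dataproduct_type, access_url\n"
        "FROM ivoa.obscore\n"
        f"WHERE dataproduct_type = '{product_type}'\n"
        f"  AND 1 = CONTAINS(POINT('ICRS', s_ra, s_dec), "
        f"CIRCLE('ICRS', {ra}, {dec}, {radius_deg}))\n"
        f"  AND em_min <= {lam_max:.4e} AND em_max >= {lam_min:.4e}\n"
        f"  AND t_min <= {t_stop_mjd} AND t_max >= {t_start_mjd}"
    )


# Example values (made up): 0.5-7 keV band, 0.1 deg cone, one year of MJD range.
query = build_query(83.63, 22.01, 0.1, 0.5, 7.0, 58000.0, 58365.0, "event")
```

The spectral and temporal constraints are written as interval overlaps, so a dataset is returned if its coverage intersects the requested band and time range rather than only if it is fully contained.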

Cheers,
—Ian

> On Sep 3, 2025, at 12:30, Mireille Louys via heig <heig at ivoa.net> wrote:
> 
> Dear VO Heig , ( cross post to DM as well)
> 
> During the VOHE note review , I came up on a question how to identify the various kinds of data sets we are talking about in the note,
> and how to consider their relationships .
> 
> I think we need use cases to illustrate how a user will select a data set of interest from the various properties exposed:
> - in an observation data set,
> - in a response function data set,
> - in other files such as draws, pdf, and regions, proposed with respect to the Chandra archive, that I have named 'explanatory datasets'
> 
> ObsDataset defined in Obscore is the result of an observation process, using the observing configuration.
> Response  are computed from various Obsconfig parameters.
> Explanatory datasets are also computed from statistical analysis on the observed data.
> 
> I suggest these are different in nature and cannot derive from the Obsdataset class of Obscore .
> But they are tightly linked to it .
> 
> I have tried to illustrate the relations I can guess from our discussions , but this must be iterated with you, High Energy astronomers ,
> and with TAP designers to converge to a clear use case .
> 
> here attached is a first take  .
> 
> Thanks for your comments and suggestions .
> 
> Best , Mireille
> 
> -- 
> --
> Mireille Louys, MCF (Assistant Professor)
> Centre de données Astronomiques (CDS)       Equipe Images, ICube
> Observatoire de Strasbourg                  Telecom Physique Strasbourg
> 11, rue de l' Université                    300, Bd Sebastien Brandt CS 10413
> F-67000 Strasbourg                          F-67412  Illkirch Cedex
> <Datasets_roles_inVOHE.pdf>

—

Dr. Ian Evans
Astrophysicist
Chandra X-ray Center
Center for Astrophysics | Harvard & Smithsonian

Office: (617) 496 7846 | Cell: (617) 699 5152
60 Garden Street | MS 81 | Cambridge, MA 02138
