[Heig] validation of UCD proposal by the semantics WG : results of the semantics meeting for UCD in Strasbourg 6-7 of May

Wed May 20 22:19:09 CEST 2026

Hi Mireille,

Here is some feedback on what is currently written for VEP-analysis-products-MLouys-2026-04-22.txt.

————

> New Term: draws
> 
> Action: Addition
> 
> Label: draws
> 
> Description: Probabilistic dataset containing a collection of samples (draws)
> 	generated from a probability distribution.

Description: A dataset that records statistical draws computed from a probability distribution or a sample population, for example Markov chain Monte Carlo (MCMC) draws used when computing the Bayesian marginal probability density function for a random variable. The draws can be interpreted to provide a robust estimate of the probability distribution of the variable, and correlations between the draws provide information about how well the draws converge to the parent probability distribution.

> 
> Relationships: none

Relationships: parent #measurements

> 
> Used-in: % todo : provide a link to an example dataset 
> ?? example corner plot gammapy ?? 
> by high energy photon and neutrino experiments, and by cosmological observatories

Used-in: Example: detection position uncertainty draws data products (Chandra Source Catalog data product), e.g., https://cda.cfa.harvard.edu/csccli/retrieveFile?filename=acisf03498_000N030_r2102s_draws3.fits&filetype=draws&version=rel2.1; there are also aperture photometry draws data products (draws for various flux distributions) that will be released in October 2026.

> 
> Rationale: 
> 
> High-energy photon and neutrino experiments must employ statistical methods to derive 
> final products  like #spectrum, #sed, #light-curve or #image in physical units. The 
> underlying reason is that instrument responses are inherently non-invertible. By 
> computation of probabilities for random variables associated with spectral, spatial, 
> and/or temporal models, these final products can be derived.
> 
> In a frequentist approach, the best parameter estimates correspond to the maximum 
> likelihood probability among all possible realizations of the random variables.
> When priors are applied, the estimate is derived from the maximum of the posterior
> probability. In Bayesian inference, the best estimate is associated with the 50th 
> percentile (median) of the posterior draws.
> 
> This dataset maps the likelihood or probability landscapes across a phase space of
> possible values of the random variables. The collection of probabilities enables the
> computation of quantiles, confidence intervals, confidence limits, and thus uncertainties,
> upper limits, and lower limits. This collection is particularly critical in cases of 
> non-Gaussian degeneracies or when dealing with a large number of parameters.

Rationale:

Many analysis methods across all wavebands use statistical methods to establish optimal parameter estimates for measured or derived properties.  In particular, high-energy astrophysics analyses must employ statistical methods to derive products such as #spectrum, #sed, #light-curve, #image etc. in physical units since the instrument responses are usually non-invertible.

The term draws is equally applicable to Bayesian inference or frequentist analysis.  In the frequentist approach, the best parameter estimates correspond to the maximum likelihood probability among all realizations of the random variables.  In Bayesian inference, the best parameter estimates are typically derived from the mode of the posterior probability distribution.

A draws dataset maps the probability (or equivalently, likelihood) of the desired parameters across a phase space of possible values of selected random variables.  The set of draws enables the computation of the distributions of the probability density functions of desired parameters as a function of the random variables, enabling determination of optimal parameter estimates, confidence intervals, quantiles, confidence limits, and thus uncertainties, upper limits, and lower limits.  The draws provide information as to the actual statistical distribution of parameter uncertainties, which is particularly critical in cases of non-Gaussian degeneracies, small-number statistics (inherently non-Gaussian), or when dealing with large numbers of parameters.  Additionally, a key benefit of draws is that the dataset inherently provides information on the robustness of the statistical sampling approach and how well the draws converge to the parent probability distribution, which is not available from other parameter estimation data products such as probability density functions.
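
As a minimal sketch of how such summary quantities can be read off a draws dataset, the Python below takes a list of draws and returns the median and an equal-tailed interval. The gamma-distributed draws, the function name summarize_draws, and the 68% level are illustrative assumptions only; they are not taken from any Chandra Source Catalog product.

```python
import random

def summarize_draws(draws, level=0.68):
    """Return (median, lower, upper) for an equal-tailed interval of the given level."""
    ordered = sorted(draws)
    n = len(ordered)

    def quantile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return ordered[lo] * (1.0 - frac) + ordered[hi] * frac

    tail = (1.0 - level) / 2.0
    return quantile(0.5), quantile(tail), quantile(1.0 - tail)

# Hypothetical draws: a skewed gamma distribution standing in for posterior draws.
rng = random.Random(42)
draws = [rng.gammavariate(2.0, 1.5) for _ in range(20000)]

median, lower, upper = summarize_draws(draws, level=0.68)
print(f"median = {median:.3f}  (-{median - lower:.3f}, +{upper - median:.3f})")
```

For a skewed distribution like this one, the upper error bar comes out larger than the lower one, which a single symmetric uncertainty would hide.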

> 
> Discussion : 
> ++ The term is highly generic and applicable to any statistical framework, whether frequentist or Bayesian. It is worth noting that "draws" is a term typically associated with Bayesian statistics, whereas "samples" is more generic.
> Note that 'samples', initially considered, can also be used for moon rocks samples, or other laboratory physical samples which would be outside of the HEIG scope here and misleading. 
> 

Discussion:

The term “draws” is highly generic and is applicable to any statistical framework, whether frequentist analysis or Bayesian inference.  The term “samples” was also considered initially, but it is very general and widely used in astronomy for a variety of different purposes (for example, Moon rock samples or other laboratory physical samples), which would be outside of the HEIG scope here and misleading.

There is a subtle difference between the widely used meanings of the term “samples” used in statistical analyses and the term “draws”, although they are often used interchangeably:  
  — “Samples” are the individual components of a statistical sample selected from a larger population, and the sample is typically used as representative of a population.  This term is commonly used in frequentist statistical analyses.
  — “Draws” are very similar, but can be drawn either from a population or from a probability distribution (such as the posterior probability distribution used in Bayesian statistics).  This term is commonly used in Bayesian statistical analyses, *but is also applicable to frequentist analyses* (in the former case one is sampling parameters of the distribution, whereas in the latter case one is sampling data points from the observed population).
Because of this, we recommend the use of the term “draws”.  We note that the existing datasets that require this definition are Bayesian posterior distributions where “samples” isn’t really an appropriate choice.

————

> New Term: pdf
> 
> Action: Addition
> 
> Label: Probability Density Function of a quantity
> 
> Description: Probability density function of a quantity, for example the Bayesian
> 	marginal probability density function associated to the spectral index of
> 	a spectrum

Description: A dataset that records the probability density function of a quantity, for example the Bayesian marginal probability density function for a random variable, or the DeltaTS associated with a quantity from a frequentist analysis. The probability density function provides a robust estimation of the variable and allows arbitrary confidence intervals to be computed directly from the distribution.

> 
> Relationships: none

Relationships: parent #measurements, child: #psf, #rmf, #edisp

> 
> Used-in: --> please provide an example 
> by high energy photon and neutrino experiments, and by cosmological observatories

Used-in: Example: aperture photometry (net counts, count rate, photon flux, and energy flux) probability density function data products (Chandra Source Catalog data product), e.g., https://cda.cfa.harvard.edu/csccli/retrieveFile?filename=acisf14335_000N031_r2598b_phot3.fits&filetype=aperphot&version=rel2.1

> 
> Rationale:
> When statistical analyses are used to derive final products like #spectrum, #sed,
> #light-curve or #image in physical units, the probability density function (PDF) associated
> with a random variable can be derived. This PDF can be the probability, the posterior or even
> the prior of a random variable.
> 
> This is very useful when the distribution is highly asymmetrical or multi-modal. If the 
> variable is for example the size of an object, the knowledge of asymmetry of this PDF is
> obviously more useful than symmetric errors.
> 
> Note that this PDF can be "differential" (e.g. a probability at a given value), "integral" or
> "average" (when bins are used for the random variable). The serialization of this data product
> should then contain accurate metadata information.
> 
> When statistical analyses are employed to derive final products such as #spectrum, #sed, #light-curve
> or #image in physical units, the probability density function (PDF) associated with a random variable
> can be derived. This PDF may represent the probability, posterior, or prior distribution of the random variable.

Rationale:

Statistical analyses used to establish parameter estimates for measured or derived properties typically yield quantities that describe the shape of the probability density function (or pdf) of those parameters.  For simple analyses, these may be (e.g.) the mean and variance of a Gaussian distribution that approximates the actual probability distribution.

High-energy astrophysics must employ statistical methods for parameter estimation and to derive products such as #spectrum, #sed, #light-curve, #image etc. in physical units.  In many cases the probability distribution is non-Gaussian (indeed, non-analytic), and so a representation of the *actual* probability distribution is needed for robust further analysis (especially in HEA, where source counts in the extreme Poisson regime are common and uncertainties in the calibrations themselves [random and systematic] must also be considered).

Estimates such as the mean/median/mode, confidence intervals, etc. can be derived from the pdf; however, many modern analyses use the pdf directly.  This is very useful when the distribution is highly asymmetrical or multi-modal. If the variable is, for example, the size of an object, knowledge of the asymmetry of this pdf is more useful than symmetric errors.

There are two main types of pdfs in common use: (1) a “differential” pdf (this is the most common) reports the probability density as a function of the random variable so that the pdf is a table of P(x) vs. x; in practical representations, the random variable is quantized rather than continuous, so the pdf is a table where each row typically records the integral probability within a single x bin, i.e., P(x_lo-to-x_hi) vs. x; (2) an “integral” pdf (commonly termed a cdf), which corresponds to the cumulative probability P(-infinity-to-x) vs. x.  A third type of PDF is the “average” pdf, which provides the expected value (center of mass) of the distribution; however these may be represented by a single value and do not require a tabular representation.
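
To make the relationship between the first two types concrete, here is a small Python sketch that turns a binned “differential” pdf into the corresponding “integral” pdf (cdf) by a cumulative sum, then looks up the bin that contains the median. The table values and the names x_lo, x_hi, p_bin, binned_pdf_to_cdf and bin_index_for_quantile are hypothetical and do not follow any particular file format.

```python
from bisect import bisect_left
from itertools import accumulate

# Hypothetical binned differential pdf: one row per bin, P(x_lo-to-x_hi).
x_lo = [0.0, 1.0, 2.0, 3.0, 4.0]
x_hi = [1.0, 2.0, 3.0, 4.0, 5.0]
p_bin = [0.05, 0.25, 0.40, 0.20, 0.10]

def binned_pdf_to_cdf(bin_probs):
    """Cumulative sum of per-bin probabilities, normalized so the last value is 1."""
    total = sum(bin_probs)
    return [c / total for c in accumulate(bin_probs)]

def bin_index_for_quantile(cdf, q):
    """Index of the first bin whose cumulative probability reaches q."""
    return min(bisect_left(cdf, q), len(cdf) - 1)

cdf = binned_pdf_to_cdf(p_bin)            # P(-infinity-to-x_hi) for each bin
i_med = bin_index_for_quantile(cdf, 0.5)  # bin that contains the median
median_bin = (x_lo[i_med], x_hi[i_med])
print(cdf, median_bin)
```

The same lookup with q set to the lower and upper tail probabilities gives the bins that bound a confidence interval.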

> Discussion:
> This approach is particularly valuable when the distribution is highly asymmetric or multimodal. For example, if the variable represents the size of an object, knowledge of the asymmetry in the PDF is significantly more informative than symmetric error estimates.
> 
> It is important to note that the PDF can be "differential" (e.g., the probability at a specific value), "integral",
> or "averaged" (when bins are used for the random variable). Consequently, the serialisation of this data product
> must include precise metadata to ensure clarity and reproducibility.
> 
> M.L: --> parameters to describe for PDF to be retrieved : 
> probability_type = differential/integral/averaged
> 

Discussion:

The serialization of the data product should preferably include metadata to differentiate between the types of pdfs.  However, this may not be critical since the type of pdf can be determined from the sum of the probabilities over the distribution (the sum of the probabilities of a differential pdf that includes only the instantaneous probabilities at the x values will be < 1, for a binned differential pdf the sum will be 1, and for a cdf the sum will be > 1) provided the pdf spans the distribution adequately.
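
The sum test described above could be sketched in Python as follows. The function name guess_pdf_type, the tolerance, and the example tables are assumptions made for illustration. Note that the point-sampled branch is only indicative, because the sum of sampled density values scales with the grid spacing.

```python
import math

def guess_pdf_type(probs, tol=1e-3):
    """Guess the pdf type from the sum of the tabulated probabilities."""
    total = sum(probs)
    if abs(total - 1.0) <= tol:
        return "binned differential"      # per-bin probabilities sum to 1
    if total > 1.0:
        return "integral (cdf)"           # cumulative values sum to more than 1
    return "point-sampled differential"   # indicative only; depends on grid spacing

# Hypothetical tables that span their distributions.
binned = [0.05, 0.25, 0.40, 0.20, 0.10]
cdf = [0.05, 0.30, 0.70, 0.90, 1.00]
# Standard normal density sampled on a coarse grid, x = -4, -2, 0, 2, 4.
sampled = [math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) for x in (-4, -2, 0, 2, 4)]

for name, table in (("binned", binned), ("cdf", cdf), ("sampled", sampled)):
    print(name, "->", guess_pdf_type(table))
```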

————

> New Term: region
> 
> Action: Addition
> 
> Label: Region
> 
> Description: dataset that encodes (one or more) regions of parameter space, for example 
> a spatial region or a region of phase space covered by a dataset. The set of dimensions
> represented by the region can be arbitrary
> 
> Relationships: none

Relationships: parent #measurements

> 
> Used-in: %todo: provide a real example 

Used-in: Example: region data products (Chandra Source Catalog data product), e.g., https://cda.cfa.harvard.edu/csccli/retrieveFile?filename=acisf15546_000N030_r3154_reg3.fits&filetype=srcreg&version=rel2.1

> Rationale: 
> %todo: clarify the role and dimensionality of this dataset kind
> It seems that the spatial coverage of the observation is given as an extra data product (like an excess_map, or an error_map ?) in Chandra. 

Rationale:

Existing astronomical data archives record region information in many different formats (typically not related to IVOA standards, since in many cases they pre-date those standards).  For example, the Chandra X-ray Observatory typically records spatial regions using the FITS Spatial Region File Registered Convention, which is supported by the widely used CFITSIO FITS I/O software library as well as Astropy.  XMM and Fermi support ds9 format region data products, and the NRAO Common Astronomy Software Applications (CASA) radio package supports the CRTF region file format.  Within the IVOA, a MOC data product is a type of region data product.  Different region data product standards may include information regarding the shape, whether it is a source or background region, whether it is an inclusion or exclusion region, whether it can be edited/moved/rotated/deleted, region color and width, and associated metadata.

Advanced data products (ObsCore calib_level > 2) may result from analyses of (possibly multiple) existing data products and may not want to attach region information to existing data products.  For example, a catalog such as the Chandra Source Catalog may identify (detect) tens of thousands of sources from an existing data product and then analyze properties for each of the sources; information about the source and background regions and cutouts is essential to correctly compute various source properties (for example, to compute aperture corrections for aperture photometry), but in general one would not want to add these region definitions to existing data products and would not want to duplicate this information in multiple other data products.  Recording the region information as queryable data products that work with current software is a sensible solution.

The purpose of region is to provide a data product type that can be used to query existing archives for those data products, irrespective of the internal format or serialization of the data product.

> 
> Discussion:
> Not clear how universal this can be in the High Energy domain. 
> Some data collections like XMM, SVOM, etc. may store this information in a FITS file extension , or a S_MOC extension.
> If the dataset is multidimensional, it does not fit into the tree proposed in "http://www.ivoa.net/rdf/product-type", which is based on the number and kind of data axes

Discussion:

The region data product is intended to be universal for those facilities and archives that include region information recorded in data products that are separate from associated data.  There are some data products that record region information as FITS file extensions or perhaps an S_MOC extension.  In such cases, a separate region data product may not be necessary.

We have intentionally not restricted the dimensionality of region data products.  However, most existing archival region data products are restricted to 2 spatial dimensions, although there are some that include spectral and temporal dimensions.

————

Thanks,
—Ian

> 
> On May 20, 2026, at 13:12, Mireille Louys <mireille.louys at unistra.fr> wrote:
> 
> Hi everyone, 
> 
> The discussion about the UCD terms proposed in the note is summarized on the request for modification page for UCD: 
> https://wiki.ivoa.net/twiki/bin/view/IVOA/UCDList_1-7_RFM
> The terms are described in the VEP-UCD description files available from their specific link from the page above. 
> All the VEP-UCD files are also available at https://voparis-gitlab.obspm.fr/vespa/ivoa-standards/semantics/vep-ucd
> We need to update the UCD section following the decisions taken . 
> 
> There is another topic with semantics : the analysis products vocabulary . 
> 
> I attach here a draft version of a VEP for analysis data product type : 
> What is needed are 
> 
> - a revision of the definitions in order to encompass various kinds of HE experiments 
> - file examples for the Used-in section 
> - clarification of the #region data product
> 
> Thanks for helping for this.
> 
> Mireille
> 
> -- 
> --
> Mireille Louys, MCF (Assistant Professor)
> Centre de données Astronomiques (CDS)       Equipe Images, ICube
> Observatoire de Strasbourg                  Telecom Physique Strasbourg
> 11, rue de l' Université                    300, Bd Sebastien Brandt CS 10413
> F-67000 Strasbourg                          F-67412  Illkirch Cedex
> <VEP-analysis-products-MLouys-2026-04-22.txt>

—

Dr. Ian Evans
Astrophysicist
Chandra X-ray Center
Center for Astrophysics | Harvard & Smithsonian

Office: (617) 496 7846 | Cell: (617) 699 5152
60 Garden Street | MS 81 | Cambridge, MA 02138
