[Heig] validation of UCD proposal by the semantics WG: results of the semantics meeting for UCD in Strasbourg, 6-7 May
Mireille Louys
mireille.louys at unistra.fr
Thu May 21 19:23:19 CEST 2026
Hi Ian, thank you for your input.
Here are my comments (included below), before I include most of it in the
VEP update.
The updated version of this VEP has been uploaded to
ivoa/HighEnergyObscoreExtension in a new pull request so that we can
review it internally.
Thanks, Mireille.
On 20/05/2026 at 10:19 PM, Dr. Ian N. Evans wrote:
> Hi Mireille,
>
> Here is some feedback on what is currently written for
> VEP-analysis-products-MLouys-2026-04-22.txt.
>
> ————
>
>> New Term: draws
>>
>> Action: Addition
>>
>> Label: draws
>>
>> Description: Probabilistic dataset containing a collection of samples
>> (draws)
>> generated from a probability distribution.
>
> Description: A dataset that records statistical draws computed from a
> probability distribution or a sample population, for example Markov
> chain Monte Carlo (MCMC) draws used when computing the Bayesian
> marginal probability density function for a random variable. The draws
> can be interpreted to provide a robust estimation of the probability
> distribution of the variable, and correlations between the draws provide
> information about how well the draws converge to the parent
> probability distribution.
>
>
>
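The point about correlations between draws can be illustrated with a minimal sketch. It is not taken from any Chandra Source Catalog software: a toy random-walk Metropolis sampler (an assumption, standing in for a real MCMC run) generates draws from a standard normal target, and the normalized autocorrelation of the chain is computed at a few lags. The function names, target distribution, step size and seed are all hypothetical.

```python
import math
import random


def metropolis_draws(n, step=1.0, seed=42):
    """Toy random-walk Metropolis sampler targeting a standard normal pdf."""
    rng = random.Random(seed)
    x = 0.0
    draws = []
    for _ in range(n):
        proposal = x + rng.gauss(0.0, step)
        # Log of the acceptance ratio p(proposal) / p(x), with p(x) proportional to exp(-x**2 / 2)
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x = proposal
        draws.append(x)  # a rejected proposal repeats the current draw
    return draws


def autocorrelation(draws, lag):
    """Normalized autocorrelation of a chain of draws at the given lag."""
    n = len(draws)
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / n
    cov = sum((draws[i] - mean) * (draws[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


chain = metropolis_draws(20000)
for lag in (1, 5, 20):
    print(f"lag {lag:2d}: autocorrelation = {autocorrelation(chain, lag):.3f}")
```

For a well-mixed chain the autocorrelation falls toward zero within a few lags; a slow decay means that consecutive draws carry little independent information about the parent distribution.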
>>
>> Relationships: none
>
> Relationships: parent #measurements
>
The term #measurements is no longer recommended for use with ObsCore, as I
explained some time ago.
It is not implemented, and it is too ambiguous.
Moreover, this hierarchy does not help to figure out the content of these data
products.
No reasoning is performed over the VEP labels.
>
>>
>> Used-in: % todo : provide a link to an example dataset
>> ?? example corner plot gammapy ??
>> by high energy photon and neutrino experiments, and by cosmological
>> observatories
>
> Used-in: Example: detection position uncertainty draws data products
> (Chandra Source Catalog data product), e.g.,
> https://cda.cfa.harvard.edu/csccli/retrieveFile?filename=acisf03498_000N030_r2102s_draws3.fits&filetype=draws&version=rel2.1;
> there are also aperture photometry draws data products (draws for
> various flux distributions) that will be released in October 2026.
Thanks for this example
>
>>
>> Rationale:
>>
>> High-energy photon and neutrino experiments must employ statistical
>> methods to derive
>> final products like #spectrum, #sed, #light-curve or #image in
>> physical units. The
>> underlying reason is that instrument responses are inherently
>> non-invertible. By
>> computation of probabilities for random variables associated with
>> spectral, spatial,
>> and/or temporal models, these final products can be derived.
>>
>> In a frequentist approach, the best parameter estimates correspond to
>> the maximum
>> likelihood probability among all possible realizations of the random
>> variables.
>> When priors are applied, the estimate is derived from the maximum of
>> the posterior
>> probability. In Bayesian inference, the best estimate is associated
>> with the 50th
>> percentile (median) of the posterior draws.
>>
>> This dataset maps the likelihood or probability landscapes across a
>> phase space of
>> possible values of the random variables. The collection of
>> probabilities enables the
>> computation of quantiles, confidence intervals, confidence limits,
>> and thus uncertainties,
>> upper limits, and lower limits. This collection is particularly
>> critical in cases of
>> non-Gaussian degeneracies or when dealing with a large number of
>> parameters.
>
> Rationale:
>
> Many analysis methods across all wavebands use statistical methods to
> establish optimal parameter estimates for measured or derived
> properties. In particular, high-energy astrophysics analyses must
> employ statistical methods to derive products such as #spectrum, #sed,
> #light-curve, #image etc. in physical units since the instrument
> responses are usually non-invertible.
>
> The term draws is equally applicable to Bayesian inference or
> frequentist analysis. In the frequentist approach, the best parameter
> estimates correspond to the maximum likelihood probability among all
> realizations of the random variables. In Bayesian inference, the best
> parameter estimates are typically derived from the mode of the
> posterior probability distribution.
>
> A draws dataset maps the probability (or equivalently, likelihood) of
> the desired parameters across a phase space of possible values of
> selected random variables. The set of draws enables the computation
> of the distributions of the probability density functions of desired
> parameters as a function of the random variables, enabling
> determination of optimal parameter estimates, confidence intervals,
> quantiles, confidence limits, and thus uncertainties, upper limits,
> and lower limits. The draws provide information as to the actual
> statistical distribution of parameter uncertainties, which is
> particularly critical in cases of non-Gaussian degeneracies, small
> number statistics (inherently non-Gaussian), or when dealing with
> large numbers of parameters. Additionally, a key benefit of draws is
> that the dataset inherently provides information on the robustness of
> the statistical sampling approach and how well the draws converge to
> the parent probability distribution, which is not available from other
> parameter estimation data products such as probability density functions.
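As a minimal illustration of how estimates and intervals follow from a set of draws, the sketch below computes the median and a central 68.27% interval by linear interpolation between order statistics. The gamma-distributed synthetic draws are an assumption standing in for, e.g., posterior flux draws; the median is used only for simplicity (the text above notes that the mode is also common), and the function names and quantile convention are hypothetical choices rather than a prescribed method.

```python
import random


def quantile(sorted_draws, q):
    """Quantile q in [0, 1] by linear interpolation between order statistics."""
    pos = q * (len(sorted_draws) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_draws) - 1)
    return sorted_draws[lo] + (pos - lo) * (sorted_draws[hi] - sorted_draws[lo])


def summarize_draws(draws, level=0.6827):
    """Return (median, lower, upper) of a central interval containing `level` of the draws."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    return quantile(s, 0.5), quantile(s, tail), quantile(s, 1.0 - tail)


rng = random.Random(1)
# Skewed synthetic draws (gamma, shape 2, scale 1) standing in for, e.g., flux draws
draws = [rng.gammavariate(2.0, 1.0) for _ in range(50000)]
median, lower, upper = summarize_draws(draws)
print(f"median = {median:.3f}, 68.27% interval = [{lower:.3f}, {upper:.3f}]")
print(f"asymmetric uncertainties: -{median - lower:.3f} / +{upper - median:.3f}")
```

Because the interval is read directly from the draws, the asymmetry of the distribution is preserved instead of being forced into symmetric Gaussian errors.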
>
>
>>
>> Discussion :
>> ++ The term is highly generic and applicable to any statistical
>> framework, whether frequentist or Bayesian. It is worth noting that
>> "draws" is a term typically associated with Bayesian statistics,
>> whereas "samples" is more generic.
>> Note that 'samples', initially considered, can also be used for moon
>> rock samples, or other laboratory physical samples which would be
>> outside of the HEIG scope here and misleading.
>>
>
> Discussion:
>
> The term “draws” is highly generic and is applicable to any
> statistical framework, whether frequentist analysis or Bayesian
> inference. The term “samples” was also considered initially, but it is
> very general and widely used in astronomy for a variety of different
> purposes (for example, moon rock samples or other laboratory physical
> samples, which are outside the HEIG scope here), so it would be
> misleading.
>
> There is a subtle difference between the widely used meanings of the
> term “samples” used in statistical analyses and the term “draws”,
> although they are often used interchangeably:
> — “Samples” are the individual components of a statistical sample
> selected from a larger population, and the sample is typically used as
> representative of a population. This term is commonly used in
> frequentist statistical analyses.
> — “Draws” are very similar, but can be drawn either from a
> population or from a probability distribution (such as the posterior
> probability distribution used in Bayesian statistics). This term is
> commonly used in Bayesian statistical analyses, *but is also
> applicable to frequentist analyses* (in the former case one is sampling
> parameters of the distribution, whereas in the latter case one is
> sampling data points from the observed population).
> Because of this, we recommend the use of the term “draws”. We note
> that the existing datasets that require this definition are Bayesian
> posterior distributions where “samples” isn’t really an appropriate
> choice.
>
> ————
>
>> New Term: pdf
>>
>> Action: Addition
>>
>> Label: Probability Density Function of a quantity
>>
>> Description: Probability density function of a quantity, for example
>> the Bayesian
>> marginal probability density function associated with the spectral index of
>> a spectrum
>
> Description: A dataset that records the probability density function
> of a quantity, for example the Bayesian marginal probability density
> function for a random variable, or the DeltaTS associated with a
> quantity from a frequentist analysis. The probability density function
> provides a robust estimation of the variable and allows arbitrary
> confidence intervals to be computed directly from the distribution.
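For the frequentist case, a confidence interval can be read off a tabulated DeltaTS profile. The sketch below assumes DeltaTS is the drop in TS from its maximum (equivalently, the rise in -2 ln L from its minimum) and uses the Wilks-theorem threshold DeltaTS <= 1, which gives an approximately 68.27% interval for a single parameter. The function name, the grid and the toy asymmetric profile are hypothetical.

```python
def delta_ts_interval(xs, delta_ts, threshold=1.0):
    """Return (best, lower, upper) of the region where delta_ts <= threshold.

    delta_ts[i] is assumed to be the drop in TS from its maximum at xs[i];
    threshold = 1.0 gives an approximate 68.27% interval for one parameter.
    Interval edges are linearly interpolated between grid points.
    """
    i_best = min(range(len(xs)), key=lambda i: delta_ts[i])

    def crossing(i_in, i_out):
        # Linear interpolation of the threshold crossing between two grid points
        x0, x1 = xs[i_in], xs[i_out]
        y0, y1 = delta_ts[i_in], delta_ts[i_out]
        return x0 + (threshold - y0) * (x1 - x0) / (y1 - y0)

    lo = i_best
    while lo > 0 and delta_ts[lo - 1] <= threshold:
        lo -= 1
    hi = i_best
    while hi < len(xs) - 1 and delta_ts[hi + 1] <= threshold:
        hi += 1
    lower = crossing(lo, lo - 1) if lo > 0 else xs[0]
    upper = crossing(hi, hi + 1) if hi < len(xs) - 1 else xs[-1]
    return xs[i_best], lower, upper


xs = [i / 100 for i in range(501)]  # hypothetical grid from 0 to 5
# Toy asymmetric profile with its minimum at x = 2
delta_ts = [((x - 2.0) / 0.5) ** 2 if x < 2.0 else (x - 2.0) ** 2 for x in xs]
best, lower, upper = delta_ts_interval(xs, delta_ts)
print(f"best = {best:.2f}, interval = [{lower:.2f}, {upper:.2f}]")
```

Other confidence levels follow by changing the threshold (for example, about 2.71 for a one-sided 95% limit on one parameter), which is why storing the full profile is more flexible than storing a single interval.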
>
>
>
>>
>> Relationships: none
>
> Relationships: parent #measurements, child: #psf, #rmf, #edisp
>
Same remark as above: no reasoning is performed over the labels.
>
>>
>> Used-in: --> please provide an example
>> by high energy photon and neutrino experiments, and by cosmological
>> observatories
>
> Used-in: Example: aperture photometry (net counts, count rate, photon
> flux, and energy flux) probability density function data products
> (Chandra Source Catalog data product), e.g.,
> https://cda.cfa.harvard.edu/csccli/retrieveFile?filename=acisf14335_000N031_r2598b_phot3.fits&filetype=aperphot&version=rel2.1
>
>
>>
>> Rationale:
>> When statistical analyses are used to derive final products like
>> #spectrum, #sed,
>> #light-curve or #image in physical units, the probability density
>> function (PDF) associated
>> with a random variable can be derived. This PDF can be the probability,
>> the posterior or even
>> the prior of a random variable.
>>
>> This is very useful when the distribution is highly asymmetrical or
>> multi-modal. If the
>> variable is for example the size of an object, the knowledge of
>> asymmetry of this PDF is
>> obviously more useful than symmetric errors.
>>
>> Note that this PDF can be "differential" (e.g. a probability at a
>> given value), "integral" or
>> "average" (when bins are used for the random variable). The
>> serialization of this data product
>> should then contain accurate metadata information.
>>
>> When statistical analyses are employed to derive final products such
>> as #spectrum, #sed, #light-curve
>> or #image in physical units, the probability density function (PDF)
>> associated with a random variable
>> can be derived. This PDF may represent the probability, posterior, or
>> prior distribution of the random variable.
>
> Rationale:
>
> Statistical analyses used to establish parameter estimates for
> measured or derived properties typically yield quantities that
> describe the shape of the probability density function (or pdf) of
> those parameters. For simple analyses, these may be (e.g.) the mean
> and variance of a Gaussian distribution that approximates the actual
> probability distribution.
>
> High-energy astrophysics must employ statistical methods for parameter
> estimation and to derive products such as #spectrum, #sed,
> #light-curve, #image etc. in physical units. In many cases the
> probability distribution is non-Gaussian (indeed, non-analytic), and
> so a representation of the *actual* probability distribution is needed
> for robust further analysis (especially in HEA, where source counts in
> the extreme Poisson regime are common and uncertainties in the
> calibrations themselves [random and systematic] must also be considered).
>
> Estimates such as the mean/median/mode, and confidence intervals etc.
> can be derived from the pdf; however many modern analyses will use the
> pdf distribution directly. This is very useful when the distribution
> is highly asymmetrical or multi-modal. If the variable is for example
> the size of an object, the knowledge of asymmetry of this PDF is
> obviously more useful than symmetric errors.
>
> There are two main types of pdfs in common use: (1) a “differential”
> pdf (this is the most common) reports the probability density as a
> function of the random variable so that the pdf is a table of P(x) vs.
> x; in practical representations, the random variable is quantized
> rather than continuous, so the pdf is a table where each row typically
> records the integral probability within a single x bin, i.e.,
> P(x_lo-to-x_hi) vs. x; (2) an “integral” pdf (commonly termed a cdf),
> which corresponds to the cumulative probability P(-infinity-to-x) vs.
> x. A third type of PDF is the “average” pdf, which provides the
> expected value (center of mass) of the distribution; however these may
> be represented by a single value and do not require a tabular
> representation.
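The relationship between these representations can be sketched as follows, assuming a binned differential pdf stored as bin edges plus the integral probability in each bin. The cdf is the running sum of the bin probabilities, and the "average" value is the probability-weighted mean, here approximated with bin centres. The function names and toy numbers are illustrative only.

```python
from itertools import accumulate


def binned_pdf_to_cdf(probs):
    """Integral pdf (cdf): cumulative probability up to the upper edge of each bin."""
    return list(accumulate(probs))


def pdf_expected_value(edges, probs):
    """Expected ("average") value: probability-weighted mean of the bin centres."""
    centres = [(lo + hi) / 2.0 for lo, hi in zip(edges[:-1], edges[1:])]
    return sum(c * p for c, p in zip(centres, probs)) / sum(probs)


# Toy binned differential pdf: each entry is P(x_lo to x_hi) for one bin
edges = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
probs = [0.1, 0.4, 0.3, 0.15, 0.05]

print("integral pdf (cdf) at upper bin edges:", binned_pdf_to_cdf(probs))
print("expected value:", pdf_expected_value(edges, probs))
```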
>
>
>
>> Discussion:
>> This approach is particularly valuable when the distribution is
>> highly asymmetric or multimodal. For example, if the variable
>> represents the size of an object, knowledge of the asymmetry in the
>> PDF is significantly more informative than symmetric error estimates.
>>
>> It is important to note that the PDF can be "differential" (e.g., the
>> probability at a specific value), "integral",
>> or "averaged" (when bins are used for the random variable).
>> Consequently, the serialisation of this data product
>> must include precise metadata to ensure clarity and reproducibility.
>>
>> M.L: --> parameters to describe for PDF to be retrieved :
>> probability_type = differential/integral/averaged
>>
>
> Discussion:
>
> The serialization of the data product should preferably include
> metadata to differentiate between the types of pdfs. However, this
> may not be critical since the type of pdf can be determined from the
> sum of the probabilities over the distribution (the sum of the
> probabilities of a differential pdf that includes only the
> instantaneous probabilities at the x values will be < 1, for a binned
> differential pdf the sum will be 1, and for a cdf the sum will be > 1)
> provided the pdf spans the distribution adequately.
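The sum-based check could be sketched as below. It is a heuristic only and rests on the same assumption as the text (the table spans the distribution adequately). The tolerance, the extra requirement that a cdf be non-decreasing and end near 1, and the returned labels (borrowed from the draft probability_type values) are assumptions. Point-sampled density values on a fine grid can sum to more than 1, which is one reason explicit metadata remains preferable.

```python
def guess_pdf_type(probs, tol=1e-3):
    """Heuristically classify a tabulated pdf from the sum of its values."""
    total = sum(probs)
    non_decreasing = all(b >= a for a, b in zip(probs, probs[1:]))
    if abs(total - 1.0) <= tol:
        return "differential (binned)"
    if total > 1.0 and non_decreasing and abs(probs[-1] - 1.0) <= tol:
        return "integral"
    if total < 1.0:
        return "differential (point-sampled)"
    return "unknown"  # e.g. a fine point-sampled grid; fall back on metadata


print(guess_pdf_type([0.1, 0.4, 0.3, 0.15, 0.05]))
print(guess_pdf_type([0.1, 0.5, 0.8, 0.95, 1.0]))
```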
>
> ————
>
>> New Term: region
>>
>> Action: Addition
>>
>> Label: Region
>>
>> Description: dataset that encodes (one or more) regions of parameter
>> space, for example
>> a spatial region or a region of phase space covered by a dataset. The
>> set of dimensions
>> represented by the region can be arbitrary.
>>
>> Relationships: none
>
> Relationships: parent #measurements
>
>
>>
>> Used-in: %todo: provide a real example
>
> Used-in: Example: region data products (Chandra Source Catalog data
> product), e.g.,
> https://cda.cfa.harvard.edu/csccli/retrieveFile?filename=acisf15546_000N030_r3154_reg3.fits&filetype=srcreg&version=rel2.1
>
>
>> Rationale:
>> %todo: clarify the role and dimensionality of this dataset kind
>> It seems that the spatial coverage of the observation is given as an
>> extra data product (like an excess_map, or an error_map ?) in Chandra.
>
> Rationale:
>
> Existing astronomical data archives record region information in many
> different formats (typically not related to IVOA standards, since in
> many cases they pre-date those standards). For example, the Chandra X-ray
> Observatory typically records spatial regions using the FITS Spatial
> Region File Registered Convention, which is supported by the widely
> used CFITSIO FITS I/O software library as well as Astropy. XMM and
> Fermi support ds9 format region data products, and the NRAO Common
> Astronomy Software Applications (CASA) radio package supports the CRTF
> region file format. Within the IVOA, a MOC data product is a type of
> region data product. Different region data product standards may
> include information regarding the shape, whether it is a source or
> background region, whether it is an inclusion or exclusion region,
> whether it can be edited/moved/rotated/deleted, region color and
> width, and associated metadata.
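To make a few of the listed attributes concrete, here is a minimal, format-agnostic sketch of a region record with a point-containment test. The CircleRegion class, its fields, the flat (x, y) coordinates and the combination rule (selected if inside any inclusion region and no exclusion region) are hypothetical simplifications; they do not reproduce the FITS region convention, ds9, CRTF or MOC semantics, which existing region software already implements.

```python
import math
from dataclasses import dataclass


@dataclass
class CircleRegion:
    """Hypothetical region record: a circle plus a few of the attributes listed above."""
    x: float
    y: float
    radius: float
    include: bool = True   # inclusion (True) or exclusion (False) region
    role: str = "source"   # e.g. "source" or "background"

    def contains(self, px, py):
        return math.hypot(px - self.x, py - self.y) <= self.radius


def in_region_set(regions, px, py):
    """Selected if inside any inclusion region and inside no exclusion region."""
    included = any(r.contains(px, py) for r in regions if r.include)
    excluded = any(r.contains(px, py) for r in regions if not r.include)
    return included and not excluded


# Background annulus built from an inclusion circle minus an exclusion circle
background = [
    CircleRegion(100.0, 100.0, 30.0, include=True, role="background"),
    CircleRegion(100.0, 100.0, 10.0, include=False, role="background"),
]
print(in_region_set(background, 100.0, 105.0))  # False: inside the excluded core
print(in_region_set(background, 100.0, 120.0))  # True: inside the annulus
```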
>
> Advanced data products (ObsCore calib_level > 2) may result from
> analyses of (possibly multiple) existing data products and may not
> want to attach region information to existing data products. For
> example, a catalog such as the Chandra Source Catalog may identify
> (detect) tens of thousands of sources from an existing data product
> and then analyze properties for each of the sources; information about
> the source and background regions and cutouts is essential to
> correctly compute various source properties (for example, to compute
> aperture corrections for aperture photometry), but in general one
> would not want to add these region definitions to existing data
> products and would not want to duplicate this information in multiple
> other data products. Recording the region information as queryable
> data products that work with current software is a sensible solution.
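As an illustration of why the source and background regions matter for derived properties, the sketch below solves a simple two-region aperture photometry model. It assumes a uniform background surface density and known PSF fractions enclosed by each region; the model, function name and numbers are hypothetical and are not the Chandra Source Catalog algorithm.

```python
def net_source_counts(c_src, c_bkg, area_src, area_bkg, psf_frac_src=1.0, psf_frac_bkg=0.0):
    """Estimate source counts S from counts in a source and a background region.

    Assumed model: c_src = psf_frac_src * S + b * area_src
                   c_bkg = psf_frac_bkg * S + b * area_bkg
    where b is a uniform background surface density. Solving for S gives
    aperture-corrected net counts.
    """
    ratio = area_src / area_bkg
    return (c_src - ratio * c_bkg) / (psf_frac_src - ratio * psf_frac_bkg)


# Hypothetical counts: no PSF correction (all source counts in the aperture)
print(net_source_counts(120, 300, 50.0, 500.0))

# Hypothetical consistency check: simulate S = 100, b = 0.5 per unit area,
# PSF fractions 0.9 (source aperture) and 0.05 (background region)
c_src = 0.9 * 100 + 0.5 * 50.0
c_bkg = 0.05 * 100 + 0.5 * 500.0
print(net_source_counts(c_src, c_bkg, 50.0, 500.0, 0.9, 0.05))
```

Without the region areas and the PSF fraction enclosed by each region, neither correction can be reproduced, which is the motivation for keeping the regions as retrievable products.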
>
> The purpose of #region is to provide a data product type that can be
> used to query existing archives for those data products, irrespective
> of the internal format or serialization of the data product.
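A hedged illustration of that intended use: the helper below only builds an ADQL string against the standard ObsCore table ivoa.obscore, using the ObsCore columns dataproduct_type, calib_level, obs_id, obs_collection, access_url and access_format. The value 'region' is the term proposed in this VEP and is not yet part of the product-type vocabulary, so existing services may return nothing for it. The function name and the default calib_level cut are hypothetical; the string could be submitted with any TAP client.

```python
def region_products_query(obs_id=None, min_calib_level=3):
    """Build an ADQL query for (proposed) region data products in an ObsCore table."""
    conditions = [
        "dataproduct_type = 'region'",
        f"calib_level >= {int(min_calib_level)}",
    ]
    if obs_id is not None:
        # Escape single quotes for an ADQL string literal
        safe_id = str(obs_id).replace("'", "''")
        conditions.append(f"obs_id = '{safe_id}'")
    return (
        "SELECT obs_id, obs_collection, access_url, access_format\n"
        "FROM ivoa.obscore\n"
        "WHERE " + "\nAND ".join(conditions)
    )


print(region_products_query(obs_id="15546"))
```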
>
>
>>
>> Discussion:
>> Not clear how universal this can be in the High Energy domain.
>> Some data collections like XMM, SVOM, etc. may store this information
>> in a FITS file extension, or an S_MOC extension.
>> If the dataset is multidimensional, it does not fit into the tree
>> proposed in
>> http://www.ivoa.net/rdf/product-type,
>> which is based on the number and kind of data axes.
>
> Discussion:
>
> The region data product is intended to be universal for those
> facilities and archives that include region information recorded in
> data products that are separate from associated data. There are some
> data products that record region information as FITS file extensions
> or perhaps an S_MOC extension. In such cases, a separate region data
> product may not be necessary.
>
> We have intentionally not restricted the dimensionality of region data
> products. However, most existing archival region data products are
> restricted to 2 spatial dimensions, although there are some that
> include spectral and temporal dimensions.
>
> ————
>
> Thanks,
> —Ian
>
>>
>> On May 20, 2026, at 13:12, Mireille Louys <mireille.louys at unistra.fr>
>> wrote:
>>
>> Hi everyone,
>>
>> The discussion about the UCD terms proposed in the note is summarized
>> on the request for modification page for UCD:
>> https://wiki.ivoa.net/twiki/bin/view/IVOA/UCDList_1-7_RFM
>>
>> The terms are described in the VEP-UCD description files available
>> from their specific link from the page above.
>> All the VEP-UCD files are also available at
>> https://voparis-gitlab.obspm.fr/vespa/ivoa-standards/semantics/vep-ucd
>>
>> We need to update the UCD section following the decisions taken.
>>
>> There is another topic with semantics: the analysis products
>> vocabulary.
>>
>> I attach here a draft version of a VEP for analysis data product type :
>> What is needed are
>>
>> - a revision of the definitions in order to encompass various kinds
>> of HE experiments
>> - file examples for the Used-in section
>> - clarification of the #region data product
>>
>> Thanks for your help with this.
>>
>> Mireille
>>
>> --
>> --
>> Mireille Louys, MCF (Assistant Professor)
>> Centre de données Astronomiques (CDS) Equipe Images, ICube
>> Observatoire de Strasbourg Telecom Physique Strasbourg
>> 11, rue de l' Université 300, Bd Sebastien Brandt CS 10413
>> F-67000 Strasbourg F-67412 Illkirch Cedex
>> <VEP-analysis-products-MLouys-2026-04-22.txt>
>
> —
> Dr. Ian Evans
> *Astrophysicist*
> *Chandra X-ray Center*
> Center for Astrophysics | Harvard & Smithsonian
> Office: (617) 496 7846 | Cell: (617) 699 5152
> 60 Garden Street | MS 81 | Cambridge, MA 02138
>