[Heig] vocabulary update: proposal for dataproduct_type update for high energy data : event-list definition and event-bundle
BONNAREL FRANCOIS gmail
francois.bonnarel at gmail.com
Wed May 28 22:24:44 CEST 2025
Hi Ian, all,
1 ) Using a new IVOA vocabulary for "response functions" in the
dataproduct_subtype of ObsCore is perfectly allowed.
See this section of the spec:
> B.1.2. Data Product Subtype (dataproduct_subtype)
> In order to be more precise the data product type may be refined with
> a second field, the data product subtype. Unlike the more generic
> dataproduct_type, this field is intended to precisely specify the
> scientific nature of the data product, possibly in terms relevant only
> to a specific archive or data collection. While less useful for global
> data discovery this allows the data products within a specific archive
> to be precisely identified and referenced in queries of that specific
> archive. The data provider should define a vocabulary sufficient to
> classify all science data products in their archive to be exposed with
> ObsTAP. In the future we may be able to define broader standards to
> classify data at this level, although it will likely always be the
> case that data differs at the level of specific instrumental survey
> data collections. The data product subtype allows data within a
> specific archive or data collection to be precisely classified and
> referenced in subsequent discovery queries. The list of terms defined
> for dataproduct_subtype labels should be published by the data centers
> and documented. These can be gathered as an IVOA vocabulary in a RDF
> dedicated resource for instance.
> However, in the meantime, the vocabulary used can be discovered easily
> by a simple query based on ‘select distinct …’
Distinguishing terms which are defined by the data provider only from
those which are part of an "official" IVOA vocabulary can easily be done
by using the full URI of the term in the latter case. For example
dataproduct_type : cube
dataproduct_subtype : https://www.ivoa.net/rdf/responsefunction_type#psf
alongside
dataproduct_type : spectrum
dataproduct_subtype : hard-Xray (or whatever provider-dependent term)
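A client could tell the two cases apart with very little logic. The
Python sketch below is only an illustration under stated assumptions:
it assumes that "official" terms are always written as full URIs under
https://www.ivoa.net/rdf/ with the term after the '#', and the names
IVOA_RDF_PREFIX and classify_subtype are made up for this example.

```python
from urllib.parse import urldefrag

# Assumption: "official" IVOA vocabulary terms are written as full URIs
# under this prefix; anything else is treated as a provider-local label.
IVOA_RDF_PREFIX = "https://www.ivoa.net/rdf/"


def classify_subtype(value):
    """Split a dataproduct_subtype value into (vocabulary_uri, term).

    Returns (None, value) when the value is not a full IVOA term URI,
    i.e. when it should be read as a provider-defined label.
    """
    value = value.strip()
    if value.startswith(IVOA_RDF_PREFIX):
        # urldefrag separates the URI from the part after '#'.
        vocabulary, term = urldefrag(value)
        if term:
            return vocabulary, term
    return None, value


print(classify_subtype("https://www.ivoa.net/rdf/responsefunction_type#psf"))
print(classify_subtype("hard-Xray"))
```

The first call yields the vocabulary URI and the term "psf"; the second
yields None for the vocabulary, which marks "hard-Xray" as a
provider-defined label.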
2 ) Thinking more about it and reading one of Bruno's emails claiming
that response functions may be multi-d cubes, spectra, or images I don't
think we need to set dataproduct_type =
https://www.ivoa.net/rdf/responsefunction_type#response-function (even
if I wrote that a week ago).
Indeed, our IVOA vocabularies are hierarchical, and recognition software
should take this into account.
A query with a constraint such as "WHERE
ivoa_smaller(dataproduct_subtype, 'https://www.ivoa.net/rdf/responsefunction_type#response-function')"
should match #psf, #lsf, #arf, etc.
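As a rough illustration of such hierarchical matching, here is a minimal
Python sketch. The responsefunction_type vocabulary is only a proposal in
this thread, so the small parent table below is a hypothetical stand-in
and not a published IVOA vocabulary; the names BASE, WIDER, local_name and
is_narrower_or_equal are likewise invented for the example.

```python
# Hypothetical hierarchy, following the parent/child layout discussed in
# this thread; it is NOT a published IVOA vocabulary.
BASE = "https://www.ivoa.net/rdf/responsefunction_type#"
WIDER = {
    "response-function": None,  # top-level concept, no wider term
    "psf": "response-function",
    "lsf": "response-function",
    "arf": "response-function",
    "rmf": "response-function",
}


def local_name(term_uri):
    """Return the part after '#' if the URI belongs to BASE, else None."""
    if term_uri.startswith(BASE):
        return term_uri[len(BASE):]
    return None


def is_narrower_or_equal(term_uri, wider_uri):
    """True if term_uri equals wider_uri or is one of its descendants."""
    current = local_name(term_uri)
    target = local_name(wider_uri)
    if current not in WIDER or target not in WIDER:
        return False  # unknown or provider-local terms never match
    # Walk up the chain of wider terms until we hit the target or the top.
    while current is not None:
        if current == target:
            return True
        current = WIDER[current]
    return False


for name in ("psf", "lsf", "arf"):
    print(name, is_narrower_or_equal(BASE + name, BASE + "response-function"))
```

A provider-local label such as hard-Xray never matches here, which is the
behaviour one would want when mixing full URIs and local terms in the
same column.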
3 ) This of course would be interesting only if response functions
can be described by ObsCore attributes, which is not obvious
after reading the discussion between Ian and Mireille.
Cheers
François
On 21/05/2025 at 20:16, Dr. Ian N. Evans via heig wrote:
> Hi Mireille,
>
> In the current draft of the HEA ObsCore note, I access a PSF using
>
> dataproduct_type = ‘response-function’
> dataproduct_subtype = ‘psf’
>
> because a PSF is a type of response-function (and there are many types
> of response function so adding all of these separately as different
> dataproduct_type would grow the list very significantly). Having said
> that, using dataproduct_subtype may not be ideal because there is not
> a vocabulary for the latter defined in ObsCore. In the future, if we
> were to migrate to a dataproduct_type vocabulary that included all the
> different types of products then I might do things differently.
>
> For Chandra, the PSF is dependent on off-axis (theta) and azimuthal
> (phi) angles relative to the mirror optical axis, and also energy (and
> the detector can be moved relative to the mirror optical axis so this
> does not translate to detector coordinates). Note that this also
> means that the PSF varies significantly across the field of view of a
> single observation. Trying to query to find a PSF in this manner
> would require significant enhancements to ObsCore to support
> non-celestial coordinate systems that will be facility-dependent.
> However, to identify PSFs associated with a particular source
> detection in a single observation, one could use a query like (current
> use case A.1.3)
>
> SELECT * FROM ivoa.obscore
> WHERE
>   CONTAINS(POINT(s_ra, s_dec), CIRCLE(83.84358, -5.43639, 0.033333)) = 1
>   AND dataproduct_type = 'response-function'
>   AND dataproduct_subtype = 'psf'
>   AND obs_id = '4374'
>   AND obs_collection = 'CSC2'
>
>
> since we have specified both an obs_id and a position on the sky -
> which for a single observation will map to a specific (theta, phi).
> This would return all PSFs for different energies for that source
> detection and observation from CSC release 2. Note that if I didn’t
> specify the obs_id, I would get PSFs for all of the observations that
> included that location on the sky (and the different energies) from
> the catalog. You would need the obs_id to tie them to a specific
> observation. However, perhaps you want to identify which (if any)
> observations that include your source have PSFs that are small enough
> to deblend a nearby source (this can be complicated for Chandra since
> off-axis PSFs have very complex, asymmetric structures that vary with
> both theta and phi).
>
>
> We don’t have any extant ObsCore tables for PSFs. I have tried to
> mock up a conceptual single entry ObsCore table for a Chandra PSF -
> hopefully it will be helpful. I don’t know that all the values are
> populated correctly, or even if the table and column values would all
> validate.
>
>
>
>
> With regard to vocabulary, I prefer “response-function” as the higher
> level concept rather than “irf” (we started out with just “response”
> but Markus convinced me that term was too overloaded and
> “response-function” is technically correct). Response-functions are
> widely applicable across multiple wavebands. For example, a point
> spread function is a type of response-function that is used across all
> wavebands. Similarly, a line spread function is a response-function
> used in UV through IR spectroscopy. The term “irf” is not generally
> used across all high-energy projects. In the USA, most high-energy
> projects follow the NASA HEASARC OGIP standards, and so will use “rmf”
> for the “redistribution matrix file” and “arf” for the “auxiliary
> response file” (and will keep these separate). Internationally, some
> projects historically used “irf” to represent the product of the rmf
> and arf. More recently some projects have used “irf” as the
> equivalent of “response-function” giving it a broader interpretation.
> So this can be a source of confusion and lack of clarity. I also
> note that “irf” stands for “instrument response function” and there
> are certainly response-functions such as software filters (e.g., a
> modified Hanning filter used for optimal extraction) where
> “instrument” would be a misnomer.
>
>
> I might suggest something more like
>
> response-function
>   arf as child of response-function
>   rmf as child of response-function
>   psf as child of response-function
>   lsf as child of response-function (not HEA)
>   ...
>
> Is irf a child of response-function? The original usage of irf as
> product of rmf and arf definitely would be. Interpreted in a more
> general way I would still say yes (“instrument response function” is a
> child of “response function”).
>
> I’m not convinced that background/background rate/background noise are
> response-functions, and these concepts have much wider applicability
> across multiple wavebands.
>
> Cheers,
> —Ian
>
>> On May 19, 2025, at 14:43, Mireille Louys via heig <heig at ivoa.net> wrote:
>>
>> Hi Ian, hi folks,
>>
>> Thanks for the various querying scenarios.
>>
>> I understand the PSF use case you describe would be something like:
>>
>> give me all dataproduct_type='psf' with additional constraints like
>>
>> SELECT TOP 100 * FROM ivoa.ObsCore
>> WHERE obs_id='22830' and dataproduct_type='psf'
>>
>> How is a PSF data product related to a data file?
>> Through obs_id, obs_publisher_did, or something else?
>>
>> Can we explore how the ObsCore 1.1 table would be filled for a PSF
>> data product?
>> Are there file examples?
>>
>> I understand we identify here a vocabulary to identify various irf
>> files like: "psf, irf, arf, rmf, background noise, etc."
>> with a kind of hierarchy like
>> irf
>> arf as child of irf
>> rmf as child of irf
>> psf as child of irf
>> back_ground_noise
>>
>> with the assumption that em_ucd, s_xel1, s_xel2, em_xel etc. can be
>> filled in the ObsCore table.
>> Parametric functions could not be described here.
>>
>> So if these are ObsCore dataproduct_types we should be able to
>> describe these features.
>>
>> The "content_qualifier" used in DataLink, if I remember well, can
>> be something other than an ObsCore dataproduct_type also.
>>
>> (to be checked)
>>
>> I will look deeper in this and come back on the topic soon,
>>
>> Cheers, Mireille
>>
>>
>>> On 30/04/2025 at 19:52, Dr. Ian N. Evans via semantics wrote:
>>> Hi Markus,
>>>
>>> See inline comments below.
>>>
>>>> On Apr 25, 2025, at 03:29, Markus Demleitner via heig
>>>> <heig at ivoa.net> wrote:
>>>>
>>>> Dear Mireille,
>>>>
>>>> Thanks for your VEP.
>>>>
>>>> On Thu, Apr 24, 2025 at 06:16:36PM +0200, Mireille Louys via
>>>> semantics wrote:
>>>>> • Propose definitions for a product-type *event-bundle:* An
>>>>> event-bundle dataset
>>>>> is a complex object containing an event-list and multiple files or
>>>>> other substructures that are products necessary to analyse the
>>>>> event-list.
>>>>> Data in an event-bundle may thus be used to produce higher level data
>>>>> products such as images or spectra.
>>>>
>>>> I think the definition is reasonably clear and applicable in
>>>> practice. Before merging this, however, I'd have a few requests for
>>>> clarification:
>>>>
>>>> (1) used-in: I really, *really* would like to see actual, published
>>>> data here (always, in all VEPs; it's a pain if we go into all the
>>>> trouble of defining a concept and then nobody's ever using it in
>>>> practice). I see that CSC on http://cda.cfa.harvard.edu/csctap (or
>>>> http://cda.cfa.harvard.edu/csc21tap [1]) has an obscore table. It
>>>> would really be excellent if they could mark up their event bundles
>>>> with the new term, such that we could say:
>>>>
>>>> used-in: dataset ivo://csc.harvard.edu/scsr2?some-obs-id on
>>>> http://cda.cfa.harvard.edu/csctap
>>>>
>>>> That would help me maintain a clear conscience when setting up the
>>>> new term[2].
>>>
>>> I’m not sure that we would use event bundles for any of the CSC data
>>> product types, since our catalog users generally want to discover
>>> and retrieve individual data products (typically, many of the same
>>> data product such as a light curve or aperture photometry MPDF for a
>>> set of sources, such as all high redshift quasars).
>>>
>>> We do make event bundles available through the Chandra Data Archive
>>> (CDA) ObsCore for individual observations, but we have been doing
>>> this for many years and therefore those bundles do not conform to
>>> the proposed ObsCore HE extension. For example, at
>>> https://cda.harvard.edu/cxctap one might do a query like
>>> SELECT TOP 100
>>> *
>>> FROM ivoa.ObsCore
>>> WHERE obs_id='22830'
>>>
>>> The bundles currently have dataproduct_type = ‘’ and correspond to
>>> the Chandra “primary” and “primary” + “secondary” data product
>>> categories for the observation. The “primary” package includes a
>>> basic set of data products that an observer would need to analyze
>>> the observation and produce photometrically calibrated spectra,
>>> while the “secondary” set would be needed in addition if the user
>>> wanted to recalibrate the observation with updated calibrations
>>> (which we recommend because some updated calibrations only become
>>> available typically several months after an observation is completed
>>> as calibrations such as detector gain change with time). The
>>> bundles are tarballs and have access_format “application/x-tar”.
>>>
>>> Note that our bundles do not include responses, unlike (e.g.) CTAO.
>>> This is partly because the spacecraft dithers on the sky during an
>>> observation so any target moves across the detector during the
>>> observation. So the responses depend critically on the user’s
>>> choice of data filtering. We follow the HEASARC standards and
>>> provide separate RMF and ARF data products rather than a combined
>>> IRF. For spectral fitting, the integrated ARF (i.e., the effective
>>> area integrated over an energy range) also depends on the source
>>> spectral model, so we provide tools for the user to compute the
>>> responses and the data products needed to do so in the “primary”
>>> bundle. See https://cxc.harvard.edu/cda/DataProdList.html#A for the
>>> list of Chandra primary and secondary products.
>>>
>>>
>>>>
>>>> (2) Relationship: That's an operational field, i.e., I need to create
>>>> an RDF triple from this. The question thus is: is #event-list wider
>>>> than #event-bundle or is it the other way round? I could conjure up
>>>> arguments for both, so, as usual, I'd approach the question from the
>>>> user side: If I'm looking for #event-bundle, do I want to see
>>>> #event-list, too? If I'm looking for #event-list, do I want to see
>>>> #event-bundle, too? Whatever ought to encompass the other is the
>>>> wider term.
>>>
>>> I would guess probably not, but maybe. I think an event bundle will
>>> always include the event list, so if you are asking for an event
>>> bundle then the event list would be redundant. Note that an event
>>> bundle could really be a physical bundle such as a tar file (as is
>>> the case for Chandra) or perhaps all the products are accessed via
>>> DataLink. If somebody asks for an event list, they probably just
>>> want the event list but perhaps they don’t know about event bundles
>>> and therefore might like to see both. There are definitely cases
>>> where they really do want just the event list - for example, a lot
>>> of folks are interested only in morphological studies and,
>>> especially if they want all observations of an object or class of
>>> object, bundles may add a lot of extra unwanted data volume. Would
>>> it hurt to see them? Well if your query returns 500 event lists and
>>> also 500 bundles it just muddies the waters. For users of advanced
>>> data products (e.g., Chandra Source Catalog products) I suspect that
>>> bundles would likely never be desired (and as I said above, I don’t
>>> think we would provide them).
>>>
>>> Conversely, some folks may only want responses and not the
>>> associated event lists. At the catalog level for example, we have
>>> lots of folks who just want to download point spread functions.
>>> Why? The Chandra PSF varies strongly (~factor of 50) across the
>>> field of view (with off-axis angle and azimuthal angle) and also
>>> with energy. We actually archive the local PSF for every catalog
>>> detection at several energies, and with ~50,000 counts. So we
>>> effectively have a library of (currently) ~1.3 million Chandra PSFs
>>> at several energies that are catalog advanced data products and may
>>> save users from having to create their own PSFs.
>>>
>>> A lot of our actual catalog usage cases start with users querying
>>> the catalog, then refining their sample and retrieving a subset of
>>> catalog data products (likely including event lists), then
>>> retrieving additional data products such as light curves, spectra,
>>> or aperture photometry products as they work through their analyses.
>>> These products are not part of bundles, but we find this usage
>>> pattern of wanting to retrieve additional products in stages to be
>>> quite common for folks who are doing archival science (vs. those who
>>> are retrieving their own observation data).
>>>
>>>
>>>>
>>>> (3) Rationale: If the answer to both of the two questions in the
>>>> preceding paragraph is "Yes", then it turns out the concepts are
>>>> identical (A ⊂ B and B ⊂ A implies A = B), and hence you really don't
>>>> want a new concept but augment #event-list to be something like,
>>>> say, "Event list, possibly augmented with ancillary information".
>>>> This points to an issue with your rationale: It basically argues that
>>>> there's something you would like to say.
>>>>
>>>> An aphorism I'm bringing up rather often these days is: "In protocol
>>>> design, don't think about what you want to say. Think about what
>>>> others want to listen to." Hence, it'd be really great if the
>>>> rationale said why someone would want to look for #event-bundle
>>>> *rather than* #event-list (or for #event-list rather than
>>>> #event-bundle, if the former is the narrower term). Could you
>>>> provide that information in the Rationale section?
>>>>
>>>> Thanks,
>>>>
>>>> Markus
>>>>
>>>>
>>>> [1] Regrettably, the CSC TAP services seem to be mildly broken at the
>>>> moment. Coming in with http, they issue https redirects which
>>>> confuse TOPCAT; CXC folks: if you really need the forced redirects
>>>> (see
>>>> <https://blog.tfiu.de/foced-https-redirects-considered-harmful.html>
>>>> for
>>>> a better alternative) then please update your registry records to
>>>> point to the https URIs. Even with https, however, I'm getting a
>>>> "cscrel2.dbo.obscore not found" error from TOPCAT when running
>>>>
>>>> select top 30 * from ivoa.obscore where dataproduct_type='event-list'
>>>>
>>>> It would be great if you could fix that (and a regular run of stilts
>>>> taplint is good practice anyway)
>>>
>>> The http to https redirects are a known issue and should be fixed in
>>> the next full data system release, which is scheduled for mid-June.
>>>
>>> We’ll look into the other issue you reported.
>>>
>>>
>>>>
>>>> [2] You see,
>>>> <https://ivoa.net/documents/Vocabularies/20230206/REC-Vocabularies-2.1.html#tth_sEcC.2>,
>>>> while not exactly normative, is clear on:
>>>>
>>>> In particular, ensure [...] resources mentioned in Used-in can be
>>>> reached and reflect the proposed term [...]
>>>>
>>>> --
>>>> heig mailing list
>>>> heig at ivoa.net
>>>> http://mail.ivoa.net/mailman/listinfo/heig
>>>
>>>
>>> Thanks,
>>> —Ian
>>>
>>> —
>>> Dr. Ian Evans
>>> *Astrophysicist*
>>> *Chandra X-ray Center*
>>> Center for Astrophysics | Harvard & Smithsonian
>>> Office: (617) 496 7846 | Cell: (617) 699 5152
>>> 60 Garden Street | MS 81 | Cambridge, MA 02138
>>>
>> --
>> Mireille Louys, MCF (Assistant Professor)
>> Centre de données Astronomiques (CDS)
>> Observatoire de Strasbourg
>> 11, rue de l'Université
>> F-67000 Strasbourg
>>
>> Equipe Images, ICube
>> Telecom Physique Strasbourg
>> 300, Bd Sebastien Brandt CS 10413
>> F-67412 Illkirch Cedex
>