[Heig] vocabulary update: proposal for dataproduct_type update for high energy data : event-list definition and event-bundle

Mireille Louys mireille.louys at unistra.fr
Mon May 19 20:43:52 CEST 2025


Hi Ian, hi folks,

Thanks for the various querying scenarios.

I understand that the PSF use case you describe would be something like
"give me all data products with dataproduct_type='psf'", with additional
constraints, as in:

SELECT TOP 100 * FROM ivoa.ObsCore
WHERE obs_id='22830' AND dataproduct_type='psf'

How is a PSF data product related to a data file:
through obs_id, obs_publisher_did, or something else?

Can we explore how the ObsCore 1.1 table would be filled for a PSF
data product?
Are there file examples?
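
To make the question concrete, here is a toy sketch in Python, assuming (and this is only an assumption, it is exactly what I am asking about) that obs_id is the key linking a PSF product to the event list of the same observation. The table is an in-memory stand-in for ivoa.ObsCore, the rows and the ivo://example identifiers are invented for illustration, and 'psf' is only a proposed dataproduct_type value.

```python
import sqlite3

# Toy stand-in for ivoa.ObsCore; every row below is a hypothetical
# illustration, not actual archive content.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE obscore ("
    " obs_id TEXT, obs_publisher_did TEXT,"
    " dataproduct_type TEXT, access_format TEXT)"
)
conn.executemany(
    "INSERT INTO obscore VALUES (?, ?, ?, ?)",
    [
        ("22830", "ivo://example/evt?22830", "event-list", "application/fits"),
        ("22830", "ivo://example/psf?22830", "psf", "application/fits"),
        ("99999", "ivo://example/psf?99999", "psf", "application/fits"),
    ],
)


def products_for(obs_id, dataproduct_type):
    """Return the publisher DIDs of one product type within one observation."""
    rows = conn.execute(
        "SELECT obs_publisher_did FROM obscore"
        " WHERE obs_id = ? AND dataproduct_type = ?",
        (obs_id, dataproduct_type),
    )
    return [row[0] for row in rows]


# Under the obs_id assumption, the PSF and the event list of one
# observation are found with the same key and different product types.
psf_ids = products_for("22830", "psf")
event_ids = products_for("22830", "event-list")
```

If obs_id turns out not to be the right link, only the WHERE clause of products_for would change; the open question of which column carries the relationship stays the same.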

I understand we are identifying here a vocabulary for the various IRF
files, like "psf, irf, arf, rmf, background noise, etc.",
with a kind of hierarchy like:

  irf
    arf (child of irf)
    rmf (child of irf)
    psf (child of irf)
  back_ground_noise

This assumes that em_ucd, s_xel1, s_xel2, em_xel, etc. can be filled
in the ObsCore table.
Parametric functions could not be described here.

So if these are ObsCore dataproduct_type values, we should be able to
describe these features.
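
As a rough illustration of what the hierarchy above would mean for queries, here is a minimal Python sketch. The BROADER mapping only mirrors the draft hierarchy in this message (none of these terms is claimed to be in the published vocabulary), and the helper names with_narrower and where_clause are made up for the example.

```python
# Hypothetical parent links mirroring the draft hierarchy in this message;
# a value of None marks a term with no parent.
BROADER = {
    "irf": None,
    "arf": "irf",
    "rmf": "irf",
    "psf": "irf",
    "back_ground_noise": None,
}


def with_narrower(term):
    """Return the term plus every term below it in the hierarchy, sorted."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in BROADER.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return sorted(found)


def where_clause(term):
    """Build an ADQL-style constraint matching the term and its narrower terms."""
    quoted = ", ".join("'%s'" % t for t in with_narrower(term))
    return "dataproduct_type IN (%s)" % quoted
```

With this reading, a search for 'irf' would also return arf, rmf and psf products, while a search for 'psf' would return PSFs only.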

The "content_qualifier" used in DataLink, if I remember correctly, can
also take values other than ObsCore data product types
(to be checked).

I will look deeper into this and come back to the topic soon.

Cheers, Mireille


On 30/04/2025 at 19:52, Dr. Ian N. Evans via semantics wrote:
> Hi Markus,
>
> See inline comments below.
>
>> On Apr 25, 2025, at 03:29, Markus Demleitner via heig <heig at ivoa.net> 
>> wrote:
>>
>> Dear Mireille,
>>
>> Thanks for your VEP.
>>
>> On Thu, Apr 24, 2025 at 06:16:36PM +0200, Mireille Louys via 
>> semantics wrote:
>>> • Propose definitions for a product-type *event-bundle:* An
>>> event-bundle dataset
>>> is a complex object containing an event-list and multiple files or
>>> other substructures that are products necessary to analyse the
>>> event-list.
>>> Data in an event-bundle may thus be used to produce higher level data
>>> products such as images or spectra.
>>
>> I think the definition is reasonably clear and applicable in
>> practice.  Before merging this, however, I'd have a few requests for
>> clarification:
>>
>> (1) used-in: I really, *really* would like to see actual, published
>> data here (always, in all VEPs; it's a pain if we go into all the
>> trouble of defining a concept and then nobody's ever using it in
>> practice).  I see that CSC on http://cda.cfa.harvard.edu/csctap (or
>> http://cda.cfa.harvard.edu/csc21tap [1]) has an obscore table.  It
>> would really be excellent if they could mark up their event bundles
>> with the new term, such that we could say:
>>
>>  used-in: dataset ivo://csc.harvard.edu/scsr2?some-obs-id on
>> http://cda.cfa.harvard.edu/csctap
>>
>> That would help me maintain a clear conscience when setting up the
>> new term[2].
>
> I’m not sure that we would use event bundles for any of the CSC data 
> product types, since our catalog users generally want to discover and 
> retrieve individual data products (typically, many of the same data 
> product such as a light curve or aperture photometry MPDF for a set of 
> sources, such as all high redshift quasars).
>
> We do make event bundles available through the Chandra Data Archive 
> (CDA) ObsCore for individual observations, but we have been doing this 
> for many years and therefore those bundles do not conform to the 
> proposed ObsCore HE extension.  For example, at 
> https://cda.harvard.edu/cxctap one might do a query like
> SELECT TOP 100
>        *
> FROM ivoa.ObsCore
> WHERE obs_id='22830'
>
> The bundles currently have dataproduct_type = ‘’ and correspond to the 
> Chandra “primary” and “primary” + “secondary” data product categories 
> for the observation.  The “primary” package includes a basic set of 
> data products that an observer would need to analyze the observation 
> and produce photometrically calibrated spectra, while the “secondary” 
> set would be needed in addition if the user wanted to recalibrate the 
> observation with updated calibrations (which we recommend because some 
> updated calibrations only become available typically several months 
> after an observation is completed as calibrations such as detector 
> gain change with time).  The bundles are tarballs and have 
> access_format “application/x-tar”.
>
> Note that our bundles do not include responses, unlike (e.g.) CTAO.
> This is partly because the spacecraft dithers on the sky during an
> observation so any target moves across the detector during the 
> observation.  So the responses depend critically on the user’s choice 
> of data filtering.  We follow the HEASARC standards and provide 
> separate RMF and ARF data products rather than a combined IRF.  For 
> spectral fitting, the integrated ARF (i.e., the effective area 
> integrated over an energy range) also depends on the source spectral 
> model, so we provide tools for the user to compute the responses and 
> the data products needed to do so in the “primary” bundle.  See 
> https://cxc.harvard.edu/cda/DataProdList.html#A for the list of 
> Chandra primary and secondary products.
>
>
>>
>> (2) Relationship: That's an operational field, i.e., I need to create
>> an RDF triple from this.  The question thus is: is #event-list wider
>> than #event-bundle or is it the other way round?  I could conjure up
>> arguments for both, so, as usual, I'd approach the question from the
>> user side: If I'm looking for #event-bundle, do I want to see
>> #event-list, too?  If I'm looking for #event-list, do I want to see
>> #event-bundle, too?  Whatever ought to encompass the other is the
>> wider term.
>
> I would guess probably not, but maybe.  I think an event bundle will 
> always include the event list, so if you are asking for an event 
> bundle then the event list would be redundant.  Note that an event 
> bundle could really be a physical bundle such as a tar file (as is the 
> case for Chandra) or perhaps all the products are accessed via 
> DataLink.  If somebody asks for an event list, they probably just want 
> the event list but perhaps they don’t know about event bundles and 
> therefore might like to see both.  There are definitely cases where 
> they really do want just the event list - for example, a lot of folks 
> are interested only in morphological studies and, especially if they 
> want all observations of an object or class of object, bundles may add 
> a lot of extra unwanted data volume.  Would it hurt to see them?  Well 
> if your query returns 500 event lists and also 500 bundles it just 
> muddies the waters.  For users of advanced data products (e.g., 
> Chandra Source Catalog products) I suspect that bundles would likely 
> never be desired (and as I said above, I don’t think we would provide 
> them).
>
> Conversely, some folks may only want responses and not the associated 
> event lists.  At the catalog level for example, we have lots of folks 
> who just want to download point spread functions.  Why?  The Chandra 
> PSF varies strongly (~factor of 50) across the field of view (with 
> off-axis angle and azimuthal angle) and also with energy.  We actually 
> archive the local PSF for every catalog detection at several energies, 
> and with ~50,000 counts.  So we effectively have a library of 
> (currently) ~1.3 million Chandra PSFs at several energies that are 
> catalog advanced data products and may save users from having to 
> create their own PSFs.
>
> A lot of our actual catalog usage cases start with users querying the 
> catalog, then refining their sample and retrieving a subset of catalog 
> data products (likely including event lists), then retrieving 
> additional data products such as light curves, spectra, or aperture 
> photometry products as they work through their analyses.  These 
> products are not part of bundles, but we find this usage pattern of 
> wanting to retrieve additional products in stages to be quite common 
> for folks who are doing archival science (vs. those who are retrieving 
> their own observation data).
>
>
>>
>> (3) Rationale: If the answer to both of the two questions in the
>> preceding paragraph is "Yes", then it turns out the concepts are
>> identical (A ⊂ B and B ⊂ A implies A = B), and hence you really don't
>> want a new concept but augment #event-list to be something like,
>> say, "Event list, possibly augmented with ancillary information".
>> This points to an issue with your rationale: It basically argues that
>> there's something you would like to say.
>>
>> An aphorism I'm bringing up rather often these days is: "In protocol
>> design, don't think about what you want to say.  Think about what
>> others want to listen to."  Hence, it'd be really great if the
>> rationale said why someone would want to look for #event-bundle
>> *rather than* #event-list (or for #event-list rather than
>> #event-bundle, if the former is the narrower term).  Could you
>> provide that information in the Rationale section?
>>
>> Thanks,
>>
>>             Markus
>>
>>
>> [1] Regrettably, the CSC TAP services seem to be mildly broken at the
>> moment.  Coming in with http, they issue https redirects which
>> confuse TOPCAT; CXC folks: if you really need the forced redirects
>> (see <https://blog.tfiu.de/foced-https-redirects-considered-harmful.html>
>> for a better alternative) then please update your registry records to
>> point to the https URIs.  Even with https, however, I'm getting a
>> "cscrel2.dbo.obscore not found" error from TOPCAT when running
>>
>>  select top 30 * from ivoa.obscore where dataproduct_type='event-list'
>>
>> It would be great if you could fix that (and a regular run of stilts
>> taplint is good practice anyway)
>
> The http-to-https redirects are a known issue and should be fixed in
> the next full data system release, which is scheduled for mid-June.
>
> We’ll look into the other issue you reported.
>
>
>>
>> [2] You see,
>> <https://ivoa.net/documents/Vocabularies/20230206/REC-Vocabularies-2.1.html#tth_sEcC.2>,
>> while not exactly normative, is clear on:
>>
>>  In particular, ensure [...] resources mentioned in Used-in can be
>>  reached and reflect the proposed term [...]
>>
>> -- 
>> heig mailing list
>> heig at ivoa.net
>> http://mail.ivoa.net/mailman/listinfo/heig
>
>
> Thanks,
> —Ian
>
> Dr. Ian Evans
> *Astrophysicist*
> *Chandra X-ray Center*
> Center for Astrophysics | Harvard & Smithsonian
> Office: (617) 496 7846 | Cell: (617) 699 5152
> 60 Garden Street | MS 81 | Cambridge, MA 02138
>
>
-- 
Mireille Louys, MCF (Assistant Professor)
Centre de données Astronomiques (CDS)       Equipe Images, ICube
Observatoire de Strasbourg                  Telecom Physique Strasbourg
11, rue de l' Université                    300, Bd Sebastien Brandt CS 10413
F-67000 Strasbourg                          F-67412  Illkirch Cedex