[Heig] vocabulary update: proposal for dataproduct_type update for high energy data : event-list definition and event-bundle
Mireille Louys
mireille.louys at unistra.fr
Mon May 19 20:43:52 CEST 2025
Hi Ian, hi folks,
Thanks for the various querying scenarios.
I understand the PSF use case you describe would be something like
"give me all products with dataproduct_type='psf'", with additional
constraints, for example:
SELECT TOP 100 * FROM ivoa.ObsCore
WHERE obs_id='22830' AND dataproduct_type='psf'
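As a minimal sketch of how a client could submit the query above, assuming the standard TAP synchronous endpoint (/sync) on the CDA service URL quoted in Ian's message below, and assuming 'psf' were accepted as a dataproduct_type value (it is only a proposal so far). The names psf_query and sync_url are made up for illustration, and nothing is sent over the network here:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Service URL as quoted in Ian's message; whether it returns rows for
# dataproduct_type='psf' is an open question, not a known fact.
TAP_BASE = "https://cda.harvard.edu/cxctap"


def psf_query(obs_id, top=100):
    """Build the ADQL text for the PSF use case (hypothetical helper)."""
    # obs_id is interpolated naively: fine for a sketch, not for untrusted input.
    return (
        f"SELECT TOP {top} * FROM ivoa.ObsCore "
        f"WHERE obs_id='{obs_id}' AND dataproduct_type='psf'"
    )


def sync_url(adql, base=TAP_BASE):
    """Return the URL of a TAP synchronous request for the given ADQL."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": "votable",
        "QUERY": adql,
    }
    return f"{base}/sync?{urlencode(params)}"


url = sync_url(psf_query("22830"))
print(url)
```

The URL could then be fetched with any HTTP client, or the same ADQL pasted into TOPCAT.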
How is a PSF data product related to a data file?
Through obs_id, obs_publisher_did, ...?
Can we explore how the ObsCore 1.1 table would be filled for a PSF
data product?
Are there example files?
I understand we are identifying here a vocabulary for the various IRF files,
such as "psf, irf, arf, rmf, background noise, etc.",
with a kind of hierarchy like
  irf
  arf (child of irf)
  rmf (child of irf)
  psf (child of irf)
  back_ground_noise
with the assumption that em_ucd, s_xel1, s_xel2, em_xel etc. can be filled
in the ObsCore table.
Parametric functions could not be described here.
So if these are ObsCore dataproduct_type values, we should be able to
describe these features.
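To make the hierarchy idea concrete, here is a rough sketch of how a client might expand a term to its narrower terms when building an ObsCore constraint. The term names are just the ones listed above, not approved vocabulary entries, and BROADER, narrower_closure and adql_constraint are hypothetical names chosen for this example:

```python
# Child -> parent links, copied from the hierarchy sketched above.
# back_ground_noise has no stated parent, so it is not listed here.
BROADER = {
    "arf": "irf",
    "rmf": "irf",
    "psf": "irf",
}


def narrower_closure(term):
    """Return the term itself plus every term below it in the tree."""
    found = [term]
    for child, parent in BROADER.items():
        if parent == term:
            found.extend(narrower_closure(child))
    return found


def adql_constraint(term):
    """Build a dataproduct_type constraint covering the term and its children."""
    quoted = ", ".join(f"'{t}'" for t in sorted(narrower_closure(term)))
    return f"dataproduct_type IN ({quoted})"


print(adql_constraint("irf"))
print(adql_constraint("psf"))
```

With this layout, asking for 'irf' also matches arf, rmf and psf products, while asking for 'psf' matches only PSFs.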
The "content_qualifier" used in DataLink, if I remember correctly, can also
be something other than an ObsCore data product type
(to be checked).
I will look deeper into this and come back to the topic soon.
Cheers, Mireille
On 30/04/2025 at 19:52, Dr. Ian N. Evans via semantics wrote:
> Hi Markus,
>
> See inline comments below.
>
>> On Apr 25, 2025, at 03:29, Markus Demleitner via heig <heig at ivoa.net>
>> wrote:
>>
>> Dear Mireille,
>>
>> Thanks for your VEP.
>>
>> On Thu, Apr 24, 2025 at 06:16:36PM +0200, Mireille Louys via
>> semantics wrote:
>>> • Propose definitions for a product-type *event-bundle:* An
>>> event-bundle dataset
>>> is a complex object containing an event-list and multiple files or
>>> other substructures that are products necessary to analyse the
>>> event-list.
>>> Data in an event-bundle may thus be used to produce higher level data
>>> products such as images or spectra.
>>
>> I think the definition is reasonably clear and applicable in
>> practice. Before merging this, however, I'd have a few requests for
>> clarification:
>>
>> (1) used-in: I really, *really* would like to see actual, published
>> data here (always, in all VEPs; it's a pain if we go into all the
>> trouble of defining a concept and then nobody's ever using it in
>> practice). I see that CSC on
>> http://cda.cfa.harvard.edu/csctap
>> (or
>> http://cda.cfa.harvard.edu/csc21tap
>> [1]) has an obscore table. It
>> would really be excellent if they could mark up their event bundles
>> with the new term, such that we could say:
>>
>> used-in: dataset ivo://csc.harvard.edu/scsr2?some-obs-id on
>> http://cda.cfa.harvard.edu/csctap
>>
>> That would help me maintain a clear conscience when setting up the
>> new term[2].
>
> I’m not sure that we would use event bundles for any of the CSC data
> product types, since our catalog users generally want to discover and
> retrieve individual data products (typically, many of the same data
> product such as a light curve or aperture photometry MPDF for a set of
> sources, such as all high redshift quasars).
>
> We do make event bundles available through the Chandra Data Archive
> (CDA) ObsCore for individual observations, but we have been doing this
> for many years and therefore those bundles do not conform to the
> proposed ObsCore HE extension. For example, at
> https://cda.harvard.edu/cxctap one might do a query like
> SELECT TOP 100
> *
> FROM ivoa.ObsCore
> WHERE obs_id='22830'
>
> The bundles currently have dataproduct_type = ‘’ and correspond to the
> Chandra “primary” and “primary” + “secondary” data product categories
> for the observation. The “primary” package includes a basic set of
> data products that an observer would need to analyze the observation
> and produce photometrically calibrated spectra, while the “secondary”
> set would be needed in addition if the user wanted to recalibrate the
> observation with updated calibrations (which we recommend because some
> updated calibrations only become available typically several months
> after an observation is completed as calibrations such as detector
> gain change with time). The bundles are tarballs and have
> access_format “application/x-tar”.
>
> Note that our bundles do not include responses, unlike (e.g.) CTAO.
> This is partly because the spacecraft dithers on the sky during an
> observation so any target moves across the detector during the
> observation. So the responses depend critically on the user’s choice
> of data filtering. We follow the HEASARC standards and provide
> separate RMF and ARF data products rather than a combined IRF. For
> spectral fitting, the integrated ARF (i.e., the effective area
> integrated over an energy range) also depends on the source spectral
> model, so we provide tools for the user to compute the responses and
> the data products needed to do so in the “primary” bundle. See
> https://cxc.harvard.edu/cda/DataProdList.html#A for the list of
> Chandra primary and secondary products.
>
>
>>
>> (2) Relationship: That's an operational field, i.e., I need to create
>> an RDF triple from this. The question thus is: is #event-list wider
>> than #event-bundle or is it the other way round? I could conjure up
>> arguments for both, so, as usual, I'd approach the question from the
>> user side: If I'm looking for #event-bundle, do I want to see
>> #event-list, too? If I'm looking for #event-list, do I want to see
>> #event-bundle, too? Whatever ought to encompass the other is the
>> wider term.
>
> I would guess probably not, but maybe. I think an event bundle will
> always include the event list, so if you are asking for an event
> bundle then the event list would be redundant. Note that an event
> bundle could really be a physical bundle such as a tar file (as is the
> case for Chandra) or perhaps all the products are accessed via
> DataLink. If somebody asks for an event list, they probably just want
> the event list but perhaps they don’t know about event bundles and
> therefore might like to see both. There are definitely cases where
> they really do want just the event list - for example, a lot of folks
> are interested only in morphological studies and, especially if they
> want all observations of an object or class of object, bundles may add
> a lot of extra unwanted data volume. Would it hurt to see them? Well
> if your query returns 500 event lists and also 500 bundles it just
> muddies the waters. For users of advanced data products (e.g.,
> Chandra Source Catalog products) I suspect that bundles would likely
> never be desired (and as I said above, I don’t think we would provide
> them).
>
> Conversely, some folks may only want responses and not the associated
> event lists. At the catalog level for example, we have lots of folks
> who just want to download point spread functions. Why? The Chandra
> PSF varies strongly (~factor of 50) across the field of view (with
> off-axis angle and azimuthal angle) and also with energy. We actually
> archive the local PSF for every catalog detection at several energies,
> and with ~50,000 counts. So we effectively have a library of
> (currently) ~1.3 million Chandra PSFs at several energies that are
> catalog advanced data products and may save users from having to
> create their own PSFs.
>
> A lot of our actual catalog usage cases start with users querying the
> catalog, then refining their sample and retrieving a subset of catalog
> data products (likely including event lists), then retrieving
> additional data products such as light curves, spectra, or aperture
> photometry products as they work through their analyses. These
> products are not part of bundles, but we find this usage pattern of
> wanting to retrieve additional products in stages to be quite common
> for folks who are doing archival science (vs. those who are retrieving
> their own observation data).
>
>
>>
>> (3) Rationale: If the answer to both of the two questions in the
>> preceding paragraph is "Yes", then it turns out the concepts are
>> identical (A ⊂ B and B ⊂ A implies A = B), and hence you really don't
>> want a new concept but augment #event-list to be something like,
>> say, "Event list, possibly augmented with ancillary information".
>> This points to an issue with your rationale: It basically argues that
>> there's something you would like to say.
>>
>> An aphorism I'm bringing up rather often these days is: "In protocol
>> design, don't think about what you want to say. Think about what
>> others want to listen to." Hence, it'd be really great if the
>> rationale said why someone would want to look for #event-bundle
>> *rather than* #event-list (or for #event-list rather than
>> #event-bundle, if the former is the narrower term). Could you
>> provide that information in the Rationale section?
>>
>> Thanks,
>>
>> Markus
>>
>>
>> [1] Regrettably, the CSC TAP services seem to be mildly broken at the
>> moment. Coming in with http, they issue https redirects which
>> confuse TOPCAT; CXC folks: if you really need the forced redirects
>> (see
>> <https://blog.tfiu.de/foced-https-redirects-considered-harmful.html>
>> for
>> a better alternative) then please update your registry records to
>> point to the https URIs. Even with https, however, I'm getting a
>> "cscrel2.dbo.obscore not found" error from TOPCAT when running
>>
>> select top 30 * from ivoa.obscore where dataproduct_type='event-list'
>>
>> It would be great if you could fix that (and a regular run of stilts
>> taplint is good practice anyway)
>
> The http to https redirects are a known issue and should be fixed in
> the next full data system release, which is scheduled for mid-June.
>
> We’ll look into the other issue you reported.
>
>
>>
>> [2] You see,
>> <https://ivoa.net/documents/Vocabularies/20230206/REC-Vocabularies-2.1.html#tth_sEcC.2>,
>> while not exactly normative, is clear on:
>>
>> In particular, ensure [...] resources mentioned in Used-in can be
>> reached and reflect the proposed term [...]
>>
>> --
>> heig mailing list
>> heig at ivoa.net
>> http://mail.ivoa.net/mailman/listinfo/heig
>
>
> Thanks,
> —Ian
>
> —
> Dr. Ian Evans
> *Astrophysicist*
> *Chandra X-ray Center*
> Center for Astrophysics | Harvard & Smithsonian
> Office: (617) 496 7846 | Cell: (617) 699 5152
> 60 Garden Street | MS 81 | Cambridge, MA 02138
>
>
--
Mireille Louys, MCF (Assistant Professor)
Centre de données Astronomiques (CDS), Observatoire de Strasbourg
11, rue de l'Université, F-67000 Strasbourg
Equipe Images, ICube, Telecom Physique Strasbourg
300, Bd Sebastien Brandt, CS 10413, F-67412 Illkirch Cedex