[Heig] [EXTERNAL] [BULK] Re: Post running meeting thoughts
Dr. Ian N. Evans
ievans at cfa.harvard.edu
Fri May 8 17:13:55 CEST 2026
Hi Francois,
Thank you for your response.
I don’t think it is correct to say that *the* “main discovery path is to start from sources”, although that is certainly *a* discovery path.
We want CSC data products to be discoverable when somebody searches for data because they may be a better choice for the end-user than the regular Chandra archive data products in many cases, and we don’t want end users missing them because they are not performing source-centric data discovery. For example, because the catalog generates well-curated stacked event-lists, images, and associated products, the signal-to-noise of the catalog products may be much higher than for the individual observation products because the effective exposure time is much longer (perhaps by as much as two orders of magnitude). This is particularly true for more recent data where spacecraft thermal constraints limit the maximum possible single exposure time compared to the start of the mission. Given the large size of some of these products, we also want to give users the option of discovering the cutout data products around detections, since many (though not all) users are interested in detected X-ray sources. The “not all” in the last sentence is important too, since some users are interested in data products that do not have embedded sources; such products are part of the CSC but will never be discoverable using a source-centric approach.
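
As a rough illustration of the signal-to-noise point above, here is a minimal sketch. It assumes pure Poisson counting noise with constant source and background rates, so that signal-to-noise grows as the square root of the exposure time. The function name and the exposure values are illustrative only, not CSC numbers.

```python
import math


def snr_gain(stacked_exposure_s: float, single_exposure_s: float) -> float:
    """Approximate S/N improvement of a stack over a single observation.

    Assumes Poisson-limited counting statistics, where S/N scales as
    sqrt(exposure time). Real gains depend on background, off-axis angle,
    and how the stack is built.
    """
    if stacked_exposure_s <= 0 or single_exposure_s <= 0:
        raise ValueError("exposure times must be positive")
    return math.sqrt(stacked_exposure_s / single_exposure_s)


# Hypothetical example: a stack with 100x the exposure of one observation.
gain = snr_gain(1_000_000.0, 10_000.0)
print(gain)  # 10.0
```

Under that assumption, two orders of magnitude in effective exposure correspond to roughly one order of magnitude in signal-to-noise.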
Thanks,
—Ian
> On May 7, 2026, at 03:08, BONNAREL FRANCOIS gmail <francois.bonnarel at gmail.com> wrote:
>
> Hi Tess, Ian, all
>
> Sorry for the long delay
>
> After reading Tess's email late in March, I took the time to look at the HEASARC TAP service, and I will comment below. Let's first discuss one point in Ian's answer.
>
>
>>
>> Hi Tess,
>>
>>
>> I think whether end users want to search for specific datasets varies depending on the data collection and types of data, and on the number and size of data products in the collection.
>>
>> If the data collection consists solely of individual observations that are processed through standard data processing pipelines and that need further user data analysis, which requires the set of associated data products to extract science, then I agree that the ability to search for the individual observation event list (or perhaps event bundle), with associated data products accessible using datalink, is likely sufficient.
>>
>> If the data collection contains advanced data products (for example the Chandra Source Catalog data products) the usage patterns change. Our experience is that users doing catalog science typically identify potential sources of interest and then want to retrieve subsets of data products for those sources, often in several rounds.
>>
>> For example, they may identify hundreds, thousands, or in some cases tens of thousands of candidate sources matching their search criteria, and may subsequently download (e.g.) the light curves for all of the observations of these sources (on average 3 times as many as the number of sources), do some automated pre-filtering on the light curves, and then download (e.g.) the cutout event lists surrounding the individual observation detections for further analysis. They might subsequently come back to download the region definitions, and perhaps the individual observation PHA spectra of the detections.
> But if the main discovery path is to start from sources, then I realize that the first approach would be a Simple Cone Search or a TAP query on the catalog, seen as a table with parameters estimated for each source. You already have this in VizieR and in your Chandra archive (see below within Aladin).
>
> The SCS interface to this catalog in VizieR is the following (with randomly chosen RA, DEC, SR): https://vizier.cds.unistra.fr/viz-bin/conesearch/IX/70/csc21Mas?RA=00.6405&DEC=-08.2216&SR=5.0
> The result is a VOTable where a DataLink service descriptor could be added in order to point to all related event lists, response functions, and analysis data products.
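
For readers who want to script requests like the one above, here is a minimal sketch of building a Simple Cone Search URL. RA, DEC, and SR are the standard cone search parameters in decimal degrees, and the base URL is the VizieR endpoint quoted above. The function name and the four-decimal formatting are illustrative choices, and the sketch only builds the URL without contacting the service.

```python
from urllib.parse import urlencode


def cone_search_url(base_url: str, ra_deg: float, dec_deg: float,
                    radius_deg: float) -> str:
    """Build a Simple Cone Search request URL (all angles in decimal degrees)."""
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be within [-90, 90] degrees")
    if radius_deg < 0:
        raise ValueError("SR must be non-negative")
    params = {
        "RA": f"{ra_deg % 360.0:.4f}",   # wrap right ascension into [0, 360)
        "DEC": f"{dec_deg:.4f}",
        "SR": f"{radius_deg:.4f}",
    }
    return f"{base_url}?{urlencode(params)}"


# Endpoint taken from the message above; the position is the same random one.
VIZIER_CSC = "https://vizier.cds.unistra.fr/viz-bin/conesearch/IX/70/csc21Mas"
url = cone_search_url(VIZIER_CSC, 0.6405, -8.2216, 5.0)
print(url)
```

The resulting URL can be loaded in TOPCAT or Aladin, or fetched with any HTTP client, to obtain the VOTable.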
>
> This use of DataLink with source catalogs (instead of the ObsTAP/SIA/SSA context) is an extension introduced with version 1.1, and it is perfectly adapted to use cases where you don't know the number, type, and content of the additional items you want to attach to the primary resource (here, the source record in the table).
>
> This is how light curves are related to sources in CoRoT, for example: https://vizier.cds.unistra.fr/viz-bin/conesearch/B/corot/Faint_star?RA=100.94235&DEC=-00.89651&SR=1
> You can load this URL in TOPCAT and invoke the service to discover the different light curves associated with each source.
>
> Now, as for the HEASARC TAP service, it is very similar in the sense that they often have a DataLink service descriptor in the catalogue response and not in ObsTAP. See for example the CALET Gamma-Ray Burst master catalog: https://heasarc.gsfc.nasa.gov/xamin/vo/tap/sync?REQUEST=doQuery&LANG=ADQL&MAXREC=20000000&QUERY=SELECT+TOP+9999+*+FROM+calgbmmstr+
> We can look at this VOTable with the help of TOPCAT, as you can see below in the second screenshot. There are plenty of links attached to each row in the first table, including the #this one.
>
> The others are all auxiliary. I would only suggest detailing the various flavors of auxiliary in the local semantics field instead of in semantics, in order to improve interoperability, because of these terms only #auxiliary belongs to the standard vocabulary.
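
The HEASARC query URL quoted above follows the usual TAP synchronous pattern, and a minimal sketch of assembling one programmatically might look like this. The base URL, the table name, and the REQUEST, LANG, MAXREC, and QUERY parameters are taken from that URL. The function name and the default MAXREC are illustrative assumptions, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def tap_sync_url(tap_base: str, adql: str, maxrec: int = 1000) -> str:
    """Build a TAP synchronous query URL for an ADQL statement."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "MAXREC": str(maxrec),
        "QUERY": adql,           # urlencode takes care of spaces and '*'
    }
    return f"{tap_base.rstrip('/')}/sync?{urlencode(params)}"


# Base URL and table name as they appear in the message above.
HEASARC_TAP = "https://heasarc.gsfc.nasa.gov/xamin/vo/tap"
query = "SELECT TOP 10 * FROM calgbmmstr"
url = tap_sync_url(HEASARC_TAP, query)

# Round-trip check: the ADQL text survives URL encoding unchanged.
decoded = parse_qs(urlsplit(url).query)
print(decoded["QUERY"][0])
```

The returned VOTable can then be opened in TOPCAT to inspect the DataLink service descriptor and the links attached to each row.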
>
>>
>> This is a very different usage pattern where end users are retrieving particular data products for potentially a large number of objects, and subsequently refining the list and downloading additional data products, sometimes in multiple steps. One reason for this approach is scale. For example, there are roughly 100x the number of data products, and 10x the data volume, for the Chandra Source Catalog vs. the Chandra data archive data products for the set of processed science observations.
>>
>> Could this be done by requiring end users to search for observations and then using datalink to access the individual data products? Probably not, because many of our data products merge data from multiple observations and it would be very difficult to encode the necessary source — stack detection — observation detection linkages correctly. In any case, doing queries like this en masse and then having to select subsets of datalinks is going to be much more difficult than a simple ObsCore query that directly returns the records (and access_urls) that you are looking for.
> If the issue there is being required to pass through a two-step process, then the TAP interface to the DataLink table would avoid that by joining the catalog table to the DataLink table on the source id.
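
To make the single-step join concrete, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for a TAP service. The table names, column names, local semantics labels, and example.org URLs are all hypothetical; a real service would expose its own schema and would be queried in ADQL rather than SQLite SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sources (source_id TEXT PRIMARY KEY, ra_deg REAL, dec_deg REAL);
CREATE TABLE datalinks (
    source_id TEXT, access_url TEXT, semantics TEXT, local_semantics TEXT
);
""")

# Hypothetical catalog rows and the links attached to them.
conn.executemany("INSERT INTO sources VALUES (?, ?, ?)", [
    ("src-A", 10.0, -5.0),
    ("src-B", 200.0, 30.0),
])
conn.executemany("INSERT INTO datalinks VALUES (?, ?, ?, ?)", [
    ("src-A", "https://example.org/src-A/lc1.fits", "#auxiliary", "lightcurve"),
    ("src-A", "https://example.org/src-A/evt.fits", "#auxiliary", "cutout-events"),
    ("src-B", "https://example.org/src-B/lc1.fits", "#auxiliary", "lightcurve"),
])

# One query: select sources by position and return only their light-curve links.
rows = conn.execute("""
    SELECT s.source_id, d.access_url
    FROM sources AS s
    JOIN datalinks AS d ON d.source_id = s.source_id
    WHERE d.local_semantics = ? AND s.ra_deg BETWEEN ? AND ?
    ORDER BY s.source_id, d.access_url
""", ("lightcurve", 0.0, 90.0)).fetchall()
print(rows)
```

The source selection and the choice of product type happen in the same query, so no per-source DataLink call is needed afterwards.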
>
> Best regards
> François
>>
>> With regard to your specific question regarding RMFs, I don’t know that users will download RMFs without either concurrently or previously downloading the PHA. Occasionally users will search for RMFs (and ARFs) separately from PHA spectra because they have previously retrieved the latter. On the other hand, we do for example see end users downloading PSFs independently from the primary datasets. This is likely because the catalog includes a vast set of high quality PSFs (of order 10M) covering the entire Chandra field of view and PSFs are rather expensive to generate.
>>
>> We have specifically tried very hard to focus on data discovery in the proposed ObsCore extensions note, and have used actual experience - how do we see our users wanting to work - to help guide our proposals.
>>
>> Thanks,
>> —Ian
>>
>>> On Mar 20, 2026, at 11:26, Jaffe, Tess (GSFC-6601) via heig <heig at ivoa.net> wrote:
>>>
>>> Hi everybody,
>>>
>>> I agree with Francois on a number of things, but especially that there is a lot of misunderstanding and misrepresentation going on here. Nobody has ever expressed reluctance to ensure that HEA-specific ancillary products such as responses etc. are made available easily through VO protocols. Let’s focus on what the issue actually is, because I think the discussion has lost sight of it.
>>>
>>> In my opinion, the main issue is not whether things like response matrices are science data, are needed by the users, or should be in the VO. I think we all agree that this is obvious. The question is what is the best method for making them accessible in the needed context and how far we need to customize what goes in the ObsCore table itself for different fields. That then is a question about discoverability and complexity.
>>>
>>> Having an individual row in an ObsCore table enables a user to search for that one specific thing. The best practice recommendation for the use of ObsCore is that the access_url be a datalink. So for a given product listed in an ObsCore table, three queries are needed: one to find the product, one to get its datalinks, and then one to download the file(s). I cannot recall having heard of a use case where somebody was interested in finding only the RMFs from a given instrument in a given year. (Please let me know if you have a use case for this so that we can address it directly. I can think of calibration projects, but this is an edge case that can be addressed another way.) Users will instead want to find all of the spectra from some source/time/waveband. That is why ObsCore has a row for such a product. Nobody disputes that to do the scientific analysis on that spectrum requires the user to also have an RMF. But that RMF does not need to be independently discoverable, just correctly linked to the spectrum that is of interest.
>>>
>>> Francois has proposed a number of solutions to this. ObsCore has a very reasonable amount of flexibility and specificity, and it is quite important to worry about adding unnecessary complexity and size. (I myself was worried about the additional complexity of the datalink layer, but now in implementation, I’m becoming a fan.) The radio extension doc you may note proposes a number of fields that are all about discovery. It then states, “Auxiliary datasets such as uv distribution map, dirty beam maps, frequency/amplitude plots, phase/amplitude plots are useful for astronomers to check data quality. In that case DataLink … may provide a solution to attach these auxiliary data to ObsCore records.” That makes sense to me.
>>>
>>> So I suggest we follow what the radio folks are doing. With this in mind, I think that three of the proposed columns -- T_intervals, Obs_mode, Event_type -- are very clearly applicable to data discovery and should be added to the ObsCore table. But some of the other proposed fields would be better added in datalinks with a HEA-specific vocabulary. We should discuss these on a case-by-case basis after having agreed on the purpose of a row in ObsCore.
>>>
>>> I hope this helps the discussion move along productively.
>>>
>>> Tess
>>>
>>
>> —
>>
>> Dr. Ian Evans
>> Astrophysicist
>> Chandra X-ray Center
>> Center for Astrophysics | Harvard & Smithsonian
>>
>> Office: (617) 496 7846 | Cell: (617) 699 5152
>> 60 Garden Street | MS 81 | Cambridge, MA 02138
>>
> [Screenshot 1: the catalog viewed in Aladin]
>
> [Screenshot 2: the HEASARC VOTable and its links viewed in TOPCAT]
>
—
Dr. Ian Evans
Astrophysicist
Chandra X-ray Center
Center for Astrophysics | Harvard & Smithsonian
Office: (617) 496 7846 | Cell: (617) 699 5152
60 Garden Street | MS 81 | Cambridge, MA 02138

