[Heig] [EXTERNAL] [BULK] Re: Post running meeting thoughts

Thu May 7 00:17:25 CEST 2026

Hi Tess, Ian, all

Sorry for the long delay.

After reading Tess's email in late March, I took the time to look at the
HEASARC TAP service, and I comment on it below. Let's first discuss one
point in Ian's answer.

>
> Hi Tess,
>
>
> I think whether end users want to search for specific datasets vary 
> depending on the data collection and types of data, and the number of 
> and size of data products in the collection.
>
> If the data collection consists solely of individual observations 
> processed through standard data processing pipelines and that need 
> further user data analysis that require the set of associated data 
> products to extract science then I agree that the ability to search 
> for the individual observation event list (or perhaps event bundle) 
> with associated data products accessible using datalink is likely 
> sufficient.
>
> If the data collection contains advanced data products (for example 
> the Chandra Source Catalog data products) the usage patterns change. 
>  Our experience is that users doing catalog science typically identify 
> potential sources of interest and then want to retrieve subsets of 
> data products for those sources, often in several rounds.
>
> For example, they may identify hundreds, thousands, or in some cases 
> tens of thousands of candidate sources matching their search criteria, 
> and may subsequently download (e.g.) the light curves for all of the 
> observations of these sources (on average 3 times as many as number of 
> sources), do some automated pre-filtering on the light curves, and 
> then download (e.g.) the cutout event lists surrounding the individual 
> observation detections for further analysis.  They might subsequently 
> come back to download the region definitions, and perhaps the 
> individual observation PHA spectra of the detections.
But if the main discovery path starts from sources, then I realize
that the first approach would be a Simple Cone Search or a TAP query on
the catalog, seen as a table with parameters estimated for each source.
You already have this in VizieR and in your Chandra archive (see below
within Aladin).

The SCS interface to this catalog in VizieR is the following (with 
randomly chosen RA, DEC, SR): 
https://vizier.cds.unistra.fr/viz-bin/conesearch/IX/70/csc21Mas?RA=00.6405&DEC=-08.2216&SR=5.0
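
As a minimal sketch, such a cone search URL can be assembled from a
position and radius. The base URL is the one quoted above; the helper
name cone_search_url is only illustrative, and RA, DEC and SR are the
standard Simple Cone Search parameters, all in decimal degrees.

```python
from urllib.parse import urlencode

# Base URL taken from the VizieR example in this message.
VIZIER_SCS = "https://vizier.cds.unistra.fr/viz-bin/conesearch/IX/70/csc21Mas"


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Simple Cone Search URL (illustrative helper, not a VizieR API)."""
    if not 0.0 <= ra_deg < 360.0:
        raise ValueError("RA must be in [0, 360) degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be in [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("SR must be non-negative")
    # RA, DEC and SR are the three mandatory cone search parameters.
    params = {
        "RA": f"{ra_deg:.6f}",
        "DEC": f"{dec_deg:.6f}",
        "SR": f"{radius_deg:.6f}",
    }
    return f"{base_url}?{urlencode(params)}"


url = cone_search_url(VIZIER_SCS, 0.6405, -8.2216, 5.0)
print(url)
```

The resulting URL can be loaded directly in TOPCAT or Aladin.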

The result is a VOTable where a DataLink service descriptor could be
added in order to point to all related event lists, response functions
and analysis data products.

This use of DataLink with source catalogs (instead of in an
ObsTAP/SIA/SSA context) is an extension introduced with version 1.1 and
is well suited to use cases where you don't know the number, type and
content of the additional items you want to attach to the primary
resource (here the source record in the table).
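
To illustrate what a client sees, here is a rough sketch that looks for
DataLink service descriptors in a catalogue response. The sample
VOTable is hypothetical and heavily trimmed (the access URL and field
names are made up); only the general shape follows the DataLink
convention of a RESOURCE with type="meta" and utype="adhoc:service"
whose ID input parameter references a column of the result table.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"
NS = {"v": VOTABLE_NS}

# Hypothetical cone search response: one source table plus one descriptor.
SAMPLE_VOTABLE = """<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE type="results">
    <TABLE>
      <FIELD ID="src_id" name="name" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>source-A</TD></TR>
        <TR><TD>source-B</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
  <RESOURCE type="meta" utype="adhoc:service">
    <PARAM name="standardID" datatype="char" arraysize="*"
           value="ivo://ivoa.net/std/DataLink#links-1.1"/>
    <PARAM name="accessURL" datatype="char" arraysize="*"
           value="https://example.org/datalink/links"/>
    <GROUP name="inputParams">
      <PARAM name="ID" datatype="char" arraysize="*" value="" ref="src_id"/>
    </GROUP>
  </RESOURCE>
</VOTABLE>
"""


def find_datalink_descriptors(votable_xml):
    """Return access URL and referenced ID column for each DataLink descriptor."""
    root = ET.fromstring(votable_xml)
    descriptors = []
    for res in root.iter(f"{{{VOTABLE_NS}}}RESOURCE"):
        if res.get("utype") != "adhoc:service":
            continue
        params = {p.get("name"): p.get("value") for p in res.findall("v:PARAM", NS)}
        standard_id = params.get("standardID") or ""
        # Keep only descriptors that announce the DataLink {links} capability.
        if not standard_id.startswith("ivo://ivoa.net/std/DataLink#links"):
            continue
        id_param = res.find("v:GROUP[@name='inputParams']/v:PARAM[@name='ID']", NS)
        descriptors.append({
            "access_url": params.get("accessURL"),
            "id_field_ref": id_param.get("ref") if id_param is not None else None,
        })
    return descriptors


descriptors = find_datalink_descriptors(SAMPLE_VOTABLE)
print(descriptors)
```

For each source row, a client would then call the access URL with ID
set to that row's value in the referenced column.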

This is how light curves are related to sources in CoRoT, for example:
https://vizier.cds.unistra.fr/viz-bin/conesearch/B/corot/Faint_star?RA=100.94235&DEC=-00.89651&SR=1

You can load this URL in TOPCAT and invoke the service to discover
the different light curves associated with each source.

Now, as for the HEASARC TAP service, it's very similar in the sense that
they often have a DataLink service descriptor in the catalogue
response and not in ObsTAP. See for example the CALET Gamma-Ray Burst
master catalog:
https://heasarc.gsfc.nasa.gov/xamin/vo/tap/sync?REQUEST=doQuery&LANG=ADQL&MAXREC=20000000&QUERY=SELECT+TOP+9999+*+FROM+calgbmmstr+

We can look at this VOTable with the help of TOPCAT, as you can see below
in the second screenshot. There are plenty of links attached to each row
in the first table, including the #this one.

The others are all auxiliary. My only suggestion would be to detail the
various flavors of auxiliary in the local_semantics field instead of in
semantics, in order to improve interoperability, because only #auxiliary
belongs to the standard vocabulary.
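
A small sketch of what this suggestion amounts to on the link rows
themselves. The rows and the "#lightcurve-plot" and "#response" terms
are invented for illustration and are not what HEASARC actually
returns; the point is only that the standard term stays in semantics
and the provider-specific flavor moves to local_semantics.

```python
# Subset of the DataLink core vocabulary, enough for this illustration.
STANDARD_TERMS = {"#this", "#auxiliary"}


def move_flavor_to_local_semantics(link_row, fallback="#auxiliary"):
    """Return a copy of the row with a non-standard term moved to local_semantics."""
    row = dict(link_row)
    if row["semantics"] not in STANDARD_TERMS:
        # Keep the provider-specific flavor, but advertise a standard term.
        row["local_semantics"] = row["semantics"]
        row["semantics"] = fallback
    return row


# Hypothetical link rows for one catalogue entry.
links = [
    {"ID": "burst-1", "semantics": "#this", "access_url": "https://example.org/a"},
    {"ID": "burst-1", "semantics": "#lightcurve-plot", "access_url": "https://example.org/b"},
    {"ID": "burst-1", "semantics": "#response", "access_url": "https://example.org/c"},
]

fixed = [move_flavor_to_local_semantics(row) for row in links]
for row in fixed:
    print(row["semantics"], row.get("local_semantics"), row["access_url"])
```

A generic client can then still recognise every link through the
standard term, while a HEA-aware client can select on local_semantics.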

>
> This is a very different usage pattern where end users are retrieving 
> particular data products for potentially a large number of objects, 
> and subsequently refining the list and downloading additional data 
> products, sometimes in multiple steps.  One reason for this approach 
> is scale.  For example, there are roughly 100x the number of data 
> products, and 10x the data volume, for the Chandra Source Catalog vs. 
> the Chandra data archive data products for the set of processed 
> science observations.
>
> Could this be done by requiring end users to search for observations 
> and then using datalink to access the individual data products? 
>  Probably not, because many of our data products merge data from 
> multiple observations and it would be very difficult to encode the 
> necessary source — stack detection — observation detection linkages 
> correctly.  In any case, doing queries like this en masse and then 
> having to select subsets of datalinks is going to be much more 
> difficult than a simple ObsCore query that directly returns the 
> records (and access_urls) that you are looking for.
If the issue is being required to go through a two-step process,
then the TAP interface to the DataLink table would avoid that
by joining the catalog table to the DataLink table on the source id.
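
As a sketch of such a single-step query, assuming the links are exposed
as a TAP table with the usual DataLink columns (ID, access_url,
semantics, content_type): the table names, the catalog columns
(source_id, ra, dec), the selected semantics value and the service URL
below are all hypothetical.

```python
from urllib.parse import urlencode


def datalink_join_query(catalog_table, links_table, id_column,
                        semantics, ra_deg, dec_deg, radius_deg):
    """Build an ADQL query joining a source catalog to its DataLink table."""
    return (
        f"SELECT c.{id_column}, l.access_url, l.semantics, l.content_type "
        f"FROM {catalog_table} AS c "
        f"JOIN {links_table} AS l ON l.ID = c.{id_column} "
        f"WHERE l.semantics = '{semantics}' "
        f"AND 1 = CONTAINS(POINT('ICRS', c.ra, c.dec), "
        f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg}))"
    )


def tap_sync_url(tap_base_url, adql):
    """Wrap an ADQL query in a synchronous TAP request URL."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return f"{tap_base_url}/sync?{urlencode(params)}"


# Hypothetical table names and service URL.
adql = datalink_join_query(
    catalog_table="csc.master_source",
    links_table="csc.datalinks",
    id_column="source_id",
    semantics="#auxiliary",
    ra_deg=0.6405,
    dec_deg=-8.2216,
    radius_deg=0.5,
)
url = tap_sync_url("https://example.org/tap", adql)
print(adql)
print(url)
```

One query then returns the access URLs of the selected products for
every matching source, without a separate DataLink call per record.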

Best regards
François
>
> With regard to your specific question regarding RMFs.  I don’t know 
> that users will download RMFs without either concurrently or 
> previously downloading the PHA.  Occasionally users will search for 
> RMFs (and ARFs) separately from PHA spectra because they have 
> previously retrieved the latter.  On the other hand, we do for example 
> see end users downloading PSFs independently from the primary 
> datasets.  This is likely because the catalog includes a vast set of 
> high quality PSFs (of order 10M) covering the entire Chandra field of 
> view and PSFs are rather expensive to generate.
>
> We have specifically tried very hard to focus on data discovery in the 
> proposed ObsCore extensions note, and have used actual experience - 
> how do we see our users wanting to work - to help guide our proposals.
>
> Thanks,
> —Ian
>
>> On Mar 20, 2026, at 11:26, Jaffe, Tess (GSFC-6601) via heig 
>> <heig at ivoa.net> wrote:
>>
>> Hi everybody,
>> I agree with Francois on a number of things, but especially that 
>> there is a lot of misunderstanding and misrepresentation going on 
>> here. *Nobody* has ever expressed reluctance to ensure that 
>> HEA-specific ancillary products such as responses etc. are made 
>> available easily through VO protocols.  Let’s focus on what the issue 
>> actually is, because I think the discussion has lost sight of it.
>> In my opinion, the main issue is not whether things like response 
>> matrices are science data, are needed by the users, or should be in 
>> the VO.  I think we all agree that this is obvious.  The question is 
>> what is the best method for making them accessible /in the needed 
>> context/ and how far we need to customize what goes in the ObsCore 
>> table itself for different fields.  That then is a question about 
>> discoverability and complexity.
>> Having an individual row in an ObsCore table enables a user to *search 
>> for* that one specific thing.  The best practice recommendation for 
>> the use of ObsCore is that the access_url be a datalink.  So for a 
>> given product listed in an ObsCore table, three queries are needed:  
>> one to find the product, one to get its datalinks, and then one to 
>> download the file(s).  I cannot recall having heard of a use case 
>> where somebody was interested in finding only the RMFs from a given 
>> instrument in a given year.  (Please let me know if you have a use 
>> case for this so that we can address it directly. I can think of 
>> calibration projects, but this is an edge case that can be addressed 
>> another way.)  Users will instead want to find all of the spectra 
>> from some source/time/waveband.  That is why ObsCore has a row for 
>> such a product.  Nobody disputes that to do the scientific analysis 
>> on that spectrum requires the user to also have an RMF.  But that RMF 
>> does not need to be independently discoverable, just correctly linked 
>> to the spectrum that is of interest.
>> Francois has proposed a number of solutions to this.  ObsCore has a 
>> very reasonable amount of flexibility and specificity, and it is 
>> quite important to worry about adding unnecessary complexity and 
>> size. (I myself was worried about the additional complexity of the 
>> datalink layer, but now in implementation, I’m becoming a fan.) The 
>> radio extension doc you may note proposes a number of fields that 
>> are /all about discovery/.  It then states, “Auxiliary datasets such 
>> as uv distribution map, dirty beam maps, frequency/amplitude plots, 
>> phase/amplitude plots are useful for astronomers to check data 
>> quality. In that case DataLink … may provide a solution to attach 
>> these auxiliary data to ObsCore records.”  That makes sense to me.
>> So I suggest we follow what the radio folks are doing.  With this in 
>> mind, I think that three of the proposed columns -- T_intervals, 
>> Obs_mode, Event_type -- are very clearly applicable to data discovery 
>> and should be added to the ObsCore table.  But some of the other 
>> proposed fields would be better added in datalinks with a 
>> HEA-specific vocabulary.  We should discuss these on a case-by-case 
>> basis after having agreed on the purpose of a row in ObsCore.
>> I hope this helps the discussion move along productively.
>>
>> Tess
>>
>
> —
> Dr. Ian Evans
> *Astrophysicist*
> *Chandra X-ray Center*
> Center for Astrophysics | Harvard & Smithsonian
> Office: (617) 496 7846 | Cell: (617) 699 5152
> 60 Garden Street | MS 81 | Cambridge, MA 02138
>
>
> -- heig mailing list heig at ivoa.net 
> http://mail.ivoa.net/mailman/listinfo/heig

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Db661jvCUgjPxJQ2.png
Type: image/png
Size: 370 bytes
Desc: not available
URL: <http://mail.ivoa.net/pipermail/heig/attachments/20260507/0ab23cca/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: z37WhtUsDYxcdt2m.png
Type: image/png
Size: 8364 bytes
Desc: not available
URL: <http://mail.ivoa.net/pipermail/heig/attachments/20260507/0ab23cca/attachment-0005.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WLRSjrzg7Q4TMrYr.png
Type: image/png
Size: 724537 bytes
Desc: not available
URL: <http://mail.ivoa.net/pipermail/heig/attachments/20260507/0ab23cca/attachment-0006.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: a1UtYHh8hV84glV0.png
Type: image/png
Size: 340626 bytes
Desc: not available
URL: <http://mail.ivoa.net/pipermail/heig/attachments/20260507/0ab23cca/attachment-0007.png>