Vocabularising dataproduct_type

alberto micol amicol.ivoa at googlemail.com
Wed Mar 25 12:28:54 CET 2020



> On 25. Mar 2020, at 11:51, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> 
> Dear Alberto,
> 
> 
>> 2.- Our measurements products are FITS binary tables of 3 subtypes: 
>> - catalog: scientific catalogue (typically all-sky) in single FITS
>> binary table (26 such catalogs)
> 
> That, I think, is what the original catalog -> measurements renaming
> strived to prevent.
> 
> I don't feel strongly about having them in obscore, but right now no
> recommended discovery pattern for this kind of thing will find them
> (they'll show up in GloTS and hence TOPCAT, ok, but that's
> non-standard).
> 
> To fit well into the VO, it would be great if these catalogues got
> proper registry records as well so TAP and (as applicable) SCS
> clients will find them, and they're present in the VO with reasonably
> complete metadata.  I'm happy to help if you're not sure how to go
> about that.

I understand your feeling, though there are still some users interested in downloading those binary FITS catalogs in one go.
But the beauty of having them in obscore is that obscore offers an access_url pointing to our datalink service, which supports those catalogs
and can, for example, return the provenance information: for the GAIA-ESO all-sky spectroscopic catalog, the provenance lists all the thousand spectra the measurements were taken from.

That is, we use obscore + datalink to allow users both to discover datasets and to find the relations between the different products;
obscore is used to discover the catalog (e.g. a TAP query to find all catalogs in a given obs_collection), and datalink is used to provide the provenance information. In this sense it is not important whether users actually download the all-sky catalogs; the important thing is the ability to always use the same tools to access all existing metadata.
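
A minimal Python sketch of that pattern, for illustration only and not the actual service code. It builds the kind of ObsCore query mentioned above and then picks the provenance links out of a datalink response. The column names are the standard ObsCore and DataLink ones and the subtype value 'catalog' is the one used in this thread. Assumptions: that the service flags provenance links with the #progenitor term, the collection name EXAMPLE_SURVEY, and the example.org URLs are all hypothetical. The datalink rows are given as plain dicts, and nothing is sent over the network.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlencode

# Term from the DataLink core vocabulary; that a given service uses it
# for provenance links is an assumption of this sketch.
PROGENITOR = "#progenitor"


def adql_literal(value: str) -> str:
    """Quote a string for ADQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def catalogs_query(obs_collection: str) -> str:
    """Build ADQL that discovers the catalogs of one collection via ObsCore."""
    return (
        "SELECT obs_id, obs_publisher_did, access_url, access_format "
        "FROM ivoa.obscore "
        "WHERE dataproduct_type = 'measurements' "
        "AND dataproduct_subtype = 'catalog' "
        "AND obs_collection = " + adql_literal(obs_collection)
    )


def tap_sync_params(adql: str) -> str:
    """URL-encode the parameters of a synchronous TAP request."""
    return urlencode({"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql})


def progenitor_urls(links: Iterable[Dict[str, str]]) -> List[str]:
    """Return the access_url of every datalink row flagged as a progenitor."""
    return [
        row["access_url"]
        for row in links
        if row.get("semantics") == PROGENITOR and row.get("access_url")
    ]


if __name__ == "__main__":
    # Hypothetical collection name and link rows, for illustration only.
    print(tap_sync_params(catalogs_query("EXAMPLE_SURVEY")))
    example_links = [
        {"semantics": "#this", "access_url": "http://example.org/catalog.fits"},
        {"semantics": "#progenitor", "access_url": "http://example.org/spec1.fits"},
    ]
    print(progenitor_urls(example_links))
```

The same two steps apply whatever the product type is, which is the point being made here: the client code does not change between catalogs, tiles and source tables.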

Surely, it would also be good to expose those catalogs via the Registry, but I have not had time to do that yet. When I look into it I will come back to you for help, thanks!

>> - catalogtile: one FITS binary table for each of the tiles an ESO Public Survey (or other observing programmes) is partitioned into 
>>  (22,502 such catalog tiles)
> 
> Hm... what's the scenario here, i.e., why would people looking for
> such a tile run an obscore query in the first place?  Is this a
> pattern you expect other data centres to follow?

Each catalog tile can be downloaded individually, and in fact they are downloaded a lot.
They also appear in the ESO science portal, a web interface that uses datalink in the backend to gather information about the various products.
So, having all products available within the same VO layers is very useful both when developing discovery tools and when accessing the archive programmatically.


>> - srctbl: source tables derived from individual images (~370,000 such srctbl)
> 
> I think that is what the Obscore authors had in mind when they put in
> #measurements -- is that right?
> 

Check the section “A.7 Complex Use Cases” of the ObsCore standard (pages 38 and 39): therein, with the usual confusion between sources and objects, you will see that measurements covers both cases.


>> 3- I want to stress that we make a distinction between “sources” and physical “objects”
>> 
>> sources: detections on single images (single bands). It is not
>> given that a detection corresponds to a real object; it could be
>> just a spurious detection. In this sense, sources are not yet objects,
>> unless they get confirmed as “objects” by the analysis process
>> (see physical objects)
>> 
>> physical objects, e.g.:
>> - objects in catalog tiles: sources in different images (e.g. in
>> multiple wavelength bands) recognised to be detections of the same
>> object (cross-correlation implied)
>> - objects in all-sky catalogs: whereby typically the measurements
>> are derived from 1 or multiple spectra of the same object
> 
> I think this is a very interesting distinction to make, and that
> could help us to improve the definition of #measurements.  What if we
> said:
> 
>  [#measurements is] tabular data containing a list of sources, i.e.,
>  simple detections in some observation, not necessarily
>  corresponding to physical objects.  Catalogues of physical objects
>  derived from further analysis of such measurements are not covered
>  by this term.
> 
> Does that convey the original obscore intention?

No, as described above.
The concept you have in mind for measurements is closer to something I’d call “detections”…?

> This would also make #measurements a parent of #event, I think.


>> From the above you can immediately understand that I fully second
>> Laurent: A catalog can be derived from many source tables (e.g. via
>> cross-correlation of source tables in different bands).
> 
> Ye...es -- the question is: do we *want* "measurements" to mean that?
> So far, I think the obscore authors' answer would be no.  But of
> course given that that is not entirely clear from the obscore spec,
> and, more importantly, there is actual usage of the term in the wider
> sense (i.e., it's both object and source lists), I'd be open to
> change the meaning of #measurements to, perhaps, "any tabular data
> containing coordinates" (though again I wonder if that's a concept
> useful for discovery).
> 
>> 4- multi-typed catalogs: some all-sky catalogs (e.g. the PESSTO
>> multi-epoch and multi-band photometry) are actually time series of
>> SEDs, (I should probably change the subtype to SED to make it
>> discoverable), while others are simpler (e.g. NGTS) light curves,
>> i.e. time series of photometric points in one single band (that’s
>> where the bulk of the 40 billion records (31E9) come from).
> 
> I'd say these, again, should be separate services (probably SSA, in
> these cases).  I don't think you're doing anyone a service by dumping
> huge and complex FITS tables into obscore.  But that, again, is an
> orthogonal discussion that should take place in a separate thread and
> exclusively on DAL.

Again, to me the beauty is to serve all products with the same VO layer, obscore+datalink,
so that all tools and access paths (e.g. web science portal, programmatic access)
can benefit from a common infrastructure.
Segregating products by type under different interfaces is, in my opinion, of no help; it would require users to search in many different places to get a sense of all that a data provider offers.
It does help, as you stated before, to publish catalogs directly in the Registry,
but that is good as an additional thing, on top of publishing them in the common place (obscore).
Multiple advertising channels are always a good marketing strategy.

> 
> Thanks,
> 
>          Markus

Thanks to you Markus, it’s a very useful discussion,
Alberto



