[ObsCore 1.1] Recap on requirements for a new dataproduct_type value for source list / catalog

Wed Aug 24 18:55:46 CEST 2016

Dear DM users,

It seems we have an agreement to add a new data product type to the 
existing list, as proposed in section 3.3.1 of the ObsCore spec 
(http://www.ivoa.net/documents/ObsCore/20160330/PR-ObsCore-v1.1-20160330.pdf).

Here I try to summarise the different aspects of the discussions since 
the last Interop in Cape Town.

The reasons I collected are:
     - It was requested in order to expose analysis products extracted 
from an observation.
     - Existing data collections need to expose them and have use cases: 
CASDA, ESO, CADC, CTA, others?
     - This case was already considered in the Dataset Metadata DM, which 
already has a tag for it named 'catalog'.

#Rationale
The use-cases considered here assume that these catalogs are obtained 
from an observation via an analysis process.
They add value to the observational data they stem from and help to 
guide the user's choice when analysing the query response of an ObsCore 
search.
The generation links from one observation to the various analysis data 
products derived from it are made explicit in the IVOA Provenance DM, 
currently under development, but that description is highly detailed.

However, such a fine-grained description is not needed for general data 
discovery as supported in ObsCore. This can be solved by just adding 
another dataproduct_type value, which allows users to discover both the 
observation data sets and the analysis data products, typically lists of 
objects detected in the data.

ObsCore helps to clarify the footprint of such a list on the physical 
axes and provides dataproduct management information as well (origin, 
curation, etc.).

Below are various cases where it is useful to discover analysis data 
products together with an observation dataset:

- spectrum + extracted lines
- cube + extracted objects with sky position and photometry in multiple 
bands
- eventlist + detected source list
- visibility data + detected sources
- image + extracted sources
- IFU data cube with identified structures (object central position + 
average spectrum, for instance)
- etc.
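
As an illustration of the pairings above, here is a minimal Python 
sketch that groups the rows of a query response by obs_id, so that each 
observation dataset is shown next to the analysis products derived from 
it. The rows and their values are invented, 'catalog' is only a 
placeholder for whichever term is finally chosen, and the sketch assumes 
that a derived product carries the same obs_id as its parent dataset.

```python
from collections import defaultdict

# Hypothetical rows of an ObsCore query response; all values are invented.
rows = [
    {"obs_id": "obs-001", "dataproduct_type": "image"},
    {"obs_id": "obs-001", "dataproduct_type": "catalog"},
    {"obs_id": "obs-002", "dataproduct_type": "spectrum"},
    {"obs_id": "obs-002", "dataproduct_type": "catalog"},
    {"obs_id": "obs-003", "dataproduct_type": "cube"},
]


def group_by_observation(rows, analysis_type="catalog"):
    """Split the rows of each obs_id into observations and analysis products.

    Assumes (not stated in the proposal) that an analysis product shares
    the obs_id of the dataset it was derived from.
    """
    grouped = defaultdict(lambda: {"observations": [], "analysis": []})
    for row in rows:
        if row["dataproduct_type"] == analysis_type:
            grouped[row["obs_id"]]["analysis"].append(row)
        else:
            grouped[row["obs_id"]]["observations"].append(row)
    return dict(grouped)


grouped = group_by_observation(rows)
for obs_id, products in grouped.items():
    print(obs_id, len(products["observations"]), len(products["analysis"]))
```

A client could use such a grouping to fold the analysis products under 
their observation, which limits the extra selection steps for the user.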

#Discussion on the possible dataproduct type name
The proposed options for this name were:

*source table*
- pros:
  * fits a list of sources extracted from one observation and
    distributed as a data set companion.
  * supports most 2D, 2D+lambda, 3D, etc. observation types
- cons:
  * restricted to source extraction
  * implies that an entry (row) describes one source, and columns
    describe the measurements of that source.

*source list*
- pros: can be in any format (no implication of a table structure).
- cons: restricted to source extraction

  NB: This does not cover the general case of astronomical catalogs, 
which would need a richer description of their columns in terms of 
metadata.
  The ObsCore metadata for source lists may be inherited from the 
original dataset, for axis coverage for instance. Resolution and errors 
will be defined according to the precision of the detection algorithm.
  All the axis length data model elements (s_xel, em_xel, etc.) do not 
apply and should be NULL.
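
A minimal Python sketch of this inheritance rule, for illustration only: 
the helper name, the choice of inherited fields and the parent values are 
assumptions, and the axis length names (s_xel1, s_xel2, em_xel, t_xel, 
pol_xel) follow my reading of the ObsCore 1.1 draft and should be checked 
against the spec. Resolution and error fields are deliberately not 
copied, since they depend on the detection algorithm.

```python
# Axis length elements: they describe sampling along each axis and do not
# apply to a list of extracted sources (names to be checked against the spec).
AXIS_LENGTH_FIELDS = ("s_xel1", "s_xel2", "em_xel", "t_xel", "pol_xel")

# Coverage fields copied unchanged from the parent dataset (illustrative choice).
INHERITED_FIELDS = ("obs_id", "s_region", "em_min", "em_max",
                    "t_min", "t_max", "pol_states")


def derive_source_list_record(parent, subtype, type_name="source list"):
    """Build the metadata record of a source list from its parent dataset."""
    record = {name: parent.get(name) for name in INHERITED_FIELDS}
    record["dataproduct_type"] = type_name  # placeholder term, still under discussion
    record["dataproduct_subtype"] = subtype
    for name in AXIS_LENGTH_FIELDS:
        record[name] = None  # stored as NULL in the ObsCore table
    return record


# Hypothetical parent image; every value here is invented.
parent_image = {
    "obs_id": "obs-001",
    "dataproduct_type": "image",
    "s_region": "CIRCLE ICRS 10.0 -30.0 0.5",
    "em_min": 5.0e-7,
    "em_max": 6.0e-7,
    "t_min": 57600.0,
    "t_max": 57600.1,
    "pol_states": "/I/",
    "s_xel1": 2048,
    "s_xel2": 2048,
}

record = derive_source_list_record(parent_image, "Detected sources")
print(record)
```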

*catalog*
- pros: this term is more generic: any entry (row) has properties 
described and quantified in various terms (columns).
- cons: this term is widely used in astronomy and may represent complex 
information structures, such as 3XMM, SDSS, and other compiled survey 
catalogs, which link a source together with other dataproducts for the 
same entry.
  The catalog content is then very much project dependent.

*table*
- pros: applies to the various analysis data products listed above.
- cons:
  * overlaps with other dataproduct types like 'eventlist'
  * too general: no definition of the content

#Restriction of the scope of such analysis data products
NB: The scope of ObsCore is restricted to the context of an observation 
and its results.
The DM will not cover all-sky catalogs, for instance those available at 
the SDSS, ESO, HEASARC, CDS data centers, etc.
These need specific column metadata, as described in VizieR Readme files 
for instance.
The characterisation of physical axes proposed in ObsCore does not apply 
efficiently to these all-sky catalogs.

This new dataproduct_type may cover:
- source detections obtained from a single observation or from multiple 
multi-wavelength observations
- simple tile source list
- cross-referenced detections observed on one specific dataset.

This data product type does not cover:
- observation logs
- compiled catalogs (e.g. 3XMM) with several dataproducts attached to 
the same source object (image, spectrum, SED, light curve, thumbnail 
images, finding charts, etc.)
- calibration files

#Using dataproduct_subtype to disentangle various cases:
This field contains free text that helps to specify the dataproduct 
peculiarities. It is not a standardized vocabulary, but it helps to 
clarify the content.

Examples (with the assumption that we choose the term 'catalog' for 
this new value of dataproduct_type):
- List of detected sources in an IRIS image after SExtractor:
dataproduct_type='catalog'
dataproduct_subtype='Detected sources'
description='Sources extracted with SExtractor, connecting more than 10 
pixels at 3 sigma'
s_region = from image
em_min, em_max = from image
t_min, t_max = from image
o_ucd = phot.mag
pol_states = from image
...

- List of labelled emission / absorption lines in a spectrum
dataproduct_type='catalog'
dataproduct_subtype='list of identified emission lines'
description='spectral identification of emission lines at snr > 1'
s_region = from spectrum
em_min, em_max = from spectrum
t_min, t_max = from spectrum
o_ucd = spect.line.intensity
pol_states = from spectrum

- List of observations used for building an SED
This overlaps with a provenance use case, where we want to describe the 
progenitor observations used to build this SED.
It may be seen as a catalog of observations.
In this case, the s_, em_, t_, pol, etc. quantities no longer apply to 
the description of the dataproduct content.
The same holds for a list of observations combined in a mosaic.
The same holds for an observation log.
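
To illustrate how the free-text subtype could disentangle the first two 
examples above, here is a small Python sketch. The function name and the 
rows are hypothetical, and 'catalog' is again only the assumed term. 
Because dataproduct_subtype is not a standardized vocabulary, the match 
is a case-insensitive substring test and remains best effort.

```python
def select_by_subtype(rows, keyword, type_name="catalog"):
    """Keep rows of the given type whose free-text subtype mentions keyword."""
    keyword = keyword.lower()
    return [
        row for row in rows
        if row.get("dataproduct_type") == type_name
        # dataproduct_subtype may be NULL (None), hence the fallback to "".
        and keyword in (row.get("dataproduct_subtype") or "").lower()
    ]


# Hypothetical rows modelled on the examples in this message.
rows = [
    {"dataproduct_type": "catalog",
     "dataproduct_subtype": "Detected sources"},
    {"dataproduct_type": "catalog",
     "dataproduct_subtype": "list of identified emission lines"},
    {"dataproduct_type": "image", "dataproduct_subtype": None},
]

line_lists = select_by_subtype(rows, "emission lines")
source_lists = select_by_subtype(rows, "DETECTED")
print(len(line_lists), len(source_lists))
```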

So the idea of bringing together analysis data products in the result 
lists of an ObsCore query could be worked out this way.
Still, we need to define the limit of what would be covered as a 
complementary analysis data product and what would not.

Three questions for you:
- Which of these terms seems preferable to you?
- As a data provider, are there other results you would like to expose 
together with your distributed observations?
- As a user, how would you estimate the benefits of discovering results 
and their original observations at the same time? (More results in the 
query response also mean more selection steps for the user.)

Thanks for considering these questions, and for your feedback,
Best regards,
Mireille

--
Mireille Louys
CDS                                             Laboratoire Icube
Observatoire de Strasbourg        Telecom Physique Strasbourg
11 rue de l'Université                  300, Bd Sebastien Brandt CS 10413
F- 67000-STRASBOURG          F- 67412 ILLKIRCH Cedex
tel: +33 3 68 85 24 34