[ObsCore 1.1] Recap on requirements for a new dataproduct_type value for source list / catalog
Mireille Louys
mireille.louys at unistra.fr
Wed Aug 24 18:55:46 CEST 2016
Dear DM users,
It seems we have an agreement to add a new data product type in the
existing list, as proposed in the Obscore spec
(http://www.ivoa.net/documents/ObsCore/20160330/PR-ObsCore-v1.1-20160330.pdf)
at section 3.3.1.
I will try to summarise here the different aspects of the discussions since
the last Interop in Cape Town.
The reasons I collected are:
- there have been requests to expose analysis products extracted from an
observation
- existing data collections need to expose them and have use cases:
CASDA, ESO, CADC, CTA, others?
- this case was already considered in the Dataset Metadata DM, which
already has a tag for it named 'catalog'
#Rationale
The use-cases considered here assume that these catalogs are obtained
from an observation via an analysis process.
They represent an added value to the observational data they
stem from and help guide the user's choice when
analysing the query response of an Obscore search.
The generation link from one observation to the various analysis data
products derived from it is made explicit in the IVOA
Provenance DM, currently under development, but at a highly detailed level.
Such a fine-grained description is not needed for general data
discovery as supported in ObsCore, and this can be solved by just adding
another dataproduct_type value which allows users to discover both the
observation data sets and the analysis data products, typically lists of
objects detected in the data.
Obscore helps to clarify the footprint of such a list on physical axes and
provides dataproduct management information as well (origin, curation, etc.).
Here below are various cases where analysis data products are useful to
discover together with an observation dataset:
- spectrum + extracted lines
- cube + extracted objects with sky position and photometry in multiple
bands
- eventlist and detected source list
- visibility data + detected sources
- image + extracted sources
- IFU data cube with identified structures (object central position +
average spectrum, for instance)
- etc.
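To illustrate what "discovering together" could look like on the client side, here is a minimal Python sketch that groups the rows of a query response by obs_id, so that each observation is listed with its analysis products. The rows, the identifiers and the type name 'catalog' are purely illustrative, and the sketch assumes that an analysis product carries the same obs_id as the observation it was derived from, which has not been agreed on here.

```python
from collections import defaultdict

# Hypothetical rows of an ObsCore query response (illustrative values only).
rows = [
    {"obs_id": "obs-001", "dataproduct_type": "image",
     "obs_publisher_did": "ivo://example/img1"},
    {"obs_id": "obs-001", "dataproduct_type": "catalog",
     "obs_publisher_did": "ivo://example/img1-sources"},
    {"obs_id": "obs-002", "dataproduct_type": "spectrum",
     "obs_publisher_did": "ivo://example/sp2"},
]

# Assumed name of the new dataproduct_type value; the term is still under discussion.
ANALYSIS_TYPE = "catalog"


def group_by_observation(rows, analysis_type=ANALYSIS_TYPE):
    """Split the products of each obs_id into observations and analysis products."""
    grouped = defaultdict(lambda: {"observations": [], "analysis": []})
    for row in rows:
        key = "analysis" if row["dataproduct_type"] == analysis_type else "observations"
        grouped[row["obs_id"]][key].append(row["obs_publisher_did"])
    return dict(grouped)


grouped = group_by_observation(rows)
for obs_id, products in grouped.items():
    print(obs_id, products)
```

Grouping on obs_id is only one possible choice; a provider could also relate the products through provenance information.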
#Discussion on the possible dataproduct type name
The proposed options for this name were:
*source table*
- pros:
* fits a list of sources extracted from one observation
and distributed as a data set companion.
* supports most 2D, 2D+lambda, 3D, etc observation types
- cons:
* restricted to source extraction
* implies an entry (row) describes one source, and columns describe
the measures of that source.
*source list*
- pros: can be in any format (no implication of a table structure).
- cons: restricted to source extraction
NB: This does not cover the general case of astronomical catalogs
which would need a richer description of their columns in terms of
metadata.
The Obscore metadata for source lists may be inherited from the
original dataset for axis coverage, for instance. Resolution and errors
will be defined according to the detection algorithm precision.
All the axis length data model elements (s_xel, em_xel, etc.) do not
apply and should be NULL.
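As a rough illustration of this inheritance rule, the Python sketch below derives the record of a source list from the record of its parent image. The function name, the choice of inherited fields and the parent values are assumptions made for the example, not part of the proposal. The axis length fields are written out as s_xel1, s_xel2, em_xel, t_xel and pol_xel, following ObsCore 1.1, and 'source list' is only a placeholder for the type name.

```python
# Axis length fields as named in ObsCore 1.1; they do not apply to a source list.
AXIS_LENGTH_FIELDS = ("s_xel1", "s_xel2", "em_xel", "t_xel", "pol_xel")

# Coverage fields copied from the parent dataset (an assumed selection).
INHERITED_FIELDS = ("obs_id", "s_ra", "s_dec", "s_fov", "s_region",
                    "em_min", "em_max", "t_min", "t_max", "pol_states")


def derive_source_list_record(parent, type_name="source list"):
    """Build the metadata of a source list from its parent observation record."""
    record = {name: parent.get(name) for name in INHERITED_FIELDS}
    record["dataproduct_type"] = type_name  # placeholder, term not yet decided
    for name in AXIS_LENGTH_FIELDS:
        record[name] = None  # NULL: axis lengths are meaningless for a list
    # Resolution and error fields are deliberately left out: they depend on
    # the precision of the detection algorithm, not on the parent image.
    return record


# Hypothetical parent image record (illustrative values only).
parent = {
    "obs_id": "obs-001", "dataproduct_type": "image",
    "s_ra": 83.63, "s_dec": 22.01, "s_fov": 0.5,
    "s_region": "CIRCLE ICRS 83.63 22.01 0.25",
    "em_min": 4.0e-7, "em_max": 7.0e-7,
    "t_min": 57600.0, "t_max": 57600.1, "pol_states": None,
    "s_xel1": 2048, "s_xel2": 2048, "em_xel": 1, "t_xel": 1, "pol_xel": 0,
}

derived = derive_source_list_record(parent)
```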
*catalog*
- pros: this term is more generic: any entry (row) has properties
described and quantified in various terms (columns).
- cons: this term is widely used in astronomy, and may represent
complex information structures, such as 3XMM, SDSS, and other compiled survey
catalogs, which link a source to other data products for the same
entry.
The catalog content is then very much project dependent.
*table*
- pros: applies to the various analysis data products listed above.
- cons:
* overlaps with other dataproduct types like 'eventlist'
* too general: no definition of the content
#Restriction of the scope of such analysis data products
NB: The scope of Obscore is restricted to the context of an observation
and its results.
The DM will not cover all-sky catalogs, for instance, as available at the
SDSS, ESO, HEASARC, CDS data centers, etc.
These need specific column metadata, as described in VizieR ReadMe files
for instance.
The characterisation of physical axes proposed in Obscore does not apply
efficiently to these all-sky catalogs.
This new dataproduct_type may cover:
- source detections obtained from a single observation or from multiple
multi-wavelength observations
- simple tile source lists
- cross-referenced detections observed on one specific dataset.
This data product type does not cover:
- observation logs
- compiled catalogs (e.g. 3XMM) with several data products attached to
the same source (image, spectrum, SED, light curve, thumbnail
images, finding charts, etc.)
- calibration files
#Using dataproduct_subtype to disentangle various cases:
This field contains free text that helps to specify the peculiarities
of the data product. It is not a standardized vocabulary, but it helps to clarify
the content.
Examples (with the assumption that we choose the term 'catalog' for
this new value of dataproduct_type):
- List of detected sources in an IRIS image after running SExtractor:
dataproduct_type='catalog'
dataproduct_subtype='Detected sources'
description='Extracted sources using SExtractor connecting more than 10
pixels at 3 sigma'
s_region = from image
em_min, em_max = from image
t_min, t_max = from image
o_ucd = phot.mag
pol_states = from image
...
- List of labelled emission / absorption lines in a spectrum:
dataproduct_type='catalog'
dataproduct_subtype='list of identified emission lines'
description='spectral identification of emission lines at snr > 1'
s_region = from spectrum
em_min, em_max = from spectrum
t_min, t_max = from spectrum
o_ucd = spect.line.intensity
pol_states = from spectrum
- List of observations used for building an SED
This overlaps with a provenance use-case, where we want to describe the
progenitor observations used to build this SED.
It may be seen as a catalog of observations.
In this case, the s_, em_, t_, pol_, etc. quantities no longer apply to
the description of the dataproduct content.
The same applies to a list of observations combined in a mosaic,
and to an observation log.
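Since dataproduct_subtype is free text, a client can only match it loosely. The short Python sketch below shows one way to select the line lists among the records of the first two examples; the helper name and the image row are hypothetical, and the value 'catalog' is again only the assumed term.

```python
def matches_subtype(row, keyword, type_name="catalog"):
    """Case-insensitive substring match on the free-text dataproduct_subtype."""
    subtype = row.get("dataproduct_subtype") or ""  # the subtype may be NULL
    return (row.get("dataproduct_type") == type_name
            and keyword.lower() in subtype.lower())


# Records mirroring the two examples above, plus a hypothetical image row.
rows = [
    {"dataproduct_type": "catalog",
     "dataproduct_subtype": "Detected sources", "o_ucd": "phot.mag"},
    {"dataproduct_type": "catalog",
     "dataproduct_subtype": "list of identified emission lines",
     "o_ucd": "spect.line.intensity"},
    {"dataproduct_type": "image", "dataproduct_subtype": None, "o_ucd": "phot.flux"},
]

line_lists = [row for row in rows if matches_subtype(row, "emission lines")]
source_lists = [row for row in rows if matches_subtype(row, "detected sources")]
```

Such a match depends entirely on the wording chosen by each provider, which is the price of a non-standardized vocabulary.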
So the idea of bringing together analysis data products in the result
lists of an Obscore query could be worked out this way.
Still, we need to define the limit of what would be covered as a
complementary analysis data product and what would not.
Three questions for you:
- Which of these terms seems preferable to you?
- As a data provider, are there other results you would like to expose
together with your distributed observations?
- As a user, how would you estimate the benefits of discovering, at the
same time, results and their original observations? (More results in
the query response also mean more selection steps for the user.)
Thanks for considering these questions, and for your feedback,
Best regards,
Mireille
--
Mireille Louys
CDS
Observatoire de Strasbourg
11 rue de l'Université
F-67000 STRASBOURG

Laboratoire Icube
Telecom Physique Strasbourg
300, Bd Sebastien Brandt CS 10413
F-67412 ILLKIRCH Cedex
tel: +33 3 68 85 24 34