<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><font face="Times New Roman, Times, serif">Dear DM users, <br>
</font></p>
<p><font face="Times New Roman, Times, serif">It seems we have
agreed to add a new data product type to the existing list, as
proposed in section 3.3.1 of the ObsCore spec (<a
href="http://www.ivoa.net/documents/ObsCore/20160330/PR-ObsCore-v1.1-20160330.pdf">http://www.ivoa.net/documents/ObsCore/20160330/PR-ObsCore-v1.1-20160330.pdf</a>).<br>
</font></p>
<p><font face="Times New Roman, Times, serif">Here I try to
summarise the different aspects of the discussion since the
last Interop in Cape Town.<br>
</font></p>
<p><font face="Times New Roman, Times, serif">The reasons I
collected are:<br>
- There have been requests to expose <i>analysis products</i>
extracted from an observation.<br>
- Existing data collections need to expose them and have use
cases: CASDA, ESO, CADC, CTA, others?<br>
- This case was already considered in the Dataset Metadata DM,
which already has a tag for it named 'catalog'.</font></p>
<p><font face="Times New Roman, Times, serif">#Rationale<br>
The use cases considered here assume that these catalogs are
obtained from an observation via an analysis process. <br>
They add value to the observational data they stem from and
help guide the user's choice when analysing the response to an
ObsCore query.<br>
The generation links from one observation to the various analysis
data products derived from it are made explicit in the IVOA
Provenance DM, currently under development, but that description
is highly detailed. <br>
</font></p>
<p><font face="Times New Roman, Times, serif">However, such a
fine-grained description is not needed for general data discovery
as supported in ObsCore. The need can be met simply by adding
another dataproduct_type value, which allows users to discover
both the observation data sets and the analysis data products,
typically lists of objects detected in the data.<br>
</font></p>
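<p><font face="Times New Roman, Times, serif">As a minimal sketch of
the discovery case above, the Python snippet below builds an ADQL
query string that would return observation data sets and analysis
data products in one response. The value 'catalog' is only one of
the candidate names discussed in this message, the helper
build_discovery_query is hypothetical, and the table name
ivoa.obscore assumes a standard ObsTAP service.</font></p>
<pre>
```python
def build_discovery_query(product_types, obs_collection=None):
    """Return an ADQL query string selecting the given dataproduct_type values.

    Hypothetical helper for illustration only; it sends nothing to a service.
    """
    if not product_types:
        raise ValueError("at least one dataproduct_type is required")
    # Quote each value as an ADQL string literal (single quotes are doubled).
    quoted = ", ".join("'" + t.replace("'", "''") + "'" for t in product_types)
    query = (
        "SELECT obs_id, dataproduct_type, dataproduct_subtype, access_url "
        "FROM ivoa.obscore "
        "WHERE dataproduct_type IN (" + quoted + ")"
    )
    if obs_collection is not None:
        # Optionally restrict the search to one data collection.
        query += " AND obs_collection = '" + obs_collection.replace("'", "''") + "'"
    return query


# 'catalog' is an assumed value: the final name is still under discussion.
query = build_discovery_query(["image", "catalog"])
print(query)
```
</pre>
<p><font face="Times New Roman, Times, serif">A client could then
separate the rows of the response by dataproduct_type, which is the
extra selection step raised in the questions at the end of this
message.</font></p>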
<p><font face="Times New Roman, Times, serif">ObsCore helps to
clarify the footprint of such a list on the physical axes and
also provides data product management information (origin,
curation, etc.).</font></p>
<p><font face="Times New Roman, Times, serif">Below are various
cases where it is useful to discover analysis data products
together with an observation dataset:<br>
</font></p>
<font face="Times New Roman, Times, serif">- spectrum + extracted
lines</font><br>
<font face="Times New Roman, Times, serif">- cube + extracted
objects with sky position and photometry in multiple bands </font><br>
<font face="Times New Roman, Times, serif">- eventlist + detected
source list</font><br>
<font face="Times New Roman, Times, serif">- visibility data +
detected sources </font><br>
<font face="Times New Roman, Times, serif">- image + extracted
sources</font><br>
<font face="Times New Roman, Times, serif">- IFU data cube with
identified structures (object central position + average spectrum,
for instance)</font><br>
- etc.<br>
<font face="Times New Roman, Times, serif"><br>
#Discussion on the possible dataproduct type name <br>
The proposed options for this name were: <br>
<br>
*source table*<br>
- pros: <br>
</font>
<ul>
<li><font face="Times New Roman, Times, serif">fits a list of
sources extracted from one observation and distributed as a
companion to the data set.</font></li>
<li><font face="Times New Roman, Times, serif"> supports most
observation types (2D, 2D+lambda, 3D, etc.) </font><br>
</li>
</ul>
<font face="Times New Roman, Times, serif"> - cons: <br>
</font>
<ul>
<li><font face="Times New Roman, Times, serif"> restricted to
source extraction</font></li>
<li><font face="Times New Roman, Times, serif"> implies that each
entry (row) describes one source and that the columns describe
the measurements of that source. </font></li>
</ul>
<font face="Times New Roman, Times, serif"> *source list* <br>
- pros: can be in any format (no implication of a table
structure).</font><br>
<font face="Times New Roman, Times, serif"><font face="Times New
Roman, Times, serif">- cons: </font></font><font face="Times
New Roman, Times, serif"><font face="Times New Roman, Times,
serif"><font face="Times New Roman, Times, serif">restricted to
source extraction<br>
</font></font> <br>
NB: This does not cover the general case of astronomical catalogs,
which would need a richer description of their columns in terms of
metadata. <br>
The ObsCore metadata for source lists may be inherited from the
original dataset, for axis coverage for instance. Resolution and
errors will be defined according to the precision of the detection
algorithm. <br>
None of the axis-length data model elements (s_xel, em_xel, etc.)
apply; they should be NULL.<br>
<br>
*catalog* <br>
- pros: this term is more generic: each entry (row) has properties
described and quantified in various terms (columns).<br>
- cons: this term is widely used in astronomy and may denote
complex information structures, such as 3XMM, SDSS, and other
compiled survey catalogs, which link a source to other data
products for the same entry. <br>
The content of such a catalog is then very much project dependent.<br>
<br>
*table* <br>
- pros: applies to the various analysis data products listed above.
<br>
- cons: <br>
</font>
<ul>
<li><font face="Times New Roman, Times, serif">overlaps with other
dataproduct types like 'eventlist' </font></li>
<li><font face="Times New Roman, Times, serif">too general: no
definition of the content <br>
</font></li>
</ul>
<font face="Times New Roman, Times, serif">#Restriction of the scope
of such analysis data products<br>
NB: The scope of ObsCore is restricted to the context of an
observation and its results. <br>
The DM will not cover all-sky catalogs, for instance those available
at the SDSS, ESO, HEASARC, and CDS data centers.<br>
These need specific column metadata, as described in VizieR ReadMe
files for instance.<br>
The characterisation of physical axes proposed in ObsCore does not
apply efficiently to these all-sky catalogs.<br>
<br>
This new dataproduct_type may cover: <br>
- source detections obtained from a single observation or from
multiple multi-wavelength observations<br>
- a simple tile source list<br>
- cross-referenced detections observed on one specific dataset.<br>
<br>
This data product type does not cover: <br>
- observation logs <br>
- compiled catalogs (e.g. 3XMM) with several data products attached
to the same source object (image, spectrum, SED, light curve,
thumbnail images, finding charts, etc.) <br>
- calibration files <br>
<br>
#Using dataproduct_subtype to disentangle various cases:<br>
This field contains free text that helps to specify the
peculiarities of the data product. It is not a standardized
vocabulary, but it helps to clarify the content.<br>
<br>
Examples (assuming that we choose the term 'catalog' for this new
value of dataproduct_type):<br>
- List of sources detected in an IRIS image with SExtractor: <br>
dataproduct_type='catalog'<br>
dataproduct_subtype='Detected sources'<br>
description='Sources extracted using SExtractor, connecting more
than 10 pixels at 3 sigma' <br>
s_region = from image <br>
em_min, em_max = from image <br>
t_min, t_max = from image <br>
o_ucd = phot.mag<br>
pol_states = from image <br>
...<br>
<br>
- List of labelled emission / absorption lines in a spectrum <br>
dataproduct_type='catalog'<br>
dataproduct_subtype='list of identified emission lines'<br>
description='spectral identification of emission lines at SNR
&gt; 1' <br>
s_region = from spectrum <br>
em_min, em_max = from spectrum <br>
t_min, t_max = from spectrum <br>
o_ucd = spect.line.intensity<br>
pol_states = from spectrum <br>
<br>
- List of observations used for building an SED <br>
This overlaps with a provenance use case, where we want to
describe the progenitor observations used to build this SED.<br>
It may be seen as a catalog of observations.<br>
In this case, the s_, em_, t_, pol_, etc. quantities no longer
apply to the description of the data product content.<br>
The same holds for a list of observations combined in a mosaic.<br>
The same holds for an observation log.<br>
<br>
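To illustrate the first two examples, here is a minimal Python
sketch of how a provider might derive the record of a source list
from the record of its parent observation: coverage fields are
inherited and axis-length fields are set to NULL (None). The helper
derive_analysis_record, the value 'catalog', and all values in
parent_image are assumptions made up for this illustration, and
the axis-length names are spelled as in the ObsCore 1.1 column
list.<br>
<pre>
```python
# Axis-length fields, which do not apply to a list of detections.
AXIS_LENGTH_FIELDS = ("s_xel1", "s_xel2", "em_xel", "t_xel", "pol_xel")
# Coverage fields copied unchanged from the parent observation.
INHERITED_FIELDS = ("s_region", "em_min", "em_max", "t_min", "t_max", "pol_states")


def derive_analysis_record(parent, subtype, description, o_ucd, type_name="catalog"):
    """Build the record of an analysis product from its parent record.

    Hypothetical helper: 'catalog' is only one of the candidate names.
    """
    # Inherit the axis coverage from the parent dataset.
    record = {name: parent.get(name) for name in INHERITED_FIELDS}
    # Axis lengths are not meaningful for a source list, so they are NULL.
    for name in AXIS_LENGTH_FIELDS:
        record[name] = None
    record["dataproduct_type"] = type_name
    record["dataproduct_subtype"] = subtype
    record["description"] = description
    record["o_ucd"] = o_ucd
    return record


# Invented parent image record, for illustration only.
parent_image = {
    "dataproduct_type": "image",
    "s_region": "CIRCLE ICRS 10.0 -30.0 0.5",
    "em_min": 4.0e-7,
    "em_max": 7.0e-7,
    "t_min": 57000.0,
    "t_max": 57000.5,
    "pol_states": "/I/",
    "s_xel1": 1024,
    "s_xel2": 1024,
}

source_list = derive_analysis_record(
    parent_image,
    subtype="Detected sources",
    description="Sources extracted using SExtractor",
    o_ucd="phot.mag",
)
print(source_list)
```
</pre>
The same helper would serve the emission-line example by passing
the spectrum record as parent and o_ucd="spect.line.intensity".<br>
<br>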
So the idea of bringing analysis data products into the result
lists of an ObsCore query could be worked out this way. <br>
We still need to define the limit between what would and would
not be covered as a complementary analysis data product.<br>
<br>
Three questions for you: <br>
- Which of these terms seems preferable to you?<br>
- As a data provider, are there other results you would like to
expose together with your distributed observations?<br>
- As a user, how would you estimate the benefits of discovering
results and their original observations at the same time? (More
results in the query response also mean more selection steps for
the user.) <br>
<br>
Thanks for considering these questions and for your feedback.<br>
Best regards,<br>
Mireille<br>
<br>
--<br>
Mireille Louys<br>
CDS Laboratoire Icube
<br>
Observatoire de Strasbourg Telecom Physique Strasbourg<br>
11 rue de l'Université 300, Bd Sebastien Brandt
CS 10413 <br>
F- 67000-STRASBOURG F- 67412 ILLKIRCH Cedex<br>
tel: +33 3 68 85 24 34<br>
</font>
</body>
</html>