<html>

  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <p><font face="Times New Roman, Times, serif">Dear DM users, <br>

      </font></p>

    <p><font face="Times New Roman, Times, serif">It seems we agree on
        adding a new data product type to the existing list, as proposed
        in section 3.3.1 of the Obscore spec (<a
href="http://www.ivoa.net/documents/ObsCore/20160330/PR-ObsCore-v1.1-20160330.pdf">http://www.ivoa.net/documents/ObsCore/20160330/PR-ObsCore-v1.1-20160330.pdf</a>).<br>

      </font></p>

    <p><font face="Times New Roman, Times, serif">Below I try to
        summarise the different aspects of the discussions held since
        the last interop in Cape Town.<br>

      </font></p>

    <p><font face="Times New Roman, Times, serif">The reasons I
        collected are:<br>
        - There have been requests to expose <i>analysis products</i>
        extracted from an observation.<br>
        - Existing data collections need to expose them and have use
        cases: CASDA, ESO, CADC, CTA, others?<br>
        - This case was already considered in the Dataset Metadata DM,
        which already has a tag for it named 'catalog'.</font></p>

    <p><font face="Times New Roman, Times, serif">#Rationale<br>

        The use cases considered here assume that these catalogs are
        obtained from an observation via an analysis process.<br>
        They add value to the observational data they stem from and
        help guide the user's choice when analysing the response to an
        Obscore query.<br>
        The generation links from one observation to the various
        analysis data products derived from it are made explicit in the
        IVOA Provenance DM, currently under development, but at a
        highly detailed level.<br>

      </font></p>

    <p><font face="Times New Roman, Times, serif">However, such a
        fine-grained description is not needed for general data
        discovery as supported in ObsCore. The need can be met by simply
        adding another dataproduct_type value, which allows discovering
        both the observation datasets and the analysis data products,
        typically lists of objects detected in the data.<br>

      </font></p>
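    <p><font face="Times New Roman, Times, serif">As an illustration of
        such a combined discovery, here is a minimal Python sketch that
        builds an ADQL query returning both images and their derived
        products in one response. The value 'catalog' is only one of the
        candidate names discussed in this message, not an agreed term.
        The function name, the example position and the selected columns
        are assumptions made for illustration; ivoa.obscore is the table
        name used by the Obscore spec.</font></p>

```python
def build_discovery_query(product_types, ra_deg, dec_deg, radius_deg):
    """Return an ADQL string selecting Obscore rows of the given
    dataproduct_type values around a sky position (cone search)."""
    # Quote each type value for the IN (...) list. No escaping is done,
    # so this sketch must only be used with trusted values.
    quoted = ", ".join("'{}'".format(t) for t in product_types)
    return (
        "SELECT obs_id, dataproduct_type, dataproduct_subtype, access_url "
        "FROM ivoa.obscore "
        "WHERE dataproduct_type IN ({types}) "
        "AND CONTAINS(POINT('ICRS', s_ra, s_dec), "
        "CIRCLE('ICRS', {ra}, {dec}, {r})) = 1"
    ).format(types=quoted, ra=ra_deg, dec=dec_deg, r=radius_deg)


# Hypothetical position and radius, chosen only for the example.
# 'catalog' is a candidate value for the new type, not an agreed one.
query = build_discovery_query(["image", "catalog"], 10.68, 41.27, 0.1)
print(query)
```

    <p><font face="Times New Roman, Times, serif">A client could then
        separate observations from analysis products in the response by
        looking at the dataproduct_type column of each row.</font></p>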

    <p><font face="Times New Roman, Times, serif">Obscore also helps to
        clarify the footprint of such a list on the physical axes, and
        provides data product management information (origin, curation,
        etc.).</font></p>

    <p><font face="Times New Roman, Times, serif">Below are various
        cases where it is useful to discover analysis data products
        together with an observation dataset:<br>

      </font></p>

    <font face="Times New Roman, Times, serif">- spectrum + extracted

      lines</font><br>

    <font face="Times New Roman, Times, serif">- cube + extracted

      objects with sky position  and photometry in multiple bands </font><br>

    <font face="Times New Roman, Times, serif">- eventlist + detected
      source list</font><br>

    <font face="Times New Roman, Times, serif">- visibility data +

      detected sources </font><br>

    <font face="Times New Roman, Times, serif">- image + extracted

      sources</font><br>

    <font face="Times New Roman, Times, serif">- IFU data cube with
      identified structures (object central position + average spectrum,
      for instance)</font><br>
    <font face="Times New Roman, Times, serif">- etc.</font><br>

    <font face="Times New Roman, Times, serif"><br>

      #Discussion on the possible dataproduct type name  <br>

      The proposed options for this name were: <br>

      <br>

      *source table*<br>

      - pros: <br>

    </font>

    <ul>

      <li><font face="Times New Roman, Times, serif">fits a list of
          sources extracted from one observation and distributed as a
          companion to the dataset.</font></li>

      <li><font face="Times New Roman, Times, serif">supports most
          observation types (2D, 2D+lambda, 3D, etc.)</font></li>

    </ul>

    <font face="Times New Roman, Times, serif"> - cons: <br>

    </font>

    <ul>

      <li><font face="Times New Roman, Times, serif"> restricted to

          source extraction</font></li>

      <li><font face="Times New Roman, Times, serif">implies that each
          entry (row) describes one source, and that the columns
          describe the measurements of that source.</font></li>

    </ul>

    <font face="Times New Roman, Times, serif"> *source list* <br>
      - pros: can be in any format (no implication of a table
      structure).<br>
      - cons: restricted to source extraction<br>
      <br>

       NB: This does not cover the general case of astronomical catalogs

      which would need a richer description of their columns in terms of

      metadata. <br>

       The Obscore metadata for source lists may be inherited from the

      original dataset for axis coverage, for instance. Resolution and

      errors will be defined according to the detection algorithm

      precision. <br>

      None of the axis length data model elements (s_xel, em_xel, etc.)
      apply; they should be NULL.<br>

      <br>
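      The inheritance rule above can be sketched in Python as follows.
      This is only an illustration under assumptions: the helper name
      derive_source_list_row, the parent image values and the default
      type 'catalog' are hypothetical, and the axis length columns are
      taken to be s_xel1, s_xel2, em_xel, t_xel and pol_xel as in the
      Obscore 1.1 proposed recommendation.<br>

```python
# Coverage columns copied from the parent observation (assumption: the
# source list covers the same footprint as the dataset it comes from).
INHERITED_FIELDS = ("s_region", "em_min", "em_max", "t_min", "t_max", "pol_states")

# Axis length columns do not apply to a source list and are set to NULL.
AXIS_LENGTH_FIELDS = ("s_xel1", "s_xel2", "em_xel", "t_xel", "pol_xel")


def derive_source_list_row(parent, subtype, o_ucd, product_type="catalog"):
    """Build Obscore-like metadata for a source list derived from `parent`."""
    # Inherit the axis coverage from the parent dataset.
    row = {name: parent.get(name) for name in INHERITED_FIELDS}
    # None stands for NULL in the database.
    row.update({name: None for name in AXIS_LENGTH_FIELDS})
    row["dataproduct_type"] = product_type
    row["dataproduct_subtype"] = subtype
    row["o_ucd"] = o_ucd
    return row


# Hypothetical parent image with made-up values.
parent_image = {
    "dataproduct_type": "image",
    "s_region": "CIRCLE ICRS 10.68 41.27 0.5",
    "em_min": 1.2e-05,
    "em_max": 1.0e-04,
    "t_min": 55197.0,
    "t_max": 55197.5,
    "pol_states": None,
    "s_xel1": 2048,
    "s_xel2": 2048,
}

row = derive_source_list_row(parent_image, "Detected sources", "phot.mag")
print(row["s_region"], row["s_xel1"])
```

      Resolution and error columns are left out of this sketch, since
      they depend on the precision of the detection algorithm rather
      than on the parent dataset.<br>
      <br>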

      *catalog* <br>

      - pros: this term is more generic: any entry (row) has properties
      described and quantified in various terms (columns).<br>
      - cons: this term is widely used in astronomy and may denote
      complex information structures, such as 3XMM, SDSS, and other
      compiled survey catalogs, which link a source with other data
      products for the same entry.<br>
      The catalog content is then very much project dependent.<br>

      <br>

      *table*  <br>

      - pros: applies to the various analysis data products listed
      above.<br>
      - cons: <br>

    </font>

    <ul>

      <li><font face="Times New Roman, Times, serif">overlaps with other

          dataproduct types like 'eventlist' </font></li>

      <li><font face="Times New Roman, Times, serif">too general: no

          definition of the content <br>

        </font></li>

    </ul>

    <font face="Times New Roman, Times, serif">#Restriction of the scope

      of such analysis data products<br>

      NB: The scope of Obscore is restricted to the context of an

      observation and its results. <br>

      The DM will not cover all-sky catalogs, for instance those
      available at the SDSS, ESO, HEASARC, CDS data centers, etc.<br>
      These need specific column metadata, as described in VizieR ReadMe
      files for instance.<br>
      The characterisation of physical axes proposed in Obscore does not
      apply efficiently to these all-sky catalogs.<br>

      <br>

      This new dataproduct_type may cover: <br>

      - source detections obtained from a single observation or from
      multiple multi-wavelength observations<br>
      - simple tile source list<br>
      - cross-referenced detections observed on one specific dataset.<br>

      <br>

      This data product type does not cover:<br>
      - observation logs<br>
      - compiled catalogs (e.g. 3XMM) with several data products
      attached to the same source object (image, spectrum, SED, light
      curve, thumbnail images, finding charts, etc.)<br>

      - calibration files <br>

          <br>

      #Using dataproduct_subtype to disentangle various cases:<br>

      This field contains free text that helps to specify the
      peculiarities of the data product. It is not a standardized
      vocabulary, but it helps to clarify the content.<br>

      <br>

      Examples (assuming that we choose the term 'catalog' for this new
      value of dataproduct_type):<br>
      - List of sources detected in an IRIS image with SExtractor:<br>
      dataproduct_type = 'catalog'<br>
      dataproduct_subtype = 'Detected sources'<br>
      description = 'Sources extracted using SExtractor, connecting more
      than 10 pixels at 3 sigma'<br>
      s_region = from image<br>
      em_min, em_max = from image<br>
      t_min, t_max = from image<br>
      o_ucd = phot.mag<br>
      pol_states = from image<br>

      ...<br>

      <br>

      - List of labelled emission / absorption lines in a spectrum <br>

      dataproduct_type = 'catalog'<br>
      dataproduct_subtype = 'list of identified emission lines'<br>
      description = 'spectral identification of emission lines at SNR
      &gt; 1'<br>
      s_region = from spectrum<br>
      em_min, em_max = from spectrum<br>
      t_min, t_max = from spectrum<br>
      o_ucd = spect.line.intensity<br>
      pol_states = from spectrum<br>

      <br>

      - List of observations used for building an SED <br>

      This overlaps with a provenance use-case, where we want to

      describe the progenitor observations used to build this SED.<br>

      It may be seen as a catalog of observations.<br>
      In this case, the s_, em_, t_, pol_, etc. quantities no longer
      apply to the description of the data product content.<br>
      The same holds for a list of observations combined in a mosaic.<br>
      The same holds for an observation log.<br>

      <br>

      So the idea of bringing analysis data products together in the
      result lists of an Obscore query could be worked out this way.<br>
      Still, we need to define the limit between what would be covered
      as a complementary analysis data product and what would not.<br>

      <br>

      Three questions for you:<br>
      - Which of these terms seems preferable to you?<br>
      - As a data provider, are there other results you would like to
      expose together with your distributed observations?<br>
      - As a user, how would you rate the benefit of discovering results
      and their original observations at the same time? (More results in
      the query response also mean more selection steps for the user.)<br>

      <br>

      Thanks for considering these questions, and for your feedback,<br>

      Best regards,<br>

      Mireille<br>

      <br>

      --<br>

      Mireille Louys<br>

      CDS                                             Laboratoire Icube

      <br>

      Observatoire de Strasbourg        Telecom Physique Strasbourg<br>

      11 rue de l'Université                  300, Bd Sebastien Brandt

      CS 10413         <br>

      F- 67000-STRASBOURG          F- 67412 ILLKIRCH Cedex<br>

      tel: +33 3 68 85 24 34<br>

    </font>

  </body>

</html>