<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <p>Hi all,</p>

    <p>I am trying to take this point further (it is also related to
      Markus's VEP003 email from yesterday).</p>

    <p>Two things:</p>

    <p>I) After discussing with a couple of people, I think the
      product type for these associated datasets can be set in the
      "content" parameter of the media type value of the DataLink
      content_type column, e.g.<br>
    </p>

    application/fits;content=timeseries;subtype=lightcurve<br>

    <br>

    This will probably require a new change proposal to the DataLink
    specification itself. It is not a big one; I will prepare it tomorrow.<br>

    <br>
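    <p>A minimal Python sketch of how a client might split such a
      content_type value into its base media type and its parameters,
      assuming parameter values are plain tokens (no quoted strings
      containing ";" or "="). The helper name parse_content_type is
      hypothetical and not part of any existing DataLink library.</p>

```python
from typing import Dict, Tuple


def parse_content_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split e.g. 'application/fits;content=timeseries;subtype=lightcurve'
    into ('application/fits', {'content': 'timeseries', 'subtype': 'lightcurve'}).

    Assumption (hypothetical helper): values are simple tokens; surrounding
    double quotes are stripped, but quoted ';' or '=' are not handled.
    """
    parts = [p.strip() for p in value.split(";")]
    base = parts[0].lower()  # media type names are case-insensitive
    params: Dict[str, str] = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate an empty segment such as a trailing ';'
        name, sep, val = part.partition("=")
        if sep:  # ignore malformed segments without '='
            params[name.strip().lower()] = val.strip().strip('"')
    return base, params


base, params = parse_content_type(
    "application/fits;content=timeseries;subtype=lightcurve")
print(base)                   # application/fits
print(params.get("content"))  # timeseries
print(params.get("subtype"))  # lightcurve
```

    <p>In this sketch, a client that ignores the parameters still sees the
      plain application/fits base type, so existing download behaviour is
      unaffected.</p>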

    II) For the semantics term we need to relate a dataset to a source
    in a catalog, I think the most general thing we can say is that the
    relation is a "cross-correlation". I propose the term "CrossedDataset".
    This can be the head term for #sibling, #contains, #followup, etc.<br>

    <br>
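    <p>To illustrate what "head term" could mean operationally for a
      client, here is a minimal Python sketch, assuming a hypothetical
      mapping from each narrower term to its broader term. The entries
      encode the proposal above; they are not terms from the published
      DataLink vocabulary. Pat's quoted message suggests #followup could
      instead sit under #contains, which would change only one entry.</p>

```python
from typing import Dict, List, Optional

# Hypothetical vocabulary fragment: narrower term -> broader (head) term.
# "#CrossedDataset" is the proposed head term; None marks a top-level term.
BROADER: Dict[str, Optional[str]] = {
    "#CrossedDataset": None,
    "#sibling": "#CrossedDataset",
    "#contains": "#CrossedDataset",
    "#followup": "#CrossedDataset",
}


def ancestors(term: str) -> List[str]:
    """Return the chain of broader terms above `term`, nearest first."""
    chain: List[str] = []
    parent = BROADER.get(term)
    # Stop at a top-level term, an unknown term, or a cycle.
    while parent is not None and parent not in chain:
        chain.append(parent)
        parent = BROADER.get(parent)
    return chain


def matches(term: str, wanted: str) -> bool:
    """True if `term` is `wanted` or one of its narrower terms."""
    return term == wanted or wanted in ancestors(term)


# A client asking for "#CrossedDataset" would also accept the narrower terms.
print(matches("#followup", "#CrossedDataset"))  # True
print(matches("#CrossedDataset", "#contains"))  # False
```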

    Cheers<br>

    François<br>

    <br>

    <div class="moz-cite-prefix">On 7 January 2020 at 23:04, Patrick
      Dowler wrote:<br>

    </div>

    <blockquote type="cite"

cite="mid:CAFK8nrpfMDO5M5tfDNCaG4MfFjs5to9MANLfey0FfbLEHnBRRg@mail.gmail.com">


      <div dir="ltr">

        <div class="gmail_default"><br>

        </div>

        <div class="gmail_default">First, although DataLink was

          conceived with an implicit "resource is a dataset" that leaked

          into the terminology and examples, I agree that there is no

          reason that it cannot be used for other kinds of entities.

          Using that particular word does conjure up provenance, but

          datalink and provenance are already related (#progenitor)

          conceptually.<br>

        </div>

        <div class="gmail_default"><br>

        </div>

        <div class="gmail_default">The way I am still seeing this,

          dataproduct_type (from ObsCore) says what something *is* and

          that is not a relationship per se. Aside: on the issue of

          subtype, I would prefer/like to make dataproduct_type a

          vocabulary so people could extend it rather than using a

          two-level type/subtype mechanism -- but only if we can figure

          out a sane/nice way to query vocabulary terms via TAP that

          actually works.<br>

        </div>

        <div class="gmail_default"><br>

        </div>

        <div class="gmail_default">I can think of several relationships

          from a source in a catalogue to a dataset and I still feel

          that the concept behind "Observation_Result_of_source" is

          eluding me. The relation could be:</div>

        <div class="gmail_default"><br>

        </div>

        <div class="gmail_default"> #progenitor : some/all source

          properties were measured in that dataset</div>

        <div class="gmail_default">#derivation : the dataset was created

          from the source properties<br>

        </div>

        <div class="gmail_default"><br>

        </div>

        <div class="gmail_default">other possible relationships:</div>

        <div class="gmail_default"><br>

        </div>

        <div class="gmail_default">contains : the dataset contains the

          source (seems like this is a top-level very general and vague

          statement; I would interpret this to also mean "and not

          progenitor")<br>

        </div>

        <div class="gmail_default"><br>

        </div>

        <div class="gmail_default">followup : the existence/discovery of

          the source caused a new observation to occur (child of

          contains, causal relation)</div>

        <div class="gmail_default"><br>

        </div>

        <div class="gmail_default">So, for someone with a source
          (catalogue) and a related image|spectrum|lightcurve, is that
          data one of these or is it some other concept?<br>

        </div>

        <div class="gmail_default"><br>

        </div>

        <div class="gmail_default"><br>

        </div>

        <div>

          <div dir="ltr" class="gmail_signature"

            data-smartmail="gmail_signature">

            <div dir="ltr">

              <div>

                <div dir="ltr">

                  <div>

                    <div>--<br>

                    </div>

                    <div>Patrick Dowler<br>

                    </div>

                    Canadian Astronomy Data Centre<br>

                  </div>

                  Victoria, BC, Canada<br>

                </div>

              </div>

            </div>

          </div>

        </div>

        <br>

      </div>

      <br>

      <div class="gmail_quote">

        <div dir="ltr" class="gmail_attr">On Fri, 20 Dec 2019 at 07:46,

          François Bonnarel &lt;<a

            href="mailto:francois.bonnarel@astro.unistra.fr"

            moz-do-not-send="true">francois.bonnarel@astro.unistra.fr</a>&gt;

          wrote:<br>

        </div>

        <blockquote class="gmail_quote">

          <div>


            <p>This email was sent yesterday in another thread.</p>

            <p>Following Markus's recommendation, I am now opening a new
              thread for this discussion of the "astronomical source
              observation results" use cases.</p>

            <p>Cheers</p>

            <p>François<br>

            </p>

            <p>Dear all,<br>

            </p>

            <ul>

              <li>When I proposed VEP0001 immediately after the Groningen
                Interop, I could not imagine that such a controversial
                discussion would occur. <br>

              </li>

              <ul>

                <li>Before considering our use case, I would like to
                  review the current uses of DataLink that I know of.</li>
                <li>Then turn to the "new" use case</li>

                <li>And then check some of the proposed solutions on

                  this list</li>

                <li>And then argue for my preference</li>

              </ul>

              <li>According to DataLink 1.0 <br>

              </li>

              <ul>

                <li>the semantics field contains a "Term from a

                  controlled vocabulary describing the link" as stated

                  in Table 1 and </li>

                <li>section 3.2.6 reads:</li>

                <li>"The semantics column contains a single term from an

                  external RDF vocabulary that describes the meaning of

                  this linked resource relative to the identified

                  dataset. The semantics column is intended to be

                  machine-readable and assist automating data retrieval

                  and processing."</li>

                <li>Let's call the initial thing we are starting from

                  and to which we want to link resources "Main" and the

                  various linked resources "Target".</li>

                <ul>

                  <li>Two remarks:</li>

                  <ul>

                    <li>The text in section 3.2.6, consistent with the
                      use cases described in the introduction, assumes
                      that the "Main" is a dataset.</li>
                    <li>The semantics field describes, globally, what the
                      "Target" is "with respect to the Main".</li>

                  </ul>

                  <li>More conventional is the group of columns access_url,
                    content_type and content_length, which reference and
                    describe the "Target" itself (independently of
                    the "Main").</li>
                  <li>I looked briefly at the current usage of DataLink,
                    using Aladin Desktop as a client with the following
                    three SIAP2 servers:</li>

                  <ul>

                    <li>CADC:<br>

                    </li>

                    <ul>

                      <li>In the example I found, the DataLink service
                        had "this" in semantics for retrieval of the full
                        dataset,</li>
                      <li>"cutout" for a SODA service, <br>
                      </li>
                      <li>and a couple of "auxiliary" rows for
                        additional data such as PSF images, etc.</li>
                      <li>The "cutout" row points to a service, described
                        by a service descriptor. Aladin opens a specific
                        menu in that case, while in the other cases it
                        downloads the datasets because their content_type
                        is application/fits.</li>

                    </ul>

                    <li>GAVO:<br>

                    </li>

                    <ul>

                      <li>In the example I found, the DataLink service
                        had "this" in semantics, and also "preview",
                        "proc" and "science".</li>
                      <li>"this" and "preview" are self-explanatory. <br>
                      </li>
                      <li>"proc" is actually related to a SODA service
                        (perhaps it should be "cutout"?) <br>
                      </li>
                      <li>and "science" is a new term proposed by Markus
                        to indicate that the target is related science
                        data.</li>

                    </ul>

                    <li>CASDA:<br>

                    </li>

                    <ul>

                      <li>In the example I found, the "Main" was a cube.
                        In semantics it had several "this" rows, a "cutout"
                        row and a "proc" row.</li>
                      <li>Each "this" row allowed retrieval of the
                        full dataset from a different server, sometimes in
                        synchronous mode and sometimes in asynchronous
                        mode.</li>

                      <li> The "cutout" row is related to a SODA

                        service. <br>

                      </li>

                      <li>The "proc" row links to a SODA-like service

                        extracting a single integrated spectrum from the

                        data cube.</li>

                    </ul>

                  </ul>

                  <li>This shows that semantics in DataLink is not only
                    there for selecting among rows in the {links}
                    response table; it also helps the client figure
                    out what to do with the target, in combination with
                    content_type, content_length and the service
                    descriptor (if one is defined).</li>
                  <li>This also shows that semantics terms work like a
                    flat vocabulary despite their tree presentation in
                    the RDF document.</li>

                  <ul>

                    <li>"auxiliary" is a head term for bias, dark and flat,
                      but can also be used on its own for cases that have
                      no registered narrower term.</li>

                    <li>The same holds for proc and cutout.</li>

                    <li>The tree structure of the vocabulary is actually

                      only descriptive. It's not functional at the time

                      of writing. </li>

                  </ul>

                </ul>

              </ul>

              <li>New use cases:</li>

              <ul>

                <li>Shortly after DataLink became an IVOA
                  Recommendation, some data providers were interested
                  in using the DataLink functionalities for use cases
                  where the "Main" was a source in a catalogue.</li>
                <li>This can work, of course, and proposals are
                  currently being discussed to integrate these use cases
                  within the scope of DataLink-1.1, but no suitable
                  semantics terms describing this kind of relationship
                  between the "Main" and the "Target" were available in
                  the previous vocabulary.</li>
                <li>Often the "Target" related to the source "Main" is
                  the result of an observation of the source, in
                  practice a dataset (image, spectrum, lightcurve,
                  etc.)</li>

                <ul>

                  <li>In VizieR we had a similar situation with what we
                    call "associated data" for catalogue "rows".</li>
                  <li>These "associated data" can indeed be images,
                    TimeSeries, cubes, spectra...</li>

                </ul>

                <li>Hence the VEP0001 proposal, as it was presented on
                  15 October:<br>

                </li>

                <ul>

                  <li>An associated_image is "an image of Main", which
                    is a source.</li>
                  <li>An associated_lightcurve is similarly "a light
                    curve of Main", which is a source.</li>

                </ul>

                <li>It should be highlighted that this term tells the
                  client both that the target is an image or a light
                  curve and that it is an observation result of the
                  source.</li>
                <li>The proposal to define an item in the associated
                  branch for each value of dataproduct_type, and beyond
                  that for each subtype of TimeSeries, led to the idea
                  of combining associated_data with the ObsCore
                  vocabulary.</li>
                <ul>
                  <li>It was pointed out (by Markus) that other head
                    terms such as "progenitor" or "derived" could need
                    this too, which could lead to a combinatorial
                    explosion.</li>
                </ul>
                <li>Incidentally, the term "associated_data" itself has
                  been criticized as a description of the concept of
                  the observation result of a source.</li>

              </ul>

              <li>The four-concept proposal</li>

              <ul>

                <li>Ada proposed to separate the description of the
                  links into four different concepts:</li>

                <ul>

                  <li>"4 independent levels or categories: </li>

                  <li>Level 0 - Data-format (fits, VOTable, PDF, png, …)</li>

                  <li>Level 1 - Data-type (tabular, image, spectrum,

                    cube, text, …)</li>

                  <li>Level 2 - Data-information (Documentation,

                    Calibration, Log, Preview, …)</li>

                  <li>Level 3 - Data-relation (Derived from, Progenitor

                    of, Sibling of, ...)"</li>

                </ul>

                <li>I think this amounts to a real data-modelling effort
                  for DataLink. It would obviously be a major
                  improvement in the way we link resources, but it may
                  take some time to achieve.</li>
                <li>At the moment I don't see a clear distinction
                  between level 2 and level 3, because the "information"
                  we have in the "Target" is always "relative" to a
                  "Main", so it is not that far from level 3. At the very
                  least, it may sometimes be difficult to know into which
                  "level" a given category value falls.</li>
                <li>On the other hand, for links to dynamic services I
                  am not sure to which category their characterization
                  belongs. Is that a fifth level to add? Data-type in
                  the context of DataLink should have a much wider scope
                  than the ObsCore "dataproduct_type", because there are
                  targets which are not data products: various metadata,
                  auxiliary data, texts, plots, etc. If dataproduct_type
                  is standardized, what about the other stuff?<br>
                </li>
                <li>To me, the levels proposed by Ada (and maybe a few
                  others) look more like a matrix description than a
                  flat one.<br>
                </li>
                <li>Taking all of the above into account, I think the
                  levelling of the categories could be a project for
                  DataLink 2, which would be really interesting. If we
                  want a quick solution, I think we have to consider
                  more modest options.</li>

              </ul>

              <li>Among the different proposals:</li>

              <ul>

                <li>I see two possible simple solutions to tackle the

                  use case</li>

                <ul>

                  <li>go back to a simplified version of VEP001.  </li>

                  <ul>

                    <li>Instead of reproducing the full range of ObsCore
                      "dataproduct_type" values, we define only the
                      terms we currently need and see in the future
                      whether we need more.</li>
                    <li>At the same time, I drop both the
                      "associated_data" and "sibling" head terms and
                      choose to use "Observation_Result_of_source"
                      instead.</li>
                    <li>ESO and SVO use cases: "image_of_source",
                      "Spectrum_of_source"</li>
                    <li>TimeDomain/Gaia use cases:
                      "LightCurve_Of_Source",
                      "RadialVelocityCurve_Of_Source",
                      "Movie_Of_Source", "SpectroChronogram_Of_Source"</li>

                    <ul>

                      <li>"TimeSeries_Of_Source" may be used as a head

                        term for the four above, or when we don't know

                        exactly what is varying in time.</li>

                    </ul>

                  </ul>

                  <li>adopt the proposal made by Pat Dowler: use the
                    media type in content_type to give the type or
                    product type, using the "content=" parameter</li>

                  <ul>

                    <li>application/fits;content=image</li>

                    <li>application/fits;content=spectrum</li>

                    <li> application/fits;content=lightcurve or

                      application/fits;content=timeseries;subtype=lightcurve</li>

                    <li>application/fits;content=movie or
                      application/fits;content=timeseries;subtype=movie</li>

                    <li>etc ...</li>

                  </ul>

                  <ul>

                    <li>the standard structure of media types allows us to
                      reuse the current "dataproduct_type" vocabulary
                      as a value of the content parameter and then to
                      use an additional "subtype" parameter, or
                      alternatively to use the timeseries subtype
                      directly in "content=".</li>

                    <li>a variant would be to create a new

                      dataproduct_type parameter in the media type when

                      appropriate<br>

                    </li>

                    <li>If we adopt that, semantics will simply be
                      "Observation_Result_of_source" for all of these
                      possibilities<br>

                    </li>

                  </ul>

                  <li>In the first solution, we directly introduce a
                    kind of data type into the semantics field (the
                    "meaning of the target relative to the main"), which
                    I think is fine, except that it does not explicitly
                    reuse the ObsCore dataproduct_type.</li>
                  <li>In the second solution, clients will have to parse
                    the media type to discover not only the format of
                    the target but also its content. We still have to
                    decide how to handle subtypes.<br>
                  </li>

                  <ul>

                    <li>This probably has to be explained explicitly in
                      the next version, DataLink-1.1.</li>

                  </ul>

                </ul>

                <li>What do implementers/service providers prefer?</li>

              </ul>

            </ul>
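            <p>The client behaviour described above, where semantics is
              used together with content_type and the service descriptor,
              can be sketched in Python. LinkRow covers only a subset of
              the {links} columns, the policy in client_action is a
              hypothetical simplification loosely modelled on what Aladin
              does, and the rows and URLs are made-up placeholders rather
              than real CADC, GAVO or CASDA responses.</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkRow:
    """A subset of the DataLink {links} columns discussed in this thread."""
    id: str
    semantics: str
    access_url: Optional[str] = None
    service_def: Optional[str] = None
    content_type: Optional[str] = None
    content_length: Optional[int] = None


def select(rows: List[LinkRow], semantics: str) -> List[LinkRow]:
    """Use semantics to pick rows, e.g. '#this' for the full dataset."""
    return [r for r in rows if r.semantics == semantics]


def client_action(row: LinkRow) -> str:
    """Hypothetical client policy combining service_def and content_type."""
    if row.service_def:
        return "open-service-menu"  # e.g. a SODA cutout service
    base = (row.content_type or "").split(";")[0].strip().lower()
    if base == "application/fits":
        return "download"
    return "show-link"


# Placeholder rows shaped like the CADC example above (not real responses).
rows = [
    LinkRow("ivo://example.org/obs1", "#this",
            access_url="https://example.org/obs1.fits",
            content_type="application/fits"),
    LinkRow("ivo://example.org/obs1", "#cutout", service_def="soda-sync"),
    LinkRow("ivo://example.org/obs1", "#auxiliary",
            access_url="https://example.org/obs1-psf.fits",
            content_type="application/fits"),
]

for row in rows:
    print(row.semantics, client_action(row))
# #this download
# #cutout open-service-menu
# #auxiliary download
```

            <p>In this sketch semantics only selects rows; the action
              itself comes from service_def and content_type.</p>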

            <p><br>

            </p>
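            <p>Ada's levels can be read as independent fields of one
              description (a matrix) rather than as a single flat term. A
              minimal Python sketch, with hypothetical field names and
              example values, shows why a flat vocabulary needs one term
              per combination, the kind of combinatorial explosion Markus
              pointed out.</p>

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional


@dataclass(frozen=True)
class LinkDescription:
    """Ada's four levels as independent fields (hypothetical names)."""
    data_format: str                        # Level 0, e.g. "fits"
    data_type: Optional[str] = None         # Level 1, e.g. "image"
    data_information: Optional[str] = None  # Level 2, e.g. "Preview"
    data_relation: Optional[str] = None     # Level 3, e.g. "ProgenitorOf"


def flat_term(d: LinkDescription) -> str:
    """Collapse the matrix back into one flat term (for comparison only)."""
    levels = [d.data_format, d.data_type, d.data_information, d.data_relation]
    return "_".join(v for v in levels if v)


def flat_vocabulary(formats: List[str], types: List[str],
                    relations: List[str]) -> List[str]:
    """One flat term per combination of level values."""
    return [flat_term(LinkDescription(f, t, data_relation=r))
            for f, t, r in product(formats, types, relations)]


terms = flat_vocabulary(["fits", "votable"],
                        ["image", "spectrum", "lightcurve"],
                        ["ObservationResultOf", "ProgenitorOf"])
print(len(terms))  # 12 flat terms for 2 x 3 x 2 values
print(terms[0])    # fits_image_ObservationResultOf
```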

            <p>I wish you all happy holidays for the coming days</p>

            <p>Cheers</p>

            <p>François<br>

            </p>


          </div>

        </blockquote>

      </div>

    </blockquote>

    <br>

  </body>

</html>