<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi all,</p>
<p>Trying to go further on this point (also related to Markus'
VEP003 email yesterday)</p>
<p>2 things...</p>
<p>I ) After discussing with a couple of people, I think the
product type for these associated datasets can be set in the content
parameter of the media-type value in the DataLink content_type field, e.g.<br>
</p>
application/fits;content=timeseries;subtype=lightcurve<br>
<br>
This will probably require a new change proposal for the DataLink
spec itself. It's not a big one. I will prepare it tomorrow.<br>
<br>
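<p>A client could read such a value roughly as follows. This is only a
minimal Python sketch: parse_media_type is a hypothetical helper, not
part of any DataLink library, and it does not handle quoted parameter
values that contain semicolons. The "content" and "subtype" parameter
names are the ones proposed above and are not yet in the spec.</p>
<pre>
```python
def parse_media_type(value):
    """Split a media type string into (type/subtype, {parameter: value})."""
    parts = [p.strip() for p in value.split(";")]
    base = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if not part:
            continue  # tolerate a trailing semicolon
        name, _, val = part.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return base, params


# The example value from this message.
base, params = parse_media_type(
    "application/fits;content=timeseries;subtype=lightcurve"
)
print(base)    # format of the target
print(params)  # proposed product type and subtype
```
</pre>
<p>Clients that ignore unknown media-type parameters would still see
application/fits and behave as they do today.</p>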
II ) For the semantics term we need in order to relate a dataset to a source
in a catalog, I think the most general thing we can say about it is
that it is a "cross-correlation". I propose the term "CrossedDataset".
This can be the head term for #sibling, #contains, #followup, etc...<br>
<br>
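<p>To illustrate what a head term would buy a client, here is a small
Python sketch. The PARENT table is hypothetical: only the term names come
from this thread, the "#CrossedDataset" spelling is an assumption, and
placing #followup under #contains follows Pat's suggestion in the quoted
message below rather than any published vocabulary.</p>
<pre>
```python
# Hypothetical child-to-parent links; not taken from the real RDF vocabulary.
PARENT = {
    "#sibling": "#CrossedDataset",
    "#contains": "#CrossedDataset",
    "#followup": "#contains",
}


def is_a(term, head):
    """Return True if term is head itself or one of its descendants."""
    seen = set()
    while term is not None and term not in seen:
        if term == head:
            return True
        seen.add(term)  # guard against accidental cycles in the table
        term = PARENT.get(term)
    return False


print(is_a("#followup", "#CrossedDataset"))
print(is_a("#sibling", "#contains"))
```
</pre>
<p>A client that only knows the head term could then select every row
whose semantics descends from it, without knowing each narrower term.</p>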
Cheers<br>
François<br>
<br>
<div class="moz-cite-prefix">Le 07/01/2020 à 23:04, Patrick Dowler a
écrit :<br>
</div>
<blockquote type="cite"
cite="mid:CAFK8nrpfMDO5M5tfDNCaG4MfFjs5to9MANLfey0FfbLEHnBRRg@mail.gmail.com">
<div dir="ltr">
<div class="gmail_default"><br>
</div>
<div class="gmail_default">First, although DataLink was
conceived with an implicit "resource is a dataset" that leaked
into the terminology and examples, I agree that there is no
reason that it cannot be used for other kinds of entities.
Using that particular word does conjure up provenance, but
datalink and provenance are already related (#progenitor)
conceptually.<br>
</div>
<div class="gmail_default"><br>
</div>
<div class="gmail_default">The way I am still seeing this,
dataproduct_type (from ObsCore) says what something *is* and
that is not a relationship per se. Aside: on the issue of
subtype, I would prefer/like to make dataproduct_type a
vocabulary so people could extend it rather than using a
two-level type/subtype mechanism -- but only if we can figure
out a sane/nice way to query vocabulary terms via TAP that
actually works.<br>
</div>
<div class="gmail_default"><br>
</div>
<div class="gmail_default">I can think of several relationships
from a source in a catalogue to a dataset and I still feel
that the concept behind "Observation_Result_of_source" is
eluding me. The relation could be:</div>
<div class="gmail_default"><br>
</div>
<div class="gmail_default"> #progenitor : some/all source
properties were measured in that dataset</div>
<div class="gmail_default">#derivation : the dataset was created
from the source properties<br>
</div>
<div class="gmail_default"><br>
</div>
<div class="gmail_default">other possible relationships:</div>
<div class="gmail_default"><br>
</div>
<div class="gmail_default">contains : the dataset contains the
source (seems like this is a top-level very general and vague
statement; I would interpret this to also mean "and not
progenitor")<br>
</div>
<div class="gmail_default"><br>
</div>
<div class="gmail_default">followup : the existence/discovery of
the source caused a new observation to occur (child of
contains, causal relation)</div>
<div class="gmail_default"><br>
</div>
<div class="gmail_default">So, for someone with a source
(catalogue) and a related image|spectrum|lightcurve, is that
data one of these or is it some other concept?<br>
</div>
<div class="gmail_default"><br>
</div>
<div class="gmail_default"><br>
</div>
<div>
<div dir="ltr" class="gmail_signature"
data-smartmail="gmail_signature">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div>--<br>
</div>
<div>Patrick Dowler<br>
</div>
Canadian Astronomy Data Centre<br>
</div>
Victoria, BC, Canada<br>
</div>
</div>
</div>
</div>
</div>
<br>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, 20 Dec 2019 at 07:46,
François Bonnarel &lt;<a
href="mailto:francois.bonnarel@astro.unistra.fr"
moz-do-not-send="true">francois.bonnarel@astro.unistra.fr</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote">
<div>
<p>This email was sent yesterday in another thread.</p>
<p>Following Markus' recommendation I open now a new thread
for this discussion of the "astronomical source
observation results" use cases.</p>
<p>Cheers</p>
<p>François<br>
</p>
<p>Dear all,<br>
</p>
<ul>
<li>When I proposed VEP0001 immediately after the Groningen
Interop I could not imagine that such a controversial
discussion would occur. <br>
</li>
<ul>
<li>Before considering the use case we have I would like
to go back to the current usages of DataLink I know.</li>
<li>Then go back to the "new" use case</li>
<li>And then check some of the proposed solutions on
this list</li>
<li>And then argue for my preference</li>
</ul>
<li>According to DataLink 1.0 <br>
</li>
<ul>
<li>the semantics field contains a "Term from a
controlled vocabulary describing the link" as stated
in Table 1 and </li>
<li>section 3.2.6 reads :</li>
<li>"The semantics column contains a single term from an
external RDF vocabulary that describes the meaning of
this linked resource relative to the identified
dataset. The semantics column is intended to be
machine-readable and assist automating data retrieval
and processing."</li>
<li>Let's call the initial thing we are starting from
and to which we want to link resources "Main" and the
various linked resources "Target".</li>
<ul>
<li>Two remarks :</li>
<ul>
<li>The text in section 3.2.6, consistent with the
use cases described in the introduction, considers
that the "Main" is a dataset</li>
<li>The semantics field describes globally what the
target is "with respect to the main"</li>
</ul>
<li>More classical is the group of columns access_url,
content_type and content_length, which reference and
describe the "Target" itself (independently of
the "Main")</li>
<li>Now I tried to look a little bit at the current
usage of DataLink using Aladin Desktop as a client
and the following three SIAP2 servers </li>
<ul>
<li>CADC : <br>
</li>
<ul>
<li>In the example I found, the DataLink service
had "this" in semantics for the full retrieval
of the dataset,</li>
<li> "cutout" for a SODA service <br>
</li>
<li>and a couple of "auxiliary" rows for
additional data such as PSF images, etc...</li>
<li> The "cutout" row is a service, described by a
"service descriptor". Aladin opens a specific menu
in that case, while in the other cases it downloads
the datasets because their "content_type" is
application/fits</li>
</ul>
<li>GAVO : <br>
</li>
<ul>
<li>In the example I found, the DataLink service
had "this" in semantics, and also "preview",
"proc" and "science".</li>
<li> "this" and "preview" are self-explanatory. <br>
</li>
<li>"proc" is actually related to a SODA service
(should be "cutout" maybe ?) <br>
</li>
<li>and "science" is a new term proposed by Markus
to indicate that the target is related science
data </li>
</ul>
<li>CASDA : <br>
</li>
<ul>
<li> In the example I found, "Main" was a cube.
It had in semantics several "this", a "cutout"
and a "proc".</li>
<li> Each "this" row allowed the retrieval of the
full dataset from different servers sometimes in
synchronous mode and sometimes in asynchronous
mode.</li>
<li> The "cutout" row is related to a SODA
service. <br>
</li>
<li>The "proc" row links to a SODA-like service
extracting a single integrated spectrum from the
data cube.</li>
</ul>
</ul>
<li>This shows that semantics is not only there in
DataLink for selection among rows in the {links}
response table but also helps the client to figure
out what to do with the target in combination with
content_type, content_length and the service descriptor
(if one is defined). </li>
<li>This also shows that semantics terms work like a
flat vocabulary despite their tree presentation in
the RDF document. </li>
<ul>
<li>Auxiliary is a head term for bias, dark, flat
but can also be used on its own for non registered
cases.</li>
<li>Same for proc and cutout. </li>
<li>The tree structure of the vocabulary is actually
only descriptive. It's not functional at the time
of writing. </li>
</ul>
</ul>
</ul>
<li>New use cases:</li>
<ul>
<li>Shortly after DataLink became an official IVOA
recommendation, some data providers were interested
in using the DataLink functionalities for use cases
where the "Main" was a source in a catalogue.</li>
<li> This can work, of course, and proposals are
currently being discussed to integrate these use cases
within the scope of DataLink-1.1, but no suitable
semantics terms describing this kind of relationship
between the "Main" and the "Target" were available in
the previous vocabulary.</li>
<li>Often the "Target" related to the source "Main" is
the result of an observation of the source, actually a
dataset (image, spectrum, lightcurve, etc..)</li>
<ul>
<li> In VizieR we had a similar situation for what we
call "associated data" to catalogue "rows". </li>
<li>These "associated data" can indeed be images,
TimeSeries, cubes, spectra...</li>
</ul>
<li> Hence the VEP0001 proposal as it was presented on
October 15th<br>
</li>
<ul>
<li>An associated_image is actually "an image of Main",
which is a source.</li>
<li> An associated_lightcurve is similarly "a light
curve of Main", which is a source.</li>
</ul>
<li> It should be emphasized that this term informs the
client both that the target is an image or a light curve and
that it is an observation result of the source. </li>
<li>The proposal to define an item in the associated
branch for each value of dataproduct_type and even
more for each subtype of TimeSeries introduced the
idea to combine associated_data with the ObsCore
vocabulary.</li>
<ul>
<li> It was pointed out (by Markus) that other head
terms such as "progenitor" or "derived" could need
this too, and that this could lead to a combinatorial
explosion. </li>
</ul>
<li>By the way, the term "associated_data" itself has
been criticized as a way to describe the concept of an
observation result of a source.</li>
</ul>
<li>The 4 concepts proposal</li>
<ul>
<li>Ada proposed to separate the description of the
links in 4 different concepts</li>
<ul>
<li>"4 independent levels or categories: </li>
<li>Level 0 - Data-format (fits, VOTable, PDF, png, …)</li>
<li>Level 1 - Data-type (tabular, image, spectrum,
cube, text, …)</li>
<li>Level 2 - Data-information (Documentation,
Calibration, Log, Preview, …)</li>
<li>Level 3 - Data-relation (Derived from, Progenitor
of, Sibling of, ...)"</li>
</ul>
<li>I think this introduces an effort towards real data
modelling of DataLink. It would obviously be a major
improvement in the way we link resources. But it may
take some time to achieve.</li>
<li>At the moment I don't see a clear distinction
between level 2 and level 3 because the "information"
we have in the "Target" is always "relative" to a
"Main", so not that far from level 3. At the least, it may
sometimes be difficult to know which "level"
a given category value falls into </li>
<li>On the other hand, for links to dynamic services I
am not sure which category their characterization
belongs to. Is that a fifth level to add ? Data-type in
the context of DataLink should have a much wider scope
than ObsCore "dataproduct_type" because there are
targets which are not data products: various metadata,
auxiliary data, texts, plots, etc... If
dataproduct_type is standardized, what about the
other stuff ? <br>
</li>
<li>To me it looks like the levels proposed by Ada (and
maybe a few others) are more like a matrix description
than a flat one. <br>
</li>
<li>Taking all of the above into account, I think the
levelling of the categories can be a project for
DataLink 2, which will be really interesting. If we
want to have a quick solution I think we have to
consider more modest solutions.</li>
</ul>
<li>Among different Proposals :</li>
<ul>
<li>I see two possible simple solutions to tackle the
use case</li>
<ul>
<li>go back to a simplified version of VEP001. </li>
<ul>
<li>Instead of reproducing the full ObsCore
"dataproduct_type" variability we only define the
terms we currently need and we will see in the
future if we need more.</li>
<li>At the same time I drop both the
"associated_data" and "sibling" head terms and
choose to use "Observation_Result_of_source"</li>
<li>ESO and SVO use cases : "image_of_source",
"Spectrum_of_source"</li>
<li>TimeDomain/Gaia use cases :
"LightCurve_Of_Source",
"RadialVelocityCurve_Of_Source",
"Movie_Of_Source", "SpectroChronogram_Of_Source"</li>
<ul>
<li>"TimeSeries_Of_Source" may be used as a head
term for the four above, or when we don't know
exactly what is varying in time.</li>
</ul>
</ul>
<li>adopt the proposal made by Pat Dowler: use the media
type in content_type to give the type or product
type using the parameter "content="</li>
<ul>
<li>application/fits;content=image</li>
<li>application/fits;content=spectrum</li>
<li> application/fits;content=lightcurve or
application/fits;content=timeseries;subtype=lightcurve</li>
<li>application/fits;content=movie or
application/fits;content=timeseries;subtype=movie</li>
<li>etc ...</li>
</ul>
<ul>
<li>the standard structure of media types allows us to
reuse the current "dataproduct_type" vocabulary
as the value of the content parameter and then to
use an additional "subtype" parameter, or
alternatively to directly use the timeseries
subtype in "content=".</li>
<li>a variant would be to create a new
dataproduct_type parameter in the media type when
appropriate<br>
</li>
<li> If we adopt that, semantics will only be
"Observation_Result_of_source" in parallel for all
these possibilities<br>
</li>
</ul>
<li> In the first solution we directly introduce some
kind of datatype in the "meaning of target relative
to the main" semantics field, which I think is fine
except that it doesn't explicitly reuse the ObsCore
dataproduct_type.</li>
<li>In the second solution clients will have to parse
the media type to discover not only the format of
the target but also its content. We still have to
decide how to handle subtypes. <br>
</li>
<ul>
<li>This probably has to be explicitly explained in
the next DataLink-1.1 version</li>
</ul>
</ul>
<li>What do implementers / service providers prefer ?</li>
</ul>
</ul>
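<p>To make the comparison between the two solutions concrete, here is a
rough Python sketch of what a client would do in each case. Everything
here is an assumption for illustration: the row dictionaries, the helper
names and the "#" prefix on terms are hypothetical, and neither the
"..._Of_Source" terms nor the "content=" parameter are part of any
released vocabulary or spec.</p>
<pre>
```python
HEAD_TERM = "observation_result_of_source"
SUFFIX = "_of_source"


def product_from_semantics(term):
    """First solution: the product type is embedded in the semantics term."""
    name = term.lstrip("#").lower()
    if name == HEAD_TERM or not name.endswith(SUFFIX):
        return None  # the head term alone says nothing about the product type
    return name[: -len(SUFFIX)]


def product_from_content_type(content_type):
    """Second solution: the product type is carried by the content= parameter."""
    for part in content_type.split(";")[1:]:
        name, _, value = part.partition("=")
        if name.strip().lower() == "content":
            return value.strip().lower()
    return None


def product_type(row):
    """Try the content= parameter first, then fall back to the semantics term."""
    return (product_from_content_type(row.get("content_type", ""))
            or product_from_semantics(row.get("semantics", "")))


rows = [
    # First solution: type in semantics, plain media type.
    {"semantics": "#LightCurve_Of_Source",
     "content_type": "application/fits"},
    # Second solution: generic semantics, type in the media type.
    {"semantics": "#Observation_Result_of_source",
     "content_type": "application/fits;content=lightcurve"},
]
types = [product_type(row) for row in rows]
print(types)
```
</pre>
<p>Both rows lead the client to the same product type; the difference is
only in which column has to be parsed.</p>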
<p><br>
</p>
<p>I wish you all happy holidays for the coming days</p>
<p>Cheers</p>
<p>François<br>
</p>
</div>
</blockquote>
</div>
</blockquote>
<br>
</body>
</html>