DataLink target meaning : "observation results of a source" use case

Patrick Dowler pdowler.cadc at gmail.com
Tue Jan 7 23:04:18 CET 2020


First, although DataLink was conceived with an implicit "resource is a
dataset" that leaked into the terminology and examples, I agree that there
is no reason that it cannot be used for other kinds of entities. Using that
particular word does conjure up provenance, but datalink and provenance are
already related (#progenitor) conceptually.

The way I am still seeing this, dataproduct_type (from ObsCore) says what
something *is* and that is not a relationship per se. Aside: on the issue
of subtype, I would prefer/like to make dataproduct_type a vocabulary so
people could extend it rather than using a two-level type/subtype mechanism
-- but only if we can figure out a sane/nice way to query vocabulary terms
via TAP that actually works.

I can think of several relationships from a source in a catalogue to a
dataset and I still feel that the concept behind
"Observation_Result_of_source" is eluding me. The relation could be:

#progenitor : some/all source properties were measured in that dataset
#derivation : the dataset was created from the source properties

other possible relationships:

contains : the dataset contains the source (seems like this is a top-level
very general and vague statement; I would interpret this to also mean "and
not progenitor")

followup : the existence/discovery of the source caused a new observation
to occur (child of contains, causal relation)

So, for someone with a source (catalogue) and a related
image|spectrum|lightcurve, is that data one of these or is it some other
concept?
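
To make the comparison concrete, here is a minimal sketch of how a client
might select links by these candidate relations. It assumes the links rows
are already parsed into dictionaries. "contains" and "followup" are only the
tentative names used above, not vocabulary terms, and the helper names, the
parent mapping and the example.org URLs are made up for illustration.

```python
# Candidate relations from a catalogue source ("Main") to a dataset ("Target").
# "#progenitor" and "#derivation" are written as in this message; "contains"
# and "followup" are tentative names only, not terms from any vocabulary.
CANDIDATE_RELATIONS = {
    "#progenitor": "some/all source properties were measured in that dataset",
    "#derivation": "the dataset was created from the source properties",
    "contains": "the dataset contains the source (and is not a progenitor)",
    "followup": "the source caused a new observation to occur",
}

# Hypothetical hierarchy: followup is described as a child of contains.
PARENT = {"followup": "contains"}


def matches(term, wanted):
    """Return True if term equals wanted or is a descendant of wanted."""
    while term is not None:
        if term == wanted:
            return True
        term = PARENT.get(term)
    return False


def select_links(rows, wanted):
    """Keep the links rows whose semantics match the wanted relation."""
    return [row for row in rows if matches(row["semantics"], wanted)]


# Made-up links rows for one source; the URLs are placeholders.
rows = [
    {"semantics": "#progenitor", "access_url": "http://example.org/image.fits"},
    {"semantics": "followup", "access_url": "http://example.org/spectrum.fits"},
]
contained = select_links(rows, "contains")
```

With this mapping, asking for "contains" also returns the "followup" row,
while asking for "#progenitor" returns only the image.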


--
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada


On Fri, 20 Dec 2019 at 07:46, François Bonnarel <
francois.bonnarel at astro.unistra.fr> wrote:

> This email was sent yesterday in another thread.
>
> Following Markus' recommendation I open now a new thread for this
> discussion of the "astronomical source observation results" use cases.
>
> Cheers
>
> François
>
> Dear all,
>
>    - When I proposed VEP0001 immediately after Groningen Interop I could
>    not imagine that such a controversial discussion would occur.
>    - Before considering the use case we have I would like to go back to
>       the current usages of DataLink I know.
>       - Then go back to the "new" use case
>       - And then check some of the proposed solutions on this list
>       - And then argue for my preference
>    - According to DataLink 1.0
>    - the semantics field contains a "Term from a controlled vocabulary
>       describing the link" as stated in Table 1 and
>       - section 3.2.6 reads :
>       - "The semantics column contains a single term from an external RDF
>       vocabulary that describes the meaning of this linked resource relative to
>       the identified dataset. The semantics column is intended to be
>       machine-readable and assist automating data retrieval and processing."
>       - Let's call the initial thing we are starting from and to which we
>       want to link resources "Main" and the various linked resources "Target".
>          - Two remarks  :
>             - The text in section 3.2.6, consistently with the use cases
>             described in the introduction considers that the "Main" is a dataset
>             - The  semantics field describes globally what the target is
>             "with respect to the main"
>          - More classical is the group of columns access_URL ,
>          content_type, content_length which references and describes the "Target"
>          itself (independently from the "Main")
>          - Now I tried to look a little bit at the current usage of
>          DataLink using Aladin DeskTop as a client and the three following SIAP2
>          servers
>             - CADC :
>             - In the example I found, the DataLink service had "this" in
>                semantics for the full retrieval of the dataset,
>                -  "cutout" for a SODA service
>                - and a couple of "auxiliary" Rows for additional data
>                such as PSF images, etc...
>                -  cutout is related to the fact that it is a service,
>                described as "service descriptor". Aladin opens a specific menu in that
>                case while it downloads the datasets in the other cases according to the
>                fact its "content_type" is application/fits
>             - GAVO :
>             - In the example I found, the DataLink service had "this" in
>                semantics,  and also "preview", "proc" and "science".
>                -  "this" and "preview" are self-explanatory.
>                - "proc" is actually related to a SODA service (should be
>                "cutout" maybe ?)
>                - and "science" is a new term proposed by Markus to
>                indicate that the target is related science data
>             - CASDA :
>             -  In the example I found,  "Main" was a cube. It had in
>                semantics several "this", a "cutout" and a "proc".
>                -   Each "this" row allowed the retrieval of the full
>                dataset from different servers sometimes in synchronous mode and sometimes
>                in asynchronous mode.
>                -  The "cutout" row is related to a SODA service.
>                - The "proc" row links to a SODA-like service extracting a
>                single integrated spectrum from the data cube.
>             - This shows that semantics is not only there in DataLink for
>          selection among rows in the {links} response table but also helps the
>          client to figure out what to do with the target in combination with
>          content-type, content_length and service descriptor (if any is defined).
>          - This also shows that semantics terms work like a flat
>          vocabulary despite their tree presentation in the rdf document.
>             - Auxiliary is a head term for bias, dark, flat but can also
>             be used on its own for non registered cases.
>             - Same for proc and cutout.
>             - The tree structure of the vocabulary is actually only
>             descriptive. It's not functional at the time of writing.
>          - New Uses cases:
>       - Shortly after DataLink became an official IVOA recommendation, some
>       data providers were interested  in using the DataLink functionalities for
>       use cases where the "Main" was a source in a catalogue.
>       -  This can work, of course, and proposals are currently being discussed
>       to integrate these use cases within the scope of DataLink-1.1, but no
>       adapted semantics terms describing this kind of relationship between the
>       "Main" and the "Target" were available in the previous vocabulary.
>       - Often  the "Target" related to the source "Main" is the result of
>       an observation of the source, actually a dataset (image, spectrum,
>       lightcurve, etc..)
>          -  In vizieR we had a similar situation for what we call
>          "associated data" to catalogue "rows".
>          - these "associated data" can indeed be images, TimeSeries,
>          cubes, spectra...
>       -  Hence the VEP0001 proposal as it was presented on October
>       15th
>       - An associated_image is actually "an image of main" which is a
>          source.
>          -  An associated_lightcurve is similarly " a light curve of
>          Main"   which is a source.
>       -  It should be emphasized that this term informs the client that it
>       is an image or a light curve and that it is an Observation result of the
>       source.
>       - The proposal to define an item in the associated branch for each
>       value of dataproduct_type and even more for each subtype of TimeSeries
>       introduced the idea to combine associated_data with the ObsCore vocabulary.
>          -  It was pointed out (by Markus) that other head terms such as
>          "progenitor" or "derived" could need this too and this could lead to a
>          combinatorial explosion.
>       - By the way, the term "associated_data" itself has been criticized
>       as a description of the concept of observation result of a source.
>    - The 4 concepts proposal
>       - Ada proposed to separate the description of the links in 4
>       different concepts
>          - "4 independent levels or categories:
>          - Level 0 - Data-format (fits, VOTable, PDF, png, …)
>          - Level 1 - Data-type (tabular, image, spectrum, cube, text, …)
>          - Level 2 - Data-information (Documentation, Calibration, Log,
>          Preview, …)
>          - Level 3 - Data-relation (Derived from, Progenitor of, Sibling
>          of, ...)"
>       - I think this introduces an effort towards a real data model for
>       DataLink. It would obviously be a major improvement in the way we link
>       resources. But it may take some time to achieve.
>       - At the moment I don't see a clear distinction between level 2 and
>       level 3 because the "information" we have in the "Target"  is always
>       "relative" to a "Main" so not  that far from level 3. At least it may be
>       sometimes difficult to know  in which "level" falls  a given category value
>       - On the other hand, for links to dynamical services I am not sure
>       to which category their characterization belongs. Is that  a fifth level to
>       add ? Data-type in the context of DataLink should have a much wider scope
>       than ObsCore "dataproduct_type" because there are targets which are not
>       data products. Various metadata, auxiliary data, texts, plots, etc... If
>       dataproduct_type is standardized, what about the other stuff ?
>       - To me it looks like the levels proposed by Ada (and maybe a few
>       others) are more like a matrix description than a flat one.
>       - Taking all of the above into account, I think the levelling of the
>       categories can be a project for DataLink 2 which will be really
>       interesting. If we want to have a quick solution I think we have to
>       consider more modest solutions.
>    - Among different Proposals :
>       - I see two possible simple solutions to tackle the use case
>          - go back to a simplified version of VEP0001.
>             - Instead of reproducing the full ObsCore "dataproduct_type"
>             variability we only define the terms we currently need and we will see in
>             the future if we need more.
>             - At the same time I get rid of both the "associated_data" and
>             "sibling" head terms and choose to use "Observation_Result_of_source"
>             - ESO and SVO use cases :   "image_of_source",
>             "Spectrum_of_source"
>             - TimeDomain/Gaia use cases :  "LightCurve_Of_Source",
>             "RadialVelocityCurve_Of_Source", "Movie_Of_Source",
>             "SpectroChronogram_Of_Source"
>                - "TimeSeries_Of_Source" may be used as a head term for
>                the four above, or when we don't know exactly what is varying in time.
>             - adopt proposal made by Pat Dowler. Use the media type in
>          content_type to give the type or product type using the parameter "content="
>             - application/fits;content=image
>             - application/fits;content=spectrum
>             -  application/fits;content=lightcurve or
>             application/fits;content=timeseries;subtype=lightcurve
>             - application/fits;content=movie or
>             application/fits;content=timeseries;subtype=movie
>             - etc ...
>          - the standard structure of media types allows us to reuse the
>             current "dataproduct_type" vocabulary as a value of the content parameter
>             and then to use an additional "subtype" parameter, or alternatively to
>             directly use the timeseries subtype in "content=".
>             - a variant would be to create a new dataproduct_type
>             parameter in the media type when appropriate
>             -  If we adopt that, semantics will only be
>             "Observation_Result_of_source" in parallel for all these possibilities
>             -  In the first solution we directly introduce some kind of
>          datatype in the "meaning of target relative to the main" semantics field
>          which I think is fine except that it doesn't explicitly reuse ObsCore
>          dataproduct_type.
>          - In the second solution clients will have to parse the media
>          type to discover not only the format of the target but also its content. We
>          still have to decide how to do subtype.
>          - This has probably to be explicitly explained in the next
>             DataLink-1.1 version
>          - What do implementers / service providers prefer ?
>
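A minimal sketch of what the second solution would ask of clients, assuming
the "content" and "subtype" media type parameters exactly as written in the
examples above. Neither parameter is registered for application/fits, and the
function names here are hypothetical.

```python
def parse_media_type(value):
    """Split a media type string into (type/subtype, {parameter: value})."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" not in part:
            continue  # ignore empty or malformed parameters
        name, _, val = part.partition("=")
        params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


def product_type(content_type):
    """Return (content, subtype); each is None if the parameter is absent."""
    _, params = parse_media_type(content_type)
    return params.get("content"), params.get("subtype")


# Both spellings of a light curve from the proposal, plus a plain FITS file.
nested = product_type("application/fits;content=timeseries;subtype=lightcurve")
direct = product_type("application/fits;content=lightcurve")
plain = product_type("application/fits")
```

The two spellings give different answers ("timeseries" with subtype
"lightcurve" versus "lightcurve" with no subtype), so a client would need
to handle both until the subtype question is settled.
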
>
> I wish you all happy holidays for the coming days
>
> Cheers
>
> François
>