DataLink target meaning : "observation results of a source" use case
François Bonnarel
francois.bonnarel at astro.unistra.fr
Thu Feb 13 19:13:49 CET 2020
Hi all,
Trying to go further on this point (also related to Markus's VEP003 email
yesterday).
Two things:
I ) After discussing with a couple of people, I think the product type for
these associated datasets can be set in the content parameter of the
media-type value of the DataLink content_type, e.g.
application/fits;content=timeseries;subtype=lightcurve
This will probably require a new change proposal for the DataLink spec
itself. It's not a big one. I will prepare it tomorrow.
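Item I can be illustrated with a minimal client-side sketch. It assumes the "content" and "subtype" parameter names proposed here (they are not part of DataLink 1.0) and relies only on Python's standard email.message.Message, which parses MIME-style media-type parameters; the function name parse_content_type is hypothetical.

```python
from email.message import Message


def parse_content_type(value):
    """Split a DataLink content_type into (media type, content, subtype).

    The "content" and "subtype" parameters are the ones proposed in this
    thread; they are not defined by DataLink 1.0. Missing parameters
    come back as None.
    """
    msg = Message()
    msg["Content-Type"] = value
    media_type = msg.get_content_type()  # lower-cased "type/subtype"
    content = msg.get_param("content")
    subtype = msg.get_param("subtype")
    return media_type, content, subtype


print(parse_content_type("application/fits;content=timeseries;subtype=lightcurve"))
# ('application/fits', 'timeseries', 'lightcurve')
```

A client that does not know the extra parameters still sees a plain application/fits value, which is one argument for carrying the product type this way.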
II ) For the semantics term we need to relate a dataset to a source in a
catalog, I think the most general thing we can say is that it is a
"cross-correlation". I propose the term "CrossedDataset". This can be
the head term for #sibling, #contains, #followup, etc.
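A head term in item II only helps a client if it can walk the vocabulary tree. A minimal sketch, assuming a hypothetical child-to-parent table: "CrossedDataset" is the term proposed here, the children are relations named in this thread, and "followup" sits under "contains" as Patrick suggests in his message quoted below. None of these entries come from a published vocabulary, and the helper name is_narrower_or_equal is illustrative.

```python
# Hypothetical vocabulary fragment: each term maps to its broader (head) term.
BROADER = {
    "contains": "CrossedDataset",
    "sibling": "CrossedDataset",
    "followup": "contains",  # causal child of "contains"
}


def is_narrower_or_equal(term, head):
    """True if term is head itself or lies below it in BROADER."""
    seen = set()
    while term is not None and term not in seen:  # guard against cycles
        if term == head:
            return True
        seen.add(term)
        term = BROADER.get(term)
    return False


semantics = ["#this", "#followup", "#contains", "#preview"]
crossed = [s for s in semantics if is_narrower_or_equal(s.lstrip("#"), "CrossedDataset")]
print(crossed)  # ['#followup', '#contains']
```

Selecting on the head term picks up every narrower relation without the client having to list them, which only works if the vocabulary hierarchy is actually made functional for clients.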
Cheers
François
On 7 January 2020 at 23:04, Patrick Dowler wrote:
>
> First, although DataLink was conceived with an implicit "resource is a
> dataset" that leaked into the terminology and examples, I agree that
> there is no reason that it cannot be used for other kinds of entities.
> Using that particular word does conjure up provenance, but datalink
> and provenance are already related (#progenitor) conceptually.
>
> The way I am still seeing this, dataproduct_type (from ObsCore) says
> what something *is* and that is not a relationship per se. Aside: on
> the issue of subtype, I would prefer/like to make dataproduct_type a
> vocabulary so people could extend it rather than using a two-level
> type/subtype mechanism -- but only if we can figure out a sane/nice
> way to query vocabulary terms via TAP that actually works.
>
> I can think of several relationships from a source in a catalogue to a
> dataset and I still feel that the concept behind
> "Observation_Result_of_source" is eluding me. The relation could be:
>
> #progenitor : some/all source properties were measured in that dataset
> #derivation : the dataset was created from the source properties
>
> other possible relationships:
>
> contains : the dataset contains the source (seems like this is a
> top-level very general and vague statement; I would interpret this to
> also mean "and not progenitor")
>
> followup : the existence/discovery of the source caused a new
> observation to occur (child of contains, causal relation)
>
> So, for someone with a source (catalogue) and a related
> image|spectrum|lightcurve, is that data one of these or is it some
> other concept?
>
>
> --
> Patrick Dowler
> Canadian Astronomy Data Centre
> Victoria, BC, Canada
>
>
> On Fri, 20 Dec 2019 at 07:46, François Bonnarel
> <francois.bonnarel at astro.unistra.fr> wrote:
>
> This email was sent yesterday in another thread.
>
> Following Markus' recommendation, I am now opening a new thread for this
> discussion of the "astronomical source observation results" use cases.
>
> Cheers
>
> François
>
> Dear all,
>
> * When I proposed VEP0001 immediately after the Groningen Interop, I
> could not imagine that such a controversial discussion would occur.
> o Before considering the use case we have, I would like to go
> back to the current usages of DataLink that I know of.
> o Then go back to the "new" use case
> o And then check some of the proposed solutions on this list
> o And then argue for my preference
> * According to DataLink 1.0
> o the semantics field contains a "Term from a controlled
> vocabulary describing the link" as stated in Table 1 and
> o section 3.2.6 reads :
> o "The semantics column contains a single term from an
> external RDF vocabulary that describes the meaning of this
> linked resource relative to the identified dataset. The
> semantics column is intended to be machine-readable and
> assist automating data retrieval and processing."
> o Let's call the initial thing we start from, to which we
> want to link resources, the "Main", and the various
> linked resources the "Target".
> + Two remarks:
> # The text in section 3.2.6, consistently with the
> use cases described in the introduction, considers
> that the "Main" is a dataset.
> # The semantics field describes globally what the
> target is "with respect to the main".
> + More classical is the group of columns access_url,
> content_type and content_length, which references and
> describes the "Target" itself (independently of the
> "Main").
> + I then looked a little at the current usage
> of DataLink, using Aladin Desktop as a client and the
> following three SIAP2 servers:
> # CADC :
> * In the example I found, the DataLink service
> had "this" in semantics for the full retrieval
> of the dataset,
> * "cutout" for a SODA service,
> * and a couple of "auxiliary" rows for
> additional data such as PSF images, etc.
> * "cutout" is related to the fact that it is a
> service, described by a "service descriptor".
> Aladin opens a specific menu in that case,
> while in the other cases it downloads the
> datasets because their "content_type" is
> application/fits.
> # GAVO :
> * In the example I found, the DataLink service
> had "this" in semantics, and also "preview",
> "proc" and "science".
> * "this" and "preview" are self-explanatory.
> * "proc" is actually related to a SODA service
> (should it be "cutout" maybe?)
> * and "science" is a new term proposed by Markus
> to indicate that the target is related
> science data.
> # CASDA :
> * In the example I found, "Main" was a cube.
> It had in semantics several "this", a "cutout"
> and a "proc".
> * Each "this" row allowed the retrieval of the
> full dataset from different servers sometimes
> in synchronous mode and sometimes in
> asynchronous mode.
> * The "cutout" row is related to a SODA service.
> * The "proc" row links to a SODA-like service
> extracting a single integrated spectrum from
> the data cube.
> + This shows that semantics is not only there in
> DataLink for selection among rows in the {links}
> response table, but also helps the client figure out
> what to do with the target in combination with
> content_type, content_length and the service descriptor
> (if any is defined).
> + This also shows that semantics terms work like a flat
> vocabulary despite their tree presentation in the RDF
> document.
> # "auxiliary" is a head term for bias, dark and flat, but
> can also be used on its own for non-registered cases.
> # Same for proc and cutout.
> # The tree structure of the vocabulary is actually
> only descriptive. It's not functional at the time
> of writing.
> * New Uses cases:
> o Shortly after DataLink became an official IVOA
> recommendation, some data providers were interested in
> using the DataLink functionalities for use cases where the
> "Main" was a source in a catalogue.
> o This can work, of course, and proposals are currently
> being discussed to integrate these use cases within the
> scope of DataLink-1.1, but no suitable semantics terms
> describing this kind of relationship between the "Main"
> and the "Target" were available in the previous vocabulary.
> o Often the "Target" related to the source "Main" is the
> result of an observation of the source, actually a dataset
> (image, spectrum, lightcurve, etc.).
> + In VizieR we had a similar situation for what we call
> "associated data" to catalogue "rows".
> + These "associated data" can indeed be images,
> TimeSeries, cubes, spectra...
> o Hence the VEP0001 proposal as it was presented on
> 15 October:
> + An associated_image is actually "an image of Main",
> which is a source.
> + An associated_lightcurve is similarly "a light curve
> of Main", which is a source.
> o It should be highlighted that this term informs the client
> both that the target is an image or a light curve and that
> it is an observation result of the source.
> o The proposal to define an item in the associated branch
> for each value of dataproduct_type, and even for each
> subtype of TimeSeries, introduced the idea of combining
> associated_data with the ObsCore vocabulary.
> + It was pointed out (by Markus) that other head terms
> such as "progenitor" or "derived" could need this too,
> and this could lead to a combinatorial explosion.
> o By the way, the term "associated_data" itself has been
> criticized as a description of the concept of observation
> result of a source.
> * The 4 concepts proposal
> o Ada proposed to separate the description of the links in 4
> different concepts
> + "4 independent levels or categories:
> + Level 0 - Data-format (fits, VOTable, PDF, png, …)
> + Level 1 - Data-type (tabular, image, spectrum, cube,
> text, …)
> + Level 2 - Data-information (Documentation,
> Calibration, Log, Preview, …)
> + Level 3 - Data-relation (Derived from, Progenitor of,
> Sibling of, ...)"
> o I think this amounts to a real data-modelling effort for
> DataLink. It would obviously be a major improvement in the
> way we link resources, but it may take some time to achieve.
> o At the moment I don't see a clear distinction between
> level 2 and level 3, because the "information" we have in
> the "Target" is always "relative" to a "Main", so not
> that far from level 3. At least, it may sometimes be
> difficult to know which "level" a given category value
> falls in.
> o On the other hand, for links to dynamic services I am not
> sure which category their characterization belongs to. Is
> that a fifth level to add? Data-type in the context of
> DataLink should have a much wider scope than ObsCore
> "dataproduct_type", because there are targets which are not
> data products: various metadata, auxiliary data, texts,
> plots, etc. If dataproduct_type is standardized, what
> about the other stuff?
> o To me, the levels proposed by Ada (and maybe a few
> others) look more like a matrix description than a flat one.
> o Taking all of the above into account, I think the levelling
> of the categories can be a project for DataLink 2, which will
> be really interesting. If we want a quick solution,
> I think we have to consider more modest ones.
> * Among the different proposals:
> o I see two possible simple solutions to tackle the use case:
> + go back to a simplified version of VEP0001.
> # Instead of reproducing the full ObsCore
> "dataproduct_type" variability, we only define the
> terms we currently need and we will see in the
> future if we need more.
> # At the same time I get rid of both the
> "associated_data" and "sibling" head terms and
> choose to use "Observation_Result_of_source"
> # ESO and SVO use cases: "image_of_source",
> "Spectrum_of_source"
> # TimeDomain/Gaia use cases :
> "LightCurve_Of_Source",
> "RadialVelocityCurve_Of_Source",
> "Movie_Of_Source", "SpectroChronogram_Of_Source"
> * "TimeSeries_Of_Source" may be used as a head
> term for the four above, or when we don't know
> exactly what is varying in time.
> + adopt the proposal made by Pat Dowler: use the media type
> in content_type to give the type or product type, using
> the parameter "content="
> # application/fits;content=image
> # application/fits;content=spectrum
> # application/fits;content=lightcurve or
> application/fits;content=timeseries;subtype=lightcurve
> # application/fits;content=movie or
> application/fits;content=timeseries;subtype=movie
> # etc.
> # the standard structure of media types allows us to
> reuse the current "dataproduct_type" vocabulary as
> a value of the content parameter and then to use
> an additional "subtype" parameter, or
> alternatively to directly use the timeseries
> subtype in "content=".
> # a variant would be to create a new
> dataproduct_type parameter in the media type when
> appropriate
> # If we adopt that, semantics will only be
> "Observation_Result_of_source" in parallel for all
> these possibilities
> + In the first solution we directly introduce some kind
> of datatype into the semantics field (the "meaning of the
> target relative to the main"), which I think is fine, except
> that it doesn't explicitly reuse the ObsCore dataproduct_type.
> + In the second solution clients will have to parse the
> media type to discover not only the format of the
> target but also its content. We still have to decide
> how to do subtype.
> # This will probably have to be explained explicitly in
> the next DataLink-1.1 version.
> o What do implementers / service providers prefer ?
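The CADC, GAVO and CASDA survey and the second solution both come down to client logic that combines semantics, content_type and service_def. The following Python sketch is hypothetical and only loosely modelled on the Aladin behaviour described above (it is not Aladin's code); the action strings, the "soda" service_def value and the "#Observation_Result_of_source" term are assumptions. It accepts both spellings proposed for a light curve.

```python
from email.message import Message


def product_type(content_type):
    """Return the product type carried by a content_type, or None.

    Accepts both spellings proposed in this thread (not DataLink 1.0):
      application/fits;content=lightcurve
      application/fits;content=timeseries;subtype=lightcurve
    """
    msg = Message()
    msg["Content-Type"] = content_type
    content = msg.get_param("content")
    subtype = msg.get_param("subtype")
    # Fold the two-level form onto the one-level form.
    if content == "timeseries" and subtype:
        return subtype
    return content


def choose_action(row):
    """Pick a client action for one {links} row, given as a dict of columns.

    Hypothetical policy: service descriptors open a form, previews are
    shown, FITS rows are downloaded and, if a product type is present,
    routed to a matching viewer; anything else is offered as a link.
    """
    if row.get("service_def"):
        return "open service form"
    if row.get("semantics") == "#preview":
        return "show preview"
    content_type = row.get("content_type") or ""
    if content_type.split(";")[0].strip().lower() != "application/fits":
        return "offer link"
    kind = product_type(content_type)
    return f"download and display as {kind}" if kind else "download"


rows = [
    {"semantics": "#cutout", "service_def": "soda", "content_type": "application/fits"},
    {"semantics": "#this", "content_type": "application/fits"},
    {"semantics": "#Observation_Result_of_source",
     "content_type": "application/fits;content=timeseries;subtype=lightcurve"},
]
for row in rows:
    print(row["semantics"], "->", choose_action(row))
```

With the second solution the semantics column stays the same for every observation result, so the viewer choice rests entirely on parsing content_type, as noted above.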
>
>
> I wish you all happy holidays for the coming days
>
> Cheers
>
> François