VEP-003 review

François Bonnarel francois.bonnarel at astro.unistra.fr
Wed May 27 19:34:56 CEST 2020


Dear colleagues,

Well, I am a little concerned about being listed as an "author" of a
proposal I do not fully agree with.

Let me try to explain why, and propose a compromise.

* I fully agree with the definition, rationale, etc. Distinguishing use
cases where the "linked" item has the same progenitor as the main item
from use cases where the linked item is simply a counterpart is real
progress.

* The Gaia use case is one of several where providers wanted to
associate a record in a source catalog with a "dataset" or "dataproduct".

* However, "sibling" seems to say much more than "prepared from the same
original data" (for a non-native English reader, all the definitions,
citations, etc. you can find speak of "brother or sister"). To my eyes
it also implies that the items are of the same "type" or "species". I am
reluctant to say that a TimeSeries is the "sister" of a source record in
a catalog, and the same goes for an image or a spectrum. On the other
hand, record to record or image to image would be fine for sibling.

* Looking for a solution, I turned to the terms the Provenance data
model proposes for these situations. I looked only at the names of the
relationships, not at the names of the rich content classes. I attach a
simple diagram view of the model. If the main item is seen as an entity,
its relationship to its progenitor can be expressed in two ways: either
through the activity that generated it and used the progenitor entities,
or directly (a shortcut bypassing the activity) by a WasDerivedFrom
relationship. The good thing is that we already have "derived" as a
semantics word in DataLink, along with its reverse, "progenitor". Hence
my proposal of "coderived", which has the advantage of letting us ignore
which activity generated the two related "products".

* The granularity (i.e., how many steps there may be between the
progenitor and the products) is arbitrary and left to the provider's choice.

* Under these conditions, I think sibling could be seen as a child term
of coderived: a "sibling" would be a "coderived" data product of the
same type as the main item. Thus, if you want to associate a spectrum
with a record, it would simply be "coderived", while if you want to
associate an image with another image produced by the same pipeline with
different parameters, it could be "sibling".
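To make the proposed hierarchy concrete, here is a minimal sketch of the rule described in the last bullet. It is only an illustration: the `Product` structure, its `progenitor` and `product_type` fields, and the `classify_link`/`is_a` helpers are hypothetical and come neither from DataLink nor from the VEP. It also collapses "same progenitor" into a single identifier, whereas in practice the granularity is left to the provider.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical broader-term relation for the proposal:
# "sibling" is a narrower (child) term of "coderived".
BROADER = {"sibling": "coderived"}


@dataclass(frozen=True)
class Product:
    """Hypothetical description of a data product (not a DataLink structure)."""
    ident: str
    product_type: str  # e.g. "catalog-record", "image", "spectrum", "timeseries"
    progenitor: str    # identifier of the original data it was derived from


def classify_link(this: Product, other: Product) -> Optional[str]:
    """Return the proposed term for a link from `this` to `other`.

    Returns None when the products do not share a progenitor; such a
    link would be something else (e.g. a counterpart), not covered here.
    """
    if this.progenitor != other.progenitor:
        return None
    # Same progenitor and same type: the narrower term applies.
    if this.product_type == other.product_type:
        return "sibling"
    # Same progenitor, different types: only the broader term applies.
    return "coderived"


def is_a(term: str, ancestor: str) -> bool:
    """True if `term` equals `ancestor` or is (transitively) narrower than it."""
    current: Optional[str] = term
    while current is not None:
        if current == ancestor:
            return True
        current = BROADER.get(current)
    return False


if __name__ == "__main__":
    record = Product("rec-1", "catalog-record", "obs-42")
    spectrum = Product("spec-1", "spectrum", "obs-42")
    img_a = Product("img-a", "image", "obs-42")
    img_b = Product("img-b", "image", "obs-42")
    print(classify_link(record, spectrum))  # coderived
    print(classify_link(img_a, img_b))      # sibling
    print(is_a("sibling", "coderived"))     # True
```

Because "sibling" is narrower than "coderived", a client that only understands "coderived" could still treat every "sibling" link as a "coderived" one, which is what `is_a` expresses.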

Cheers

François


On 22/05/2020 at 17:17, Markus Demleitner wrote:
> Dear TCG,
>
> After a fairly long review, here's VEP-003 (#sibling in datalink) for
> your review.  According to the Vocabularies in the VO 2 WD, it is up
> to the TCG to approve the new term -- or to send it back for further
> discussion.  It would be wonderful if we could pass a decision either
> way at the next meeting, so, without further ado:
>
> Vocabulary: http://ivoa.net/rdf/datalink/core
> Author: François Bonnarel, Markus Demleitner, msdemlei at ari.uni-heidelberg.de
> Date: 2019-12-06
> Supersedes: VEP-001
>
> New Term: sibling
> Action: Addition
> Label: Sibling Data
> Description: Data products derived from the same progenitor as #this.
>    This could be a lightcurve for an object catalog derived from repeated
>    observations, the dataset processed using a different pipeline, or the
>    like.
> Used-in:
>    http://dc.g-vo.org/gaia/q2/tsdl/dlmeta?ID=ivo://org.gavo.dc/~?gaia/q2/199286482883072/BP
>    This is GAVO's rendition of the Gaia DR2 epoch photometry, where
>    users retrieve a time series in a specific band; the time series
>    in the other bands are the siblings of that.
>
> Rationale:
>    It is fairly common in complex pipelines that multiple data products
>    result from a single observation.  Often, this is true even in a
>    single pipeline step, and hence the data products are not in a
>    progenitor-derivation relationship.  Still, researchers will want to
>    know about these data products; for instance, while exploring a source
>    in Gaia, a quick way to access epoch photometry or the RP/BP spectra
>    is obviously valuable; such artefacts are not really progenitors of
>    the catalog entry, though.  In such cases, #sibling (or perhaps one of
>    its future child terms) should be used.
>
>    Clients should offer #sibling links in a context of scientific
>    exploitation of the dataset (as opposed to, say, debugging).
>
> Discussion:
>    In the discussion, the need for the concept as such ("other
>    things that were produced from the observations that led up to #this")
>    was not disputed, though the discussion was somewhat delayed by
>    an investigation of possible shortcomings in the datalink data model
>    (http://mail.ivoa.net/pipermail/dal/2019-December/008248.html) and
>    whether additional cases should or should not be included in it
>    (http://mail.ivoa.net/pipermail/dal/2020-February/008262.html).
>    
>    However, the main points of contention were the choice of the term and
>    label ("sibling").  Objections included that astronomers might not
>    understand the provenance-inspired nomenclature, that a very rough
>    view of provenance must be adopted to actually talk about siblings
>    (because, really, #this and the #sibling items just share common
>    ancestors, not (necessarily) the parents), or that it is confusing to
>    define, say, a spectrum to be the sibling of a catalogue row
>    (http://mail.ivoa.net/pipermail/semantics/2020-May/002700.html).
>
>    Possible alternatives investigated include #see-also (which was
>    rejected as being too general), #co-generated (which was disliked
>    because the implication that the two artefacts were built at the same
>    time by the same processing step is even stronger than with #sibling),
>    and #coderived (which found wide acceptance but was strongly rejected
>    by one party arguing it would seriously distort the meaning of "derived").
>
>    In the end, after a splinter discussion during the May 2020 Virtual
>    Interop, #sibling was judged acceptable, and it is already in use.
>
> Thanks,
>
>          Markus
Attachment: kkadembdenchomoa.png (PNG image, 218600 bytes), the simple diagram view of the Provenance data model mentioned above
URL: <http://mail.ivoa.net/pipermail/semantics/attachments/20200527/3b81bf07/attachment-0001.png>

