VEP-003 review
François Bonnarel
francois.bonnarel at astro.unistra.fr
Wed May 27 19:34:56 CEST 2020
Dear colleagues,
Well, I am a little concerned about being listed as an "author" of a
proposal I do not fully agree with.
Let me try to explain why and propose a compromise.
* I fully agree with the definition, rationale, etc. It is real
progress to distinguish use cases where the "linked" item has the same
progenitor as the main item from use cases where the linked item is
simply a counterpart.
* The Gaia use case is one among other use cases where providers wanted
to associate a record in a source catalog with a "dataset" or
"dataproduct".
* "sibling", however, seems to say much more than having been prepared
from the same original data (for a non-native English reader, all the
definitions, citations, etc. you can find speak about "brother or
sister"). To my eyes it means that they are also of the same
"type" or "species". I am reluctant to say that a TimeSeries is the
"sister" of a source record in a catalog. The same would hold for an
image or a spectrum. On the other hand, record to record, or image to
image, would be fine for sibling.
* Looking for a solution, I turned to the terms the provenance data
model proposes for these situations. I looked only at the names of
relationships, not at the names of rich content classes. I attach here a
simple diagram view of the model. If the main item is seen as an entity,
the relationship to its progenitor can be expressed in two ways: either
through the activity which generated it and used the progenitor entities,
or directly (a shortcut bypassing the activity) by a WasDerivedFrom
relationship. The good thing about this is that we already had "derived"
as a semantics word in DataLink. And we had the reverse, "progenitor".
Hence my proposal of "coderived", which has the advantage of letting us
ignore which activity generated the two related "products".
* The granularity (i.e. how many steps there may be between progenitor
and products) is arbitrary and left to the choice of the provider.
* Under these conditions I think that sibling could be seen as a child
term of coderived. A "sibling" would be a "coderived" dataproduct of the
same type as the main item. That way, if you want to associate a
spectrum with a record it would simply be "coderived", while if
you want to associate an image with another image produced in the same
pipeline with different parameters it could be "sibling".
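
To make the compromise concrete, here is a minimal sketch of the rule
proposed above. It is only an illustration: "coderived" is a proposed
term, not part of the datalink vocabulary, and the function names, the
type strings and the PARENT table are hypothetical.

```python
from typing import Optional

# Proposed hierarchy (child term -> parent term). "coderived" is only a
# proposal in this message, not an accepted vocabulary term.
PARENT = {"sibling": "coderived"}


def choose_term(main_type: str, linked_type: str,
                shares_progenitor: bool) -> Optional[str]:
    """Pick the term for a link between the main item and a linked item."""
    if not shares_progenitor:
        # No common progenitor: neither term applies (the linked item may
        # simply be a counterpart, which is outside this sketch).
        return None
    # Same product type -> the narrower "sibling"; otherwise "coderived".
    return "sibling" if main_type == linked_type else "coderived"


def is_a(term: Optional[str], ancestor: str) -> bool:
    """True if term equals ancestor or is a descendant of it."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False
```

With this rule an image linked to another image from the same pipeline
gets "sibling", a spectrum linked to a catalog record gets "coderived",
and a client asking for everything "coderived" would also find the
siblings.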
Cheers
François
On 22/05/2020 at 17:17, Markus Demleitner wrote:
> Dear TCG,
>
> After a fairly long review, here's VEP-003 (#sibling in datalink) for
> your review. According to the Vocabularies in the VO 2 WD, it is up
> to the TCG to approve the new term -- or to send it back for further
> discussion. It would be wonderful if we could pass a decision either
> way at the next meeting, so, without further ado:
>
> Vocabulary: http://ivoa.net/rdf/datalink/core
> Author: François Bonnarel, Markus Demleitner, msdemlei at ari.uni-heidelberg.de
> Date: 2019-12-06
> Supercedes: VEP-001
>
> New Term: sibling
> Action: Addition
> Label: Sibling Data
> Description: Data products derived from the same progenitor as #this.
> This could be a lightcurve for an object catalog derived from repeated
> observations, the dataset processed using a different pipeline, or the
> like.
> Used-in:
> http://dc.g-vo.org/gaia/q2/tsdl/dlmeta?ID=ivo://org.gavo.dc/~?gaia/q2/199286482883072/BP
> This is GAVO's rendition of the Gaia DR2 epoch photometry, where
> users retrieve a time series in a specific band; the time series
> in the other bands are the siblings of that.
>
> Rationale:
> It is fairly common in complex pipelines that multiple data products
> result from a single observation. Often, this is true even in a
> single pipeline step, and hence the data products are not in a
> progenitor-derivation relationship. Still, researchers will want to
> know about these data products; for instance, while exploring a source
> in Gaia, a quick way to access epoch photometry or the RP/BP spectra
> is obviously valuable; such artefacts are not really progenitors of
> the catalog entry, though. In such cases, #sibling (or perhaps one of
> its future child terms) should be used.
>
> Clients should offer #sibling links in a context of scientific
> exploitation of the dataset (as opposed to, say, debugging).
>
> Discussion:
> In the discussion, the need for the concept as such ("other
> things that were produced from the observations that led up to #this")
> was not disputed, though the discussion was somewhat delayed by
> an investigation of possible shortcomings in the datalink data model
> (http://mail.ivoa.net/pipermail/dal/2019-December/008248.html) and
> whether additional cases should or should not be included in it
> (http://mail.ivoa.net/pipermail/dal/2020-February/008262.html).
>
> However, the main points of contention were the choice of the term and
> label ("sibling"). Objections included that astronomers might not
> understand the provenance-inspired nomenclature, that a very rough
> view of provenance must be adopted to actually talk about siblings
> (because, really, #this and the #sibling items just share common
> ancestors, not (necessarily) the parents), or that it is confusing to
> define, say, a spectrum to be the sibling of a catalogue row
> (http://mail.ivoa.net/pipermail/semantics/2020-May/002700.html).
>
> Possible alternatives investigated include #see-also (which was
> rejected as being too general), #co-generated (which was disliked
> because the implication that the two artefacts were built at the same
> time by the same processing step is even stronger than with #sibling),
> and #coderived (which found wide acceptance but was strongly rejected
> by one party arguing it would strongly distort the meaning of "derived").
>
> In the end, #sibling was accepted as being acceptable and in use after
> a splinter discussion during the May 2020 Virtual Interop.
>
> Thanks,
>
> Markus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: kkadembdenchomoa.png
Type: image/png
Size: 218600 bytes
Desc: not available
URL: <http://mail.ivoa.net/pipermail/semantics/attachments/20200527/3b81bf07/attachment-0001.png>