VEP-003 review
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Fri May 22 17:17:46 CEST 2020
Dear TCG,
After a fairly long review, here's VEP-003 (#sibling in datalink) for
your review. According to the Vocabularies in the VO 2 WD, it is up
to the TCG to approve the new term -- or to send it back for further
discussion. It would be wonderful if we could pass a decision either
way at the next meeting, so, without further ado:
Vocabulary: http://ivoa.net/rdf/datalink/core
Author: François Bonnarel, Markus Demleitner, msdemlei at ari.uni-heidelberg.de
Date: 2019-12-06
Supersedes: VEP-001
New Term: sibling
Action: Addition
Label: Sibling Data
Description: Data products derived from the same progenitor as #this.
This could be a lightcurve for an object catalog derived from repeated
observations, the dataset processed using a different pipeline, or the
like.
Used-in:
http://dc.g-vo.org/gaia/q2/tsdl/dlmeta?ID=ivo://org.gavo.dc/~?gaia/q2/199286482883072/BP
This is GAVO's rendition of the Gaia DR2 epoch photometry, where
users retrieve a time series in a specific band; the time series
in the other bands are the siblings of that.
Rationale:
It is fairly common in complex pipelines that multiple data products
result from a single observation. Often, this is true even in a
single pipeline step, and hence the data products are not in a
progenitor-derivation relationship. Still, researchers will want to
know about these data products; for instance, while exploring a source
in Gaia, a quick way to access epoch photometry or the RP/BP spectra
is obviously valuable; such artefacts are not really progenitors of
the catalog entry, though. In such cases, #sibling (or perhaps one of
its future child terms) should be used.
Clients should offer #sibling links in a context of scientific
exploitation of the dataset (as opposed to, say, debugging).
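To illustrate the client behaviour suggested above, here is a minimal
Python sketch. It assumes the datalink response has already been parsed
into a list of dicts keyed by the standard datalink column names (ID,
access_url, semantics, description). The identifiers and URLs are
invented for the example, and split_links is a hypothetical helper, not
part of any existing library.

```python
# Terms a client treats as "sibling data". Only #sibling exists so far;
# future child terms could be added to this set.
SIBLING_TERMS = {"#sibling"}


def split_links(rows, sibling_terms=SIBLING_TERMS):
    """Partition datalink rows into sibling links and all other links."""
    siblings = [row for row in rows if row["semantics"] in sibling_terms]
    others = [row for row in rows if row["semantics"] not in sibling_terms]
    return siblings, others


# Made-up rows standing in for a parsed datalink response.
example_rows = [
    {"ID": "ivo://example/ts/BP", "semantics": "#this",
     "access_url": "http://example.org/ts/BP",
     "description": "BP time series"},
    {"ID": "ivo://example/ts/BP", "semantics": "#sibling",
     "access_url": "http://example.org/ts/RP",
     "description": "RP time series"},
    {"ID": "ivo://example/ts/BP", "semantics": "#sibling",
     "access_url": "http://example.org/ts/G",
     "description": "G time series"},
]

siblings, others = split_links(example_rows)

# A science-oriented client might list these next to the main dataset.
for row in siblings:
    print(row["description"], row["access_url"])
```

Keeping the accepted terms in a set rather than comparing against a
single string is only a convenience: it leaves room for possible child
terms of #sibling without changing the filtering logic.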
Discussion:
In the discussion, the need for the concept as such ("other
things that were produced from the observations that led up to #this")
was not disputed, though the discussion was somewhat delayed by
an investigation of possible shortcomings in the datalink data model
(http://mail.ivoa.net/pipermail/dal/2019-December/008248.html) and
whether additional cases should or should not be included in it
(http://mail.ivoa.net/pipermail/dal/2020-February/008262.html).
However, the main points of contention were the choice of the term and
label ("sibling"). Objections included that astronomers might not
understand the provenance-inspired nomenclature, that a very rough
view of provenance must be adopted to actually talk about siblings
(because, really, #this and the #sibling items just share common
ancestors, not (necessarily) the parents), or that it is confusing to
define, say, a spectrum to be the sibling of a catalogue row
(http://mail.ivoa.net/pipermail/semantics/2020-May/002700.html).
Possible alternatives investigated include #see-also (which was
rejected as being too general), #co-generated (which was disliked
because the implication that the two artefacts were built at the same
time by the same processing step is even stronger than with #sibling),
and #coderived (which found wide acceptance but was strongly rejected by
one party arguing it would strongly distort the meaning of "derived").
In the end, #sibling was accepted as being acceptable and in use after
a splinter discussion during the May 2020 Virtual Interop.
Thanks,
Markus