VEP-003 review

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Fri May 22 17:17:46 CEST 2020


Dear TCG,

After a fairly long review, here's VEP-003 (#sibling in datalink) for
your review.  According to the Vocabularies in the VO 2 WD, it is up
to the TCG to approve the new term -- or to send it back for further
discussion.  It would be wonderful if we could pass a decision either
way at the next meeting, so, without further ado:

Vocabulary: http://ivoa.net/rdf/datalink/core
Author: François Bonnarel, Markus Demleitner, msdemlei at ari.uni-heidelberg.de
Date: 2019-12-06
Supercedes: VEP-001

New Term: sibling
Action: Addition
Label: Sibling Data
Description: Data products derived from the same progenitor as #this.
  This could be a lightcure for an object catalog derived from repeated
  observations, the dataset processed using a different pipeline, or the
  like.
Used-in: 
  http://dc.g-vo.org/gaia/q2/tsdl/dlmeta?ID=ivo://org.gavo.dc/~?gaia/q2/199286482883072/BP
  This is GAVO's rendition of the Gaia DR2 epoch photometry, where
  users retrieve a time series in a specific band; the time series
  in the other bands are the siblings of that.

Rationale: 
  It is fairly common in complex pipelines that multiple data products
  result from a single observation.  Often, this is true even in a
  single pipeline step, and hence the data products are not in a
  progenitor-derivation relationship.  Still, researchers will want to
  know about these data products; for instance, while exploring a source
  in Gaia, a quick way to access epoch photometry or the RP/BP spectra
  is obviously valuable; such artefacts are not really progenitors of
  the catalog entry, though.  In such cases, #sibling (or perhaps one of
  its future child terms) should be used.

  Clients should offer #sibling links in a context of scientific
  exploitation of the dataset (as opposed to, say, debugging).

Discussion:
  In the discussion, it was the need for the concept as such ("other
  things that were produced from the observations that led up to #this")
  was not disputed, though the discussion was somewhat delayed by
  an investigation of possible shortcomings in the datalink data model
  (http://mail.ivoa.net/pipermail/dal/2019-December/008248.html) and
  whether additional cases should or should not be included in it
  (http://mail.ivoa.net/pipermail/dal/2020-February/008262.html).
  
  However, the main points of contention were the choice of the term and
  label ("sibling").  Objections included that astronomers might not
  understand the provenance-inspired nomenclature, that a very rough
  view of provenance must be adopted to actually talk about siblings
  (because, really, #this and the #sibling items just share common
  ancestors, not (necessarily) the parents), or that it is confusing to
  define, say, a spectrum to be the sibling of a catalogue row
  (http://mail.ivoa.net/pipermail/semantics/2020-May/002700.html).

  Possible alternatives investigated include #see-also (which was
  rejected as being too general), #co-generated (which was disliked
  because the implication that the two artefacts were built at the same
  time by the same processing step is even stronger than with #sibling),
  and #coderived (which wide acceptance but was strongly rejected by one
  party arguing it would strongly distort the meaning of "derived".

  In the end, #sibling was accepted as being acceptable and in use after
  a splinter discussion during the May 2020 Virtual Interop.

Thanks,

        Markus


More information about the semantics mailing list