[VEP-003]: datalink/core#sibling

Mark Taylor M.B.Taylor at bristol.ac.uk
Fri Dec 6 22:14:21 CET 2019


Ada, Francois and others,

this distinction of 4 levels looks quite helpful.
I haven't really followed the details of the discussions, some of
which require more knowledge of the scientific and data landscape
than I have, but from my applications point of view, I'd say:

  - level 0 and 1 are the ones I'm interested in; these tell me what
    I can do with the data products, or what options I can sensibly
    offer to users.  So it would be nice to have them in some kind
    of machine readable form.

  - level 2 and 3 look like human-directed documentation items:
    they may certainly be an important part of the metadata, but
    I'd expect an application to present that information to the
    user as is, without further analysis or manipulation.

In terms of how this information is encoded in datalink I'm fairly
agnostic.  A separate datalink column for each item (level) is
probably going to be easiest to handle, but other arrangements
are probably OK as long as it's going to be possible to extract
the relevant information, especially the parts that need to be
machine readable (level 0+1) in a well-defined way.
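
As a rough sketch of the "one column per level" arrangement from the
client side: the code below assumes a hypothetical data_type column
carrying level 1, next to the existing content_type and semantics
columns.  The data_type column, the example URLs and the table of
actions are all made up for illustration; nothing like this is defined
in datalink today.

```python
# Illustrative only: "data_type" is an assumed column for level 1;
# "content_type" and "semantics" are the existing datalink columns.
rows = [
    {"access_url": "http://example.org/ts-bp.vot",
     "content_type": "application/x-votable+xml",
     "data_type": "tabular",
     "semantics": "#sibling"},
    {"access_url": "http://example.org/preview.png",
     "content_type": "image/png",
     "data_type": "image",
     "semantics": "#preview"},
]

# Hypothetical mapping from level 1 values to what a client could offer.
ACTIONS = {
    "tabular": ["load as table", "plot"],
    "image": ["display image"],
}


def describe_link(row):
    """Split one row into machine-actionable and display-only parts."""
    data_type = row.get("data_type", "")
    return {
        "format": row.get("content_type", ""),             # level 0
        "data_type": data_type,                            # level 1
        "actions": ACTIONS.get(data_type, ["download"]),   # fallback
        "display": row.get("semantics", ""),               # levels 2/3, as is
    }


for row in rows:
    info = describe_link(row)
    print(row["access_url"], info["actions"], info["display"])
```

The point is only that levels 0 and 1 drive the decision about what to
offer, while the level 2/3 value is passed to the user unchanged.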

Mark

On Fri, 6 Dec 2019, ada nebot wrote:

> Hi All, 
> 
> As I see it, the things we are discussing concerning Datalink fall into 4 independent levels or categories: 
> Level 0 - Data-format (fits, VOTable, PDF, png, …)
> Level 1 - Data-type (tabular, image, spectrum, cube, text, …)
> Level 2 - Data-information (Documentation, Calibration, Log, Preview, …)
> Level 3 - Data-relation (Derived from, Progenitor of, Sibling of, ...)
> 
> I see these as orthogonal levels since a **list of links** can be of any type (level 1) with any kind of format (level 0), 
> any kind of relation (level 3) and could have any type of associated information to describe it (level 2).   
> 
> Today the list of links returned by datalink is described in the columns content-type and semantics. 
> These two columns cover the above levels only up to some degree.   
> - Content-type: covers level 0 mainly, with some exceptions such as VOTable (which is also level 1). 
> - Semantics: covers level 2 mainly (e.g. preview), but also level 3 (e.g. derivation, progenitor). 
> 
> Datalink at the moment has no field properly covering level 1 and applications (—> users) would benefit from having that well covered. 
> 
> So, in my opinion, if I had to redo Datalink I would keep these different levels separated instead of putting everything into the semantics field. 
> But applications might have a different point of view here —> Shouldn't we add Apps to this discussion? 
> 
> Timeseries would be in level 3, since it is a relation. And I don’t think we would need the use of sibling or progenitor or anything like that for timeseries. 
> What we need is to be able to say: 
> - These links are timeseries of tabular type 
> - These links are timeseries of spectrum type
> 
> But if we were to add terms such as sibling and so on, there is already an IVOA relationship vocabulary: 
> http://ivoa.net/rdf/voresource/relationship_type/2016-08-17/relationship_type.html
> 
> Comments? 
> 
> Cheers,
> Ada
> 
> 
> --
> Astronome Adjointe
> CDS, Observatoire Astronomique de Strasbourg (ObAS)
> UMR 7550 Universite de Strasbourg 
> 11, rue de l'Universite, F-67000 Strasbourg
> 
> > On 6 Dec 2019, at 11:27, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> > 
> > Dear DAL, dear Semantics,
> > 
> > Following the discussions on VEP-001, its author François agreed to
> > first try with a VEP that only has the core term and to defer the
> > child terms intended for SAMP resolution until we've investigated
> > alternative solutions for that requirement in datalink.  Also, Pat's
> > remark that all data in a datalink document better be "associated"
> > suggested a change of terms.
> > 
> > Therefore, we've retired VEP-001 (there's a discussion of the details
> > in
> > https://volute.g-vo.org/svn/trunk/projects/semantics/veps/VEP-001.txt).
> > 
> > Instead, there's now VEP-003:
> > 
> > Vocabulary: http://ivoa.net/rdf/datalink/core
> > Author: François Bonnarel, Markus Demleitner, msdemlei at ari.uni-heidelberg.de
> > Date: 2019-12-06
> > Supercedes: VEP-001
> > 
> > New Term: sibling
> > Action: Addition
> > Label: Sibling Data
> > Description: Data products derived from the same progenitor as #this.
> >  This could be a lightcurve for an object catalog derived from repeated
> >  observations, the dataset processed using a different pipeline, or the
> >  like.
> > Used-in: 
> >  http://dc.g-vo.org/gaia/q2/tsdl/dlmeta?ID=ivo://org.gavo.dc/~?gaia/q2/199286482883072/BP
> >  This is GAVO's rendition of the Gaia DR2 epoch photometry, where
> >  users retrieve a time series in a specific band; the time series
> >  in the other bands are the siblings of that.
> > 
> > Rationale: 
> >  It is fairly common in complex pipelines that multiple data products
> >  result from a single observation.  Often, this is true even in a
> >  single pipeline step, and hence the data products are not in a
> >  progenitor-derivation relationship.  Still, researchers will want to
> >  know about these data products; for instance, while exploring a source
> >  in Gaia, a quick way to access epoch photometry or the RP/BP spectra
> >  is obviously valuable; such artefacts are not really progenitors of
> >  the catalog entry, though.  In such cases, #sibling (or perhaps one of
> >  its future child terms) should be used.
> > 
> >  Clients should offer #sibling links in a context of scientific
> >  exploitation of the dataset (as opposed to, say, debugging).
> > 
> > 
> > Opinions?  Comments?  
> > 
> > I'd suggest keeping this discussion on the DAL list.
> > 
> > Thanks,
> > 
> >           Markus
> 
> 

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/

