Datalink vocabulary additions
Mireille Louys
mireille.louys at unistra.fr
Mon Jun 6 18:22:17 CEST 2016
Hi Alberto, Hi all,
I agree it is important to check against existing vocabularies.
At some point we will need to decide the scope of such an approach, i.e. what
level of information we need to convey in the 'semantics' field.
The W3C Provenance model also has some terms dedicated to the links from one
'document' (called an entity) to its progenitors.
'wasDerivedFrom' is a relation between two entities, like
<isDerivedFrom> in DataCite.
My comments are inline below.
Cheers, Mireille
On 03/06/2016 at 22:28, Accomazzi, Alberto wrote:
> Hi Markus,
>
> Thanks for resurrecting the topic. I have a few comments below, but
> first as a meta-comment, should we consider reusing, when possible,
> the relationTypes found in the DataCite schema
> (http://schema.datacite.org/meta/kernel-3/doc/DataCite-MetadataKernel_v3.1.pdf)?
> This may be useful for two reasons: it makes our life easier if and
> when the day comes to cross-walk between DataLink and DataCite and,
> perhaps more importantly, it borrows semantics generated by a
> cross-disciplinary community of practitioners who had to come up with
> schemas for describing data, which is really what we are trying to do
> here. I'll use some of them below.
>
>
> On Thu, Jun 2, 2016 at 7:38 AM, Markus Demleitner
> <msdemlei at ari.uni-heidelberg.de> wrote:
>
>
> (1) I'd like to have a term for larger chunks of metadata in separate
> files. I'd need that to link to observation logs, but I could also
> see logs a pipeline has written, or an extensive provenance, or
> similar.
>
> Proposed term(s): #metadata? #documentation? (as a child of
> #auxiliary, I guess)
>
>
> I dislike both terms you suggest because they sound so general that
> they could be used for most anything. But if we have to stay general
> because of the potentially different types of resources we need to
> point to, how about #Documents?
Yes, I think it is very general.
>
>
>
> (2) I'd like to have a term for things like a rebinned (higher S/N)
> version of the dataset, or perhaps the data in a different waveband
> on a multi-band instrument, or the same observation with a different
> instrument setup (as in V500/COMB vs. V1200 in Califa), etc.
> Essentially: Science data that was obtained "together with" #this
> but that's not identical with #this.
>
> Proposed term(s): #science? (but that's a bit too broad) #alternate?
> (as a child of #this?)
>
>
> maybe #isVariantFormOf or #isOriginalFormOf
All three examples proposed here point to different datasets: the
measured values have been obtained with specific settings
or transformed from some original dataset, so to me these are different
'entities' in the Provenance world.
So I would rather suggest:
case 1 & 3: <isDerivedFrom> as a role, plus some term to qualify how it
is derived, as a sub-category: #cutout, #regrid
case 2: I would propose a relation like "sibling" (<?siblingOf?>),
related to the same observation but offering different physical
properties.
This helps to browse sister/brother datasets in the observation-dataset
genealogy.
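
To make the "role plus qualifier" idea concrete, here is a minimal sketch in Python. It is only an illustration: the Link class, its field names, the dataset identifiers and the siblings() helper are all hypothetical, and pairing #regrid with the rebinned case is my assumption, not an agreed term assignment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    """One datalink row, reduced to the parts discussed here (hypothetical)."""
    source: str                      # identifier of the linked dataset
    target: str                      # identifier of the dataset it relates to (#this)
    relation: str                    # the role, e.g. "isDerivedFrom" or "siblingOf"
    qualifier: Optional[str] = None  # how it was derived, e.g. "#regrid"


# Illustrative rows for the three examples in (2); identifiers are made up.
links = [
    Link("rebinned", "this", "isDerivedFrom", "#regrid"),  # case 1 (assumed qualifier)
    Link("other-band", "this", "siblingOf"),               # case 2
    Link("other-setup", "this", "isDerivedFrom"),          # case 3, qualifier left open
]


def siblings(rows: List[Link], target: str) -> List[str]:
    """Return the identifiers of all datasets declared as siblings of target."""
    return [r.source for r in rows
            if r.relation == "siblingOf" and r.target == target]


print(siblings(links, "this"))
```

Keeping the qualifier separate from the relation means a client that only understands the relation can still browse the genealogy, while a more specific client can filter on the sub-category.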
>
> (3) I'd like to have a term for a different representation of the same
> dataset, e.g., a spectrum that was originally a FITS image formatted as
> a FITS table, an SDM VOTable, or a CSV file (where of course the SDM
> VOTable would be the #this). Essentially, the same data as #this modulo
> the different expressivenesses of container formats.
>
> Proposed term(s): #alt-format? (as a child of #this?)
>
>
> #isVariantFormOf or #isOriginalFormOf
Yes, exactly the same content but a different representation. I agree.
>
>
> (4) I'd like to have a term for a previous version of a dataset. I have
> that in califa, where I'd like to have *some* way to get DR1 and DR2
> data, but I really don't want to clutter all-VO SSA or obscore searches
> with these guys. So, I'm adding links to old files (where they exist)
> in datalink results for new files. This isn't really #progenitor, since
> the old files aren't in the provenance chain of the new files (which are
> generated from yet other data files). It's... well, a previous version,
> and hence I'd like to see
>
> Proposed term: #previous-version (as child of #auxiliary?)
>
>
> we should be careful with the semantics that DataCite assigns to these
> but #isPreviousVersionOf and #isNewVersionOf might be appropriate here
Agreed.
>
>
> That concludes the proposed concepts for this time; #fault from the
> original proposals I've dropped. One other thing I'd like:
>
> (5) #proc currently has "Server-side data processing result" as its
> explanation. What really is in such datalink rows is, I submit,
> better described by "reference to a server-side processing service"
> -- so, can we change that explanation?
>
>
Again, this processing service is considered an Activity in the
Provenance DM.
I think it is then also worth looking at the W3C PROV ontology to see
if we can combine terms.
My vague understanding is that we are addressing the same problem with
different tools.
We probably need to clarify the coverage of each on the structure side
(DM) and on the semantic side (Vocabulary).
Thoughts?
> No objection here.
>
> Thanks,
> -- Alberto
>
>
> Opinions? Proposals for sharper descriptions, better terms? Any
> contributions are welcome.
>
> Thanks,
>
> Markus
>
>
> [1] http://mail.ivoa.net/pipermail/semantics/2015-November/002495.html
>
>
>
>
> --
> Dr. Alberto Accomazzi
> Principal Investigator
> NASA Astrophysics Data System - http://ads.harvard.edu
> Harvard-Smithsonian Center for Astrophysics - http://www.cfa.harvard.edu
> 60 Garden St, MS 83, Cambridge, MA 02138, USA