Datalink vocabulary additions

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Jun 6 10:07:46 CEST 2016


Hi Alberto,

On Fri, Jun 03, 2016 at 04:28:03PM -0400, Accomazzi, Alberto wrote:
> Thanks for resurrecting the topic.  I have a few comments below, but first
> as a meta-comment, should we consider reusing, when possible, the
> relationTypes found in the DataCite schema (
> http://schema.datacite.org/meta/kernel-3/doc/DataCite-MetadataKernel_v3.1.pdf)?

Excellent point -- routine-blinded as I am, I've only considered
those for "resources" in the Registry sense, but of course you're
right, they should work for the datalink things, too.

Then again, I'm not sure how I'd phrase this.  If we were to adopt
all or most of their terms, we'd have to impose a hierarchical
organisation on them (including things like "IsPartOf", which
would, I think, subsume most of the existing datalink terms), and I
think they'd not like this.

So, if we restrict to stealing what we need...

> On Thu, Jun 2, 2016 at 7:38 AM, Markus Demleitner <
> msdemlei at ari.uni-heidelberg.de> wrote:
> > (1) I'd like to have a term for larger chunks of metadata in separate
> > files.  I'd need that to link to observation logs, but I could also
> > see logs a pipeline has written, or an extensive provenance, or
> > similar.
> >
> > Proposed term(s): #metadata?  #documentation?  (as a child of
> > #auxiliary, I guess)
> >
> 
> I dislike both terms you suggest because they sound so general that they
> could be used for most anything.  But if we have to stay general because of
> the potentially different types of resources we need to point to, how about
> #Documents?

Well, DataCite offers #IsMetadataFor, which, as a top-level child of
#auxiliary, would work nicely for me.  It's not much more specific,
though.

Can we find something less generic?  In my example of the observation
log, is there a term between "Metadata" and "observation log" in some
conceivable thesaurus?  (I've not found anything pertinent in the IVOA
thesaurus -- but then it doesn't even have metadata...).

> > (2) I'd like to have a term for things like a rebinned (higher S/N)
> > version of the dataset, or perhaps the data in a different waveband on a
> > multi-band instrument, or the same observation with a different
> > instrument setup (as in V500/COMB vs.  V1200 in Califa), etc.  Essentially:
> > Science data that was obtained "together with" #this but that's not
> > identical with #this.
> >
> > Proposed term(s): #science? (but that's a bit too broad)  #alternate?
> >   (as a child of #this?)
> >
> 
> maybe #isVariantFormOf or #isOriginalFormOf

#isVariantFormOf I like.  I think #isOriginalFormOf collides with
the existing #progenitor.  That would be another DataCite term I'd be
reluctant to just accept.

> > (3) I'd like to have a term for a different representation of the same
> > dataset, e.g., a spectrum that was originally a FITS image formatted  as
> > a FITS table, an SDM VOTable, or a CSV file (where of course the SDM
> > VOTable would be the #this).  Essentially, the same data as #this modulo
> > the different expressivenesses of container formats.
> >
> > Proposed term(s): #alt-format?  (as a child of #this?)
> >
> 
> #isVariantFormOf or #isOriginalFormOf

Ok, #isVariantFormOf would again work.

> > (4) I'd like to have a term for a previous version of a dataset.  I have
> > that in califa, where I'd like to have *some* way to get DR1 and DR2
> > data, but I really don't want to clutter all-VO SSA or obscore searches
> > with these guys.  So, I'm adding links to old files (where they exist)
> > in datalink results for new files.  This isn't really #progenitor, since
> > the old files aren't in the provenance chain of the new files (which are
> > generated from yet other data files).  It's... well, a previous version,
> > and hence I'd like to see
> >
> > Proposed term: #previous-version (as child of #auxiliary?)
> >
> 
> we should be careful with the semantics that DataCite assigns to these but
> #isPreviousVersionOf and #isNewVersionOf might be appropriate here

They would match my use case nicely.  For the reverse relationship
I'd have used *#science (where "*" means "doesn't exist"), but I
guess DataCite is right and we should have #isPreviousVersionOf, too.
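
To keep the tally straight, here is a rough Python sketch of where the
candidate terms from this thread might sit below the existing datalink
terms.  The parent assignments are only the ones floated above (nothing
is agreed), and the names PARENT, is_a and narrower are made up for
illustration.

```python
# Tentative parent of each term; None marks an existing top-level term.
# These placements are proposals from this thread, not an agreed vocabulary.
PARENT = {
    "#this": None,
    "#progenitor": None,
    "#auxiliary": None,
    "#IsMetadataFor": "#auxiliary",        # observation logs, pipeline logs
    "#isVariantFormOf": "#this",           # rebinned data, other formats
    "#isPreviousVersionOf": "#auxiliary",  # e.g. older data releases
}


def is_a(term, ancestor):
    """Return True if term equals ancestor or sits below it in PARENT."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT[term]
    return False


def narrower(ancestor):
    """Return all terms strictly below ancestor, sorted by name."""
    return sorted(t for t in PARENT if t != ancestor and is_a(t, ancestor))


if __name__ == "__main__":
    for top in ("#this", "#auxiliary", "#progenitor"):
        print(top, narrower(top))
```

A client asking for everything under #auxiliary would then also pick up
the metadata and previous-version links, which is the behaviour I'd want.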

Cheers,

          Markus

