datalink-terms

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Mon Oct 20 20:37:34 CEST 2014


I am in the process of updating our DataLink service to provide values 
in the semantics column and this requires some work on the vocabulary.

1. During the RFC period we decided that a semantics value is required 
and that the vocabulary must therefore have a term for "self" or "this" 
to describe links to the data itself.

#this and #self make sense "inside" the thing they refer to, but the ID 
is in another field and is itself only an identifier, not the thing...

#data is pretty generic and indicative of "the data" no matter what type.

The current draft has #sciencedata, probably coming from our CAOM 
vocabulary as I described it back in Heidelberg... but CAOM doesn't have 
any concept of "self" directly. You can infer it by looking at the 
science or calibration tags on specific resources and seeing that they 
belong to a science or calibration "observation" (using observation 
loosely here). I'm not sure #sciencedata is very useful in the 
vocabulary, but I certainly would use #this instead of #science or 
#calibration to describe our data.

When I look up "self" in a thesaurus, I see only human- or 
creature-centric synonyms... I would probably go with #this. It is 
somewhat a style issue and I like #this for the way it indicates/emphasizes.


2. In general, the draft vocabulary has a bunch of stuff that was 
collected together during the exploration phase of DataLink -- to help 
us understand the problem/context. Now, I think we should trim it down 
to the bare essentials for immediate use. Providers are free to use 
fully-qualified custom terms and to propose they be added later (say if 
they can get some traction with their terms in the community). Right 
now, I think the minimum we need:

#this

#auxiliary (/weight /error /noise) ("map" implies specific structure)

#calibration (/bias /dark /flat)

#preview (/image /plot)

#proc (/cutout)


note: #auxiliary seems to be a more commonly used term than #ancillary

The only new term I have introduced here is #proc to be used to describe 
services that perform processing. I thought about #ssdp (Hi Markus!) but 
acronyms don't look right here.

In actual usage, the /things above are terms in their own right that are 
children of a parent concept (e.g. #image is a child of #preview). IIRC, 
this is how normal RDF vocabularies are done and if you want to find 
parent relations in a machine-readable way you have the RDF file to do 
that. So in usage we have a flat vocabulary with #terms. The complete 
vocabulary I propose we start with in datalink/core is thus:

#this
#auxiliary (generic)
#weight
#error
#noise
#calibration (generic)
#bias
#dark
#flat
#preview (generic)
#image
#plot
#proc (generic)
#cutout
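
To illustrate the flat-terms-with-parents idea, here is a minimal Python 
sketch. It is only an assumption about how a client might hold the parent 
relations locally: the PARENT table and the helpers is_a and is_core are 
hypothetical names, and in practice the relations would be read from the 
RDF file rather than hard-coded.

```python
# Hypothetical parent table for the proposed datalink/core terms.
# None marks a top-level term; everything else points at its generic parent.
PARENT = {
    "#this": None,
    "#auxiliary": None,
    "#weight": "#auxiliary",
    "#error": "#auxiliary",
    "#noise": "#auxiliary",
    "#calibration": None,
    "#bias": "#calibration",
    "#dark": "#calibration",
    "#flat": "#calibration",
    "#preview": None,
    "#image": "#preview",
    "#plot": "#preview",
    "#proc": None,
    "#cutout": "#proc",
}


def is_core(term):
    """Return True if term is one of the proposed core terms."""
    return term in PARENT


def is_a(term, concept):
    """Return True if term equals concept or descends from it."""
    while term is not None:
        if term == concept:
            return True
        # Unknown terms (e.g. custom ones) have no parent here.
        term = PARENT.get(term)
    return False
```

With this, a client that only understands the generic concept can still 
handle a more specific link: is_a("#dark", "#calibration") holds, while a 
fully-qualified custom term simply fails is_core and can be ignored or 
passed through.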

I am really going for minimal needs so we don't commit to anything we 
don't need right now and don't really understand. More terms can be added 
in a lightweight DAL-WG process, as is done for SAMP mtypes, once we know 
what we need and it has been proven by usage.

If this looks OK, I can fix up the datalink-terms this week.

Comments?

-- 

Patrick Dowler
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2E7

250-363-0044 (office) 250-363-0045 (fax)

