datalink-terms
Patrick Dowler
patrick.dowler at nrc-cnrc.gc.ca
Mon Oct 20 20:37:34 CEST 2014
I am in the process of updating our DataLink service to provide values
in the semantics column and this requires some work on the vocabulary.
1. During the RFC period we decided that a semantics value was required
and that the vocabulary must therefore have a term for "self" or "this"
to describe links to the data itself.
#this and #self make sense "inside" the thing they refer to, but the ID
is in another field and is itself only an identifier, not the thing...
#data is pretty generic and indicative of "the data" no matter what type.
The current draft has #sciencedata, probably coming from our CAOM
vocabulary as I described it back in Heidelberg... but CAOM doesn't have
any concept of "self" directly (you can infer it by looking at the
science or calibration tags on specific resources and seeing they belong
to a science or calibration "observation", using observation loosely
here). I'm not sure #sciencedata is very useful in the vocabulary, but I
certainly would use #this instead of #science or #calibration to
describe our data.
When I look up "self" in a thesaurus, I see only human- or
creature-centric synonyms... I would probably go with #this. It is
somewhat a style issue and I like #this for the way it indicates/emphasizes.
2. In general, the draft vocabulary has a bunch of stuff that was
collected together during the exploration phase of DataLink -- to help
us understand the problem/context. Now, I think we should trim it down
to the bare essentials for immediate use. Providers are free to use
fully-qualified custom terms and to propose they be added later (say if
they can get some traction with their terms in the community). Right
now, I think the minimum we need is:
#this
#auxiliary (/weight /error /noise) ("map" implies specific structure)
#calibration (/bias /dark /flat)
#preview (/image /plot)
#proc (/cutout)
note: #auxiliary seems to be a more commonly used term than #ancillary
The only new term I have introduced here is #proc to be used to describe
services that perform processing. I thought about #ssdp (Hi Markus!) but
acronyms don't look right here.
On actual usage, the /things above are terms in their own right that are
children of a parent concept (e.g. #image is a child of #preview). IIRC,
this is how normal RDF vocabularies are done and if you want to find
parent relations in a machine-readable way you have the RDF file to do
that. So in usage we have a flat vocabulary with #terms. The complete
vocabulary I propose we start with in datalink/core is thus:
#this
#auxiliary (generic)
#weight
#error
#noise
#calibration (generic)
#bias
#dark
#flat
#preview (generic)
#image
#plot
#proc (generic)
#cutout
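
As a rough sketch of how a client might work with the flat terms while
still recovering the parent relations, here is a small Python example.
The dict layout and the is_a() helper are my own illustration, not part
of any DataLink draft; the authoritative relations would come from the
RDF file.

```python
# Parent concept for each term in the proposed datalink/core vocabulary.
# None marks a top-level term. This table is hand-written for
# illustration; a real client would derive it from the RDF file.
CORE_PARENTS = {
    "this": None,
    "auxiliary": None,
    "weight": "auxiliary",
    "error": "auxiliary",
    "noise": "auxiliary",
    "calibration": None,
    "bias": "calibration",
    "dark": "calibration",
    "flat": "calibration",
    "preview": None,
    "image": "preview",
    "plot": "preview",
    "proc": None,
    "cutout": "proc",
}


def is_a(term, concept):
    """Return True if '#term' is '#concept' or a descendant of it.

    Hypothetical helper: unknown terms simply match nothing.
    """
    name = term.lstrip("#")
    target = concept.lstrip("#")
    while name is not None:
        if name == target:
            return True
        # Walk up one level; unknown names yield None and end the loop.
        name = CORE_PARENTS.get(name)
    return False
```

With this, a client looking for any kind of preview can test
is_a("#image", "#preview") on each row's semantics value instead of
listing every child term by hand.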
Really going for minimal needs so we don't commit to anything we don't
need right now and don't really understand. More terms can be added in a
lightweight DAL-WG process, as is done for SAMP mtypes, once we know what
we need and it has been proven by usage.
If this looks OK, I can fix up the datalink-terms this week.
Comments?
--
Patrick Dowler
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2E7
250-363-0044 (office) 250-363-0045 (fax)