WD-DataLink-1.0

Mon Dec 9 05:42:23 PST 2013

Dear DAL list,

On Mon, Dec 02, 2013 at 03:04:12PM -0800, Patrick Dowler wrote:
> On 11/13/2013 01:18 PM, Arnold Rots wrote:
> >Actually, my reservation about the list is that it is precisely someone's
> >known/custom/esoteric metadata set.
> >It takes a very narrow view of the calibration scene.
> >It works nicely for optical imaging data, but that's about it.
> >Radio, high energy, and generally spectral data are poorly served.
> 
> Which list exactly do you have reservations about? The one in the WD
> has only a few very general values:
> 
> science
> calibration
> preview
> info
> auxiliary
> 
> and is complete at least in the sense that you can throw anything
> into the last category :-)
> 
> The above list is broadly applicable. Other proposals are to add more
> detail, which inherently brings more
> engineering/energy-domain-specific kinds of terms into play.

Well, possibly.  On the other hand, there are some fairly generic
terms that would help get useful information in there; candidates
below.

> Can we just live with the above categories for DataLink-1.0? People
> that implement a DataLink service are free to add additional columns
> to the response if they need to add details... we could see from
> experience if additional detail was needed and feasible and add
> something later.

...except that having to deal with all kinds of custom extensions is
going to suck for clients.  If we do this, we should at least say
what column these "custom" terms reside in.  The ideal way to allow
extensions would, of course, again be a machine-readable,
tree-organised vocabulary ("thesaurus") that would let a client
figure out that "visibility" is some sort of "predecessor".  But
unless someone raises their hand to work out the technicalities, I'd
say we should make do with pedestrian tech rather than rocket
science.
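
To illustrate what such a tree-organised vocabulary would buy a
client, here is a minimal Python sketch.  Nothing in it is part of
the WD: the BROADER table and the generalise function are
hypothetical, and the only relation taken from the text above is that
"visibility" would be some sort of "predecessor".

```python
# Hypothetical vocabulary: each term maps to its broader term,
# or to None if it is a top-level term.  Only the
# visibility -> predecessor relation comes from the discussion;
# the rest of the table is an assumption for illustration.
BROADER = {
    "science": None,
    "calibration": None,
    "preview": None,
    "info": None,
    "auxiliary": None,
    "predecessor": None,
    "visibility": "predecessor",
}


def generalise(term, known):
    """Return the nearest term at or above `term` that the client knows.

    Returns None if neither the term nor any of its broader terms
    is in `known`.
    """
    seen = set()  # guards against accidental cycles in the table
    while term is not None and term not in seen:
        if term in known:
            return term
        seen.add(term)
        term = BROADER.get(term)
    return None


# A client that has never heard of "visibility" can still treat it
# as a "predecessor".
client_terms = {"science", "calibration", "predecessor"}
print(generalise("visibility", client_terms))
```

A client built this way would keep working when a service starts
using a more specific term, as long as the vocabulary is published
somewhere the client can read it.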

So, here are the other categories I'd like to see:

* "self" (if you don't like that: "dataset") for a link to the full,
  unprocessed dataset itself,
* "source" (if you don't like that: "raw" or "predecessor") for datasets
  that were used in the generation of the dataset (but not, as
  e.g. calibration data would be, in the generation of its siblings),
* "access" for cutout, rebinning, or similar "accessData"-type
  services,
* "qa" for files containing errors and the like, where those are not
  part of the dataset itself (as is the case, for example, for CALIFA
  DR1).
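
As a rough sketch of the "pedestrian tech" variant, this is how a
client might sort the rows of a links response by category.  It
assumes that the terms live in a single column called "semantics" and
that each row carries a "url"; both names, the group_links function,
and the example rows are made up for illustration.  Terms the client
does not know are filed under "auxiliary".

```python
from collections import defaultdict

# Categories from the WD plus the ones proposed in this mail.
WD_TERMS = {"science", "calibration", "preview", "info", "auxiliary"}
PROPOSED_TERMS = {"self", "source", "access", "qa"}


def group_links(rows, known=WD_TERMS | PROPOSED_TERMS, fallback="auxiliary"):
    """Group link URLs by the term found in the assumed "semantics" column.

    Rows with a missing or unknown term end up under `fallback`.
    """
    grouped = defaultdict(list)
    for row in rows:
        term = row.get("semantics")
        key = term if term in known else fallback
        grouped[key].append(row["url"])
    return dict(grouped)


# Made-up example rows; example.org is a placeholder host.
rows = [
    {"url": "http://example.org/data/1", "semantics": "self"},
    {"url": "http://example.org/prev/1", "semantics": "preview"},
    {"url": "http://example.org/err/1", "semantics": "qa"},
    {"url": "http://example.org/odd/1", "semantics": "my-custom-term"},
]
for term, urls in sorted(group_links(rows).items()):
    print(term, urls)
```

Without an agreed column and an agreed fallback, every client would
have to special-case each service's custom terms.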

I've tried to make a bit of sense of the terms proposed so far for
semantics in the DaCHS developer documentation; see
http://docs.g-vo.org/DaCHS/ref.html#link-definitions and scroll down
a bit.

If people agree that some explanations for the terms are in order
(and agree with what I've made up, and explain to me what "info" is),
I'll happily de-DaCHS that list for inclusion in the spec.

Cheers,

        Markus