[VEP-0001] DataLink semantics vocabulary enhancement proposal

Tue Oct 22 15:12:44 CEST 2019

Hi Laurent,

On Tue, Oct 22, 2019 at 12:21:50PM +0200, Laurent MICHEL wrote:
> I agree that we must be careful not to put anything in the VO vocabulary,
> and that the endorsed vocabulary must correspond to concrete use-cases.
> 
> In fact, the VEP0001 list is compliant with this policy since it just sets a
> list of science products currently used in astronomy and then potentially
> tagged with VO vocabulary.

Hm -- my immediate question has been not so much about the
granularity as whether use case (b) ("to which SAMP client should I
send the data") is within the scope of the datalink semantics
vocabulary or should rather be dealt with in a separate column or
using content_type.

Is what you're saying here a vote for option (3) in my mail (below --
top-posting makes this a bit clumsy)?

> Typically, the "associated-timeseries" is too coarse, since it just tells
> that the associated data-set is a set of (timestamp + anything) points.
> Telling more about what "anything" is will obviously help the clients.

Well... Will it?  I wonder how -- as Petr says, it's likely (and
certainly currently the case) that time series tools don't look too
closely at the observables; and we don't even have a SAMP mtype for
time series yet, let alone sub-types of time series.

So... how would a client exploit this extra information?

> Let's take my use case. The GRB monitor (SVOM) I'm working on will deliver
> TS of spectra. I suppose that generalist clients such as Splat would be happy to
> be notified that this dataset must not be plotted as a simple light curve.

But isn't that an argument for *not* distinguishing more deeply
(which, incidentally, is in line with current obscore; that's a clear
bonus for me)?

[The pieces of the mail that I think Laurent referred to:]

> On 22/10/2019 at 10:53, Markus Demleitner wrote:
> > As far as I can see, there are two use cases in general for datalink
> > semantics:
> > 
> > (a) link filtering: The client, based on the semantics, selects a
> > subset of the links provided to present to its users -- for instance,
> > calibration data will not be shown outside of a debugging session.
> > Or they're just used for grouping.  This was, I think, the original
> > use case that triggered the introduction of the semantics column.
> > 
> > (b) figure out what to do with a link: When Aladin implemented
> > datalink, they found that based on what's in a datalink row, they
> > didn't know how to deal with a link: they'd like to send spectra to
> > clients listening to spectrum.load.ssa-generic, images to those
> > listening to image.load.fits and so forth.  The datalink content_type
> > column isn't quite sufficient for this, because
> > application/x-votable+xml can be a spectrum or an object catalog,
> > whereas image/fits might be some kind of cube or a plain image (or an
> > IRAF spectrum, or still something else).  That's the "SAMP sending
> > use case" that, I think, was largely missed when we wrote datalink.
> > 
[...]
> > my dangerous epiphanies.  That is, if we really want to deal with use
> > case (b) in semantics, we'll end up reproducing the distinction that
> > VEP-0001 proposes in every branch: not only will we have
> > 
> > #associated-cube #associated-image #associated-radialvelocitycurve ...
> > 
> > but also
> > 
> > #derivation-cube #derivation-image #derivation-radialvelocitycurve ...
> > 
> > and (we've already seen use cases for that)
> > 
> > #progenitor-cube #progenitor-image #progenitor-radialvelocitycurve ...
> > 
[...]
> > I can see three options:
> > 
> > (1) The semantics column -- the consequences I've described above.
> > No disaster, but certainly ugly.
> > 
> > (2) The datalink content_type column.  As said above, media types
> > don't quite work out of the box, because dataproduct types and media
> > types don't really map onto each other.  However, RFC 6838 media
> > types have structure: You can add parameters.  We already exploit
> > this in datalink to say that datalink documents should come with a
> > media type of application/x-votable+xml;content=datalink.
> > 
> > What if we just said, in datalink: "Wherever possible, the
> > content_type should indicate the dataproduct type communicated, using
> > a content parameter taken from the vocabulary associated with obscore
> > dataproduct_type.  For instance, a spectrum in a VOTable would have
> > application/x-votable+xml;content=spectrum, whereas some kind of cube
> > in a FITS serialisation would be application/fits;content=cube."
> > 
> > We can immediately start doing this; there are strings attached,
> > though, in that I doubt too many clients parse media types at this
> > point, and these might become confused if we did this.
> > 
> > (3) Adding a dataproduct_type column in datalink.  If we started from
> > scratch, this is probably what I'd do.  As things are now... don't
> > know.  As for (2), this can start immediately (because datalink lets
> > you add extra columns), and it would even have the advantage that
> > clients that don't parse media types would still understand
> > content_type.
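
A minimal sketch of how a client might read the dataproduct type under
options (2) and (3) and pick a SAMP mtype for use case (b). Everything
here is an assumption for illustration: the function names, the
dict-based row, and the mapping table are hypothetical, and neither the
content parameter convention nor a dataproduct_type column is part of
datalink as it stands. Only the two mtypes named in the mail are mapped.

```python
# Hypothetical mapping from dataproduct type to SAMP mtype; only the
# two mtypes mentioned in the mail are listed.
SAMP_MTYPES = {
    "spectrum": "spectrum.load.ssa-generic",
    "image": "image.load.fits",
}


def parse_media_type(content_type):
    """Split a media type into (base type, parameter dict).

    Naive parser: it does not handle quoted parameter values that
    themselves contain semicolons.
    """
    parts = [p.strip() for p in content_type.split(";")]
    base = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip().lower()] = value.strip().strip('"')
    return base, params


def dataproduct_type(row):
    """Return the dataproduct type of a datalink row, or None.

    Prefers an explicit dataproduct_type column (option 3, assumed
    column name) and falls back to the content parameter of
    content_type (option 2).
    """
    explicit = row.get("dataproduct_type")
    if explicit:
        return explicit
    _, params = parse_media_type(row.get("content_type", ""))
    return params.get("content")


def samp_mtype(row):
    """Return the SAMP mtype to use for a row, or None if unknown."""
    return SAMP_MTYPES.get(dataproduct_type(row))
```

With this, a row carrying application/x-votable+xml;content=spectrum
would go to clients listening to spectrum.load.ssa-generic, while a row
with a bare image/fits and no extra column yields no mtype, which is the
ambiguity described under use case (b).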

            -- Markus