[VEP-0001] DataLink semantics vocabulary enhancement proposal

Mon Nov 4 14:56:51 CET 2019

Hi DAL,

On Tue, Oct 22, 2019 at 06:23:32PM +0200, François Bonnarel wrote:
> Le 22/10/2019 à 10:53, Markus Demleitner a écrit :
> > On Mon, Oct 21, 2019 at 05:38:32PM +0200, Petr Skoda wrote:
> > As far as I can see, there are two use cases in general for datalink
> > semantics:
> > 
> > (a) link filtering: The client, based on the semantics, selects a
[...]
> > 
> > (b) figure out what to do with a link: When Aladin implemented
> > datalink, they found that based on what's in a datalink row, they
> > didn't know how to deal with a link: they'd like to send spectra to
> > clients listening to spectrum.load.ssa-generic, images to those
> > listening to image.load.fits and so forth.  The datalink content_type
> > column isn't quite sufficient for this, because
> > application/x-votable+xml can be a spectrum or an object catalog,
> > whereas image/fits might be some kind of cube or a plain image (or an
> > IRAF spectrum, or still something else).  That's the "SAMP sending
> > use case" that, I think, was largely missed when we wrote datalink.
>
> Well, that's strange because from the beginning some of us (authors) had
> something like that in mind. Well not exactly "samp" but more generally.
> What will the client do with this link. Try to manage it herself and do

Be that as it may, the actual spec has failed to cover that use case
-- which is why we are here.

> > Having established this much, after a mail from Ada I had another of
> > my dangerous epiphanies.  That is, if we really want to deal with use
> > case (b) in semantics, we'll end up reproducing the distinction that
> > VEP-0001 proposes in every branch: not only will we have
> > 
> > #associated-cube #associated-image #associated-radialvelocitycurve ...
> > 
> > but also
> > 
> > #derivation-cube #derivation-image #derivation-radialvelocitycurve ...
> > 
> > and (we've already seen use cases for that)
> > 
> > #progenitor-cube #progenitor-image #progenitor-radialvelocitycurve ...
>
> OK. This means that we are facing the three branches where the links point
> to datasets or dataset excerpts.

I doubt it would be limited to these three; look at error-map, for
instance -- it stands to reason that error maps would, in general,
follow their "main" dataset's type, and hence you'd have

#error-cube #error-image #error-radialvelocitycurve...

I could make that point for noise and weight, again, and I suspect
for quite a few of the terms we may see in the future.
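
To make the growth concrete, here is a rough sketch that simply crosses
the branches named so far with the three product types used as examples
in this thread.  The two lists are illustrative only and are not taken
from the actual vocabulary.

```python
from itertools import product

# Branches mentioned in this thread that would each need per-type terms
# (illustrative, not the real vocabulary).
branches = ["associated", "derivation", "progenitor", "error", "noise", "weight"]

# Example product types used in this thread (again illustrative).
product_types = ["cube", "image", "radialvelocitycurve"]

# Every branch/type pair becomes its own semantics term.
terms = [f"#{branch}-{ptype}" for branch, ptype in product(branches, product_types)]

print(len(terms), "terms, for instance", terms[:3])
```

Each further branch or product type multiplies the number of terms
rather than adding to it.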

> > (3) Adding a dataproduct_type column in datalink.  If we started from
> > scratch, this is probably what I'd do.  As things are now... don't
> > know.  As for (2), this can start immediately (because datalink lets
> > you add extra columns), and it would even have the advantage that
> > clients that don't parse media types would still understand
> > content_type.
> Well, some other people (Alberto for example) have asked for this. I'm
> reluctant because for most of the links this column will be unused (most of
> the link use cases are not "dataproducts" at all). In general I think we

That a column is empty for many links is not unusual in datalink (see
service_def and error_message in 1.0).  But also I suspect in most
datalink documents, the majority of links are actually "sendable" in
this sense: The progenitors and derivations of images and spectra, in
all likelihood, will be images and spectra again, as will #error,
#flat, #noise, #weight, and, of course, #this.

> should try to avoid adding columns in DataLink response and should try to
> keep it simple. And especially when these columns come from another spec

About the simplicity, as someone wanting to put this stuff into pyVO,
my personal choice between

  Is semantics one of [#progenitor-image, #associated-image,
    #derivation-image, #noise-image, #bias-image, #dark-image, ...]?

and

  check the dataproduct_type column and, if there's a value, use that
  to determine the default SAMP destinations

is fairly clear (in particular because I'll need the second logic
for Obscore anyway).
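
A minimal sketch of that second logic, under several assumptions: rows
are plain dicts rather than pyVO objects, dataproduct_type is the
proposed extra column (it is not in datalink 1.0), and the mapping only
covers the two SAMP mtypes named earlier in this thread.  The names
SAMP_MTYPE_FOR_PRODUCT and default_samp_mtype are made up for the
illustration.

```python
# Hypothetical mapping from Obscore-style dataproduct_type values to the
# SAMP mtypes mentioned in this thread; a real client would cover more.
SAMP_MTYPE_FOR_PRODUCT = {
    "image": "image.load.fits",
    "spectrum": "spectrum.load.ssa-generic",
}


def default_samp_mtype(row):
    """Return a default SAMP mtype for one datalink row, or None.

    row is a plain dict standing in for a datalink row; the
    dataproduct_type key is the proposed column and may be missing
    or empty, in which case no default destination is suggested.
    """
    product_type = (row.get("dataproduct_type") or "").strip().lower()
    return SAMP_MTYPE_FOR_PRODUCT.get(product_type)


rows = [
    {"semantics": "#this", "content_type": "image/fits",
     "dataproduct_type": "image"},
    {"semantics": "#progenitor", "content_type": "application/x-votable+xml",
     "dataproduct_type": "spectrum"},
    {"semantics": "#preview", "content_type": "image/png",
     "dataproduct_type": None},
]

mtypes = [default_samp_mtype(row) for row in rows]
```

Note that the lookup never looks at semantics, so it works the same for
#this, #progenitor, #error and whatever terms are added later.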

The one big downside that I can see with the dataproduct_type column
is that datalink 1.0 services won't have it for a long time (though
of course you can always just add the column to a 1.0 service, too).

But then even with a semantics-based solution for the SAMP-sending
case, the clients would depend on operators adopting the new terms,
which I wouldn't expect to be instantaneous.

Again, I'd like to hear from Datalink producers and consumers what
they think.  As for that, I'd still not count out the solution via
media type content parameters; this would be mighty useful far
beyond Datalink...
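
For illustration only, this is roughly what a client would have to do
to read such a parameter from content_type, using nothing but the
Python standard library.  The value "spectrum" for the content
parameter is an assumption made up for this sketch and is not defined
anywhere; split_media_type is likewise a hypothetical helper name.

```python
from email.message import Message


def split_media_type(content_type):
    """Split a media type string into (base type, parameter dict)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_params() returns (name, value) pairs; the first pair is the
    # base type itself, so it is skipped here.
    params = {name.lower(): value
              for name, value in msg.get_params(failobj=[])[1:]}
    return msg.get_content_type(), params


# "content=spectrum" is a made-up value, used only to show the parsing.
base, params = split_media_type("application/x-votable+xml;content=spectrum")
plain_base, plain_params = split_media_type("image/fits")
```

A client that does not parse media types would see the whole string as
an unknown content type, which is the drawback weighed against the
extra column above.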

        -- Markus