[VEP-0001] DataLink semantics vocabulary enhancement proposal

Wed Nov 6 18:14:27 CET 2019

I agree with Markus' analysis; to reiterate what I think are the main points:

1. associated-data: although the term itself is quite redundant (all links
are "associated" in datalink by definition), the concept of "sibling" data
is sound: other data (of the same target?). To be clear, I think Markus'
idea is that every link is one of progenitor, derivation, or sibling. I'd
like to find the best word for the last one, but I like the concept.

To check interpretation, I like to see if the tuple {link} {semantics} {ID}
can sensibly be spoken as a sentence (with some filler articles):

http://example.net/foo is-a-spectrum-of blah:123

In that sense, it seems one can use dataproduct_type(s) to describe a
relationship between a resource and an identified thing.

2. At the same time, the more SAMP-like use case of driving actions
depends on knowing what the resource at the end of the access_url *is*,
not what the relationship is. That sounds more like a job for content-type
or a new column and not for semantics. It's also potentially orthogonal to
semantics (which I think gives rise to the explosion in the number of terms
Markus mentioned). Given that the content types we currently work with
(e.g. application/fits, text/x-votable+xml, application/x-hdf5) don't say
much of anything about the content to expect, parameterising like we do
with content=datalink is a pretty straightforward solution. I think this
works and conveys more information to clients independent of other
enhancements we might make to the vocabulary or datalink spec.
It could generally be a good thing to do wherever content-type is conveyed
(ObsCore access_format, DataLink content_type, HTTP Content-Type headers,
etc).
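
As a rough sketch of the parameterised content-type idea (not taken from
any spec): a client could split off the media type parameters and use a
content= value to pick a SAMP MType. Only content=datalink is in use today;
the values "spectrum" and "image", the mapping table, and the function
names below are assumptions made up for illustration.

```python
def parse_media_type(value):
    """Split a media type string into (media_type, parameter dict)."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, _, val = part.partition("=")
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type, params


# Hypothetical mapping from a content= value to a SAMP MType.
# The MTypes are the ones named in the thread; the keys are assumptions.
SAMP_MTYPE_BY_CONTENT = {
    "spectrum": "spectrum.load.ssa-generic",
    "image": "image.load.fits",
}


def samp_mtype_for(content_type):
    """Return a SAMP MType for a content type, or None if nothing matches."""
    _, params = parse_media_type(content_type)
    return SAMP_MTYPE_BY_CONTENT.get(params.get("content"))


print(parse_media_type("application/x-votable+xml;content=datalink"))
print(samp_mtype_for("application/fits; content=image"))
```

A plain application/fits without a content= parameter yields None here, so
a client would fall back to whatever it does today.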

As an aside, I have been thinking about how to enable semantics to contain
multiple tags. I have a few use cases where it would be nice to do that --
not sure how great an idea it is though. One thing it does is more or
less remove the need/desire to produce very similar-looking trees of
terms with different root terms. I intend to create a VOTable issue to
explore how exactly to convey a "bag of terms" in a single table cell and a
DataLink issue to explore multiple semantics tags. I wanted to mention it
here in case it tweaks someone's imagination and because it seems
peripherally related.
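
To make the "bag of terms" idea concrete, here is a minimal sketch. How
such a bag would actually be encoded in a table cell is exactly the open
question, so the whitespace-separated encoding is purely an assumption, as
are the #image and #spectrum terms and the function names.

```python
def parse_terms(cell):
    """Split a semantics cell into a set of terms.

    Assumes a whitespace-separated encoding, which is not specified anywhere.
    """
    return frozenset(cell.split())


def select_links(rows, *required):
    """Return the URLs of rows whose semantics contain all required terms."""
    wanted = frozenset(required)
    return [url for url, cell in rows if wanted <= parse_terms(cell)]


# Hypothetical datalink rows as (access_url, semantics) pairs.
rows = [
    ("http://example.net/raw.fits", "#progenitor #image"),
    ("http://example.net/spec.xml", "#derivation #spectrum"),
    ("http://example.net/err.fits", "#error #image"),
]

print(select_links(rows, "#image"))
print(select_links(rows, "#derivation", "#spectrum"))
```

With independent tags, the relationship and the kind of product can be
filtered separately, so there is no need for a combined term such as
#progenitor-image under every root term.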

--
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada

On Mon, 4 Nov 2019 at 05:57, Markus Demleitner <
msdemlei at ari.uni-heidelberg.de> wrote:

> Hi DAL,
>
> On Tue, Oct 22, 2019 at 06:23:32PM +0200, François Bonnarel wrote:
> > On 22/10/2019 at 10:53, Markus Demleitner wrote:
> > > On Mon, Oct 21, 2019 at 05:38:32PM +0200, Petr Skoda wrote:
> > > As far as I can see, there are two use cases in general for datalink
> > > semantics:
> > >
> > > (a) link filtering: The client, based on the semantics, selects a
> [...]
> > >
> > > (b) figure out what to do with a link: When Aladin implemented
> > > datalink, they found that based on what's in a datalink row, they
> > > didn't know how to deal with a link: they'd like to send spectra to
> > > clients listening to spectrum.load.ssa-generic, images to those
> > > listening to image.load.fits and so forth.  The datalink content_type
> > > column isn't quite sufficient for this, because
> > > application/x-votable+xml can be a spectrum or an object catalog,
> > > whereas image/fits might be some kind of cube or a plain image (or an
> > > IRAF spectrum, or still something else).  That's the "SAMP sending
> > > use case" that, I think, was largely missed when we wrote datalink.
> >
> > Well, that's strange because from the beginning some of us (authors) had
> > something like that in mind. Well not exactly "samp" but more generally.
> > What will the client do with this link. Try to manage it herself and do
>
> Be that as it may, the actual spec has failed to cover that use case
> -- which is why we are here.
>
> > > Having established this much, after a mail from Ada I had another of
> > > my dangerous epiphanies.  That is, if we really want to deal with use
> > > case (b) in semantics, we'll end up reproducing the distinction that
> > > VEP-0001 proposes in every branch: not only will we have
> > >
> > > #associated-cube #associated-image #associated-radialvelocitycurve ...
> > >
> > > but also
> > >
> > > #derivation-cube #derivation-image #derivation-radialvelocitycurve ...
> > >
> > > and (we've already seen use cases for that)
> > >
> > > #progenitor-cube #progenitor-image #progenitor-radialvelocitycurve ...
> >
> > OK. This means that we are facing the three branches where the link
> > targets are datasets or dataset excerpts.
>
> I doubt it would be limited to these three; look at error-map, for
> instance -- it stands to reason that error maps would, in general,
> follow their "main" dataset's type, and hence you'd have
>
> #error-cube #error-image #error-radialvelocitycurve...
>
> I could make that point for noise and weight, again, and I suspect
> for quite a few of the terms we may see in the future.
>
> > > (3) Adding a dataproduct_type column in datalink.  If we started from
> > > scratch, this is probably what I'd do.  As things are now... don't
> > > know.  As for (2), this can start immediately (because datalink lets
> > > you add extra columns), and it would even have the advantage that
> > > clients that don't parse media types would still understand
> > > content_type.
> > Well, some other people (Alberto for example) have asked for this. I'm
> > reluctant because for most of the links this column will be unused (most
> > of the link use cases are not "dataproducts" at all). In general I think we
>
> That a column is empty for many links is not unusual in datalink (see
> service_def and error_message in 1.0).  But also I suspect in most
> datalink documents, the majority of links are actually "sendable" in
> this sense: The progenitors and derivations of images and spectra, in
> all likelihood, will be images and spectra again, as will #error,
> #flat, #noise, #weight, and, of course, #this.
>
> > should try to avoid adding columns in DataLink response and should try to
> > keep it simple. And especially when these columns come from another spec
>
> About the simplicity, as someone wanting to put this stuff into pyVO,
> my personal choice between
>
>   Is semantics one of [#progenitor-image, #associated-image,
>     #derivation-image, #noise-image, #bias-image, #dark-image, ...]?
>
> and
>
>   check the dataproduct_type column and, if there's a value, use that
>   to determine the default SAMP destinations
>
> is fairly clear (in particular because I'll need the second logic
> for Obscore anyway).
>
> The one big downside that I can see with the dataproduct_type column
> is that datalink 1.0 services won't have it for a long time (though
> of course you can always just add the column to a 1.0 service, too).
>
> But then even with a semantics-based solution for the SAMP-sending
> case, the clients would depend on operators adopting the new terms,
> which I wouldn't expect to be instantaneous.
>
> Again, I'd like to hear from Datalink producers and consumers what
> they think.  As for that, I'd still not count out the solution via
> media type content parameters; this would be mighty useful far
> beyond Datalink...
>
>         -- Markus
>