[VEP-0001] DataLink semantics vocabulary enhancement proposal

Fri Nov 8 18:57:39 CET 2019

Hi Pat, all,

On 06/11/2019 at 18:14, Patrick Dowler wrote:
> I agree with Markus' analysis, re-iterating I think the main points:
>
> 1. associated-data: although the term itself is quite redundant (all 
> links are "associated" in datalink by definition) the concept of 
> "sibling" data is sound: other data (of the same target?). To be 
> clear, I think Markus is thinking that something is one of progenitor, 
> derivation, or sibling. I'd like to find the best word for this but I 
> like it.
The term "associated-data" has been used experimentally in VizieR for a 
couple of years, outside of DataLink. It denotes a data product 
associated with a catalogue or with a row (a source or whatever) in a 
catalogue.
I think GAVO is also using something like that.

Besides, is "sibling" appropriate for associating a row in a catalogue 
with a data product such as an image or a timeseries (the underlying use 
cases)?

Anyway, we need a widely accepted "top-branch" term for this kind of 
use case. Should we open a page for proposals?

>
> To check interpretation, I like to see if the tuple {link} {semantics} 
> {ID} can sensibly be spoken as a sentence (with some filler articles):
>
> http://example.net/foo is-a-spectrum-of blah:123
>
> In that sense, it seems one can use dataproduct_type(s) to describe a 
> relationship between a resource and an identified thing.
Yes, exactly what we had in mind for TimeDomain. All these are sub-terms 
of "associated-data/sibling".
But in addition, timeseries require sub-types (lightcurve, 
radialvelocitycurve, etc.).
>
>
> 2. At the same time, the more SAMP-like use case of driving actions is 
> depending on knowing what the resource at the end of the access_url 
> *is*, not what the relationship is. That sounds more like a job for 
> content-type or a new column and not for semantics. It's also 
> potentially orthogonal to semantics (which I think gives rise to the 
> explosion in number of terms Markus' mentioned). Given that the 
> current range of content types we work with (application/fits, 
> text/x-votable+xml, application/x-hdf5, eg) don't say much of anything 
> about the content to expect, parameterising like we do with 
> content=datalink is a pretty straightforward solution. I think this 
> works and conveys more information to clients independent of other 
> enhancements we might make to the vocabulary or datalink spec.
> It could generally be a good thing to do wherever content-type is 
> conveyed (ObsCore access_format, DataLink content_type, http 
> Content-Type headers, etc).
Just to understand: semantics will be "associated-data/sibling", and in 
that case you look at the dataproduct_type string after the semicolon in 
content-type?
But the TimeDomain use cases (see Ada's talk at the last Interop) require 
sub-typing (in ObsCore and DataLink).
Can we use content-type for that as well?
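
To make the question concrete, here is a minimal sketch of how I understand a client would read such a parameter. The helper name split_media_type and the content=timeseries value are assumptions for illustration only; as far as I know only content=datalink is in use today.

```python
def split_media_type(value):
    """Split a media type string into (type, params).

    Hypothetical helper: handles only the simple 'type/subtype;key=value'
    form (no quoted strings containing ';').
    """
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, _, val = part.partition("=")
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type, params


# The form DataLink already uses for its own responses.
print(split_media_type("application/x-votable+xml;content=datalink"))

# Assumed, not standardised: a dataproduct_type carried the same way.
media_type, params = split_media_type("application/fits; content=timeseries")
print(media_type, params.get("content"))
```

A sub-type would need either a second parameter or a structured value, and neither is defined anywhere yet.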
>
> As an aside, I have been thinking about how to enable semantics to 
> contain multiple tags. I have a few use cases where it would be nice 
> to do that -- not sure how great an idea it is though. One thing it 
> does is it more or less removes the need/desire to produce very 
> similar looking trees of terms with different root terms. I intend to 
> create a VOTable issue to explore how exactly to convey a "bag of terms" 
> in a single table cell and a DataLink issue to explore multiple 
> semantics tags. I wanted to mention it here in case it tweaks 
> someone's imagination and because it seems peripherally related.
Indeed, this could allow the dataproduct_type/dataproduct_subtype 
branches to be used in semantics in combination with 
"sibling/associated-data", "progenitor", etc.

But you are right, this probably requires a change in VOTable, which has 
only a char (with dimension) datatype for strings.
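
As a rough sketch of the "bag of terms" idea: since a VOTable cell can hold only one string, the terms would have to be packed with some separator. The whitespace separator, the function name, and the "#image" term below are assumptions; nothing of the sort is specified yet.

```python
def parse_semantics_bag(cell):
    """Split one semantics cell into a set of terms.

    Assumes terms are separated by whitespace, which is NOT defined
    by DataLink or VOTable today.
    """
    return frozenset(cell.split())


# "#image" is a hypothetical term standing in for a dataproduct_type branch.
bag = parse_semantics_bag("#progenitor #image")

# A client can then test each aspect independently instead of
# looking for a combined term such as #progenitor-image.
print("#progenitor" in bag, "#image" in bag)
```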

More discussion on all of this is needed.

Cheers
François
>
> --
> Patrick Dowler
> Canadian Astronomy Data Centre
> Victoria, BC, Canada
>
>
> On Mon, 4 Nov 2019 at 05:57, Markus Demleitner 
> <msdemlei at ari.uni-heidelberg.de 
> <mailto:msdemlei at ari.uni-heidelberg.de>> wrote:
>
>     Hi DAL,
>
>     On Tue, Oct 22, 2019 at 06:23:32PM +0200, François Bonnarel wrote:
>     > On 22/10/2019 at 10:53, Markus Demleitner wrote:
>     > > On Mon, Oct 21, 2019 at 05:38:32PM +0200, Petr Skoda wrote:
>     > > As far as I can see, there are two use cases in general for
>     datalink
>     > > semantics:
>     > >
>     > > (a) link filtering: The client, based on the semantics, selects a
>     [...]
>     > >
>     > > (b) figure out what to do with a link: When Aladin implemented
>     > > datalink, they found that based on what's in a datalink row, they
>     > > didn't know how to deal with a link: they'd like to send
>     spectra to
>     > > clients listening to spectrum.load.ssa-generic, images to those
>     > > listening to image.load.fits and so forth.  The datalink
>     content_type
>     > > column isn't quite sufficient for this, because
>     > > application/x-votable+xml can be a spectrum or an object catalog,
>     > > whereas image/fits might be some kind of cube or a plain image
>     (or an
>     > > IRAF spectrum, or still something else).  That's the "SAMP sending
>     > > use case" that, I think, was largely missed when we wrote
>     datalink.
>     >
>     > Well, that's strange because from the beginning some of us
>     (authors) had
>     > something like that in mind. Well not exactly "samp" but more
>     generally.
>     > What will the client do with this link. Try to manage it herself
>     and do
>
>     Be that as it may, the actual spec has failed to cover that use case
>     -- which is why we are here.
>
>     > > Having established this much, after a mail from Ada I had
>     another of
>     > > my dangerous epiphanies.  That is, if we really want to deal
>     with use
>     > > case (b) in semantics, we'll end up reproducing the
>     distinction that
>     > > VEP-0001 proposes in every branch: not only will we have
>     > >
>     > > #associated-cube #associated-image
>     #associated-radialvelocitycurve ...
>     > >
>     > > but also
>     > >
>     > > #derivation-cube #derivation-image
>     #derivation-radialvelocitycurve ...
>     > >
>     > > and (we've already seen use cases for that)
>     > >
>     > > #progenitor-cube #progenitor-image
>     #progenitor-radialvelocitycurve ...
>     >
>     > OK. This means that we are facing the three branches where the links
>     > target datasets or dataset excerpts.
>
>     I doubt it would be limited to these three; look at error-map, for
>     instance -- it stands to reason that error maps would, in general,
>     follow their "main" dataset's type, and hence you'd have
>
>     #error-cube #error-image #error-radialvelocitycurve...
>
>     I could make that point for noise and weight, again, and I suspect
>     for quite a few of the terms we may see in the future.
>
>     > > (3) Adding a dataproduct_type column in datalink. If we
>     started from
>     > > scratch, this is probably what I'd do.  As things are now... don't
>     > > know.  As for (2), this can start immediately (because
>     datalink lets
>     > > you add extra columns), and it would even have the
>     advantage that
>     > > clients that don't parse media types would still understand
>     > > content_type.
>     > Well, some other people (Alberto for example) have asked for
>     this. I'm
>     > reluctant because for most of the links this column will be
>     unused (most of
>     > the links usecase are not "dataproducts" at all). In general I
>     think we
>
>     That a column is empty for many links is not unusual in datalink (see
>     service_def and error_message in 1.0).  But also I suspect in most
>     datalink documents, the majority of links are actually "sendable" in
>     this sense: The progenitors and derivations of images and spectra, in
>     all likelihood, will be images and spectra again, as will #error,
>     #flat, #noise, #weight, and, of course, #this.
>
>     > should try to avoid adding columns in DataLink response and
>     should try to
>     > keep it simple. And especially when these columns come from
>     another spec
>
>     About the simplicity, as someone wanting to put this stuff into pyVO,
>     my personal choice between
>
>       Is semantics one of [#progenitor-image, #associated-image,
>         #derivation-image, #noise-image, #bias-image, #dark-image, ...]?
>
>     and
>
>       check the dataproduct_type column and, if there's a value, use that
>       to determine the default SAMP destinations
>
>     is fairly clear (in particular because I'll need the second logic
>     for Obscore anyway).
>
>     The one big downside that I can see with the dataproduct_type column
>     is that datalink 1.0 services won't have it for a long time (though
>     of course you can always just add the column to a 1.0 service, too).
>
>     But then even with a semantics-based solution for the SAMP-sending
>     case, the clients would depend on operators adopting the new terms,
>     which I wouldn't expect to be instantaneous.
>
>     Again, I'd like to hear from Datalink producers and consumers what
>     they think.  As for that, I'd still not count out the solution via
>     media type content parameters; this would be mighty useful far
>     beyond Datalink...
>
>             -- Markus
>
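
For reference, the second of Markus' options above (look at a dataproduct_type column and, if there is a value, derive a default SAMP destination) could be sketched roughly as follows. The dataproduct_type column is not part of DataLink 1.0, rows are modelled as plain dicts rather than any real client API, and the mapping contains only the two MTypes named in the thread; all function and variable names are illustrative assumptions.

```python
# MTypes named in the thread; any further entries would be assumptions.
DEFAULT_MTYPES = {
    "spectrum": "spectrum.load.ssa-generic",
    "image": "image.load.fits",
}


def default_mtype(row):
    """Return a SAMP MType for a datalink row, or None if there is no default.

    `row` is a plain dict standing in for one datalink row; the
    'dataproduct_type' key is the proposed extra column and may be absent.
    """
    product_type = row.get("dataproduct_type")
    if not product_type:
        return None
    return DEFAULT_MTYPES.get(product_type.strip().lower())


rows = [
    {"semantics": "#progenitor", "content_type": "image/fits",
     "dataproduct_type": "image"},
    {"semantics": "#preview", "content_type": "image/png"},
]
for row in rows:
    print(row["semantics"], default_mtype(row))
```

The lookup does not depend on semantics at all, so the same logic would serve for ObsCore rows.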
