[VEP-0001] DataLink semantics vocabulary enhancement proposal
François Bonnarel
francois.bonnarel at astro.unistra.fr
Fri Nov 8 18:57:39 CET 2019
Hi Pat, all,
On 06/11/2019 at 18:14, Patrick Dowler wrote:
> I agree with Markus' analysis, re-iterating I think the main points:
>
> 1. associated-data: although the term itself is quite redundant (all
> links are "associated" in datalink by definition) the concept of
> "sibling" data is sound: other data (of the same target?). To be
> clear, I think Markus is thinking that something is one of progenitor,
> derivation, or sibling. I'd like to find the best word for this but I
> like it.
The term "associated-data" has been used experimentally in VizieR for a
couple of years, outside of DataLink. It denotes a data product associated
with a catalogue or with a row (a source or whatever) in a catalogue.
I think GAVO is also using something like that.
Besides, is "sibling" appropriate for associating a row in a catalogue with
a data product such as an image or a time series (the underlying use cases)?
Anyway, we need a widely accepted "top-branch" term for this kind of
use case. Should we open a page for proposals?
>
> To check interpretation, I like to see if the tuple {link} {semantics}
> {ID} can sensibly be spoken as a sentence (with some filler articles):
>
> http://example.net/foo is-a-spectrum-of blah:123
>
> In that sense, it seems one can use dataproduct_type(s) to describe a
> relationship between a resource and an identified thing.
Yes, exactly what we had in mind for TimeDomain. All of these are sub-terms
of "associated-data/sibling".
But in addition, time series require sub-types (lightcurve,
radialvelocitycurve, etc.).
>
>
> 2. At the same time, the more SAMP-like use case of driving actions is
> depending on knowing what the resource at the end of the access_url
> *is*, not what the relationship is. That sounds more like a job for
> content-type or a new column and not for semantics. It's also
> potentially orthogonal to semantics (which I think gives rise to the
> explosion in number of terms Markus mentioned). Given that the
> current range of content types we work with (application/fits,
> text/x-votable+xml, application/x-hdf5, eg) don't say much of anything
> about the content to expect, parameterising like we do with
> content=datalink is a pretty straightforward solution. I think this
> works and conveys more information to clients independent of other
> enhancements we might make to the vocabulary or datalink spec.
> It could generally be a good thing to do wherever content-type is
> conveyed (ObsCore access_format, DataLink content_type, http
> Content-Type headers, etc).
Just to understand: semantics will be "associated-data/sibling", and in
that case you look at the dataproduct_type string after the semicolon in
content-type?
But the TimeDomain use cases (see Ada's talk at the last Interop) require
sub-typing (in ObsCore and DataLink).
Can we further use content-type for that?
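To make the question concrete, here is a minimal sketch of how a client
might read a product type out of a content-type parameter. Only
content=datalink is established usage; the value content=timeseries and the
helper name split_media_type are assumptions made up for illustration. The
parser is deliberately naive (it does not handle semicolons inside quoted
parameter values).

```python
def split_media_type(value):
    """Split a media type string into (type, {parameter: value})."""
    parts = [p.strip() for p in value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" not in part:
            continue  # skip malformed parameters without a value
        key, _, val = part.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return media_type, params


# Established usage: a DataLink document served as a VOTable.
print(split_media_type("application/x-votable+xml;content=datalink"))
# Hypothetical usage: the same mechanism carrying a product type.
print(split_media_type("application/fits; content=timeseries"))
```

A single flat parameter value like this has no obvious slot for a sub-type
such as a light curve, which is the open point here.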
>
> As an aside, I have been thinking about how to enable semantics to
> contain multiple tags. I have a few use cases where it would be nice
> to do that -- not sure how great an idea it is though. One thing it
> does is it more or less removes the need/desire to produce very
> similar looking trees of terms with different root terms. I intend to
> create a VOTable issue to explore how exactly to convey a "bag of terms"
> in a single table cell and a DataLink issue to explore multiple
> semantics tags. I wanted to mention it here in case it tweaks
> someone's imagination and because it seems peripherally related.
Indeed, this could allow using the dataproduct_type/dataproduct_subtype
branches in semantics in combination with "sibling/associated-data",
"progenitor", etc.
But you are right, this probably requires a change in VOTable, which has
only a char (with dimension) datatype for strings.
More discussion on all this is needed.
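As a rough sketch of what a "bag of terms" could look like to a client,
assume (purely for illustration; nothing of the sort is specified) that
several terms are stored space-separated in the single semantics string.
The function names and the example terms #sibling and #lightcurve are
likewise hypothetical.

```python
def semantics_terms(cell):
    """Return the set of terms in a space-separated semantics cell."""
    return set(cell.split())


def matches(cell, *wanted):
    """True if every wanted term is present in the cell."""
    return set(wanted) <= semantics_terms(cell)


row = "#sibling #lightcurve"  # hypothetical multi-term value

print(matches(row, "#sibling"))                 # True
print(matches(row, "#sibling", "#lightcurve"))  # True
print(matches(row, "#progenitor"))              # False
```

With independent tags, R relation terms and T product-type terms need
R + T vocabulary entries instead of the R x T compound terms of the
#progenitor-image kind.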
Cheers
François
>
> --
> Patrick Dowler
> Canadian Astronomy Data Centre
> Victoria, BC, Canada
>
>
> On Mon, 4 Nov 2019 at 05:57, Markus Demleitner
> <msdemlei at ari.uni-heidelberg.de
> <mailto:msdemlei at ari.uni-heidelberg.de>> wrote:
>
> Hi DAL,
>
> On Tue, Oct 22, 2019 at 06:23:32PM +0200, François Bonnarel wrote:
> > On 22/10/2019 at 10:53, Markus Demleitner wrote:
> > > On Mon, Oct 21, 2019 at 05:38:32PM +0200, Petr Skoda wrote:
> > > As far as I can see, there are two use cases in general for
> datalink
> > > semantics:
> > >
> > > (a) link filtering: The client, based on the semantics, selects a
> [...]
> > >
> > > (b) figure out what to do with a link: When Aladin implemented
> > > datalink, they found that based on what's in a datalink row, they
> > > didn't know how to deal with a link: they'd like to send
> spectra to
> > > clients listening to spectrum.load.ssa-generic, images to those
> > > listening to image.load.fits and so forth. The datalink
> content_type
> > > column isn't quite sufficient for this, because
> > > application/x-votable+xml can be a spectrum or an object catalog,
> > > whereas image/fits might be some kind of cube or a plain image
> (or an
> > > IRAF spectrum, or still something else). That's the "SAMP sending
> > > use case" that, I think, was largely missed when we wrote
> datalink.
> >
> > Well, that's strange because from the beginning some of us
> (authors) had
> > something like that in mind. Well not exactly "samp" but more
> generally.
> > What will the client do with this link. Try to manage it herself
> and do
>
> Be that as it may, the actual spec has failed to cover that use case
> -- which is why we are here.
>
> > > Having established this much, after a mail from Ada I had
> another of
> > > my dangerous epiphanies. That is, if we really want to deal
> with use
> > > case (b) in semantics, we'll end up reproducing the
> distinction that
> > > VEP-0001 proposes in every branch: not only will we have
> > >
> > > #associated-cube #associated-image
> #associated-radialvelocitycurve ...
> > >
> > > but also
> > >
> > > #derivation-cube #derivation-image
> #derivation-radialvelocitycurve ...
> > >
> > > and (we've already seen use cases for that)
> > >
> > > #progenitor-cube #progenitor-image
> #progenitor-radialvelocitycurve ...
> >
> > OK. This means that we are facing the three branches where the links
> > target datasets or dataset excerpts.
>
> I doubt it would be limited to these three; look at error-map, for
> instance -- it stands to reason that error maps would, in general,
> follow their "main" dataset's type, and hence you'd have
>
> #error-cube #error-image #error-radialvelocitycurve...
>
> I could make that point for noise and weight, again, and I suspect
> for quite a few of the terms we may see in the future.
>
> > > (3) Adding a dataproduct_type column in datalink. If we
> started from
> > > scratch, this is probably what I'd do. As things are now... don't
> > > know. As for (2), this can start immediately (because
> datalink lets
> > > you add extra columns), and it would even have the
> advantage that
> > > clients that don't parse media types would still understand
> > > content_type.
> > Well, some other people (Alberto for example) have asked for
> this. I'm
> > reluctant because for most of the links this column will be
> unused (most of
> > the links usecase are not "dataproducts" at all). In general I
> think we
>
> That a column is empty for many links is not unusual in datalink (see
> service_def and error_message in 1.0). But also I suspect in most
> datalink documents, the majority of links are actually "sendable" in
> this sense: The progenitors and derivations of images and spectra, in
> all likelihood, will be images and spectra again, as will #error,
> #flat, #noise, #weight, and, of course, #this.
>
> > should try to avoid adding columns in DataLink response and
> should try to
> > keep it simple. And especially when these columns come from
> another spec
>
> About the simplicity, as someone wanting to put this stuff into pyVO,
> my personal choice between
>
> Is semantics one of [#progenitor-image, #associated-image,
> #derivation-image, #noise-image, #bias-image, #dark-image, ...]?
>
> and
>
> check the dataproduct_type column and, if there's a value, use that
> to determine the default SAMP destinations
>
> is fairly clear (in particular because I'll need the second logic
> for Obscore anyway).
>
> The one big downside that I can see with the dataproduct_type column
> is that datalink 1.0 services won't have it for a long time (though
> of course you can always just add the column to a 1.0 service, too).
>
> But then even with a semantics-based solution for the SAMP-sending
> case, the clients would depend on operators adopting the new terms,
> which I wouldn't expect to be instantaneous.
>
> Again, I'd like to hear from Datalink producers and consumers what
> they think. As for that, I'd still not count out the solution via
> media type content parameters; this would be mighty useful far
> beyond Datalink...
>
> -- Markus
>
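The second option in Markus' comparison above (look at a dataproduct_type
column and, if it has a value, derive the default SAMP destination from it)
might look roughly like this. The two MTypes are the ones named in the
thread; the mapping from type names to MTypes, the row represented as a
dict, and the function name are assumptions for illustration, not part of
DataLink or pyVO.

```python
# Assumed mapping; only these two MTypes are mentioned in the thread.
DEFAULT_MTYPES = {
    "spectrum": "spectrum.load.ssa-generic",
    "image": "image.load.fits",
}


def default_mtype(row):
    """Return a SAMP MType for a datalink row, or None if undecided."""
    product_type = row.get("dataproduct_type")
    if not product_type:
        return None  # column absent or empty: no default destination
    return DEFAULT_MTYPES.get(product_type.strip().lower())


print(default_mtype({"semantics": "#progenitor", "dataproduct_type": "image"}))
print(default_mtype({"semantics": "#preview"}))
```

The check is independent of the semantics value, so the same logic would
serve for #this, #progenitor, #derivation or #error rows alike.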