[VEP-0001] DataLink semantics vocabulary enhancement proposal

Patrick Dowler pdowler.cadc at gmail.com
Tue Nov 12 18:21:49 CET 2019


On the ObsCore dataproduct_type and subtype, I also have the feeling
that the (optional) subtype isn't a highly useful construct when I contrast
it with the alternative of making dataproduct_type a real extensible
vocabulary. The catch, of course, would be to make it feasible for people
to query (e.g. in TAP+ADQL) a vocabulary column. Output is not a problem, but
querying right now would be by exact match only... it would be really cool
if you could do something like "where ivo_vocab_match(dataproduct_type,
'cube')" and that would match "cube" and child terms... or you could be
more specific (down to dataproduct_type = 'specific type').

I think I could feasibly handle this if the vocab function just dealt with
the base terms, e.g. "where ivo_vocab_base(dataproduct_type) = 'cube'" -- that
would give the same query power as now but allow extending the vocabulary
to more specific types. I think I like that better than a type/subtype
pair...
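
To make the two ideas concrete, here is a minimal Python sketch of what
the functions would compute. Everything in it is an assumption for
illustration: ivo_vocab_match and ivo_vocab_base do not exist in ADQL, the
parent table is hard-coded rather than loaded from a published vocabulary,
and "spectral-cube" is an invented child term.

```python
# Hypothetical parent links (term -> wider term, None for a base term).
# A real service would load these from the published vocabulary.
PARENT = {
    "cube": None,
    "image": None,
    "timeseries": None,
    "lightcurve": "timeseries",
    "radialvelocitycurve": "timeseries",
    "spectral-cube": "cube",  # invented child term, illustration only
}


def ancestors(term):
    """Yield the term itself, then each wider term up to the base term.

    Raises KeyError for a term that is not in the vocabulary.
    """
    while term is not None:
        yield term
        term = PARENT[term]


def vocab_base(term):
    """Sketch of ivo_vocab_base: return the top-level term for `term`."""
    base = term
    for base in ancestors(term):
        pass
    return base


def vocab_match(term, wanted):
    """Sketch of ivo_vocab_match: true if `term` is `wanted` or a child of it."""
    return wanted in ancestors(term)


# Values as they might appear in a dataproduct_type column.
rows = ["cube", "spectral-cube", "image", "lightcurve"]

# where ivo_vocab_match(dataproduct_type, 'cube')
cubes_and_children = [t for t in rows if vocab_match(t, "cube")]

# where ivo_vocab_base(dataproduct_type) = 'cube'
by_base = [t for t in rows if vocab_base(t) == "cube"]
```

With a tree-shaped vocabulary, both forms select the same rows when the
term asked for is a base term; the match form can additionally be asked
about a narrower term such as "lightcurve".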

Thoughts?

--
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada


On Fri, 8 Nov 2019 at 09:57, François Bonnarel <
francois.bonnarel at astro.unistra.fr> wrote:

> Hi Pat, all,
>
> On 06/11/2019 at 18:14, Patrick Dowler wrote:
>
> I agree with Markus' analysis, re-iterating I think the main points:
>
> 1. associated-data: although the term itself is quite redundant (all links
> are "associated" in datalink by definition) the concept of "sibling" data
> is sound: other data (of the same target?). To be clear, I think Markus is
> thinking that something is one of progenitor, derivation, or sibling. I'd
> like to find the best word for this but I like it.
>
> The term "associated-data" has been used experimentally in VizieR for a
> couple of years, outside of DataLink. It means some dataproduct associated
> with a catalog or with a row (source or whatever) in a catalog.
> I think GAVO is also using something like that.
>
> Besides, is "sibling" appropriate for associating a row in a catalog with a
> dataproduct such as an image or a timeseries (the underlying use cases)?
>
> Anyway, we need a widely accepted "top-branch" term for this kind of
> use case. Should we open a page for proposals?
>
>
>
> To check interpretation, I like to see if the tuple {link} {semantics}
> {ID} can sensibly be spoken as a sentence (with some filler articles):
>
> http://example.net/foo is-a-spectrum-of blah:123
>
> In that sense, it seems one can use dataproduct_type(s) to describe a
> relationship between a resource and an identified thing.
>
> Yes, exactly what we had in mind for TimeDomain. All these are sub-terms
> of "associated-data/sibling".
> But in addition, timeseries require sub-types (lightcurve,
> radialvelocitycurve, etc.)
>
>
>
> 2. At the same time, the more SAMP-like use case of driving actions is
> depending on knowing what the resource at the end of the access_url *is*,
> not what the relationship is. That sounds more like a job for content-type
> or a new column and not for semantics. It's also potentially orthogonal to
> semantics (which I think gives rise to the explosion in number of terms
> Markus mentioned). Given that the current range of content types we work
> with (application/fits, text/x-votable+xml, application/x-hdf5, e.g.) don't
> say much of anything about the content to expect, parameterising like we do
> with content=datalink is a pretty straightforward solution. I think this
> works and conveys more information to clients independent of other
> enhancements we might make to the vocabulary or datalink spec.
> It could generally be a good thing to do wherever content-type is conveyed
> (ObsCore access_format, DataLink content_type, http Content-Type headers,
> etc).
>
> Just to understand: semantics will be "associated-data/sibling", and in
> that case you look at the dataproduct_type string after the semicolon in
> content-type?
> But the TimeDomain use cases (see Ada's talk at the last Interop) require
> sub-typing (in ObsCore and DataLink).
> Can we further use content-type for that?
>
>
> As an aside, I have been thinking about how to enable semantics to contain
> multiple tags. I have a few use cases where it would be nice to do that --
> not sure how great an idea it is though. One thing it does is that it more
> or less removes the need/desire to produce very similar looking trees of
> terms with different root terms. I intend to create a VOTable issue to
> explore how exactly to convey a "bag of terms" in a single table cell and a
> DataLink issue to explore multiple semantics tags. I wanted to mention it
> here in case it tweaks someone's imagination and because it seems
> peripherally related.
>
> Indeed, this could allow using the dataproduct_type/dataproduct_subtype
> branches in semantics in combination with "sibling/associated-data",
> "progenitor", etc.
>
> But you are right, this probably requires a change in VOTable, which has
> only a char (with dimension) datatype for strings.
>
>
>
> More discussion on all this needed.
>
> Cheers
> François
>
>
> --
> Patrick Dowler
> Canadian Astronomy Data Centre
> Victoria, BC, Canada
>
>
> On Mon, 4 Nov 2019 at 05:57, Markus Demleitner <
> msdemlei at ari.uni-heidelberg.de> wrote:
>
>> Hi DAL,
>>
>> On Tue, Oct 22, 2019 at 06:23:32PM +0200, François Bonnarel wrote:
>> > On 22/10/2019 at 10:53, Markus Demleitner wrote:
>> > > On Mon, Oct 21, 2019 at 05:38:32PM +0200, Petr Skoda wrote:
>> > > As far as I can see, there are two use cases in general for datalink
>> > > semantics:
>> > >
>> > > (a) link filtering: The client, based on the semantics, selects a
>> [...]
>> > >
>> > > (b) figure out what to do with a link: When Aladin implemented
>> > > datalink, they found that based on what's in a datalink row, they
>> > > didn't know how to deal with a link: they'd like to send spectra to
>> > > clients listening to spectrum.load.ssa-generic, images to those
>> > > listening to image.load.fits and so forth.  The datalink content_type
>> > > column isn't quite sufficient for this, because
>> > > application/x-votable+xml can be a spectrum or an object catalog,
>> > > whereas image/fits might be some kind of cube or a plain image (or an
>> > > IRAF spectrum, or still something else).  That's the "SAMP sending
>> > > use case" that, I think, was largely missed when we wrote datalink.
>> >
>> > Well, that's strange because from the beginning some of us (authors) had
>> > something like that in mind. Well not exactly "samp" but more generally.
>> > What will the client do with this link. Try to manage it herself and do
>>
>> Be that as it may, the actual spec has failed to cover that use case
>> -- which is why we are here.
>>
>> > > Having established this much, after a mail from Ada I had another of
>> > > my dangerous epiphanies.  That is, if we really want to deal with use
>> > > case (b) in semantics, we'll end up reproducing the distinction that
>> > > VEP-0001 proposes in every branch: not only will we have
>> > >
>> > > #associated-cube #associated-image #associated-radialvelocitycurve ...
>> > >
>> > > but also
>> > >
>> > > #derivation-cube #derivation-image #derivation-radialvelocitycurve ...
>> > >
>> > > and (we've already seen use cases for that)
>> > >
>> > > #progenitor-cube #progenitor-image #progenitor-radialvelocitycurve ...
>> >
>> > OK. This means that we are facing the three branches where the links
>> > point to datasets or dataset excerpts.
>>
>> I doubt it would be limited to these three; look at error-map, for
>> instance -- it stands to reason that error maps would, in general,
>> follow their "main" dataset's type, and hence you'd have
>>
>> #error-cube #error-image #error-radialvelocitycurve...
>>
>> I could make that point for noise and weight, again, and I suspect
>> for quite a few of the terms we may see in the future.
>>
>> > > (3) Adding a dataproduct_type column in datalink.  If we started from
>> > > scratch, this is probably what I'd do.  As things are now... don't
>> > > know.  As for (2), this can start immediately (because datalink lets
>> > > you add extra columns), and it would even have the advantage that
>> > > clients that don't parse media types would still understand
>> > > content_type.
>> > Well, some other people (Alberto for example) have asked for this. I'm
>> > reluctant because for most of the links this column will be unused
>> > (most of the link use cases are not "dataproducts" at all). In general
>> > I think we
>>
>> That a column is empty for many links is not unusual in datalink (see
>> service_def and error_message in 1.0).  But also I suspect in most
>> datalink documents, the majority of links are actually "sendable" in
>> this sense: The progenitors and derivations of images and spectra, in
>> all likelihood, will be images and spectra again, as will #error,
>> #flat, #noise, #weight, and, of course, #this.
>>
>> > should try to avoid adding columns in DataLink response and should try
>> > to keep it simple. And especially when these columns come from another
>> > spec
>>
>> About the simplicity, as someone wanting to put this stuff into pyVO,
>> my personal choice between
>>
>>   Is semantics one of [#progenitor-image, #associated-image,
>>     #derivation-image, #noise-image, #bias-image, #dark-image, ...]?
>>
>> and
>>
>>   check the dataproduct_type column and, if there's a value, use that
>>   to determine the default SAMP destinations
>>
>> is fairly clear (in particular because I'll need the second logic
>> for Obscore anyway).
>>
>> The one big downside that I can see with the dataproduct_type column
>> is that datalink 1.0 services won't have it for a long time (though
>> of course you can always just add the column to a 1.0 service, too).
>>
>> But then even with a semantics-based solution for the SAMP-sending
>> case, the clients would depend on operators adopting the new terms,
>> which I wouldn't expect to be instantaneous.
>>
>> Again, I'd like to hear from Datalink producers and consumers what
>> they think.  As for that, I'd still not count out the solution via
>> media type content parameters; this would be mighty useful far
>> beyond Datalink...
>>
>>         -- Markus
>>
>
>