[VEP-0001] DataLink semantics vocabulary enhancement proposal

Fri Nov 8 18:35:39 CET 2019

Hi Markus, all


On 04/11/2019 at 14:56, Markus Demleitner wrote:
> Hi DAL,
>
> On Tue, Oct 22, 2019 at 06:23:32PM +0200, François Bonnarel wrote:
>> On 22/10/2019 at 10:53, Markus Demleitner wrote:
>>> On Mon, Oct 21, 2019 at 05:38:32PM +0200, Petr Skoda wrote:
>>> As far as I can see, there are two use cases in general for datalink
>>> semantics:
>>>
>>> (a) link filtering: The client, based on the semantics, selects a
> [...]
>>> (b) figure out what to do with a link: When Aladin implemented
>>> datalink, they found that based on what's in a datalink row, they
>>> didn't know how to deal with a link: they'd like to send spectra to
>>> clients listening to spectrum.load.ssa-generic, images to those
>>> listening to image.load.fits and so forth.  The datalink content_type
>>> column isn't quite sufficient for this, because
>>> application/x-votable+xml can be a spectrum or an object catalog,
>>> whereas image/fits might be some kind of cube or a plain image (or an
>>> IRAF spectrum, or still something else).  That's the "SAMP sending
>>> use case" that, I think, was largely missed when we wrote datalink.
>> Well, that's strange because from the beginning some of us (authors) had
>> something like that in mind. Well not exactly "samp" but more generally.
>> What will the client do with this link. Try to manage it herself and do
> Be that as it may, the actual spec has failed to cover that use case
> -- which is why we are here.
I don't understand this statement.
When "semantics" equals "progenitor" and the content-type is
"application/fits", the client knows what to do and, where relevant,
which SAMP communication to use.
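
A minimal sketch of that client-side decision, assuming a hypothetical
lookup table keyed on the (semantics, content_type) pair. The table
entries and the function name are illustrative assumptions, not
something the DataLink spec defines:

```python
from typing import Optional

# Hypothetical client-side table: (semantics, content_type) -> SAMP MType.
# The entries are illustrative assumptions, not part of the DataLink spec.
SAMP_DISPATCH = {
    ("#progenitor", "application/fits"): "image.load.fits",
    ("#this", "application/fits"): "image.load.fits",
}


def samp_mtype(semantics: str, content_type: str) -> Optional[str]:
    """Return the MType to broadcast, or None if the pair is not in the table."""
    # Drop media type parameters (anything after ';') before the lookup.
    base_type = content_type.split(";", 1)[0].strip().lower()
    return SAMP_DISPATCH.get((semantics, base_type))
```

A pair that is missing from the table (for example
application/x-votable+xml, which can be a spectrum or an object
catalog) yields None, and that is where wider semantics or another
hint would be needed.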

We now have more use cases, so we need wider semantics (and maybe
wider content-type).
>
>>> Having established this much, after a mail from Ada I had another of
>>> my dangerous epiphanies.  That is, if we really want to deal with use
>>> case (b) in semantics, we'll end up reproducing the distinction that
>>> VEP-0001 proposes in every branch: not only will we have
>>>
>>> #associated-cube #associated-image #associated-radialvelocitycurve ...
>>>
>>> but also
>>>
>>> #derivation-cube #derivation-image #derivation-radialvelocitycurve ...
>>>
>>> and (we've already seen use cases for that)
>>>
>>> #progenitor-cube #progenitor-image #progenitor-radialvelocitycurve ...
>> OK. This means that we are facing the three branches where the link targets
>> are datasets or dataset excerpts.
> I doubt it would be limited to these three; look at error-map, for
> instance -- it stands to reason that error maps would, in general,
> follow their "main" dataset's type, and hence you'd have
>
> #error-cube #error-image #error-radialvelocitycurve...
>
> I could make that point for noise and weight, again, and I suspect
> for quite a few of the terms we may see in the future.
OK, that's true.
But the problem really occurs when the product type of the target and 
the product type of the initial item are different.
>
>>> (3) Adding a dataproduct_type column in datalink.  If we started from
>>> scratch, this is probably what I'd do.  As things are now... don't
>>> know.  As for (2), this can start immediately (because datalink lets
>>> you add extra columns), and it would even have the advantage that
>>> clients that don't parse media types would still understand
>>> content_type.
>> Well, some other people (Alberto for example) have asked for this. I'm
>> reluctant because for most of the links this column will be unused (most of
>> the link use cases are not "dataproducts" at all). In general I think we
> That a column is empty for many links is not unusual in datalink (see
> service_def and error_message in 1.0).
For sure, these three come together: they are three mutually exclusive
options of the same "linkage" functionality.
> But also I suspect in most
> datalink documents, the majority of links are actually "sendable" in
> this sense: The progenitors and derivations of images and spectra, in
> all likelihood, will be images and spectra again, as will #error,
> #flat, #noise, #weight, and, of course, #this.
I agree, and that's why I am a little reluctant to add this typical
Obscore column "dataproduct_type" to DataLink.
>
>> should try to avoid adding columns in DataLink response and should try to
>> keep it simple. And especially when these columns come from another spec
> About the simplicity, as someone wanting to put this stuff into pyVO,
> my personal choice between
>
>    Is semantics one of [#progenitor-image, #associated-image,
>      #derivation-image, #noise-image, #bias-image, #dark-image, ...]?
>
> and
>
>    check the dataproduct_type column and, if there's a value, use that
>    to determine the default SAMP destinations
Any other advice on this, DAL folks?
I thought checking a wider range of semantics values was easier
than managing a possible additional column.
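
To make the two options concrete, here is a rough sketch of both
checks side by side. The suffixed terms are the hypothetical ones
quoted above (they are not in the vocabulary), and the
dataproduct_type column and the function names are assumptions made
only for illustration:

```python
from typing import Optional

# Option 1: the product type is encoded in the semantics term itself.
# These suffixed terms are hypothetical; the list would grow with each branch.
IMAGE_TERMS = {
    "#progenitor-image",
    "#associated-image",
    "#derivation-image",
    "#noise-image",
    "#bias-image",
    "#dark-image",
}


def is_image_by_semantics(semantics: str) -> bool:
    """True if the (hypothetical) semantics term marks the target as an image."""
    return semantics in IMAGE_TERMS


# Option 2: a separate dataproduct_type column (an assumed extra column).
DEFAULT_MTYPES = {
    "image": "image.load.fits",
    "spectrum": "spectrum.load.ssa-generic",
}


def mtype_by_product_type(dataproduct_type: Optional[str]) -> Optional[str]:
    """Default SAMP MType for a product type, or None if empty or unknown."""
    if not dataproduct_type:
        return None
    return DEFAULT_MTYPES.get(dataproduct_type.lower())
```

With option 1 the client has to keep the term list up to date; with
option 2 it has to cope with services that do not provide the column.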
>
> is fairly clear (in particular because I'll need the second logic
> for Obscore anyway).
Hmm, in Obscore dataproduct_type will not tell you what to do with
the access_url target until you check access_format anyway (is it a
DataLink or a direct access?).
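
A small sketch of that access_format check, assuming the service
labels DataLink documents with the media type
application/x-votable+xml;content=datalink. The function name is
hypothetical, and quoted parameter values are not handled:

```python
def is_datalink(access_format: str) -> bool:
    """True if an Obscore access_format announces a DataLink document."""
    # Split "type/subtype; param=value; ..." into the media type and parameters.
    parts = [part.strip().lower() for part in access_format.split(";")]
    media_type, params = parts[0], parts[1:]
    return media_type == "application/x-votable+xml" and "content=datalink" in params
```

Only when this returns False does access_url point directly at the
data, and only then does dataproduct_type describe what the client
will actually download.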
>
> The one big downside that I can see with the dataproduct_type column
> is that datalink 1.0 services won't have it for a long time
I agree with this one.
> (though
> of course you can always just add the column to a 1.0 service, too).
>
> But then even with a semantics-based solution for the SAMP-sending
> case, the clients would depend on operators adopting the new terms,
> which I wouldn't expect to be instantaneous.
>
> Again, I'd like to hear from Datalink producers and consumers what
> they think.  As for that, I'd still not count out the solution via
> media type content parameters; this would be mighty useful far
> beyond Datalink...
OK, this point is in Pat's message.

Cheers
François
>
>          -- Markus