DataLink RFC period announcement

Mon Jul 21 03:10:43 PDT 2014

Hi Markus, all
some comments below, between the lines

2014-07-17 15:03 GMT+02:00 Markus Demleitner <msdemlei at ari.uni-heidelberg.de>:

> Dear DAL,
>
> Jose provided feedback on Datalink over on interop at ivoa.net.  In case
> there's people on dal at ivoa.net not on interop at ivoa.net, I take the
> liberty of full-quoting Jose; I've also interspersed some comments of
> my own into his.
>
>
Thanks for forwarding to the right list, I just replied-all to the email of
the RFC announcement.
What are the right steps and mailing lists to use when providing RFC feedback?

> >
> > Page 15. 3.2.4 service_def
> > I would use the value in <PARAM name="accessURL"> to call the service
> > instead of the one present in field access_url provided by the DataLink
> > VOTable response. Why keeping two potentially different values of
> > accessURL? Maybe I'm missing or misunderstanding something that's not
> > clearly explained..
>
> The problem is that there are two usages for the service descriptors:
>
> (a) as part of a datalink response, where there is, as you say,
> access_url in the datalink table as for any other data link;
>
> (b) as part of a DAL response (say, a SIAP table), where you say "go
> here for postprocessed (cutout, resampled...) data" -- that's the
> thing with the PARAM name="ID" ref="".  In this case, no external
> access URL is available and hence the GROUP must contain it.
>
> One could stipulate that service descriptors within datalink
> documents have no accessURL PARAM and the others do, but I'd say
> that's an implementation complication that's not really warranted.
>
>
Ok, but what should happen when the access_url field and the accessURL PARAM
hold different values? The document explicitly describes this case and, as I
understand it, the proposed solution is to call the service with the
access_url value (or at least this is not clear enough to me). If so, I
would prefer to use the value given in the accessURL PARAM instead, as one
would when calling a service after a DAL response (case b).
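
To illustrate the behaviour I would prefer, here is a minimal Python sketch.
It is not what the draft specifies: the descriptor layout is simplified (no
XML namespace), and the example.org URLs, the ID value and the helper names
descriptor_access_url and choose_access_url are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified service descriptor (assumption: a GROUP holding
# an accessURL PARAM, with no XML namespace, for brevity).
DESCRIPTOR = """
<GROUP ID="cutout-svc" utype="adhoc:service">
  <PARAM name="accessURL" datatype="char" arraysize="*"
         value="http://example.org/cutout"/>
</GROUP>
"""


def descriptor_access_url(group):
    """Return the value of the accessURL PARAM, or None if it is absent."""
    for param in group.findall("PARAM"):
        if param.get("name") == "accessURL":
            return param.get("value")
    return None


def choose_access_url(group, row_access_url):
    """Prefer the descriptor's accessURL PARAM; fall back to the row value."""
    return descriptor_access_url(group) or row_access_url


group = ET.fromstring(DESCRIPTOR)
# The row's access_url differs on purpose, to show which value wins.
chosen = choose_access_url(group, "http://example.org/other")
```

With this rule a client behaves the same way in cases (a) and (b), and the
access_url field only matters when the descriptor carries no accessURL PARAM.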

> > Page 18. Table 2: Error Messages
> > I do not think a NotFoundError may be taken as an error, but as a zero
> > results response (as it is the case for most DAL services) Moreover, the
> > zero response result may allow the inspection of the number and nature of
> > the rows of the VOTable, in the case this response is always the same for
> > any ID.
>
> With not-found situations the server may want to add some explanation
> ("This identifier is not from this site" versus "We seem to have lost
> this file").  We should at least provide it with a means to do this,
> hence the NotFoundError.
>
>
Ok, though some could argue that DAL services in general do not provide
this kind of message for "zero records found" responses in queries with
multiple-valued params. I guess they simply skip to the next value.

> Whether it's a good idea to mandate at least one row per ID (up to
> the match limit) and have errors in every case may not be quite as
> clear-cut.  I have to say I'm on the side of one row per ID, but I
> don't have terribly strong arguments for that.  Well, of course
> there's  the general rule that silent failures are bad.  Except when
> they aren't and silent failures are what preserves what's left of the
> user's sanity.  Hm.  No easy answer.
>
>
If I follow your arguments, I would say we could also have different
explanations for errors found when creating different links for the same
ID (e.g. some services not being designed to work with a specific dataset).
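
As a rough client-side sketch of that idea in Python, assuming a per-row
error_message column next to access_url (the rows, IDs and URLs below are
invented, and group_by_id is a hypothetical helper), different links of the
same ID could carry different explanations:

```python
from collections import defaultdict

# Invented rows, as a client might see them after parsing a response.
rows = [
    {"ID": "ivo://example/ds1", "access_url": "http://example.org/ds1.fits",
     "error_message": None},
    {"ID": "ivo://example/ds1", "access_url": None,
     "error_message": "cutout service not available for this dataset"},
    {"ID": "ivo://example/ds2", "access_url": None,
     "error_message": "NotFoundError: unknown identifier"},
]


def group_by_id(rows):
    """Map each ID to its usable links and to its error messages."""
    result = defaultdict(lambda: {"links": [], "errors": []})
    for row in rows:
        entry = result[row["ID"]]
        if row.get("error_message"):
            entry["errors"].append(row["error_message"])
        else:
            entry["links"].append(row["access_url"])
    return dict(result)


by_id = group_by_id(rows)
```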

> > Page 21. -24.
> > 4.3 Example: Service Descriptor for an SIA-1.0 Service
> > 4.4 Example: Custom Access Data Service
> >
> > Should we add use="required" to PARAM tags describing mandatory input
> > params?
>
> use="required" isn't available in VOTable.  And I'd argue that's not
> a big loss anyway, as typically relations between parameters are more
> complex than that ("if you give RA_MAX, you cannot give any of
> PIX_*").  We know how to say these complex things in PDL, and I'd
> hope in a future version we can add VO-DML-based PDL annotation to
> the PARAMs that would be able to express this kind of thing.
>
>
Ok, fair enough

>
> > Finally, a major point.
> > I think it would be very useful to give the possibility to add a <GROUP
> > name="outputParams"> describing in detail a tabular response of a Custom
> > Access Data Service. Self-described web services in terms of I/O params
> > opens the window to web services interoperability, going beyond data
> > interoperability.
>
> I'm not sure I find this convincing -- for one, most of the services
> described by datalink groups probably will put out data that's not
> obviously tabular in nature (i.e., images and such).

In my view, this is not a reason to forbid this optional use.
I could say many DataLink services will not provide links to adhoc
services, and this does not forbid the use of the adhoc services
description syntax in the DataLink response when it is needed.

> For two, the
> output column metadata in tabular data should really, really be
> contained in the response (as in VOTable and to some degree FITS
> binary), which is where the clients should get it from.
>
>
Yes. VOTables should be accurately self-described, though I do not see why
this should go against describing them also in services as their output.

> For *discovering* services by output table structure ("which services
> return normalised fluxes?"), that's admittedly not good enough, but
> that's a Registry problem (which I still don't consider terribly
> relevant to *data*link).
>
>
I see DataLink as a very powerful way to discover generic adhoc services
not present in the VO Registry. In this sense, I find DataLink somewhat
related to service discovery use cases. This is why we are talking about
things like [use=required] (present in VOSI-capabilities but not in
VOTable), VO-DML-based PDL annotation, and descriptions of service outputs.

In my opinion, the description of a service would benefit from a syntax
that also allows the description of its outputs (going beyond the
human-readable text in the description field of the DataLink response), and
for tabular outputs the solution is quite straightforward and simple, so why
forbid it?

We have in DALI the MAXREC=0 mechanism to provide a description of service
outputs, where the service is not required to execute any specific request
(it is just a means to provide a simple hard-coded description of the tabular
outputs). I guess this mechanism has been adopted and approved because there
are use cases behind it, and I guess they may also be valid for DataLink.
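
For reference, a metadata-only request of this kind can be built as in the
Python sketch below. The endpoint URL, the ID value and the helper name
metadata_only_url are assumptions for illustration, and it presumes the
service honours MAXREC as DALI describes it.

```python
from urllib.parse import urlencode


def metadata_only_url(base_url, **params):
    """Build a request URL that adds MAXREC=0 to the other parameters."""
    query = dict(params, MAXREC=0)
    return base_url + "?" + urlencode(query)


# Hypothetical endpoint and dataset identifier.
url = metadata_only_url("http://example.org/datalink", ID="ivo://example/ds1")
```

The response to such a request would carry only the table metadata (FIELDs
and PARAMs) and no rows.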

> >
> > In the same spirit, I think we should agree on an optional mechanism to
> > provide a detailed description of the number and nature of the links given
> > by the datalink service (rows of the response VOTable), in the case this
> > response is always the same for any ID.
>
> This sounds interesting and the first requirement that might
> necessitate a registry extension for datalink.

Well, I'm not going that far..
I think this could be solved just adopting the <GROUP name="outputParams">
mechanism
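
To show roughly what I have in mind, the Python sketch below builds such a
GROUP. This is only my proposal, not syntax defined in the DataLink draft;
the column names, the choice of PARAM elements with empty values and the
helper output_params_group are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def output_params_group(columns):
    """Build a <GROUP name="outputParams"> with one PARAM per output column.

    `columns` is a list of (name, datatype, ucd) tuples. The layout is a
    proposal for discussion, not part of any specification.
    """
    group = ET.Element("GROUP", name="outputParams")
    for name, datatype, ucd in columns:
        ET.SubElement(group, "PARAM", name=name, datatype=datatype,
                      ucd=ucd, value="")
    return group


# Invented description of a two-column tabular output.
group = output_params_group([
    ("wavelength", "double", "em.wl"),
    ("flux", "double", "phot.flux.density"),
])
xml_text = ET.tostring(group, encoding="unicode")
```

A client could read this block before calling the service and know which
columns to expect in a tabular response.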

> I don't think anyone
> is wild about having to define one, and the document has been careful
> not to introduce some dependency on it, but if we collect use cases
> that call for it, it's probably not prohibitively hard to do, either.
> What use cases do you have in mind that would be solved by such a
> description?
>
>
Ok, all use cases are based on those DataLink services that provide the
same number and the same nature of rows for any ID/ObsPubDID. These
DataLink services may be seen as services that always offer the same
*specific pack of links* for every dataset.

For example, consider three different data providers as three SIA services.
One person would like to know that for the first SIA service the
complementary DataLink provides a set of links with progenitors and
provenance metadata, for the second SIA service the proposed DataLink
service has a very different nature, providing cutouts and one specific
analysis service, while the third SIA service offers only related
bibliography through a different DataLink service. The different natures
of these three DataLink services could be known in advance, before actually
calling them.

The specific nature of complementary DataLink services should not be at all
restricted or categorised; just think of any potentially accessible
resource on the web that could be *linked*, even outside the VO world: related
bibliography (ADS), SIMBAD or NED objects in the FoV, non-VO services like
those coming from SDSS or SkyView, or even simple doc-like HTML pages.

Cheers !

--
Jose Enrique Ruiz
Instituto Astrofisica Andalucía - CSIC
Glorieta de la Astronomía s/n
18009 Granada, Spain
Tel: +34 958 230 618