DataLink RFC period announcement

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Thu Jul 17 06:03:51 PDT 2014


Dear DAL,

Jose provided feedback on Datalink over on interop at ivoa.net.  In case
there are people on dal at ivoa.net who are not on interop at ivoa.net, I take the
liberty of full-quoting Jose; I've also interspersed some comments of
my own into his.

On Mon, Jul 14, 2014 at 09:34:04AM +0200, Jose Enrique Ruiz wrote:
> Hi François, all
> please find some comments below:
>
> ---
> All along the doc: several unresolved [ref] placeholders to fix.
> 
> page 1.
> http://www.ivoa.net/documents/DataLink/20140228/index.html is duplicated
> 
> page 5. 2nd p.
> I would stress the fact that the service descriptor resource describes
> how to *query* a service. It does not describe in detail, for example, what
> the service returns.
> 
> Page 7. 3rd p.
> At the end of the paragraph: "[The s]"
> 
> Page 7. 4th p.
> Is this actually the same use case as 1.2.1?
> 
> Page 8. 2nd p.
> At the end of the paragraph: "Providers should be able to describe [...]"
> 
> Page 11
> Remove block describing param REQUEST, since it is no longer required.
> 
> Page 15. 2nd paragraph
> "This resource [is] typically describes.."
> 
> Page 15. 3.2.4 service_def
> I would use the value in <PARAM name="accessURL" to call the service
> instead of the one present in field access_url provided by the DataLink
> VOTable response. Why keep two potentially different values of
> accessURL? Maybe I'm missing or misunderstanding something that's not
> clearly explained.

The problem is that there are two usages for the service descriptors:

(a) as part of a datalink response, where there is, as you say,
access_url in the datalink table as for any other data link;

(b) as part of a DAL response (say, a SIAP table), where you say "go
here for postprocessed (cutout, resampled...) data" -- that's the
thing with the PARAM name="ID" ref="".  In this case, no external
access URL is available and hence the GROUP must contain it.

One could stipulate that service descriptors within datalink
documents have no accessURL PARAM and the others do, but I'd say
that's an implementation complication that's not really warranted.
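To make usage (b) concrete, here is a minimal Python sketch of what a client might do with a service descriptor embedded in a DAL response: take the accessURL from the descriptor itself, fill the ID parameter from the column its ref points to, and fall back to the provider's defaults for the rest. The RESOURCE utype (adhoc:service), the GROUP name (inputParams), the column and all example.org URLs are assumptions for illustration; the draft may spell them differently, and VOTable namespaces are omitted for brevity.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical DAL response with an embedded service descriptor.
# The utype, GROUP name, column and URLs are illustrative assumptions.
DESCRIPTOR = """
<VOTABLE>
  <RESOURCE type="results">
    <TABLE>
      <FIELD name="obs_publisher_did" ID="pubdid" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>ivo://example.org/data?img-001</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
  <RESOURCE type="meta" utype="adhoc:service">
    <PARAM name="accessURL" datatype="char" arraysize="*"
           value="http://example.org/cutout"/>
    <GROUP name="inputParams">
      <PARAM name="ID" datatype="char" arraysize="*" value="" ref="pubdid"/>
      <PARAM name="SIZE" datatype="double" unit="deg" value="0.1"/>
    </GROUP>
  </RESOURCE>
</VOTABLE>
"""

def build_query_url(votable_text: str, row_index: int = 0) -> str:
    """Build a GET URL for the described service, for one result row."""
    root = ET.fromstring(votable_text.strip())

    # Locate the results table and the cells of the chosen row.
    table = root.find("./RESOURCE[@type='results']/TABLE")
    field_ids = [f.get("ID") for f in table.findall("FIELD")]
    row = table.findall("./DATA/TABLEDATA/TR")[row_index]
    cells = [td.text or "" for td in row.findall("TD")]

    # Outside a datalink table there is no access_url column,
    # so the descriptor must carry its own accessURL PARAM.
    service = root.find("./RESOURCE[@utype='adhoc:service']")
    access_url = service.find("./PARAM[@name='accessURL']").get("value")

    params = {}
    for p in service.findall("./GROUP[@name='inputParams']/PARAM"):
        if p.get("ref"):
            # ref points at a FIELD; take that column's value in this row.
            params[p.get("name")] = cells[field_ids.index(p.get("ref"))]
        elif p.get("value"):
            # Otherwise use the provider's default, if one is given.
            params[p.get("name")] = p.get("value")
        # Parameters with neither are left for the user to supply.
    return access_url + "?" + urlencode(params)

print(build_query_url(DESCRIPTOR))
```

In a datalink response (usage (a)), the same descriptor would simply be reached through the service_def column, and the accessURL PARAM would be read in exactly the same way, which is the point of not special-casing it.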

> 
> Page 16.
> http://www.ivoa.net/rdf/datalink does not exist. 404

In general, a 404 is almost fine, as the URL really only defines a
name space, and in this role there's no requirement that it resolve to
anything at all.  In our case, though, we promise there's an RDF file
there that would let people figure out semantic relationships between
the various terms that are there (e.g., a "flatfield" is some kind of
"file used in data reduction").

Things still work with the 404, but it'd suck if it were still there at REC
time.  So, is anyone actively working on getting the vocabulary in?
Can we discuss it a bit, too?
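As an illustration of what such an RDF file would buy clients, here is a minimal Python sketch that answers "is this link some kind of X?" by following broader-term relations transitively. The term names and the dictionary standing in for the parsed RDF are hypothetical; nothing here reflects the actual content or structure of the vocabulary.

```python
# Hypothetical stand-in for a parsed vocabulary: each term maps to its
# broader terms (skos:broader-style). Term names are illustrative only.
BROADER = {
    "flatfield": {"calibration"},
    "bias": {"calibration"},
    "calibration": {"auxiliary"},
    "preview": {"auxiliary"},
}

def is_a(term: str, ancestor: str) -> bool:
    """True if `ancestor` is `term` itself or reachable via broader links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(BROADER.get(current, ()))
    return False

print(is_a("flatfield", "calibration"))  # True
print(is_a("preview", "calibration"))    # False
```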


> Page 16. 3.2.7 content_type and content_length
> If the link is a pointer to an ad-hoc service, it may happen
> that content_type and content_length cannot be defined before calling it
> with specific input params chosen by the user. I'm thinking of a service that
> generates images on-the-fly, and based on the input params this result
> image may be very different in size, and its format may be png, jpeg or
> fits. Which values should content_type and content_length take in these
> cases? Blank?

Yes to blank/null.  I'd argue that is implied in the required=no in
Table 1.  I seem to remember there once was prose making this a bit more
explicit in previous versions; I'm not sure how much I miss it now.
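A client consuming such rows should then read a blank content_type or content_length as "unknown" rather than as zero or an error. A minimal Python sketch, assuming each row of the links table has already been read into a dict keyed by column name (an assumption; a real client would use a VOTable parser), and using a hypothetical service_def value "render-svc":

```python
from typing import Dict, Optional

def parse_content_length(cell: Optional[str]) -> Optional[int]:
    """Declared size in bytes, or None when the provider left the cell blank/null."""
    if cell is None or not cell.strip():
        return None
    return int(cell)

def describe_link(row: Dict[str, Optional[str]]) -> str:
    """One-line summary of a link row; blank metadata is reported as unknown."""
    target = row.get("access_url") or f"service {row.get('service_def')}"
    content_type = (row.get("content_type") or "").strip() or "unknown type"
    length = parse_content_length(row.get("content_length"))
    size = "unknown size" if length is None else f"{length} bytes"
    return f"{target} ({content_type}, {size})"

# A hypothetical row for an on-the-fly image service: neither value is known yet.
row = {"service_def": "render-svc", "content_type": "", "content_length": None}
print(describe_link(row))  # service render-svc (unknown type, unknown size)
```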


> Page 16. 3.2.8 content_length
> I would use unit="Kbyte", much more practical and user-friendly.

This is a protocol, so users will not usually see the raw
table, and hence the unit chosen doesn't really matter; it's up to
the clients to format and display this information, if at all.
The one advantage of Kbyte is that we wouldn't leave the realm of
32-bit integers quite as quickly.
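For a sense of scale, a short Python calculation of the largest content_length a 32-bit VOTable int can carry under each unit, taking kbyte as 1000 bytes:

```python
INT32_MAX = 2**31 - 1  # largest value of a VOTable "int" (32-bit signed)

# Largest file size a content_length of type int can describe, in bytes.
max_size_byte_unit = INT32_MAX           # with unit="byte"
max_size_kbyte_unit = INT32_MAX * 1000   # with unit="kbyte" (decimal prefix)

print(f'unit="byte":  up to {max_size_byte_unit / 1e9:.2f} GB')   # 2.15 GB
print(f'unit="kbyte": up to {max_size_kbyte_unit / 1e12:.2f} TB')  # 2.15 TB
```

So at byte resolution a single file over about 2 GB already overflows an int; with kbyte the limit moves to about 2 TB.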

Which made me notice we don't define the type of content_length yet.
I think we should at least make a recommendation.  My first choice
would be "long", which in VOTable is a 64 bit integer, and
unit="byte" will do fine then [quick: how many 2014 hard drives can
you fill before VOTable longs wrap around with the number of bytes
stored?  Assuming one hard drive weighs 100g, express the mass of
that storage cluster in solar masses].
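For those who want to check the puzzle, a back-of-the-envelope Python version. The 4 TB per drive is an assumption for a large 2014 disk, and the solar mass is the usual nominal value of about 1.989e30 kg; the 100 g per drive is from the question.

```python
INT64_MAX = 2**63 - 1       # largest VOTable "long" (64-bit signed)
DRIVE_BYTES = 4 * 10**12    # assumption: a large 2014 hard drive, 4 TB
DRIVE_MASS_KG = 0.1         # 100 g per drive, as stipulated in the puzzle
SOLAR_MASS_KG = 1.989e30    # nominal solar mass

# Number of full drives whose total byte count still fits in a long.
drives = INT64_MAX // DRIVE_BYTES
mass_kg = drives * DRIVE_MASS_KG
print(f"{drives:,} drives, {mass_kg:.3g} kg, {mass_kg / SOLAR_MASS_KG:.2g} solar masses")
# prints: 2,305,843 drives, 2.31e+05 kg, 1.2e-25 solar masses
```

Under these assumptions, a long with unit="byte" leaves plenty of headroom: overflowing it takes a couple of hundred tonnes of disks.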

If people are worried about interoperability of such longs and were
to advocate int, I'd say unit should be kbyte (decimal prefix) with
commercial rounding or so.
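If int with kbyte won, the conversion on the service side might look like this minimal Python sketch. "Commercial rounding" is taken to mean rounding halves up, which for non-negative sizes can be done in integer arithmetic; Python's built-in round() would instead round halves to even.

```python
INT32_MAX = 2**31 - 1  # largest VOTable "int"

def bytes_to_kbyte(n_bytes: int) -> int:
    """Convert a non-negative byte count to decimal kbyte, rounding halves up."""
    if n_bytes < 0:
        raise ValueError("sizes cannot be negative")
    # Adding 500 before integer division rounds x.5 up ("commercial" rounding).
    kbytes = (n_bytes + 500) // 1000
    if kbytes > INT32_MAX:
        raise OverflowError("size does not fit a VOTable int even in kbyte")
    return kbytes

print(bytes_to_kbyte(1499), bytes_to_kbyte(1500), bytes_to_kbyte(2500))  # 1 2 3
```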

For float, it wouldn't really matter and I'd go for byte again.

So, which would it be?

> Page 18. Table 2: Error Messages
> I do not think a NotFoundError may be taken as an error, but as a zero
> results response (as it is the case for most DAL services) Moreover, the
> zero response result may allow the inspection of the number and nature of
> the rows of the VOTable, in the case this response is always the same for
> any ID.

With not-found situations the server may want to add some explanation
("This identifier is not from this site" versus "We seem to have lost
this file").  We should at least provide it with a means to do this,
hence the NotFoundError.
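A minimal Python sketch of how a service might use this to tell the two not-found situations apart. The holdings table, the ivo://example.org/ prefix, the dict-based row layout and the "NotFoundError: explanation" message format are all assumptions for illustration, not something the draft specifies in this form.

```python
from typing import Dict, List

# Hypothetical local holdings: publisher DID -> download URL.
HOLDINGS = {"ivo://example.org/data?img-001": "http://example.org/files/img-001.fits"}
LOCAL_PREFIX = "ivo://example.org/"  # hypothetical identifier prefix of this site

def datalink_rows(identifier: str) -> List[Dict[str, str]]:
    """Link rows for one ID, or a single error row explaining the failure."""
    if identifier in HOLDINGS:
        return [{"ID": identifier, "access_url": HOLDINGS[identifier],
                 "error_message": ""}]
    if not identifier.startswith(LOCAL_PREFIX):
        reason = "This identifier is not from this site"
    else:
        reason = "We seem to have lost this file"
    # Assumed message format: error name, colon, free-text explanation.
    return [{"ID": identifier, "access_url": "",
             "error_message": f"NotFoundError: {reason}"}]

print(datalink_rows("ivo://other.org/x")[0]["error_message"])
```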

Whether it's a good idea to mandate at least one row per ID (up to
the match limit) and have errors in every case may not be quite as
clear-cut.  I have to say I'm on the side of one row per ID, but I
don't have terribly strong arguments for that.  Well, of course
there's the general rule that silent failures are bad.  Except when
they aren't and silent failures are what preserves what's left of the
user's sanity.  Hm.  No easy answer.
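For what it's worth, here is what one row per ID buys on the client side: with that rule, a client can separate explicit failures from silent omissions with a couple of set operations. A minimal Python sketch, again assuming rows are dicts with ID and error_message keys (the identifiers are hypothetical):

```python
from typing import Dict, Iterable, List, Set

def missing_ids(requested: Iterable[str], rows: List[Dict[str, str]]) -> Set[str]:
    """IDs for which the response contains no row at all (silent failures)."""
    answered = {row["ID"] for row in rows}
    return set(requested) - answered

def failed_ids(rows: List[Dict[str, str]]) -> Dict[str, str]:
    """IDs that came back with an explicit error message, mapped to that message."""
    return {row["ID"]: row["error_message"] for row in rows if row.get("error_message")}

rows = [
    {"ID": "ivo://example.org/data?img-001", "error_message": ""},
    {"ID": "ivo://example.org/data?img-002",
     "error_message": "NotFoundError: We seem to have lost this file"},
]
requested = ["ivo://example.org/data?img-001", "ivo://example.org/data?img-002",
             "ivo://example.org/data?img-003"]
print(missing_ids(requested, rows))  # {'ivo://example.org/data?img-003'}
print(failed_ids(rows))
```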

> Page 20. bottom of the page
> <PARAM name="resourceIdentifier" datatype="char" arraysize="*"
> Is resourceIdentifier really required/mandatory for a DataLink service?

No -- and you're right, that should be made clearer at this point.

> Page 21. top of the page
> value="ivo://ivoa.net/std/DataLink#links" />
> but the value given on page 10 is ivo://ivoa.net/std/DataLink#links-1.0
> 
> Page 21. -24.
> 4.3 Example: Service Descriptor for an SIA-1.0 Service
> 4.4 Example: Custom Access Data Service
> 
> Should we add use="required" to PARAM tags describing mandatory input
> params?

use="required" isn't available in VOTable.  And I'd argue that's not
a big loss anyway, as typically relations between parameters are more
complex than that ("if you give RA_MAX, you cannot give any of
PIX_*").  We know how to say these complex things in PDL, and I'd
hope in a future version we can add VO-DML-based PDL annotation to
the PARAMs that would be able to express this kind of thing.
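To illustrate why a per-PARAM required flag falls short, here is a minimal Python sketch of the check a client would need for the example rule above. PIX_X and PIX_Y are hypothetical parameter names matching PIX_*; this is ad-hoc code, not PDL.

```python
from typing import Dict, List

def check_exclusive(params: Dict[str, str]) -> List[str]:
    """Violations of the example rule: RA_MAX excludes every PIX_* parameter."""
    problems = []
    if "RA_MAX" in params:
        clashing = sorted(name for name in params if name.startswith("PIX_"))
        if clashing:
            problems.append("RA_MAX cannot be combined with " + ", ".join(clashing))
    return problems

print(check_exclusive({"RA_MAX": "10", "PIX_X": "1", "PIX_Y": "2"}))
```

Neither parameter is "required" on its own; the constraint only exists between them, which is exactly what a single attribute on each PARAM cannot say.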

> I would add one example of ref="columnID" (other than the obs_publish_id)
> to one or several PARAM tags describing an input param whose value is taken
> from the tabular data present in <RESOURCE type="results">
> 
> I would stress the fact that the Service Descriptor syntax also allows
> providing default values, which makes the service easier for clients to use.
> 
> Page 24. 3rd p.
> 9it is related to photometric or flux calibration).
> 
> Finally, a major point.
> I think it would be very useful to give the possibility to add a <GROUP
> name="outputParams"> describing in detail a tabular response of a Custom
> Access Data Service. Self-described web services in terms of I/O params
> opens the window to web services interoperability, going beyond data
> interoperability.

I'm not sure I find this convincing -- for one, most of the services
described by datalink groups probably will put out data that's not
obviously tabular in nature (i.e., images and such).  For two, the
output column metadata in tabular data should really, really be
contained in the response (as in VOTable and to some degree FITS
binary), which is where the clients should get it from.

For *discovering* services by output table structure ("which services
return normalised fluxes?"), that's admittedly not good enough, but
that's a Registry problem (which I still don't consider terribly
relevant to *data*link).

> 
> In the same spirit, I think we should agree on an optional mechanism to
> provide a detailed description of the number and nature of the links given
> by the datalink service (rows of the response VOTable), in the case this
> response is always the same for any ID.

This sounds interesting and the first requirement that might
necessitate a registry extension for datalink.  I don't think anyone
is wild about having to define one, and the document has been careful
not to introduce any dependency on it, but if we collect use cases
that call for it, it's probably not prohibitively hard to do, either.
What use cases do you have in mind that would be solved by such a
description?

> 
> --
> 
> Happy 14th of July (Bastille Day)!
>
> --
> Jose Enrique Ruiz
> Instituto Astrofisica Andalucía - CSIC
> Glorieta de la Astronomía s/n
> 18009 Granada, Spain
> Tel: +34 958 230 618

Cheers,

            Markus


