TAP 1.0: Substantive comments.
Tom McGlynn
Thomas.A.McGlynn at nasa.gov
Wed Jul 15 12:44:38 PDT 2009
This message brings up a few issues I see in the actual TAP
specification. Editorial comments on the presentation in the current
document will be addressed elsewhere. I appreciate all the work that
went into the standard. Many thanks to the authors.
Tom McGlynn
1. I'm slightly concerned about the inflexibility of the URL structure
that is required for the sync/async and the asynchronous UWS hierarchy.
I can imagine services wishing to treat these differently, and while
I suppose redirects might address this, it seems a little rigid. E.g., I
might want to have the async requests handled by an entirely different
server.
2. The in-line table upload feature uses the element name (e.g., as
specified in the <INPUT type=file name=xxx>) as the name of the table
but describes this outside the regular parameter discussion. Thus there
is no restriction on having table names which might conflict with
existing parameter names, e.g., request or query. However I think this
would cause problems with typical libraries that interpret CGI
parameters. [E.g., I can't see how Perl's CGI library would handle it
if the user had both a text element named REQUEST and a file upload
element named REQUEST.]
We should (imho) treat in-line uploads like all other parameters and
define a parameter namespace for uploaded in-line tables. E.g. the name
of the parameter is 'upload:xxxx', where xxxx is the name of the table.
(Any kind of prefix and separator would be fine, I'm just using the
first that comes to mind). If this namespace (e.g., 'upload:' in my
example) is reserved for file uploads, then the protocol can allow
in-line uploads using the standard POST encoding -- or even for tiny
'files' in GET requests. The relationship between the TAP protocol and
HTTP then becomes much simpler. We have keywords and values and that's the only
thing we need to know. TAP is completely independent of the encoding used.
I think this precludes name clashes, simplifies the interface and makes
it a little more powerful. E.g., using a GET to upload a table will
usually be a toy, but it would make it easy to make a little tester URL
that could be accessed directly from a web page. Using POST without the
multipart-mime encoding makes it easier to pass data from script to script, a
la the NVO portal, and could be quite useful.
This is also more extensible in the future. E.g., right now all file
uploads are required to be tables. Someday we might want to upload
files for other purposes, e.g., some kind of authentication, a
customization file, ... If this suggestion is adopted we can specify
other name spaces for those purposes as and when they arise in either
TAP or in the language documents.
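A minimal sketch of how a service might apply the namespace idea from
item 2, offered only as an illustration. The 'upload:' prefix is the
example prefix suggested above, not anything in the TAP specification,
and the function name split_tap_params is hypothetical. The sketch
assumes the request has already been reduced to an ordinary
URL-encoded keyword/value string, which is the point of the proposal.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical reserved prefix, taken from the example in the message.
UPLOAD_PREFIX = "upload:"


def split_tap_params(query_string):
    """Split decoded keyword/value pairs into ordinary parameters and
    uploaded tables, based purely on the parameter name prefix."""
    params = {}
    uploads = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        if name.startswith(UPLOAD_PREFIX):
            # The table name is whatever follows the reserved prefix.
            uploads[name[len(UPLOAD_PREFIX):]] = value
        else:
            params[name] = value
    return params, uploads


# A table that happens to be named REQUEST no longer collides with
# the ordinary REQUEST parameter.
query = urlencode([
    ("REQUEST", "doQuery"),
    ("upload:REQUEST", "ra,dec\n1.0,2.0"),
])
params, uploads = split_tap_params(query)
```

The same function would work whether the string came from a GET query
or from a plain (non-multipart) POST body, since it only ever sees
keywords and values.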
3. I'm not sure I understand what the protocol is saying in general
about the HTTP status codes and requests. E.g., if I do an asynchronous
call and I get a redirect, should I expect to get a 303 next? Is that
legal? This may be more editorial, but I think this area should be
parsed out a little more.
4. Caching and getAvailability seem to be bad things to have together.
Should the protocol explicitly forbid GET based getAvailability requests?
5. Returning a single result.
I'm sympathetic to Francois' comments here, but opening up the
possibility of multi-resource VOTables may substantially increase the
complexity of TAP clients. Would we allow resources within resources as
well? It's a little hard to know where to end.
A middle path might be possible though...
I believe that internally Vizier queries a master 'table' [in quotes
since I'm not sure it is at all relational] where it finds tables
with potential matches and then queries only those tables with matches.
E.g., a URL like:
http://vizier.u-strasbg.fr/viz-bin/votable?-meta&-c=287.5%2b2.05&-c.r=5
returns a VOTable listing only the tables which have data within
some radius (5" I think) of 287.5,2.05. This query does not actually
query these tables, but just gives metadata about the query. It's not
organized as a TAP result might be -- it puts each
table in a separate RESOURCE, but it would seem pretty easy to expose
something similar as a TAP service which returned a VOTable with a
single resource and table. Clients could either present the
intermediate results to users so they could select tables of interest,
or go ahead and query Vizier for the matching tables, recapitulating
what Vizier does internally.
The inventory service at IRSA also does this and internally the HEASARC
has a similar capability.
Some time ago there was considerable discussion of 'inventory' services
as a standard protocol at least within the NVO, perhaps that should be
revived more generally.
6. The protocol doesn't address corner cases and errors very well. E.g.,
the issue above with regard to upload names with the same name as a
parameter. What are the valid characters for a table name? What happens
if invalid characters are given? It's worth trying to standardize at
least some of the common errors. Having services fail consistently is
really nice. E.g., repeated parameters are forbidden but the behavior
is undefined. It would be better to have it fail in some specified way.
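As an illustration of what "fail in some specified way" could look
like for repeated parameters, here is a minimal sketch. The exception
class, the function name and the error text are all hypothetical, and
treating parameter names case-insensitively is an assumption made for
the example; the specification would have to say what the actual
error response is.

```python
from collections import Counter
from urllib.parse import parse_qsl


class RepeatedParameterError(ValueError):
    """Hypothetical error a service could map to one agreed response."""


def parse_unique_params(query_string):
    """Return a dict of parameters, rejecting any name given twice.

    Names are compared case-insensitively (an assumption of this sketch).
    """
    pairs = parse_qsl(query_string, keep_blank_values=True)
    counts = Counter(name.upper() for name, _ in pairs)
    repeated = sorted(name for name, n in counts.items() if n > 1)
    if repeated:
        # Fail the same way every time instead of silently picking
        # the first or the last value.
        raise RepeatedParameterError(
            "repeated parameter(s): " + ", ".join(repeated)
        )
    return {name.upper(): value for name, value in pairs}
```

A client that sends "REQUEST=a&request=b" would then get the same,
predictable failure from every service rather than whichever value a
given CGI library happens to keep.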