TAP 1.0: Substantive comments.

Wed Jul 15 12:44:38 PDT 2009

This message brings up a few issues I see in the actual TAP 
specification.  Editorial comments on the presentation in the current 
document will be addressed elsewhere.  I appreciate all the work that 
went into the standard.  Many thanks to the authors.

	Tom McGlynn

1. I'm slightly concerned about the inflexibility of the URL structure 
that is required for the sync/async and the asynchronous UWS hierarchy.
I can imagine services wishing to treat these differently, and while 
I suppose redirects might address this, it seems a little rigid.  E.g., I 
might want to have the async requests handled by an entirely different 
server.

2. The in-line table upload feature uses the element name (e.g., as 
specified in the <INPUT type=file name=xxx>)  as the name of the table 
but describes this outside the regular parameter discussion.  Thus there 
is no restriction on having table names which might conflict with 
existing parameter names, e.g., request or query.  However I think this 
would cause problems with typical libraries that interpret CGI 
parameters.  [E.g., I can't see how Perl's CGI library would handle it 
if the user had both a text element named REQUEST and a file upload 
element named REQUEST.]

We should (imho) treat in-line uploads like all other parameters and 
define a parameter namespace for uploaded in-line tables.  E.g. the name 
of the parameter is 'upload:xxxx', where xxxx is the name of the table. 
  (Any kind of prefix and separator would be fine, I'm just using the 
first that comes to mind).  If this namespace (e.g., 'upload:' in my 
example) is reserved for file uploads, then the protocol can allow 
in-line uploads using the standard POST encoding -- or even for tiny 
'files' in GET requests.  The relationship between the TAP protocol and 
HTTP then becomes much simpler.  We have keywords and values and that's the only 
thing we need to know.  TAP is completely independent of the encoding used.
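
As a rough sketch of the idea (not a proposal for the actual syntax), a
service could separate uploads from ordinary parameters purely by name.
The 'upload:' prefix is the placeholder from my example above, and the
function name split_parameters is made up for illustration.

```python
from urllib.parse import parse_qsl

# Hypothetical reserved namespace; any prefix and separator would do.
UPLOAD_PREFIX = "upload:"


def split_parameters(query_string):
    """Split decoded keyword/value pairs into ordinary parameters and uploads.

    Works the same for a GET query string or a form-urlencoded POST body,
    since both are just keyword=value pairs once decoded.
    """
    params, uploads = {}, {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        if name.startswith(UPLOAD_PREFIX):
            # The table name is whatever follows the prefix.
            uploads[name[len(UPLOAD_PREFIX):]] = value
        else:
            params[name] = value
    return params, uploads


# A table named REQUEST no longer collides with the REQUEST parameter.
params, uploads = split_parameters(
    "REQUEST=doQuery&upload%3AREQUEST=ra%2Cdec%0A1%2C2"
)
```

Here params holds only REQUEST=doQuery, while uploads holds a table called
REQUEST whose content is the two-line text "ra,dec" / "1,2".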

I think this precludes name clashes, simplifies the interface and makes 
it a little more powerful.  E.g., using a GET to upload a table will 
usually be a toy, but it would make it easy to make a little tester URL 
that could be accessed directly from a web page.  Using POST without the 
multipart-mime encoding makes it easier to pass data script to script a 
la the NVO portal and could be quite useful.

This is also more extensible in the future.  E.g., right now all file 
uploads are required to be tables.  Someday we might want to upload 
files for other purposes, e.g., some kind of authentication, a 
customization file, ... If this suggestion is adopted we can specify 
other name spaces for those purposes as and when they arise in either 
TAP or in the language documents.

3. I'm not sure I understand what the protocol is saying in general 
about the HTTP status codes and requests.  E.g., if I do an asynchronous 
call and I get a redirect, should I expect to get a 303 next?  Is that 
legal?  This may be more editorial, but I think this area should be 
spelled out a little more.

4. Caching and getAvailability seem to be bad things to have together. 
Should the protocol explicitly forbid GET-based getAvailability requests?

5. Returning a single result.

I'm sympathetic to Francois' comments here, but opening up the 
possibility of multi-resource VOTables may substantially increase the 
complexity of TAP clients.  Would we allow resources within resources as 
well?  It's a little hard to know where to end.

A middle path might be possible though...

I believe that internally Vizier queries a master 'table' [in quotes 
since I'm not sure it is at all relational] where it finds tables 
with potential matches and then queries only those tables with matches.

E.g., a URL like:

http://vizier.u-strasbg.fr/viz-bin/votable?-meta&-c=287.5%2b2.05&-c.r=5

returns a VOTable listing only the tables which have data within 
some radius (5" I think) of 287.5,2.05.  This query does not actually 
query these tables, but just gives metadata about the query.  It's not 
organized as a TAP result might be -- it puts each 
table in a separate RESOURCE, but it would seem pretty easy to expose 
something similar as a TAP service which returned a VOTable with a 
single resource and table.  Clients could either present the 
intermediate results to users so they could select tables of interest, 
or go ahead and query Vizier for the matching tables, recapitulating 
what Vizier does internally.
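
To make the client side concrete, here is a minimal sketch of the first
step: pulling the list of candidate tables out of such a metadata
response.  It assumes each RESOURCE carries a name attribute identifying
the table; the sample document below is invented for illustration and is
not real Vizier output.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


def tables_with_matches(votable_xml):
    """Return the names of all RESOURCE elements in a metadata VOTable."""
    root = ET.fromstring(votable_xml)
    names = []
    for elem in root.iter():
        if local_name(elem.tag) == "RESOURCE" and elem.get("name"):
            names.append(elem.get("name"))
    return names


# Invented sample: one RESOURCE per table that has data in the region.
SAMPLE = """<VOTABLE>
  <RESOURCE name="catalog/one"><TABLE name="catalog/one/table"/></RESOURCE>
  <RESOURCE name="catalog/two"><TABLE name="catalog/two/table"/></RESOURCE>
</VOTABLE>"""

candidates = tables_with_matches(SAMPLE)
```

A client would then either show the candidates to the user or loop over
them issuing one query per table.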

The inventory service at IRSA also does this and internally the HEASARC 
has a similar capability.

Some time ago there was considerable discussion of 'inventory' services 
as a standard protocol, at least within the NVO; perhaps that should be 
revived more generally.

6. The protocol doesn't address corner cases and errors very well. E.g., 
the issue above with regard to upload names with the same name as a 
parameter.  What are the valid characters for a table name? What happens 
if invalid characters are given?  It's worth trying to standardize at 
least some of the common errors.  Having services fail consistently is 
really nice.  E.g., repeated parameters are forbidden but the behavior 
is undefined.  It would be better to have it fail in some specified way.
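
As an illustration of failing in a specified way, a service could run
every request through one check that raises one kind of error.  This is a
sketch under assumptions, not anything in the current document: the
allowed characters for table names, the case-insensitive comparison of
parameter names, the 'upload:' prefix from comment 2, and the names
ParameterError and check_parameters are all made up here.

```python
import re
from urllib.parse import parse_qsl

# Assumption: table names are letters, digits and underscores only.
VALID_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


class ParameterError(ValueError):
    """Single error type that a service would map to one agreed response."""


def check_parameters(query_string, upload_prefix="upload:"):
    """Reject repeated parameters and badly named upload tables."""
    seen = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        key = name.upper()  # assumption: names compared case-insensitively
        if key in seen:
            raise ParameterError("repeated parameter: " + name)
        if name.startswith(upload_prefix):
            table = name[len(upload_prefix):]
            if not VALID_TABLE_NAME.match(table):
                raise ParameterError("invalid table name: " + table)
        seen[key] = value
    return seen
```

With this, "QUERY=a&query=b" and "upload%3Abad-name=x" both fail with a
ParameterError rather than with whatever the underlying library happens
to do.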