TAP 1.0: Substantive comments.

Tom McGlynn Thomas.A.McGlynn at nasa.gov
Mon Aug 31 07:36:07 PDT 2009


Hi Pat,

Comments on comments on comments...  The first issue (i.e., #2) is the 
big one for me.  I was on vacation last week so sorry for the delay.

	Tom

Patrick Dowler wrote:
...

>> 2. The in-line table upload feature uses the element name (e.g., as
>> specified in the <INPUT type=file name=xxx>)  as the name of the table
>> but describes this outside the regular parameter discussion.  Thus there
>> is no restriction on having table names which might conflict with
>> existing parameter names, e.g., request or query.  However I think this
>> would cause problems with typical libraries that interpret CGI
>> parameters.  [E.g., I can't see how Perl's CGI library would handle it
>> if the user had both a text element named REQUEST and a file upload
>> element named REQUEST.]
> 
> The only limitation on table name is that it is a legal table name in the 
> query language used. It could be that we are currently overly restrictive and 
> say that it has to be a legal ADQL table name... will check and clarify if 
> needed.
> 
> The UPLOAD parameter specifies a pair of values: table name and URI to the 
> content (could be http url, vospace uri, etc). That usage is simple and clear. 
> For the inline method, we opted for using an existing feature (the name). In 
> both cases, the name and content have to be tightly coupled because a query 
> can in principle upload multiple tables and join them all. 
> 
>> We should (imho) treat in-line uploads like all other parameters and
>> define a parameter namespace for uploaded in-line tables.  E.g. the name
>> of the parameter is 'upload:xxxx', where xxxx is the name of the table.
>>   (Any kind of prefix and separator would be fine, I'm just using the
>> first that comes to mind).  If this namespace (e.g., 'upload:' in my
>> example) is reserved for file uploads, then the protocol can allow
>> in-line uploads using the standard POST encoding -- or even for tiny
>> 'files' in GET requests.  The relationship between the TAP protocol and
>> HTTP is much simpler.  We have keywords and values and that's the only
>> thing we need to know.  TAP is completely independent of the encoding used.
> 
> I am not too sure about this. If the parameter names are dynamic like this, 
> services have to iterate through all the parameter values and parse/match 
> parameter names (in addition to parsing values, which is necessary now). 
> 
> Anyway, the table name is on the value side now, so collisions are not an 
> issue as far as I can see. It doesn't seem overly http-specific to use
> 
> UPLOAD=myTable,http://example.com/mytable.xml
> 
> as it is just a value which is a pair... no more http-specific than any other 
> use of key=value parameters anyway.
> 

As I mentioned, though perhaps I should have been more explicit, I am 
only concerned here with the in-line table upload feature (section 
2.5.2), not the external file upload (2.5.1), with which I have no 
problems.

Having thought about this further, I am convinced that 2.5.2 is a 
seriously flawed design.  Let me go over the issues as I now see them:

- The current design strongly couples the semantics and the transport 
protocol: all of the data that is transported using a particular syntax 
in HTTP POSTs is to be treated as a file upload.  This has two major 
negative consequences.  It makes TAP more HTTP-dependent, since rather 
than using the abstraction of keyword/value pairs, which could easily 
be implemented on some other protocol, we need to look at exactly how 
the data was encoded within the transport protocol to understand what 
certain parameters are.  If we want to layer TAP on some new XXXTP we 
have to explicitly think about how file uploads work.  It would come 
for free if we just used the same keyword/value abstraction we use 
everywhere else.
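
To make this concrete: under a pure keyword/value view, an upload is 
just one more pair, e.g. (table name and query invented for 
illustration):

    REQUEST     = doQuery
    LANG        = ADQL
    QUERY       = SELECT * FROM mine AS m JOIN ivoa.ObsCore ...
    upload:mine = <VOTABLE> ... </VOTABLE>

How those pairs get onto the wire -- URL-encoded GET, form-encoded 
POST, multipart POST, or some future XXXTP -- is then purely a 
transport question.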

Even worse, we've now precluded TAP, and all protocols layered on top 
of it (ADQL, PQL, ...), from using in-line file uploads for any purpose 
other than table uploads.  What happens when we want to layer in a 
security protocol that needs to include a certificate as a file upload? 
Or a user customization file, or a file that ADQL6 uses to allow users 
to dynamically define a metadata structure, or ...?  We need to leave 
space for this kind of growth.

- The current design precludes in-line uploads except in a particular 
kind of encoding in POSTs.  Within the NVO we have already had cases 
where VOTables are sent using the standard POST encoding when we are 
sending data from one portal site to another.  I can even see niche uses 
of in-line uploads using GET requests (e.g., for testing).  The current 
design precludes this for no apparent gain.
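
E.g., a tiny test table could ride along in an ordinary GET 
(hypothetical endpoint and table name, content URL-encoded):

    GET /tap/sync?REQUEST=doQuery&LANG=ADQL
        &QUERY=SELECT+*+FROM+t1
        &upload:t1=%3CVOTABLE%3E...%3C%2FVOTABLE%3E HTTP/1.1

None of this is expressible if in-line uploads are defined only in 
terms of multipart POST bodies.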

In practice I expect that the majority of uses of in-line uploads will 
use the multi-part mime encoding, but I don't see where we gain any 
advantage from locking ourselves into that.

- Given the way the document is written, it would be possible to have 
an uploaded table with a name that conflicts with the other keywords.  
This will be difficult for some [all?] standard CGI handling packages 
to address.  E.g., suppose I have a parameter MTIME and a file upload 
for a table MTIME.  In Perl, when I call
    my $x = CGI->new->param("MTIME");
I think I'm now in trouble, since I don't believe Perl's CGI module 
defines which of the two values I am to get.
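
Spelling that out (a sketch; the CGI.pm behavior here is from memory):

    use CGI;
    my $q = CGI->new;
    # The form posts both <INPUT type=text name=MTIME>
    # and <INPUT type=file name=MTIME>.
    my $x   = $q->param("MTIME");   # text value or upload? unspecified
    my @all = $q->param("MTIME");   # both values; which is which depends
                                    # on the order the client sent them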

- You suggest that a namespaced approach, e.g., using a name like 
'UPLOAD:xxx' to upload a table xxx, is difficult because services will 
need to scan all parameters to find which ones are the uploads.  But 
this is already the case, since services will need to go through all of 
the parameters and look for those which are encoded in a particular 
way.  I suspect that in general it will be at least as easy to 
recognize file uploads by some syntax in the parameter name as by 
looking for those whose values were encoded using a particular syntax.
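
For example, on the service side, picking out the uploads could be as 
simple as this sketch (using my 'upload:' prefix):

    use CGI;
    my $q = CGI->new;
    my %uploads;    # table name => table content
    foreach my $name ($q->param) {
        $uploads{$1} = $q->param($name) if $name =~ /^upload:(.+)/;
    }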

There is no difficulty in uploading any number of tables using the 
approach I suggest (see the sketch above), so I don't think that 
discriminates between the two methods.

In the Java CGI class that I use, file uploads are not distinguishable 
from other parameters without a considerable degree of pain.  The class 
explicitly tries to hide the underlying HTTP encoding -- just as it 
hides the difference between parameters encoded in GETs versus POSTs.


>> 3. I'm not sure I understand what the protocol is saying in general
>> about the HTTP status codes and requests.  E.g., if I do an asynchronous
>> call and I get a redirect, should I expect to get a 303 next?  Is that
>> legal?  This may be more editorial, but I think this area should be
>> parsed out a little more.
> 
> 303 is a redirect code (See Other) and UWS uses this in preference to the more 
> usual 302 (Moved Temporarily).  Maybe the text is just poorly worded, but the 
> intent is simply that redirects are done with a 303 instead of a 302.
> 

My concern here is that I don't think the document makes it clear what 
the normal and error status codes should be for each request.  An 
explicit table of these would be helpful, along the lines sketched 
below.  This is particularly important where we, by policy, should 
return something other than a normal success code for what is in fact 
a successful request.  As a writer of a TAP server or client, I'm not 
quite sure what I'm supposed to return or look for using the current 
document.
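
Just to illustrate the kind of table I mean -- the entries here are my 
guesses, which is exactly the problem:

    Request                               Normal response
    -------                               ---------------
    POST {base}/async (create job)        303 See Other -> {base}/async/(id)
    POST {base}/async/(id)/phase (RUN)    303 See Other -> job summary
    GET  {base}/async/(id)                200 OK, job summary
    GET  {base}/sync?...                  200 OK (or 303 to the result?)
    malformed parameters                  400 Bad Request + error document
    nonexistent job                       404 Not Found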

>> 4. Caching and getAvailability seem to be bad things to have together.
>> Should the protocol explicitly forbid GET-based getAvailability requests?
> 
> I think GET is correct from a REST perspective. Services can (should) deal 
> with caching issues following the http standard, eg:
> 
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9
> 
> if their availability data is volatile (which imo it is, even though the goal 
> is that it is not :-)
> 

My problem here is not with either the client or the server, where we 
have control of what we do, but with the intermediaries, where we may 
not.  The document referred to indicates that an HTTP 1.0 cache need 
not honor any cache-control requests (the wording with regard to the 
no-cache pragma uses SHOULDs).  Even if we believed that every machine 
in the chain would respond properly to a no-cache directive (or 
pragma), we probably need to note this issue in our document so that 
clients are aware of what they need to do.  I don't think we use GETs 
for volatile data elsewhere.
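
E.g., the most a service can do is emit headers along these lines (per 
section 14.9 of the RFC cited above; an HTTP 1.0 intermediary is still 
free to ignore all of them):

    HTTP/1.1 200 OK
    Cache-Control: no-cache, no-store
    Pragma: no-cache
    Expires: Thu, 01 Jan 1970 00:00:00 GMT

and a careful client should probably send Cache-Control: no-cache (and 
Pragma: no-cache for older proxies) with its getAvailability requests.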


