TAP Implementation issues (cont'd): UWS

Fri Oct 30 08:58:47 PDT 2009

Guy Rixon wrote:
> 
> On 30 Oct 2009, at 13:45, Tom McGlynn wrote:
> 
>> I've had a few questions with the implementation of the asynchronous 
>> access for TAP.  Most of these are relevant to the UWS document generally 
>> rather than just TAP so I've copied the GWS group in this mail.
>>
>> Tom McGlynn
>>
>> -- UWS general questions --
>>
>> - As defined a user needs to always do two web actions to start the 
>> service.  Is there some reason that the user cannot simply request the 
>> service to start running immediately?  I suspect that that is what the 
>> user wants to do in 99% of the cases.  It would be much easier for 
>> clients too.  The example given in the UWS document of starting a job 
>> omits the error checking that a user presumably should do after 
>> starting the job.  Why not allow
>>     {root}/tap/async?request=doQuery&query=...&phase=RUN
>> to both create and start a query?  [Describe it as a POST if you prefer.]
> 
> That would work, but would have to be POSTed.
> 
> 
While this is what I want, I think it's not what the standard currently 
specifies.  E.g.,
   UWS 2.1.3  PENDING ... This is the state into which the job enters 
when it is first created.

   UWS 2.2.3.5 A job may be started by POSTing to the 
/{$jobs}/{job_id}/phase URI. ...

There is no other way of starting the job specified.  Note also that 
2.2.3.5 says nothing about the current state of the job (vis a vis the 
discussion a couple of points below).
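
To make the difference concrete, here is a minimal sketch of the two
requests the current text requires, next to the single request proposed
above.  It only builds the requests and sends nothing.  The function names
and the example host are hypothetical, and the lowercase parameter names
simply follow the URL quoted above; accepting phase=RUN on job creation is
the proposal under discussion, not something the quoted UWS sections define.

```python
from urllib.parse import urlencode
from urllib.request import Request


def create_job_request(async_url, query, run_immediately=False):
    """Build (but do not send) the POST that creates an async job.

    With run_immediately=True the form also carries phase=RUN, which is
    the proposed one-step shortcut (an assumption, not current UWS text).
    """
    params = {"request": "doQuery", "query": query}
    if run_immediately:
        params["phase"] = "RUN"
    return Request(async_url, data=urlencode(params).encode("ascii"), method="POST")


def start_job_request(job_url):
    """Build the second POST that UWS 2.2.3.5 describes: phase=RUN to {job}/phase."""
    body = urlencode({"phase": "RUN"}).encode("ascii")
    return Request(job_url + "/phase", data=body, method="POST")


# Hypothetical endpoint, for illustration only.
one_step = create_job_request("http://example.invalid/tap/async", "SELECT 1", True)
two_step = start_job_request("http://example.invalid/tap/async/42")
```

In the two-step form the client also has to check the response to the
first request before it can issue the second, which is the error checking
mentioned above.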

>>
>> - I continue to be confused by the benefits conferred by various 
>> practices.  Why do we require POSTs specifically in a number of 
>> instances?  E.g., what would be wrong with using
>>    ../jobid/phase?phase=RUN
>> as an HTTP GET rather than an HTTP POST.  Since my code cannot tell 
>> the difference between these I certainly will be supporting both, but, 
>> other than bowing to the mantra REST, I'm not sure why it's supposed 
>> to matter.
> 
> Whether or not a web service follows REST principles, it /has/ to 
> distinguish between requests that change the service state and requests 
> that are idempotent. This is a basic part of HTTP. Starting a job 
> changes the service state by creating web resources for that job. 
> Sending the same query twice gets you two jobs doing the same query; 
> they have separate web-resources. Therefore, not idempotent; therefore a 
> POSTed request.
> 
> GET responses can be cached, and the caching is out of your control as a 
> service provider - it may be on the user's LAN (HTTP proxy) or in their 
> client (browser cache). If you send the same query twice via GET then, 
> for the second request, you could get the response for the first, pulled 
> from the cache, and no new job. This doesn't happen too often but when 
> it does it's brain-bendingly hard to debug.
> 
> I suggest that your code must not accept UWS create-job requests via 
> HTTP GET. Your users won't like it if they get given the wrong answer 
> from a cache. And Google tend to spider all the GETable URLs so you 
> don't want them creating jobs.
> 
> 
>
There may be costs associated with having to deal with the caching of 
GET requests and I should have been more temperate here.  I've 
occasionally run into this myself when building AJAX services.  But 
there are also substantial benefits to being able to use GET requests 
and in practice I find that these greatly outweigh the costs in all the 
cases that I've had to deal with.
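
For comparison, the stricter behaviour suggested in the quoted text could
be sketched as below.  This is only an illustration: the action names are
hypothetical labels for whatever the service's dispatch layer uses, and
answering with 405 (Method Not Allowed) is an assumption about how a
service might refuse, not something the UWS document is quoted as requiring.

```python
# Hypothetical labels for requests that change the service state.
STATE_CHANGING = {"create_job", "set_phase"}


def status_for(method, action):
    """Return 405 when a state-changing action arrives by anything but POST.

    Every other combination is passed through as 200 here; a real service
    would go on to handle the request.
    """
    if action in STATE_CHANGING and method.upper() != "POST":
        return 405
    return 200
```

A service that prefers to accept both methods, as described above, would
simply skip this check.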

>>
>> - Similarly I don't think that there should be a strict limitation of 
>> the coding used in sending requests.  There may be a requirement that 
>> a given encoding be supported but there should not be a requirement 
>> that it be used.  As with POST/GET this level of HTTP detail is 
>> handled below the UWS logic in our implementation, so for services 
>> that the HEASARC supports we'll be allowing multipart/form encoding as 
>> well unless someone can tell us why we should reject such requests.
> 
> You're free to support this as well as the stated encoding because it 
> doesn't break anything. However, if you write a client that assumes this 
> encoding then it won't work on all implementations. So it seems 
> pointless to add the feature even in the service.  Personally, I think 
> that supporting broken clients in this way is not helpful.

My concern here is that you are coupling the UWS standard to a lower 
level of detail in the HTTP protocol than is necessary.  E.g., suppose 
we have a UWS service that includes file upload parameters.  Such a 
service is going to use multipart/form encoding for some of its 
interactions.  As the standard is currently written it must switch back 
and forth between encodings depending upon what's being done.
> 
>>
>> - From what states is a user allowed to start a job?  E.g., can a user 
>> attempt to restart a job that has previously had an error or aborted? 
>> Could the user change the parameters and then rerun the same job?  I'm 
>> guessing this isn't supposed to happen, but I didn't see where it was 
>> forbidden.
> 
> You can only change things while it's pending. If you need to re-run a 
> failed job then you have to resubmit it.
> 
I don't see this stated in the protocol anywhere.  There is a statement 
for phase ERROR that "... No further work will be done..." which might 
be taken to imply you cannot do anything with a job that failed with an 
error, but there is nothing anywhere else.

2.2.4 describes a message pattern, but the diagram is labeled 'Typical 
Message Pattern' and there are clearly a number of exceptions (e.g.,
when there is an error or abort before execution).

A statement somewhere that phases are ordered like
     PENDING
     QUEUED
     EXECUTING
     COMPLETED-ERROR-ABORTED
and that you can change only to a later state would clarify this.
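
A minimal sketch of such a rule, assuming the ordering listed above with
the three final phases sharing one rank.  The table and the function name
are illustrative and not taken from the UWS document.

```python
# Rank of each phase in the proposed ordering; the three final phases tie.
PHASE_RANK = {
    "PENDING": 0,
    "QUEUED": 1,
    "EXECUTING": 2,
    "COMPLETED": 3,
    "ERROR": 3,
    "ABORTED": 3,
}


def can_transition(current, target):
    """Allow a change only to a strictly later phase."""
    return PHASE_RANK[target] > PHASE_RANK[current]
```

Under this rule a PENDING job may go straight to ERROR or ABORTED, but a
job in ERROR can never be moved back to EXECUTING, which matches the
"resubmit it" answer quoted above.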
> 
>>
>> - What is supposed to happen if there is a problem in creating the job.
>> Should a job be created with an immediate status of ERROR?  Is there 
>> any way of flagging an error if the system cannot create even an error 
>> job?  E.g., we're going to use the database to store all job 
>> information. What are we supposed to do if the database is down?  It 
>> would be nice to be able to inform the user of an error in a standard way.
> 
> If you can create the job at all then you should immediately set the 
> phase to ERROR and make the error document available. If you can't do 
> this, then I guess giving up with a 500 "I'm completely stuffed" error 
> is reasonable. By extension, UWS clients need to deal minimally with 500 
> errors as well as with proper error-documents.
> 
...
>>
>> -- TAP specific questions. --
>>
>> - The description of where to get the TAP result in an async request 
>> is not given (as far as I can see) in what are described as the 
>> normative parts of the document.  There it says that the result will be in
>>   root/async/jobid/results/
>> but this is the list of results and can, in principle, contain a 
>> number of results. Only in section 5.2 which is described as 
>> informative does it say that the result document is 
>> .../results/result.  Is this actually  a requirement or can the result 
>> be named anything?
> 
> In my service implementation, I take it to be a requirement. My 
> client implementation currently assumes the one result with the 
> standard name but I plan to make it parse the list (in case we add to 
> the results list in future TAP versions).
> 
I think we have to take it as a requirement now, but it really should be 
specified in a normative section of the document (or changed in response 
to the issue I raised below).

> 
>>
>> - The UWS standard discusses the naming of results.  Does TAP require 
>> a specific name for the result?  In fact it looks like the way UWS is 
>> supposed to be used the jobid/results returns a document that looks like
>>
>>    <results>
>>       <result id='someid' xlink:href='someurl' />
>>       <result id='anotherid' xlink:href='anotherurl' />
>>    </results>
>>
>> and the user is supposed to find the id of the desired result and use 
>> whatever URL is given there, not use a specifically defined URL.  I'm 
>> guessing that the ID attribute of each <result> element is the UWS name. 
>> The UWS standard says
>>  "When a protocol specifies standard results it must do so by
>>   naming those results; the names appear in the Results list in
>>   addition to the URI's.  Not all results need to be named, sometimes
>>   the meaning of the result is obvious from the context and the
>>   name is omitted."
>> Since the second sentence here seems to contradict the first it's a 
>> bit hard to follow, but my reading of this is that it would be better 
>> for TAP to specify a name for the output result rather than a specific 
>> URL.
> 
> For a given service-protocol incorporating UWS, a result can be in one 
> of three cases:
> 
>  - formally named and mandated by the protocol: name is fixed; result 
> must be present when status=COMPLETED; clients can assume these things 
> and bypass the results list;
> 
>  - formally named and made optional by the protocol: name is fixed, 
> result might not be present on job completion; clients can either use 
> the results list to find whether it's there or just get it and handle 
> the 404 if it's missing;
> 
>  - not formally named: neither URI nor presence is predictable: clients 
> must use the links in the results lists to find these results.
> 
> TAP has one result that is both named and mandated and nothing in the 
> other two categories.

According to the UWS protocol -- where I grant it is a bit unclear, so 
I'm working partially from the UWS <job> example, though the text quoted 
above certainly supports it -- the name of the result is independent of 
the URI used to access it.  Thus as far as I can tell TAP mandates a 
result, but does not -- in this UWS sense -- name it.  TAP specifies 
only the URI.  That seems to be a violation of the UWS standard.
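
As an illustration of looking a result up by name rather than by a fixed
URI, a client could read the results list along these lines.  This is a
sketch under assumptions: the sample document is the one quoted earlier
with an xlink namespace declaration added so that it parses, and any
UWS namespace on the elements themselves is left out.

```python
import xml.etree.ElementTree as ET

# Attribute name for xlink:href once the namespace prefix is expanded.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

SAMPLE = """
<results xmlns:xlink="http://www.w3.org/1999/xlink">
   <result id='someid' xlink:href='someurl' />
   <result id='anotherid' xlink:href='anotherurl' />
</results>
"""


def result_urls(results_xml):
    """Map each result id in a results list to the URL given for it."""
    root = ET.fromstring(results_xml)
    return {r.get("id"): r.get(XLINK_HREF) for r in root.iter("result")}


urls = result_urls(SAMPLE)
```

A client written this way only needs TAP to fix the id of the result; the
URL can then be whatever the service chooses to put in the list.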