TAP Implementation issues (cont'd): UWS

Fri Oct 30 09:28:10 PDT 2009

On 30 Oct 2009, at 15:58, Tom McGlynn wrote:

> Guy Rixon wrote:
>> On 30 Oct 2009, at 13:45, Tom McGlynn wrote:
>>> I've had a few questions with the implementation of the  
>>> asynchronous access for TAP.  Most of these are relevant to UWS  
>>> document generally rather than just TAP so I've copied the GWS  
>>> group in this mail.
>>>
>>> Tom McGlynn
>>>
>>> -- UWS general questions --
>>>
>>> - As defined a user needs to always do two web actions to start  
>>> the service.  Is there some reason that the user cannot simply  
>>> request the service to start running immediately?  I suspect that  
>>> that is what the user wants to do in 99% of the cases.  It would  
>>> be much easier for clients too.  The example given in the UWS  
>>> document of starting a job omits the error checking that a
>>> user presumably should do after starting the job.  Why not allow
>>>    {root}/tap/async?request=doQuery&query=...&phase=RUN
>>> to both create and start a query?  [Describe it as a POST if you  
>>> prefer.]
>> That would work, but would have to be POSTed.
> While this is what I want, I think it's not what the standard  
> currently specifies.  E.g.,
>  UWS 2.1.3  PENDING ... This is the state into which the job enters  
> when it is first created.
>
>  UWS 2.2.3.5 A job may be started by POSTing to the
> /{$jobs}/{job_id}/phase URI. ...
>
> There is no other way of starting the job specified.  Note also that  
> 2.2.3.5 says nothing about the current state of the job (vis a vis  
> the discussion a couple of points below).

Agreed. It's a possible revision to UWS.
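
To make the two patterns concrete, here is a minimal Python sketch that only builds the requests and does not send them. The parameter names follow the examples quoted in this thread; the example.org URL, the job id and the function names are placeholders of mine, and the run_immediately option illustrates the proposed revision, not anything in the current UWS text.

```python
from urllib.parse import urlencode
from urllib.request import Request

FORM = {"Content-Type": "application/x-www-form-urlencoded"}


def create_job_request(async_url, query, run_immediately=False):
    """Build the POST that creates a job (it starts out PENDING)."""
    params = {"request": "doQuery", "query": query}
    if run_immediately:
        # Proposed in this thread only: create and start in one request.
        params["phase"] = "RUN"
    body = urlencode(params).encode("ascii")
    return Request(async_url, data=body, headers=FORM, method="POST")


def start_job_request(job_url):
    """Build the second POST, which asks the service to run the job."""
    body = urlencode({"phase": "RUN"}).encode("ascii")
    return Request(job_url + "/phase", data=body, headers=FORM, method="POST")


# Two-step pattern as currently specified (placeholder URLs and job id).
create = create_job_request("http://example.org/tap/async", "SELECT 1")
start = start_job_request("http://example.org/tap/async/job42")
```

Passing the Request objects to urllib.request.urlopen would send them; a client should still check the job's phase afterwards in either pattern.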

>
>>>
>>> - I continue to be confused by the benefits conferred by various
>>> practices.  Why do we require POSTs specifically in a number of
>>> instances?  E.g., what would be wrong with using
>>>   ../jobid/phase?phase=RUN
>>> as an HTTP GET rather than an HTTP POST?  Since my code cannot
>>> tell the difference between these I certainly will be supporting  
>>> both, but, other than bowing to the mantra REST, I'm not sure why  
>>> it's supposed to matter.
>> Whether or not a web service follows REST principles, it /has/ to
>> distinguish between requests that change the service state and  
>> requests that are idempotent. This is a basic part of HTTP.  
>> Starting a job changes the service state by creating web resources  
>> for that job. Sending the same query twice gets you two jobs doing  
>> the same query; they have separate web-resources. Therefore, not  
>> idempotent; therefore a POSTed request.
>> GET responses can be cached, and the caching is out of your control  
>> as a service provider - it may be on the user's LAN (HTTP proxy) or  
>> in their client (browser cache). If you send the same query twice
>> via GET, then for the second request you could get the response for
>> the first, pulled from the cache, and no new job. This doesn't  
>> happen too often but when it does it's brain-bendingly hard to
>> debug.
>> I suggest that your code must not accept UWS create-job requests  
>> via HTTP GET. Your users won't like it if they get given the wrong  
>> answer from a cache. And Google tend to spider all the GETable URLs  
>> so you don't want them creating jobs.
>>
> There may be costs associated with having to deal with the caching  
> of GET requests and I should have been more temperate here.  I've  
> occasionally run into this myself when building AJAX services.  But  
> there are also substantial benefits to being able to use GET  
> requests and in practice I find that these greatly outweigh the  
> costs in all the cases that I've had to deal with.

It's not a trade-off. You cannot provide state-changing operations via  
HTTP-GET and comply with the HTTP RFC. It's simply illegal.
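
A minimal sketch of that rule in Python: decide from the HTTP method whether a request may change state, and answer 405 otherwise. The function name and the changes_state flag are illustrative assumptions, not part of UWS.

```python
from http import HTTPStatus


def check_method(method, changes_state):
    """Return the status a service might give before doing any work.

    changes_state is True for operations such as creating, starting or
    aborting a job, and False for plain reads such as fetching the phase.
    """
    method = method.upper()
    if changes_state and method != "POST":
        # State-changing operations are refused unless they are POSTed.
        return HTTPStatus.METHOD_NOT_ALLOWED
    if method not in ("GET", "POST"):
        return HTTPStatus.METHOD_NOT_ALLOWED
    return HTTPStatus.OK
```

With this check a GET on ../jobid/phase can still read the phase, while ../jobid/phase?phase=RUN sent as a GET is refused.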

>
>>>
>>> - Similarly I don't think that there should be a strict limitation  
>>> of the coding used in sending requests.  There may be a  
>>> requirement that a given encoding be supported but there should  
>>> not be a requirement that it be used.  As with POST/GET this level  
>>> of HTTP detail is handled below the UWS logic in our  
>>> implementation, so for services that the HEASARC supports we'll be  
>>> allowing multipart/form encoding as well unless someone can tell  
>>> us why we should reject such requests.
>> You're free to support this as well as the stated encoding because  
>> it doesn't break anything. However, if you write a client that  
>> assumes this encoding then it won't work on all implementations. So  
>> it seems pointless to add the feature even in the service.   
>> Personally, I think that supporting broken clients in this way is  
>> not helpful.
>
> My concern here is that you are coupling the UWS standard to a lower  
> level of detail in the HTTP protocol than is necessary.  E.g.,  
> suppose we have a UWS service that includes file upload parameters.   
> Such a service is going to use multipart/form encoding for some of
> its interactions.  As the standard is currently written it must  
> switch back and forth between encodings depending upon what's being  
> done.

The encoding is only mandated for POSTs to start a job, to abort a job  
and to change the destruction time. It's not specified for the POST to
create a job because that POST is specified in the application of UWS,
e.g. in TAP.

We have to specify some encoding; we can't leave it out or we get no  
interoperability. That's because this is about web services. If it  
were a web-browser interface with its own HTML forms then we'd leave  
the encoding out of the spec and fix it in the HTML because both the  
service and the form would be in one implementation.

>>>
>>> - From what states is a user allowed to start a job?  E.g., can a  
>>> user attempt to restart a job that has previously had an error or  
>>> aborted? Could the user change the parameters and then rerun the  
>>> same job?  I'm guessing this isn't supposed to happen, but I  
>>> didn't see where it was forbidden.
>> You can only change things while it's pending. If you need to
>> re-run a failed job then you have to resubmit it.
> I don't see this stated in the protocol anywhere.  There is a  
> statement for phase ERROR that "... No further work will be done..."  
> which might be taken to imply you cannot do anything with a job that  
> failed with an error, but there is nothing anywhere else.
>
> 2.2.4 describes a message pattern, but the diagram is labeled  
> 'Typical Message Pattern' and there are clearly a number of  
> exceptions (e.g., when there is an error or abort before
> execution).
>
> A statement somewhere that phases are ordered like
>    PENDING
>    QUEUED
>    EXECUTING
>    COMPLETED-ERROR-ABORTED
> and that you can change only to a later state would clarify this.

OK, this is "just" a change in the description and we can add this  
clarification.
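
One way to read the proposed ordering, sketched in Python. The ranks and function names are my own illustration; the phase names are the ones listed above, with the three final phases sharing the last rank.

```python
# Rank of each phase in the ordering proposed in this thread.
PHASE_RANK = {
    "PENDING": 0,
    "QUEUED": 1,
    "EXECUTING": 2,
    "COMPLETED": 3,
    "ERROR": 3,
    "ABORTED": 3,
}


def can_transition(current, target):
    """True only if the job would move to a strictly later phase."""
    return PHASE_RANK[target] > PHASE_RANK[current]


def can_modify_parameters(current):
    """Parameters may be changed only while the job is still pending."""
    return current == "PENDING"
```

Under this reading a job in ERROR or ABORTED cannot be restarted, because no phase ranks later than the final ones; the query has to be resubmitted as a new job.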

>>>
>>> - What is supposed to happen if there is a problem in creating the
>>> job?  Should a job be created with an immediate status of ERROR?  Is
>>> there any way of flagging an error if the system cannot create  
>>> even an error job?  E.g., we're going to use the database to store  
>>> all job information. What are we supposed to do if the database is  
>>> down?  It would be nice to be able to inform the user of an error  
>>> in a standard way.
>> If you can create the job at all then you should immediately set  
>> the phase to ERROR and make the error document available. If you  
>> can't do this, then I guess giving up with a 500 "I'm completely  
>> stuffed" error is reasonable. By extension, UWS clients need to  
>> deal minimally with 500 errors as well as with proper
>> error-documents.
> ...
>>>
>>> -- TAP specific questions. --
>>>
>>> - Where to get the TAP result of an async request is not stated
>>> (as far as I can see) in what are described as the normative
>>> parts of the document.  There it says that the result will be in
>>>  root/async/jobid/results/
>>> but this is the list of results and can, in principle, contain a  
>>> number of results. Only in section 5.2 which is described as  
>>> informative does it say that the result document is
>>> .../results/result.  Is this actually a requirement or can the result be
>>> named anything?
>> In my service implementation, I take it to be a requirement. My
>> client implementation currently assumes the one result with the
>> standard name but I plan to make it parse the list. (In case we add  
>> to the results list in future TAP versions.)
> I think we have to take it as a requirement now, but it really  
> should be specified in a normative section of the document (or  
> changed in response to the issue I raised below).
>
>>>
>>> - The UWS standard discusses the naming of results.  Does TAP  
>>> require a specific name for the result?  In fact it looks like the  
>>> way UWS is supposed to be used the jobid/results returns a  
>>> document that looks like
>>>
>>>   <results>
>>>      <result id='someid' xlink:href='someurl' />
>>>      <result id='anotherid' xlink:href='anotherurl' />
>>>   </results>
>>>
>>> and the user is supposed to find the id of the desired result and  
>>> use whatever URL is given there, not use a specifically defined  
>>> URL.  I'm guessing that the ID attribute of the <result> elements is
>>> the UWS name. The UWS standard says
>>> "When a protocol specifies standard results it must do so by
>>>  naming those results; the names appear in the Results list in
>>>  addition to the URI's.  Not all results need to be named, sometimes
>>>  the meaning of the result is obvious from the context and the
>>>  name is omitted."
>>> Since the second sentence here seems to contradict the first it's  
>>> a bit hard to follow, but my reading of this is that it would be  
>>> better for TAP to specify a name for the output result rather than  
>>> a specific URL.
>> For a given service-protocol incorporating UWS, a result can be in  
>> one of three cases:
>> - formally named and mandated by the protocol: name is fixed;  
>> result must be present when status=COMPLETED; clients can assume  
>> these things and bypass the results list;
>> - formally named and made optional by the protocol: name is fixed,  
>> result might not be present on job completion; clients can either  
>> use the results list to find whether it's there or just get it and
>> handle the 404 if it's missing;
>> - not formally named: neither URI nor presence is predictable:  
>> clients must use the links in the results lists to find these  
>> results.
>> TAP has one result that is both named and mandated and nothing in  
>> the other two categories.
>
> According to the UWS protocol -- where I grant it is a bit unclear  
> so I'm working partially from the UWS <job> example though the text  
> quoted above certainly supports it -- the name of the result is
> independent of the URI used to access it.  Thus as far as I can tell  
> TAP mandates a result, but does not -- in this UWS sense -- name  
> it.  TAP specifies only the URI.  That seems a violation of the UWS  
> standard.
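
For illustration, a client that looks results up by id in the results list, instead of assuming a fixed URL, could be sketched in Python as below. The sample document mirrors the one quoted above, with an xlink namespace declaration added so that it parses; a real UWS results list may differ (for instance in its own element namespace), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Expanded name of the xlink:href attribute as ElementTree reports it.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def result_urls(results_xml):
    """Map each result id in a results list to the URL it points at."""
    root = ET.fromstring(results_xml)
    return {r.get("id"): r.get(XLINK_HREF) for r in root.iter("result")}


SAMPLE = """<results xmlns:xlink="http://www.w3.org/1999/xlink">
  <result id="someid" xlink:href="someurl" />
  <result id="anotherid" xlink:href="anotherurl" />
</results>"""

urls = result_urls(SAMPLE)
```

A client written this way keeps working if the service chooses a different URL for a result, provided the id is the one the protocol names.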
