UWS 1.1 alternative, WAIT

Dave Morris dave.morris at metagrid.co.uk
Thu Jun 5 06:14:43 PDT 2014


Hi Paul,

On 2014-06-05 07:56, Paul Harrison wrote:

> ... I believe that existing implementations on the client and server 
> would need more extensive reworking in existing functionality, whereas 
> the new end point could be “added in” as extra functionality, which 
> seems more appropriate for a point change in the standard. I also think 
> that for the client to deal with a mixed ecology of 1.0 and 1.1 servers 
> is more difficult if you allow this behaviour on existing endpoints, 
> rather than the simple 404 on the single endpoint if the block 
> functionality is not there.

Allowing the WAIT param on _all_ the endpoints was something of a 'what 
if ..' thought experiment, and I haven't worked out all of the 
implications yet.

One of the advantages of allowing the WAIT parameter on the existing 
endpoints is that it avoids adding another redirect step. A GET to the 
/jobs/{jobid} endpoint with a WAIT would return the full job response, 
rather than requiring the client to follow a redirect.

Answers to some specific points are below, but perhaps the pragmatic 
solution for v1.1 is to only allow the WAIT parameter on GET requests to 
the /jobs/{jobid} and /jobs/{jobid}/phase endpoints?

This would seem to meet the majority of use cases while avoiding the 
potential side effects.

If we wanted to allow WAIT on any of the other endpoints, then there 
would be nothing preventing us from adding it in future versions.
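A minimal sketch of the server-side check for that restricted option, in Python. The function name wait_seconds, the /jobs base path and the 60 second cap are assumptions for illustration, as is treating a missing, malformed or non-positive WAIT as "do not block":

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical path patterns for the two endpoints that would accept WAIT in v1.1.
# Assumes the job list is mounted at /jobs; a real service may use another base path.
_WAIT_ENDPOINTS = (
    re.compile(r"^/jobs/[^/]+$"),        # job details: /jobs/{jobid}
    re.compile(r"^/jobs/[^/]+/phase$"),  # job phase:   /jobs/{jobid}/phase
)


def wait_seconds(method: str, url: str, max_wait: int = 60) -> Optional[int]:
    """Return how long to block for, or None to answer immediately as in v1.0."""
    if method.upper() != "GET":
        return None  # WAIT is ignored on POST, DELETE, etc.
    parts = urlsplit(url)
    if not any(pattern.match(parts.path) for pattern in _WAIT_ENDPOINTS):
        return None  # WAIT is ignored on every other endpoint
    values = parse_qs(parts.query).get("WAIT")
    if not values:
        return None
    try:
        wait = int(values[0])
    except ValueError:
        return None  # assumption: a malformed WAIT is treated as no WAIT
    if wait <= 0:
        return None  # assumption: zero or negative means do not block
    return min(wait, max_wait)  # never block longer than the server's own limit
```

Because unsupported requests fall straight through to the v1.0 behaviour, allowing WAIT on more endpoints in a later version would only mean adding patterns to the tuple.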

> The ?PHASE=RUN is a command to the UWS to start the job if possible - 
> however, if the UWS is busy then the job might just be QUEUED rather 
> than immediately EXECUTING.
> The client would probably want to behave differently in these two 
> cases, and would prefer to know immediately which would mean that WAIT 
> is superfluous for the initial POST.

I agree that in most cases the client would want the initial POST to 
return immediately. In that case, they simply don't add the WAIT param, 
and the endpoint responds as normal.

However, there may be some clients who want the initial POST to 
block for a short period, to catch things like a really simple SELECT in 
TAP that completes within a few seconds.
In that case, they could add the WAIT param, and if the query completes 
within the time limit they get a COMPLETED job in the response. The 
client-server interaction would be the same as with a v1.0 service where 
the job had completed immediately, without going through the QUEUED or 
EXECUTING phases (is that actually allowed in v1.0?).

> I also think that waiting for a response in the case of the initial 
> creation of the job blurs the distinction between the acceptance (or 
> not) of the job and the UWS and some sort of network failure, and makes 
> the logic that needs to be employed by the client more difficult.

If the job was rejected for whatever reason, the request would return 
immediately.

> it is simpler if the initial POST for job creation returns immediately 
> and the blocking (or not) interactions start from there.

I agree, in most cases the client would want the initial POST to return 
immediately. However, that isn't necessarily a reason for explicitly 
*not* allowing some clients to ask for a WAIT on that endpoint if they 
wanted it.

> We would need to think about “allowing” a server to respond before the 
> 60s is up, because in extremis that is just like saying that the server 
> can not block at all, and this is another reason for not having the 
> WAIT on all endpoints in a 1.1 version.

There were a number of reasons for allowing a server to respond early.

Firstly, as you say, an existing 1.0 server would ignore the wait and 
respond immediately.

Secondly, if we use a broad definition of 'state change', then a server 
may respond to a 'state change' that the client isn't interested in. In 
the TAP example, the server would trigger a 'state change' every time 
the row count is updated, but a client that didn't know about the row 
count would just see the server responding early for no reason.

I'm not proposing anything specific here, just suggesting we choose the 
wording to be as flexible as possible. There will always be cases where 
the server responds earlier than the client expects, for any number of 
reasons - up to and including a broken clock.

* If we write the specification to explicitly say the server _must_ 
delay the response for the specified time, then some client writers may 
try to rely on that.

* If we write the specification to say the server _may_ delay the 
response up to the specified time, then we allow for things we haven't 
thought of yet and client writers shouldn't try to rely on things we 
haven't specified.

* In general, if either the server or client developer is in doubt, 
assume that replying early is better than replying late.
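One way a server could implement "may delay the response up to the specified time", sketched in Python under the assumption of one thread per request. JobState, update() and wait_for_change() are hypothetical names. Every update() counts as a 'state change' in the broad sense, so a TAP row-count update wakes waiters just as a phase change does; once woken, or once the time limit is reached, the handler builds the normal v1.0 response either way:

```python
import threading
from typing import Optional


class JobState:
    """Hypothetical in-memory job record that request threads can block on."""

    def __init__(self, phase: str = "PENDING") -> None:
        self._cond = threading.Condition()
        self.phase = phase
        self.row_count = 0
        self.version = 0  # bumped on every state change, however minor

    def update(self, phase: Optional[str] = None, row_count: Optional[int] = None) -> None:
        """Record a state change and wake every request that is waiting on it."""
        with self._cond:
            if phase is not None:
                self.phase = phase
            if row_count is not None:
                self.row_count = row_count
            self.version += 1
            self._cond.notify_all()

    def wait_for_change(self, since_version: int, timeout: float) -> int:
        """Block until the version differs from since_version or timeout expires.

        Returns immediately if the job has already changed. The return value is
        the current version; the caller then renders the usual job response
        whether or not anything changed.
        """
        with self._cond:
            self._cond.wait_for(lambda: self.version != since_version, timeout=timeout)
            return self.version
```

A request handler would read job.version, call wait_for_change(version, wait) and then render the job. A client that only cares about the phase may therefore be woken by a row-count update and see an unchanged phase, which is why the wording should only promise "up to" the requested time.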

> In addition it is perfectly possible that there is no change after the 
> 60s

Once the time limit is reached the server would always return with 
whatever response the original v1.0 endpoint would have returned.

> and it is for this reason that I prefer the idea of a single special 
> blocking endpoint that always redirects to the /jobs/{jobid} endpoint - 
> all the information about what has changed or not is in the full job 
> response.

I agree that allowing WAIT on all the endpoints raises some concerns that 
need to be looked at in more detail.

The normal use case would probably be to do the initial POST with no 
WAIT, and then use a polling loop with ?WAIT=60 GET requests to the 
/jobs/{jobid} endpoint. Allowing a WAIT parameter on GET requests to the 
/jobs/{jobid} endpoint avoids the additional redirect step needed by the 
special blocking endpoint.
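The polling loop could be sketched in Python as follows. fetch_job is a hypothetical callable that performs GET /jobs/{jobid}?WAIT=<wait> and returns the parsed job details as a dict with a "phase" key; it is passed in so the sketch does not depend on any particular HTTP or XML library. The loop never relies on the server having actually waited: a 1.0 server that ignores WAIT, or a 1.1 server that replies early, just costs an extra iteration, and the minimum interval (an assumed value) keeps that case from turning into a tight loop:

```python
import time
from typing import Callable, Dict

# Phases after which a UWS job will not change again.
TERMINAL_PHASES = frozenset({"COMPLETED", "ERROR", "ABORTED"})


def poll_until_finished(
    fetch_job: Callable[[int], Dict[str, str]],
    wait: int = 60,
    min_interval: float = 1.0,
    max_requests: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Repeat GET /jobs/{jobid}?WAIT=<wait> until the job reaches a terminal phase."""
    for _ in range(max_requests):
        started = time.monotonic()
        job = fetch_job(wait)
        if job["phase"] in TERMINAL_PHASES:
            return job
        # The reply may have come back early (WAIT ignored by a 1.0 server, or a
        # 'state change' this client does not care about), so pause briefly
        # rather than immediately issuing the next request.
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            sleep(min_interval - elapsed)
    raise TimeoutError(f"job not finished after {max_requests} requests")
```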

Alternatively, could the new blocking endpoint just return the full job 
response rather than the redirect? We could add a 
RESPONSE=[REDIRECT|INLINE] parameter to allow the client to choose a 
redirect or an inline response.
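A possible server-side handling of the proposed RESPONSE parameter, as a Python sketch. The function name, the choice of 303 See Other for the redirect, the text/xml content type, REDIRECT as the default, case-insensitive values and rejecting unknown values with a 400 are all assumptions rather than anything agreed for v1.1. render_job is a hypothetical callable returning the serialized job details:

```python
from typing import Callable, Dict, Tuple
from urllib.parse import parse_qs


def blocking_response(
    job_url: str, query: str, render_job: Callable[[], str]
) -> Tuple[int, Dict[str, str], str]:
    """Build the (status, headers, body) reply for GET /jobs/{jobid}/blocking
    once the wait is over."""
    params = parse_qs(query)
    # Assumption: REDIRECT is the default and values are case-insensitive.
    mode = params.get("RESPONSE", ["REDIRECT"])[0].upper()
    if mode == "INLINE":
        # Return the full job details directly, saving the client a round trip.
        return 200, {"Content-Type": "text/xml"}, render_job()
    if mode == "REDIRECT":
        # Send the client on to the normal job details endpoint.
        return 303, {"Location": job_url}, ""
    return 400, {"Content-Type": "text/plain"}, f"Unknown RESPONSE value: {mode}"
```

Keeping REDIRECT as the default means a client written for the redirect-only proposal would see the same behaviour, while clients that want to skip the extra request can opt in to INLINE.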

----
Using a separate blocking endpoint.

* Block until something 'interesting' happens, and then redirect to the 
job details.

         GET /jobs/{jobid}/blocking
         GET /jobs/{jobid}/blocking?RESPONSE=REDIRECT

* Block until something 'interesting' happens, and then return the job 
details response.

         GET /jobs/{jobid}/blocking?RESPONSE=INLINE

* Block for up to 10 seconds, or until something 'interesting' happens, 
and then redirect to the job details.

         GET /jobs/{jobid}/blocking?WAIT=10
         GET /jobs/{jobid}/blocking?WAIT=10&RESPONSE=REDIRECT

* Block for up to 10 seconds, or until something 'interesting' happens, 
and then return the job details response.

         GET /jobs/{jobid}/blocking?WAIT=10&RESPONSE=INLINE

----
Using the existing job details and phase endpoints.

* Block for up to 10 seconds, or until something 'interesting' happens, 
and then return the normal response for this endpoint.

         GET /jobs/{jobid}?WAIT=10

* Block for up to 10 seconds, or until something 'interesting' happens, 
and then return the normal response for this endpoint.

         GET /jobs/{jobid}/phase?WAIT=10
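The request forms listed in both options can all be produced by one small helper, sketched here in Python. job_url and its endpoint argument are hypothetical; the helper only builds paths and query strings, and refuses RESPONSE on anything other than the blocking endpoint, since the parameter is only proposed there:

```python
from typing import Dict, Optional, Union
from urllib.parse import quote, urlencode


def job_url(
    jobid: str,
    endpoint: str = "",
    wait: Optional[int] = None,
    response: Optional[str] = None,
) -> str:
    """Build a request path such as /jobs/{jobid}/blocking?WAIT=10&RESPONSE=INLINE.

    endpoint is "" for /jobs/{jobid}, "phase" for the phase endpoint, or
    "blocking" for the separate blocking endpoint.
    """
    path = "/jobs/" + quote(jobid, safe="")  # escape any '/' or '?' in the id
    if endpoint:
        path += "/" + endpoint
    params: Dict[str, Union[int, str]] = {}
    if wait is not None:
        params["WAIT"] = wait
    if response is not None:
        if endpoint != "blocking":
            raise ValueError("RESPONSE only applies to the blocking endpoint")
        params["RESPONSE"] = response
    return path + ("?" + urlencode(params) if params else "")
```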

----

HTH, Dave

--------
Dave Morris
Software Developer
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------


