UWS 1.1 alternative, WAIT
Dave Morris
dave.morris at metagrid.co.uk
Thu Jun 5 06:14:43 PDT 2014
Hi Paul,
On 2014-06-05 07:56, Paul Harrison wrote:
> ... I believe that existing implementations on the client and server
> would need more extensive reworking in existing functionality, whereas
> the new end point could be “added in” as extra functionality, which
> seems more appropriate for a point change in the standard. I also think
> that for the client to deal with a mixed ecology of 1.0 and 1.1 servers
> is more difficult if you allow this behaviour on existing endpoints,
> rather than the simple 404 on the single endpoint if the block
> functionality is not there.
Allowing the WAIT param on _all_ the endpoints was something of a 'what
if ..' thought experiment, and I haven't worked out all of the
implications yet.
One of the advantages of allowing the WAIT parameter on the existing
endpoints is that it avoids adding another redirect step. A GET to the
/jobs/{jobid} endpoint with a WAIT would return the full job response,
rather than requiring the client to follow a redirect.
Answers to some specific points below, but perhaps the pragmatic
solution for v1.1 is to only allow the WAIT parameter on GET requests to
the /jobs/{jobid} and /jobs/{jobid}/phase endpoints ?
This would seem to meet the majority of use cases while avoiding the
potential side effects.
If we wanted to allow WAIT on any of the other endpoints, then there
would be nothing preventing us from adding it in future versions.
> The ?PHASE=RUN is a command to the UWS to start the job if possible -
> however, if the UWS is busy then the job might just be QUEUED rather
> than immediately EXECUTING.
> The client would probably want to behave differently in these two
> cases, and would prefer to know immediately which would mean that WAIT
> is superfluous for the initial POST.
I agree that in most cases the client would want the initial POST to
return immediately. In which case, they don't need to add the WAIT
param and the method would respond as normal.
However, there may be some clients who might want the initial POST to
block for a short period, to catch things like a really simple SELECT in
TAP, that complete within a few seconds.
In which case, they could add the WAIT param, and if the query completes
within the time limit they get a COMPLETED job in the response. The
client - server interaction would be the same as a v1.0 service where the
job had completed immediately, without going through the QUEUED or
EXECUTING phases (is that actually allowed in v1.0 ?).
> I also think that waiting for a response in the case of the initial
> creation of the job blurs the distinction between the acceptance (or
> not) of the job and the UWS and some sort of network failure, and makes
> the logic that needs to be employed by the client more difficult.
If the job was rejected for whatever reason, the request would return
immediately.
> it is simpler if the initial POST for job creation returns immediately
> and the blocking (or not) interactions start from there.
I agree, in most cases the client would want the initial POST to return
immediately. However, that isn't necessarily a reason for explicitly
*not* allowing some clients to ask for a WAIT on that endpoint if they
wanted it.
> We would need to think about “allowing” a server to respond before the
> 60s is up, because in extremis that is just like saying that the server
> can not block at all, and this is another reason for not having the
> WAIT on all endpoints in a 1.1 version.
There were a number of reasons for allowing a server to respond early.
Firstly, as you say, an existing 1.0 server would ignore the wait and
respond immediately.
Secondly, if we use a broad definition of 'state change', then a server
may respond to a 'state change' that the client isn't interested in. In
the TAP example, the server would trigger a 'state change' every time
the row count is updated, but a client that didn't know about the row
count would just see the server responding early for no reason.
I'm not proposing anything specific here, just suggesting we choose the
wording to be as flexible as possible. There will always be cases where
the server responds earlier than the client expects, for any number of
reasons - up to and including a broken clock.
* If we write the specification to explicitly say the server _must_
delay the response for the specified time, then some client writers may
try to rely on that.
* If we write the specification to say the server _may_ delay the
response up to the specified time, then we allow for things we haven't
thought of yet and client writers shouldn't try to rely on things we
haven't specified.
* In general, if either the server or the client developer is in doubt,
assume that replying early is better than replying later.
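
As a rough illustration of what 'not relying on the delay' could look
like on the client side, here is a minimal Python sketch. The function
name and the 2 second floor are my own assumptions, not anything from
the UWS drafts. The idea is that a client treats WAIT as a hint, and if
a reply comes back early with nothing new, it falls back to ordinary
rate-limited polling rather than spinning.

```python
def pause_after_reply(elapsed, changed, min_interval=2.0):
    """Return how many seconds to sleep before the next blocking request.

    elapsed      -- seconds the last request took to return
    changed      -- True if the reply showed a change the client cares about
    min_interval -- assumed floor between requests (hypothetical default)
    """
    if changed:
        # Something useful arrived, so there is no need to throttle.
        return 0.0
    # Early reply with nothing new (v1.0 server ignoring WAIT, a state
    # change the client doesn't understand, a broken clock, ...):
    # behave like a plain polling client with a bounded request rate.
    return max(0.0, min_interval - elapsed)
```

A v1.0 server that ignores WAIT and answers in 0.5 seconds would give a
1.5 second pause with these defaults, while a server that really did
block for the full period would give no extra pause at all.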
> In addition it is perfectly possible that there is no change after the
> 60s
Once the time limit is reached the server would always return with
whatever response the original v1.0 endpoint would have returned.
> and it is for this reason that I prefer the idea of a single special
> blocking endpoint that always redirects to the /jobs/{jobid} endpoint -
> all the information about what has changed or not is in the full job
> response.
I agree allowing the WAIT on all the endpoints raises some concerns that
need to be looked at in more detail.
The normal use case would probably be to do the initial POST with no
WAIT, and then use a polling loop with ?WAIT=60 GET requests to the
/jobs/{jobid} endpoint. Allowing a WAIT parameter on GET requests to the
/jobs/{jobid} endpoint avoids the additional redirect step needed by the
special blocking endpoint.
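
The polling loop described above might look something like this in
Python. This is only a sketch: `fetch_phase` is a hypothetical callable
standing in for whatever HTTP code does the GET with ?WAIT=<wait> and
extracts the phase from the job response, and `max_requests` is an
assumed safety limit. I am treating COMPLETED, ERROR and ABORTED as the
phases that end the loop.

```python
# Phases after which the job will not change any further.
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


def wait_for_job(fetch_phase, wait=60, max_requests=1000):
    """Poll with blocking requests until the job reaches a terminal phase.

    fetch_phase  -- hypothetical callable; fetch_phase(wait) is assumed to
                    GET /jobs/{jobid}?WAIT=<wait> and return the phase string
    wait         -- value sent as the WAIT parameter, in seconds
    max_requests -- assumed upper bound so the loop cannot run forever
    """
    for _ in range(max_requests):
        phase = fetch_phase(wait)
        if phase in TERMINAL_PHASES:
            return phase
        # Not finished yet: the server either timed out the WAIT or
        # replied early, so just ask again.
    raise TimeoutError("job did not finish within max_requests requests")
```

The same loop works unchanged against a v1.0 server that ignores WAIT;
it just degrades into ordinary polling, so a real client would want some
rate limiting between requests.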
Alternatively, could the new blocking endpoint just return the full job
response rather than the redirect ? We could add a
RESPONSE=[REDIRECT|INLINE] parameter to allow the client to choose a
redirect or an inline response.
----
Using a separate blocking endpoint.
* Block until something 'interesting' happens, and then redirect to the
job details.
GET /jobs/{jobid}/blocking
GET /jobs/{jobid}/blocking?RESPONSE=REDIRECT
* Block until something 'interesting' happens, and then return the job
details response.
GET /jobs/{jobid}/blocking?RESPONSE=INLINE
* Block for up to 10 seconds or until something 'interesting' happens, and
then redirect to the job details.
GET /jobs/{jobid}/blocking?WAIT=10
GET /jobs/{jobid}/blocking?WAIT=10&RESPONSE=REDIRECT
* Block for up to 10 seconds or until something 'interesting' happens, and
then return the job details response.
GET /jobs/{jobid}/blocking?WAIT=10&RESPONSE=INLINE
----
Using the existing job details and phase endpoints.
* Block for up to 10 seconds or until something 'interesting' happens, and
then return the normal response for this endpoint.
GET /jobs/{jobid}?WAIT=10
* Block for up to 10 seconds or until something 'interesting' happens, and
then return the normal response for this endpoint.
GET /jobs/{jobid}/phase?WAIT=10
----
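For comparison, the request forms listed above could be built on the
client side with something like the following Python sketch. The helper
names and the example.org URL are made up for illustration, and the
/blocking path and RESPONSE parameter are only the proposal from this
mail, not part of any published UWS version.

```python
from urllib.parse import urlencode


def blocking_url(job_url, wait=None, response=None):
    """Build a URL for the proposed separate blocking endpoint.

    job_url  -- URL of the job, e.g. http://example.org/jobs/42 (made up)
    wait     -- optional WAIT value in seconds
    response -- optional "REDIRECT" or "INLINE"
    """
    if response not in (None, "REDIRECT", "INLINE"):
        raise ValueError("RESPONSE must be REDIRECT or INLINE")
    params = {}
    if wait is not None:
        params["WAIT"] = wait
    if response is not None:
        params["RESPONSE"] = response
    url = job_url.rstrip("/") + "/blocking"
    # Only add a query string when there is something to put in it.
    return url + "?" + urlencode(params) if params else url


def wait_url(job_url, wait, phase_only=False):
    """Build a URL that puts WAIT on the existing job or phase endpoint."""
    url = job_url.rstrip("/")
    if phase_only:
        url += "/phase"
    return url + "?" + urlencode({"WAIT": wait})
```

Seen this way, the second style needs one parameter and no new path,
which is most of the argument for it.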
HTH, Dave
--------
Dave Morris
Software Developer
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------