UWS 1.1 alternative, WAIT

Thu Jun 5 07:58:20 PDT 2014

Hi Dave et al.

I am afraid that I am still not convinced by ?WAIT on more than a single specialised endpoint. This is in the spirit of keeping things as simple as possible, and of not disturbing what is already established, for a 1.1 update. I also think that avoiding redirects should not be an aim of any changes - redirects are in the original specification to make the interface RESTy (without being a zealot about it), and in most cases clients will automatically follow redirects. Because the original interface has well-defined responses for the various endpoints, with little or no conditionality, some of the RESTy benefits (like caching) can be realised.

On 2014-06-05, at 14:14, Dave Morris <dave.morris at metagrid.co.uk> wrote:

> Hi Paul,
> 
> On 2014-06-05 07:56, Paul Harrison wrote:
> 
>> ... I believe that existing implementations on the client and server would need more extensive reworking in existing functionality, whereas the new end point could be “added in” as extra functionality, which seems more appropriate for a point change in the standard. I also think that for the client to deal with a mixed ecology of 1.0 and 1.1 servers is more difficult if you allow this behaviour on existing endpoints, rather than the simple 404 on the single endpoint if the block functionality is not there.
> 
> Allowing the WAIT param on _all_ the endpoints was something of a 'what if ..' thought experiment, and I haven't worked out all of the implications yet.
> 
> One of the advantages of allowing the WAIT parameter on the existing endpoints is that it avoids adding another redirect step. A GET to the /jobs/{jobid} endpoint with a WAIT would return the full job response, rather than requiring the client to follow a redirect.
> 
> Answers to some specific points below, but perhaps the pragmatic solution for v1.1 is to only allow the WAIT parameter on GET requests to the /jobs/{jobid} and /jobs/{jobid}/phase endpoints ?
> 
> This would seem to meet the majority of use cases while avoiding the potential side effects.
> 
> If we wanted to allow WAIT on any of the other endpoints, then there would be nothing preventing us from adding it in future versions.
> 
>> The ?PHASE=RUN is a command to the UWS to start the job if possible - however, if the UWS is busy then the job might just be QUEUED rather than immediately EXECUTING.
>> The client would probably want to behave differently in these two cases, and would prefer to know immediately which would mean that WAIT is superfluous for the initial POST.
> 
> I agree that in most cases the client would want the initial POST to return immediately. In which case, if they don't add the WAIT param, the method would respond as normal.
> 
> However, there may be some clients who might want the initial POST to block for a short period, to catch things like a really simple SELECT in TAP, that complete within a few seconds.
> In which case, they could add the WAIT param, and if the query completes within the time limit they get a COMPLETED job in the response.

But it might not complete in time, so then they are back in the situation of having to GET again - the client logic is made more difficult by allowing this. In fact what you are asking for here is the /sync interface that is already part of TAP. What might be worth considering is making a new job creation endpoint part of UWS that basically follows the (currently informative) pattern described at http://www.ivoa.net/documents/UWS/20101010/REC-UWS-1.0-20101010.html#SynchronousService - though I think that it is better to leave it informative, as it relies on there being only one result.

> The client-server interaction would be the same as a v1.0 service where the job had completed immediately, without going through the QUEUED or EXECUTING phases (is that actually allowed in v1.0 ?).

In 1.0 there is no way to start the job at the initial POST, but it is certainly true that once PHASE=RUN has been POSTed, a very quick job could complete immediately (see http://www.ivoa.net/documents/UWS/20101010/REC-UWS-1.0-20101010.html#d1e1439).

> 
>> I also think that waiting for a response in the case of the initial creation of the job blurs the distinction between the acceptance (or not) of the job by the UWS and some sort of network failure, and makes the logic that needs to be employed by the client more difficult.
> 
> If the job was rejected for whatever reason, the request would return immediately.

The point is that the client has no way of knowing if the delay is because there is actually a network failure or because it requested it (when it has). If we say that the initial POST is to return immediately (as is the current situation), then it knows that a delay of more than a couple of seconds is very likely to be a network/server problem. The initial POST is how the client knows that it has “made contact” with the UWS, and all the subsequent interactions depend on this initial contact.
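
To illustrate the point, here is a minimal Python sketch of how a client might pick its request timeout. The names `request_timeout`, `is_suspicious_delay` and the 5 second grace period are assumptions made for this example only; nothing in UWS specifies them. With no WAIT on the initial POST, the client can use a short timeout and treat anything longer as a likely network/server problem.

```python
from typing import Optional

# Hypothetical grace period (an assumption, not from any UWS document):
# how long a non-blocking request may take before the client suspects
# a network or server problem.
GRACE_SECONDS = 5.0


def request_timeout(wait: Optional[float], grace: float = GRACE_SECONDS) -> float:
    """Timeout to use for a request that asked the server to block for `wait` seconds.

    wait=None means no WAIT parameter was sent, e.g. the initial job-creation POST.
    """
    if wait is None:
        return grace
    # A blocking request may legitimately take up to `wait` seconds.
    return wait + grace


def is_suspicious_delay(elapsed: float, wait: Optional[float],
                        grace: float = GRACE_SECONDS) -> bool:
    """True if the observed delay is longer than anything the client asked for."""
    return elapsed > request_timeout(wait, grace)
```

A 10 second delay on a plain POST is flagged as suspicious, while the same delay on a request sent with WAIT=60 is not; that is exactly the distinction that is lost if the initial POST is allowed to block.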

> 
>> it is simpler if the initial POST for job creation returns immediately and the blocking (or not) interactions start from there.
> 
> I agree, in most cases the client would want the initial POST to return immediately. However, that isn't necessarily a reason for explicitly *not* allowing some clients to ask for a WAIT on that endpoint if they wanted it.

In the spirit of simplicity I would not allow it, because on balance I don’t think that it offers any real benefits.

> 
>> We would need to think about “allowing” a server to respond before the 60s is up, because in extremis that is just like saying that the server can not block at all, and this is another reason for not having the WAIT on all endpoints in a 1.1 version.
> 
> There were a number of reasons for allowing a server to respond early.
> 
> Firstly, as you say, an existing 1.0 server would ignore the wait and respond immediately.
> 
> Secondly, if we use a broad definition of 'state change', then a server may respond to a 'state change' that the client isn't interested in. In the TAP example, the server would trigger a 'state change' every time the row count is updated, but a client that didn't know about the row count would just see the server responding early for no reason.
> 
> I'm not proposing anything specific here, just suggesting we choose the wording to be as flexible as possible. There will always be cases where the server responds earlier than the client expects, for any number of reasons - up to and including a broken clock.
> 
> * If we write the specification to explicitly say the server _must_ delay the response for the specified time, then some client writers may try to rely on that.
> 
> * If we write the specification to say the server _may_ delay the response up to the specified time, then we allow for things we haven't thought of yet and client writers shouldn't try to rely on things we haven't specified.
> 
> * In general, if either the server or client developer is in doubt, assume that replying early is better than replying later.
> 

I agree with the above - I believe that in the terminology of RFC 2119 the server SHOULD delay the response up to the specified time.
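
A rough server-side sketch of "delay up to the specified time, but reply early on a state change" could look like the following. This is only an illustration in Python using an in-memory toy `Job` class; the class, its method names and the choice to treat a phase change as the only "state change" are assumptions for the example, not part of any proposal.

```python
import threading


class Job:
    """Toy in-memory job (hypothetical); phase names are those used in this thread."""

    def __init__(self, phase="PENDING"):
        self.phase = phase
        self._changed = threading.Condition()

    def set_phase(self, phase):
        """Update the phase and wake up any blocked requests."""
        with self._changed:
            self.phase = phase
            self._changed.notify_all()

    def wait_for_change(self, wait_seconds):
        """Block until the phase changes or wait_seconds elapse.

        Returns the current phase either way, so at the time limit the
        caller gets whatever a non-blocking request would have returned.
        """
        with self._changed:
            start_phase = self.phase
            self._changed.wait_for(lambda: self.phase != start_phase,
                                   timeout=wait_seconds)
            return self.phase
```

Note that the caller cannot tell from the return value alone whether the time limit was reached or a change happened; it has to compare against what it saw before, which fits the view that clients should not rely on the server delaying for the full period.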

>> In addition it is perfectly possible that there is no change after the 60s
> 
> Once the time limit is reached the server would always return with whatever response the original v1.0 endpoint would have returned.
> 
>> and it is for this reason that I prefer the idea of a single special blocking endpoint that always redirects to the /jobs/{jobid} endpoint - all the information about what has changed or not is in the full job response.
> 
> I agree allowing the WAIT on all the endpoints raises some concerns that need to be looked at in more detail.
> 
> The normal use case would probably be to do the initial POST with no WAIT, and then use a polling loop with ?WAIT=60 GET requests to the /jobs/{jobid} endpoint. Allowing a WAIT parameter on GET requests to the /jobs/{jobid} endpoint avoids the additional redirect step needed by the special blocking endpoint.

Although I think avoiding the redirect is out of scope, if the GET on /jobs/{jobid} were the only place where ?WAIT were allowed (and hence blocking), then I think that it could be a viable alternative to having a special endpoint. The only disadvantage I can see is that there is no easy way for a 1.1 client to determine that it is talking to a 1.0 server and abandon all attempts at blocking interactions.
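
One possible client-side workaround, sketched in Python below, is a heuristic rather than a real detection mechanism: if a request sent with WAIT comes back early with no change of phase, the client sleeps for a while itself before asking again. The function `poll_until_done`, the `fetch_phase` callable (standing in for a GET on /jobs/{jobid} with ?WAIT) and the 2 second fallback sleep are all assumptions for illustration; ERROR and ABORTED are taken from the UWS 1.0 phase list.

```python
import time

# Phases after which there is nothing more to wait for.
TERMINAL = {"COMPLETED", "ERROR", "ABORTED"}


def poll_until_done(fetch_phase, wait=60.0, fallback_sleep=2.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll a job until it reaches a terminal phase.

    fetch_phase(wait) is a hypothetical callable supplied by the caller;
    it performs the GET with ?WAIT=<wait> and returns the phase string.
    """
    last = None
    while True:
        started = clock()
        phase = fetch_phase(wait)
        elapsed = clock() - started
        if phase in TERMINAL:
            return phase
        # Early reply with no change: perhaps a 1.0 server ignoring WAIT,
        # or a 1.1 server replying early. Either way, back off client-side
        # so the loop does not hammer the service.
        if phase == last and elapsed < wait:
            sleep(fallback_sleep)
        last = phase
```

Against a 1.1 server that really blocks, the fallback sleep is rarely triggered; against a 1.0 server the loop degrades to ordinary polling at the fallback interval.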

Cheers,
	Paul.