new UWS 1.1 WD

Mark Taylor M.B.Taylor at bristol.ac.uk
Tue Oct 7 15:45:02 CEST 2014


Paul & grid,

On Tue, 30 Sep 2014, Paul Harrison wrote:

> I have uploaded a new version of the UWS 1.1 WD to http://www.ivoa.net/documents/UWS/20140930/ which contains changes which are largely as a result of the discussion that happened in this thread http://mail.ivoa.net/pipermail/grid/2014-June/002609.html
> 
> In summary the main change from the previous version is that the blocking behaviour introduced in that version has been moved from a specific custom endpoint to being signalled by a ?WAIT query parameter on the /{jobs}/{job-id} endpoint.

I still think there is potential for a race condition here.

Consider this sequence of events:

   1. Client requests status from server
   2. Server returns status: it's QUEUED
   3. Server changes status to EXECUTING
   4. Client makes blocking call to server to find out when status changes
   5.   ... wait ...
   6. Server changes status to COMPLETED
   7. Blocking call returns, client finds out that status is COMPLETED

As far as the client is concerned, the job transitions from QUEUED
straight to COMPLETED; it never sees the (potentially long-lived)
EXECUTING phase.  With the existing REST API there's nothing the
client can do to reliably avoid this.  (Well, it could asynchronously
issue another non-blocking status request after the blocking one
has started, to check the phase hasn't changed, but that's (a) messy
and (b) not bulletproof, since you don't know when the blocking call
actually starts blocking, i.e. how long to wait after the start
of the blocking call before you do it.)
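
To make the sequence above concrete, here is a minimal in-process
Python sketch of it.  ToyJob, wait_for_change and simulate_race are
hypothetical names used only for illustration; nothing here is taken
from the UWS document.  It assumes the blocking call returns once the
phase differs from whatever the phase was when the call reached the
server.

```python
import threading


class ToyJob:
    """Toy stand-in for a job on a server (an assumption, not a UWS API)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._phase = "QUEUED"

    def get_phase(self):
        # Non-blocking status request.
        with self._cond:
            return self._phase

    def set_phase(self, phase):
        with self._cond:
            self._phase = phase
            self._cond.notify_all()

    def wait_for_change(self, started, timeout=5.0):
        # Blocking request: the baseline is the phase on arrival,
        # not the phase the client last saw.
        with self._cond:
            baseline = self._phase
            started.set()  # signal that the blocking call is in place
            self._cond.wait_for(lambda: self._phase != baseline, timeout)
            return self._phase


def simulate_race():
    job = ToyJob()
    seen = [job.get_phase()]               # steps 1-2: client sees QUEUED
    job.set_phase("EXECUTING")             # step 3: phase changes unseen
    started = threading.Event()
    waiter = threading.Thread(
        target=lambda: seen.append(job.wait_for_change(started)))
    waiter.start()                         # step 4: blocking call
    started.wait()                         # step 5: call is now blocking
    job.set_phase("COMPLETED")             # step 6
    waiter.join()                          # step 7: blocking call returns
    return seen
```

In this sketch simulate_race() returns the phases the client observed,
and EXECUTING is not among them.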

Now this won't happen very often; it's only in the unlucky case that the
server changes phase between the client's status request and the start
of its blocking call.  Also, the consequences are not disastrous, since
a terminal phase (here COMPLETED) doesn't block.  Maybe for those
reasons we don't care enough to do anything about it.  But it's not Right.

To avoid the problem you need an atomic operation that both
determines the current phase and requests blocking until the phase
changes.  (An analogous issue is why in Java you are only allowed to
call Object.wait() if you have synchronized on that object.)
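
The same rule can be seen in Python, whose threading.Condition.wait()
raises RuntimeError unless the caller holds the associated lock.  The
short sketch below is only an illustration of that analogy; the two
function names are made up for the example.

```python
import threading

cond = threading.Condition()


def wait_without_lock():
    # Waiting without holding the lock is rejected outright, so a
    # check-then-wait sequence cannot be split by another thread.
    try:
        cond.wait(timeout=0.01)
    except RuntimeError:
        return "RuntimeError"
    return "no error"


def wait_with_lock():
    # Holding the lock makes "inspect state, then wait" one atomic step.
    with cond:
        return cond.wait(timeout=0.01)  # False: timed out, nobody notified
```

A REST client cannot hold a server-side lock between two requests,
which is why the check and the wait have to travel in a single call.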

One possibility would be for the Job document to include a blocking
URL that anybody is allowed to call to find out when the status
changes from the status reported in that document (if it's not
obvious how that could be implemented, I can provide a sketch).
Another is what I suggested when I first emailed about this issue
in relation to the previous WD here:
http://mail.ivoa.net/pipermail/grid/2014-June/002609.html
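
One way the first possibility might work, as a rough Python sketch
rather than the sketch offered above: the job document carries a
token identifying the state it reports, and the blocking call takes
that token and returns at once if the job has already moved on.  The
token, the change counter behind it, and the names VersionedJob,
document and wait_since are all assumptions made for this example.

```python
import threading


class VersionedJob:
    """Toy job whose document carries a token for a race-free wait."""

    def __init__(self):
        self._cond = threading.Condition()
        self._phase = "QUEUED"
        self._version = 0  # hypothetical counter, bumped on each change

    def set_phase(self, phase):
        with self._cond:
            self._phase = phase
            self._version += 1
            self._cond.notify_all()

    def document(self):
        # Phase and token are read under one lock, so they always match.
        with self._cond:
            return {"phase": self._phase, "token": self._version}

    def wait_since(self, token, timeout=5.0):
        # Block only while the job is still in the state the token names.
        with self._cond:
            self._cond.wait_for(lambda: self._version != token, timeout)
            return self._phase


def demo():
    job = VersionedJob()
    doc = job.document()           # client sees QUEUED plus a token
    job.set_phase("EXECUTING")     # phase changes before the client waits
    # The token is stale, so this returns immediately with EXECUTING.
    return [doc["phase"], job.wait_since(doc["token"], timeout=1.0)]
```

Because the blocking call is defined relative to what the client was
told, not relative to when the call arrives, the client in demo() sees
QUEUED and then EXECUTING.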

Mark
   
--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/

