UWS POST PHASE=RUN

Paul Harrison paul.harrison at manchester.ac.uk
Wed Nov 5 10:38:56 CET 2014


Hi,

Some more history for context - UWS tried to make a clean separation between the job control parameters (destruction time etc.) and the JDL - so there was a two-step process:

1. job creation - by a POST of the JDL 
2. subsequent manipulation of the job by the job control parameters.

This was done for two reasons:

1. to avoid any JDL parameter name clashes with the UWS job control parameters
2. to give the UWS a chance to assess what resources a job needs, so that it can react sensibly to requests to change the job control parameters.
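
A minimal sketch of the two-step process, in Python. It only builds the two requests and sends nothing. The job list URL, the job URL passed to the second function and the helper names are hypothetical, and the convention that a control parameter is POSTed to a sub-resource of the job (e.g. PHASE=RUN to .../phase) is my reading of UWS 1.0 rather than something stated in this thread.

```python
from urllib.parse import urlencode
from urllib.request import Request

JOBS_URL = "http://uwsserver.com/jobs"  # hypothetical job list URL


def creation_request(jdl_params):
    """Step 1: POST the JDL (here as form-encoded pairs) to create the job."""
    body = urlencode(jdl_params).encode("ascii")
    return Request(JOBS_URL, data=body, method="POST")


def control_request(job_url, name, value):
    """Step 2: POST one job control parameter to a sub-resource of the job.

    Assumes the sub-resource is the lower-case parameter name and the form
    field is the upper-case parameter name.
    """
    body = urlencode({name.upper(): value}).encode("ascii")
    return Request(f"{job_url}/{name.lower()}", data=body, method="POST")
```

Because the JDL and the control parameters travel in separate requests, a JDL parameter that happens to be called PHASE cannot be confused with the control parameter of the same name.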

However, there were people who wanted to be able to submit and run a job in one step, so the compromise was made of allowing job control parameters to be specified at the same time as the initial POST - principally to be able to do a PHASE=RUN. I think that UWS 1.0 did not fully specify exactly how the job control parameters might interact with the JDL, which is why we are having this discussion: allowing the JDL and control parameters at the same time is messy because their complexity needs are different.

On 2014-11-04, at 11:04, Grégory Mantelet <gmantele at ari.uni-heidelberg.de> wrote:

> Hi,
> 
> From all that has been said so far, I see the following 3 methods to submit parameters:
> 
>    1/ "application/x-www-form-urlencoded"
> Only parameter/value pairs can be submitted with this method. However, these parameters can be the parameters needed by the job (JDL), the UWS parameters (execution duration, destruction time, PHASE=RUN, ...) or both.

One technical solution to avoid any name clashes, which might be seen as a bit of a kludge (but allowing the two parameter types at the same time is already nasty), is to allow the control parameters only in the query part of the POST URL and the JDL parameters only in the POST body - so in curl terms something like

curl --data "param1=value1&param2=value2" "http://uwsserver.com/jobs?PHASE=RUN"

The downside to this is that I know of at least one Java server-side toolkit which does not distinguish between the two sets of parameters and just merges the two sets of URL-encoded parameters, so that there would still be the possibility of a name clash.
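
To make the difference concrete, here is a small Python sketch using only the standard library. The function names and the example JDL parameter called PHASE are made up for illustration; the merging function merely imitates what such a toolkit does and is not taken from any real one.

```python
from urllib.parse import parse_qs, urlsplit


def split_parameters(url, body):
    """Keep the two sources apart: query string = control, body = JDL."""
    control = parse_qs(urlsplit(url).query)
    jdl = parse_qs(body)
    return control, jdl


def merged_parameters(url, body):
    """Imitate a toolkit that merges query and body into one parameter map."""
    merged = {}
    for source in split_parameters(url, body):
        for name, values in source.items():
            # Values from both sources end up under the same name.
            merged.setdefault(name, []).extend(values)
    return merged
```

With the URL http://uwsserver.com/jobs?PHASE=RUN and the body "param1=value1&PHASE=calibration", the first function returns the control PHASE and the JDL PHASE separately, whereas the second returns a single PHASE entry holding both values, and the server can no longer tell which one was the control parameter.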
> 
>    2/ "multipart/form-data"
> Here, using DALI UPLOAD (like in TAP), parameters can be JDL, UWS parameters or both, as in 1/. So, the JDL can be submitted as parameter/value pairs or as one file (e.g. an XML document). But it is also possible to submit other files which could be necessary to the job (for instance, if it is a job doing some processing over one or several files; those files could be provided inline or as a URL).
> 
>    3/ other
> Only the JDL can be provided. It is a blob, XML or other kind of document which is completely opaque and cannot be interpreted by the UWS server. This document must be stored as provided, with no interpretation by the UWS server. It implies that UWS parameters (duration, destruction, ...) cannot be provided in the initial POST, and if the user wants to modify their default values, this must be done after creation and before execution with one or several POSTs in "application/x-www-form-urlencoded".

The UWS control parameters could be added to the URL as above. However, I am in any case comfortable with the idea that the client should have to make multiple calls to the server to perform UWS interactions. After all, a web browser makes multiple calls to the server when displaying a typical web page, and no one thinks that is strange...
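
For an opaque JDL that could look like the following Python sketch, which builds the request without sending it. The function name, the job list URL in the usage note and the application/xml content type are assumptions for illustration only.

```python
from urllib.parse import urlencode
from urllib.request import Request


def opaque_jdl_request(jobs_url, jdl_bytes, content_type, control=None):
    """Build a POST whose body is the untouched JDL document.

    Any UWS control parameters go in the query part of the URL, so the
    body never has to be parsed to find them.
    """
    url = jobs_url
    if control:
        url = f"{jobs_url}?{urlencode(control)}"
    return Request(
        url,
        data=jdl_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

For example, opaque_jdl_request("http://uwsserver.com/jobs", b"<job/>", "application/xml", {"PHASE": "RUN"}) targets http://uwsserver.com/jobs?PHASE=RUN with the XML document as the whole body. Leaving out the last argument gives the plain creation request, after which the control parameters are set by separate calls.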

> 
> Like Markus, I also think that 3/ could be a problem for large files and that's why 2/ is better. Then, allowing the 2 methods with the rule "if the file (JDL) is large, we use DALI UPLOAD" is, for me, a source of problems for the server, because giving such a choice to the user generally does not work. We can indeed see it in TAP with the choice between synchronous and asynchronous execution. Generally users do not know the difference between the two modes or do not have enough knowledge of DB query execution to decide which method should be used, and consequently they generally stick to one method (which is often synchronous because the result is returned "immediately"). So, they have the choice, but are generally unable to decide which should be used... and that is only when they know they have the choice and what the rule is.
> 

I don’t think that you can use potentially large files as a reason to reject option 3, as it is also possible to attach a very large file using multipart/form-data, so the same problem exists with option 2 and inline data. However, if this happens the UWS server can use HTTP mechanisms to reject the POST. You have to rely on the client being sensible and using “byReference” parameters where appropriate - UWS makes this easy, as you can chain the output of one UWS with the input of another as a URL reference.
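
As one example of such an HTTP mechanism, a server could look at the declared Content-Length before reading the body and answer 413 if it is too big. The Python sketch below shows only that decision; the size limit, the function name and the choice to answer 411 when no length is declared are assumptions of mine, not anything UWS prescribes.

```python
MAX_BODY_BYTES = 10 * 1024 * 1024  # hypothetical limit, not from any UWS document


def early_rejection_status(headers, limit=MAX_BODY_BYTES):
    """Return an HTTP error status for an unacceptable POST, or None to accept.

    `headers` is any mapping with a `get` method, e.g. a plain dict keyed
    by the exact header name used below.
    """
    declared = headers.get("Content-Length")
    if declared is None:
        return 411  # Length Required: a policy choice of this sketch
    try:
        size = int(declared)
    except ValueError:
        return 400  # Bad Request: the declared length is not a number
    if size < 0:
        return 400  # Bad Request: a negative length makes no sense
    if size > limit:
        return 413  # the body is larger than the server is willing to accept
    return None
```

The same check applies whether the large body is an opaque JDL (option 3) or an inline file in multipart/form-data (option 2).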

> So, I think that only the first two methods should be proposed, even if I understand the third one. The first two methods are simple, well-known and cover all possible cases, whereas 3/ allows just one opaque file - possibly large - to be submitted, without any possibility to provide UWS parameters in the initial POST.


For me all three options are “natural” in the sense that they all conform to accepted HTTP norms, and thus should all be supported - I don’t think that we can reject option 3 as a possibility, as I believe we already make use of it in VOSpace (and potentially elsewhere in standalone UWS servers). In addition, it seems to me slightly unnatural to have to specify a parameter name on the client side for an opaque JDL when using multipart/form-data. The real question, I think, goes back to how to represent option 3 in the parameter API for access to UWS, and as I said in another response, it could be done by the server giving a conventional name for the opaque JDL parameter and the “isPost” attribute.

Paul.

