Modification Of UWS 1 to 1.1

Pierre Le Sidaner pierre.lesidaner at obspm.fr
Sun Oct 16 05:54:16 PDT 2011


Hi Petr

I will try to reply point by point.
UWS is the language for managing remote jobs asynchronously; to be clear,
we only define how to interface with a UWS service.
We want it to be simple precisely so that it is easy to implement.

>
>
> Hi Pierre and others,
>
> As you remember, we tried to use UWS for setting up VO-KOREL, which is
> a "cloud-like" service for running one particular FORTRAN program
> (korel) in a user-friendly environment, thus requiring the web
> browser to interact with the service (including job control). This may,
> of course, differ from the original requirements on UWS, which were to
> provide asynchronous communication for TAP queries ....
>
> However, I think that in the future our approach may easily be
> followed, providing astronomers with nice wrappers for their "boring"
> numerical codes, and so the "cloud" aspect will become more prominent.
> So I will comment on your changes from this point of view:
>
>
> On Thu, 13 Oct 2011, Pierre Le Sidaner wrote:
>> | Real simplification using REST principles, which will not allow
>> | multiple interpretations of a command:
>> |       creating and starting a job is one phase
>
> NO, this is exactly where the current UWS is handy:
>
> We commonly have a scenario like this: a user prepares in his working
> space (there is some kind of quota imposed on each user) several jobs,
> i.e. he uploads massive data sets and parameter sets for a number of
> experiments (different spectral regions for disentangling, different
> sets of input spectra, etc.). As he is aware that he has a limited
> amount of memory and processes, he has to decide which jobs to run in
> parallel. He may run them and disconnect. Then he may use a mobile
> device to look at his job list to see the results and, by changing
> some parameters, rerun a job immediately; here this means creating a
> new job and running it together.
From my point of view, the management of which jobs have to be sent in
parallel, how much RAM is available and how fast the processors are is
not the user's problem.
He has to send jobs. He can send many of them; he doesn't have to know
whether other users are sending jobs at the same time. This is an
infrastructure problem managed by the provider, who can have many kinds
of clusters, schedulers and batch queues.
So the user sends 100 jobs. They are placed in a queue.
Usually the scheduler dispatches these jobs to the available CPUs using
its knowledge of the RAM, the number of CPUs and the time reserved for
execution. This is how it works on every cluster.
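To make the queue-and-dispatch idea concrete, here is a minimal sketch of a provider-side dispatcher. It assumes that each job declares hypothetical `cpus` and `ram_gb` requirements and that the policy is strict FIFO without backfilling. Real batch systems use richer policies, so this only illustrates that the user submits jobs and the infrastructure decides what runs.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    cpus: int    # CPUs requested (hypothetical field)
    ram_gb: int  # RAM requested in GB (hypothetical field)


def dispatch(queue, free_cpus, free_ram_gb):
    """Start queued jobs in FIFO order while resources allow.

    Stops at the first job that does not fit (strict FIFO, no backfilling),
    so a large job at the head of the queue is not starved by smaller ones.
    """
    started = []
    while queue and queue[0].cpus <= free_cpus and queue[0].ram_gb <= free_ram_gb:
        job = queue.popleft()
        free_cpus -= job.cpus
        free_ram_gb -= job.ram_gb
        started.append(job.job_id)
    return started, free_cpus, free_ram_gb


# The user submits 100 jobs; the provider decides how many can start now.
queue = deque(Job(f"job{i}", cpus=2, ram_gb=4) for i in range(100))
started, cpus_left, ram_left = dispatch(queue, free_cpus=16, free_ram_gb=24)
```

With 16 CPUs and 24 GB free, memory is the limiting resource, so only the first six jobs start and the other 94 stay queued.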
>
> But he can just as well look at the job list on his mobile and decide:
> OK, now I know the method converges, and I can run one large job that
> was prepared some time ago and is thus in the PENDING phase.
Don't reinvent the scheduler: jobs are in a queue, and with your mobile
you can see the status of all your jobs if you have a web interface to
do so.
>
> The same with ABORT and DELETE: he may know how long a typical run
> on a given set should take. But if it is running too long, probably
> something is wrong and he can manually ABORT that job. Or he might run
> several smaller experiments and the long job in parallel, and if it
> becomes clear that such a parameter combination does not give good
> results, he may ABORT that long job.
>
> DELETE, as we understood the schema, is what we use for deleting the
> whole space of the job. That means the input data, the parameter set
> and the RESULTS (our system produces results even if the job is not
> finished, e.g. the progress of convergence or divergence can be seen
> there, as well as stdout, giving hints e.g. about parameter errors).
>
There is not yet any space dedicated to the user outside VOSpace,
so input parameters are destroyed after the job anyway.
But you tell me that you want intermediate results from a job you
decided to stop.
We have to think about that and reintroduce abort if it's necessary.




>
>> | We hope that this proposition will be much easier to implement
>> | on both the server and the client side.
>
> It is too short-sighted to look only at ease of implementation while
> losing important interaction and lowering the user's comfort.
I really don't see the point.
The user will not talk UWS; he will have an interface and will play with
it. We only describe the messages exchanged between this interface and
the server, and we have to cover all the possible user requirements in
this message exchange. I don't see where comfort is lowered. But I am
open to any extension if our proposition does not fulfil the
requirements.
We are only trying to make the standard clearer by defining the message
sent back from every action and by limiting the number of methods for
doing the same action, to limit ambiguity, not comfort. Then
implementation will be easier and more efficient.

>
>> | We have removed some messages that are useless from our point of
>> | view, like abort by the user, because it does the same thing as
>> | delete, since you cannot retrieve the results, as explained in the
>> | 1.0 version.
>
> I do not understand why you could not retrieve results after abort,
> and it does not do the same thing:
>
> In UWS 1.0, sec. 2.2.3.6:
> "A job may be aborted by POSTing to the /{jobs}/(job-id)/phase URI. 
> The POST contains a single parameter PHASE=ABORT which instructs the 
> UWS to attempt to abort the job. Aborting a job has the effect of 
> stopping a job executing, but the resources associated with a job 
> remain intact. "
>                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OK, as I said before, I understand this need and we have to face it;
you are right.
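The abort action quoted from sec. 2.2.3.6 can be sketched as a client-side request builder. This is a sketch under assumptions: the service URL `http://example.org/uws/jobs` and job id `42` are hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def abort_request(jobs_url: str, job_id: str) -> Request:
    """Build (but do not send) the UWS 1.0 abort request:
    POST PHASE=ABORT to {jobs}/(job-id)/phase."""
    url = f"{jobs_url.rstrip('/')}/{job_id}/phase"
    body = urlencode({"PHASE": "ABORT"}).encode("ascii")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


# Hypothetical service location and job id.
req = abort_request("http://example.org/uws/jobs", "42")
```

Passing `req` to `urllib.request.urlopen` would ask the service to stop the job. Per the quoted text, the job's resources, including any partial results, remain in place.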
>
>
> sec 2.2.3.2:
>
> " Deleting a Job
>
> Sending a HTTP DELETE to a Job resource destroys that job, with the
>                                         ^^^^^^^^^^^^^^^^
> meaning noted in the definition of the Job object, above. "
>
> and the definition of the Job:
>
> "2.1.2. Job
>
> A Job object contains the state of one job. The state is a collection 
> of other objects. Each Job contains:
>
>     * Exactly one Execution Phase.
>     * Exactly one Execution Duration.
>     * Exactly one Deletion Time
>     * Exactly one Quote.
>     * Exactly one Results List.
>     * Exactly one Owner.
>     * Zero or one Run Identifier.
>     * Zero or one Error."
> ------------------
>
> As we understand it, DELETING means removing all remnants of the
> job, including the results.
Yes
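The difference between ABORT and DELETE can be sketched against an in-memory job store. This is an assumption-laden sketch. The fields follow the Job list quoted from sec. 2.1.2, but the Python types, the sample values and the store itself are illustrative and not part of the standard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UwsJob:
    # One field per item of the quoted Job definition (UWS 1.0 sec. 2.1.2);
    # the Python types are illustrative assumptions, not from the spec.
    phase: str                  # Execution Phase
    execution_duration: int     # Execution Duration, in seconds
    destruction: str            # Deletion Time (ISO 8601 string)
    quote: str                  # Quote (predicted completion time)
    owner: str                  # Owner
    results: List[str] = field(default_factory=list)  # Results List
    run_id: Optional[str] = None  # zero or one Run Identifier
    error: Optional[str] = None   # zero or one Error


def abort_job(store: Dict[str, UwsJob], job_id: str) -> None:
    # ABORT stops execution but keeps the job's resources (sec. 2.2.3.6).
    store[job_id].phase = "ABORTED"


def delete_job(store: Dict[str, UwsJob], job_id: str) -> None:
    # DELETE destroys the job, results included (sec. 2.2.3.2).
    del store[job_id]


store = {
    "42": UwsJob("EXECUTING", 3600, "2011-10-20T00:00:00Z",
                 "2011-10-16T12:00:00Z", "petr", ["convergence.log"]),
}
abort_job(store, "42")   # phase becomes ABORTED, results are kept
delete_job(store, "42")  # the job and its results are gone
```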
>
>
>
>> | We have removed the pending phase, as described before. A job can be
>> | in the suspended phase, but only as a server action.
>
> NO, suspend is only for the case where the processor is not available.
> But pending means the job space is created (data and parameters
> uploaded, item in the database structure created, etc.), but the real
> application (e.g. the number-crunching code) is neither run nor
> deployed on the GRID by the queue subsystem.
As I said before, we have to face the problem of intermediate results,
but not substitute ourselves for the job manager.
Otherwise, when there are multiple users and multiple CPUs, it becomes
unmanageable. It's not the purpose of UWS; it's the provider's internal
business.
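The two positions can be compared on a phase transition table. This is a simplified reading of the UWS 1.0 phases, not the normative state machine: HELD and UNKNOWN are omitted, and `initial_phase` contrasts the 1.0 behaviour with the proposal as it is described in this thread.

```python
# Simplified reading of the UWS 1.0 phases; HELD and UNKNOWN are omitted.
TRANSITIONS = {
    "PENDING": {"QUEUED", "ABORTED"},             # PHASE=RUN queues the job
    "QUEUED": {"EXECUTING", "ABORTED"},           # the scheduler starts it
    "EXECUTING": {"COMPLETED", "ERROR", "ABORTED", "SUSPENDED"},
    "SUSPENDED": {"EXECUTING", "ABORTED"},        # server-side suspension only
    "COMPLETED": set(), "ERROR": set(), "ABORTED": set(),  # terminal phases
}


def can_transition(src: str, dst: str) -> bool:
    return dst in TRANSITIONS.get(src, set())


def initial_phase(create_and_run: bool) -> str:
    # UWS 1.0: a new job waits in PENDING until the client sends PHASE=RUN.
    # Proposal (as described in this thread): creation also submits the job.
    return "QUEUED" if create_and_run else "PENDING"
```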
>
>
>> | We have added the possibility to upload files and not only to give URLs
>
> This is exactly how we do it in VO-KOREL: upload of several files
> (one or two data files and parameter files); we can also upload all
> of this in one tgz file prepared by the user (for automation of the
> CLOUD service).
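Uploading files, rather than passing URLs, usually means sending a `multipart/form-data` body. Below is a minimal hand-rolled encoder as a sketch. The field names `region` and `archive` and the file `korel.tgz` are hypothetical, and an actual UWS service may expect different parameter names.

```python
import uuid
from typing import Dict, Optional, Tuple


def multipart_body(params: Dict[str, str],
                   files: Dict[str, Tuple[str, bytes]],
                   boundary: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode form fields and file uploads as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in params.items():
        parts += [f"--{boundary}".encode(),
                  f'Content-Disposition: form-data; name="{name}"'.encode(),
                  b"", value.encode("utf-8")]
    for name, (filename, content) in files.items():
        parts += [f"--{boundary}".encode(),
                  (f'Content-Disposition: form-data; name="{name}"; '
                   f'filename="{filename}"').encode(),
                  b"Content-Type: application/octet-stream",
                  b"", content]
    parts += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


# Hypothetical parameter and archive names, with a fixed boundary for clarity.
body, content_type = multipart_body({"region": "4500-4600"},
                                    {"archive": ("korel.tgz", b"...tgz bytes...")},
                                    boundary="XYZ")
```

The returned `content_type` would go into the request's `Content-Type` header so that the server can find the boundary.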
>
>> | | What has to be discussed:
>> | pagination for long job lists.
>
> Especially the isolation of individual users, so that each sees only
> his own jobs.
This can be done only if we identify the user.
If we want to define the identification mechanism, then we propose to
use the HTTP authentication mechanism, which is an RFC.
But I agree it's the next step we have to face.
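As one example of an RFC-defined HTTP mechanism, here is a sketch of HTTP Basic authentication plus owner-based filtering of the job list. The choice of Basic (rather than another scheme), the sample job ids and the owner names are assumptions. Basic credentials are only base64-encoded, so in practice they would need HTTPS.

```python
import base64
from typing import Dict, List


def basic_auth_header(user: str, password: str) -> Dict[str, str]:
    # HTTP Basic scheme: base64("user:password") in the Authorization header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def visible_jobs(job_owners: Dict[str, str], user: str) -> List[str]:
    # Once the user is authenticated, the server lists only the jobs he owns.
    return [job_id for job_id, owner in job_owners.items() if owner == user]


header = basic_auth_header("Aladdin", "open sesame")
mine = visible_jobs({"1": "petr", "2": "pierre", "3": "petr"}, "petr")
```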
>
>> | | We hope that you have many useful comments on the text. As the
>> resource |
>
> As I said, we are probably doing something revolutionary with UWS,
> so our interpretation may be wrong (I hope Paul and Dave will comment
> on their original ideas for UWS, but I liked the original design as
> very flexible).
>
> What I also dislike in your proposal is the requirement for the server
> to give an estimate of the runtime. In 99% of multiparametric
> optimizations (even with genetic algorithms) you cannot predict
> convergence. You may impose an initial limit just to prevent a
> closed loop or solution oscillation, but the user has to decide
> himself how to change the limit.
> That's the reason we allocate each user a limited number of processes
> and amount of memory, but some queue priority would probably be better.
I understand the difficulty of time prediction.
But you know the code better than the user does, so you are able to
make an initial estimate of the duration.
The queue priority depends on the resources you ask for. You cannot
have the same priority for a quick single-processor job and a
multi-processor job that uses all the memory of the machine and runs
for 200 h.
So if you say my job is a 5 min job and then I change my mind and it
uses 200 h, that is not acceptable for the other users, and it breaks
the whole job-queue mechanism that decides job scheduling.
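The argument that priority depends on declared resources can be sketched as a priority queue ordered by requested CPU-hours, with a check on declared duration in the spirit of UWS's Execution Duration. The ordering policy (fewest CPU-hours first) and all names are hypothetical illustrations, not a prescribed scheduler.

```python
import heapq
from typing import List, Tuple


class ResourceQueue:
    """Hypothetical priority queue: the fewer CPU-hours a job declares,
    the earlier it is dispatched (ties broken by submission order)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._seq = 0

    def submit(self, job_id: str, cpus: int, declared_hours: float) -> None:
        heapq.heappush(self._heap, (cpus * declared_hours, self._seq, job_id))
        self._seq += 1

    def next_job(self) -> str:
        return heapq.heappop(self._heap)[2]


def exceeds_declaration(elapsed_hours: float, declared_hours: float) -> bool:
    # A job running past its declared duration would be stopped, so that
    # a "5 min" job cannot silently occupy the machine for 200 h.
    return elapsed_hours > declared_hours


q = ResourceQueue()
q.submit("big", cpus=16, declared_hours=200)
q.submit("quick", cpus=1, declared_hours=5 / 60)
q.submit("medium", cpus=4, declared_hours=2)
order = [q.next_job() for _ in range(3)]
```

Under this policy the 5 min single-CPU job runs first and the 200 h multi-CPU job last. A job that under-declares its duration would be cut off rather than being allowed to jump the queue.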
>
>
> I feel that UWS (which is not a standard like a protocol or data
> format, but a conceptual idea, a pattern for how to do things) should
> rather be expanded to allow more flexibility and design freedom, even
> for yet unforeseen purposes, than be restricted.
As you say, UWS is not like a Simple Access Protocol. It's the
definition of a language to manage remote jobs.
In UWS you ask a service how to query it and retrieve results. From
this information you are able to make a client that sends jobs and
retrieves results.
But we didn't give any recommendation on web interfaces.
Our purpose is not to reduce functionality, but just to remove ambiguity.
If you can send a job in 3 different ways, it means that people
implementing the server side will face choices that are usually not
clear, that clients must implement all 3 ways, and that the service
description has to describe all 3 ways.
I am only talking about sending jobs; the same applies to parameters,
job status ...


>
> Perhaps we might define some implementation standard at the level of
> a "protocol" for given purposes, with all the MUSTs and MAYs, e.g. as
> requirements for pure machine-to-machine interaction (as TAP is).
>
> I think the current TAP is nicely orthogonal, so e.g. you can combine
> CreateJob+StartJob for your purpose (phase transition from PENDING via
> RUN to QUEUED and EXECUTING) as just two successive calls.
I am just saying: give me a good reason to create a job and not send
it. Why, if you create a job, would you not want to send it? And why
are you not able to wait until you are ready and then send the job
directly, without going through this pending status?
Otherwise I will have to face this new phase, a new job lifetime in
the pending phase.
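The two submission styles under discussion can be compared as request sequences. This is a sketch under assumptions. The `/jobs` path, the `region` parameter and the job id are hypothetical. Sending `PHASE=RUN` together with the creation POST stands in for the proposed one-step creation, and whether a given service accepts it is exactly the kind of choice the proposal wants to pin down.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

Call = Tuple[str, str, str]  # (HTTP method, path, url-encoded body)


def two_step(params: Dict[str, str], job_id: str) -> List[Call]:
    # Create the job (it sits in PENDING), then POST PHASE=RUN to its phase URI.
    # job_id stands for the id the service assigns in its reply to the first call.
    return [("POST", "/jobs", urlencode(params)),
            ("POST", f"/jobs/{job_id}/phase", urlencode({"PHASE": "RUN"}))]


def one_step(params: Dict[str, str]) -> List[Call]:
    # Create and submit in a single call, so the job never waits in PENDING.
    return [("POST", "/jobs", urlencode({**params, "PHASE": "RUN"}))]


calls_two = two_step({"region": "4500-4600"}, "42")
calls_one = one_step({"region": "4500-4600"})
```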

Anyway, thank you for your comments and proposals. I hope to hear from you soon.

Regards
Pierre


-- 
-------------------------------------------------------------------------
                            Pierre Le Sidaner
                         Observatoire de Paris

Division Informatique de l'Observatoire
Observatoire Virtuel 01 40 51 20 89
61, avenue de l'Observatoire 75014 Paris

mailto:pierre.lesidaner at obspm.fr
http://vo-web.obspm.fr

--------------------------------------------------------------------------


