Modification Of UWS 1 to 1.1

Petr Skoda skoda at sunstel.asu.cas.cz
Thu Oct 13 05:52:17 PDT 2011



Hi Pierre and others,

As you remember, we tried to use UWS to set up VO-KOREL, a "cloud-like" 
service for running one particular FORTRAN program (korel) in a 
user-friendly environment, which requires a web browser to interact with 
the service (including job control). This may, of course, differ from the 
original requirement on UWS, which was to provide asynchronous 
communication for TAP queries ....

However, I think that in the future our approach may easily be followed, 
providing astronomers with nice wrappers for their "boring" numerical 
code, and so the "cloud" aspect will become more prominent. So I will 
comment on your changes from this point of view:


On Thu, 13 Oct 2011, Pierre Le Sidaner wrote:
> | real simplification using REST Principe that will not allow multiple | 
> interpretation of a command
> |       creation and starting a job is on one phase

NO - this is exactly where the current UWS is handy:

We commonly have the following scenario: the user prepares several jobs in 
his working space (some kind of quota is imposed on each user), i.e. he 
uploads massive data sets and parameter sets for a number of experiments 
(different spectral regions for disentangling, different sets of input 
spectra, etc.). As he is aware that he has a limited amount of memory and 
a limited number of processes, he has to decide which jobs to run in 
parallel. He may start them and disconnect. Then he may use a mobile 
device to look at his job list to see the results and, by changing some 
parameters, rerun a job immediately - here that means creating a new job 
and running it in one step.

But he can just as well look at the job list on the mobile device and 
decide: OK, now I know the method converges, so I can run the one large 
job that was prepared some time ago and has been waiting in the PENDING 
phase.
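
To illustrate the two usage patterns above, here is a minimal sketch in 
Python. It is not VO-KOREL code: the class, function and parameter names 
are made up for illustration, and the table of allowed transitions is a 
simplification of the UWS phase diagram that I assume for this example.

```python
from enum import Enum


class Phase(Enum):
    PENDING = "PENDING"
    QUEUED = "QUEUED"
    EXECUTING = "EXECUTING"
    COMPLETED = "COMPLETED"
    ABORTED = "ABORTED"


# Transitions a client may request in this sketch (an assumption).
# Moving a job QUEUED -> EXECUTING -> COMPLETED is left to the server.
CLIENT_TRANSITIONS = {
    ("RUN", Phase.PENDING): Phase.QUEUED,
    ("ABORT", Phase.PENDING): Phase.ABORTED,
    ("ABORT", Phase.QUEUED): Phase.ABORTED,
    ("ABORT", Phase.EXECUTING): Phase.ABORTED,
}


class Job:
    """Hypothetical job record: created in PENDING, started on request."""

    def __init__(self, owner, parameters):
        self.owner = owner
        self.parameters = dict(parameters)
        self.phase = Phase.PENDING  # job space exists, nothing is running

    def request(self, action):
        """Apply a client request ("RUN" or "ABORT") to the current phase."""
        key = (action, self.phase)
        if key not in CLIENT_TRANSITIONS:
            raise ValueError(f"{action} not allowed in {self.phase.value}")
        self.phase = CLIENT_TRANSITIONS[key]
        return self.phase


def create_and_run(owner, parameters):
    """The one-step variant is just the two calls in sequence."""
    job = Job(owner, parameters)
    job.request("RUN")
    return job
```

The point of the sketch is that "create and run together" can be built on 
top of a separate PENDING phase, but a separate PENDING phase cannot be 
recovered once creation and start are merged into one operation.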

The same with ABORT and DELETE - he may know how long a typical run on a 
given data set should take. If it is running too long, something is 
probably wrong and he can manually ABORT that job. Or he might run several 
smaller experiments and the long job in parallel; if it becomes clear that 
such a parameter combination does not give good results, he may ABORT that 
long job.

As we understood the schema, we use DELETE for deleting the whole space 
of the job - that means the input data, the parameter set and the RESULTS 
(our system produces results even if the job is not finished - e.g. the 
partial convergence or divergence can be seen there, as well as stdout, 
which gives a hint e.g. about a parameter error).



> | We hope that this proposition will be much more easy to implement
> | both from server and client phase.

It is too short-sighted to look only at the ease of implementation while 
losing important interaction and lowering the user's comfort.

> | We have removed some useless messages on our point of view like abort | 
> from the user. Because it make the same thing as delete as you can
> | not retrieve the result as explain in 1.0 version.

I do not understand why you could not retrieve results after an abort -
and abort does not do the same thing as delete:

in UWS1.0 sec 2.2.3.6
"A job may be aborted by POSTing to the /{jobs}/(job-id)/phase URI. The 
POST contains a single parameter PHASE=ABORT which instructs the UWS to 
attempt to abort the job. Aborting a job has the effect of stopping a job 
executing, but the resources associated with a job remain intact. "
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


sec 2.2.3.2:

" Deleting a Job

Sending a HTTP DELETE to a Job resource destroys that job, with the
                                         ^^^^^^^^^^^^^^^^
meaning noted in the definition of the Job object, above. "

and the definition of the Job:

"2.1.2. Job

A Job object contains the state of one job. The state is a collection of 
other objects. Each Job contains:

     * Exactly one Execution Phase.
     * Exactly one Execution Duration.
     * Exactly one Deletion Time
     * Exactly one Quote.
     * Exactly one Results List.
     * Exactly one Owner.
     * Zero or one Run Identifier.
     * Zero or one Error."
------------------

As we understand it, DELETING means removing all remnants of the job - 
including the results.
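
The difference between the two operations, as we read the quoted 
sections, can be sketched in a few lines of Python. This is only an 
in-memory illustration under my own assumptions; JobSpace and JobStore 
are hypothetical names and not part of UWS or of our implementation.

```python
class JobSpace:
    """Hypothetical per-job working space: inputs, parameters, results."""

    def __init__(self, job_id, inputs, parameters):
        self.job_id = job_id
        self.inputs = list(inputs)
        self.parameters = dict(parameters)
        self.results = []        # may be filled while the job still runs
        self.phase = "PENDING"


class JobStore:
    """Hypothetical in-memory registry of job spaces."""

    def __init__(self):
        self._jobs = {}

    def add(self, space):
        self._jobs[space.job_id] = space

    def get(self, job_id):
        return self._jobs.get(job_id)

    def abort(self, job_id):
        # ABORT: execution stops, but the resources of the job stay
        # intact, so partial results and stdout remain retrievable.
        job = self._jobs[job_id]
        job.phase = "ABORTED"
        return job

    def delete(self, job_id):
        # DELETE: the job is destroyed together with its inputs,
        # parameters and results.
        del self._jobs[job_id]
```

After abort() the job can still be fetched and its partial results read; 
after delete() nothing is left to fetch.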



> | We have remove pending phase as describe before. Job can be on
> | suspended phase, but it's only server action.

NO - suspended applies only when the processor is not available.
Pending, on the other hand, means that the job space has been created 
(data and parameters uploaded, entry in the database structure created, 
etc.), but the real application (e.g. the number-crunching code) has 
neither been run nor deployed on the GRID by the queue subsystem.


> | We have add possibility to upload file and not only to give URL

This is exactly how we do it in VO-KOREL - upload of several files (one 
or two data files and parameter files); we can also upload all of this in 
one tgz file prepared by the user (for automation of the CLOUD service).

> | | What have to be discuss :
> | pagination for long job list.

And especially the isolation of individual users, so that each user sees 
only his own jobs.
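
Both points could be handled in one place. A minimal sketch in Python, 
assuming each job record carries an owner; the function name, the 
"offset"/"limit" parameters and the dict layout are invented here and do 
not come from UWS:

```python
def list_jobs(jobs, owner, offset=0, limit=10):
    """Return one page of the jobs that belong to `owner`.

    `jobs` is any iterable of dicts with at least the keys "id" and
    "owner" (a hypothetical record layout for this sketch).
    """
    if offset < 0 or limit < 1:
        raise ValueError("offset must be >= 0 and limit must be >= 1")
    # Isolation first: other users' jobs are never counted or returned.
    own = [job for job in jobs if job["owner"] == owner]
    return {
        "total": len(own),
        "offset": offset,
        "jobs": own[offset:offset + limit],
    }
```

Filtering by owner before slicing means the page boundaries and the total 
count reveal nothing about other users' jobs.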

> | | We hope that you have many useful comment on the text. As the resource |

As I said, we are probably doing something revolutionary with UWS, and so 
our interpretation may be wrong (I hope Paul and Dave will comment on 
their primary ideas of UWS - but I liked the original design as very 
flexible).

What I also dislike in your proposal is the requirement for the server to 
give an estimate of the runtime. In 99% of multiparametric optimizations 
(even with genetic algorithms) you cannot predict the convergence. You may 
impose an initial limit just to prevent an endless loop or an oscillating 
solution, but the user has to decide himself how to change the limit.
That's the reason we allocate each user a limited number of processes and 
a limited amount of memory - but some queue priority would probably be 
better.


I feel that UWS (which is not a standard like a protocol or a data format, 
but a conceptual idea - a pattern for how to do things) should rather be 
expanded to allow more flexibility and design freedom, even for as yet 
unforeseen purposes, than be restricted.

Perhaps we might define some implementation standard at the level of a 
"protocol" for given purposes, with all the MUST and MAY - e.g. as 
requirements for pure machine-to-machine interaction (as TAP is).

I think the current TAP is nicely orthogonal, so e.g. for your purpose 
you can combine CreateJob + StartJob (phase transition 
Pending-RUN-Queued-Executing) as just two successive calls.
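
As a rough sketch of what I mean by two successive calls, the following 
Python only builds the two HTTP requests and sends nothing. The helper 
names, the example.org URL and the job parameters are placeholders of 
mine; the /phase resource and the PHASE=RUN parameter follow the UWS 1.0 
text, and the job id would come from the server's answer to the first 
call.

```python
from urllib.parse import urlencode
from urllib.request import Request


def create_job_request(jobs_url, parameters):
    """Call 1: POST the parameters to the job list.

    The server creates the job and leaves it in the PENDING phase.
    """
    body = urlencode(parameters).encode("ascii")
    return Request(jobs_url, data=body, method="POST")


def start_job_request(jobs_url, job_id):
    """Call 2: POST PHASE=RUN to the phase resource of that job."""
    body = urlencode({"PHASE": "RUN"}).encode("ascii")
    return Request(f"{jobs_url}/{job_id}/phase", data=body, method="POST")


# Placeholder endpoint and job id, for illustration only.
JOBS_URL = "http://example.org/uws/jobs"
create = create_job_request(JOBS_URL, {"region": "narrow"})
start = start_job_request(JOBS_URL, "job-id-from-server")
```

A client that always wants "create and start" simply issues both requests 
one after the other, while an interactive client may issue the second one 
hours or days later.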


I am not able to attend Pune, but I will follow the program nearly 
on-line, so please put all the materials on the wiki ASAP after the 
presentation and discussion in the UWS session.

Best regards,

Petr


BTW you may find the description of VO-KOREL here :
http://www.ta3.sk/IB2E/posters/F05.pdf




*************************************************************************
*  Petr Skoda                         Phone : +420-323-649201, ext. 361 *
*  Stellar Department                         +420-323-620361           *
*  Astronomical Institute AS CR       Fax   : +420-323-620250           *
*  251 65 Ondrejov                    e-mail: skoda at sunstel.asu.cas.cz  *
*  Czech Republic                                                       *
*************************************************************************

