Modification Of UWS 1 to 1.1
Petr Skoda
skoda at sunstel.asu.cas.cz
Thu Oct 13 05:52:17 PDT 2011
Hi Pierre and others,
As you remember, we tried to use UWS for setting up VO-KOREL, which is a
"cloud-like" service for running one particular FORTRAN program
(korel) in a user-friendly environment - thus requiring the web browser to
interact with the service (including job control). It may, of course, be
different from the original requirements on UWS to provide asynchronous
communication for TAP queries ....
However, I think that in the future our approach may easily be followed,
providing astronomers with nice wrappers of their "boring" numerical
code, and so the "cloud" aspect will be more accented. So I will comment
on your changes from this point of view:
On Thu, 13 Oct 2011, Pierre Le Sidaner wrote:
> | real simplification using REST Principe that will not allow multiple |
> interpretation of a command
> | creation and starting a job is on one phase
NO - it is exactly where current UWS is handy:
We commonly have such a scenario - the user prepares several jobs in his
working space (there is some kind of quota imposed for each user) - i.e. he
uploads massive data sets and parameter sets for a number of experiments
(different spectral regions for disentangling, different sets of input
spectra etc ...). As he is aware he has a limited amount of memory and
processes, he has to decide which jobs to run in parallel. He may run them
and disconnect. Then he may use a mobile device to look in his job list to
see the results and, by changing some parameters, rerun a job immediately -
here it means the creation of a new job and its run together.
But he can as well look in the mobile job list and decide: OK, now I know
the method converges and I can run one large job which was prepared some
time ago and has thus been waiting in the PENDING phase.
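
To make the scenario concrete, here is a minimal in-memory sketch in Python.
It is only an illustration, not VO-KOREL code: the Job and Workspace classes,
the max_running quota and the job names are hypothetical, and the transition
table is a simplified assumption using the UWS 1.0 phase names.

```python
from dataclasses import dataclass, field

# Simplified, assumed transition table; phase names follow UWS 1.0.
ALLOWED = {
    "PENDING": {"QUEUED", "ABORTED"},
    "QUEUED": {"EXECUTING", "ABORTED"},
    "EXECUTING": {"COMPLETED", "ERROR", "ABORTED"},
}


@dataclass
class Job:
    """Hypothetical job record: uploaded parameters plus the current phase."""
    job_id: str
    params: dict
    phase: str = "PENDING"  # a freshly created job is not started

    def move_to(self, new_phase):
        if new_phase not in ALLOWED.get(self.phase, set()):
            raise ValueError(f"{self.phase} -> {new_phase} not allowed")
        self.phase = new_phase


@dataclass
class Workspace:
    """Hypothetical per-user working space with a quota on running jobs."""
    max_running: int
    jobs: dict = field(default_factory=dict)

    def create(self, job_id, params):
        # Creation only prepares the job space; nothing is executed yet.
        job = Job(job_id, params)
        self.jobs[job_id] = job
        return job

    def run(self, job_id):
        # Starting is a separate, explicit decision of the user.
        active = sum(j.phase in ("QUEUED", "EXECUTING")
                     for j in self.jobs.values())
        if active >= self.max_running:
            raise RuntimeError("quota reached, job stays in its phase")
        self.jobs[job_id].move_to("QUEUED")


ws = Workspace(max_running=1)
ws.create("small-test", {"region": "A"})
ws.create("large-run", {"region": "A+B"})
ws.run("small-test")                 # started right away
print(ws.jobs["large-run"].phase)    # the large job is still PENDING
```

The point of the sketch is that create() and run() are separate steps, so the
large job can wait until the user decides to start it.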
The same with ABORT and DELETE - he may know how long the typical run on a
given set should take. But if it is running too long, probably something is
wrong and he can manually ABORT that job. Or he might run several smaller
experiments and the long job in parallel, and once it is clear that such a
parameter combination does not give good results he may ABORT that long
job.
As we understood the schema, we use DELETE for deleting the whole
space for the job - it means input data, parameter set and RESULTS
(our system produces results even if the job is not finished - e.g. the
partial convergence or divergence can be seen here, as well as stdout,
giving a hint e.g. about a parameter error).
> | We hope that this proposition will be much more easy to implement
> | both from server and client phase.
It is too short-sighted to see only the aspect of easiness while losing
important interaction and lowering the user's comfort.
> | We have removed some useless messages on our point of view like abort |
> from the user. Because it make the same thing as delete as you can
> | not retrieve the result as explain in 1.0 version.
I do not understand why you could not retrieve results after abort -
and it does not do the same:
in UWS1.0 sec 2.2.3.6
"A job may be aborted by POSTing to the /{jobs}/(job-id)/phase URI. The
POST contains a single parameter PHASE=ABORT which instructs the UWS to
attempt to abort the job. Aborting a job has the effect of stopping a job
executing, but the resources associated with a job remain intact. "
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
sec 2.2.3.2:
" Deleting a Job
Sending a HTTP DELETE to a Job resource destroys that job, with the
^^^^^^^^^^^^^^^^
meaning noted in the definition of the Job object, above. "
and definition of the JOB:
"2.1.2. Job
A Job object contains the state of one job. The state is a collection of
other objects. Each Job contains:
* Exactly one Execution Phase.
* Exactly one Execution Duration.
* Exactly one Deletion Time
* Exactly one Quote.
* Exactly one Results List.
* Exactly one Owner.
* Zero or one Run Identifier.
* Zero or one Error."
------------------
As we understand it - DELETE means removing all remnants of the job
- including the results.
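
The difference between the two calls quoted above can be sketched in Python
as follows. The requests are only built, not sent. The base URL is a
hypothetical placeholder; the paths and the PHASE=ABORT parameter are taken
from the quoted UWS 1.0 text.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical service address, used for illustration only.
BASE = "http://example.org/uws/jobs"


def abort_request(job_id):
    """POST PHASE=ABORT to the phase URI: execution stops, resources stay."""
    body = urlencode({"PHASE": "ABORT"}).encode("ascii")
    return Request(f"{BASE}/{job_id}/phase", data=body, method="POST")


def delete_request(job_id):
    """HTTP DELETE on the job resource: the whole job is destroyed."""
    return Request(f"{BASE}/{job_id}", method="DELETE")


for req in (abort_request("42"), delete_request("42")):
    print(req.get_method(), req.full_url, req.data)
```

The two requests differ in both method and target: ABORT acts on the phase
of the job, DELETE on the job resource itself.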
> | We have remove pending phase as describe before. Job can be on
> | suspended phase, but it's only server action.
NO - suspended applies only in case the processor is not available.
But pending means the job space is created (data and parameters uploaded,
item in database structure created etc....), while the real application
(e.g. the number crunching code) is neither run nor deployed on GRID by the
queue subsystem.
> | We have add possibility to upload file and not only to give URL
It is exactly how we do this in VO-KOREL - upload of several files (one
or two data files and parameter files); we can as well upload all of
this in one tgz file prepared by the user (for automation of the CLOUD
service).
> | | What have to be discuss :
> | pagination for long job list.
Especially the isolation of individual users, so that each sees only his own jobs.
> | | We hope that you have many useful comment on the text. As the resource |
As I said, we are probably doing something revolutionary with UWS and so
our interpretation may be wrong (I hope Paul and Dave will comment on
their primary ideas of UWS - but I liked the original design as very
flexible).
What I dislike in your proposal as well is the requirement for the server
to give an estimate of the runtime. In 99% of multiparametric optimizations
(even with genetic algorithms) you cannot predict the convergence. But you
may impose an initial limit just to prevent a closed loop or solution
oscillation. But the user has to decide himself how to change the limit.
That's the reason we allocate each user a limited amount of processes and
memory - but some queue priority would probably be better.
I feel that UWS (which is not a standard like a protocol or data format,
but a conceptual idea - a pattern for how to do things) should rather be
expanded to offer more flexibility and design freedom, even for yet
unforeseen purposes, than be restricted.
Perhaps we might define some implementation standard at the level of
"protocol" for given purposes with all the MUST and MAY - e.g. as
requirements for pure machine-to-machine interaction (like TAP is).
I think the current TAP is nicely orthogonal, so e.g. you can combine for
your purpose CreateJob+StartJob (the phase transition
Pending-RUN-Queued-Executing) as just two successive calls.
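
A minimal Python sketch of such a client-side combination follows. It is an
assumption for illustration only: create_and_run, the post callable (which is
expected to return the URL of the created job) and the example URL are
hypothetical names. Only the PHASE=RUN parameter posted to the phase URI
comes from UWS 1.0.

```python
def create_and_run(post, jobs_url, params):
    """Combine job creation and start as two successive calls."""
    job_url = post(jobs_url, params)              # call 1: job is PENDING
    post(job_url + "/phase", {"PHASE": "RUN"})    # call 2: ask the UWS to run
    return job_url


# Stand-in transport that only records what would be sent.
calls = []


def fake_post(url, fields):
    calls.append((url, fields))
    return url + "/job-1"    # pretend the service created this job


job = create_and_run(fake_post, "http://example.org/uws/jobs",
                     {"region": "A"})
print(job)
```

A client that wants the one-step behaviour gets it from a wrapper like this,
while other clients can still stop after the first call and leave the job
PENDING.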
I am not able to attend Pune, but will follow the program nearly on-line,
so please put all the materials on the wiki ASAP after the presentation
and discussion of the UWS session.
Best regards,
Petr
BTW you may find the description of VO-KOREL here :
http://www.ta3.sk/IB2E/posters/F05.pdf
*************************************************************************
* Petr Skoda Phone : +420-323-649201, ext. 361 *
* Stellar Department +420-323-620361 *
* Astronomical Institute AS CR Fax : +420-323-620250 *
* 251 65 Ondrejov e-mail: skoda at sunstel.asu.cas.cz *
* Czech Republic *
*************************************************************************