Remote execution (code to the data)

Salgado, Jesus jesus.salgado at skao.int
Mon Oct 23 10:33:43 CEST 2023


Dear Markus,

Yes, you are right about the complexity. We are gathering information on the different techniques and software stacks that different institutions use to execute remote workflows and, obviously, we need to take them into consideration. There are two clear options: adopt one of them for the IVOA (which looks complicated, as we do not impose software stacks in the IVOA) or define a simple abstraction layer. In any case, the first option remains a possibility (although we already know that our technologies do not appear to be compatible).

Although we know that this effort is quite complex, some of our astronomical projects need a global solution for this problem (including new astronomical missions that provide a huge amount of data), so it is something we need to address.

Being optimistic, I think we could follow an approach similar to the one we used for TAP. Obviously, executing a query is highly complex and makes use of complex database technology, but we are not tackling that complexity ourselves; we are just creating an abstract layer on top that can be mapped onto it.

Best Regards,
Jesús Salgado 
SKA Regional Centre Architect 
jesus.salgado at skao.int
www.skao.int
SKA Observatory
Jodrell Bank, Lower Withington,
Macclesfield, SK11 9FT, UK 




On 23/10/2023, 08:25, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote (via grid-bounces at ivoa.net):


Dear GWS,


On Thu, Oct 19, 2023 at 06:45:18AM +0100, Dave Morris wrote:
> Vicente was concerned that this may be too complex to implement in the
> fast-moving digital scenario depicted by Cloud and Container players. It
> might be better to concentrate on what we already have and see if we can
> define some common patterns for accessing platforms based on a minimum
> compatibility at technology stack level (Kubernetes, S3 etc.). Based on this
> assessment we could measure the gap across organisations in order to
> implement PoCs for those that are close to each other.
>
> I'm following up on this in a mailing list thread because I have also heard
> similar concerns and suggestions from other people too.
>
> This thread is somewhere for us to discuss the pros and cons of the two
> directions, the abstract ExecutionPlanner interface, and the more pragmatic
> approach looking for common patterns in how we use the technologies.


Disclaimer: I'm not actually running any compute services and don't
intend to. I'm an outsider in this business, and I'd have kept my
mouth shut if others had chimed in. Since they haven't, and since I
think we need to discuss this, let me throw in some probably fairly
incompetent words.


As a general stance, I am very much in favour of giving users a
chance to avoid lock-ins, which (with network services) means
creating facilities to discover services -- e.g., Dave's
ExecutionPlanner -- and to have *somewhat* common interfaces to using
them -- e.g., Dave's ExecutionWorker.


I am hence very much in favour of attempting to adopt or develop
abstraction mechanisms where reasonable. What scares me a bit in
that context is that we try and develop our own standard for
compute service discovery and job submission when my impression is
everyone around us is doing that, too.


For instance, in the context of a national federation effort I've
been asked to participate in, there is
https://cobald-tardis.readthedocs.io/. Many other things are going
on that seem at least related, such as CERN's reana or -- this one
I'm planning to have a closer look at -- the common workflow language
CWL.


In comparison to many of these other efforts, the Execution Planner
in its iWD form seems simple and pragmatic to me. On the other hand,
these people solve lots of hard problems that we are probably
glossing over; I was fairly impressed by a talk about a complex
machinery that enables *a particular* submission service to feed
containers access tokens to *a particular* storage service so that
long-running jobs can keep writing when the storage access tokens
expire rapidly. The thought of having to write an interoperable
standard catering to this kind of thing makes me want to end this
mail.


-- Markus


(who's still dreaming that some advanced array manipulation scheme
like the one I talked about in Santiago --
http://wiki.ivoa.net/internal/IVOA/InterOpOct2017DAL/arraysql.pdf --
accessible over plain old TAP would cover a substantial portion of
our code-to-data requirement beyond ADQL).
