JSAMP HUB waits for callee before returning to caller
Hugo Buddelmeijer
buddel at astro.rug.nl
Wed Jun 25 00:54:49 PDT 2014
Mark,
Thank you for your quick response.
JSAMP_async indeed behaves better as far as my concerns went. I did not
test the shutdown functionality.
I will improve the SAMP clients under my responsibility as well. It
seems best to create separate threads for the interfaces controlling the
software (UI, SAMP), which queue (time-consuming) actions for a main,
interface-less thread that does the work. However, the new HUB behaviour is
very useful for short programs like 'client1.py', and it protects 'good'
clients from 'bad' ones.
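The separation described above can be sketched in Python with the standard `queue` and `threading` modules. In this design, an interface thread only queues work, and an interface-less worker thread performs it. This is a minimal sketch, not code from any SAMP library. `receive_call`, `slow_action` and the `results` list are hypothetical stand-ins, and sending the SAMP reply is simulated by appending to `results`.

```python
import functools
import queue
import threading
import time

actions = queue.Queue()   # work handed from the interface thread to the worker
results = []              # records what happened, for illustration only


def worker():
    """Interface-less thread: performs queued actions one at a time."""
    while True:
        action = actions.get()
        try:
            if action is None:        # sentinel: stop after pending work
                return
            action()
        finally:
            actions.task_done()


def slow_action(msg_id):
    """Hypothetical time-consuming action triggered by a message."""
    time.sleep(0.2)
    results.append(("done", msg_id))


def receive_call(msg_id, params):
    """Hypothetical SAMP call handler running in the interface thread."""
    results.append(("reply", msg_id))                    # stands in for sending the SAMP reply
    actions.put(functools.partial(slow_action, msg_id))  # defer the slow work
    return ""                                            # the XML-RPC call returns right away


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

start = time.monotonic()
receive_call("msg-1", {})
handler_seconds = time.monotonic() - start
actions.put(None)          # ask the worker to stop once "msg-1" is handled
worker_thread.join()
```

The handler returns as soon as the work is queued. As a result, the hub's XML-RPC call to this client completes almost immediately, whichever hub behaviour is in use.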
Perhaps most clients separate their interface properly, which would
explain why nobody has complained so far. I had noticed the delays before,
but SAMP was not a high priority for me at the time, and waiting a few
seconds is usually not so bad because you need the action performed
anyway. However, our recent SAMP experiments, with two clients sending
messages back and forth autonomously, failed spectacularly: each client
was waiting for the other to close its HTTP session, which locked up the
entire system.
It is also great to see that JSAMP is on GitHub now. Since I had never
used JSAMP stand-alone (always through TOPCAT Web Start), I went through
the IVOA website to (not) find it.
The IVOA twiki
http://wiki.ivoa.net/twiki/bin/view/IVOA/SampSoftware
still links to
http://software.astrogrid.org/doc/jsamp/
which links to google
https://code.google.com/p/astrogrid/
for the source repository.
Perhaps the IVOA twiki should be updated to point to
http://www.star.bristol.ac.uk/~mbt/jsamp/
?
(It being a wiki, I finally decided to create an account to update it
myself, but it seems you need an account in order to create one, so
this failed.)
Greetings,
Hugo
On 24-06-14 18:31, Mark Taylor wrote:
> Hugo,
>
> thank you for this carefully explained request.
>
> As you've noted, JSAMP used to work as you suggest, but I changed
> the behaviour at v1.2. The change wasn't related to the introduction
> of the web profile, it was to fix some problems associated with
> forced hub shutdown. The relevant commit is here:
>
> https://github.com/mbtaylor/jsamp/commit/764a76e39
>
> and I noted in the change log:
>
> "Fixed, I think, threading issues that occasionally prevented hub forced
> shutdown notifications getting to some clients. It is possible this
> fix will have knock-on performance or other effects, especially in
> the presence of badly-behaved clients - please report if you notice
> problems. Thanks (again) to Laurent Borgès for extensive help
> with this."
>
> though this is the first time somebody has reported problems.
>
> Thinking about it, especially with reference to your examples,
> it probably ought to work as you suggest. But since it's to do with
> threading, it's hard to test whether it's going to break things
> that are currently working, so I'm a bit nervous about attempting
> to change it.
>
> I've made some alterations which I think ought to fix it to work the
> way you want, but still enable the shutdown code to work correctly.
> I've done this on the branch async-call in the github repository:
>
> https://github.com/mbtaylor/jsamp/tree/async-call
>
> You can find a built version with the changes here:
>
> ftp://andromeda.star.bris.ac.uk/pub/star/jsamp/pre/jsamp_async-call.jar
>
> Could you try this out and see if it behaves better as far as you're
> concerned?
>
> I'd also be very grateful if others could play with it and see whether
> it introduces any new problems. In particular the guys at JMMC
> (Laurent and Sylvain) have given me a lot of help with this stuff
> in the past, so if they could try it out that would be great.
>
> Mark
>
> On Mon, 23 Jun 2014, Hugo Buddelmeijer wrote:
>
>> Hi all,
>>
>> FWIW, some more information. The problem described below was introduced in
>> JSAMP 1.2 (February 15, 2011); version 1.1 behaves similarly to the current
>> astropy hub.
>>
>> The (very useful) Web Profile was introduced in version 1.2, which apparently
>> required a large rewrite of the hub code; see the links below.
>>
>> http://software.astrogrid.org/doc/p/jsamp/1.3-3/downloads.html
>> http://www.ivoa.net/pipermail/apps-samp/2011-February/000862.html
>> http://software.astrogrid.org/doc/p/jsamp/1.3-3/history.html#Version_1_2
>>
>> My knowledge of Java is too limited to assess what is necessary to get the old
>> behaviour into the new code base. Hopefully it is possible though!
>>
>> Greetings,
>> Hugo
>>
>>
>> On 2014-06-23 12:29, Hugo Buddelmeijer wrote:
>>> Hi Mark et al.,
>>>
>>> The SAMP HUB in JSAMP (and thus TOPCAT) causes unnecessary delays with
>>> clients that do not immediately return the XML-RPC call made to them by
>>> the HUB.
>>>
>>> The astropy HUB does not show this behaviour, so the delays can in
>>> principle be avoided. Perhaps the JSAMP HUB can be improved to mimic the
>>> astropy behaviour?
>>>
>>>
>>> ** Explanation and Solution **
>>>
>>> Before the SAMP HUB in JSAMP returns the XML-RPC call of a calling
>>> client, it first waits for the called client to return the HUB's
>>> XML-RPC call to it.
>>>
>>> Instead, for 'call' and 'callAll', the HUB could return the XML-RPC call
>>> of the caller directly, before initiating the XML-RPC call to the
>>> callee. For 'callAndWait', the HUB could return the XML-RPC call of
>>> the caller as soon as it has a SAMP 'reply' from the callee,
>>> irrespective of whether the callee has returned its XML-RPC call.
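The difference between the current and the proposed HUB behaviour can be modelled with a toy, in-process hub. This is a hypothetical sketch, not JSAMP or astropy code. `ToyHub`, its `blocking` flag and the `client1` callee are assumptions made for illustration. The callee is a direct function call that stands in for the HUB's XML-RPC call to the client, and its sleep is shortened to 0.5 seconds.

```python
import threading
import time


class ToyHub:
    """Hypothetical in-process model of a SAMP hub (not JSAMP or astropy code).

    blocking=True:  call() returns only after the callee's handler has returned,
                    as described for the JSAMP 1.2+ hub.
    blocking=False: call() hands the message to a new thread and returns at once,
                    as proposed for 'call' and 'callAll'.
    """

    def __init__(self, callee, blocking):
        self.callee = callee      # callee(hub, msg_id, message) stands in for the XML-RPC call
        self.blocking = blocking
        self.replies = {}         # msg_id -> SAMP response map
        self.events = {}          # msg_id -> Event set when the SAMP reply arrives
        self.lock = threading.Lock()
        self.counter = 0

    def _new_msg_id(self):
        with self.lock:
            self.counter += 1
            msg_id = f"msg-{self.counter}"
            self.events[msg_id] = threading.Event()
        return msg_id

    def reply(self, msg_id, response):
        """Called by the callee to send its SAMP reply."""
        self.replies[msg_id] = response
        self.events[msg_id].set()

    def call(self, message):
        msg_id = self._new_msg_id()
        if self.blocking:
            self.callee(self, msg_id, message)      # wait for the callee to return
        else:
            threading.Thread(target=self.callee, args=(self, msg_id, message),
                             daemon=True).start()   # return to the caller first
        return msg_id

    def call_and_wait(self, message, timeout):
        """Return the SAMP response, or None if none arrived within `timeout`."""
        msg_id = self.call(message)
        if self.events[msg_id].wait(timeout):
            return self.replies[msg_id]
        return None


def client1(hub, msg_id, message):
    hub.reply(msg_id, {"samp.status": "samp.ok", "samp.result": {}})  # immediate SAMP reply
    time.sleep(0.5)                                                   # time-consuming action


def timed(fn, *args):
    start = time.monotonic()
    result = fn(*args)
    return result, time.monotonic() - start


ping = {"samp.mtype": "samp.app.ping", "samp.params": {}}
_, t_call_blocking = timed(ToyHub(client1, blocking=True).call, ping)
_, t_call_async = timed(ToyHub(client1, blocking=False).call, ping)
_, t_wait_blocking = timed(ToyHub(client1, blocking=True).call_and_wait, ping, 0.2)
response, t_wait_async = timed(ToyHub(client1, blocking=False).call_and_wait, ping, 0.2)
print(f"call: {t_call_blocking:.2f} s vs {t_call_async:.2f} s")
print(f"callAndWait (timeout 0.2 s): {t_wait_blocking:.2f} s vs {t_wait_async:.2f} s")
```

With `blocking=True`, the 0.2 second timeout of `call_and_wait` is not honoured, because the caller is stuck until the callee's handler returns. With `blocking=False`, the reply is returned as soon as the callee sends it.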
>>>
>>>
>>> ** Demonstration **
>>>
>>> The two attached files demonstrate the problem. client1.py contains a
>>> client that causes problems. client2.py contains a client that sends
>>> messages to client1 through callAndWait, call, and callAll. The essence
>>> of client1 is this:
>>>
def receive_call(...):      # Receive a SAMP call.
    client.reply(...)       # Immediately return a SAMP reply.
    time.sleep(5)           # Perform desired time-consuming action.
    return                  # Return an XML-RPC reply.
>>>
>>>
>>> With the astropy HUB, all the calls from client2 take about 0.02
>>> seconds. With the JSAMP HUB, the calls take 5.02 seconds. The SAMP reply
>>> arrives even before the 'call' and 'callAll' XML-RPC functions return.
>>> Furthermore, a 'callAndWait' with a timeout of 2 seconds also takes
>>> 5 seconds.
>>>
>>> It would be great if JSAMP could become even more robust than it already
>>> is. It is currently even possible to perform a denial-of-service attack
>>> on TOPCAT using SAMP!
>>>
>>>
>>> ** Workarounds **
>>>
>>> Two workarounds for this problem are:
>>>
>>> 1) Have all clients terminate their incoming XML-RPC call immediately,
>>> before performing the actions associated with the incoming message.
>>> Although this is probably the best behaviour, it seems to require
>>> considerable complexity in clients, such as multi-threading. Perhaps so
>>> much complexity that the 'S' in SAMP no longer applies. Does anyone have
>>> a suggestion for rewriting client1.py this way as cleanly as possible?
>>>
>>> 2) In each client, wrap every SAMP call in a try/except block with its
>>> own timeout. This might be necessary anyway, since even if JSAMP is
>>> updated, it might take a while before all users run the new version.
>>> However, this makes 'callAndWait' superfluous and also requires
>>> something like multithreading.
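Workaround 2 might look like the following caller-side wrapper. It runs each call on a worker thread and stops waiting after a timeout. The helper `call_with_timeout` and the stand-in `slow_call` are hypothetical; in a real client, `slow_call` would be replaced by the client's own SAMP call. As noted in the workaround, this needs a thread pool. Only the caller stops waiting, while the blocked call itself keeps running in the background.

```python
import concurrent.futures
import time

# One shared pool for outgoing calls (hypothetical helper, not part of any SAMP library).
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(samp_call, *args, timeout=2.0):
    """Run a possibly blocking call on a worker thread.

    Returns the call's result, or None if it has not finished within
    `timeout` seconds. The underlying call keeps running in the background.
    """
    future = _pool.submit(samp_call, *args)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return None


def slow_call(message):
    """Stand-in for a SAMP call to a hub that blocks for 0.5 s."""
    time.sleep(0.5)
    return "msg-1"


start = time.monotonic()
result = call_with_timeout(slow_call, {"samp.mtype": "samp.app.ping"}, timeout=0.1)
elapsed = time.monotonic() - start
```

The caller regains control after about 0.1 seconds with `None`, which is why this approach effectively duplicates what a well-behaved 'callAndWait' timeout should provide.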
>>>
>>>
>>> Thanks for your insight,
>>>
>>> Hugo
>
> --
> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776 http://www.star.bris.ac.uk/~mbt/
>
More information about the apps-samp mailing list