[TAP] sync vs async - time outs

Tom McGlynn Thomas.A.McGlynn at nasa.gov
Wed Mar 4 06:55:48 PST 2009


[I said I'd stay out but...]

One thing that confuses me a little about this discussion is the
implicit assumption that we don't need to worry about timeouts for
asynchronous queries.  I think it unlikely that most services are
willing to provide unlimited CPU to their users.  E.g., doesn't
SkyServer provide various queues which time out at different limits?
Does it have unlimited queues?

My own sense is that the only real difference between synchronous and
asynchronous services is the values of the limits. Perhaps we
allow up to 15 minutes for synchronous queries and up to 1 day for
asynchronous.

Similarly, it's manifestly infeasible to expect any service to handle
all queries within any fixed timeout.  Even a query on a tiny table can
time out.  E.g., suppose I have a table xx with one field and 10 rows.

Then the relatively simple query
    select * from
     xx a, xx b, xx c, xx d, xx e, xx f,
     xx g, xx h, xx i, xx j, xx k, xx l
returns 10^12 rows.  I'm sure Guy meant any sensible query, but given 
real tables it will often (almost always?) be possible to construct at 
least semireasonable queries that take some substantial fraction of a
Hubble time.
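
To make the arithmetic concrete, here is a minimal sketch using Python's
built-in sqlite3 module.  It builds the 10-row table xx, counts the rows
of small self-joins, and extrapolates to 12 copies.  The helper name
cross_join_rows and the throughput figure are assumptions for
illustration only, not measurements of any real service.

```python
import sqlite3

# In-memory database with the 10-row, one-column table from the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xx (v INTEGER)")
conn.executemany("INSERT INTO xx VALUES (?)", [(i,) for i in range(10)])


def cross_join_rows(n_copies):
    """Count the rows produced by joining xx with itself n_copies times."""
    aliases = ", ".join(f"xx t{i}" for i in range(n_copies))
    return conn.execute(f"SELECT count(*) FROM {aliases}").fetchone()[0]


# Each extra copy multiplies the row count by 10.
for n in (1, 2, 3, 4):
    print(n, cross_join_rows(n))

# Extrapolate to 12 copies instead of running it.
# ASSUMPTION: one million rows per second, a made-up illustrative rate.
ASSUMED_ROWS_PER_SECOND = 1e6
seconds = 10**12 / ASSUMED_ROWS_PER_SECOND
print(f"12 copies: 10^12 rows, about {seconds / 86400:.1f} days")
```

At the assumed rate the 12-way join needs well over a week, which is far
beyond any of the limits discussed in this thread.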

It seems to me that clients need to be able to handle time-outs for
both synchronous and asynchronous queries.  Making this work cleanly for 
both kinds of services seems more crucial to me than which are 
supported.  It's probably a little easier for asynchronous services. 
They just write an error message somewhere when they give up.

For synchronous services it's a little harder since we need to make sure
that we send an appropriate message before the connection dies.  Maybe
allow a timeout value for synchronous services? [Does it already exist?]
While I'm not sure that the underlying databases support timeouts,
it's easy enough to put the timeout in the layer that calls the
database.  Values of the timeout above some maximum could be forbidden,
or at least warned against.  This is pretty much how the underlying
web protocols work, I think, and shouldn't be hard to implement.
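
A rough sketch of a timeout enforced in the calling layer, again using
Python's sqlite3 module as a stand-in for whatever database a service
really uses.  The names run_with_timeout, QueryTimeout and MAX_TIMEOUT_S
are hypothetical, and the 15-minute ceiling is only an assumed value.
The point is that the layer rejects requests above the maximum and
turns an expired limit into an explicit error it can report before the
connection dies.

```python
import sqlite3
import time

MAX_TIMEOUT_S = 15 * 60  # ASSUMPTION: service-wide ceiling of 15 minutes


class QueryTimeout(Exception):
    """Raised when a query is abandoned because its time limit expired."""


def run_with_timeout(conn, sql, timeout_s):
    """Run sql on conn, giving up once timeout_s seconds have elapsed."""
    if timeout_s > MAX_TIMEOUT_S:
        raise ValueError(f"timeout {timeout_s}s exceeds maximum {MAX_TIMEOUT_S}s")
    deadline = time.monotonic() + timeout_s
    # SQLite calls the handler every 1000 virtual-machine instructions;
    # a nonzero return value aborts the running statement.
    conn.set_progress_handler(lambda: int(time.monotonic() > deadline), 1000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        if time.monotonic() > deadline:
            raise QueryTimeout(f"query exceeded {timeout_s}s limit") from exc
        raise  # some other database error, not a timeout
    finally:
        conn.set_progress_handler(None, 1)  # remove the handler


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xx (v INTEGER)")
conn.executemany("INSERT INTO xx VALUES (?)", [(i,) for i in range(10)])

# The 12-way self-join of a 10-row table, cut off after 0.2 seconds.
big = "SELECT count(*) FROM " + ", ".join(f"xx t{i}" for i in range(12))
try:
    run_with_timeout(conn, big, 0.2)
except QueryTimeout as err:
    print("ERROR:", err)
```

A synchronous service would send the error text back in its response,
while an asynchronous one would record it with the job.  Either way the
client sees the same kind of message.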

	Regards,
	Tom

Alex Szalay wrote:
> I REALLY LIKE THIS APPROACH!!! 
> 
> In practice we found that the following numbers work pretty well
> 
> 	Instantaneous  <  1 min
> 	Interactive    < 15 min
> 	Beyond that it should be async
> 
> --Alex
> 
> -----Original Message-----
> From: Guy Rixon [mailto:gtr at ast.cam.ac.uk] 
> Sent: Wednesday, March 04, 2009 5:13 AM
> To: Gerard
> Cc: dal at ivoa.net
> Subject: Re: [TAP] sync vs async - time outs
> 
> Perhaps a mapping campaign is needed. If we can work out the
> minimum timeout that is generally imposed by the network -
> a sort of worst-probable value - then we can say to TAP implementors
> "make it work faster than this in all cases or you need an async mode".
> 
> Cheers,
> Guy
> 
> On 4 Mar 2009, at 08:42, Gerard wrote:
> 
>> Dear Dave
>>>> Guy wrote:
>>>>> However, I see no reason why a TAP installation that always
>>>>> operates quickly has to have the asynchronous mode. I would be
>>>>> happy for the asynchronous interfaces to be made optional; but
>>>>> only on the understanding that most TAP services will need them
>>>>> and that software authors be prepared to add them when that need
>>>>> is discovered.
>>>> I support this, but think we should let a service specify what they
>>>> think is "quickly".
>>> In practice 'quickly' is controlled by the settings on all
>>> the various network routers, switches and proxies handling
>>> the connection between client and server.
>>> A database server may take 25 seconds to think about a
>>> complex query, but if one of the network switches handling
>>> the connection thinks 20 seconds is an appropriate timeout,
>>> the response will never arrive.
>>>
>> Point well taken.
>> This is indeed the main problem I have encountered.
>> Queries that require a complete scan through a table to be completed
>> before obtaining any result, for example when calculating aggregate
>> quantities, have timed out because of session timeout settings on
>> proxy and web servers.
>>
>> There clearly is a place for /async.
>> Another motivation for implementing it is that it makes it
>> easier/possible to queue jobs.
>> It seems that "I/O concurrency" scales much worse than linearly.
>> Being able to queue requests so that at most a few access the same
>> disk at the same time seems better for performance.
>>
>> Thanks
>> Gerard
> 
> 
> 


