[TAP] sync vs async

Tue Mar 3 03:01:24 PST 2009

Hi Pat 

> Implementing /sync in addition to /async is a small amount of
> extra work.
I am sure that is true, but I think the reverse is not: adding /async to an
existing /sync implementation is more than a small amount of extra work.
(I am arguing from the position of someone who has a /sync implementation.)

The reason I asked for the choice of which one to implement (or both) to be
optional is that I assumed that, for large data sets, a server may have
reasons to prefer /async and not want to support /sync. /sync requires
connections to be kept open for longer, and possibly does not go well with
queueing, etc.

For schema and VOSI metadata queries one would indeed not expect /sync to be
a problem. Indeed, it would be annoying if I had to submit a request first
and then download a small file in a second request. Rather than serving as an
argument for mandating /sync support across the board, this might also argue
for treating metadata requests separately from data queries.
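To make the round-trip point concrete, here is a minimal sketch in Python of a
toy in-memory service, assuming a simplified job model. The class ToyService,
its methods and the phase names PENDING/COMPLETED (loosely borrowed from UWS)
are illustrative only and are not the TAP or UWS API. It simply counts
requests: /sync answers a small metadata query in one, while the async
pattern needs at least three (submit, poll, fetch).

```python
import itertools


class ToyService:
    """Hypothetical in-memory stand-in for a query service.

    Counts client round trips so /sync and /async can be compared.
    Not the TAP/UWS API; phase names are only loosely modeled on UWS.
    """

    def __init__(self, run_query):
        self.run_query = run_query  # callable: query string -> result
        self.requests = 0           # number of client requests seen
        self._jobs = {}
        self._ids = itertools.count(1)

    def sync(self, query):
        """/sync: one request, result returned in the same response."""
        self.requests += 1
        return self.run_query(query)

    def submit(self, query):
        """/async step 1: create a job and return its id."""
        self.requests += 1
        job_id = next(self._ids)
        self._jobs[job_id] = {"phase": "PENDING", "query": query, "result": None}
        return job_id

    def phase(self, job_id):
        """/async step 2: poll the job phase."""
        self.requests += 1
        job = self._jobs[job_id]
        if job["phase"] == "PENDING":
            # Toy model: the job runs to completion the first time it is polled.
            job["result"] = self.run_query(job["query"])
            job["phase"] = "COMPLETED"
        return job["phase"]

    def result(self, job_id):
        """/async step 3: download the stored result."""
        self.requests += 1
        return self._jobs[job_id]["result"]


def fetch_async(service, query):
    """Client side of the async pattern: submit, poll until done, download."""
    job_id = service.submit(query)
    while service.phase(job_id) != "COMPLETED":
        pass  # a real client would sleep between polls
    return service.result(job_id)
```

For a metadata query such as "SELECT table_name FROM TAP_SCHEMA.tables", sync()
costs one request and fetch_async() three, even though the answer is tiny.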

I think it would be useful to see some actual case studies on this. I will
start one here; I could give more details in Strasbourg if that is deemed
useful.

I manage the Millennium database and web application. It is a sync-only
system where users can submit SQL queries directly. I impose a timeout of
7 minutes, which covers query execution, data retrieval and delivery. I
impose no limit on the number of rows that can be returned. I can do the
latter precisely because it is a sync service and I do not have to store the
result on the server side: as soon as rows arrive from the database I write
them to the (HTTP) output stream.
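A minimal sketch of this streaming approach, in Python, assuming rows come
from any iterable (standing in for a database cursor) and out is any text
stream (standing in for the HTTP response). The names stream_rows and
QueryTimeout are hypothetical. Each row is written as soon as it arrives, so
nothing is stored server side and no row limit is needed, while a single
deadline counts execution, retrieval and delivery time together.

```python
import csv
import io
import time


class QueryTimeout(Exception):
    """Raised when the whole request exceeds its time budget (hypothetical)."""


def stream_rows(rows, out, timeout_s=7 * 60, clock=time.monotonic):
    """Write each row to `out` as CSV as soon as it arrives.

    `rows` is any iterable (e.g. a DB cursor); `out` is any text stream
    (e.g. the HTTP response). Nothing is buffered, so there is no row limit.
    The deadline is checked after each row arrives, so time spent waiting on
    the database and time spent writing both count against it.
    """
    deadline = clock() + timeout_s
    writer = csv.writer(out)
    count = 0
    for row in rows:
        if clock() > deadline:
            raise QueryTimeout(f"timeout after {count} rows")
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for the HTTP output stream
    stream_rows([(1, "a"), (2, "b")], buf)
    print(repr(buf.getvalue()))
```

The deadline is only checked between rows, so a database call that blocks
before returning its first row is not interrupted; a real service would also
want a statement-level timeout on the database side.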

I have noticed that often it is not the query execution time that is the
bottleneck, but the data delivery time. In most cases data starts coming back
from the database shortly after a query is submitted. So delivery time may be
a problem for async as well.

I have not set the query timeout to a higher value, because it is easy to
write a query that does not do exactly what one wants, something that is
discovered too late. If a user notices this but must wait, for example,
30 minutes (roughly the time for a table scan of our largest tables, and the
timeout our Durham mirror uses) before running the improved query, that is
annoying as well. So 7 minutes is something of a compromise.
I believe that at the SDSS SkyServer they find that queries hardly ever take
longer than 10 minutes (correct?), though that database will likely have
different query patterns.

Users do have to learn how to work around the timeout if they need a
complete table scan. Though some users have solved this for themselves,
often I am the one who has to suggest ways to do so (a help-desk function).
This could also be done in the documentation, if it is written up in a
user-readable way (i.e. HTML instead of XML).

That is not optimal, but it works because our system requires user
registration and has about 250 registered users, of whom on any given day on
the order of 5-10 submit queries.

I do want to implement async at some point, but have not done so yet because
I have not had the time (nor the ability to do this quickly) to design and
implement a robust system.

It would be useful to hear about experiences from the SDSS SkyServer, which
has both sync and async support.

Best regards

Gerard

PS
Pat, if you have a robust (Java) implementation of an async service, I'd be
happy to work it into our system!