TAP RFC [VOSI]

Tue Sep 29 10:19:19 PDT 2009

On Tuesday 29 September 2009 08:00:49 Alberto Micol wrote:
> My point is that a client (TAP, SIA, SSA, etc) cannot know in advance if its
> request is too heavy for a given server. Even more so, if the same query is to
> be sent to many different servers.

You are forgetting that the service also cannot, in general, know that a query 
is a heavy, time-consuming request that will exceed the HTTP timeouts of the 
sync endpoint.

select * from someTable
where INTERSECTS(spatial_bounds, circle('ICRS', 10, 10, 0.1)) = 1

This is a typical spatial query (cone search) in ADQL. If the table is small, 
it will probably be fast. If the table has a spatial indexing scheme on the 
spatial_bounds column, it will probably be faster than if it does not. If the 
content is spread out and the actual condition is very selective, it will be 
faster than if all the content is inside the circle. Can anyone really 
plausibly determine ahead of time that this will be fast? Probably fast? 
Probably slow? Slow? Not plausibly, in my opinion. I can look at a query and 
make a good guess about whether it will be heavy or not, but I cannot write 
software to make that guess for me :-)

> To me, a query is always a SYNC query. If the service cannot answer right
> away, the service will politely inform the client that the request will take
> a bit longer, and will turn to ASYNC.

In reality, users will try a query using sync and, if it fails, they can 
either change the query or use async instead. If the user thought the query 
was simple and fast, they will likely examine it more closely for bugs. If they 
know it is complex, they may assume it is correct and try async, or they may 
set MAXREC to something small and try sync again to test it. I don't think the 
service can really make these decisions.
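
As a rough illustration of that client-side workflow, here is a minimal Python 
sketch of one possible policy: try sync, and if it times out, test the query 
with a small MAXREC before submitting it as an async job. The names 
SyncTimeout, run_with_fallback, run_sync, and run_async are all hypothetical; 
the two callables stand in for whatever HTTP code actually talks to the sync 
and async endpoints, and the policy itself is just one choice a client might 
make.

```python
from typing import Callable, Optional


class SyncTimeout(Exception):
    """Hypothetical error: the sync request exceeded the HTTP timeout."""


def run_with_fallback(
    query: str,
    run_sync: Callable[[str, Optional[int]], str],
    run_async: Callable[[str], str],
    probe_maxrec: int = 10,
) -> str:
    """Try sync first; on timeout, test with a small MAXREC, then go async."""
    try:
        # First attempt: plain sync request with no MAXREC limit.
        return run_sync(query, None)
    except SyncTimeout:
        pass
    # Test run: a small MAXREC checks that the query is accepted before
    # committing to an async job. If this also fails, the exception
    # propagates and the user should look at the query more closely.
    run_sync(query, probe_maxrec)
    # The test run succeeded, so submit the full query as an async job.
    return run_async(query)
```

Note that the decision stays with the client: the service only reports 
success or failure of each request.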

More immediately, that is not the UWS model and in TAP we are meshing UWS and 
DAL sync access in a single service. In future we could think about how a sync 
request could decide to redirect the caller to an equivalent async job and 
what the impact on clients would be... how about in the next version? :-)

> have to stage the result to disk

Yes, it is inherent in async that one has to provide server-side resources 
(not necessarily files on disk) that persist, at least temporarily, for longer 
than a single HTTP request. The upside is that when the result is transferred 
over the network, one should know the content-length and thus be able to 
support resumable downloads in case of small network issues. With streaming 
output directly from the db, one has to run the query again and start the 
transfer from scratch. In that case, with a poor network connection, the user 
may never succeed.
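
To make the resumable-download point concrete, here is a small Python sketch 
of how a client could work out the HTTP Range header needed to continue an 
interrupted transfer once the content-length is known. The function name 
resume_header is hypothetical, and the sketch assumes the server honours byte 
range requests on the staged result, which is not something TAP or UWS 
guarantees.

```python
from typing import Optional


def resume_header(bytes_received: int, content_length: int) -> Optional[dict]:
    """Return the HTTP Range header needed to resume, or None if complete."""
    if bytes_received < 0 or bytes_received > content_length:
        raise ValueError("bytes_received is outside the known content length")
    if bytes_received == content_length:
        return None  # the whole result has already been transferred
    # Open-ended byte range: everything from the first missing byte onward.
    return {"Range": f"bytes={bytes_received}-"}
```

For example, after receiving 40 of 100 bytes the client would resend the 
request with "Range: bytes=40-" rather than starting over. With streamed 
output there is no stable content-length to resume against.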

-- 

Patrick Dowler
Tel/Tél: (250) 363-0044
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2M7

Centre canadien de donnees astronomiques
Conseil national de recherches Canada
5071, chemin West Saanich
Victoria (C.-B.) V9E 2M7