TAP RFC [VOSI]
Alberto Micol
amicol.ivoa at googlemail.com
Wed Sep 30 03:33:26 PDT 2009
Sorry for the lengthy email, but I find this quite important for the
overall VO architecture,
especially since it affects TAP immediately and will later be
extended to all other
DAL interfaces.
On 29 Sep 2009, at 19:19, Patrick Dowler wrote:
> On Tuesday 29 September 2009 08:00:49 Alberto Micol wrote:
>> My point is that a client (TAP, SIA, SSA, etc) cannot know in advance
>> if its request
>> is too heavy for a given server. Even more so, if the same query is
>> to
>> be sent to many different servers.
>
> You are forgetting that the service also cannot in general know that
> the query
> is a heavy, time-consuming request that will exceed the http
> timeouts of using
> the sync endpoint.
This is exactly what I meant by saying "too heavy"; I really meant
time consuming
and exceeding http timeout.
But I also meant that the service receiving a question (be it
queryData, getData, doQuery, etc.)
is the only one that knows what to do with such a question.
I never had in mind the idea that the service should use a fabulously
smart system
to estimate how long it will take to get the answer to any specific
query. I know that this is not feasible.
What I meant is that a service is implemented around whatever local
infrastructure,
and it is the service provider that decides up front, for a given data
collection, knowing her own architecture,
if a getData is to be served using SYNC or ASYNC, if a doQuery is
ASYNC or SYNC, and
so on and so forth.
The decision whether to serve SYNC or ASYNC is usually taken a priori by
the service, just based on the type of
operation (getData, doQuery, etc) for any given data collection,
without looking at the actual query content.
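A minimal sketch of such an a priori decision, assuming a hypothetical
provider-side table keyed by data collection and operation (the collection
names and the SYNC default are illustrative assumptions, not part of any
TAP or UWS specification):

```python
from enum import Enum


class Mode(Enum):
    SYNC = "sync"
    ASYNC = "async"


# Hypothetical policy table chosen by the provider up front:
# (data collection, operation) -> serving mode.
POLICY = {
    ("small_catalog", "doQuery"): Mode.SYNC,
    ("huge_catalog", "doQuery"): Mode.ASYNC,
    ("huge_catalog", "getData"): Mode.ASYNC,
}


def serving_mode(collection: str, operation: str) -> Mode:
    """Return the mode fixed a priori; the query text is never inspected."""
    # Assumed default: SYNC, so a simple service needs no extra machinery.
    return POLICY.get((collection, operation), Mode.SYNC)


print(serving_mode("small_catalog", "doQuery").value)
print(serving_mode("huge_catalog", "doQuery").value)
```

The lookup depends only on the collection and the type of operation, which
is the point made above: no estimate of the cost of an individual query is
needed.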
In this respect I think it makes no sense to ask for SYNC treatment
from a service that is not set up
to provide immediate answers (the answer would be negative: a
waste of time).
Nor does it make sense to ask for ASYNC, because this will force all data
providers to put together a complex
machine. Even those providers that WANT to offer a SIMPLE and quick
service, which could easily be
implemented with a SYNC mechanism, will have to implement something
much more complex.
An entirely unnecessary burden, very much against what the takeup committee
wants to achieve.
I would much prefer to see questions being posed without any explicit
SYNC or ASYNC request;
the service can then take the question and decide what to do:
- send back the answer to the question if it can (SYNC by default), or
otherwise,
- send back a formal answer to inform the client that ASYNC (and UWS)
is to be used.
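The exchange proposed in the two points above can be sketched as follows.
This is only an illustration under stated assumptions: the status labels
"OK" and "USE_ASYNC", the Response type, and the "/async" job list URL are
hypothetical names invented here, not part of TAP or UWS.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    status: str                         # "OK" or "USE_ASYNC" (hypothetical labels)
    result: Optional[list] = None       # filled in when answered immediately
    job_list_url: Optional[str] = None  # filled in when ASYNC is required


def make_service(sync_capable: bool, job_list_url: str = "/async") -> Callable[[str], Response]:
    """Build a toy service whose mode was decided up front by the provider."""
    def handle(query: str) -> Response:
        if sync_capable:
            # Stand-in for actually running the query and returning rows.
            return Response(status="OK", result=[f"rows for: {query}"])
        # Formal answer telling the client to resubmit through ASYNC (UWS).
        return Response(status="USE_ASYNC", job_list_url=job_list_url)
    return handle


def client_submit(service: Callable[[str], Response], query: str) -> str:
    """The client poses the question without asking for SYNC or ASYNC."""
    response = service(query)
    if response.status == "OK":
        return "answer received immediately"
    return f"resubmit as a job at {response.job_list_url}"


small = make_service(sync_capable=True)
huge = make_service(sync_capable=False)
print(client_submit(small, "select * from someTable"))
print(client_submit(huge, "select * from someTable"))
```

In this sketch a provider offering only a simple SYNC service never has to
implement the job machinery; only the provider of the large collection does.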
No extra burdens to data providers, please!
And no extra burdens to the users either:
> In reality, users will try to do a query using sync and if it fails
> they can
> either change the query or use async instead. If the user thought
> the query
> was simple and fast they will likely examine it more closely for
> bugs. If they
> know it is complex, they will maybe assume it is correct and try
> async, or
> they may set MAXREC to something small and try sync again to test
> it. I don't
> think the service can really make these decisions.
All that going back and forth is completely unnecessary (unless of
course there is a real
bug in the question, but not otherwise).
- If the service decides upfront to use ASYNC (because it offers a
huge catalog), the user
will simply send her query, and the answer will be: please use ASYNC.
- If the service decides upfront to use SYNC (because it offers access
to a small catalog), the user
will simply send her query, and will receive her answer shortly.
Of course, a problem arises if a huge catalog is served only in SYNC
mode, or if
a small catalog is served over a poor network connection.
Timeouts will likely happen often in those two cases. In such cases,
yes, the user will have to limit
the result size using MAXREC, if the provider has not already done so
herself. Some handling of
the kind proposed by Pat
will always happen, but we should limit the number of cases to only
the strictly necessary ones,
balancing it against the burden otherwise imposed on data providers.
In one sentence: Why complicate things at both ends?
Alberto
>
> select * from someTable
> where INTERSECTS(spatial_bounds, circle('ICRS', 10, 10, 0.1)) = 1
>
> This is a typical spatial query (cone search) in ADQL. If the table
> is small,
> it will probably be fast. If the table has a spatial indexing scheme
> on the
> spatial_bounds column, it will probably be faster than if it does
> not. If the
> content is spread out and the actual condition is very selective, it
> will be
> faster than if all the content is inside the circle.... can anyone
> really
> plausibly determine ahead of time that this will be fast? probably
> fast?
> probably slow? slow? Not plausibly, in my opinion. I can look at a
> query and
> make a good guess about whether it will be heavy or not, but I cannot
> write
> software to make that guess for me :-)
>
>> To me, a query is always a SYNC query. If the service cannot answer
>> right away, the
>> service will politely inform the client that the request will take a
>> bit longer,
>> and will turn to ASYNC.
>