TAP RFC [VOSI]

Wed Sep 30 03:40:50 PDT 2009

I'm sorry but I am quite lost in this conversation (and a little worried
about the things that it seems to imply).

Could somebody please point me to some document where I can read what we
are talking about?

Thanks and sorry for disturbing

Carlos

On Wed, 30 Sep 2009 12:33:26 +0200, Alberto Micol
<amicol.ivoa at googlemail.com> wrote:
> 
> Sorry for the lengthy email, but I find this quite important for the
> overall VO architecture, especially given that this touches immediately
> on TAP, but later it will be extended to all other DAL interfaces.
> 
> 
> On 29 Sep 2009, at 19:19, Patrick Dowler wrote:
> 
>> On Tuesday 29 September 2009 08:00:49 Alberto Micol wrote:
>>> My point is that a client (TAP, SIA, SSA, etc) cannot know in advance
>>> if its request is too heavy for a given server. Even more so, if the
>>> same query is to be sent to many different servers.
>>
>> You are forgetting that the service also cannot in general know that
>> the query is a heavy, time-consuming request that will exceed the HTTP
>> timeouts of using the sync endpoint.
> 
> This is exactly what I meant by saying "too heavy"; I really meant
> time consuming and exceeding the HTTP timeout.
> 
> But I also meant that the service receiving a question (be it
> queryData, getData, doQuery etc) is the only one that knows what to do
> with such a question.
> 
> I never had in mind the idea that the service should use a fabulously
> smart system to estimate how long it will take to get the answer to any
> specific query. I know that this is not feasible.
> What I meant is that a service is implemented around whatever local
> infrastructure, and it is the service provider that decides up front,
> for a given data collection, knowing her own architecture, whether a
> getData is to be served using SYNC or ASYNC, whether a doQuery is ASYNC
> or SYNC, and so on and so forth.
> 
> The decision whether to serve SYNC or ASYNC is usually taken a priori by
> the service, just based on the type of operation (getData, doQuery, etc)
> for any given data collection, without looking at the actual query
> content.
> 
> In this respect I think it makes no sense to ask for SYNC treatment
> from a service that is not set up to provide immediate answers. (The
> answer would be negative, what a waste of time.)
> 
> Nor does it make sense to ask for ASYNC, because this will force all
> data providers to put together a complex machine. Even those providers
> that WANT to offer a SIMPLE and quick service, which could be easily
> implemented with a SYNC mechanism, will have to implement something
> much more complex.
> A very unnecessary burden, very much against what the takeup committee
> wants to achieve.
> 
> I would much prefer to see questions being posed without any SYNC or
> ASYNC request; the service can then take it and decide what to do:
> - send back the answer to the question if it can (SYNC by default), or
> otherwise,
> - send back a formal answer to inform the client that ASYNC (and UWS)
> is to be used.
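
[Editorial sketch, not part of the original message.] The proposal above
could be illustrated roughly as follows. This is a minimal sketch under
stated assumptions: the collection names, the helper functions, the
"/async/<id>" location, and the use of an HTTP 303 status as the "formal
answer" are all hypothetical and are not taken from the TAP or UWS drafts.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: the provider decides up front, per data collection,
# how it is served. The query content is never inspected.
SERVING_MODE = {"small_catalog": "sync", "huge_catalog": "async"}

_job_ids = itertools.count(1)


@dataclass
class Response:
    status: int
    body: str
    location: Optional[str] = None


def run_query(collection: str, query: str) -> str:
    """Stand-in for the real database call (assumption)."""
    return f"results of {query!r} on {collection}"


def create_job(collection: str, query: str) -> int:
    """Stand-in for creating an asynchronous (UWS-like) job (assumption)."""
    return next(_job_ids)


def handle_query(collection: str, query: str) -> Response:
    """Answer directly if the collection is served SYNC, else point to ASYNC."""
    mode = SERVING_MODE.get(collection)
    if mode is None:
        return Response(404, f"unknown collection: {collection}")
    if mode == "sync":
        return Response(200, run_query(collection, query))
    # Formal answer telling the client that ASYNC is to be used.
    job_id = create_job(collection, query)
    return Response(303, "please use the asynchronous job",
                    location=f"/async/{job_id}")
```

In this sketch the client sends the same request in both cases and only
has to understand two kinds of reply: a result, or a pointer to a job.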
> 
> No extra burdens to data providers, please!
> 
> And no extra burdens to the users either:
>> In reality, users will try to do a query using sync and if it fails
>> they can either change the query or use async instead. If the user
>> thought the query was simple and fast they will likely examine it more
>> closely for bugs. If they know it is complex, they will maybe assume it
>> is correct and try async, or they may set MAXREC to something small and
>> try sync again to test it. I don't think the service can really make
>> these decisions.
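
[Editorial sketch, not part of the original message.] The client-side
behaviour described in the quoted paragraph might look roughly like this.
It is only an illustration: SyncTimeout, run_sync, run_async and the
believed_complex flag are hypothetical names, and the transport is left
to the caller.

```python
class SyncTimeout(Exception):
    """Raised by the (hypothetical) transport when the sync request times out."""


def query_with_fallback(query, run_sync, run_async, believed_complex):
    """Try sync first; on timeout, fall back as the user's judgement dictates.

    run_sync(query, maxrec) and run_async(query) are caller-supplied
    callables (assumptions), so that the sketch stays transport-agnostic.
    """
    try:
        return "sync", run_sync(query, None)
    except SyncTimeout:
        if not believed_complex:
            # The user expected a fast query: surface the failure so that
            # the query can be examined for bugs.
            raise
        # The user knows the query is complex: assume it is correct and
        # resubmit it through the async endpoint.
        return "async", run_async(query)
```

The other option mentioned in the text, retrying sync with a small
MAXREC to test the query, would replace the final line with a call such
as run_sync(query, 10).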
> 
> 
> All that going back and forth is completely unnecessary (unless of
> course there is a real bug in the question, but not otherwise).
> - If the service decides up front to use ASYNC (because it offers a
> huge catalog) the user will simply send her query, and the answer will
> be to please use ASYNC.
> - If the service decides up front to use SYNC (because it offers access
> to a small catalog) the user will simply send her query, and will
> receive her answer shortly.
> Of course, the problem arises if a huge catalog is served only in SYNC
> mode, or if a small catalog is served and the network connection is not
> that good. Timeouts will likely happen often in those two cases. In
> such cases, yes, the user will have to limit the result using MAXREC,
> if not done so by the provider herself. Some handling of the kind
> proposed by Pat will always happen, but we should limit the number of
> cases to only the strictly necessary ones, balancing it out with the
> burden otherwise imposed on data providers.
> 
> In one sentence: Why complicate things at both ends?
> 
> Alberto
> 
>>
>> select * from someTable
>> where INTERSECTS(spatial_bounds, circle('ICRS', 10, 10, 0.1)) = 1
>>
>> This is a typical spatial query (cone search) in ADQL. If the table
>> is small, it will probably be fast. If the table has a spatial indexing
>> scheme on the spatial_bounds column, it will probably be faster than if
>> it does not. If the content is spread out and the actual condition is
>> very selective, it will be faster than if all the content is inside the
>> circle.... can anyone really plausibly determine ahead of time that
>> this will be fast? probably fast? probably slow? slow? Not plausibly,
>> in my opinion. I can look at a query and make a good guess about
>> whether it will be heavy or not, but I cannot write software to make
>> that guess for me :-)
>>
>>> To me, a query is always a SYNC query. If the service cannot answer
>>> right away, the service will politely inform the client that the
>>> request will take a bit longer, and will turn to ASYNC.
>>