Asynchronous querying and tabular data

Wed May 2 11:07:50 PDT 2007

On Wed, 2 May 2007, Patrick Dowler wrote:

> Ok, I see where you are coming from. I think the disconnect is that everyone
> else (me included) sees TAP as a single step process which can be sync or
> async; the service would decide which to support.
>
> I think estimation is more or less pointless - even the RDBMS with all kinds
> of internal knowledge and statistics has a hard time choosing a good query
> plan and none of the 4-5 I have used has an estimator built in. There is
> good old "select count(*)" but that is only faster if the query cost is dominated
> by delivering the rows, which is not always the case. It is more often than
> not dominated by the cost of joins (including using an index and then looking
> up a bunch of rows in the table - which has cost that scales just like a
> key-join).

I also suspect that query estimation could be quite difficult in
general, although for really large queries where significant resources
are required it is probably necessary.

I don't see any reason why we couldn't have a way to submit a query
directly to execute as an asynchronous operation.  For TAP this may
be all that is required.  A simple way to do this might be to just
skip the queryData, and issue a stageData instead, containing all
the query and staging information directly in the job description.
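
As a rough illustration of what such a job description might carry,
here is a minimal Python sketch. The field names ("operation",
"query", "staging", "format", "destination") and the function
build_stage_request are hypothetical, chosen only for this example;
nothing here is taken from a TAP draft.

```python
import json


def build_stage_request(adql, output_format="votable", destination=None):
    """Bundle the query and the staging details into one job description.

    All field names are illustrative assumptions, not part of any spec.
    """
    job = {
        "operation": "stageData",  # submitted directly, no prior queryData
        "query": {"lang": "ADQL", "text": adql},
        "staging": {"format": output_format, "destination": destination},
    }
    return json.dumps(job, indent=2)


if __name__ == "__main__":
    print(build_stage_request("SELECT ra, dec FROM catalog WHERE mag < 20"))
```

The point is only that one self-contained document holds everything
the service needs, so no earlier queryData round trip is required.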

A single service could support both: queryData for synchronous DM and
ADQL-based queries, and optionally stageData for async/staged execution.
The client would then either have to guess which to use, or try a
few smaller synchronous queries first to determine what to do, and
then resubmit a larger query as a batch job.
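
The probing approach could look something like the Python sketch
below. It assumes the client has timed a small synchronous query
covering a known fraction of the full search, and that cost scales
roughly linearly with that fraction, which will often not hold when
joins dominate. The function name choose_mode and the 30 second
threshold are made up for illustration.

```python
def choose_mode(probe_seconds, probe_fraction, sync_limit_seconds=30.0):
    """Pick queryData or stageData from the timing of a small probe query.

    probe_seconds      -- wall-clock time of the small synchronous probe
    probe_fraction     -- fraction of the full query the probe covered (0..1]
    sync_limit_seconds -- hypothetical cutoff for waiting synchronously
    """
    if not 0 < probe_fraction <= 1:
        raise ValueError("probe_fraction must be in (0, 1]")
    # Naive linear extrapolation; an assumption, not a real cost model.
    estimated_seconds = probe_seconds / probe_fraction
    if estimated_seconds <= sync_limit_seconds:
        return "queryData"
    return "stageData"


if __name__ == "__main__":
    print(choose_mode(1.0, 0.1))   # probe suggests about 10 s in total
    print(choose_mode(5.0, 0.01))  # probe suggests about 500 s in total
```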

 	- Doug