Asynchronous querying and tabular data

Tue May 1 21:02:44 PDT 2007

Hi Kona, All -

Frankly I'm not really sure how to respond to this posting.  The fact is
that nearly every detail of what is described here is incorrect, from the
get-go, and misstates fundamentally the (fairly simple, I thought) concept
of an object-oriented queryData to estimate the work to be performed
(which could then be performed either synchronously or asynchronously),
followed by a stageData or whatever to initiate an asynchronous operation,
presumably followed by some version of the UWS pattern to interact with
the job once it is in progress.

I could respond to this posting in detail and walk through the
misstatements, but I don't think this would be worthwhile or productive
given that we are already at this point.  Let's just say that I do not
agree that this is in any way an accurate description of what has been
proposed.  I have already made a number of postings on this concept in
several forums recently, apparently to little effect.  It is not clear
it would serve any purpose for me to go through it again here now.

Very briefly, the only real distinction between an asynchronous,
object-oriented data service, and a generic job management system,
is in how the elements or tasks of the job to be executed are defined.
This is different for a DAL interface than for the generic case, because
we already have a way to define quite precisely the data product(s) we
want to generate, and once we have done this, the job to be performed
is fully defined.
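
To make that last point concrete, here is a minimal sketch in Python.
Every name in it (DataProduct, Job, DataService, stage_data, the phase
strings) is hypothetical and chosen only for illustration; none of it is
taken from a DAL or UWS specification.  The only point is that the job is
nothing more than the list of data product descriptions it was given.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical description of one data product to generate."""
    dataset: str
    constraints: str


@dataclass
class Job:
    """A job is fully defined by the data products it must generate."""
    job_id: int
    products: tuple
    phase: str = "PENDING"  # illustrative phase names, loosely UWS-like
    results: list = field(default_factory=list)


class DataService:
    """Toy in-memory service; a real one would run jobs in the background."""

    def __init__(self):
        self._ids = count(1)
        self._jobs = {}

    def stage_data(self, products):
        """Initiate an asynchronous operation and return its job id."""
        job = Job(next(self._ids), tuple(products))
        self._jobs[job.job_id] = job
        return job.job_id

    def run_pending(self):
        """Stand-in for background execution of all pending jobs."""
        for job in self._jobs.values():
            if job.phase == "PENDING":
                job.results = [f"{p.dataset}:{p.constraints}" for p in job.products]
                job.phase = "COMPLETED"

    def phase(self, job_id):
        """Poll the current phase of a job."""
        return self._jobs[job_id].phase

    def results(self, job_id):
        """Fetch results; only valid once the job has completed."""
        job = self._jobs[job_id]
        if job.phase != "COMPLETED":
            raise RuntimeError("job has not completed")
        return list(job.results)
```

A client would call stage_data() with the product descriptions, poll
phase() until the job completes, and then call results().  The job
management part is generic; only the task definition is DAL-specific.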

Instead, let's back up and look at the greater issue of integrating TAP
with the other DAL interfaces.  In the next several years we should have
a number of data access interfaces, but I think the "cornerstones"
will be catalog, image, and spectral access.  We are simply not
doing our jobs well if we don't have some uniformity, and sharing of
technology, approach, and interface, between these second generation
data access protocols.  Plus, there is also still interest in eventually
integrating ADQL capabilities into the other DAL services, in which
case the distinction becomes even smaller.  I think this issue falls in the
category of a _requirement_.  Unless there is a good reason to diverge,
we should adopt a uniform approach; if there is a good reason to do
something different, by all means let's do so, while keeping most of the
rest of the interface the same.

Also - I think many will argue that simple, synchronous, non-authenticated
queries remain the priority.  It is good to hear, actually, that people are
taking large queries seriously, as I also think this is quite important
to have, to move beyond the "toy" stage.  A good interface will be simple
for synchronous queries against a single table, but easy to extend to
asynchronous operations, reusing much of the core interface.  What is the
difference, really?  Once the query is posed, the service can determine
whether it is simple enough to proceed synchronously, or expensive enough
to require staging (or perhaps rethinking on the part of the client).
The result, if a large query is attempted synchronously, is truncation
or an error response; alternatively, for serious large queries we
have a two-stage operation involving estimation and job submission.
This is basically what the queryData/stageData concept already provides.
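
As a rough illustration of that decision, here is a sketch in Python.
It assumes a hypothetical row limit for synchronous responses
(SYNC_ROW_LIMIT) and uses an in-memory list as the "table"; the function
names query_data, run_sync and submit are illustrative only, and the
estimate is made by simply counting matches, where a real service would
presumably use table statistics.

```python
from dataclasses import dataclass

SYNC_ROW_LIMIT = 1000  # hypothetical cap on a synchronous response


@dataclass
class Estimate:
    """Estimated work for a query, here just a row count."""
    rows: int

    @property
    def needs_staging(self):
        return self.rows > SYNC_ROW_LIMIT


def query_data(table, predicate):
    """Estimate the work to be performed, without returning any data."""
    return Estimate(rows=sum(1 for row in table if predicate(row)))


def run_sync(table, predicate):
    """Run synchronously; returns (rows, truncated) with rows capped at the limit."""
    matched = [row for row in table if predicate(row)]
    return matched[:SYNC_ROW_LIMIT], len(matched) > SYNC_ROW_LIMIT


def submit(table, predicate):
    """Estimate first, then either answer directly or hand off to staging."""
    estimate = query_data(table, predicate)
    if estimate.needs_staging:
        # A real service would create an asynchronous job at this point.
        return ("staged", estimate.rows)
    rows, _ = run_sync(table, predicate)
    return ("sync", rows)
```

The simple case never sees the second stage: a small query goes straight
through submit() and returns its rows.  Only a query whose estimate
exceeds the limit is routed to staging, and a client that insists on
run_sync() for such a query gets a truncated result with the flag set.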

	- Doug

On Tue, 1 May 2007, Kona Andrews wrote:

> Dear all,
>
> Copied below is a useful discussion from a colleague of why access
> protocols like SIAP and SSAP don't extend so gracefully to large
> tabular data queries, and why therefore we shouldn't try to make
> TAP exactly conform to the model assumed by these protocols.
>
> Cheers,
> Kona