Scope of TAP parameter-based queries

Tue Feb 24 11:43:55 PST 2009

On Tue, 24 Feb 2009, Patrick Dowler wrote:

> For now, there are many specific issues with P?? (the name included) that we
> can address in separate topics. I will start a few of those today, starting
> with "scope of PQL".

As a starting point, here are the capabilities which NVO wants from
simple parameter-based queries in TAP:

     o	Simple table/DBMS metadata queries using the same query
 	interface as is used for table data queries (i.e. based upon
 	the SQL information schema approach).

 	Use of the same query interface means that all this code can
 	be reused on both the client and server side.  Nice features
 	such as multiple output formats fall out without having to
 	do any extra work.  The approach is scalable to very large
 	tablesets, i.e., individual tables can be queried.  We want
 	this capability for our science users who will write software
 	which uses TAP.  Science users generally do not want to deal
 	with the big block of XML metadata which VOSI will return.
 	Mostly they just want a list of tables, or a list of the
 	columns of a single table, in text, csv, or votable format
 	depending upon what they want to do with it.

     o	Cone search replacement.  Something about as simple to use
 	as classic SCS, but much more powerful.  Hence we continue
 	to support simple cone (circular) regions, but add optional
 	support for more general spatial regions (REGION parameter).
 	It should be possible to specify the table to be queried
 	(FROM parameter), and optionally the subset of fields to be
 	returned (SELECT parameter).  Simple range based filter-type
 	expressions of individual tables (mainly astronomical catalogs)
 	should be supported (WHERE parameter).  We should keep this
 	capability fairly simple as ADQL will be available for the
 	more complex cases.

     o	Multi-position queries (aka "multicone").  A user supplied
 	table of positions is used to generalize the cone search to
 	multiple positions, providing scalability.  A single table is
 	returned containing the rows found for each input position,
 	tagged by the position ID.  Requires the UPLOAD capability (URL
 	based) as well as inline table upload capability.  REGION can
 	be used to arbitrarily mask the table of input positions, e.g.,
 	to allow a very large table to be used as the input position
 	table in a cross match.  It should be possible to further
 	constrain the output by specifying simple range constraints on
 	individual table fields.  In queries of large catalogs this can
 	greatly reduce the amount of data to be returned to the client.

 	The multi-position query can be very useful as-is, but will
 	often be only the first stage of a more complex multi-stage
 	query, e.g., employing a custom multiparametric cross-matching
 	algorithm on the client side to further refine the data,
 	after doing a rough selection using the multi-position query.
 	Distributed queries are easily supported as well, with minimal
 	requirements on each individual TAP node.

     o	Simple filter type queries of individual astronomical catalogs.
 	This is really a variant on the cone search, but eliminates
 	the requirement that a spatial constraint be used.  The only
 	constraint would be simple range type constraints on individual
 	table fields.

     o	Query for table modifications (MTIME parameter).  This is a
 	special case, used to query a remote service for what has
 	changed in a given time interval, e.g., to automatically
 	maintain a replica of a catalog.  It is much easier to do this
 	with a parameter-based query than with something like ADQL,
 	due to the higher level of abstraction.  The service is free
 	to maintain time-of-modify/delete/update information any way
 	it wants, internal to the service/DBMS.

     o	Another subtle point is that DBMS views should be supported.
 	This will provide a very powerful way to extend the simple
 	parameter-based query, allowing complex SQL-based operations
 	on the server side.  Unlike ADQL, which permits general
 	client-defined expressions, the operations provided would be
 	those defined by the data provider.  But often the data provider
 	will be the one in the best position to define standard views
 	of their data.
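
To make the cases above a bit more concrete, here is a rough sketch of
how a client might assemble such queries as simple URLs.  Only the
parameter names FROM, SELECT, WHERE, REGION, MTIME and UPLOAD come from
the list above.  Everything else is an assumption made for illustration:
the endpoint URL, the table and column names, the FORMAT parameter, and
above all the syntax of the parameter values (the region, range and time
interval encodings), none of which has been specified yet.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical service endpoint, for illustration only.
BASE_URL = "http://example.org/tap/query"


def build_query(base_url, table, columns=None, where=None, region=None,
                mtime=None, upload=None, fmt=None):
    """Assemble a parameter-based query URL.

    The value syntax of each parameter is assumed, not specified.
    """
    params = [("FROM", table)]
    if columns:
        params.append(("SELECT", ",".join(columns)))
    if where:
        params.append(("WHERE", where))
    if region:
        params.append(("REGION", region))
    if mtime:
        params.append(("MTIME", mtime))
    if upload:
        params.append(("UPLOAD", upload))
    if fmt:
        # FORMAT is an assumed parameter name for the output format.
        params.append(("FORMAT", fmt))
    return base_url + "?" + urlencode(params)


# Metadata query: list the tables, using the same interface as data queries.
tables_url = build_query(BASE_URL, "information_schema.tables", fmt="csv")

# Cone-search replacement: circular region plus a simple range filter.
cone_url = build_query(BASE_URL, "mycat",
                       columns=["ra", "dec", "mag"],
                       region="circle 180.0 2.5 0.1",   # assumed syntax
                       where="mag,10/15")               # assumed syntax

# Multi-position query: positions come from a table referenced by URL.
multicone_url = build_query(BASE_URL, "mycat",
                            upload="http://example.org/positions.vot",
                            where="mag,10/15")

# Table modifications within a time interval (assumed interval syntax).
changes_url = build_query(BASE_URL, "mycat", mtime="2009-01-01/2009-02-01")

# Filter-only query: the same call without any spatial constraint.
filter_url = build_query(BASE_URL, "mycat", where="mag,10/15")
```

The point of the sketch is only that all five cases share one request
shape, with the filter-only query being the cone query minus REGION.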

I think that is all we identified here.  All of this needs to be
fully specified in a TAP-specific manner in order to permit services
to be implemented rigorously, and to define the interface well enough
that clients can use it for actual access.

 	- Doug