Scope of TAP parameter-based queries
Douglas Tody
dtody at nrao.edu
Tue Feb 24 11:43:55 PST 2009
On Tue, 24 Feb 2009, Patrick Dowler wrote:
> For now, there are many specific issues with P?? (the name included) that we
> can address in separate topics. I will start a few of those today, starting
> with "scope of PQL".
As a starting point, here are the capabilities which NVO wants from
simple parameter-based queries in TAP:
o Simple table/DBMS metadata queries using the same query
interface as is used for table data queries (i.e. based upon
the SQL information schema approach).
Use of the same query interface means that all this code can
be reused on both the client and server side. Nice features
such as multiple output formats fall out without having to
do any extra work. The approach is scalable to very large
tablesets, i.e., individual tables can be queried. We want
this capability for our science users who will write software
which uses TAP. Science users generally do not want to deal
with the big block of XML metadata which VOSI will return.
Mostly they just want a list of tables, or a list of the
columns of a single table, in text, csv, or votable format
depending upon what they want to do with it.
o Cone search replacement. Something about as simple to use
as classic SCS, but much more powerful. Hence we continue
to support simple cone (circular) regions, but add optional
support for more general spatial regions (REGION parameter).
It should be possible to specify the table to be queried
(FROM parameter), and optionally the subset of fields to be
returned (SELECT parameter). Simple range-based filter-type
expressions on individual tables (mainly astronomical catalogs)
should be supported (WHERE parameter). We should keep this
capability fairly simple as ADQL will be available for the
more complex cases.
o Multi-position queries (aka "multicone"). A user supplied
table of positions is used to generalize the cone search to
multiple positions, providing scalability. A single table is
returned containing the rows found for each input position,
tagged by the position ID. Requires the UPLOAD capability (URL
based) as well as inline table upload capability. REGION can
be used to arbitrarily mask the table of input positions, e.g.,
to allow a very large table to be used as the input position
table in a cross match. It should be possible to further
constrain the output by specifying simple range constraints on
individual table fields. In queries of large catalogs this can
greatly reduce the amount of data to be returned to the client.
The multi-position query can be very useful as-is, but will
often be only the first stage of a more complex multi-stage
query, e.g., employing a custom multiparametric cross-matching
algorithm on the client side to further refine the data,
after doing a rough selection using the multi-position query.
Distributed queries are easily supported as well, with minimal
requirements on each individual TAP node.
o Simple filter type queries of individual astronomical catalogs.
This is really a variant on the cone search, but eliminates
the requirement that a spatial constraint be used. The only
constraint would be simple range type constraints on individual
table fields.
o Query for table modifications (MTIME parameter). This is a
special case, used to query a remote service for what has
changed in a given time interval, e.g., to automatically
maintain a replica of a catalog. It is much easier to do this
with a parameter-based query than with something like ADQL,
due to the higher level of abstraction. The service is free
to maintain time-of-modify/delete/update information any way
it wants, internal to the service/DBMS.
o Another subtle point is that DBMS views should be supported.
This will provide a very powerful way to extend the simple
parameter-based query, allowing complex SQL-based operations
on the server side. Unlike ADQL, which permits general
client-defined expressions, the operations provided would be those
defined by the data provider. But often the data provider
will be the one in the best position to define standard views
of their data.
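
To make the list above a little more concrete, here is a rough Python
sketch of how a client might assemble such parameter-based queries as
plain GET URLs. Only the parameter names FROM, SELECT, WHERE, REGION
and MTIME come from the list above. Everything else is an assumption
made for illustration: the endpoint URL, the FORMAT parameter, the
information_schema.tables table name, and the value syntax used for
REGION, WHERE and MTIME. None of this is specified yet.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://example.org/tap/query"


def build_query(base_url, params):
    """Return a GET URL for a parameter-based query.

    `params` maps parameter names to values; entries whose value is
    None are skipped, so optional constraints can simply be left unset.
    """
    pairs = [(name, str(value)) for name, value in params.items()
             if value is not None]
    return base_url + "?" + urlencode(pairs)


# Table metadata through the same interface as table data.
# Table name and FORMAT parameter are assumptions.
tables_url = build_query(BASE_URL, {
    "FROM": "information_schema.tables",
    "FORMAT": "csv",
})

# Cone-search-style query with a field subset and a range constraint.
# The REGION and WHERE value syntax shown here is assumed, not specified.
cone_url = build_query(BASE_URL, {
    "FROM": "mycat",
    "SELECT": "ra,dec,mag",
    "REGION": "circle 180.0 2.5 0.1",
    "WHERE": "mag,10/15",
})

# What changed in a given time interval (interval syntax assumed).
changes_url = build_query(BASE_URL, {
    "FROM": "mycat",
    "MTIME": "2009-01-01/2009-02-01",
})
```

Since every case goes through the same build_query helper, the
metadata query, the cone query and the change query differ only in
which parameters are set, which is the code reuse argued for in the
first item.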
I think that is all we identified here. All of this needs to be
fully specified in a TAP-specific manner in order to permit services
to be implemented rigorously, and to define the interface well enough
that clients can use it for actual access.
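
As one example of behavior that would need to be pinned down, the
multi-position query described above can be sketched as a small
in-memory reference in Python. This is only an illustration of the
intended semantics (one output table, rows tagged by position ID,
optional range constraints), not a proposed implementation. The
function names, the "ra"/"dec" column names, the "pos_id" tag and the
inclusive range convention are all assumptions.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    s = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(s))))


def multicone(positions, catalog, radius_deg, ranges=None):
    """Cone search around each input position.

    positions: iterable of (pos_id, ra, dec) in degrees.
    catalog:   list of dicts, each with "ra" and "dec" in degrees.
    ranges:    optional {field: (low, high)} inclusive constraints.
    Returns a single list of rows, each tagged with its "pos_id".
    """
    ranges = ranges or {}
    result = []
    for pos_id, ra, dec in positions:
        for row in catalog:
            if angular_separation_deg(ra, dec, row["ra"], row["dec"]) > radius_deg:
                continue
            if all(low <= row[field] <= high
                   for field, (low, high) in ranges.items()):
                result.append({"pos_id": pos_id, **row})
    return result
```

A real service would use a spatial index rather than the nested loop
shown here; the sketch is only meant to fix what the output table
looks like.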
- Doug