[TAP] Summary: data type for column metadata

Sun Apr 19 17:12:39 PDT 2009

To summarize, the following are currently proposed for TAP metadata
queries:

     o	VOSI interface returning registry XML for the full tableset
 	provided by the service.  The metadata for a single table or
 	a simple list of tables cannot be directly queried, and the
 	client must deal with a custom XML schema.  This is needed
 	for registry-based discovery.

     o	Null query returning a table (defaulting to VOTable format)
 	with no data rows.  A null query can result from any data
 	query which does not find any rows (which is not an error),
 	or can be forced by setting MAXREC=0.  This is simple
 	and uses the standard query mechanism, but returns only
 	basic VOTable-specific instance metadata for a single table.
 	Special support is however provided via the VOTable mechanisms
 	for things like GROUP which can be useful for associating
 	coordinate fields or grouping the fields of a data model.

     o	Tableset metadata queries via TAP_SCHEMA.  This allows the
 	tableset schema to be directly queried, rather than querying a
 	table instance, hence a richer set of metadata can be queried.
 	The standard query interface can be used to query table
 	metadata such as a list of tables or a list of the columns of
 	a table.  Individual TAP_SCHEMA tables may be queried, which
 	is convenient for a client that does not normally want all
 	the metadata for the full tableset as with the VOSI interface
 	(although the param query as currently proposed does provide
 	a special mechanism for returning the full tableset metadata).

Both the ADQL and param query interfaces (if either or both are
supported by the service) can be used either for null queries
(MAXREC=0) or to query the TAP_SCHEMA.

Some examples (taking liberties with the HTTP formatting and ignoring
how we specify the query mode):

     QUERY="SELECT * FROM someTable"&MAXREC=0	# null query via ADQL
     FROM=someTable&MAXREC=0			# null query via param query

     QUERY="SELECT * FROM TAP_SCHEMA.tables"	# list tables via ADQL
 						# table columns via ADQL
     QUERY="SELECT * FROM TAP_SCHEMA.columns WHERE table_name='someTable'"

     FROM=TAP_SCHEMA.tables			# list tables via param query
 						# table columns param query
     FROM=TAP_SCHEMA.columns&WHERE=table_name,someTable
     FROM=TAP_SCHEMA.tableset&FORMAT=xml		# registry/VOSI tableset

On Fri, 17 Apr 2009, Patrick Dowler wrote:
> On Friday 17 April 2009 08:15:14 Gerard wrote:
>> Hi Pat
>> A somewhat long comment again, just before the weekend.
>>
>>>> I like this list.  I think we also need to include a recommendation
>>>> (at least) for how these should be mapped to VOTable types.  This is not
>>>> only for consistency in TAP responses, but also for describing a table
>>>> in the registry outside the context of TAP (e.g. describing the table
>>>> returned from an SIA query).  Does this seem reasonable?
>>>>     ADQL type      VOTable
>>>>     BOOLEAN        boolean
>>>>     SMALLINT       short
>>>>     REAL           float
>>>>     DOUBLE         double
>>>>     TIMESTAMP      char arraysize="*", (format?)
>>>>                     (or is it numeric?)
>> 
>> I personally think a mapping from ADQL->VOTable should be mandated. But
>> it has to go both ways doesn't it?  If I upload a VOTable into
>> TAP_UPLOAD and write a query against it, I must be allowed some
>> expectation on the ADQL datatypes of the columns in the table that
>> was uploaded.  In particular I wonder whether we can insist that the
>> following two equivalent actions MUST have the same result:
>
> *PROPOSAL*
> 
> A specific mapping of ADQL <-> VOTable datatypes (+formats or
> something, TBD) will be mandated.

I like the proposal to use ADQL types in the TAP_SCHEMA for field
types, provided we define a standard mapping to VOTable types.
This provides additional needed information on table types in a
DBMS-independent fashion while also providing a clear mapping to
VOTable types.
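
To make the idea concrete, here is a minimal sketch of such a mapping
as a lookup table.  It covers only the types in the list quoted above;
the TIMESTAMP entry is provisional since its representation (string
or numeric) is still open, and the function name is hypothetical.

```python
# ADQL -> VOTable FIELD attributes, limited to the types quoted above.
# The TIMESTAMP entry is provisional: string vs. numeric is undecided.
ADQL_TO_VOTABLE = {
    "BOOLEAN": {"datatype": "boolean"},
    "SMALLINT": {"datatype": "short"},
    "REAL": {"datatype": "float"},
    "DOUBLE": {"datatype": "double"},
    "TIMESTAMP": {"datatype": "char", "arraysize": "*"},
}


def votable_field_attrs(column_name, adql_type):
    """Return the VOTable FIELD attributes for a column of the given ADQL type.

    Raises KeyError for an ADQL type that has no agreed mapping yet.
    """
    attrs = {"name": column_name}
    attrs.update(ADQL_TO_VOTABLE[adql_type.upper()])
    return attrs


if __name__ == "__main__":
    print(votable_field_attrs("ra", "DOUBLE"))
    print(votable_field_attrs("obs_date", "TIMESTAMP"))
```

For uploads the same table would have to be read in the other
direction, which is one reason the mapping needs to be unambiguous.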

I don't buy the arguments that we need to fully specify coordinate
systems in detail in the TAP_SCHEMA in order to do cross correlations.
Most spatial correlations will not be done using low level field
expressions in any case.  Rather, one will use higher level constructs
such as POS,SIZE or REGION in param query (for cone search, multiquery,
etc.), or the equivalent mechanisms in ADQL.  If the service implements
these it will know how to map spatial regions to the underlying table,
without the client having to sort out the details for every such table.
Often complex service-specific spatial indexes will be used, such as
HTM etc., and a higher level construct is needed to make effective
use of these.

In any case we cannot assume that services would reliably provide
such detailed coordinate system metadata for every data table -
I would bet that most will not.  Even if they tried to do so there
might be several semantic types of coordinates in the table and the
client would not know which to use for correlations, without a detailed
(human) understanding of the table.

In the more complex cases a human will have to understand the data
they are dealing with and pose the query.  To support this and
avoid forcing the user to go to the literature for simpler queries,
we just need to provide basic coordinate system information in the
TAP_SCHEMA, such as could be provided by a single FRAME (or whatever)
attribute encoded as a string, based upon the STC data model (but
not necessarily the specified serializations).

> Further, we also mandate the
> mapping of column names <-> VOTable FIELD names (plain column name,
> not qualified with table/alias).  Anything less will make synchronous
> queries w/ upload basically impossible.

I agree that the VOTable field *name* should be used for table column
names.  ID has a more specialized usage within VOTable, being used
as part of the ID-REF construct.
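
As an illustration only, a client could recover the column names from
a returned (or uploaded) VOTable as sketched below.  The VOTable
fragment is invented for the example and omits the XML namespace that
a real document would normally carry; the function name is likewise
hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, namespace-free VOTable fragment with no data rows,
# such as a null query might return.
VOTABLE = """<VOTABLE><RESOURCE><TABLE name="someTable">
  <FIELD ID="col_1" name="ra" datatype="double"/>
  <FIELD ID="col_2" name="dec" datatype="double"/>
  <DATA><TABLEDATA/></DATA>
</TABLE></RESOURCE></VOTABLE>"""


def column_names(votable_xml):
    """Return the FIELD name attributes in document order.

    The name attribute is used as the column name; ID is ignored
    because it only serves the ID-REF mechanism within the document.
    """
    root = ET.fromstring(votable_xml)
    return [field.get("name") for field in root.iter("FIELD")]


if __name__ == "__main__":
    print(column_names(VOTABLE))
```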

 	- Doug