TAP: FIELD/PARAM format attribute

Mark Taylor m.b.taylor at bristol.ac.uk
Wed May 13 07:23:04 PDT 2009


On Fri, 8 May 2009, Keith Noddle wrote:

> Discussion starter: please can someone close to the subject reply to this
> email outlining the current state of play.
> 
> Keith.

I'll have a go at this, though I'm not a TAP insider, so corrections
are welcome if I'm misrepresenting anything.

TAP is all about getting data in and out of databases.
The chosen data exchange format is VOTable.  So in various places it's
necessary to move data from an RDBMS table or tables 
to a VOTable (query result) or vice versa (table upload).

As far as TAP is concerned, the type system for data inside databases
is defined by ADQL (I think - though maybe additional
DB-specific types are permitted too?).  Since the VOTable type
system is less rich than ADQL's, there is a mismatch.
The main problems relate to the database types
TIMESTAMP, POINT and REGION (any more?) - the others are basically
scalar or array numeric/character/boolean types for which there is
an obvious 1:1 correspondence between ADQL type and VOTable type.
It is straightforward to represent these non-VOTable data items 
in a VOTable (TIMESTAMP -> ISO-8601 string; POINT & REGION -> STC-S 
string), but simply doing that loses metadata.  You can't tell by 
looking at the result VOTable from a TAP data query whether a given 
one of its columns represents a TIMESTAMP, and when uploading a 
table, columns which should represent TIMESTAMPs will simply
be ingested into the database as strings instead.
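
To illustrate the loss of metadata, here is a minimal sketch using
Python's standard xml.etree.ElementTree.  The VOTable fragment and its
column names ("target", "obs_time") are invented for the example and
namespaces are omitted for brevity; the point is only that a TIMESTAMP
written as an ISO-8601 string is declared exactly like any other
string column.

```python
import xml.etree.ElementTree as ET

# Hypothetical query result: "obs_time" was a TIMESTAMP in the database,
# "target" was an ordinary string column.  Both names are invented.
VOTABLE = """\
<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE>
      <FIELD name="target" datatype="char" arraysize="*"/>
      <FIELD name="obs_time" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA>
        <TR><TD>M31</TD><TD>2009-05-13T07:23:04</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""


def field_types(votable_xml):
    """Return {column name: (datatype, arraysize)} for every FIELD."""
    root = ET.fromstring(votable_xml)
    return {
        field.get("name"): (field.get("datatype"), field.get("arraysize"))
        for field in root.iter("FIELD")
    }


types = field_types(VOTABLE)
# Both columns carry the same declaration, so nothing in the FIELD
# metadata marks "obs_time" as a TIMESTAMP.
print(types["target"] == types["obs_time"])
```

A service ingesting this table as an upload has the same view of it,
which is why the column would end up stored as a plain string.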

Pat proposed (http://www.ivoa.net/forum/dal/0904/1142.htm),
and introduced at TAP draft 0.42, the solution of using ADQL datatypes,
rather than VOTable datatypes, for the column metadata stored
in the TAP_SCHEMA metadata tables and hence available from TAP 
metadata queries.  This was generally welcomed, and goes a long
way to improving matters, though I think there may be some 
issues that it doesn't cover (uploaded tables are still a problem,
since you can't upload TAP_SCHEMA metadata, can you?  and what about
calculated columns in output which won't have TAP_SCHEMA entries?)

There have been other suggestions to provide a more complete 
solution to the problem, which basically bridge the mismatch
between the ADQL and VOTable type systems, allowing you to tell
by looking at a result or uploaded VOTable what ADQL types its 
columns represent.  I'm not clear whether in view of the 0.42
innovation in the previous paragraph a solution along these 
lines is still regarded as necessary, though it would probably
make things tidier and easier for TAP client software.  
These suggestions fall into the following categories:

   1. extend the VOTable type system to include the missing types

   2. add a new attribute (name still under discussion - possibly
         "representation") to label columns with the missing types;
         a TIMESTAMP column would still be a string in a VOTable,
         but additionally marked "representation='iso-8601'"

   3. (ab)use the existing unit attribute to label columns with
         the missing types; 
         a TIMESTAMP column would still be a string in a VOTable,
         but additionally marked "unit='iso-8601'"

   4. rely on the existing machinery of utypes; by understanding
         the utype for a column, application code should be able to 
         work out whether it corresponds to a TIMESTAMP or whatever

  (N. others??)
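
As a rough sketch of how client code might use options 2 and 3, the
Python below reads the label from a FIELD element and maps it back to
an ADQL type.  The attribute name "representation" is only the
possible name mentioned above, the column name is invented, and the
lookup table and the adql_type helper are assumptions made for
illustration, not part of any standard.

```python
import xml.etree.ElementTree as ET

# The same hypothetical TIMESTAMP column declared three ways.
PLAIN = '<FIELD name="obs_time" datatype="char" arraysize="*"/>'
OPTION_2 = ('<FIELD name="obs_time" datatype="char" arraysize="*" '
            'representation="iso-8601"/>')
OPTION_3 = ('<FIELD name="obs_time" datatype="char" arraysize="*" '
            'unit="iso-8601"/>')

# Assumed mapping from label to ADQL type.  Only "iso-8601" comes from
# the discussion; labels for POINT and REGION are not settled here.
LABEL_TO_ADQL = {"iso-8601": "TIMESTAMP"}


def adql_type(field_xml):
    """Return the ADQL type implied by a FIELD's label, or None."""
    field = ET.fromstring(field_xml)
    # Check the proposed new attribute first, then the (ab)used unit.
    for attr in ("representation", "unit"):
        label = field.get(attr)
        if label in LABEL_TO_ADQL:
            return LABEL_TO_ADQL[label]
    return None


for declaration in (PLAIN, OPTION_2, OPTION_3):
    print(adql_type(declaration))
```

In both labelled cases the column is still read as a string by a parser
that knows nothing about the label; only software that looks for it
gains the extra type information.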

2 would require a small addition to the VOTable standard,
but no changes (though it would make some improvements possible)
to existing VOTable parsers and their client applications.  
3 would in principle have slightly more impact since it would
change syntactical rules for an existing attribute, but in practice
the effect on existing software is likely to be minimal.
There are arguments (though by no means universally accepted) 
independent of TAP in favour of one or other of these VOTable changes;
DAL/TAP's support would surely make their adoption more likely.
With the agreement of the VOTable group it would probably(?) be possible 
to get one or other into VOTable 1.2, which all being well is 
planned to move towards REC soon.  It would in fact be possible for 
TAP to adopt option 3 and use it with the current VOTable standard 
(1.1) on the understanding that it would be blessed in VOTable 1.2 
when it comes out - it's technically illegal VOTable 1.1, but 
unlikely to cause software problems (such practice is already 
used elsewhere).

1 would require a substantial change to the VOTable standard,
and have major implications for existing VOTable parsers and 
perhaps their client applications.

4 requires no change to VOTable.
         

Consensus has not been reached on this issue.  Discussion is ongoing
on the DAL list, and losing focus in the usual way, mainly in the 
threads:

   [TAP] data type for column metadata (from 17 March)
   [TAP] Summary: data type for column metadata  (from 15 April)
   content, format, ctype, or xtype  (from 3 May)

In the interests of keeping this summary both neutral and of manageable
size, I will not attempt to summarise the arguments in those threads
here.

Mark

-- 
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/


