TAP: FIELD/PARAM format attribute
Mark Taylor
m.b.taylor at bristol.ac.uk
Wed May 13 07:23:04 PDT 2009
On Fri, 8 May 2009, Keith Noddle wrote:
> Discussion starter: please can someone close to the subject reply to this
> email outlining the current state of play.
>
> Keith.
I'll have a go at this, though I'm not a TAP insider, so corrections
are welcome if I'm misrepresenting anything.
TAP is all about getting data in and out of databases.
The chosen data exchange format is VOTable. So in various places it's
necessary to move data from an RDBMS table or tables
to a VOTable (query result) or vice versa (table upload).
The type system for data inside databases is, as far as TAP
is concerned, defined by ADQL (I think - though maybe additional
DB-specific types are permitted too?). Since the VOTable type
system is less rich than ADQL's, there is a mismatch.
The main problems relate to the database types
TIMESTAMP, POINT and REGION (any more?) - the others are basically
scalar or array numeric/character/boolean types for which there is
an obvious 1:1 correspondence between ADQL type and VOTable type.
It is straightforward to represent these non-VOTable data items
in a VOTable (TIMESTAMP -> ISO-8601 string; POINT & REGION -> STC-S
string), but simply doing that loses metadata. You can't tell by
looking at the result VOTable from a TAP data query whether a given
one of its columns represents a TIMESTAMP, and when a table is
uploaded, columns which should represent TIMESTAMPs will simply
be ingested into the database as strings instead.
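
To make the metadata loss concrete, here is a minimal sketch in Python,
assuming the usual VOTable declaration of a variable-length string
(datatype="char", arraysize="*"). The column names obs_comment and
obs_time are hypothetical, invented for illustration only.

```python
import xml.etree.ElementTree as ET


def make_string_field(name):
    """Build a VOTable FIELD element for a variable-length string column."""
    return ET.Element("FIELD", {"name": name, "datatype": "char", "arraysize": "*"})


def type_info(field):
    """Return every FIELD attribute except the column name."""
    return {k: v for k, v in field.attrib.items() if k != "name"}


# Hypothetical columns: one holds free text, the other holds TIMESTAMP
# values serialised as ISO-8601 strings.
comment_field = make_string_field("obs_comment")
time_field = make_string_field("obs_time")

# Apart from the name, the two declarations are indistinguishable, so a
# reader of the VOTable cannot tell which column is really a TIMESTAMP.
same_type = type_info(comment_field) == type_info(time_field)

print(ET.tostring(comment_field, encoding="unicode"))
print(ET.tostring(time_field, encoding="unicode"))
print("type metadata identical:", same_type)
```
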
Pat proposed (http://www.ivoa.net/forum/dal/0904/1142.htm),
and introduced at TAP draft 0.42, the solution of using ADQL datatypes,
rather than VOTable datatypes, for the column metadata stored
in the TAP_SCHEMA metadata tables and hence available from TAP
metadata queries. This was generally welcomed, and goes a long
way to improving matters, though I think there may be some
issues that it doesn't cover (uploaded tables are still a problem,
since you can't upload TAP_SCHEMA metadata, can you? and what about
calculated columns in output which won't have TAP_SCHEMA entries?)
There have been other suggestions to provide a more complete
solution to the problem, which basically bridge the mismatch
between the ADQL and VOTable type systems, allowing you to tell
by looking at a result or uploaded VOTable what ADQL types its
columns represent. I'm not clear whether in view of the 0.42
innovation in the previous paragraph a solution along these
lines is still regarded as necessary, though it would probably
make things tidier and easier for TAP client software.
These suggestions fall into the following categories:
  1. extend the VOTable type system to include the missing types
  2. add a new attribute (name still under discussion - possibly
     "representation") to label columns with the missing types;
     a TIMESTAMP column would still be a string in a VOTable,
     but additionally marked "representation='iso-8601'"
  3. (ab)use the existing unit attribute to label columns with
     the missing types;
     a TIMESTAMP column would still be a string in a VOTable,
     but additionally marked "unit='iso-8601'"
  4. rely on the existing machinery of utypes; by understanding
     the utype for a column, application code should be able to
     work out whether it corresponds to a TIMESTAMP or whatever
  (N. others??)
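
As a rough illustration of options 2 and 3, the following Python sketch
labels a string FIELD either way and shows how a client might detect the
label. It is only a sketch: the "representation" attribute name is still
under discussion, the column name obs_time is hypothetical, and the
helper functions are invented for this example rather than taken from
any existing library.

```python
import xml.etree.ElementTree as ET

ISO_LABEL = "iso-8601"  # label value quoted in the list above


def label_timestamp(field, attribute):
    """Return a copy of a string FIELD labelled as holding timestamps.

    attribute is "representation" (option 2) or "unit" (option 3).
    """
    labelled = ET.Element(field.tag, dict(field.attrib))
    labelled.set(attribute, ISO_LABEL)
    return labelled


def is_timestamp(field):
    """Client-side check that accepts either labelling convention."""
    return ISO_LABEL in (field.get("representation"), field.get("unit"))


# Hypothetical column: still declared as a plain string in the VOTable.
plain = ET.Element("FIELD", {"name": "obs_time", "datatype": "char", "arraysize": "*"})
option2 = label_timestamp(plain, "representation")
option3 = label_timestamp(plain, "unit")

for field in (plain, option2, option3):
    print(ET.tostring(field, encoding="unicode"), is_timestamp(field))
```

In both cases the datatype stays "char", so a parser that ignores the
extra label still reads the column as an ordinary string.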
2 would require a small addition to the VOTable standard,
and require no changes (though make some improvements possible)
to existing VOTable parsers and their client applications.
3 would in principle have slightly more impact since it would
change syntactical rules for an existing attribute, but in practice
the effect on existing software is likely to be minimal.
There are arguments (though by no means universally accepted)
independent of TAP in favour of one or other of these VOTable changes;
DAL/TAP's support would surely make their adoption more likely.
With the agreement of the VOTable group it would probably(?) be possible
to get one or other into VOTable 1.2, which all being well is
planned to move towards REC soon. It would in fact be possible for
TAP to adopt option 3 and use it with the current VOTable standard
(1.1) on the understanding that it would be blessed in VOTable 1.2
when it comes out - it's technically illegal VOTable 1.1, but
unlikely to cause software problems (such practice is already
used elsewhere).
1 would require a substantial change to the VOTable standard,
and have major implications for existing VOTable parsers and
perhaps their client applications.
4 requires no change to VOTable.
Consensus has not been reached on this issue. Discussion is ongoing
on the DAL list, and losing focus in the usual way, mainly in the
threads:
[TAP] data type for column metadata (from 17 March)
[TAP] Summary: data type for column metadata (from 15 April)
content, format, ctype, or xtype (from 3 May)
In the interests of keeping this summary both neutral and of manageable
size, I will not attempt to summarise the arguments in those threads
here.
Mark
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/