[TAP] data type for column metadata
Markus Demleitner
msdemlei at ari.uni-heidelberg.de
Wed Mar 18 01:30:01 PDT 2009
Dear DAL group,
On Tue, Mar 17, 2009 at 12:41:48PM -0700, Patrick Dowler wrote:
> The current draft specifies that the data type is the VOTable datatype, which
> includes only primitive types and arrays of them. However, databases also
> have timestamp/datetime columns and, with ADQL will have columns that contain
> region values.
First off, I'm not a big fan of inflating the "VOTable type system".
As is, a VO application has to support at least its native types, SQL/ADQL,
VOTable, and typically XSD types. In an ideal world, I'd propose to
strike down the VOTable type system and replace it with a clearly
defined ADQL one. However, I know this will not happen. On the
other hand, if we expand the VOTable type system, I'd strongly
suggest keeping at least the extensions aligned with ADQL's type
system (probably after putting a bit more rigor into that).
The alternative obviously would be trying to handle theses things
through UCDs and units. This is how I currently treat datetimes
(e.g. unit=d yields a JD float, except when the UCD contains MJD, and
then it's an MJD float; unit=y-m-d yields an ISO string, etcetc.)
Needless to say, I hate it. Let's not go down that route.
> In the db, dates will be be timestamp or datetime (SQL type). In the output
> VOTable they will be strings (datatype="char", arraysize="*") encoded in an
> ISO8601 format.
Hm -- I'd like to see a timestamp type for that, together with a
strict specification of how unit is to be interpreted of it or
disallowing it altogether.
It would be nice if there could be a suitable binary representation
as well, but ISO/ASCII would probably do in binary VOTable streams as
well. However, ISO timestamps have variable length (if you allow, as
we should, fractional seconds and, as we should not allow,
timezones), so something more elegant (a pair of 32 bit integral JD
and a double for the seconds?) would be nicer.
> In the db, regions will likely be varchar or varbinary (SQL type). In the
> output VOTable they will be stringsencoded in STC/S format.
Hm -- right now, ADQL specifies how to output its geometry
expressions. I have to admit that I don't like this specification
-- for one, I've never understood the reasoning behind leaving out
the quotes from the first argument which gives rise to all kinds of
unnecessary trouble; plus I'm still convinced that coordinate system
metadata belongs to a metadata section and should not be (directly)
encoded in values, a.k.a., I don't like the first argument at all.
So, I think if we want to mandate STC-S in the VOTables, we need
to change ADQL. And I'm adamantly against just referring
to STC-S. Parsing full STC-S is quite involved and therefore
relatively slow, and also overkill for what you could want in a table
column (*cough* first normal form *cough*). Thus, if we go the STC-S
way (which still would munge metadata into data and thus IMHO would
be suboptimal), we need to define a subset of STC-S that is *quickly
parseable* into a *simple model*.
By the way, these days many databases have native geometry types
(e.g., for GIS applications), and thus I'd doubt that storage of
regions in databases will "likely" be in strings or binary blobs.
But that's just an aside.
> Secondarily, should we also describe the SQL type in addition? There are some
> query writing tasks where that is needed (e.g. when implicit conversions are
> going on).
Yes. Query generation will be a nightmare as is, and saving a couple
of bytes in the schema description and the query replies doesn't
justify the hassles of having to guess the mapping a TAP service does
from its ADQL type system to VOTable.
*However*, I think this only really makes sense if there's a well
defined and spelled-out type system in ADQL. Since ADQL has (wisely)
left out the entire DDL part of SQL (and, IIRC, casting as well),
such a thing doesn't exist yet, I think. Dumping the SQL type into
a column probably doesn't help a lot. Pop quiz: What input formats
are allowed for a column of type MAC (of course, you don't know that
the backend DB is Postgres)[1].
Cheers,
Markus
[1] Yeah, I know that MAC type columns (which store ethernet hardware
addresses) aren't terribly likely in astronomical databases, but I,
for example, have BOXes in my tables already, and that's in no way
better.
More information about the dal
mailing list