[TAP] data type for column metadata

Mark Taylor m.b.taylor at bristol.ac.uk
Thu Mar 19 13:26:41 PDT 2009


On Wed, 18 Mar 2009, Markus Demleitner wrote:

> Dear DAL group,
> 
> On Tue, Mar 17, 2009 at 12:41:48PM -0700, Patrick Dowler wrote:
> > The current draft specifies that the data type is the VOTable datatype, which 
> > includes only primitive types and arrays of them.  However, databases also 
> > have timestamp/datetime columns and, with ADQL will have columns that contain 
> > region values. 
> First off, I'm not a big fan of inflating the "VOTable type system".
> As is, a VO application has to support at least its native types, SQL/ADQL,
> VOTable, and typically XSD types.  In an ideal world, I'd propose to
> strike down the VOTable type system and replace it with a clearly
> defined ADQL one.  However, I know this will not happen.  On the
> other hand, if we expand the VOTable type system, I'd strongly
> suggest keeping at least the extensions aligned with ADQL's type
> system (probably after putting a bit more rigor into that).

I think this can be addressed within VOTable, but not in the type
system itself.

VOTable, like FITS, is a container intended to carry low-level
data types: numeric types, characters, and arrays of these.
The types available are sufficient to contain pretty much any
data.  BLOBs and CLOBs can be encoded with datatype unsignedByte and
char respectively, with an appropriate fixed or variable arraysize.
A datetime can be encoded in ISO-8601 form and put in a string
(datatype="char" arraysize="*") column, or possibly encoded as
some kind of scalar or array numeric value such as a double MJD.
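
To make the two datetime encodings concrete, here is a minimal sketch
in Python (standard library only).  The function names are invented
for this illustration, and it assumes UTC and ignores leap seconds.

```python
from datetime import datetime, timezone

# MJD zero point is 1858-11-17T00:00:00, i.e. JD 2400000.5.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def to_iso8601(dt):
    """Encode as an ISO-8601 string (for a datatype="char" arraysize="*" column)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_mjd(dt):
    """Encode as a Modified Julian Date (for a datatype="double" column)."""
    return (dt - MJD_EPOCH).total_seconds() / 86400.0


def to_jd(dt):
    """Encode as a Julian Date; differs from MJD by a fixed offset."""
    return to_mjd(dt) + 2400000.5


stamp = datetime(2009, 3, 19, 12, 0, 0, tzinfo=timezone.utc)
print(to_iso8601(stamp))  # 2009-03-19T12:00:00Z
print(to_mjd(stamp))      # 54909.5
print(to_jd(stamp))       # 2454910.0
```

All three values describe the same instant, and nothing in the
datatype alone says which of them a given column holds.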

The problem is not encoding the data in a VOTable with the existing
range of types, but for readers of the table to understand what's
meant.  If a column is understood to represent a time stamp
(which can be done using a UCD or a utype) and it's a double,
is it a JD or an MJD?  If it's a string, is it an ISO-8601 string
or something like "Tuesday, 13th March"?  (Hopefully not the latter,
but you get the idea.)

> The alternative obviously would be trying to handle these things
> through UCDs and units.  This is how I currently treat datetimes
> (e.g. unit=d yields a JD float, except when the UCD contains MJD, and
> then it's an MJD float; unit=y-m-d yields an ISO string, etc.)
> Needless to say, I hate it.  Let's not go down that route.

I agree that using UCDs and units for this is nasty.  I also do
it where I don't have a better solution (for instance there is an
informal and technically illegal convention that datatype="char"
and unit="hms"/"dms" indicates sexagesimal), and wish I didn't have to.
The problem is that whether (e.g.) a string is an ISO-8601 date
is not a "unit" and neither is it a UCD-type thing (or indeed a
utype-type thing).
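
For what it's worth, the sexagesimal convention mentioned above
amounts to something like the following Python sketch on the reader
side.  The function name is hypothetical, and the sketch assumes the
three components are separated by colons or spaces.

```python
def sexagesimal_to_degrees(text, unit):
    """Convert 'hh:mm:ss' (unit='hms') or '+dd:mm:ss' (unit='dms') to degrees."""
    if unit not in ("hms", "dms"):
        raise ValueError("not a sexagesimal unit label: %r" % unit)
    text = text.strip()
    # Take the sign from the text itself so that '-00:30:00' stays negative.
    sign = -1.0 if text.startswith("-") else 1.0
    parts = text.lstrip("+-").replace(":", " ").split()
    first, minutes, seconds = (float(p) for p in parts)
    value = first + minutes / 60.0 + seconds / 3600.0
    if unit == "hms":
        value *= 15.0  # 24 hours of right ascension span 360 degrees
    return sign * value


print(sexagesimal_to_degrees("12:30:00", "hms"))   # 187.5
print(sexagesimal_to_degrees("-00:30:00", "dms"))  # -0.5
```

The reader has to infer the string layout from a value that is
nominally a unit, which is exactly the abuse complained about here.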

I believe the best way to handle this would be to introduce an
additional attribute on VOTable FIELDs (columns) called something
like "format".  The interpretation of the content of this attribute
need not (and probably should not) be specified by the VOTable
standard itself, but by VOTable 'customers', for instance the TAP
standard, or user communities.  A string column labelled 
format="iso8601" (or possibly something like format="tap:iso8601",
or indeed, format="postgresql:MAC") could be processed as a string 
by generic VOTable libraries without requiring any changes to the 
VOTable standard or implementations, but an application which 
understood this format type could do something sensible with it 
as appropriate.
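
A rough Python sketch of how an application might use such a label
follows.  The "format" attribute is the proposal made here and is not
part of the VOTable standard; the label values, the converter table
and the function name are likewise only illustrative.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical FIELD declarations carrying the proposed "format" attribute.
TIME_FIELD = ET.fromstring(
    '<FIELD name="obs_time" datatype="char" arraysize="*" format="iso8601"/>'
)
MAC_FIELD = ET.fromstring(
    '<FIELD name="mac" datatype="char" arraysize="*" format="postgresql:MAC"/>'
)

# Converters for the labels this particular application happens to understand.
CONVERTERS = {
    "iso8601": lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%S"),
}


def decode_cell(field, raw):
    """Decode a cell via the field's format label, else return the plain string."""
    converter = CONVERTERS.get(field.get("format"))
    return converter(raw) if converter else raw


print(decode_cell(TIME_FIELD, "2009-03-19T13:26:41"))  # a datetime object
print(decode_cell(MAC_FIELD, "08:00:2b:01:02:03"))     # unknown label: left as str
```

A generic library that ignores the attribute altogether behaves like
the fallback branch, so existing readers keep working unchanged.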

Having these conventions outside the VOTable standard itself means
that the VOTable standard does not have to keep undergoing
changes as different user communities find that the type they need
is not currently offered by VOTable.  This benefits the VOTable
standard (more stable) and the user communities (less effort 
adapting it to their uses).

I made basically this suggestion in 2006 on the VOTable mailing list
(http://www.ivoa.net/forum/votable/0604/0815.htm); it generated a lot
of mail traffic but didn't otherwise get very far that time around.
I just thought I'd try my luck again.

Mark

-- 
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/


