[TAP] data type for column metadata

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Thu Mar 26 09:47:47 PDT 2009


On 2009-03-25 20:43:51 Douglas Tody wrote:
> Reading through this long discussion of column data types/formats my
> main observation is that we appear to be confusing data models and
> data semantics (understanding precisely what the table columns refer
> to) with the data access mechanism.  In the case of TAP, or SQL/ADQL
> etc., what we have is a relatively low level table access mechanism.
> The same thing is true of something like VOTable which is merely a
> generic table container.

Yes, we need to keep in mind that TAP (and ADQL specifically) provides a 
lower-level access with no semantics included. Users will need to know 
something to write the query and we can provide some of that (again, the low 
level details).

Note: where ever I say "write a query" I mean more specifically "use in the 
where clause" (for example, there are other uses) because you don't need to 
know anything to do "SELECT * FROM someTable"

So, I'll take another stab at where we are with column metadata, starting at 
the lowest levels.

* column name: a legal column name as defined by ADQL

* datatype: the current draft uses VOTable types but we have shown several 
places where that is not adequate and concluded that we need to use a list of 
types from ADQL/SQL 

* units: current draft includes units so the service can tell the client how 
to express numeric values

With column name, datatype, and units (eg length,DOUBLE,quadloobs) one can 
write simple queries like

SELECT * FROM someTable where price > 3

to find all things with a price larger than 16 terkats. Simple, basic stuff. 
Note that it is the client's job to convert 16 terkats into 3 (quadloobs) 
since you cannot specify the units in syntactially legal ADQL.

From here we have to add more metadata so people can query on some of the more 
complex values. 

* system: I propose that column metadata include a place to specify the 
coordinate system applicable to values; name TBD depending on what in 
actually needs to contain

Question for Arnold: Can we usefully put some string from STC in here, as we 
intend to require users do when using the region functions? Is this in fact 
the same list, but here also including energy and time as well? Is it enough:  
can you include both the scale and zero-point? Aiming to be useful here...

Example 1 metadata: the_date,TIMESTAMP,?,GREGORIAN+UTC

SELECT * FROM the_table WHERE the_date >= '2009-01-01T0:0:0'

Example 2 metadata: the_date,DOUBLE,d,MJD+UTC

SELECT * FROM the_table WHERE the_date >= 53000.0

Aside from the fact that the two values don't agree with each other (because I 
can't do MJD <-> calender date in my head :-) is this plausible and correct?

Q. Are the units of the first one dimensionless? seconds?

Q. Can you express both MJD and UTC in the "coordsys" metadata?

Q. Do we really need to specify Gregorian? Technically yes...

Note: This may well look redundant with the fact that REGION and related 
functions allow the user to specify an absolute value (coordinate system + 
coordinates). I think that is OK. The metadata specifies the coordinate 
system that the service uses, which implies that all queries written in that 
system will not require transformation and thus will work. Queries can be 
written in other coordinate systems  - spatial anyway - but they will fail if 
the service cannot or will not do the transformation. I think that is fine: 
simple clients can fire off queries in their favourite coordinate system, let 
the service do the work, and risk failures; more powerful clients can check 
and do the transformation client side to avoid one failure mode. For energy 
and time queries, the client will have to do the transformations; depending 
on how things work in practice, maybe a future revision of ADQL will follow 
the style of REGION et al, or maybe  not.

If this is missing something. 





* ucd: current draft includes UCD for when a service can semantically tag a 
column; UCDs do not have a direct use in writing queries, they only help the 
user to understand the content

* utype: current draft includes type for when a table contains multiple 
columns from a specific data model and the service wants to express that;
-- 

Patrick Dowler
Tel/Tél: (250) 363-0044
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2M7

Centre canadien de donnees astronomiques
Conseil national de recherches Canada
5071, chemin West Saanich
Victoria (C.-B.) V9E 2M7



More information about the dal mailing list