[TAP] data type for column metadata
Patrick Dowler
patrick.dowler at nrc-cnrc.gc.ca
Thu Mar 26 09:47:47 PDT 2009
On 2009-03-25 20:43:51 Douglas Tody wrote:
> Reading through this long discussion of column data types/formats my
> main observation is that we appear to be confusing data models and
> data semantics (understanding precisely what the table columns refer
> to) with the data access mechanism. In the case of TAP, or SQL/ADQL
> etc., what we have is a relatively low level table access mechanism.
> The same thing is true of something like VOTable which is merely a
> generic table container.
Yes, we need to keep in mind that TAP (and ADQL specifically) provides a
lower-level access with no semantics included. Users will need to know
something to write the query and we can provide some of that (again, the low
level details).
Note: where ever I say "write a query" I mean more specifically "use in the
where clause" (for example, there are other uses) because you don't need to
know anything to do "SELECT * FROM someTable"
So, I'll take another stab at where we are with column metadata, starting at
the lowest levels.
* column name: a legal column name as defined by ADQL
* datatype: the current draft uses VOTable types but we have shown several
places where that is not adequate and concluded that we need to use a list of
types from ADQL/SQL
* units: current draft includes units so the service can tell the client how
to express numeric values
With column name, datatype, and units (eg length,DOUBLE,quadloobs) one can
write simple queries like
SELECT * FROM someTable where price > 3
to find all things with a price larger than 16 terkats. Simple, basic stuff.
Note that it is the client's job to convert 16 terkats into 3 (quadloobs)
since you cannot specify the units in syntactially legal ADQL.
From here we have to add more metadata so people can query on some of the more
complex values.
* system: I propose that column metadata include a place to specify the
coordinate system applicable to values; name TBD depending on what in
actually needs to contain
Question for Arnold: Can we usefully put some string from STC in here, as we
intend to require users do when using the region functions? Is this in fact
the same list, but here also including energy and time as well? Is it enough:
can you include both the scale and zero-point? Aiming to be useful here...
Example 1 metadata: the_date,TIMESTAMP,?,GREGORIAN+UTC
SELECT * FROM the_table WHERE the_date >= '2009-01-01T0:0:0'
Example 2 metadata: the_date,DOUBLE,d,MJD+UTC
SELECT * FROM the_table WHERE the_date >= 53000.0
Aside from the fact that the two values don't agree with each other (because I
can't do MJD <-> calender date in my head :-) is this plausible and correct?
Q. Are the units of the first one dimensionless? seconds?
Q. Can you express both MJD and UTC in the "coordsys" metadata?
Q. Do we really need to specify Gregorian? Technically yes...
Note: This may well look redundant with the fact that REGION and related
functions allow the user to specify an absolute value (coordinate system +
coordinates). I think that is OK. The metadata specifies the coordinate
system that the service uses, which implies that all queries written in that
system will not require transformation and thus will work. Queries can be
written in other coordinate systems - spatial anyway - but they will fail if
the service cannot or will not do the transformation. I think that is fine:
simple clients can fire off queries in their favourite coordinate system, let
the service do the work, and risk failures; more powerful clients can check
and do the transformation client side to avoid one failure mode. For energy
and time queries, the client will have to do the transformations; depending
on how things work in practice, maybe a future revision of ADQL will follow
the style of REGION et al, or maybe not.
If this is missing something.
* ucd: current draft includes UCD for when a service can semantically tag a
column; UCDs do not have a direct use in writing queries, they only help the
user to understand the content
* utype: current draft includes type for when a table contains multiple
columns from a specific data model and the service wants to express that;
--
Patrick Dowler
Tel/Tél: (250) 363-0044
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2M7
Centre canadien de donnees astronomiques
Conseil national de recherches Canada
5071, chemin West Saanich
Victoria (C.-B.) V9E 2M7
More information about the dal
mailing list