[TAP] data type for column metadata

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Tue Mar 24 17:02:59 PDT 2009


I will take a stab at these, as this is an extensive list of the issues involved.

This is more "how to deal with these in an RDBMS" rather than "the right way 
to deal with them", but I think that is the issue here :-) 

On 2009-03-24 13:27:31 Arnold Rots wrote:
> For a Timestamp:
> What is the data type and precision?
>   Could be datetime
>   Or could be a floating point number

True. We use timestamp (aka datetime) for lastModified values and release 
dates; we use double for observation start and end times. But see below for 
the rest of the story.

So from another post, the datatype would be TIMESTAMP or DOUBLE.

> What kind of parameter does it represent?
>   OK, that would be time instant, if we are talking about timestamps

Yes. In response to Gerard's list of data types in SQL, I mentioned that I 
have never found a good use for either DATE or TIME types - only TIMESTAMP. 
Maybe other people have a different experience?

> If it is a coordinate value, what coordinate system does it refer to?
>   The Time Scale (TT, UTC, TAI, GPS, TCG, TDB, TCB, ...)

It turns out that RDBMSs vary in their treatment of time zone. In practice we 
use local time (for lastModified timestamps), UTC (for data release 
dates), or MJD (for observation start and end times). For the timestamp 
values, the application has to "know" the timezone in order to extract the 
value from the DB correctly. Even for the MJD values, we just read the double 
from the DB but the application still "knows" it is an MJD at some level. 

Now, we chose MJD for "astronomical times" because it is much easier to 
compute things (histograms, statistics, etc) when the number is directly 
accessible to SQL.
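
As a minimal sketch of what "directly accessible to SQL" buys you, here is an 
in-memory SQLite example. The table name (observation) and column names 
(obs_id, t_start, t_end) are made up for illustration and are not taken from 
any real schema; the MJD values are arbitrary.

```python
import sqlite3

# Hypothetical table: observation start/end times stored as MJD doubles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation (obs_id TEXT, t_start REAL, t_end REAL)")
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?)",
    [
        ("a", 54914.10, 54914.15),
        ("b", 54914.60, 54914.70),
        ("c", 54915.25, 54915.50),
    ],
)

# Histogram: number of observations starting on each MJD day.
# CAST(... AS INTEGER) truncates, which equals floor for positive MJD values.
per_day = conn.execute(
    "SELECT CAST(t_start AS INTEGER) AS mjd_day, COUNT(*) "
    "FROM observation GROUP BY mjd_day ORDER BY mjd_day"
).fetchall()

# Statistic: total elapsed time in seconds (MJD is in days, so multiply by 86400).
total_seconds = conn.execute(
    "SELECT SUM(t_end - t_start) * 86400.0 FROM observation"
).fetchone()[0]

print(per_day)
print(total_seconds)
```

With a TIMESTAMP column the same binning and arithmetic would depend on 
vendor-specific date functions, which is the portability problem the numeric 
representation avoids.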

>   The Reference Position
>   If it is relative (elapsed) time, the time zero point

If by this you mean something like exposure time, you just need a numeric 
value and have to know the units. The rest depends on the data model (see 
below). You do not need to know the zero point to express an amount of time. 
If you want to express a time interval, that could be done with a start,end 
or a start,duration -- in which case you have the zero-point as fully 
specified as you can (given the other points). 

(Note: I was not successful in getting an interval type into ADQL; that means 
TAP services would have to expose separate columns for start and end or start 
and duration and the user would have to use the two together).
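
To illustrate that start,end and start,duration carry the same information, 
here is a small sketch. The Interval class and its field names are 
hypothetical; it assumes times in MJD (days) and durations in seconds.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400.0


@dataclass(frozen=True)
class Interval:
    """A time interval held as two plain numbers, as two columns would be."""
    start: float  # MJD, in days
    end: float    # MJD, in days

    @classmethod
    def from_duration(cls, start: float, duration_s: float) -> "Interval":
        # start,duration form: the end is derived from the start.
        return cls(start, start + duration_s / SECONDS_PER_DAY)

    @property
    def duration_s(self) -> float:
        return (self.end - self.start) * SECONDS_PER_DAY

    def overlaps(self, other: "Interval") -> bool:
        # Same test a user would write against two columns, e.g.
        #   WHERE t_start <= :query_end AND t_end >= :query_start
        return self.start <= other.end and other.start <= self.end


exposure = Interval.from_duration(54914.0, 43200.0)  # half a day
query = Interval(54914.25, 54914.75)
print(exposure.end, exposure.duration_s, exposure.overlaps(query))
```

Either pair of columns lets the user reconstruct the other, so the choice is 
mostly about which queries should be easy to write.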
 
> How is it represented?
>   ISO-8601 (with the CCYY-MM-DD[Thh:mm:ss[.s...]] restriction)

I am assuming from context that you mean this in the sense of how are values 
exchanged between client and service. This is important when you go to 
serialise a value, presumably to give it to someone else (eg some other piece 
of software). I would argue that for timestamps you have to include the 
timezone in that ISO-8601 variant above, eg: CCYY-MM-DDThh:mm:ss.sZ in order 
to carry all the necessary information. Otherwise, the recipient has to 
assume the timezone in order to parse into the numeric date value (that most 
software actually uses). Most software libraries will happily parse and 
assume "local" timezone, which in the VO will mostly be wrong :-)
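
A small sketch of the difference, using only the Python standard library 
(Python 3.7 or later, where %z accepts a trailing "Z"). The helper names 
parse_wire and to_wire are made up for this example, and the format string 
assumes a fractional-seconds part is always present.

```python
from datetime import datetime, timedelta, timezone

# CCYY-MM-DDThh:mm:ss.sZ, with a mandatory timezone designator.
WIRE_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def parse_wire(text: str) -> datetime:
    """Parse a timestamp that carries its own timezone designator."""
    return datetime.strptime(text, WIRE_FORMAT)


def to_wire(t: datetime) -> str:
    """Serialise as UTC with a trailing Z and millisecond precision."""
    if t.tzinfo is None:
        # A naive value would silently be treated as local time.
        raise ValueError("refusing to serialise a timestamp without a timezone")
    utc = t.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


# Without the designator the parsed value is "naive": the recipient must guess.
naive = datetime.strptime("2009-03-24T17:02:59.0", "%Y-%m-%dT%H:%M:%S.%f")

# With the designator the value is unambiguous.
aware = parse_wire("2009-03-24T17:02:59.0Z")

# The same wall-clock reading at UTC-7 is a different instant.
pdt = datetime(2009, 3, 24, 17, 2, 59, tzinfo=timezone(timedelta(hours=-7)))

print(naive.tzinfo, aware.utcoffset())
print(to_wire(aware))
print(to_wire(pdt))
```

The naive value only becomes a number once some timezone is assumed, and that 
assumption is exactly what the recipient should not have to make.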

>   JD
>   MJD

Since they are numbers (probably double) these are expressed with the usual 
Arabic numerals. That does mean that they are not as self-contained as an 
ISO-8601 format w/ timezone as above. I do not know where one would say a 
column is JD vs MJD... hopefully someplace more machine-usable than the 
documentation or comments :)
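
For reference, the two differ by a fixed offset (MJD = JD - 2400000.5), so a 
client that is told which one a column holds can normalise trivially. In the 
sketch below the "representation" label is a hypothetical piece of column 
metadata, not something TAP defines. The conversion to a calendar date is a 
naive day count that assumes 86400-second days; it ignores leap seconds and 
says nothing about the time scale (TT, UTC, ...).

```python
from datetime import datetime, timedelta, timezone

JD_MINUS_MJD = 2400000.5
# MJD 0.0 corresponds to 1858-11-17T00:00:00.
MJD_EPOCH = datetime(1858, 11, 17, tzinfo=timezone.utc)


def to_mjd(value: float, representation: str) -> float:
    """Normalise a numeric time to MJD given a (hypothetical) column label."""
    if representation == "MJD":
        return value
    if representation == "JD":
        return value - JD_MINUS_MJD
    raise ValueError(f"unknown time representation: {representation!r}")


def mjd_to_datetime(mjd: float) -> datetime:
    """Naive day-count conversion; not leap-second or time-scale aware."""
    return MJD_EPOCH + timedelta(days=mjd)


mjd = to_mjd(2451545.0, "JD")
print(mjd, mjd_to_datetime(mjd).isoformat())
```

Without the label, the bare number 2451545.0 could be read either way, which 
is why the tag needs to live somewhere software can find it.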

> Where does it fit into the information object?
>   E.g., the time a photon was received
>   or the time the record was recorded
>   or the time this particular file was written

These do not have anything to do with TAP per se. The TAP metadata and the 
VOTable output format allow for utypes to be attached to columns and if there 
is a data model then that mechanism could be used... but there need not be a 
data model at all, in which case users just have to "know" (eg learn 
out-of-band) what the content means. That is the necessary nature of a 
low-level protocol, IMO.

I hope this helps clarify; it is long enough that it may well not :-(

-- 

Patrick Dowler
Tel/Tél: (250) 363-0044
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2M7

Centre canadien de donnees astronomiques
Conseil national de recherches Canada
5071, chemin West Saanich
Victoria (C.-B.) V9E 2M7


