[TAP] data type for column metadata

Arnold Rots arots at head.cfa.harvard.edu
Wed Mar 25 06:45:00 PDT 2009


Patrick Dowler wrote:
[ Charset ISO-8859-1 unsupported, converting... ]
> 
> I will take a stab at these as it is an extensive list of the issues involved.
> 
> This is more "how to deal with these in an RDBMS" rather than "the right way 
> to deal with them", but I think that is the issue here :-) 

Pat,
My concern is that the present discussion has jummped to "what is the
right way" to deal with numbers without first asking what it is that
we are dealing with.

> 
> On 2009-03-24 13:27:31 Arnold Rots wrote:
> > For a Timestamp:
> > What is the data type and precision?
> > ? Could be datetime
> > ? Or could be a floating point number
> 
> True. We use timestamp (aka datetime) for lastModified values and release 
> dates; we use double for observation start and end times. But see below for 
> the rest of the story.
> 
> So from another post, the datatype would be TIMESTAMP or DOUBLE.

OK

> 
> > What kind of parameter does it represent?
> > ? OK, that would be time instant, if we are talking about timestamps
> 
> Yes. In response to Gerard's list of data types in SQL, I mentioned that I 
> have never found a good use for either DATE or TIME types - only TIMESTAMP. 
> Maybe other people have a different experience?

I'm not talking about a data type, here, but about a concept: it is a
generic time stamp, not a spatial coordinate, not a duration, etc.

> 
> > If it is a coordinate value, what coordinate system does it refer to?
> > ? The Time Scale (TT, UTC, TAI, GPS, TCG, TDB, TCB, ...)
> 
> It turns out that RDBMSs vary in their treatment of time zone. In practice we 
> use either local time (for lastModified timestamps), UTC (for data release 
> dates), and MJD (for observation start and end times). For the timestamp 
> values, the application has to "know" the timezone in order to extract the 
> value from the DB correctly. Even for the MJD values, we just read the double 
> from the DB but the application still "knows" it is an MJD at some level. 
> 
> Now, we chose MJD for "astronomical times" because it is much easier to 
> compute things (histograms, statictics, etc) when the number is directly 
> accessible to SQL.

Sorry, here you fall off the cliff: MJD DOES NOT IMPLY A TIME SCALE!
For "astronomical" times we need to know Time Scale (see list above),
Reference Position (see below), and possibly the zero point (see below).
Somehow that information must be included or implied in the quesry and
somehow the service must know what the Time Reference Frame is that
its "astronomical" times are tied to.
All other time stamps, such as modification times, should ideally be
in UTC on the surface of the earth: i.e., convert your timezone stamps
to UTC (which is trivial if you know your timezone, of course); this
would prevent a lot of confusion, especially in the worldwide setting
that is the VO.

> 
> > ? The Reference Position
> > ? If it is relative (elapsed) time, the time zero point
> 
> If by this you mean something like exposure time, you just need a numeric 
> value and have to know the units. The rest depends on the data model (see 
> below). You do not need to know the zero point to express an amount of time. 
> If you want to express a time interval, that could be done with a start,end 
> or a start,duration -- in which case you have the zero-point as fully 
> specified as you can (given the other points). 

No - in many time series datasets time stamps are provided as time
eleapsed since a zero point, or a reference time, in MJD, for instance
(the MJDREF keyword in many FITS files). But you still need to know
the other three elements of the time refernce frame.

> 
> (Note: I was not successful in getting an interval type into ADQL; that means 
> TAP services would have to expose separate columns for start and end or start 
> and duration and the user would have to use the two together).

Yes, this is a differnt issue.

>  
> > How is it represented?
> > ? ISO-8601 (with the CCYY-MM-DD[Thh:mm:ss[.s...]] restriction)
> 
> I am assuming from context that you mean this in the sense of how are values 
> exchanged between client and service. This is important when you go to 
> serialise a value, presumably to give it to someone else (eg some other piece 
> of software). I would argue that for timestamps you have to include the 
> timezone in that ISO-8601 variant above, eg: CCYY-MM-DDThh:mm:ss.sZ in order 
> to carry all the necessary information. Otherwise, the recipient has to 
> assume the timezone in order to parse into the numeric date value (that most 
> software actually uses). Most software libraries will happily parse and 
> assume "local" timezone, which in the VO will mostly be wrong :-)

Please do not use ISO-8601 with timezone indicators - just the
restricted form I gave above. Only use UTC - Timezone 0 - conforming
with current astronomical usage.

> 
> > ? JD
> > ? MJD
> 
> Since they are numbers (probably double) these are expressed with the usual 
> arabic symbols. That does mean that it is not so self-contained as an 
> ISO-8601 format w/ timezone as above. I do not know where one would say a 
> column is JD vs MJD... hopefully someplace more machine-usable than the 
> documentation or comments :)

Well, that is the issue: you need to know what you are talking about,
in the queries and the results, in order to figure out what it means.

> 
> > Where does it fit into the information object?
> > ? E.g., the time a photon was received
> > ? or the time the record was recorded
> > ? or the time this particualr file was written
> 
> These do not have anything to do with TAP per se. The TAP metadata and the 
> VOTable output format allow for utypes to be attached to columns and if there 
> is a data model then that mechanism could be used... but there need not be a 
> data model at all, in which case users just have to "know" (eg learn 
> out-of-band) what the content means. That is the necessary nature of a 
> low-level protocol, IMO.

The point is that Utypes will not tell you this kind of information:
you can query a database, get times (or spatial coordinates) back,
with or without utypes and still have no idea what they refer to.
This transcends data models and it's not clear that there is a
mechanism to learn that type of information...

> 
> I hope this helps clarify; it is long enough that it may well not :-(

I'll say the same ;-)

  - Arnold

> 
> -- 
> 
> Patrick Dowler
> Tel/T?l: (250) 363-0044
> Canadian Astronomy Data Centre
> National Research Council Canada
> 5071 West Saanich Road
> Victoria, BC V9E 2M7
> 
> Centre canadien de donnees astronomiques
> Conseil national de recherches Canada
> 5071, chemin West Saanich
> Victoria (C.-B.) V9E 2M7
> 
--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 67                              fax:  +1 617 495 7356
Cambridge, MA 02138                             arots at head.cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------



More information about the dal mailing list