[TAP] data type for column metadata

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Wed Mar 25 10:23:04 PDT 2009


On 2009-03-25 08:13:26 Markus Demleitner wrote:
> The discussion started as "How do we say, in TAP metadata, how a
> given quantity is stored in a database".  Since the database is being
> queried through ADQL, and column types restrict what queries are
> syntactically legal, the question probably should be phrased: "How
> does a certain column look to ADQL?".
>
> A second point was "How is a value for that column transferred?"
> (i.e., do we want a timestamp type or geometry types in VOTable?).
> This is a different, and at least equally involved question.

Agreed - this should be decoupled from the "query writing metadata".

> Every nontrivial ADQL query will have lots of pitfalls, and the
> resulting tables will require lots of interpretation by both the
> client and the human using it.  I'm afraid that's what science is
> about, and trying to push much of it into protocols will turn their
> descriptions into physics textbooks.  This may be a good thing, but
> will take ages.

Agree 100%. 

> Well, but for both TAP and VOTable data types only the fact that
> *some* time is being stored here is relevant.  What time scale it is
> in, whether it's the time some GRB could be detected on Antares or
> the start of an exposure has nothing to do with data or legal queries
> but is entirely an issue of metadata and interpretation.
>
> I'm not saying it's unimportant, I'm just saying that it doesn't
> matter for purposes of storing or transferring data.  I would also
> maintain that it doesn't matter for writing queries, since, as
> pointed out above, I have strong doubts that we want to build so
> much physics into your protocol that queries "violating physics"
> become some kind of error.

Also agree 100%. There are many many ways to write ADQL (SQL) queries that 
generate nonsense results. Only the query writer can do anything about that.


> [My take: *If* we want to expose metadata this detailed up front, why not
> use the proven techique of DAL v1 and define some way to retrieve
> an "empty" VOTable for a given query?  There are pitfalls in this as
> well, and I'd rather see an implementation of this before it ends up
> in a standard, but I think that would be the least intrusive way]

I recall that this option is available by setting MAXREC=0 when doing the 
query. However, Gerard has argued in the past that populating the result 
VOTable metadata from the query can be simple (if the select contains column 
names) and very complex if it contains expressions.

> > Please do not use ISO-8601 with timezone indicators - just the
> > restricted form I gave above. Only use UTC - Timezone 0 - conforming
> > with current astronomical usage.
>
> Full support here.  *Don't* mess with time zones, DST, and all those
> other horrors.  The laws of physics are complex enough, we don't want
> to have to account for the laws of 200+ governments on this planet.

The reason I mentioned timezone is that without it people have to know/assume 
the timezone is to correctly parse the string into a number, which is how 
software actually works. Now, we can just say "all timestamps are in UTC" but 
that is something we have to say and there will be existing databases out 
there with timestamps in local time. 

I am OK with requiring UTC and avoiding time zones (which are a pain), but the 
implication of that is that TAP services with non-UTC timestamps must:

* be able to parse the query and transform all date constants from UTC to the 
timezone they use
* convert selected values from local -> UTC in the output

I have argued elsewhere that I think TAP services will need to parse the query 
anyway due to their own DB behaving differently than the ADQL+TAP behaviour, 
so I don't personally think this is a major impediment. It is the implication 
though.

Note: It is quite possible that if we did the opposite that services would 
still have to parse and deal with the serialised timestamp outside the DB 
anyway (say their DB doesn't like the format with timezone). 


> So -- I'll support basically any choice of SQL types being part of
> TAP metadata; I think we definitely should have timestamps in there
> (though of course, people are free to store times in floats, integers
> or varchars if that's appropriate; it's just that there is one
> standard way to store and transfer timestamps, and everything else
> has to be manually supported).

I take this as a proposal for one allowed iso8601 timestamp variant (mentioned 
earlier, without timezone) and a requirement that all timestamps are in UTC. 
Correct? 

Comments? Votes?

> A quick inspection of the ADQL
> grammar seems to suggest that it's POINT and REGION indeed:
>
> <coord_value> ::= <point> | <column_reference>
>
> -- which singles out POINT from other geometry value expressions.
>
> This reasoning would break if we anticipated the need to define
> functions taking only certain types of geometries.  Do we?

I don't anticipate more functions that take specific geometry types as 
arguments. I think one can look at this as there only being just the two 
types and the other functions (CIRCLE, BOX, POLYGON) as being constructors 
that also create a REGION. In OO one might have REGION as the base class and 
these constructors imply the presence of subclasses.... but SQL is not OO so 
I don't think one is forced to interpret it that way. 


-- 

Patrick Dowler
Tel/Tél: (250) 363-0044
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2M7

Centre canadien de donnees astronomiques
Conseil national de recherches Canada
5071, chemin West Saanich
Victoria (C.-B.) V9E 2M7



More information about the dal mailing list