[TAP] Summary: data type for column metadata

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Fri Apr 17 10:28:09 PDT 2009


On Friday 17 April 2009 08:15:14 Gerard wrote:
> Hi Pat
> A somewhat long comment again, just before the weekend.
>
> >> I like this list.  I think we also need to include a recommendation
> >> (at
> >> least) for how these should be mapped to VOTable types.  This is not
> >> only for consistency in TAP responses, but also for describing a table
> >> in the registry outside the context of TAP (e.g. describing the table
> >> returned from an SIA query).  Does this seem reasonable?
> >>     ADQL type      VOTable
> >>     BOOLEAN        boolean
> >>     SMALLINT       short
> >>     REAL           float
> >>     DOUBLE         double
> >>     TIMESTAMP      char arraysize="*", (format?)
> >>                     (or is it numeric?)
>
> I personally think a mapping fro ADQL->VOTable should be mandated. But it
> has to go both ways doesn't it?
> If I upload a VOTable into TAP_UPLOAD and write a query against it, I must
> be allowed some expectation on the
> ADQL datatypes of the columns in the table that was uploaded.
> In particular I wonder whether we can insist that the following two
> equivalent actions MUST have the same result:

Yes, the is correct. The caller has to know/predict the datatypes of columns 
in an uploaded VOTable or they will not be able to write an syntactically 
correct query. If the service is allowed to arbitrarily map types, then we 
would be limited to:

- service has to expose metadata for tables in TAP_UPLOAD schema
- callers could not combine table upload and synchronous queries: the table 
and query are all provided in a single (http) request so there is no 
opportunity to check the actual metadata

For the same reason, the mapping of column name <-> FIELD name attribute has 
to also be bi-directional.... (see below)

> The problem is that the above list is not invertbile. And it seems that
> currently VOTable does not give an option to make this so.
> I guess that unless VOTable's datatype systems is expanded, something like
> Mark's proposal SHOULD be available.

Yes, one needs some way to map 
TIMESTAMP <-> char, *, tap:ISO8601
REGION <-> char, *, STC-S
etc.

> > This does mean that TAP should specify how the column names
> > in TAP_SCHEMA (and the query) relate to the FIELDs in the
> > VOTable? Specifically, which attribute of FIELD holds the
> > (fully-qualified) column name? Otherwise, people could not
> > put the two together and understand the VOTable fully.
>
> I think it is a natural expectation that when a name can be identified for
> a column in a query result that that name is used as the name attribute of
> a FIELD.
> when using "select * from sometable" these have to be inferred, when using
> "select a as b, c as d" I expect b and d as names, not a and c.
> This seems to me pretty straightforward. I can see problems to arise
> because it is legal in ADQL (isn't it?) to pose (example from Markus) :
>
> select a+b
>   from sometable
>
> I.e. no column name can be inferred.
>
> It is also potentially difficult for clients to deal with legal ADQL
> queries that return multiple columns with the same name.
> At least TOPCAT's plugin accessing GAVO's Millennium site returns only a
> single set of columns from the Snapshots table when running the
> following SQL:
> select * from snapshots s1, snapshots s2
> where s1.snapnum=s2.snapnum

I think in both of these cases, all we can do is recommend that query writers 
provide names via AS, eg

select a+b AS a_plus_b
from sometable

select s1.foo as s1_foo, s1.bar as s1_bar, s2.foo as s2_foo, s2.bar as s2_bar
from snapshots s1, snapshots s2
where s1.snapnum=s2.snapnum

so that the result set has legal and distinct column names that the user will 
understand. Furthermore, if the user does this, the resulting VOTable will 
have FIELDs with distinct name attributes that are legal ADQL column names, 
which means the table could in principle be uploaded to a (different) TAP 
service without modification. Well, there could be other gotchas but this is a 
hard technical requirement for upload already.

* PROPOSAL*

A specific mapping of ADQL <-> VOTable datatypes (+formats or something, TBD) 
will be mandated. Further, we also mandate the mapping of column names <-> 
VOTable FIELD names (plain column name, not qualified with table/alias). 
Anything less will make synchronous queries w/ upload basically impossible.

The user has the capability, with ADQL at least, to control the naming in the 
output table. If they do not do that (do not use AS) then the service must 
still produce a table with distinct FIELD name(s), but they can be named 
arbitrarily. 

note: For other query languages, if they provide a join and/or expression 
capability, they would probably also have to provide such a naming mechanism 
to avoid ambiguities and/or illegal column names.


-- 

Patrick Dowler
Tel/Tél: (250) 363-0044
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2M7

Centre canadien de donnees astronomiques
Conseil national de recherches Canada
5071, chemin West Saanich
Victoria (C.-B.) V9E 2M7



More information about the dal mailing list