[TAP] Summary: data type for column metadata
Patrick Dowler
patrick.dowler at nrc-cnrc.gc.ca
Fri Apr 17 10:28:09 PDT 2009
On Friday 17 April 2009 08:15:14 Gerard wrote:
> Hi Pat
> A somewhat long comment again, just before the weekend.
>
> >> I like this list. I think we also need to include a recommendation
> >> (at
> >> least) for how these should be mapped to VOTable types. This is not
> >> only for consistency in TAP responses, but also for describing a table
> >> in the registry outside the context of TAP (e.g. describing the table
> >> returned from an SIA query). Does this seem reasonable?
> >> ADQL type VOTable
> >> BOOLEAN boolean
> >> SMALLINT short
> >> REAL float
> >> DOUBLE double
> >> TIMESTAMP char arraysize="*", (format?)
> >> (or is it numeric?)
>
> I personally think a mapping fro ADQL->VOTable should be mandated. But it
> has to go both ways doesn't it?
> If I upload a VOTable into TAP_UPLOAD and write a query against it, I must
> be allowed some expectation on the
> ADQL datatypes of the columns in the table that was uploaded.
> In particular I wonder whether we can insist that the following two
> equivalent actions MUST have the same result:
Yes, the is correct. The caller has to know/predict the datatypes of columns
in an uploaded VOTable or they will not be able to write an syntactically
correct query. If the service is allowed to arbitrarily map types, then we
would be limited to:
- service has to expose metadata for tables in TAP_UPLOAD schema
- callers could not combine table upload and synchronous queries: the table
and query are all provided in a single (http) request so there is no
opportunity to check the actual metadata
For the same reason, the mapping of column name <-> FIELD name attribute has
to also be bi-directional.... (see below)
> The problem is that the above list is not invertbile. And it seems that
> currently VOTable does not give an option to make this so.
> I guess that unless VOTable's datatype systems is expanded, something like
> Mark's proposal SHOULD be available.
Yes, one needs some way to map
TIMESTAMP <-> char, *, tap:ISO8601
REGION <-> char, *, STC-S
etc.
> > This does mean that TAP should specify how the column names
> > in TAP_SCHEMA (and the query) relate to the FIELDs in the
> > VOTable? Specifically, which attribute of FIELD holds the
> > (fully-qualified) column name? Otherwise, people could not
> > put the two together and understand the VOTable fully.
>
> I think it is a natural expectation that when a name can be identified for
> a column in a query result that that name is used as the name attribute of
> a FIELD.
> when using "select * from sometable" these have to be inferred, when using
> "select a as b, c as d" I expect b and d as names, not a and c.
> This seems to me pretty straightforward. I can see problems to arise
> because it is legal in ADQL (isn't it?) to pose (example from Markus) :
>
> select a+b
> from sometable
>
> I.e. no column name can be inferred.
>
> It is also potentially difficult for clients to deal with legal ADQL
> queries that return multiple columns with the same name.
> At least TOPCAT's plugin accessing GAVO's Millennium site returns only a
> single set of columns from the Snapshots table when running the
> following SQL:
> select * from snapshots s1, snapshots s2
> where s1.snapnum=s2.snapnum
I think in both of these cases, all we can do is recommend that query writers
provide names via AS, eg
select a+b AS a_plus_b
from sometable
select s1.foo as s1_foo, s1.bar as s1_bar, s2.foo as s2_foo, s2.bar as s2_bar
from snapshots s1, snapshots s2
where s1.snapnum=s2.snapnum
so that the result set has legal and distinct column names that the user will
understand. Furthermore, if the user does this, the resulting VOTable will
have FIELDs with distinct name attributes that are legal ADQL column names,
which means the table could in principle be uploaded to a (different) TAP
service without modification. Well, there could be other gotchas but this is a
hard technical requirement for upload already.
* PROPOSAL*
A specific mapping of ADQL <-> VOTable datatypes (+formats or something, TBD)
will be mandated. Further, we also mandate the mapping of column names <->
VOTable FIELD names (plain column name, not qualified with table/alias).
Anything less will make synchronous queries w/ upload basically impossible.
The user has the capability, with ADQL at least, to control the naming in the
output table. If they do not do that (do not use AS) then the service must
still produce a table with distinct FIELD name(s), but they can be named
arbitrarily.
note: For other query languages, if they provide a join and/or expression
capability, they would probably also have to provide such a naming mechanism
to avoid ambiguities and/or illegal column names.
--
Patrick Dowler
Tel/Tél: (250) 363-0044
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2M7
Centre canadien de donnees astronomiques
Conseil national de recherches Canada
5071, chemin West Saanich
Victoria (C.-B.) V9E 2M7
More information about the dal
mailing list