table_name syntax
Dave Morris
dave.morris at metagrid.co.uk
Tue Apr 28 04:23:02 CEST 2015
On 2015-04-27 15:00, Mark Taylor wrote:
> Hi DAL,
>
> I have an arcane query about how to represent and interpret TAP
> service table names that appear in
> (a) the table_name column of the TAP_SCHEMA.tables table
> (b) tableset/schema/table/name elements in a TableSet document
>
Short answer is yes. The server has to add quotes to the catalog,
schema, table and column names.
So this is correct.
+-------------+---------------------+
| schema_name | table_name |
+-------------+---------------------+
| viz4 | "B/avo.rad/catalog" |
| viz4 | "B/avo.rad/wsrt" |
| viz4 | "B/bax/bax" |
+-------------+---------------------+
Longer answer ... the wording of the current specification means that
all of these are valid entries for the same table.
+-------------+-----------------------------+
| schema_name | table_name |
+-------------+-----------------------------+
| viz4 | "B/avo.rad/catalog" |
| viz4 | viz4."B/avo.rad/catalog" |
| viz4 | "viz4"."B/avo.rad/catalog" |
| "viz4" | "B/avo.rad/catalog" |
| "viz4" | viz4."B/avo.rad/catalog" |
| "viz4" | "viz4"."B/avo.rad/catalog" |
+-------------+-----------------------------+
With a regular table name, you have even more variants, both parts may,
or may not, be quoted and the table_name may, or may not, be qualified.
+-------------+-----------------+
| schema_name | table_name |
+-------------+-----------------+
| twomass | psc |
| twomass | "psc" |
| twomass | twomass.psc |
| twomass | "twomass"."psc" |
| twomass | "twomass".psc |
| twomass | twomass."psc" |
| "twomass" | psc |
| "twomass" | "psc" |
| "twomass" | twomass.psc |
| "twomass" | "twomass"."psc" |
| "twomass" | "twomass".psc |
| "twomass" | twomass."psc" |
+-------------+-----------------+
Which makes it hard to search for table names that start with a specific
pattern,
SELECT .. WHERE schema_name = 'twomass' AND table_name LIKE 'ps%'
would only match one of the possible combinations.
-------- --------
Also, the current ADQL specification explicitly includes a syntax for
including double quotes inside a quoted identifier.
So it would be valid to have a schema name with double quotes in it,
viz"4", and a table name with double quotes in it, "B"/avo.rad/catalog.
Resulting in either of these combinations
+-------------+------------------------------------+
| schema_name | table_name |
+-------------+------------------------------------+
| "viz""4""" | """B""/avo.rad/catalog" |
| "viz""4""" | "viz""4"""."""B""/avo.rad/catalog" |
+-------------+------------------------------------+
which makes it hard to do reliable string matches without drowning in
quotes.
-------- --------
In the current ADQL specification the definition of an identifier is
<identifier> ::= <regular_identifier> | <delimited_identifier>
Where a regular_identifier is explicitly restricted to [0-9a-zA-Z_], but
a delimited_identifier is defined as
<delimited_identifier> ::=
<double_quote> <delimited_identifier_body> <double_quote>
<delimited_identifier_body> ::= <delimited_identifier_part>...
<delimited_identifier_part> ::=
<nondoublequote_character> | <double_quote_symbol>
The escaped double quote is defined as
<double_quote_symbol> ::= <double_quote><double_quote>
However, the definitions for nondoublequote_character and
nonquote_character are both blank
<nondoublequote_character> ::=
<nonquote_character> ::=
So apart from explicitly allowing double quotes, the specification
doesn't actually define the list of allowed characters for a delimited
identifier.
** This is a bug and needs fixing in the next version anyway **
-------- --------
I would like to suggest we take this opportunity to define a minimal
list of allowed characters for a delimited identifier, without single or
double quotes, percent, spaces, tabs or anything else that might cause
problems in SELECT statements.
As far as I know, at the moment the Vizier tables are the only service
using anything outside the range of [a-zA-Z0-9_]. So we could start with
the minimal set of characters that would enable the Vizier tables to
keep their current names and we explicitly exclude everything else until
someone raises a specific case for another character to be included.
This would allow Vizier tables to keep their current names and should
not affect anyone else. However, it does limit the problem to as small a
set of additional characters as possible and may help to prevent further
problems before they arise.
-------- --------
I would also like to request we add two new columns to TAP_SCHEMA,
'raw_schema_name' and 'raw_table_name', which always contain the raw,
unqualified, unquoted, schema and table names.
+-------------+-----------------------------+--------------------+
| schema_name | table_name | raw_schema_name |
+-------------+-----------------------------+--------------------+
| viz4 | "B/avo.rad/catalog" | viz4 |
| viz4 | viz4."B/avo.rad/catalog" | viz4 |
| viz4 | "viz4"."B/avo.rad/catalog" | viz4 |
| "viz4" | "B/avo.rad/catalog" | viz4 |
| "viz4" | viz4."B/avo.rad/catalog" | viz4 |
| "viz4" | "viz4"."B/avo.rad/catalog" | viz4 |
+-------------+-----------------------------+--------------------+
+-------------+-----------------------------+--------------------+
| schema_name | table_name | raw_table_name |
+-------------+-----------------------------+--------------------+
| viz4 | "B/avo.rad/catalog" | B/avo.rad/catalog |
| viz4 | viz4."B/avo.rad/catalog" | B/avo.rad/catalog |
| viz4 | "viz4"."B/avo.rad/catalog" | B/avo.rad/catalog |
| "viz4" | "B/avo.rad/catalog" | B/avo.rad/catalog |
| "viz4" | viz4."B/avo.rad/catalog" | B/avo.rad/catalog |
| "viz4" | "viz4"."B/avo.rad/catalog" | B/avo.rad/catalog |
+-------------+-----------------------------+--------------------+
A simple client may use the existing table_name, as recommended by the
server, and not have to worry about adding quotes.
A more technical client may use raw_schema name and raw_table_name to
get the raw, unqualified and unquoted, names.
This does mean the technical client has to work out whether it needs to
use quotes around the raw names or not. But it turns out the logic to
decide if you need to add quotes to the raw names is a lot simpler than
trying to deduce the original unqualified, unquoted, names from the
possible combinations of schema_name and table_name.
It would also make it easier to search for a table name that starts with
a prefix
SELECT .. WHERE raw_schema_name = 'viz4' AND raw_table_name LIKE
'B/avo%'
-------- --------
Note - the wording of the current TAP specification for
TAP_SCHEMA.schemas, TAP_SCHEMA.tables, and TAP_SCHEMA.columns says 'must
include the following columns'. It does not exclude the tables from
having additional columns. So adding new columns to the new version of
the specification will not break existing implementations.
Dave
ps.
Technically, the schema_name may (or may not) be qualified by the quoted
(or not) catalog name, adding even more combinations and making an
already long email totally unreadable.
Which means we should also add a 'raw_catalog_name' column (which may be
null), and explicitly state that raw_schema_name is both unquoted and
unqualified.
pps.
Does anyone actually use the catalog part of the
'[[catalog.]schema.]table' fully qualified name ?
More information about the dal
mailing list