table_name syntax

Dave Morris dave.morris at metagrid.co.uk
Tue Apr 28 04:23:02 CEST 2015


On 2015-04-27 15:00, Mark Taylor wrote:
> Hi DAL,
> 
> I have an arcane query about how to represent and interpret TAP
> service table names that appear in
>    (a) the table_name column of the TAP_SCHEMA.tables table
>    (b) tableset/schema/table/name elements in a TableSet document
> 

Short answer is yes. The server has to add quotes to the catalog, 
schema, table and column names.

So this is correct.

    +-------------+---------------------+
    | schema_name | table_name          |
    +-------------+---------------------+
    | viz4        | "B/avo.rad/catalog" |
    | viz4        | "B/avo.rad/wsrt"    |
    | viz4        | "B/bax/bax"         |
    +-------------+---------------------+

Longer answer ... the wording of the current specification means that 
all of these are valid entries for the same table.

    +-------------+-----------------------------+
    | schema_name | table_name                  |
    +-------------+-----------------------------+
    | viz4        | "B/avo.rad/catalog"         |
    | viz4        | viz4."B/avo.rad/catalog"    |
    | viz4        | "viz4"."B/avo.rad/catalog"  |
    | "viz4"      | "B/avo.rad/catalog"         |
    | "viz4"      | viz4."B/avo.rad/catalog"    |
    | "viz4"      | "viz4"."B/avo.rad/catalog"  |
    +-------------+-----------------------------+

With a regular table name, you have even more variants, both parts may, 
or may not, be quoted and the table_name may, or may not, be qualified.

    +-------------+-----------------+
    | schema_name | table_name      |
    +-------------+-----------------+
    | twomass     | psc             |
    | twomass     | "psc"           |
    | twomass     | twomass.psc     |
    | twomass     | "twomass"."psc" |
    | twomass     | "twomass".psc   |
    | twomass     | twomass."psc"   |
    | "twomass"   | psc             |
    | "twomass"   | "psc"           |
    | "twomass"   | twomass.psc     |
    | "twomass"   | "twomass"."psc" |
    | "twomass"   | "twomass".psc   |
    | "twomass"   | twomass."psc"   |
    +-------------+-----------------+

Which makes it hard to search for table names that start with a specific 
pattern,

     SELECT .. WHERE schema_name = 'twomass' AND table_name LIKE 'ps%'

would only match one of the possible combinations.

-------- --------

Also, the current ADQL specification explicitly includes a syntax for 
including double quotes inside a quoted identifier.

So it would be valid to have a schema name with double quotes in it, 
viz"4", and a table name with double quotes in it, "B"/avo.rad/catalog.

Resulting in either of these combinations

    +-------------+------------------------------------+
    | schema_name | table_name                         |
    +-------------+------------------------------------+
    | "viz""4"""  | """B""/avo.rad/catalog"            |
    | "viz""4"""  | "viz""4"""."""B""/avo.rad/catalog" |
    +-------------+------------------------------------+

which makes it hard to do reliable string matches without drowning in 
quotes.

-------- --------

In the current ADQL specification the definition of an identifier is

     <identifier> ::= <regular_identifier> | <delimited_identifier>

Where a regular_identifier is explicitly restricted to [0-9a-zA-Z_], but 
a delimited_identifier is defined as

     <delimited_identifier> ::=
         <double_quote> <delimited_identifier_body> <double_quote>

     <delimited_identifier_body> ::= <delimited_identifier_part>...

     <delimited_identifier_part> ::=
         <nondoublequote_character> | <double_quote_symbol>

The escaped double quote is defined as

     <double_quote_symbol> ::= <double_quote><double_quote>

However, the definitions for nondoublequote_character and 
nonquote_character are both blank

     <nondoublequote_character> ::=
     <nonquote_character> ::=

So apart from explicitly allowing double quotes, the specification 
doesn't actually define the list of allowed characters for a delimited 
identifier.

** This is a bug and needs fixing in the next version anyway **

-------- --------

I would like to suggest we take this opportunity to define a minimal 
list of allowed characters for a delimited identifier, without single or 
double quotes, percent, spaces, tabs or anything else that might cause 
problems in SELECT statements.

As far as I know, at the moment the Vizier tables are the only service 
using anything outside the range of [a-zA-Z0-9_]. So we could start with 
the minimal set of characters that would enable the Vizier tables to 
keep their current names and we explicitly exclude everything else until 
someone raises a specific case for another character to be included.

This would allow Vizier tables to keep their current names and should 
not affect anyone else. However, it does limit the problem to as small a 
set of additional characters as possible and may help to prevent further 
problems before they arise.

-------- --------

I would also like to request we add two new columns to TAP_SCHEMA, 
'raw_schema_name' and 'raw_table_name', which always contain the raw, 
unqualified, unquoted, schema and table names.

     +-------------+-----------------------------+--------------------+
     | schema_name | table_name                  | raw_schema_name    |
     +-------------+-----------------------------+--------------------+
     | viz4        | "B/avo.rad/catalog"         | viz4               |
     | viz4        | viz4."B/avo.rad/catalog"    | viz4               |
     | viz4        | "viz4"."B/avo.rad/catalog"  | viz4               |
     | "viz4"      | "B/avo.rad/catalog"         | viz4               |
     | "viz4"      | viz4."B/avo.rad/catalog"    | viz4               |
     | "viz4"      | "viz4"."B/avo.rad/catalog"  | viz4               |
     +-------------+-----------------------------+--------------------+

     +-------------+-----------------------------+--------------------+
     | schema_name | table_name                  | raw_table_name     |
     +-------------+-----------------------------+--------------------+
     | viz4        | "B/avo.rad/catalog"         | B/avo.rad/catalog  |
     | viz4        | viz4."B/avo.rad/catalog"    | B/avo.rad/catalog  |
     | viz4        | "viz4"."B/avo.rad/catalog"  | B/avo.rad/catalog  |
     | "viz4"      | "B/avo.rad/catalog"         | B/avo.rad/catalog  |
     | "viz4"      | viz4."B/avo.rad/catalog"    | B/avo.rad/catalog  |
     | "viz4"      | "viz4"."B/avo.rad/catalog"  | B/avo.rad/catalog  |
     +-------------+-----------------------------+--------------------+

A simple client may use the existing table_name, as recommended by the 
server, and not have to worry about adding quotes.

A more technical client may use raw_schema name and raw_table_name to 
get the raw, unqualified and unquoted, names.

This does mean the technical client has to work out whether it needs to 
use quotes around the raw names or not. But it turns out the logic to 
decide if you need to add quotes to the raw names is a lot simpler than 
trying to deduce the original unqualified, unquoted, names from the 
possible combinations of schema_name and table_name.

It would also make it easier to search for a table name that starts with 
a prefix

     SELECT .. WHERE raw_schema_name = 'viz4' AND raw_table_name LIKE 
'B/avo%'

-------- --------

Note - the wording of the current TAP specification for 
TAP_SCHEMA.schemas, TAP_SCHEMA.tables, and TAP_SCHEMA.columns says 'must 
include the following columns'. It does not exclude the tables from 
having additional columns. So adding new columns to the new version of 
the specification will not break existing implementations.


Dave

ps.
Technically, the schema_name may (or may not) be qualified by the quoted 
(or not) catalog name, adding even more combinations and making an 
already long email totally unreadable.
Which means we should also add a 'raw_catalog_name' column (which may be 
null), and explicitly state that raw_schema_name is both unquoted and 
unqualified.

pps.
Does anyone actually use the catalog part of the 
'[[catalog.]schema.]table' fully qualified name ?




More information about the dal mailing list