table_name syntax

Tue Apr 28 09:14:57 CEST 2015

Dear TAP exegets,

On Tue, Apr 28, 2015 at 03:23:02AM +0100, Dave Morris wrote:
> On 2015-04-27 15:00, Mark Taylor wrote:
> >I have an arcane query about how to represent and interpret TAP
> >service table names that appear in

I had hoped it would be arcane, and I had really, really hoped we
could do away with quoted identifiers as table names in later
versions of ADQL, but it seems we can't.  Well, most of
this also applies to column names, where we can't really do without
delimited identifiers even for TAP_SCHEMA itself.

Also note that delimited identifiers aren't just "there's quotes
around them", they have lots of interesting interactions.  Meaning:
They are a pain all around.  For your enjoyment, I'm including some
of the relevant language of SQL 1992 at the end of this message.  I
don't think I even want to see how this looks like in later editions.

> >   (a) the table_name column of the TAP_SCHEMA.tables table
> >   (b) tableset/schema/table/name elements in a TableSet document
>
> Short answer is yes. The server has to add quotes to the catalog, schema,
> table and column names.
> 
> So this is correct.
> 
>    +-------------+---------------------+
>    | schema_name | table_name          |
>    +-------------+---------------------+
>    | viz4        | "B/avo.rad/catalog" |
>    | viz4        | "B/avo.rad/wsrt"    |
>    | viz4        | "B/bax/bax"         |
>    +-------------+---------------------+

...provided the table names don't actually need to be qualified with
the schema in queries.

[.. Lots of permuatations of putting quotes in various places ...]

> Which makes it hard to search for table names that start with a specific
> pattern,
> 
>     SELECT .. WHERE schema_name = 'twomass' AND table_name LIKE 'ps%'
> 
> would only match one of the possible combinations.

True.  And IMHO a strong suggestions we should discourage the use of
delimited identifiers in general and in table names in particular.
And then essentially say: "If you absolutely must have them, this is
what to do".  Which would include: Always use regular identifiers in
lower case (i.e., in particular: no quotes in the literals) unless
you absolutely have no choice, in which chase put quotes around all
literals, including quoting inner quotes.

>     <nondoublequote_character> ::=
>     <nonquote_character> ::=
> 
> So apart from explicitly allowing double quotes, the specification doesn't
> actually define the list of allowed characters for a delimited identifier.

The ADQL authors clearly took this from SQL1992, where the rules are
written as

  <nondoublequote character> ::= !! See the Syntax Rules
  <nonquote character> ::= !! See the Syntax Rules.

> ** This is a bug and needs fixing in the next version anyway **

My suggestion is: copy the prose these rules refer to from SQL1992
and do away with extra provisions for crazy cases.

Which for me translates into: 

      A <nondoublequote_character> any <SQL language character> other
      than a <double quote>;

      A <nonquote_character> is any <SQL language character> other
      than a <quote>;

It seems the ADQL 2.0 BNF is missing the <sql_language_character>
rule, so that would have to be copied, too:

        <SQL language character> ::=
                <simple Latin letter>
              | <digit>
              | <SQL special character>

It would be great if people interested in delimited identifiers
could check if that's enough for what they need.

SQL 1992, on the other hand, in both cases additionally allows:

        b) Any character other than a <quote> in the character reper-
           toire identified by the <module character set specification>;
           or

        c) Any character other than a <quote> in the character reper-
           toire identified by the <character set specification> or
           implied by "N".

I really think we shouldn't go near these if we can avoid it.

> I would like to suggest we take this opportunity to define a minimal list of
> allowed characters for a delimited identifier, without single or double
> quotes, percent, spaces, tabs or anything else that might cause problems in
> SELECT statements.

I'd say let's not deviate further from sql1992 than we absolutely
have to to maintain our sanity [that comes from one who's just
planning to overthrow sql1992 set operators, but there you go].

Anyway: There's no hope to try and be robust against mixing delimited
and regular identifiers, so I'd say we just have to mandate that the
two be strictly told apart.

> I would also like to request we add two new columns to TAP_SCHEMA,
> 'raw_schema_name' and 'raw_table_name', which always contain the raw,
> unqualified, unquoted, schema and table names.

-1 from me.  I think it's fine if we mandate regular identifiers
without quotes and delimited identifiers with quotes.  People doing
discovery probably lavishly add wildcards anyway, and if we, as I
think we should, regard delimited identifiers as desperate emergency
measures, we should not complicate our data structures just to
accomodate them.

I'd like an amendment that "regular identifiers must be written
all-lowercase", though; that'll avoid some nasty surprises both in
query construction (where clients might add gratuitous quotes) and in
metadata discovery.

> Note - the wording of the current TAP specification for TAP_SCHEMA.schemas,
> TAP_SCHEMA.tables, and TAP_SCHEMA.columns says 'must include the following
> columns'. It does not exclude the tables from having additional columns. So
> adding new columns to the new version of the specification will not break
> existing implementations.

Right -- so, for VizieR it may make sense to add raw_* columns.  For
other services, clients couldn't count on them anyway, so I don't
think they'd be helped a lot by this addition.

I could see a regulation "If you use delimited identifiers a lot,
consider adding these columns to TAP_SCHEMA," so particularly
diligent clients would know where to look (TAP_SCHEMA is described in
TAP_SCHEMA...).

> Does anyone actually use the catalog part of the '[[catalog.]schema.]table'
> fully qualified name ?

Rephrasing this (and I'm really interested): Is there an ADQL/TAP
engine on top of SQLServer publicly available?  Does one exist at
all?

Cheers,

        Markus

As promised, here's a particularly lyrical excerpt from SQL1992
(sect. 5.2) regarding delimited identifiers (don't try to understand
it, just appreciate the flow of the language):

         13)A <regular identifier> and a <delimited identifier> are equiva-
            lent if the <identifier body> of the <regular identifier> (with
            every letter that is a lower-case letter replaced by the equiva-
            lent upper-case letter or letters) and the <delimited identifier
            body> of the <delimited identifier> (with all occurrences of
            <quote> replaced by <quote symbol> and all occurrences of <dou-
            blequote symbol> replaced by <double quote>), considered as
            the repetition of a <character string literal> that specifies a
            <character set specification> of SQL_TEXT and an implementation-
            defined collation that is sensitive to case, compare equally
            according to the comparison rules in Subclause 8.2, "<comparison
            predicate>".

         14)Two <delimited identifier>s are equivalent if their <delimited
            identifier body>s (with all occurrences of <quote> replaced
            by <quote symbol> and all occurrences of <doublequote symbol>
            replaced by <doublequote>), considered as the repetition of a
            <character string literal> that specifies a <character set spec-
            ification> of SQL_TEXT and an implementation-defined collation
            that is sensitive to case, compare equally according to the
            comparison rules in Subclause 8.2, "<comparison predicate>".