TAP, ADQL and field names inconsistencies between providers.

Wed Dec 9 09:18:44 CET 2020

Hi DAL,

On Tue, Dec 08, 2020 at 08:19:01PM +0000, Monkewitz, Serge wrote:
> On Dec 8, 2020, at 1:43 AM, Markus Demleitner <msdemlei at ari.uni-heidelberg.de<mailto:msdemlei at ari.uni-heidelberg.de>> wrote:
>> With that, implementations that accepts "colname" (or, in strict SQL,
>> "COLNAME") but not colname simply are in violation of SQL and need
>> to be fixed.  I have a hunch there's mysql behind this -- it's the
>> responsibility of implementors making such a choice to fix mysql's
>> quirks in the translation layer.
> 
> Whether or not colname/"COLNAME"/"colname" can be used to refer to
> the same column also depends on how the table was created. For
> example, if someone were to:
> 
> CREATE TABLE foo (
>     "colName" integer
> );
> 
> then none of the previously mentioned options will work (or at
> least, not in Postgres 9.5), and the user must use "colName" in

That's true -- but in that case the service wouldn't accept "colname"
either, which is why my logic above has been from delimited to
regular identifier and not the other way round.

> queries. If someone instead writes:
> 
> CREATE TABLE foo ("COLNAME" integer)
> 
> then only "COLNAME" will work. So, a user can be forced to quote

This is one of the cases where Postgres and SQL92 disagree.  In
SQL92, SELECT colname would work in this case.  In Postgres, you'd
have to write ("colname" integer) in the DDL for that to happen.
I've just consulted the (version 11) docs again, and it seems there's
no switch in postgres to make it use SQL-compatible folding-to-upper.
Which is an interesting fact in itself -- apparently, nobody really
cared so far.

> Given all this, I'm not sure how a TAP implementation could
> compensate for the non-standard fold-to-lowercase behavior in
> Postgres, unless there is a way to determine that a table column
> wasn't quoted at table creation time. I imagine there is some way
> to figure that out, but I don't know what it is.

Frankly, even going near this thing is full of pitfalls because SQL
name resolution (i.e., figuring out what a certain column or table
reference actually references in a complex query) is involved (think

  SELECT foo FROM (
      SELECT "FOO", quux FROM x) as a
    JOIN (
      SELECT "foo", quux FROM x) as b
    USING (quux)

).  So, if we were to second-guess our upstream SQL92, this would
involve a lot of headache.  On the other hand, implementations that
handle this kind of thing differently are a fact of life [just
experimented a bit with sqlite, and I'm still shuddering].

Our canonical answer has been that there's no sane way a translator
can safely fix all these idiosyncrasies.   Hence, when doing column
references, use the literals you find in TAP_SCHEMA and/or VOSI
tables exactly like given there.  It's only in TAP 1.1 that it was
made sufficiently clear that these have to include any delimiting
necessary, and several servers don't do that yet.  But that's where
this, IMHO, needs to be fixed.

Having said that, discouraging the use of mixed-case names and
strongly disouraging the use of delimited identifiers (i.e., "don't
do it unless strong external requirements force you to") is a good
idea for avoiding surprises on the side of our users.  With that,
almost none of all this matters.  

Oh, and user-facing documentation should of course stress that
delimited identifiers are the exception rather than the rule ("lift
the MySQL curse").

     -- Markus