TAP, ADQL and field names inconsistencies between providers.

Monkewitz, Serge smm at ipac.caltech.edu
Tue Dec 8 21:19:01 CET 2020


Serge Monkewitz | Applications Developer | Caltech/IPAC
NASA/IPAC Infrared Science Archive

On Dec 8, 2020, at 1:43 AM, Markus Demleitner <msdemlei at ari.uni-heidelberg.de<mailto:msdemlei at ari.uni-heidelberg.de>> wrote:

Hi Grégory,

On Mon, Dec 07, 2020 at 05:28:14PM +0100, Grégory Mantelet wrote:
As editor of the ADQL standard (since few time), I do not think we should
force one behavior more than another. After all, this flexibility also
exists in SQL which is the backend language of all the currently existing
TAP implementations I know. But we could maybe add a recommendation about
escaping and qualifying identifiers. Let's say something like:

    "ADQL implementations SHOULD not force identifiers to be escaped and/or
qualified unless necessary.".

@other ADQL authors: suggestions/comments?

First off, while the SQL rules for case folding are insane, we
shouldn't overwrite them in ADQL, except perhaps for the one thing:
SQL92 (haven't checked later standards) wants to normalise to
uppercase (meaning colname and "COLNAME" is the same), whereas
postgres normalises to lowercase (i.e., colname and "colname" is the
same).  As always when case folding comes into play, things become
messy.

With that, implementations that accepts "colname" (or, in strict SQL,
"COLNAME") but not colname simply are in violation of SQL and need
to be fixed.  I have a hunch there's mysql behind this -- it's the
responsibility of implementors making such a choice to fix mysql's
quirks in the translation layer.

So, I think we shouldn't have the prose a proposed; SQL is clear that
they MUST NOT.

What we might sneak into ADQL is something like: "a word of advice:
avoid mixed-case names in ADQL databases; they become trouble as soon
as people start gratuitously adding quotes to identifiers, which at
least a certain demography is likely to do".

      -- Markus

Whether or not colname/"COLNAME"/"colname" can be used to refer to the same column also depends on how the table was created. For example, if someone were to:

CREATE TABLE foo (
    "colName" integer
);

then none of the previously mentioned options will work (or at least, not in Postgres 9.5), and the user must use "colName" in queries. If someone instead writes:

CREATE TABLE foo ("COLNAME" integer)

then only "COLNAME" will work. So, a user can be forced to quote identifiers just based on how a particular table was created, rather than by any oddity of the TAP service implementation.

I agree that some advice pointing out pitfalls like this would be helpful.

Given all this, I'm not sure how a TAP implementation could compensate for the non-standard fold-to-lowercase behavior in Postgres, unless there is a way to determine that a table column wasn't quoted at table creation time. I imagine there is some way to figure that out, but I don't know what it is.

Serge

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.ivoa.net/pipermail/dal/attachments/20201208/9a3da1d7/attachment.html>


More information about the dal mailing list