table_name syntax

Mark Taylor m.b.taylor at bristol.ac.uk
Thu Apr 30 16:28:33 CEST 2015


Markus,

On Thu, 30 Apr 2015, Markus Demleitner wrote:

> Excuse me for joining the fray again.  But let me see if we can fix
> what's already consensus.  I believe we all agree that services have to
> declare when they are using delimited identifiers:
> 
> The consequence IMHO is that we need to amend REC-TAP 1.0 and
> REC-VODataService 1.1.  In REC-TAP 1.0 (PDF page numbers), I believe
> the last paragraph of p. 21 should read:
> 
>   All names in TAP_SCHEMA MUST follow the syntax rules of the query
>   languages supported by the the langues used to query TAP_SCHEMA in
>   that they can be used as-is in queries without further
>   qualifications.  In particular, were delimited identifiers in ADQL
>   or similar languages must be used, the names in TAP_SCHEMA MUST
>   come with double quote characters.  Services MUST not provide
>   delimited identifiers where regular identifiers are used in the
>   database's DDL.
> 
>   The descriptions below reference the respective symbols for the
>   ADQL 2.0 language; for other query languanges, analogous rules will
>   have to be used.
> 
> Then, in 2.6.1 (with some redaction), additional text:
> 
>   When querying through ADQL, schema_name MUST contain an expansion of
>   the ADQL <schema_name> nonterminal.
> 
> In 2.6.2, additional text:
> 
>   schema_name MUST reference a row in TAP_SCHEMA.schemas.
>   When querying through ADQL, table_name MUST contain an expansion of
>   the ADQL <table_name> nonterminal suitable for referencing the table
>   in a top-level select clause.
> 
> In 2.6.3, additional text:
> 
>   table_name MUST reference a row in TAP_SCHEMA.tables.
>   When querying through ADQL, column_name MUST contain an expansion
>   of the ADQL <column_name> nonterminal suitable for referencing the
>   column in a plain SELECT <column_name> FROM <table_name> query
>   (i.e., including double quotes in the case of delimited
>   identifiers).
> 
> 
> We'll have to find similar language for VODataService (to have
> consistency with the VOSI tables endpoint).  The good news is that
> this won't require a schema change unless trolls are around, as
> column/name and friends are already of type ts:token.  [The thing
> with the trolls: If you were perverse enough to have identifiers like
> '"    "' -- multiple spaces -- we'd be in trouble with XSD.  Can we
> agree to look the other way?]
> 
> 
> Agreed so far?  Where do I carry this stuff now that, I understand,
> the TAP Implementation Notes shouldn't be used any more?

That all looks good to me (bar a couple of minor spelling mistakes).

To be clear: as I understand it this is a clarification of the
intention of TAP 1.0, and not describing any change in the way
services may/should/must behave between this and the next
version of TAP.

> Then, there was Mark responding to Pat (Tue, 28 Apr 2015 17:36:06
> +0100):
> 
> >> PPS- tap-lint does a test where it quotes a column name it finds in
> >> a table in the tap_schema. I noted 2-4 query failures per day in
> >> our logs caused by this test. It broke because in postgresql it
> >> subverts the way case-insensitive table and column names are
> >> implemented (PG just converts everything to lower unless you quote
> >> them!). My "fix" was simply to strip the quotes in that case
> >> because I Know they can only break it. While the test is probably
> >> technically correct, it is really just asking for trouble :-)
> >
> > hmm - my emerging understanding from this discussion is that
> > that taplint test is probably technically *incorrrect*,
> > and needs changing.  Can we have a ruling from Markus
> > (as our resident delimited-identifier-gotcha expert) on that?
> 
> I've not actually looked at what the test really does, but the way
> Pat describes it, TOPCAT probably shouldn't be doing it.  If you go
> back to the artwork of delimited identifier prose I sent around
> yesterday, you'll see that no normalization takes place in delimited
> identifiers.  So, "FooBar" will indeed be distinct from "Foobar".
> 
> Regular identifiers, however, are normalised.  Postgres in spirit
> does the right thing by just having them all lower-cased, so, even if
> one puts FooBar (undelimited) into TAP_SCHEMA, you'd have to use
> "foobar" to match it with a delimited identifier.
> 
> That, however, will fail with standard SQL1992, which instead
> stipulates that "every letter that is a lower-case letter [has to be]
> replaced by the equivalent upper-case letter or letters" on comparing
> regular and delimited identifiers.  So, while for aestethic reasons I
> fully support Postgres' decision to lower-case, if we followed
> SQL1992 here, ADQL translators to Postgres would have a whole lot of
> fun.  So, I propose leaving the entire question of casiness in
> comparisons between regular and delimited identifier open (which
> right now it kinda is; you could say by default to sql1992 it's not,
> in which case we're all doing it wrong).
> 
> Hence I think STILTS would be prudent not to fuzz with quotes.
> Services that mess up delimited identifiers in their TAP_SCHEMAS will
> then get validation errors, which serves them right for using them in
> the first place.  With the above... shall we say, clarifications,
> that's actually correct (if necessary, I'll ask my lawyer friends to
> come up with a reason why they're just clarifications).

Thanks.  Without necessarily claiming I've followed the details here,
it looks like it would indeed be best for taplint to take table and
column names just as they are supplied by the TAP_SCHEMA or VOSI
metadata, and never add or strip quotes.  I will make that change
for future releases.

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/


More information about the dal mailing list