Qualified/unqualified quoted/unquoted

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Oct 20 09:46:28 CEST 2014


Dear DAL,

On Thu, Oct 16, 2014 at 03:58:31PM +0100, Dave Morris wrote:
> Publishing the metadata for the TAPVizieR service highlights some
> gaps in the current VODataService and TAP_SCHEMA specifications that
> will need to be clarified before this service can interoperate with
> other TAP services in the VO.
[...]
> Section 3.3 of the VODataService-1.1 specification defines the <name>
> element as containing :
> 
>     "A fully qualified name for the table."
> 
>     "This name should include all catalog or schema
>     prefixes needed to sufficiently uniquely
>     distinguish it in a query to the table."
> 
> However the VODataService-1.1 specification does not describe how to
> handle a table name that includes non-delimiter dots in it.
[...]
> The current VODataService-1.1 specification needs to be updated to
> describe how the /tables output should use quotes to wrap names that
> contain non-delimiter dots or other characters outside the basic set
> of alphanumeric characters.

I have to admit that I've always hoped we can avoid delimited
identifiers at least in table and schema names -- delimited
identifiers in SQL are nasty, as they have some not terribly
intuitive interactions with the case folding rules in place almost
everywhere else in the language -- for instance, if you say

  create table "Bar"...

then none of

  select * from bar, select * from Bar, or select * from "bar"

will work -- there is actually no way to query that table without
using a delimited identifier.  On the other hand, somewhat
confusingly, if you say 

  create table "bar"...
 
then any of

  select * from bar, select * from Bar, and select * from "bar"

will work.
  .

Now, case folding in general is evil, but with SQL we don't have much
of a choice, so there's case folding.  Having one islands where
there's (in general) none -- string literals -- is not pretty, having
two such islands is worse.

VizieR, understandably, opted to keep the table names VizieR users
are accustomed to.  These are not SQL regular identifiers, hance we
have to support delimited ones, and hence I agree with Dave that some
language needs to be in the two relevant standards to after all deal
with the beasts.

[Incidentally, this doesn't change my conviction that we should
outlaw delimited identifiers for names in TAP_UPLOAD]

> Section 2.6 of the TAP-1.0 specification defines the table_name
> column as
> 
>     "table name as it should be used in queries"

I believe this is a very good regulation indeed, and I feel we should
not change it.

>     "The value of the table_name should be
>     the string that is recommended for use
>     in querying the table; it may or may not
>     be qualified by schema and catalog name(s)
>     depending on the implementation requirements."

... which I'd consider fairly clear given the "as it should be used
in queries.  If clients have any business pulling table names from
TAP_SCHEMA, they should be able to use the strings they get without
further digestion.  If at all possible.

> In this example the table name in the table_name column should
> probably be wrapped in double quotes to indicate that the dot is part
> of the table name and not a delimiter between schema and table.
> 
>     <TD>"J/other/PZ/29.1/table"</TD>

I agree, and I'd say we should still discourage the use of delimited
identifiers to protect our users' sanity.  But if you have to use
them, add quotes as you would in SQL queries.  To avoid surprises if
clients actually believe they have to parse this mess, we should
specifically note SQL's escaping of double quotes ("foo""Bar" will
match the table foo"Bar).

> We could simplify the parsing rules by defining both the schema name
> and table name as always unqualified, removing the need for using
> quotes within the metadata.
> 
>     <TR>
>         <TD>vbig</TD>
>         <TD>J/other/PZ/29.1/table</TD>
>     </TR>
> 
> and
> 
>     <TR>
>         <TD>twomass</TD>
>         <TD>data</TD>
>     </TR>
> 
> Note - in order to use the fully unqualified schema name we would
> have to add a separate column/element to the metadata to contain the
> catalog name.

-- which for me is reason enough to dislike such a solution.  Plus,
you'll have to specify how to say table in the global schema (ok,
that'd be a NULL, but then in VOTable "" and NULL is (almost) the
same, so you'll need special handling to avoid names like .table,
etc).

No, I believe the sane thing is to specify in both TAP_SCHEMA and
VODataService that SQL-qualified names are allowed and are to be
interpreted as such, while at the same time strongly discouraging
data providers to use them.

Howl: I want VODataService-Next in the Wiki (which is to say: I'm
still convinced we should have officially sanctioned -Next pages in
the new DocStds).  I expect it'll be a while until we actually change
VODataService, so having a place to keep this in would be really
useful.

> Note - the schema name also needs to be quoted because schema names
> may be qualified with a catalog name and both the schema and catalog
> names may themselves contain non-delimiter dots or other non
> alphanumeric characters.

I'd like to agree with Marco here: Given SQL's funny rules with
matching delimited identifiers I'm against gratuitously quoting names
that aren't actually created as delimited identifiers in the
underlying database.

Cheers,

       Markus



More information about the dal mailing list