TAP 1.1: The table_name values must be unique

Mon Nov 19 19:20:18 CET 2018

> On Nov 19, 2018, at 1:12 AM, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> 
> Hi Brian,
> 
> On Fri, Nov 16, 2018 at 06:46:14PM +0000, Van Klaveren, Brian N. wrote:
>> For TAP_SCHEMA.Tables, TAP 1.1 says:
>> 
>> The table_name values must be unique. The value of the table_name
>> should be the string that is recommended for use in querying the
>> table; it may or may not be qualified by schema and catalog name(s)
>> depending on the implementation requirements.
>> [...]
>> Which, from a programmatic standpoint, means that an application
>> must never naively qualify a table name as ( schema_name + "." +
>> table_name ) when constructing a query.
> 
> That is exactly the reason for this language -- a client has a *very*
> hard time to do this construction, what with different hierarchy
> schemes and -- ugh -- delimited identifiers.  See, for instance,
> http://mail.ivoa.net/pipermail/dal/2015-May/007116.html and
> http://mail.ivoa.net/pipermail/dal/2015-May/007107.html for previous
> discussions that made us introduce this kind of thing.

> 
>> I suppose that's not a dealbreaker, but I do think it's awkward,
>> and it deviates from the sql99 approach of using the effective
>> "catalog_name", "schema_name", and "table_name" everywhere:
> 
> Yeah, but the SQL information schema can assume a bit more knowledge
> about the underlying system that we in TAP can.  Also, we couldn't
> identify use cases where Ikea-style table names ("assemble yourself")
> might come in handy.  Do you have one?
> 
>> I think I'd personally prefer there to be IDs and there to be joins
>> on IDs rather than names in a future TAP 2.0 so joins are easier,
>> but maybe at least adding schema_name columns to TAP_SCHEMA.Columns
> 
> Well, (table_name, column_name) already is a primary key in
> tap_schema.columns.  So, what would schema_name in there buy you?
> 
>> approach, and just force a unique constraint on TAP_SCHEMA.Tables
>> for schema_name and table_name, though I would like to see a
>> CATALOG_NAME in there too.
> 
> I suppose the plan with catalog has always been to absorb it into
> schema_name if you need it; no one's registered something like that
> yet, though (RegTAP query:
> 
> select schema_name from rr.res_schema where schema_name like '%.%'
> 
> ).  Due to the "publish ready-made table names only" rule, I'd be
> fairly relaxed about that anyway.  Clients have no business mucking
> with the table names, so if you have table names with five unquoted
> dots, you'd still be fine (though TOPCAT's TAP client would flag
> syntax errors in the editor, but that's an aesthetic defect at worst).
> 
>          Markus

Hi Markus,

After reading those older emails thinking about it a bit more, I think it's mostly okay, but it's still somewhat awkward. I think it implies our table names could be any of the following:

catalog.schema.table
catalog.schema."table"
catalog."schema"."table"
"catalog"."schema"."table"

... but we will probably need to use the latter everywhere.

I do still think that's awkward for some of the following reasons:

• It deviates from the SQL spec, where TABLE_NAME means something else in INFORMATION_SCHEMA/DEFINITION_SCHEMA, for example, DEFINITION_SCHEMA.COLUMNS:

> 2) The values of TABLE_CATALOG, TABLE_SCHEMA, and TABLE_NAME are the catalog name, unqualified schema name, and qualified identifier, respectively, of the table containing the column being described. 
> 3) The value of COLUMN_NAME is the name of the column being described.

... where "qualified identifier" in this scenario means that a standard identifier or an identifier with quoting, if necessary, but not schema-qualification (which _may_ include catalog qualification, according to SQL spec);

• It's somewhat surprising to come across if you're used to INFORMATION_SCHEMA. I realize we can just add our own columns in the spirit of fulfilling the goals of INFORMATION_SCHEMA (I guess "identifier_name" or something);

• It's somewhat inconsistent in terms of naming, because we don't  [[catalog.]schema.]table qualify the column names;

• The simple "qualified identifier" of the table is not stored anywhere;

In terms of use cases, making a tree-like catalog/schema/table browser is much easier when you don't have to worry about parsing the schema-qualified identifier. I'm not sure that's a killer app though. 

Brian