table_name syntax

Tue Apr 28 17:26:19 CEST 2015

Hi Markus,

Comments inline ..

On 2015-04-28 08:14, Markus Demleitner wrote:
> 
> [.. Lots of permuatations of putting quotes in various places ...]
> 
>> Which makes it hard to search for table names that start with a 
>> specific
>> pattern,
>> 
>>     SELECT .. WHERE schema_name = 'twomass' AND table_name LIKE 'ps%'
>> 
>> would only match one of the possible combinations.
> 
> True.  And IMHO a strong suggestions we should discourage the use of
> delimited identifiers in general and in table names in particular.
> And then essentially say: "If you absolutely must have them, this is
> what to do".  Which would include: Always use regular identifiers in
> lower case (i.e., in particular: no quotes in the literals) unless
> you absolutely have no choice, in which chase put quotes around all
> literals, including quoting inner quotes.
> 

"The following section describes aspects of the language that the 
authors of this document would prefer you not to use" - !?

One of our biggest data providers has a huge number of tables and 
columns, almost all of which will need quoted identifiers.

So however much we would prefer not to, we need to find a way to solve 
this issue in a reliable and robust way.

The problem with searching for table name starting with 'foo%' is not 
just due to quoted/unquoted, it is also due to qualified/unqualified 
names.

Likewise for a client that needs to build the schema->table->column 
tree, the combination of quoted/unquoted and qualified/unqualified in 
the table_name makes it hard to process.

Providing the raw values in separate columns makes searching for 
matching names simple.

     SELECT .. WHERE raw_schema_name = 'twomass' AND raw_table_name LIKE 
'ps%'

     SELECT .. WHERE raw_schema_name = 'viz4' AND raw_table_name LIKE 
'B/avo.rad%'

No need to discourage anything.

The existing table_name column provides the 'ready for use' 
quoted/unquoted, qualified/unqualified names.

The raw columns provide the values needed for text searches and for 
building the tree of schema->table->column.

The raw columns would also enable an external verification service to 
check that the 'ready for use' values have been correctly formatted.

> 
> -1 from me.  I think it's fine if we mandate regular identifiers
> without quotes and delimited identifiers with quotes.  People doing
> discovery probably lavishly add wildcards anyway, and if we, as I
> think we should, regard delimited identifiers as desperate emergency
> measures, we should not complicate our data structures just to
> accomodate them.
> 

As a client trying to analyse the data and build the tree.

Please, just provide the raw, unqualified, unquoted names in separate 
columns.

> 
>> Note - the wording of the current TAP specification for 
>> TAP_SCHEMA.schemas,
>> TAP_SCHEMA.tables, and TAP_SCHEMA.columns says 'must include the 
>> following
>> columns'. It does not exclude the tables from having additional 
>> columns. So
>> adding new columns to the new version of the specification will not 
>> break
>> existing implementations.
> 
> Right -- so, for VizieR it may make sense to add raw_* columns.  For
> other services, clients couldn't count on them anyway, so I don't
> think they'd be helped a lot by this addition.
> 

If the raw name columns are added to the TAP-1.1 specification, and a 
service advertises a TAP-1.1 capability, the client should be able to 
rely on the raw names being there.

The 'ready to use' qualified/unqualified, quoted/unquoted values in 
table_name are ideal for a client that just wants to get a simple string 
that they can use in a query without further processing. Which is what I 
think Mark is looking for.

In answer to Mark's questions - yes, the 'ready to use' values should be 
quoted both to enclose special characters and to avoid clashes with ADQL 
reserved words.

However, the resulting permutations and combinations makes it hard for a 
client that needs the raw names to recover them from the complex string.

Our service needs to build the tree of schema->table->column, which 
means we need the raw names.

Reverse engineering the raw names from the mixed qualified/unqualified, 
quoted/unquoted string is messy.

> I could see a regulation "If you use delimited identifiers a lot,
> consider adding these columns to TAP_SCHEMA," so particularly
> diligent clients would know where to look (TAP_SCHEMA is described in
> TAP_SCHEMA...).
> 

https://www.youtube.com/watch?v=pXw7LYWNi5E
Please, just provide the raw, unqualified, unquoted names in separate 
columns.

However rare they are, however much we try to discourage their use, if 
the specification allows the different combinations of quoted/unquoted 
qualified/unqualified, we still end up having to write code that 
untangles them to get at the original raw values.

The world would be a much simpler place if TAP_SCHEMA just included the 
raw values as well.

Thanks,
Dave

--------
Dave Morris
Software Developer
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------