[TAP] case sensitivity of query

Wed Mar 4 01:35:37 PST 2009

Dear Tom 

> The real issue here is that for a case-sensitive database to 
> handle a case insensitive query in general, the SQL must be 
> completely parsed and all references to columns and tables 
> analyzed and compared so that the case of the input can be 
> adjusted to whatever the native case is.  Thus instead of 
> being able to defer the parsing of the SQL largely to the 
> database (perhaps with some filtering of the REGION 
> constraint), the server code must completely analyze it.  No 
> thin layer on top of existing frameworks here.  This is 
> independent of whether there are multiple fields 
> distinguished only by case.  If the database happens to 
> contain such fields, then any query which references them is 
> ambiguous and probably would have to be rejected by a fully 
> conformant implementation.
> 
> Solution 1, basically says that case sensitive databases may 
> reject queries where the 'wrong' case is used for inputs.  
> The idea is that users will build the query based upon 
> metadata supplied by the database which should supply values 
> in the proper case.  So long as users don't change the names 
> they should be OK.
> 
> Solution 2 is the fully conformant implementation discussed above. 
> Speaking only for the HEASARC, we're not likely to implement 
> anything like that on the soon.  However we may move to a 
> Postgres database sometime next year, so the point may become 
> moot for us.
> 
I agree with you here that 2 is cumbersome.
There is another solution which I'll number 0 (because this is what I
suggested in my comments):
Allow both case-sensitive and case-insensitive TAP services, but add
a metadata keyword (CASE_SENSITIVE=TRUE/FALSE) to the TAP capabilities that
indicates which choice is made.
Case sensitive services are allowed to throw an error if table/column names
do not use the case that is provided by the metadata queries.

The checking of this can in general be left to the database engine itself.

> D. Gerard brings up the case sensitivity of data within the database. 
> Here I believe the SQL standard is that the '=' operator is 
> case sensitive.  It's also true in practice in the Postgres, 
> MySQL and Sybase implementations I've tried.  I believe that 
> the 'LIKE' operator is supposed to be case-insensitive.
> 
> While I agree with Gerard that it is often the case that the 
> query we want to provide to the user is case insensitive, I 
> don't think that making the '=' operator work that way is 
> correct.  E.g., joins based on exact match are much faster 
> than those based upon a case-insensitive match.  Maybe keys 
> are build case-insensitively when '=' is not case sensitive.
> 
In SQLServer one can define per string column the "collation" using the
COLLATE keyword. The choice influences alphabet, case sensitivity, ordering
etc.
Choosing a case-insensitive collation does not influence comparisons.

> In practice I don't think this will matter too much.  Once 
> users can get
>   to the databases, they'll learn how to filter queries as they wish. 
I don't think that is quite so easy. 
Especially for "wordy" data models, where one wants to explore contents,
case insensitive queries are useful. 
I am thinking of metadata models like SimDB.

> Perhaps there should be some metadata which indicates the 
> case-sensitivity of the '=' operator.
> 
This is comparable but not quite equivalent to what I proposed.
I suggested that per CHAR/VARCHAR(/text?)column we could define whether it
is case sensitive or not.
It would add a column to the TAP_SCHEMA.columns table.

Best regards

Gerard