[TAP] case sensitivity of query
Tom McGlynn
Thomas.A.McGlynn at nasa.gov
Tue Mar 3 10:43:30 PST 2009
I'll stay out of the Sync/Async debate, but I'd like to clarify some
points regarding case sensitivity.
There are at least 4 classes of strings where one might worry about case
sensitivity.
A. SQL keywords (SELECT, WHERE, BETWEEN, ...). It is my belief that
these are treated in a case insensitive fashion in all databases.
There may be a related issue when a nominally available word is reserved
in a particular database, e.g., dec is a reserved word in MySQL, but
that probably deserves it's own discussion.
B. Function names (sin(x), someUserDefinedFunction, region, ...). I
believe that ADQL defines a standard subset of these and that they are
case insensitive in all of the databases that I know of. If we ever
allow user defined functions, then perhaps the issues in C will also
arise here.
C. Names defined in the schema, i.e., table and column names. These are
mostly what have been talked about previously. The default configuration
of at least MySQL and Sybase seems to be to have these be case sensitive
although this is not in the SQL standard. There are some tempting
precedents for use of names distinguished only by case, e.g., v and V
are standard names for the apparent and absolute visual magnitudes.
The real issue here is that for a case-sensitive database to handle a
case insensitive query in general, the SQL must be completely parsed and
all references to columns and tables analyzed and compared so that the
case of the input can be adjusted to whatever the native case is. Thus
instead of being able to defer the parsing of the SQL largely to the
database (perhaps with some filtering of the REGION constraint), the
server code must completely analyze it. No thin layer on top of
existing frameworks here. This is independent of whether there are
multiple fields distinguished only by case. If the database happens to
contain such fields, then any query which references them is ambiguous
and probably would have to be rejected by a fully conformant implementation.
Solution 1, basically says that case sensitive databases may reject
queries where the 'wrong' case is used for inputs. The idea is that
users will build the query based upon metadata supplied by the database
which should supply values in the proper case. So long as users don't
change the names they should be OK.
Solution 2 is the fully conformant implementation discussed above.
Speaking only for the HEASARC, we're not likely to implement anything
like that on the soon. However we may move to a Postgres database
sometime next year, so the point may become moot for us.
D. Gerard brings up the case sensitivity of data within the database.
Here I believe the SQL standard is that the '=' operator is case
sensitive. It's also true in practice in the Postgres, MySQL and Sybase
implementations I've tried. I believe that the 'LIKE' operator is
supposed to be case-insensitive.
While I agree with Gerard that it is often the case that the query we
want to provide to the user is case insensitive, I don't think that
making the '=' operator work that way is correct. E.g., joins based on
exact match are much faster than those based upon a case-insensitive
match. Maybe keys are build case-insensitively when '=' is not case
sensitive.
In practice I don't think this will matter too much. Once users can get
to the databases, they'll learn how to filter queries as they wish.
Perhaps there should be some metadata which indicates the
case-sensitivity of the '=' operator.
So to summarize my understanding:
Keywords are case insensitive in the standard and in practice.
Functions are case insensitive in the standard and in practice for
standard functions -- and we don't yet support user defined ones.
Schema defined names are supposed to be case insensitive according to
the standard but are not in practice on many databases. Accommodating
case insensitivity on such databases can be difficult.
String comparisons are supposed to be case sensitive according to the
standard and are on many databases, but Gerard suggests that they are
not on others. In practice I don't think this will be to big of a deal
Tom
Gerard wrote:
> Hi Pat
>
>> ...
>> I see two possible options that would make things work:
>>
>> 1. The service exposes the correct case-sensitive names and
>> TAP clients do not modify these names in any way. This is
>> less extreme than what is stated in
>> 2.3.4 but in the same spirit (which is explained in that
>> subsection). We could modify the text to say that the query
>> is not case sensitive except in this one respect.
>>
> What should happen if clients *do* use table/column names with different
> case.
> Should the TAP service report an error? Also for case-insensitive databases?
>
>> 2. The service has to accept a completely non-case query and
>> if the backend db is case-sensitive the service has to map
>> the table and column names from the metadata as necessary to
>> make it work. This would get harder if the service really
>> used case to distinguish things (like columns ra,dec,RA,DEC)
>> but could still be done as long as the column names in the
>> exposed metadata are unique.
>> I don't know if anyone has/would make a database schema that way...
>>
> That would be very bad design indeed.
>
>> Note: It should be noted that the current metadata (TBD) plan
>> is that the service exposes the fully-qualified table name
>> (e.g. with as many of the catalog and schema specified as are
>> needed for the query to be correct) and the client is
>> expected to use the fully-qualified table names "as-is" and
>> not to parse them to get at these constructs. This greatly
>> simplifies the metadata (esp. the XML schema, I understand)
>> and works fine. Thus, clients are already supposed to use use
>> table names "as-is" and #1 above just extends it to include
>> case and apply to column names.
>>
>> Note: For TAP implementors where the DB is not case
>> sensitive, neither of these effect you, but #1 marginally
>> effects the client.
>>
> That depends on what the reaction should be in case clients do not use the
> same
> case as in the metadata. See above.
>
>
> Btw, I think ni my comments I mentioned two types of case sensitivity.
> One of table/column names etc, another of string literals.
>
> At least my SQLServer databases are case-insensitive regarding string
> literals.
> i.e.
> select * from coords where coordsystem='icrs'
> will return same as
> select * from coords where coordsystem='ICRS'
>
> Though a postgres installation is not.
> (Both are case-insensitive regarding table and column names.)
> There are arguments why case-insensitivity for string literals is indeed a
> good thing.
>
>
>> Please comment asap.
> If the choice is between 1 and 2, I vote for 2.
>
> Gerard
>
>
More information about the dal
mailing list