[TAP] case sensitivity of query

Wed Mar 4 09:58:08 PST 2009

On Tuesday 03 March 2009 08:58:28 Gerard wrote:
> Hi Pat
>
> > ...
> > I see two possible options that would make things work:
> >
> > 1. The service exposes the correct case-sensitive names and
> > TAP clients do not modify these names in any way. This is
> > less extreme than what is stated in
> > 2.3.4 but in the same spirit (which is explained in that
> > subsection). We could modify the text to say that the query
> > is not case sensitive except in this one respect.
>
> What should happen if clients *do* use table/column names with different
> case.
> Should the TAP service report an error? Also for case-insensitive
> databases?

If the db is case-sensitive w.r.t. table and column names and the client 
changes the case of these names (from what is published in the metadata) then 
the the direct result would likely be that the query would fail due to an 
unknown table or column.

Now, if the service tried to fix them so the query would work anyway, that 
would in fact be implementing option #2.

> > 2. The service has to accept a completely non-case query and
> > if the backend db is case-sensitive the service has to map
> > the table and column names from the metadata as necessary to
> > make it work. This would get harder if the service really
> > used case to distinguish things (like columns ra,dec,RA,DEC)
> > but could still be done as long as the column names in the
> > exposed metadata are unique.
> > I don't know if anyone has/would make a database schema that way...

I would like to focus on this topic because it seems that people agree that #2 
is the better way to go but there appears to be a feeling that it is hard. In 
the CVO TAP prototype I used an off-the-shelf SQL parser (jsqlparser from 
sourceforge). Since ADQL is syntactically legal SQL it can be parsed with 
such a parser.

My implementation experience:

This parser lets you write your own code to easily navigate/process the parse 
tree using the Visitor pattern. I wrote a visitor that goes through and makes 
sure that only tables and columns I listed in the metadata are used anywhere 
in the query. I then trivially extended this second visitor to map the names 
of tables and columns from the metadata into those used in the db. While it 
does not also handle case-insensitive to case-sensitive mapping that would be 
a 1-2 line change.

So, because ADQL is legal SQL, any existing SQL parser can be used to get a 
parse tree of the entire query (and that can be complex - SQL is very 
flexible :-). From there, it is pretty straightforward, assuming the 
parser/API is decent, to navigate the parse tree and check/modify things you 
care about.

I really want to emphasize two points:

- you do really need a full parser because SQL lets one obfuscate things and 
embed things in places you may not expect

- once parsed, you only have to actually check/modify things where your 
backend differs from ADQL, so you don't need to really interpret everything

PS-I think I spent about a week on the parser... not insignificant but also 
not insurmountable. My intention is to make this code available, hopefully 
right after I redo the prototype following the current spec changes and 
before the interop; I would expect others could use that with minimal work as 
half of the week was understanding the jsqlparser api and writing the basic 
tree navigation visitor

-- 

Patrick Dowler
Tel/Tél: (250) 363-0044
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2M7

Centre canadien de donnees astronomiques
Conseil national de recherches Canada
5071, chemin West Saanich
Victoria (C.-B.) V9E 2M7