[TAP] case sensitivity of query
Patrick Dowler
patrick.dowler at nrc-cnrc.gc.ca
Wed Mar 4 09:58:08 PST 2009
On Tuesday 03 March 2009 08:58:28 Gerard wrote:
> Hi Pat
>
> > ...
> > I see two possible options that would make things work:
> >
> > 1. The service exposes the correct case-sensitive names and
> > TAP clients do not modify these names in any way. This is
> > less extreme than what is stated in
> > 2.3.4 but in the same spirit (which is explained in that
> > subsection). We could modify the text to say that the query
> > is not case sensitive except in this one respect.
>
> What should happen if clients *do* use table/column names with different
> case.
> Should the TAP service report an error? Also for case-insensitive
> databases?
If the db is case-sensitive w.r.t. table and column names and the client
changes the case of these names (from what is published in the metadata) then
the the direct result would likely be that the query would fail due to an
unknown table or column.
Now, if the service tried to fix them so the query would work anyway, that
would in fact be implementing option #2.
> > 2. The service has to accept a completely non-case query and
> > if the backend db is case-sensitive the service has to map
> > the table and column names from the metadata as necessary to
> > make it work. This would get harder if the service really
> > used case to distinguish things (like columns ra,dec,RA,DEC)
> > but could still be done as long as the column names in the
> > exposed metadata are unique.
> > I don't know if anyone has/would make a database schema that way...
I would like to focus on this topic because it seems that people agree that #2
is the better way to go but there appears to be a feeling that it is hard. In
the CVO TAP prototype I used an off-the-shelf SQL parser (jsqlparser from
sourceforge). Since ADQL is syntactically legal SQL it can be parsed with
such a parser.
My implementation experience:
This parser lets you write your own code to easily navigate/process the parse
tree using the Visitor pattern. I wrote a visitor that goes through and makes
sure that only tables and columns I listed in the metadata are used anywhere
in the query. I then trivially extended this second visitor to map the names
of tables and columns from the metadata into those used in the db. While it
does not also handle case-insensitive to case-sensitive mapping that would be
a 1-2 line change.
So, because ADQL is legal SQL, any existing SQL parser can be used to get a
parse tree of the entire query (and that can be complex - SQL is very
flexible :-). From there, it is pretty straightforward, assuming the
parser/API is decent, to navigate the parse tree and check/modify things you
care about.
I really want to emphasize two points:
- you do really need a full parser because SQL lets one obfuscate things and
embed things in places you may not expect
- once parsed, you only have to actually check/modify things where your
backend differs from ADQL, so you don't need to really interpret everything
PS-I think I spent about a week on the parser... not insignificant but also
not insurmountable. My intention is to make this code available, hopefully
right after I redo the prototype following the current spec changes and
before the interop; I would expect others could use that with minimal work as
half of the week was understanding the jsqlparser api and writing the basic
tree navigation visitor
--
Patrick Dowler
Tel/Tél: (250) 363-0044
Canadian Astronomy Data Centre
National Research Council Canada
5071 West Saanich Road
Victoria, BC V9E 2M7
Centre canadien de donnees astronomiques
Conseil national de recherches Canada
5071, chemin West Saanich
Victoria (C.-B.) V9E 2M7
More information about the dal
mailing list