ADQL grammar validation

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Wed Apr 19 09:13:05 CEST 2017


Hi all,

First: Thanks, Grégory, for taking this as far as you already have.

Though I mourn a bit that we're leaving the cozy haven of SQL-92
grammar behind -- it was always easy to say "Well, it's what SQL-92
says" --, your arguments are pretty compelling.  And since I've never
been too fond of ABNF's concrete form myself, PEG (rather than
something that has a proper RFC) seems a reasonable tech to use.  

Let's go for it.

On Tue, Apr 18, 2017 at 03:14:55PM -0700, Walter Landry wrote:
> Grégory Mantelet <gmantele at ari.uni-heidelberg.de> wrote:
> > * Ambiguity with the `AS` keyword: because it is optional, depending on
> >   the parser, the following element may always be interpreted as an
> >   identifier for the alias, even though it is `FROM` or `(*)` (after a
> >   `COUNT` for example)
> 
> This is an ickiness that is carried over from SQL 92.  I am hesitant
> to get rid of it just because it is ugly.  I feel that will trip up
> users.  I think that requiring identifiers to not be a reserved word
> cleans up this ambiguity.

That is the implicit requirement, and that's also the reason for
enumerating the reserved words.  I've (accidentally) required AS
until quite recently, and it was a long while until someone
complained, but yeah, I don't think we can deviate from SQL-92 here,
not in a point update anyway.
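To make the reserved-word argument concrete, here is a minimal Python
sketch (not taken from any existing ADQL parser) of how an optional AS
stays unambiguous: a token is accepted as an implicit alias only if it
is a regular identifier that is not a reserved word, so a FROM after a
column name is never swallowed as an alias.  The RESERVED set is a
hypothetical, heavily truncated stand-in for the real list, and the
function names are made up for illustration.

```python
# Sketch only: parse one select-list item  <column> [AS] [<alias>].
# RESERVED is a hypothetical, truncated stand-in for the ADQL reserved words.
RESERVED = {"SELECT", "FROM", "WHERE", "AS", "COUNT", "ORDER", "BY", "TOP"}


def is_identifier(token):
    """Regular identifier: starts with a letter, continues with letters,
    digits or underscores, and is not a reserved word (case-insensitive)."""
    return (token[:1].isalpha()
            and all(c.isalnum() or c == "_" for c in token)
            and token.upper() not in RESERVED)


def parse_select_item(tokens, pos):
    """Parse <column> [AS] [<alias>] starting at tokens[pos].

    Returns (column, alias_or_None, next_pos)."""
    column = tokens[pos]
    if not is_identifier(column):
        raise SyntaxError(f"expected column name, got {column!r}")
    pos += 1
    if pos < len(tokens) and tokens[pos].upper() == "AS":
        # Explicit AS: an alias is now mandatory.
        pos += 1
        if pos >= len(tokens) or not is_identifier(tokens[pos]):
            raise SyntaxError("expected alias after AS")
        return column, tokens[pos], pos + 1
    if pos < len(tokens) and is_identifier(tokens[pos]):
        # Implicit alias: accepted only because the token is not reserved.
        return column, tokens[pos], pos + 1
    # No alias; the next token (e.g. FROM) is left for the caller.
    return column, None, pos
```

With this, "ra FROM t" yields no alias, while "ra x FROM t" and
"ra AS x FROM t" both yield the alias x.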

> > Nevertheless, a small drawback to know about PEG is that left
> > recursion is not allowed. Small re-writing of the ADQL grammar
> > (e.g. numerical and boolean operations) would be needed. But as far
> > as I know, few parsers allow left recursion, so most of the
> > existing ADQL parsers may have already rewritten such special
> > syntaxes, but please, tell me if I am wrong here.  (for those not
> > familiar with language parsing, just compare side by side
> > "adql_min.bnf" and "adql_min.peg", l.62)
> 
> I had to do the same thing.
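For those who haven't done this rewrite before, a toy example (plain
integer arithmetic, not the actual ADQL rules and not the rules in
adql_min.peg): the left-recursive rule

    expr <- expr ('+' / '-') term / term

makes a PEG parser call expr on itself forever without consuming any
input, so it is rewritten as repetition,

    expr <- term (('+' / '-') term)*

and left associativity is recovered by folding the repeated terms left
to right.  The Python sketch below is a hand-written recursive-descent
parser following the rewritten rule; all names are illustrative.

```python
import re


def tokenize(text):
    """Toy tokenizer: integers and the characters + - ( ); anything else is ignored."""
    return [int(t) if t.isdigit() else t for t in re.findall(r"\d+|[-+()]", text)]


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def term(self):
        # term <- NUMBER / '(' expr ')'
        tok = self.peek()
        if isinstance(tok, int):
            self.pos += 1
            return tok
        if tok == "(":
            self.pos += 1
            value = self.expr()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

    def expr(self):
        # expr <- term (('+' / '-') term)*   -- no left recursion
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.pos += 1
            rhs = self.term()
            # Folding left to right keeps '-' left-associative:
            # 10 - 3 - 2 is (10 - 3) - 2, not 10 - (3 - 2).
            value = value - rhs if op == "-" else value + rhs
        return value


def evaluate(text):
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at token {parser.pos}")
    return value
```

For instance, evaluate("10 - 3 - 2") gives 5, i.e. (10 - 3) - 2, as
it should.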
> 
> All in all, this is a big project.  I worry that if we start anew, we
> will end up with a grammar that is gratuitously different from SQL 92.

Perhaps, but I'd say that internal consistency among the various TAP
services in the VO is more important than a vague alignment with an
upstream standard that, given Grégory's findings, is flawed from that
perspective.

We also have collected a fairly large body of ADQL we expect to parse
(and a much smaller one we expect not to parse), so we should notice
gross mistakes quickly.
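Running those two corpora against a candidate grammar could look
roughly like the following Python sketch.  It assumes a parser entry
point parse(query) that raises an exception on a syntax error;
check_corpus and toy_parse are hypothetical names, and toy_parse is
only a stand-in so the sketch runs on its own.

```python
from typing import Callable, Iterable, List, Tuple


def check_corpus(parse: Callable[[str], object],
                 should_parse: Iterable[str],
                 should_fail: Iterable[str]) -> List[Tuple[str, str]]:
    """Run `parse` over both corpora and return (query, problem) pairs."""
    problems = []
    for query in should_parse:
        try:
            parse(query)
        except Exception as exc:  # a valid query being rejected is a regression
            problems.append((query, f"rejected: {exc}"))
    for query in should_fail:
        try:
            parse(query)
        except Exception:
            continue  # expected: the invalid query was rejected
        problems.append((query, "accepted but should have been rejected"))
    return problems


def toy_parse(query):
    # Stand-in for a real ADQL parser: only checks for a leading SELECT.
    if not query.lstrip().upper().startswith("SELECT"):
        raise SyntaxError("not a SELECT statement")
    return query


problems = check_corpus(toy_parse,
                        should_parse=["SELECT ra FROM t", "select * from t"],
                        should_fail=["DROP TABLE t", "SELECT FROM"])
```

Here the deliberately weak toy_parse lets "SELECT FROM" through, so
problems reports it as accepted when it should have been rejected.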

Also, it seems you're already quite far along with ADQL in PEG, judging from:

> [1] https://github.com/Caltech-IPAC/libadql
> 
>     The grammar is mostly defined in the ADQL_parser directory
> 
>     https://github.com/Caltech-IPAC/libadql/tree/master/src/Query/ADQL_parser/ADQL_parser

Could you comment on the completeness of your implementation?  Also,
since you're much more familiar with the organisation of the code, do
you think you could extract the grammar from the C++ source files
into a single text file?

Once that's done, I volunteer for putting in whatever is missing to
get to 2.0.  For 2.1 features, let's see...

        -- Markus

