ADQL grammar validation
Walter Landry
wlandry at caltech.edu
Wed Apr 19 17:56:49 CEST 2017
Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> Also, it seems you're quite far with having ADQL in PEG with:
>
>> [1] https://github.com/Caltech-IPAC/libadql
>>
>> The grammar is mostly defined in the ADQL_parser directory
>>
>> https://github.com/Caltech-IPAC/libadql/tree/master/src/Query/ADQL_parser/ADQL_parser
>
> Could you comment on the completeness of your implementation?
It is pretty complete. I have 200+ tests in
https://github.com/Caltech-IPAC/libadql/blob/master/test/parse_adql.cxx
It is a bit wonky because of the restrictions I put on geometry.
Simple JOINs work, but there is a bug in multiple JOINs that I am
working on right now. I am not sure that it handles subqueries
nested to arbitrary depth inside other subqueries.
It would not be crazy to start from it.
> Also, since you're much more familiar with the organisation of the
> code, do you think you could extract the grammar from the C++ source
> files into a single text file?
I tried this below with the identifier code using the syntax from
https://en.wikipedia.org/wiki/Parsing_expression_grammar
'char' is any character. These rules are pretty simple because they
do not implicitly skip spaces. About half of my rules implicitly skip
spaces, so they would require a little more care.
Should I continue? Is this a good syntax?
Cheers,
Walter Landry
keyword = (SQL_reserved_word / ADQL_reserved_word) !identifier_character
simple_Latin_letter = [a-zA-Z]
identifier_character = digit / simple_Latin_letter / '_'
/// nonidentifier_character signals that, for example, in an
/// AND clause, AND is followed by something that is not an
/// identifier character (e.g. a space or a parenthesis).
nonidentifier_character = char - identifier_character
all_identifiers = simple_Latin_letter identifier_character*
regular_identifier = all_identifiers - keyword
nondoublequote_character = char - '"'
delimited_identifier_part = nondoublequote_character / '""'
delimited_identifier_body = delimited_identifier_part+
delimited_identifier = '"' delimited_identifier_body '"'
identifier = regular_identifier / delimited_identifier
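
For anyone who wants to experiment with the identifier rules above, here
is a minimal Python sketch that translates them into regular expressions.
It is not taken from libadql. The reserved-word lists are a small
illustrative subset, case-insensitive keyword matching is an assumption,
and match_identifier is a hypothetical helper name.

```python
import re

# Illustrative subset only (assumption); the real lists are much longer.
SQL_RESERVED = ["SELECT", "FROM", "WHERE", "AND", "OR", "NOT", "JOIN"]
ADQL_RESERVED = ["POINT", "CIRCLE", "CONTAINS"]

# identifier_character = digit / simple_Latin_letter / '_'
IDENT_CHAR = r"[0-9A-Za-z_]"

# keyword = (SQL_reserved_word / ADQL_reserved_word) !identifier_character
# Longest words first, so a short word cannot shadow a longer one.
_words = sorted(SQL_RESERVED + ADQL_RESERVED, key=len, reverse=True)
KEYWORD = re.compile(
    r"(?:%s)(?!%s)" % ("|".join(_words), IDENT_CHAR),
    re.IGNORECASE,  # assumption: keywords are matched case-insensitively
)

# all_identifiers = simple_Latin_letter identifier_character*
ALL_IDENTIFIERS = re.compile(r"[A-Za-z]%s*" % IDENT_CHAR)

# delimited_identifier = '"' (nondoublequote_character / '""')+ '"'
DELIMITED = re.compile(r'"(?:[^"]|"")+"')


def match_identifier(text, pos=0):
    """Return the end offset of an identifier starting at pos, or None."""
    # regular_identifier = all_identifiers - keyword
    m = ALL_IDENTIFIERS.match(text, pos)
    if m and not KEYWORD.match(text, pos):
        return m.end()
    # Otherwise fall back to delimited_identifier.
    m = DELIMITED.match(text, pos)
    return m.end() if m else None
```

With this sketch, "selected" is accepted as a regular identifier because
the lookahead stops "select" from matching as a keyword, while a bare
"SELECT" is rejected. No whitespace is skipped anywhere, which mirrors
the rules above.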