ADQL grammar validation

Walter Landry wlandry at caltech.edu
Wed Apr 19 17:56:49 CEST 2017


Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
> Also, it seems you're quite far with having ADQL in PEG with:
> 
>> [1] https://github.com/Caltech-IPAC/libadql
>> 
>>     The grammar is mostly defined in the ADQL_parser directory
>> 
>>     https://github.com/Caltech-IPAC/libadql/tree/master/src/Query/ADQL_parser/ADQL_parser
> 
> Could you comment on the completeness of your implementation?

It is pretty complete.  I have 200+ tests in

  https://github.com/Caltech-IPAC/libadql/blob/master/test/parse_adql.cxx

It is a bit wonky because of the restrictions I put on geometry.
Simple JOINs work, but there is a bug in multiple JOINs that I am
working on right now.  I am not sure that it handles fully recursive
subqueries inside subqueries.

It would not be crazy to start from it.

> Also, since you're much more familiar with the organisation of the
> code, do you think you could extract the grammar from the C++ source
> files into a single text file?

I tried this below with the identifier code using the syntax from

  https://en.wikipedia.org/wiki/Parsing_expression_grammar

'char' is any character.  These rules are pretty simple because they
do not implicitly skip spaces.  About half of my rules implicitly skip
spaces, so they would require a little more care.

Should I continue?  Is this a good syntax?

Cheers,
Walter Landry

  keyword = (SQL_reserved_word / ADQL_reserved_word) &!identifier_character

  simple_Latin_letter = [a-zA-Z]
  identifier_character = digit / simple_Latin_letter / '_'
  /// nonidentifier_character is to signal that, for example, in an
  /// AND clause, AND is followed by something that is not an
  /// identifier (e.g. a space or parentheses).
  nonidentifier_character = char - identifier_character
  all_identifiers = simple_Latin_letter identifier_character*
  regular_identifier = all_identifiers - keyword

  nondoublequote_character = char - '"'
  delimited_identifier_part = nondoublequote_character / '""'
  delimited_identifier_body = delimited_identifier_part+
  delimited_identifier = '"' delimited_identifier_body '"'

  identifier = regular_identifier / delimited_identifier
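
To make the behaviour of these rules concrete, here is a minimal Python sketch of a recognizer for the identifier rules above. It is not taken from libadql. The reserved-word set is a tiny illustrative subset, the case-insensitive keyword comparison is an assumption, 'digit' is assumed to mean [0-9], and all function names are hypothetical.

```python
import re

# Assumption: a tiny illustrative subset of the reserved words; the real
# SQL and ADQL lists are much longer.
RESERVED_WORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "NOT", "JOIN", "TOP"}

# all_identifiers = simple_Latin_letter identifier_character*
# (assumes digit = [0-9], which the excerpt does not define)
ALL_IDENTIFIERS = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def match_regular_identifier(text, pos=0):
    """Return the end offset of a regular_identifier at pos, or None."""
    m = ALL_IDENTIFIERS.match(text, pos)
    if m is None:
        return None
    # The greedy match has consumed every identifier_character, so a
    # reserved word only counts as a keyword when it is the whole match.
    # Assumption: keywords are compared case-insensitively.
    if m.group(0).upper() in RESERVED_WORDS:
        return None
    return m.end()


def match_delimited_identifier(text, pos=0):
    """Return the end offset of a delimited_identifier at pos, or None."""
    if not text.startswith('"', pos):
        return None
    i = pos + 1
    parts = 0
    # delimited_identifier_body = delimited_identifier_part+
    while i < len(text):
        if text[i] != '"':
            i += 1            # nondoublequote_character
        elif text.startswith('""', i):
            i += 2            # escaped double quote
        else:
            break             # lone quote: candidate closing delimiter
        parts += 1
    # The body needs at least one part and must be followed by '"'.
    if parts == 0 or not text.startswith('"', i):
        return None
    return i + 1


def match_identifier(text, pos=0):
    """identifier = regular_identifier / delimited_identifier"""
    end = match_regular_identifier(text, pos)
    if end is None:
        end = match_delimited_identifier(text, pos)
    return end
```

With this sketch, match_identifier("ANDROID") succeeds because the reserved word AND is followed by an identifier character, while match_identifier("AND") returns None. The delimited case is written as an explicit loop rather than a regular expression so that, like a PEG repetition, it never gives back an already consumed '""' pair.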

