ADQL grammar validation

Wed Apr 19 14:42:30 CEST 2017

Hi all,

On 2017-04-19 08:13, Markus Demleitner wrote:
> 
> First: Thanks, Grégory, for taking this this far already.
> 

I agree.

Based on what we have found so far, it looks like PEG is the better 
option.

>> 
>> All in all, this is a big project.  I worry that if we start anew, we
>> will end up with a grammar that is gratuitously different from SQL 92.
> 
> Perhaps, but I'd say internal consistency between the various TAP
> services in the VO is more important than a, given Grégory's
> findings, vague alignment with a, from that perspective, flawed
> upstream standard.
> 

I agree.

Interoperability between VO services is the primary goal.

ADQL was based on SQL 92 to give us a common starting point; it wasn't a
hard requirement.

The ADQL specification says

     "ADQL is based on the Structured Query Language (SQL), especially on 
SQL 92."

It doesn't say it *is* SQL 92.

I think that, going forward, we should view SQL 92 as a guide rather than a
strict requirement, particularly as, so far, no one has found a definitive
machine-readable definition of SQL 92.

The best option seems to be to start a new PEG grammar and then review
it whenever we find differences from SQL 92.

By default we should do what SQL does, but we do have the choice.

If our query language evolves away from SQL 92 and ends up being more
strictly defined, with a machine-readable grammar, then I'd argue that is
a good thing.

> We also have collected a fairly large body of ADQL we expect to parse
> (and a much smaller one we expect not to parse), so gross mistakes we
> should notice quickly.

I think this is an important goal.

It should behave the way our users expect it to.

A query language that is easy to understand and does what people expect 
is preferable to an accurate replica of a specific version of the SQL 
standard.

> 
> Once that's done, I volunteer for putting in whatever is missing to
> get to 2.0.  For 2.1 features, let's see...
> 

I agree this looks like the best way to proceed.

However, I don't think this is a minor version step.

Updating the existing BNF to fix the obvious inconsistencies, yes, 2.1.

Replacing the existing BNF with a new one developed from the ground up
was already a bit of a stretch for 2.x.

Replacing the grammar definition with a completely new one, written in a
completely different grammar language, has to be 3.x.

Not least because the main text of the standard includes several 
fragments of BNF code which will need to be updated to match the new 
grammar.

What if, while developing the PEG grammar, we discover that an important
part of 2.x is wrong?

If we are trying to fit PEG into the 2.x series, then we would have to
mangle the PEG grammar to replicate the broken behaviour of 2.x.

Making the PEG grammar the start of a new 3.x series allows us more 
leeway to fix broken things.

     CONTAINS()=1 .. ?

----

Based on this, I would like to propose the following plan.

1) We create a new WD of 2.1 with the changes from discussions since the
last Interop.
2) We add a note to say the BNF will probably be replaced in the next 
version.
3) In May we put the 2.1 draft forward as good enough for PR.

4) Work on 3.0 starts now.
5) We use Grégory's and Walter's work as a basis for a new PEG grammar.
6) We work on increasing the coverage of SQL features in the validation 
queries.
7) We work on the tools to validate the new grammar against those 
queries.
8) We work on updating the main text to match the new grammar.

If all goes to plan then 3.0 could be ready by May 2018.

If it turns out to be harder than we expect and 3.0 is delayed, then at
least 2.1 will be published and people can use it.

If we hold 2.1 back until we have the new grammar, then in the best case it
is delayed until 2018; in the worst case, we still won't have 2.1 in 2019.

What do you think?

Is moving to PEG a good idea?

Does moving to PEG mean a major version step, 2.x to 3.x?

Is 2.1 good enough for now?

Thanks,
Dave

--------
Dave Morris
Research Software Engineer
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh
--------