ADQL 2.1: boolean literals and expressions

Mon Mar 5 15:11:01 CET 2018

Hi DAL,

There was a bit of discussion on ADQL 2.1 booleans in a github issue
(https://github.com/ivoa/lyonetia/issues/8#issuecomment-368472038)
already, but I think we'll have to do a bit more consensus building
on that.

The short version is that the current grammar also allows something
like

  SELECT * FROM table
  WHERE contains_bool(pt1, circ2) or col_ref

where contains_bool would be a boolean-valued function and col_ref
would be reference to a boolean-valued column.  Well, the latter
isn't actually in the grammar AFAICT, but that looks like an
oversight given that we have boolean functions.

I think we shouldn't do that.  I have two reasons:

(a) it makes it an order of magnitude harder to come up with halfway
non-confusing error messages on malformed queries.  This is because
if someone writes

  WHERE ra OR dec BETWEEN 12 and 14

it is only when I realise that ra is not a boolean column that I can
say something is wrong.  Type inference is not something I want to do
in my parser, as it's *much* easier when the tree is completely
parsed.  Think about what you need to do when things become

  SELECT * FROM
    (SELECT a=2 as b FROM src)
  WHERE b

Ugh.  Let's keep types out of the syntax if we can.  Syntactic sugar
doesn't count in my book.

(b) We already have 1=INTERSECTS, 1=CONTAINS, 1=ivo_nocase_match, and
more.  If we now create boolean functions, we'll have more explaining
to do as to where these odd beasts come from.

On the other hand, I see that we want to be able to deal with boolean
columns in ADQL.  I guess that, essentially, we want to compare
column references against boolean literals.

Against the ADQL 2.0 grammar, I think we should have something like:

  <boolean_literal> ::= True | False
  <value_expression> ::=              (all but one old)
        <numeric_value_expression>
      | <string_value_expression>
      | <boolean_value_expression>   
      | <geometry_value_expression>
  <boolean_value_expression> ::= 
    <numeric_value_expression> <comp_op> <boolean_literal>
    | <boolean_literal> <comp_op> <numeric_value_expression>

-- which certainly is not terribly elegant (and I could see stepping
down from numeric_value_expression to save on explanantions) but at
least disallows "naked" boolean column references and "naked" boolean
functions that complicate parsing as described above.  It would of
course be even cooler if we could just require True=1 and False=0,
because then we could let people write

  True=CONTAINS(...)

But I can see how that might not fly with databases that have a
strict boolean type (are there any?).

For reference, the boolean-related rules in the current draft are

  <boolean_function> ::=  (MISSING IN CURRENT PR)

  <boolean_literal> ::= True | False

  <boolean_primary> ::=
       <left_paren> <search_condition> <right_paren>
     | <predicate>
     | <boolean_value_expression>

  <boolean_factor> ::= [ NOT ] <boolean_primary>

  <boolean_term> ::=
        <boolean_factor>
      | <boolean_term> AND <boolean_factor>

  <boolean_value_expression> ::= 
       <boolean_literal>
     | <boolean_function>
     | <user_defined_function>

  <value_expression> ::=
      <numeric_value_expression>
    | <string_value_expression>
    | <boolean_value_expression>
    | <geometry_value_expression>

  <search_condition> ::=
      <boolean_term>
    | <search_condition> OR <boolean_term>

-- as I said, this is still missing the boolean column references
against what I *think* the plan was, and I've not thought through
where this will let True and False crop up.  For now, I've filed this
under "creepy".

Has anyone implemented that already?

          -- Markus