ADQL 2.1: Grammar simplification?

Wed Aug 24 10:54:02 CEST 2022

Dear DAL,

I have to admit it is only now that I realise that ADQL 2.1
liberalises ORDER BY and GROUP BY to also accept value expressions.
That's going beyond SQL92, which has, as ADQL 2.0,

   <sort key> ::=
          <column name>
        | <unsigned integer>

In ADQL 2.1, we now say:

    <order_by_expression> ::=
          <unsigned_decimal>
        | <column_reference>
        | <value_expression>

I'm not *really* objecting to these changes, in particular because
it's not a problem for DaCHS with its postgres backend (and DaCHS
will support this from version 2.6.2 onwards).  But since it's a
fairly profound change beyond SQL92 that *may* give people a headache
on simpler SQL engines I figured it should be mentioned on the
mailing list at least once.

Meanwhile, I do not understand why the ADQL 2.1 rule has both
<column_reference> *and* <value_expression> as alternatives.  Perhaps
I'm missing something, but since

<value_expression> 
  →  <numeric_value_expression> 
  →  <term>
  →  <factor>
  →  <factor>
  →  <numeric_primary>
  →  <value_expression_primary>
  →  <column_reference>

<value_expression> expands to <column_reference> as is.  Was the
explicit inclusion of <column_reference> done on purpose?  If not,
can we drop the explicit <column_reference> to avoid future
confusion?

A similar consideration applies to

    <group_by_term> ::=
          <column_reference>
        | <value_expression>

In this case, by the way, I'm particularly hesitant about endorsing
the change.  You see, anything you do not group by can only be in the
select list via aggregate functions, and thus allowing
<value_expression>-s here raises the question of expression
equivalence big time.  For instance, if this rule has to have any
sense at all, then

  select nrows+1 from tap_schema.tables group by nrows+1

would be a valid statement (it is in postgres).

Mathematically,

  select 1+nrows from tap_schema.tables group by nrows+1

would plausibly be the same thing (whenever + is a commutative
operator).  Should engines realise that?  Postgres, for one, does
not.

If they should realise that, would they also have to work out that

  select nrows*nrows+2*nrows*table_index+table_index*table_index
  from tap_schema.tables group by power(nrows+table_index, 2)

is fine, too?

All that is so subtle, and you can easily avoid having to group by
expressions at all -- in the example, you'd just say

  select modrows from
    (select 1+nrows as modrows from tap_schema.tables) as q
  group by modrows

-- that I'd suggest reverting the GROUP BY grammar to what it was in
ADQL 2.1 unless someone remembers a strong reason for why it was
changed in the first place.

         -- Markus