ADQL: Bitwise operators

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Wed Feb 14 16:09:07 CET 2018


Hi DAL,

Although I'd still like to see the question of whether many of the
new ADQL features should be mandatory discussed (cf.
http://mail.ivoa.net/pipermail/dal/2018-February/007961.html), here's
the next installment of my bit^Wcomments on ADQL 2.1.  This time,
it's on the bitwise operators:

Full disclosure: I was always convinced we'd be doing ourselves a
favour if the bitwise operations were done as functions.

With the PR out, I'm *pretty* sure the way the grammar is written for
bitwise operators is "bad" (in the sense of: implies things we
don't want and that imply behaviour not in line with what usual
database engines do).

Specifically, I believe the precendence and associativity rules
implicit in the grammar should ensure commutative operators stay
commutative.  Now, with the current grammar, as far as I can see, 

3 * 5 & 4 

will parse as bitwise(term(3, *, 5), &, 4) (evaluating to 4), whereas

4 & 3 * 5

will parse as term(bitwise(4, &, 3), *, 5) (which evaluates to 0).

This is, essentially, because the grammar has term and
bitwise_expression on one level.  Admittedly, that doesn't matter if
all we expect is that the grammar accepts valid statements and
rejects invalid ones.  But, really, people use the grammar to build
trees, and if the trees don't match what people think they should be,
I'm sure we'll end up having trouble.  And either way, *if* we
overrode implicit precedence, we'd have to put up an operator
precendence table.

The right way to deal with this would to have term and bitwise on two
different levels.  Inspecting the C precendence table as well as
postgres' behaviour, bitwise operators ought to bind less strongly
than + and -.  This would mean that <bitwise_expression> has to go
from <numeric_value_expression> and is used whereever right now we
have <numeric_value_expression>.  And then <bitwise_expression> would
have <numeric_value_expression> as its first alternative.

In practice, this is a fairly nasty operation in existing code,
because <numeric_value_expression> crops up everywhere in the backend
and annotation code.  Changing the "root" of what we accept as
expression in many places is painful.  Possible, yes, but to me this
seems a lot of work for a fairly marginal feature.

So, I come back to proposing using functions for the bitwise
operations rather than operators.  Granted, some users might be a bit
unhappy at first, but they'll be rewarded with consistent behaviour
across many services when we go for functions, and whether they're
aware of that or not, they'll like that better and the small comfort
of being able to write the C operators (with their nasty precendence
rules that lead many C textbook authors to recommend to always write
parentheses when writing expressions containing them).

To prove it's easy I've now put in functions BITWISE_AND, _OR, _XOR,
_NOT into DaCHS.  It was close to trivial (add reserved words, add
the names and argument count in <math_function>, and add about 5
lines of morphing code -- it was 30 minutes including writing test
cases), lets users do all they can do with the operators, and has
essentially zero impact on existing code.  100% my choice.

I'd volunteer to update the grammar, too.

Oh, additional advantage: when we want to add bitshifts and/or bitwise
concatenation (both of which postgres supports), we won't have to add
another level of expression rules in the grammar, again shifting the
top-level expression rule.

Asked the other way round: Has anyone implemented the bitwise
operators on their end?

        -- Markus

PS: I claim the service at http://dc.g-vo.org/tap now understands all
of ADQL 2.1, *except*:

 * BOX constructed with POINT (I'll propose to deprecate BOX anyway)
 * CTEs (more on that later)
 * boolean_value_expression (more on that later)
 * bitwise expressions (see this mail)

On the set operations, some further experiments would be necessary,
as my implementaion is quite a bit different from what our grammar
says, and I distinctly remember something was wrong with the rules
from the TAP implemenation note (that, I think, are what's in the
grammar now).  Further implementation needed before that can go to
REC.

I'll probably declare the "optional" features present later, even
though I don't think they should be optional in the the first place.
Also, the service still says it's doing ADQL-2.0; let me know if you
need this updated.


More information about the dal mailing list