ADQL: Bitwise operators
Ger van Diepen
diepen at astron.nl
Thu Feb 15 08:08:55 CET 2018
Hi Markus,
I agree that the bitwise operators should have the same precedence as In C, etc. I’ve implemented them that way in TaQL.
I do not understand why it should be so much work to change the precedence of operators in existing code. If a tool such as flex/bison is used, it is really trivial. I assume similar tools exist for Java.
Also note the precedence of OR and AND. I don’t know how it is defined in ADQL, but normally AND has a higher precedence than OR which many people do not realise. Therefore many compilers give a warning if AND and OR are used without parentheses. Probably the same could be done for bitwise operators.
On another note.
I’ve never understood why function names (in SQL and ADQL) have to be reserved words; in principle it makes it impossible (in a backward compatible way) to add new functions because somewhere columns with such names could exist. Every parser (with 1 look-ahead) should be able to distinguish between a column name and a function name using the subsequent opening parenthesis.
Cheers,
Ger
> On 14 Feb 2018, at 16:09, Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:
>
> Hi DAL,
>
> Although I'd still like to see the question of whether many of the
> new ADQL features should be mandatory discussed (cf.
> http://mail.ivoa.net/pipermail/dal/2018-February/007961.html), here's
> the next installment of my bit^Wcomments on ADQL 2.1. This time,
> it's on the bitwise operators:
>
> Full disclosure: I was always convinced we'd be doing ourselves a
> favour if the bitwise operations were done as functions.
>
> With the PR out, I'm *pretty* sure the way the grammar is written for
> bitwise operators is "bad" (in the sense of: implies things we
> don't want and that imply behaviour not in line with what usual
> database engines do).
>
> Specifically, I believe the precendence and associativity rules
> implicit in the grammar should ensure commutative operators stay
> commutative. Now, with the current grammar, as far as I can see,
>
> 3 * 5 & 4
>
> will parse as bitwise(term(3, *, 5), &, 4) (evaluating to 4), whereas
>
> 4 & 3 * 5
>
> will parse as term(bitwise(4, &, 3), *, 5) (which evaluates to 0).
>
> This is, essentially, because the grammar has term and
> bitwise_expression on one level. Admittedly, that doesn't matter if
> all we expect is that the grammar accepts valid statements and
> rejects invalid ones. But, really, people use the grammar to build
> trees, and if the trees don't match what people think they should be,
> I'm sure we'll end up having trouble. And either way, *if* we
> overrode implicit precedence, we'd have to put up an operator
> precendence table.
>
> The right way to deal with this would to have term and bitwise on two
> different levels. Inspecting the C precendence table as well as
> postgres' behaviour, bitwise operators ought to bind less strongly
> than + and -. This would mean that <bitwise_expression> has to go
> from <numeric_value_expression> and is used whereever right now we
> have <numeric_value_expression>. And then <bitwise_expression> would
> have <numeric_value_expression> as its first alternative.
>
> In practice, this is a fairly nasty operation in existing code,
> because <numeric_value_expression> crops up everywhere in the backend
> and annotation code. Changing the "root" of what we accept as
> expression in many places is painful. Possible, yes, but to me this
> seems a lot of work for a fairly marginal feature.
>
> So, I come back to proposing using functions for the bitwise
> operations rather than operators. Granted, some users might be a bit
> unhappy at first, but they'll be rewarded with consistent behaviour
> across many services when we go for functions, and whether they're
> aware of that or not, they'll like that better and the small comfort
> of being able to write the C operators (with their nasty precendence
> rules that lead many C textbook authors to recommend to always write
> parentheses when writing expressions containing them).
>
> To prove it's easy I've now put in functions BITWISE_AND, _OR, _XOR,
> _NOT into DaCHS. It was close to trivial (add reserved words, add
> the names and argument count in <math_function>, and add about 5
> lines of morphing code -- it was 30 minutes including writing test
> cases), lets users do all they can do with the operators, and has
> essentially zero impact on existing code. 100% my choice.
>
> I'd volunteer to update the grammar, too.
>
> Oh, additional advantage: when we want to add bitshifts and/or bitwise
> concatenation (both of which postgres supports), we won't have to add
> another level of expression rules in the grammar, again shifting the
> top-level expression rule.
>
> Asked the other way round: Has anyone implemented the bitwise
> operators on their end?
>
> -- Markus
>
> PS: I claim the service at http://dc.g-vo.org/tap now understands all
> of ADQL 2.1, *except*:
>
> * BOX constructed with POINT (I'll propose to deprecate BOX anyway)
> * CTEs (more on that later)
> * boolean_value_expression (more on that later)
> * bitwise expressions (see this mail)
>
> On the set operations, some further experiments would be necessary,
> as my implementaion is quite a bit different from what our grammar
> says, and I distinctly remember something was wrong with the rules
> from the TAP implemenation note (that, I think, are what's in the
> grammar now). Further implementation needed before that can go to
> REC.
>
> I'll probably declare the "optional" features present later, even
> though I don't think they should be optional in the the first place.
> Also, the service still says it's doing ADQL-2.0; let me know if you
> need this updated.
More information about the dal
mailing list