Problems with RegTAP, ADQL, natural joins, and MS SQL Server

Thu Feb 25 10:02:39 CET 2016

On Wed, Feb 24, 2016 at 09:33:12PM +0100, Menelaus Perdikeas wrote:
> I am pretty set in considering NATURAL JOINS an anti-pattern
> (mostly due to being opaque) and far from being elegant. 
> Personally, I always use aliases since they are very explicit,
> precise and reduce redundancy in typing as well. 
> 
> But I don't want to start a religious discussion. 

Phewy.  Because, perhaps not surprisingly, I'm a moderate fan of
natural joins.  Sure, they are less explicit than USING or OR joins,
but I claim that's reasonable encapsulation -- in non-ad hoc joins,
users should not need to know the entire schema and can just combine
tables as they want.  And yes, TAP allows the discovery of "intended"
join patterns using the foreign key tables, but really -- it's an
order of magnitude  more hassle for our users.

I think the root of the disagreement is that I am still hoping that
astronomers will, in large numbers, write one-off queries.  For
these, simplicity in query authoring and reducing requirements as to
familiarity with the underlying schema is IMHO more important than
explicitness and perhaps maintainability.

Be that as it may, ADQL 2.0 is clear that compliant implementations
need to understand NATURAL, and it'd suck^Wbe regrettable if we had a
sizeable number of non-compliant-by-design TAP services out there.
We could start booting out NATURAL in ADQL 2.1 if antipathies are
really that large.  Nobody has put that forward so far, though, so
I'd *really* like to see NATURAL in all (major) TAP implementations.
RegTAP is only a minor factor in there.

The good news is that NATURAL isn't terribly hard to do in the query
morpher.  Well, it isn't if you have a sufficiently annotated parse
tree (and I believe Grégory's code produces that; DaCHS does, at
least) -- then, in the two operands of a NATURAL join, enumerate all
columns (of the table or the subquery) having the same name into
col_1,...,col_n and re-write the query to 

  <op_1> JOIN <op_2> USING (<col_1>,...,<col_n>).

I'll Bambi-eye Grégory later today to implement this on his end.

> We too are using Gregory's parser and have encountered a couple of
> cases where the specific syntax of queries generated by TopCAT
> seems to not sit well with the library's ADQL parser or our native
> back-end's (PostgreSQL 9.2) capabilities, so when you contribute

Oh?  What would that be?  My gripe have been the pattern queries with
leading % in the pattern (because they don't use an index), but other
than that I found the query fairly ok.  Will you be in Edinburgh to
the Tech Forum?  If so, that could be a good place to sit together
with Mark and optimise the query a bit.

Cheers,

         Markus