ADQL 2.1: Preferred crossmatch syntax

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Mon Nov 6 15:52:33 CET 2017


Hi DAL,

On Fri, Nov 03, 2017 at 09:26:59PM +0000, Mark Taylor wrote:
> On Fri, 3 Nov 2017, Francois-Xavier PINEAU wrote:
> > Although I agree with the approach, there is one point that bothers me.
> > 
> > I know that not everyone agrees on the matter, but I personally
> > prefer to declare the cross-match condition in the JOIN (like in
> > your InteropOct2015DAL presentation) rather than in the WHERE.
> >
> > In fact, in relational algebra, I prefer to see the cross-match as a
> > theta-join, not as a selection on the result of a cross-product.

+1 to FX' proposal; it's a good thing to be explicit on what's a join
condition and what's a constraint on the resulting relation.

Compare:

SELECT
  some_catalog AS t
  JOIN gaia AS g
  ON (
    DISTANCE(t.pos, g.pos)<1/3600.
    AND ABS(t.vmag-g.phot_mag_G_mean)<0.1)
WHERE 
  parallax>10

(for some fictional mapping of the data that has ra and dec in
points) with

SELECT
  some_catalog AS t,
  gaia AS g
WHERE 
  DISTANCE(t.pos, g.pos)<1/3600.
  AND ABS(t.vmag-g.phot_mag_G_mean)<0.1)
  AND parallax>10

-- in the second case, the actual physics ("nearby stars") is harder
to discern, as is what the actual match criteria are.

Now scale that to four tables and you'll see one good reason for FX'
preference.

> Is it likely to make a difference in performance terms which of these
> two ways it's done?  I.e. do we need to specify here what's the
> preferred way to use a constraint like DISTANCE()<threshold,
> or is this something we can harmlessly leave to the taste of
> the query author?

In principle, the query planner should even it all out.

In practice, it's hard to predict.  But when we say "join conditions
should sit next to the join clause (and not in some WHERE clause
possibly lexically far remote)", that can help implementors when tuning
their query planners' parameters.

            -- Markus


More information about the dal mailing list