ADQL 2.1: Preferred crossmatch syntax

Mark Taylor m.b.taylor at bristol.ac.uk
Mon Nov 6 16:23:30 CET 2017


On Mon, 6 Nov 2017, Markus Demleitner wrote:

> Hi DAL,
> 
> On Fri, Nov 03, 2017 at 09:26:59PM +0000, Mark Taylor wrote:
> > On Fri, 3 Nov 2017, Francois-Xavier PINEAU wrote:
> > > Although I agree with the approach, there is one point that bothers me.
> > > 
> > > I know that not everyone agrees on the matter, but I personally
> > > prefer to declare the cross-match condition in the JOIN (like in
> > > your InteropOct2015DAL presentation) rather than in the WHERE.
> > >
> > > In fact, in relational algebra, I prefer to see the cross-match as a
> > > theta-join, not as a selection on the result of a cross-product.
> 
> +1 to FX' proposal; it's a good thing to be explicit on what's a join
> condition and what's a constraint on the resulting relation.

Sounds reasonable enough to me, so unless anybody speaks up against,
let's go with that instead.

So if we drop the business about order of positions in the DISTANCE
evaluation, and prefer the condition in the JOIN rather than the
WHERE clause (along with a few minor wording changes that have
occurred to me in the meantime), we have:

   \subsection{Preferred Crossmatch Syntax}

   An especially common operation that astronomers require when working
   with source catalogues is the positional sky crossmatch.
   In its simplest form this is a join between two tables with the
   requirement that the distance along a great circle between the
   sky positions of the two associated rows is less than or equal to
   a given threshold.

   The geometrical functions provided by ADQL offer a number of
   semantically equivalent ways to specify such a condition in
   the JOIN or WHERE clause, for instance using various 
   combinations of POINT, CIRCLE and DISTANCE.
   While a correct implementation will generate the same result for
   any of these specifications, the performance characteristics may
   differ dramatically depending on implementation.
   Given this, it is difficult for (human or machine) ADQL authors
   to know how to phrase a crossmatch with the expectation that it
   will be executed efficiently, and difficult for services to know
   which forms of query to optimise.  The result can be unnecessarily
   slow execution of the common sky crossmatch.

   The purpose of this section is to propose a preferred form of ADQL
   to use for sky crossmatches.  Clients posing crossmatch-like
   queries are advised to phrase them like this rather than semantically
   equivalent alternatives, and services are encouraged to ensure that
   this form of join is executed efficiently; this might involve 
   identifying such ADQL input clauses and rewriting them appropriately 
   for efficient processing on the database backend.

   The preferred way to specify a sky position-only crossmatch is:
   \begin{verbatim}
      JOIN ... ON DISTANCE(lon1, lat1, lon2, lat2) < r_max_deg
   \end{verbatim}
   or equivalently
   \begin{verbatim}
      JOIN ... ON DISTANCE(POINT(lon1, lat1), POINT(lon2, lat2)) < r_max_deg
   \end{verbatim}

   However, alternative semantically equivalent forms MAY still be
   used by clients, and MUST still be handled correctly by services.
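
To make the intended semantics of the join condition concrete, here is a
minimal brute-force sketch in Python of what a service has to compute for
the preferred form. It is an illustration only, not part of the proposed
text and not how any real service would execute the query. The function
names distance_deg and crossmatch are hypothetical, rows are assumed to be
(lon, lat) tuples in degrees, and the haversine formula is just one
convenient way to evaluate the great-circle distance.

```python
import math


def distance_deg(lon1, lat1, lon2, lat2):
    """Great-circle distance in degrees between two positions given in degrees.

    Hypothetical stand-in for the 4-argument DISTANCE in the draft text.
    """
    l1, b1, l2, b2 = map(math.radians, (lon1, lat1, lon2, lat2))
    # Haversine formula: well behaved for the small separations that
    # are typical of crossmatch radii.
    h = (math.sin((b2 - b1) / 2) ** 2
         + math.cos(b1) * math.cos(b2) * math.sin((l2 - l1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def crossmatch(table1, table2, r_max_deg):
    """Brute-force analogue of
    JOIN ... ON DISTANCE(lon1, lat1, lon2, lat2) < r_max_deg

    Assumes each row is a (lon, lat) tuple in degrees; returns matched pairs.
    """
    return [
        (r1, r2)
        for r1 in table1
        for r2 in table2
        if distance_deg(r1[0], r1[1], r2[0], r2[1]) < r_max_deg
    ]
```

This pairwise loop scales as the product of the two table sizes. That
cost is the reason the draft encourages services to recognise the
preferred form and hand it to an indexed implementation on the backend.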

Any new or old objections, please post them.

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/

