ADQL 2.1: Preferred crossmatch syntax
Mark Taylor
m.b.taylor at bristol.ac.uk
Mon Nov 6 16:23:30 CET 2017
On Mon, 6 Nov 2017, Markus Demleitner wrote:
> Hi DAL,
>
> On Fri, Nov 03, 2017 at 09:26:59PM +0000, Mark Taylor wrote:
> > On Fri, 3 Nov 2017, Francois-Xavier PINEAU wrote:
> > > Although I agree with the approach, there is one point that bothers me.
> > >
> > > I know that not everyone agrees on the matter, but I personally
> > > prefer to declare the cross-match condition in the JOIN (like in
> > > your InteropOct2015DAL presentation) rather than in the WHERE.
> > >
> > > In fact, in relational algebra, I prefer to see the cross-match as a
> > > theta-join, not as a selection on the result of a cross-product.
>
> +1 to FX' proposal; it's a good thing to be explicit on what's a join
> condition and what's a constraint on the resulting relation.
Sounds reasonable enough to me, so unless anybody speaks up against,
let's go with that instead.
So if we drop the business about order of positions in the DISTANCE
evaluation, and prefer the condition in the JOIN rather than the
WHERE clause (along with a few minor wording changes that have
occurred to me in the mean time) we have:
\subsection{Preferred Crossmatch Syntax}
An especially common operation that astronomers require when working
with source catalogues is the positional sky crossmatch.
In its simplest form this is a join between two tables with the
requirement that the distance along a great circle between the
sky positions of the two associated rows is less than or equal to
a given threshold.
The geometrical functions provided by ADQL offer a number of
semantically equivalent ways to specify such a condition in
the JOIN or WHERE clause, for instance using various
combinations of POINT, CIRCLE and DISTANCE.
While a correct implementation will generate the same result for
any of these specifications, the performance characteristics may
differ dramatically depending on implementation.
Given this, it is difficult for (human or machine) ADQL authors
to know how to phrase a crossmatch with the expectation that it
will be executed efficiently, and difficult for services to know
which forms of query to optimise. The result can be the
unnecessarily slow operation of the common sky crossmatch operation.
The purpose of this section is to propose a preferred form of ADQL
to use for sky crossmatches. Clients posing crossmatch-like
queries are advised to phrase them like this rather than semantically
equivalent alternatives, and services are encouraged to ensure that
this form of join is executed efficiently; this might involve
identifying such ADQL input clauses and rewriting them appropriately
for efficient processing on the database backend.
The preferred way to specify a sky position-only crossmatch is:
\begin{verbatim}
JOIN ... ON DISTANCE(lon1, lat1, lon2, lat2) < r_max_deg
\end{verbatim}
or equivalently
\begin{verbatim}
JOIN ... ON DISTANCE(POINT(lon1, lat1), POINT(lon2, lat2)) < r_max_deg
\end{verbatim}
Alternative semantically equivalent forms however MAY still be
used by clients, and MUST still be handled correctly by services.
Any new or old objections, please post them.
Mark
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776 http://www.star.bris.ac.uk/~mbt/
More information about the dal
mailing list