ADQL 2.1: Preferred crossmatch syntax
Mark Taylor
m.b.taylor at bristol.ac.uk
Thu Nov 2 13:26:43 CET 2017
Dear DAL,
as discussed briefly in Chile, and also at previous interops etc
(see e.g. http://mail.ivoa.net/pipermail/dal/2016-February/007331.html
and its context) I would like to see a preferred/recommended syntax
for positional sky crossmatching in ADQL.
I have drafted a section for possible inclusion in the ADQL 2.1 draft.
Here it is:
\subsection{Preferred Crossmatch Syntax}
An especially common operation that astronomers require when working
with source catalogues is the positional sky crossmatch.
In its simplest form this is a join between two tables with the
requirement that the distance along a great circle between the
sky positions of the two associated rows is less than or equal to
a given threshold.
The geometrical operations provided by ADQL offer a number of
semantically equivalent ways to specify such a condition in
the WHERE clause,
for instance using various combinations of POINT, CIRCLE and DISTANCE.
While a correct implementation will generate the same result for
any of these specifications, the performance characteristics may
differ dramatically depending on implementation.
Given this, it is difficult for (human or machine) ADQL authors
to know how to phrase an crossmatch with the expectation that it
will be executed efficiently, and difficult for services to know
which forms of query to optimise.
The result can be the unnecessarily slow operation of the common
sky crossmatch operation.
The purpose of this section is to propose a preferred form of the WHERE
clause used for sky crossmatches. Clients posing crossmatch-like
queries are advised to phrase them like this rather than semantically
equivalent alternatives, and services are encouraged to ensure that
this form of join constraint is evaluated efficiently;
this might involve identifying such ADQL input clauses and rewriting
them appropriately for efficient processing on the database backend.
Alternative semantically equivalent forms however MAY still be
used by clients, and MUST still be handled correctly by services.
The preferred form of WHERE clause to constrain a sky position
crossmatch is
\begin{verbatim}
WHERE DISTANCE(lon1, lat1, lon2, lat2) < r_max_deg
\end{verbatim}
or equivalently
\begin{verbatim}
WHERE DISTANCE(POINT(lon1, lat1), POINT(lon2, lat2)) < r_max_deg
\end{verbatim}
In some cases the performance characteristics will depend on which
way round the coordinates are specified.
The preferred form is that the coordinate pair more likely to be handled
efficiently (with an index, or from a small table)
should be in the {\em first\/} position ({\tt lon1}, {\tt lat1})
and the pair less likely to be handled efficiently
(without an index, or from a large or maybe uploaded table)
should be in the {\em second\/} position ({\TT lon2}, {\tt lat2}).
I think this (sub)section should probably go near the end of
section 4.2, but you could argue for other places in sec 4,
or maybe even section 2.
Questions on the details:
- Is DISTANCE (in 2 flavours) the best syntax to use here?
I think it's what we agreed on in Cape Town, partly on the
grounds that its meaning is transparent and it doesn't require
any new ADQL syntax. But I don't care that much what the
syntax is, as long as it's agreed. The other possibility
that I know has been used (is currently recommended in TOPCAT,
but never to my knowledge explicitly endorsed in any standard
or Note) is
1=CONTAINS(POINT(lon1, lat1), CIRCLE(lon2, lat2, r_max_deg)).
But that looks much less user-friendly to me.
- Is this promotion of a preferred crossmatch form a good idea?
I think it is, for the reasons sketched above, but if anybody
is unconvinced I can try to argue the case more strongly.
- Does the wording about which point goes in first/second position
make sense? Is it the right way round? I know that different
services prefer different orders (e.g. TAPVizieR vs. DaCHS).
Again, I don't care which it is, but if it's the sort of thing
that TAP implementations can benefit from (and I have the
impression that it is) there should be an agreement.
- Any other wording changes that people want, fine. I'm not a
database expert.
Thanks for considering,
Mark
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776 http://www.star.bris.ac.uk/~mbt/
More information about the dal
mailing list