ADQL 2.1: Preferred crossmatch syntax

Francois-Xavier PINEAU francois-xavier.pineau at astro.unistra.fr
Fri Nov 3 11:29:06 CET 2017


Dear Mark and DAL,

Although I agree with the approach, there is one point that bothers me.

I know that not everyone agrees on the matter, but I personally prefer 
to declare the cross-match condition in the JOIN (like in your 
InteropOct2015DAL presentation) rather than in the WHERE.

In fact, in relational algebra, I prefer to see the cross-match as a 
theta-join, not as a selection on the result of a cross-product.

Best regards,


fx



Le 02/11/2017 à 13:26, Mark Taylor a écrit :
> Dear DAL,
>
> as discussed briefly in Chile, and also at previous interops etc
> (see e.g. http://mail.ivoa.net/pipermail/dal/2016-February/007331.html
> and its context) I would like to see a preferred/recommended syntax
> for positional sky crossmatching in ADQL.
>
> I have drafted a section for possible inclusion in the ADQL 2.1 draft.
> Here it is:
>
>     \subsection{Preferred Crossmatch Syntax}
>
>     An especially common operation that astronomers require when working
>     with source catalogues is the positional sky crossmatch.
>     In its simplest form this is a join between two tables with the
>     requirement that the distance along a great circle between the
>     sky positions of the two associated rows is less than or equal to
>     a given threshold.
>
>     The geometrical operations provided by ADQL offer a number of
>     semantically equivalent ways to specify such a condition in
>     the WHERE clause,
>     for instance using various combinations of POINT, CIRCLE and DISTANCE.
>     While a correct implementation will generate the same result for
>     any of these specifications, the performance characteristics may
>     differ dramatically depending on implementation.
>     Given this, it is difficult for (human or machine) ADQL authors
>     to know how to phrase an crossmatch with the expectation that it
>     will be executed efficiently, and difficult for services to know
>     which forms of query to optimise.
>     The result can be the unnecessarily slow operation of the common
>     sky crossmatch operation.
>
>     The purpose of this section is to propose a preferred form of the WHERE
>     clause used for sky crossmatches.  Clients posing crossmatch-like
>     queries are advised to phrase them like this rather than semantically
>     equivalent alternatives, and services are encouraged to ensure that
>     this form of join constraint is evaluated efficiently;
>     this might involve identifying such ADQL input clauses and rewriting
>     them appropriately for efficient processing on the database backend.
>     Alternative semantically equivalent forms however MAY still be
>     used by clients, and MUST still be handled correctly by services.
>
>     The preferred form of WHERE clause to constrain a sky position
>     crossmatch is
>     \begin{verbatim}
>        WHERE DISTANCE(lon1, lat1, lon2, lat2) < r_max_deg
>     \end{verbatim}
>     or equivalently
>     \begin{verbatim}
>        WHERE DISTANCE(POINT(lon1, lat1), POINT(lon2, lat2)) < r_max_deg
>     \end{verbatim}
>
>     In some cases the performance characteristics will depend on which
>     way round the coordinates are specified.
>     The preferred form is that the coordinate pair more likely to be handled
>     efficiently (with an index, or from a small table)
>     should be in the {\em first\/} position ({\tt lon1}, {\tt lat1})
>     and the pair less likely to be handled efficiently
>     (without an index, or from a large or maybe uploaded table)
>     should be in the {\em second\/} position ({\TT lon2}, {\tt lat2}).
>
>
> I think this (sub)section should probably go near the end of
> section 4.2, but you could argue for other places in sec 4,
> or maybe even section 2.
>
> Questions on the details:
>
>     - Is DISTANCE (in 2 flavours) the best syntax to use here?
>       I think it's what we agreed on in Cape Town, partly on the
>       grounds that its meaning is transparent and it doesn't require
>       any new ADQL syntax.  But I don't care that much what the
>       syntax is, as long as it's agreed.  The other possibility
>       that I know has been used (is currently recommended in TOPCAT,
>       but never to my knowledge explicitly endorsed in any standard
>       or Note) is
>       1=CONTAINS(POINT(lon1, lat1), CIRCLE(lon2, lat2, r_max_deg)).
>       But that looks much less user-friendly to me.
>
>     - Is this promotion of a preferred crossmatch form a good idea?
>       I think it is, for the reasons sketched above, but if anybody
>       is unconvinced I can try to argue the case more strongly.
>
>     - Does the wording about which point goes in first/second position
>       make sense?  Is it the right way round?  I know that different
>       services prefer different orders (e.g. TAPVizieR vs. DaCHS).
>       Again, I don't care which it is, but if it's the sort of thing
>       that TAP implementations can benefit from (and I have the
>       impression that it is) there should be an agreement.
>
>     - Any other wording changes that people want, fine.  I'm not a
>       database expert.
>
> Thanks for considering,
>
> Mark
>
> --
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/



More information about the dal mailing list