ADQL 2.1: Preferred crossmatch syntax
Francois-Xavier PINEAU
francois-xavier.pineau at astro.unistra.fr
Fri Nov 3 11:29:06 CET 2017
Dear Mark and DAL,
Although I agree with the approach, there is one point that bothers me.
I know that not everyone agrees on the matter, but I personally prefer
to declare the cross-match condition in the JOIN (like in your
InteropOct2015DAL presentation) rather than in the WHERE.
In fact, in relational algebra, I prefer to see the cross-match as a
theta-join, not as a selection on the result of a cross-product.
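
To make the two phrasings concrete, here is a minimal sketch, not an ADQL service: it uses an in-memory SQLite database as a stand-in and registers a hypothetical four-argument DISTANCE function (haversine, in degrees). The table names cat1 and cat2, the helper distance_deg and the sample coordinates are all assumptions made for illustration. It only shows that the JOIN ... ON form and the cross-product plus WHERE form select the same pairs; it says nothing about how a real backend would plan or optimise either one.

```python
import math
import sqlite3


def distance_deg(lon1, lat1, lon2, lat2):
    """Great-circle distance in degrees between two sky positions (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the ADQL DISTANCE(lon1, lat1, lon2, lat2) function.
conn.create_function("DISTANCE", 4, distance_deg)
conn.executescript("""
CREATE TABLE cat1 (id INTEGER, lon REAL, lat REAL);
CREATE TABLE cat2 (id INTEGER, lon REAL, lat REAL);
INSERT INTO cat1 VALUES (1, 10.0, 20.0), (2, 150.0, -30.0);
INSERT INTO cat2 VALUES (10, 10.0002, 20.0001), (20, 200.0, 45.0);
""")

R_MAX_DEG = 0.001  # illustrative match radius in degrees

# Crossmatch condition declared in the JOIN (theta-join).
join_form = conn.execute(
    "SELECT a.id, b.id FROM cat1 AS a JOIN cat2 AS b "
    "ON DISTANCE(a.lon, a.lat, b.lon, b.lat) < ? "
    "ORDER BY a.id, b.id",
    (R_MAX_DEG,),
).fetchall()

# Same condition as a selection on the cross-product.
where_form = conn.execute(
    "SELECT a.id, b.id FROM cat1 AS a, cat2 AS b "
    "WHERE DISTANCE(a.lon, a.lat, b.lon, b.lat) < ? "
    "ORDER BY a.id, b.id",
    (R_MAX_DEG,),
).fetchall()

print(join_form == where_form)
```

Both queries return the same set of matched pairs here, so the choice between them is about how the intent reads and how easily a service can recognise it, not about the result.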
Best regards,
fx
On 02/11/2017 at 13:26, Mark Taylor wrote:
> Dear DAL,
>
> as discussed briefly in Chile, and also at previous interops etc.
> (see e.g. http://mail.ivoa.net/pipermail/dal/2016-February/007331.html
> and its context) I would like to see a preferred/recommended syntax
> for positional sky crossmatching in ADQL.
>
> I have drafted a section for possible inclusion in the ADQL 2.1 draft.
> Here it is:
>
> \subsection{Preferred Crossmatch Syntax}
>
> An especially common operation that astronomers require when working
> with source catalogues is the positional sky crossmatch.
> In its simplest form this is a join between two tables with the
> requirement that the distance along a great circle between the
> sky positions of the two associated rows is less than or equal to
> a given threshold.
>
> The geometrical operations provided by ADQL offer a number of
> semantically equivalent ways to specify such a condition in
> the WHERE clause,
> for instance using various combinations of POINT, CIRCLE and DISTANCE.
> While a correct implementation will generate the same result for
> any of these specifications, the performance characteristics may
> differ dramatically depending on implementation.
> Given this, it is difficult for (human or machine) ADQL authors
> to know how to phrase a crossmatch with the expectation that it
> will be executed efficiently, and difficult for services to know
> which forms of query to optimise.
> The result can be the unnecessarily slow operation of the common
> sky crossmatch operation.
>
> The purpose of this section is to propose a preferred form of the WHERE
> clause used for sky crossmatches. Clients posing crossmatch-like
> queries are advised to phrase them like this rather than semantically
> equivalent alternatives, and services are encouraged to ensure that
> this form of join constraint is evaluated efficiently;
> this might involve identifying such ADQL input clauses and rewriting
> them appropriately for efficient processing on the database backend.
> Alternative semantically equivalent forms however MAY still be
> used by clients, and MUST still be handled correctly by services.
>
> The preferred form of WHERE clause to constrain a sky position
> crossmatch is
> \begin{verbatim}
> WHERE DISTANCE(lon1, lat1, lon2, lat2) < r_max_deg
> \end{verbatim}
> or equivalently
> \begin{verbatim}
> WHERE DISTANCE(POINT(lon1, lat1), POINT(lon2, lat2)) < r_max_deg
> \end{verbatim}
>
> In some cases the performance characteristics will depend on which
> way round the coordinates are specified.
> The preferred form is that the coordinate pair more likely to be handled
> efficiently (with an index, or from a small table)
> should be in the {\em first\/} position ({\tt lon1}, {\tt lat1})
> and the pair less likely to be handled efficiently
> (without an index, or from a large or maybe uploaded table)
> should be in the {\em second\/} position ({\tt lon2}, {\tt lat2}).
>
>
> I think this (sub)section should probably go near the end of
> section 4.2, but you could argue for other places in sec 4,
> or maybe even section 2.
>
> Questions on the details:
>
> - Is DISTANCE (in 2 flavours) the best syntax to use here?
> I think it's what we agreed on in Cape Town, partly on the
> grounds that its meaning is transparent and it doesn't require
> any new ADQL syntax. But I don't care that much what the
> syntax is, as long as it's agreed. The other possibility
> that I know has been used (is currently recommended in TOPCAT,
> but never to my knowledge explicitly endorsed in any standard
> or Note) is
> 1=CONTAINS(POINT(lon1, lat1), CIRCLE(lon2, lat2, r_max_deg)).
> But that looks much less user-friendly to me.
>
> - Is this promotion of a preferred crossmatch form a good idea?
> I think it is, for the reasons sketched above, but if anybody
> is unconvinced I can try to argue the case more strongly.
>
> - Does the wording about which point goes in first/second position
> make sense? Is it the right way round? I know that different
> services prefer different orders (e.g. TAPVizieR vs. DaCHS).
> Again, I don't care which it is, but if it's the sort of thing
> that TAP implementations can benefit from (and I have the
> impression that it is) there should be an agreement.
>
> - Any other wording changes that people want, fine. I'm not a
> database expert.
>
> Thanks for considering,
>
> Mark
>
> --
> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776 http://www.star.bris.ac.uk/~mbt/