ADQL XMATCH

Thu Feb 11 00:14:02 CET 2016

On Wed, 10 Feb 2016, Tom McGlynn (NASA/GSFC Code 660.1) wrote:

> More significantly: In deciding what functions to provide, it seems like we
> should primarily be designing ADQL to support our astronomical use cases.
> Regardless of what we choose it will be easy to build queries which will fail
> to use indices optimally. This is true regardless of Point/coordinate
> distinction or use of the xmatch or whatever.  E.g., in my tests of q3c and
> pgsphere apparently trivial changes in the query could determine whether the
> indexes get used efficiently.  I suspect that the same will be true in other
> Postgres libraries and in non-Postgres databases.
> 
> Walter gave a talk in Sydney describing elements of the geometry that IRSA
> tables would be able to support and I think we need to build upon that so that
> we have a recommended syntax for use in joining tables that we all endeavor to
> support.  I believe it is premature to base the ADQL standard upon our
> preconceptions about what is easy or hard for the query optimizer to support.
> The query optimizer is not our customer.  E.g., as I've mentioned at the
> HEASARC we found that the xxx()=1 syntax itself defeated the optimizer and we
> had to work to address that.  So regardless of what we decide to put in ADQL,
> we should suggest a specific idiom that we will do our best to optimize.  But

I agree with this.  I have been working under the assumption that
the 1=CONTAINS(POINT,CIRCLE) idiom is the "right" or recommended
way to do it, but honestly I'm not sure where I got that from.
For context, my gripes on this topic from the Sydney interop
(possibly it's what started this discussion, though if anybody
else wants to claim responsibility it's fine by me) can be found
on page 8 of this presentation:

   http://wiki.ivoa.net/internal/IVOA/InteropOct2015DAL/tap-feedback.pdf

So what I'd like to see is: some sort of agreed and documented
(ADQL standard? IVOA Note?) recommended way to write performant
sky positional "crossmatches", i.e. to find objects within a given
radius of other objects (yes I know this is a blunt instrument and
in many cases not good enough, but really it's a common requirement).
Such a 'standard' form of query would be used by TAP implementors
and ADQL writers alike, to take away the guesswork for both parties
about what to optimise.  And I'd like the syntax not to be horrible;
something like DISTANCE(ra1,dec1,ra2,dec2)<radius would be my preference.
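
To make the comparison concrete, here is a minimal sketch of the two
idioms side by side, plus a small Python function showing what the
distance test is meant to compute.  The table and column names (cat1,
cat2, id, ra, dec) and the 0.001 degree radius are invented for
illustration, and the four-argument DISTANCE is only the form suggested
above, not something ADQL 2.0 defines.  The two conditions are intended
to select the same pairs, up to how a service treats points lying
exactly on the circle boundary.

```python
import math

# Hypothetical tables cat1/cat2 with columns id, ra, dec (degrees).
# Idiom built from the ADQL 2.0 geometry functions:
CONTAINS_QUERY = """
SELECT a.id, b.id
FROM cat1 AS a JOIN cat2 AS b
  ON 1 = CONTAINS(POINT('ICRS', a.ra, a.dec),
                  CIRCLE('ICRS', b.ra, b.dec, 0.001))
"""

# Form suggested in this message (assumed syntax, not ADQL 2.0):
DISTANCE_QUERY = """
SELECT a.id, b.id
FROM cat1 AS a JOIN cat2 AS b
  ON DISTANCE(a.ra, a.dec, b.ra, b.dec) < 0.001
"""


def angular_distance_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to guard against rounding pushing sqrt(h) just above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def is_match(ra1, dec1, ra2, dec2, radius_deg):
    """The per-pair test that either join condition is meant to express."""
    return angular_distance_deg(ra1, dec1, ra2, dec2) < radius_deg
```

A service would not evaluate this pair by pair, of course; the point
of agreeing on one form is that implementors know which expression to
recognise and hand over to a spatial index.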

> we must recognize that users may employ whatever functions we define in ways
> that are likely to be non-optimized.  And often that will be fine since the
> tables will be small enough or some other constraint will catch the eye of the
> optimizer.

Yes, that's fine too.

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/