ADQL XMATCH

Mark Taylor M.B.Taylor at bristol.ac.uk
Wed Feb 10 10:14:34 CET 2016


On Tue, 9 Feb 2016, Tom McGlynn (NASA/GSFC Code 660.1) wrote:

> If I understand it, the xmatch function proposed here will return a 1 or 0
> based purely upon two positions and a radius.  Presumably the function returns
> 1 if the two positions are within the specified radius of each other and 0
> otherwise, but maybe something else has been discussed.  No other information
> is used in the xmatch.

yes.

> In that context I would prefer to use four real variables so that coordinate
> system is irrelevant. If there are functions that can create point objects
> from coordinates or get the coordinates from point objects, then the two
> approaches are equivalent in the functionality they provide to users, but
> using reals is simpler to implement since we can simply assume that whatever
> the coordinate system is, all four of the values are using the same one.

agree.

> However I think this overall approach is flawed.  If we want to create a
> logical function then we should do that.  In our implementations of existing
> geometry functions at the HEASARC we've found that the Postgres query optimizer
> is confused by this idiom where we use a
>      function() = integer
> substitution for a logical value.  If you really want logical values, then I
> think we should just implement xmatch that way.  Of course given that we've
> already implemented other functions this way that's probably a boat that's
> already sailed.

A logical function certainly makes more sense here; 1=XMATCH(..) is
clunky and unintuitive.  However, as I understand it, there is no
logical type defined in ADQL, so it's not possible to define a new
function like that without significant changes to the ADQL syntax.

> However if the xmatch function is doing what I indicated above, I think the
> whole function is superfluous.
> 
> Rather than
>     (xmatch(ra1,dec1,ra2,dec2,rad) = 1)
> or if we use a logical value
>    (xmatch(ra1,dec1,ra2,dec2,rad))
> 
> it seems far more natural to use
>     (distance(ra1,dec1,ra2,dec2) < rad)
> 
> This is clear and easily implemented.  In our experience it can be translated
> into functions that can take advantage of spatial indices.

From a user point of view, I think that would be absolutely fine;
in fact as you say better than a dedicated XMATCH function
because it's more transparent and more flexible.  I was under
the impression that constraints written like that were difficult
for TAP implementors to use in a way that led to an efficient
crossmatch, and that the 1=CONTAINS(POINT,CIRCLE) business was
the recommended way to specify a performant crossmatch in ADQL.
However, I don't know of anywhere that's written down in a standard,
and maybe I'm just wrong about it.  I'm not at all knowledgeable
about the DB end of this, so I'm largely in the dark about what
makes sense here from an implementation point of view.
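
To make the equivalence under discussion concrete, here is a minimal
Python sketch of the semantics only. It says nothing about how a TAP
service or database would evaluate or index such a constraint. The
function names, the use of degrees for all arguments, and the choice of
the haversine formula are assumptions made for illustration; none of
them come from the ADQL standard or from the proposal itself.

```python
import math


def angular_distance_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions.

    All four inputs are assumed to be in degrees and in the same
    coordinate system (hypothetical helper, not an ADQL function).
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: well behaved for the small separations
    # typical of a crossmatch.
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp against rounding error before taking the arcsine.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def xmatch(ra1, dec1, ra2, dec2, rad):
    """Integer-valued form: 1 if the positions lie within rad degrees, else 0."""
    return 1 if angular_distance_deg(ra1, dec1, ra2, dec2) < rad else 0
```

Under these assumptions, xmatch(ra1, dec1, ra2, dec2, rad) == 1 and
angular_distance_deg(ra1, dec1, ra2, dec2) < rad select exactly the same
pairs, which is why the two spellings are interchangeable from the
user's side.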

My interest is that I want to be able to write example ADQL
queries and provide documentation to my ADQL-using users that
tell them how to perform a spatial crossmatch on the sky,
without too much ugly syntax.

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/

