ADQL XMATCH

Alex Szalay szalay at jhu.edu
Tue Feb 9 20:55:33 CET 2016


I agree with Tom that this approach is completely flawed.
A spatial crossmatch is not a logical value; it is a posterior likelihood, or at
best a Bayes factor, which can (and should) be combined with additional attributes like shape, flux, color, etc.

This would give the user the misguided impression that we know what a true crossmatch is.

I most strongly oppose this.

--Alex


-----Original Message-----
From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of Tom McGlynn (NASA/GSFC Code 660.1)
Sent: Tuesday, February 09, 2016 12:50 PM
To: <dal at ivoa.net> <dal at ivoa.net>
Subject: Re: ADQL XMATCH

First I'd like to make sure I understand the context, since I was not involved in the earlier discussions, then I'll try to respond to the specific question, and then I'll expand the discussion a little:

If I understand it, the xmatch function proposed here will return a 1 or 0 based purely upon two positions and a radius.  Presumably the function returns 1 if the two positions are within the specified radius of each other and 0 otherwise, but maybe something else has been discussed.  No other information is used in the xmatch.

In that context I would prefer to use four real variables so that coordinate system is irrelevant. If there are functions that can create point objects from coordinates or get the coordinates from point objects, then the two approaches are equivalent in the functionality they provide to users, but using reals is simpler to implement since we can simply assume that whatever the coordinate system is, all four of the values are using the same one.

However I think this overall approach is flawed.  If we want to create a logical function then we should do that.  In our implementations of existing geometry functions at the HEASARC we've found that the Postgres query optimizer is confused by this idiom where we use a
      function() = integer
substitution for a logical value.  If you really want logical values, then I think we should just implement xmatch that way.  Of course, given that we've already implemented other functions this way, that's probably a boat that's already sailed.

However if the xmatch function is doing what I indicated above, I think the whole function is superfluous.

Rather than
     (xmatch(ra1,dec1,ra2,dec2,rad) = 1)
or, if we use a logical value,
     (xmatch(ra1,dec1,ra2,dec2,rad))

it seems far more natural to use
     (distance(ra1,dec1,ra2,dec2) < rad)

This is clear and easily implemented.  In our experience it can be translated into functions that can take advantage of spatial indices.
If users want to deal with radii from two separate tables, per Arnold, they can write the constraint as
   (distance(t1.ra,t1.dec, t2.ra,t2.dec) < (t1_rad + t2_rad))
and the dependence on both tables' radius limits is shown.  The distance function is easy to turn into a binary decision, but it can also be used if we don't want to make a binary choice and would rather work in a probabilistic space.
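
To make the comparison concrete, here is a minimal Python sketch of a distance function with a threshold applied on top of it. It assumes spherical coordinates in degrees and uses the haversine formula; neither choice is specified in the message above, and the helper name within() is hypothetical.

```python
import math


def distance(ra1, dec1, ra2, dec2):
    """Angular separation in degrees between two positions given in degrees.

    Assumption: both positions use the same spherical coordinate system.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for very small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def within(ra1, dec1, ra2, dec2, rad):
    """Binary decision layered on distance(); hypothetical helper name."""
    return distance(ra1, dec1, ra2, dec2) < rad
```

The binary test is a one-line wrapper here, and the same distance() value remains available to anything that wants a non-binary measure of the match.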

E.g., maybe we want to use
    exp(-distance(t1.ra,t1.dec,t2.ra,t2.dec)/(t1_rad+t2_rad)) *
    exp(-abs(t1.mag-t2.mag)/mag_err) *
    exp(-abs(t1.size-t2.size))
as a probabilistic criterion for the match.
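
As an illustration only, the product above could be sketched in Python as follows. The dictionary keys (ra, dec, rad, mag, size), the function name match_score(), and the use of degrees with a haversine separation are all assumptions made for this sketch. The result is an unnormalized score in (0, 1], not a calibrated probability.

```python
import math


def distance(ra1, dec1, ra2, dec2):
    """Angular separation in degrees (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def match_score(s1, s2, mag_err):
    """Product of the three exponential terms from the expression above.

    s1 and s2 are hypothetical row dictionaries with keys
    'ra', 'dec', 'rad', 'mag' and 'size'.
    """
    d = distance(s1["ra"], s1["dec"], s2["ra"], s2["dec"])
    positional = math.exp(-d / (s1["rad"] + s2["rad"]))
    photometric = math.exp(-abs(s1["mag"] - s2["mag"]) / mag_err)
    shape = math.exp(-abs(s1["size"] - s2["size"]))
    return positional * photometric * shape
```

A caller who still wants a yes/no answer can compare the returned score against a threshold of their own choosing.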


Personally I have nothing against using some more complex function
allowing for elliptical errors and such, but these would still be easier
to understand and more powerful to use if they returned a real number
rather than performing the logical comparison that the
user can easily add themselves if that's what they want.

The case for a special xmatch in the old SkyServer days was that xmatch 
was a join between rows of tables and it could use columns not 
explicitly referenced in the xmatch function in determining the quality 
of the match.  I thought then, and still do, that this kind of usage
of implicit columns is bad practice -- it's completely against the 
standard usages of SQL.  ADQL shares it in REGION's but I would prefer 
to see it expunged.

     Tom


Marco Molinaro wrote:
> Dear DAL members and ADQL fans,
> to go on with the ADQL-2.1 working draft
> one issue is left, from Sydney interop,
> to be discussed.
>
> In the DAL splinter at the interop
> it was agreed to add an XMATCH function
> of binary type and definition
>
> 1 = XMATCH(a,b,radius)
>
> However no agreement was reached about
> the 'a' and 'b' parameters, whether they
> should be points (ADQL:POINT) or RA&Dec
> couples (floating point values).
>
> Both choices have advantages and disadvantages.
> Points are more into the logic
> of a sky cross-match but require geometric
> types to be directly available to the DB.
> Coordinates couples are directly available
> in whatever DB and would also let the XMATCH
> function work for non-orthodox coordinates
> matching, but of course losing the sky matching
> logic.
>
> As I said (also due to time constraints) no
> agreement was found in Sydney.
>
> What's your opinion on this, and why?
>
> Cheers,
>     Marco


