ADQL XMATCH
Tom McGlynn (NASA/GSFC Code 660.1)
tom.mcglynn at nasa.gov
Tue Feb 9 18:50:08 CET 2016
First I'd like to make sure I understand the context, since I was not
involved in the earlier discussions. Then I'll try to respond to the
specific question, and then I'll expand the discussion a little.
If I understand it, the xmatch function proposed here will return a 1 or
0 based purely upon two positions and a radius. Presumably the function
returns 1 if the two positions are within the specified radius of each
other and 0 otherwise, but maybe something else has been discussed. No
other information is used in the xmatch.
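A minimal sketch of the semantics described above, in Python rather than ADQL so it can be run directly. It assumes the four reals are spherical coordinates in degrees, that the radius is in degrees, and that "within" means a great-circle separation strictly less than the radius. The helper name angular_distance and the haversine formula are choices made for this illustration, not anything agreed for ADQL.

```python
import math


def angular_distance(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees; inputs are assumed to be in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form, which stays accurate for the small separations
    # typical of a cross-match.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def xmatch(ra1, dec1, ra2, dec2, rad):
    """Return 1 if the two positions lie within rad of each other, else 0.

    Only the two positions and the radius are used, as in the proposal
    as understood here.
    """
    return 1 if angular_distance(ra1, dec1, ra2, dec2) < rad else 0
```

Nothing here depends on which coordinate system the values are in, only on all four values sharing the same one.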
In that context I would prefer to use four real variables so that the
coordinate system is irrelevant. If there are functions that can create
point objects from coordinates or get the coordinates from point
objects, then the two approaches are equivalent in the functionality
they provide to users, but using reals is simpler to implement since we
can simply assume that whatever the coordinate system is, all four of
the values are using the same one.
However I think this overall approach is flawed. If we want to create a
logical function then we should do that. In our implementations of
existing geometry functions at the HEASARC we've found that the Postgres
query optimizer is confused by this idiom where we use a
function() = integer
substitution for a logical value. If you really want logical values,
then I think we should just implement xmatch that way. Of course given
that we've already implemented other functions this way that's probably
a boat that's already sailed.
However if the xmatch function is doing what I indicated above, I think
the whole function is superfluous.
Rather than
(xmatch(ra1,dec1,ra2,dec2,rad) = 1)
or if we use a logical value
(xmatch(ra1,dec1,ra2,dec2,rad))
it seems far more natural to use
(distance(ra1,dec1,ra2,dec2) < rad)
This is clear and easily implemented. In our experience it can be
translated into functions that can take advantage of spatial indices.
If users want to deal with radii from two separate tables, per Arnold,
they can write the constraint as
(distance(t1.ra,t1.dec,t2.ra,t2.dec) < (t1_rad + t2_rad))
and the dependence on both tables' radius limits is made explicit. Using
the distance function it is easy to make a binary decision, but it can
also be used if we don't want to make a binary choice but would rather
work in a probabilistic space.
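A small sketch of the two-table case, again in Python for runnability. The row layout (dicts with id, ra, dec and rad keys), the sample values and the function name match_pairs are hypothetical, and distance is assumed to be a great-circle separation in degrees. A real service would rely on a spatial index rather than the nested loop shown here.

```python
import math


def distance(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def match_pairs(t1, t2):
    """Return id pairs whose separation is below the sum of both rows' radii."""
    return [
        (a["id"], b["id"])
        for a in t1
        for b in t2
        if distance(a["ra"], a["dec"], b["ra"], b["dec"]) < a["rad"] + b["rad"]
    ]


# Hypothetical sample rows: a1 and b1 are 0.0015 deg apart, which exceeds
# either radius alone (0.001) but not their sum (0.002).
t1 = [
    {"id": "a1", "ra": 10.0, "dec": 20.0, "rad": 0.001},
    {"id": "a2", "ra": 50.0, "dec": -5.0, "rad": 0.001},
]
t2 = [
    {"id": "b1", "ra": 10.0, "dec": 20.0015, "rad": 0.001},
    {"id": "b2", "ra": 120.0, "dec": 30.0, "rad": 0.001},
]
pairs = match_pairs(t1, t2)
```

The comparison is written by the caller, so swapping the summed radii for any other threshold needs no change to distance itself.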
E.g., maybe we want to use
exp(-distance(t1.ra,t1.dec,t2.ra,t2.dec)/(t1_rad+t2_rad)) *
exp(-abs(t1.mag-t2.mag)/mag_err) * exp(-abs(t1.size-t2.size))
as a probabilistic criterion for the match.
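The expression above can be tried out with a short Python sketch. The row keys (ra, dec, rad, mag, size), the function name match_score and the sample numbers are hypothetical, mag_err is treated as a single caller-supplied value, and distance is assumed to be a great-circle separation in degrees. This only illustrates how a real-valued distance composes with other terms; it is not a recommended likelihood model.

```python
import math


def distance(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def match_score(r1, r2, mag_err):
    """Product of the three exponential terms from the expression above."""
    positional = math.exp(
        -distance(r1["ra"], r1["dec"], r2["ra"], r2["dec"])
        / (r1["rad"] + r2["rad"])
    )
    brightness = math.exp(-abs(r1["mag"] - r2["mag"]) / mag_err)
    extent = math.exp(-abs(r1["size"] - r2["size"]))
    return positional * brightness * extent


# Hypothetical rows: 'near' differs slightly from 'source', 'far' differs more.
source = {"ra": 10.0, "dec": 20.0, "rad": 0.001, "mag": 15.0, "size": 1.0}
near = {"ra": 10.0, "dec": 20.0005, "rad": 0.001, "mag": 15.1, "size": 1.0}
far = {"ra": 10.0, "dec": 20.003, "rad": 0.001, "mag": 16.0, "size": 1.5}

score_near = match_score(source, near, mag_err=0.2)
score_far = match_score(source, far, mag_err=0.2)
```

A user who still wants a yes/no answer can compare the score against a threshold of their own choosing.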
Personally I have nothing against using some more complex function
allowing for elliptical errors and such, but these would still be easier
to understand and more powerful to use if they returned a real number
rather than performing the logical comparison, which the
user can easily add themselves if that's what they want.
The case for a special xmatch in the old SkyServer days was that xmatch
was a join between rows of tables and it could use columns not
explicitly referenced in the xmatch function in determining the quality
of the match. I thought then, and still do, that this kind of usage
of implicit columns is bad practice -- it's completely against the
standard usages of SQL. ADQL shares it in REGIONs but I would prefer
to see it expunged.
Tom
Marco Molinaro wrote:
> Dear DAL members and ADQL fans,
> to go on with the ADQL-2.1 working draft
> one issue is left, from Sydney interop,
> to be discussed.
>
> In the DAL splinter at the interop
> it was agreed to add an XMATCH function
> of binary type and definition
>
> 1 = XMATCH(a,b,radius)
>
> However no agreement was reached about
> the 'a' and 'b' parameters, whether they
> should be points (ADQL:POINT) or RA&Dec
> couples (floating point values).
>
> Both choices have advantages and disadvantages.
> Points are more into the logic
> of a sky cross-match but require geometric
> types to be directly available to the DB.
> Coordinates couples are directly available
> in whatever DB and would also let the XMATCH
> function work for non-orthodox coordinates
> matching, but of course loosing the sky matching
> logic.
>
> As I said (also due to time constraints) no
> agreement was found in Sydney.
>
> What's your opinion on this, and why?
>
> Cheers,
> Marco