ADQL XMATCH
Tom McGlynn (NASA/GSFC Code 660.1)
tom.mcglynn at nasa.gov
Wed Feb 10 18:21:54 CET 2016
Two thoughts: one minor and one more significant....
I think the issue of whether we use Points or coordinates is a
relatively minor one. Personally I'd think that in a sane world one
would implement both. Since one can be transformed to the other with
essentially a single line of code, giving users both to match what
they want to do makes sense to me. The time we've spent on this debate
is probably greater than the time it would take to do this for a typical
implementation. E.g., in pseudocode we can implement the point version
on top of the coordinate version as
    function xmatch(Point a, Point b, radius) {
        return xmatch(a.getCoordinate(0), a.getCoordinate(1),
                      b.getCoordinate(0), b.getCoordinate(1), radius)
    }
or if we want to make the Point one more fundamental
    function xmatch(double ra1, double dec1, double ra2, double dec2,
                    radius) {
        return xmatch(new Point(ra1, dec1), new Point(ra2, dec2), radius);
    }
Not sure why a rule that one couldn't overload methods was promulgated.
Given that this changes not just the type but the number of arguments,
supporting these overloads should be easy, but if absolutely necessary
one could have slightly different names, e.g.,
distance(Point,Point) and distanceC(double,double,double,double).
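To make the argument-count point concrete, here is a minimal sketch in Python rather than ADQL. Everything in it is an assumption for illustration: the Point class, the names distance_c, distance and xmatch, the use of degrees, and the haversine formula for the separation. It only shows that the coordinate form can be the one real implementation, with the point form and the dispatch each taking a line or two on top of it.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Hypothetical point type: right ascension and declination in degrees."""
    ra: float
    dec: float


def distance_c(ra1, dec1, ra2, dec2):
    """Angular separation in degrees between two positions (haversine formula)."""
    phi1, phi2 = math.radians(dec1), math.radians(dec2)
    dphi = phi2 - phi1
    dlam = math.radians(ra2 - ra1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp before asin to guard against rounding slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def distance(*args):
    """Accept either (Point, Point) or (ra1, dec1, ra2, dec2)."""
    if len(args) == 2:
        a, b = args
        return distance_c(a.ra, a.dec, b.ra, b.dec)
    if len(args) == 4:
        return distance_c(*args)
    raise TypeError("distance() takes 2 points or 4 coordinates")


def xmatch(*args):
    """True when the two positions lie within `radius` degrees.

    The radius is always the last argument; the rest is passed to distance().
    """
    *position, radius = args
    return distance(*position) <= radius
```

The number of arguments alone is enough to pick the form, so no type inspection is needed; with distinct names, the same two functions would simply be exposed separately.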
More significantly: In deciding what functions to provide, it seems like
we should primarily be designing ADQL to support our astronomical use
cases. Regardless of what we choose it will be easy to build queries
which will fail to use indices optimally. This is true regardless of
the Point/coordinate distinction or the use of xmatch or whatever. E.g., in
my tests of q3c and pgsphere, apparently trivial changes in the query
could determine whether the indexes got used efficiently. I suspect
that the same will be true in other Postgres libraries and in
non-Postgres databases.
Walter gave a talk in Sydney describing elements of the geometry that
IRSA tables would be able to support and I think we need to build upon
that so that we have a recommended syntax for use in joining tables that
we all endeavor to support. I believe it is premature to base the ADQL
standard upon our preconceptions about what is easy or hard for the
query optimizer to support. The query optimizer is not our customer.
E.g., as I've mentioned, at the HEASARC we found that the xxx()=1 syntax
itself defeated the optimizer and we had to work to address that. So
regardless of what we decide to put in ADQL, we should suggest a
specific idiom that we will do our best to optimize. But we must
recognize that users may employ whatever functions we define in ways
that are likely to be non-optimized. And often that will be fine since
the tables will be small enough or some other constraint will catch the
eye of the optimizer.
Walter and Theresa's notes make it clear that we're looking inside
the query and adapting it to our specific implementations. I suspect
that at some level all of us are doing that and will continue to do so.
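As a rough illustration of that kind of rewriting, the sketch below recognizes one idiom and translates it before the query reaches the database. It is an assumption-laden toy, not anyone's actual parser: the four-argument distance() is the form under discussion rather than an existing ADQL function, rewrite_distance is a made-up name, the 'ICRS' frame is assumed, and a regular expression stands in for a real grammar (it only handles plain column names and numbers as arguments). The target here is the ADQL CONTAINS/POINT/CIRCLE form; a given site would emit whatever its own database optimizes best.

```python
import re

# One argument: a plain column name or number, with no nested parentheses
# or commas. Surrounding whitespace is skipped.
_ARG = r"\s*([^(),]+?)\s*"

# Matches the hypothetical idiom  distance(a, b, c, d) < r
_DISTANCE_LT = re.compile(
    r"\bdistance\(" + _ARG + "," + _ARG + "," + _ARG + "," + _ARG
    + r"\)\s*<\s*([\w.]+)",
    re.IGNORECASE,
)


def rewrite_distance(adql):
    """Replace each distance(...) < r with a CONTAINS(POINT, CIRCLE) test."""
    def to_contains(match):
        ra1, dec1, ra2, dec2, radius = match.groups()
        return (f"1 = CONTAINS(POINT('ICRS', {ra1}, {dec1}), "
                f"CIRCLE('ICRS', {ra2}, {dec2}, {radius}))")
    return _DISTANCE_LT.sub(to_contains, adql)


query = "SELECT * FROM a JOIN b ON distance(a.ra, a.dec, b.ra, b.dec) < 0.001"
print(rewrite_distance(query))
```

Queries that use distance() in any other shape pass through unchanged, which is the point: one recommended idiom gets the fast path, and everything else still works, just without the promise of optimization. Note that the strict "<" becomes a containment test, so points exactly on the circle may be treated differently.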
Tom
Theresa Dower wrote:
> While the issue of distance() being overloaded in ADQL remains, I wanted to note that for our use of SQL Server at STScI, we do enough query rewriting already that a translation from [something like] distance(....) would be basically the same work as we already have to do with contains().
>
> I echo Alex's concern about calling something simple 'xmatch' when it isn't, or effectively putting the burden of a better crossmatch implementation on service providers. Something like distance(...) would be more honest, though I have no suggestion for quite what to call it without overloading that function.
>
> --Theresa
>
> -----Original Message-----
> From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of Walter Landry
> Sent: Wednesday, February 10, 2016 9:46 AM
> To: dal at ivoa.net
> Subject: Re: ADQL XMATCH
>
> Grégory Mantelet <gmantele at ari.uni-heidelberg.de> wrote:
>> However, this kind of expression is performed by a sequential scan in
>> the database. As far as I know, there is no way to index or optimize
>> such constraint in a database (but I may be wrong so correct me if
>> needed). On the contrary, "contains(point, circle)" can use an index
>> (using PgSphere+Postgres for instance). So, I agree, it is ugly, but
>> it is more efficient.
>>
>> Then, maybe it is also possible to use some trick like detecting
>> "distance(ra1,dec1,ra2,dec2) < something" inside the ADQL query and
>> translate it into the equivalent of "contains(point,circle)" in
>> SQL....but it is really an ugly trick and may not be so trivial to
>> implement.
> In my parser (I can not speak for others), implementing this is just recognizing this pattern to be semantically the same as CONTAINS. It would be about the same amount of work as changing the parser to recognize XMATCH. So not very much at all. I think this is the easiest, most intuitive way forward, particularly with point literals.
>
> Cheers,
> Walter Landry