ADQL XMATCH

Juan Gonzalez juan.gonzalez at sciops.esa.int
Wed Feb 10 14:50:00 CET 2016


Dear DAL WG,

In the context of the Gaia archive we are also evaluating this kind of 
extension for integrating crossmatch into ADQL. Our view is that we 
should concentrate the syntax extensions on what is common to all 
possible crossmatch computations, and leave the peculiarities of what is 
implemented on each infrastructure to a set of different functions. It 
is very likely that we will have different crossmatch functionalities on 
each service, and ADQL syntax should not constrain these or their future 
evolution. What we are seeing in this interesting discussion thread 
seems like the best proof of this.

Our current view goes in this direction (with create table statements, 
supported by services like ours with persistent storage):

CREATE TABLE g10_gal_tycho2_1s AS

   CROSSMATCH g10_gal as g WITH tycho2 as t
     ESTIMATOR spherical_distance (g, t) as distance
   ON distance < 1/3600

That is, the syntax would include:

-A set of ESTIMATOR clauses that define the likelihood estimators, etc. 
that provide accuracy estimations based on existing functions in the 
service DB.
-An ON clause that sets the cut criteria based on the 
estimators/measures computed by the previous clause.

Depending on how the service implements the computation, some of these 
may be mandatory; for instance, if the computation is based on a first 
positional cut, as in our service, one distance measure would be 
mandatory. Implemented functions can be declared in the Registry record 
for the service, so that apps are aware of them.
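
To make the intended semantics concrete, here is a minimal Python sketch 
of what the ESTIMATOR and ON clauses in the example above might compute. 
It is only an illustration under stated assumptions: the row layout 
(dicts with "id", "ra", "dec" in degrees), the sample values, the 
haversine formula and the naive all-pairs loop are all assumptions of 
the sketch, not a description of how our service or any other 
implements spherical_distance.

```python
import math

def spherical_distance(ra1, dec1, ra2, dec2):
    """Angular separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))

def crossmatch(left, right, max_distance):
    """Naive all-pairs crossmatch: the ESTIMATOR computes `distance`,
    the ON clause keeps only pairs with distance < max_distance."""
    matches = []
    for g in left:
        for t in right:
            distance = spherical_distance(g["ra"], g["dec"], t["ra"], t["dec"])
            if distance < max_distance:
                matches.append((g["id"], t["id"], distance))
    return matches

# Hypothetical sample rows standing in for g10_gal and tycho2.
g10_gal = [{"id": "g1", "ra": 10.0, "dec": 20.0},
           {"id": "g2", "ra": 50.0, "dec": -5.0}]
tycho2 = [{"id": "t1", "ra": 10.0001, "dec": 20.0001},
          {"id": "t2", "ra": 120.0, "dec": 30.0}]

# One arcsecond expressed in degrees, as in "ON distance < 1/3600".
matches = crossmatch(g10_gal, tycho2, 1.0 / 3600.0)
```

A real service would use a spatial index for the first positional cut 
rather than an all-pairs loop. Note also that a back end with integer 
division (PostgreSQL, for instance) evaluates a literal 1/3600 to 0, so 
a service translating the ON clause may need to treat the cut as 
1.0/3600.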

We have recently been discussing this with Tamas, who has referred us 
to the work that you, Alex and Laszlo, have done as part of SkyQuery. 
Your service has tackled crossmatching syntax in the most comprehensive 
way, and I strongly believe in Alex's statement that we should think 
about how to adapt a powerful syntax to simpler use cases, and not the 
other way around.

This syntax may be extended as follows, incorporating some additional 
estimators as well as other ideas from SkyQuery:

CREATE TABLE g10_gal_tycho2_1s AS

   CROSSMATCH tycho2 as t WITH g10_gal as g

     ESTIMATOR spherical_distance(g, t) as distance,
               ABS(ABS(g.g_mag)-ABS(t.g_mag)) as mag_diff,
               bayesian_likelihood(g, t) as likelihood,
               position_estimate(g, t) as pos_est

   ON distance < 1/3600 AND likelihood > 0.5
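
One way to read the extended form is that each ESTIMATOR entry is a 
named function of the candidate pair, and the ON clause is a predicate 
over those named values. The Python sketch below illustrates only that 
reading. The row layout and sample magnitudes are assumptions, 
toy_likelihood is a hypothetical stand-in that is NOT a real Bayesian 
estimator, and the distance and position estimators are omitted to keep 
the example short.

```python
def mag_diff(g, t):
    # Mirrors ABS(ABS(g.g_mag)-ABS(t.g_mag)) from the example above.
    return abs(abs(g["g_mag"]) - abs(t["g_mag"]))

def toy_likelihood(g, t):
    # Hypothetical placeholder for bayesian_likelihood(g, t):
    # simply decreases as the magnitude difference grows.
    return 1.0 / (1.0 + mag_diff(g, t))

def crossmatch(left, right, estimators, on):
    """Evaluate every named estimator for each pair, then apply the cut."""
    rows = []
    for g in left:
        for t in right:
            values = {name: fn(g, t) for name, fn in estimators.items()}
            if on(values):
                # Estimator values become columns of the output table.
                rows.append({"g": g["id"], "t": t["id"], **values})
    return rows

# Hypothetical sample rows.
g10_gal = [{"id": "g1", "g_mag": 10.0}]
tycho2 = [{"id": "t1", "g_mag": 10.2}, {"id": "t2", "g_mag": 14.0}]

rows = crossmatch(
    g10_gal,
    tycho2,
    estimators={"mag_diff": mag_diff, "likelihood": toy_likelihood},
    on=lambda v: v["mag_diff"] < 0.5 and v["likelihood"] > 0.5,
)
```

Keeping the estimators as a plain name-to-function mapping reflects the 
idea that each service can plug in whatever functions it implements 
while the surrounding syntax stays the same.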


Best Regards,
Juan


On 02/10/2016 10:14 AM, Mark Taylor wrote:
> On Tue, 9 Feb 2016, Tom McGlynn (NASA/GSFC Code 660.1) wrote:
>
>> If I understand it, the xmatch function proposed here will return a 1 or 0
>> based purely upon two positions and a radius.  Presumably the function returns
>> 1 if the two positions are within the specified radius of each other and 0
>> otherwise, but maybe something else has been discussed.  No other information
>> is used in the xmatch.
> yes.
>
>> In that context I would prefer to use four real variables so that coordinate
>> system is irrelevant. If there are functions that can create point objects
>> from coordinates or get the coordinates from point objects, then the two
>> approaches are equivalent in the functionality they provide to users, but
>> using reals is simpler to implement since we can simply assume that whatever
>> the coordinate system is, all four of the values are using the same one.
> agree.
>
>> However I think this overall approach is flawed.  If we want to create a
>> logical function then we should do that.  In our implementations of existing
>> geometry functions at the HEASARC we've found that Postgres query  optimizer
>> is confused by this idiom where we use a
>>       function() = integer
>> substitution for a logical value.  If you really want logical values, then I
>> think we should just implement xmatch that way.  Of course given that we've
>> already implemented other functions this way that's probably a boat that's
>> already sailed.
> A logical function certainly makes more sense here; 1=XMATCH(..) is
> clunky and unintuitive.  However, as I understand it, there is no
> logical type defined in ADQL, so it's not possible to define a new
> function like that without significant changes to the ADQL syntax.
>
>> However if the xmatch function is doing what I indicated above, I think the
>> whole function is superfluous.
>>
>> Rather than
>>      (xmatch(ra1,dec1,ra2,dec2,rad) = 1)
>> or if we use a logical value
>>     (xmatch(ra1,dec1,ra2,dec2,rad))
>>
>> it seems far more natural to use
>>      (distance(ra1,dec1,ra2,dec2) < rad)
>>
>> This is clear and easily implemented.  In our experience it can be translated
>> into functions that can take advantage of spatial indices.
> From a user point of view, I think that would be absolutely fine;
> in fact as you say better than a dedicated XMATCH function
> because it's more transparent and more flexible.  I was under
> the impression that constraints written like that were difficult
> for TAP implementors to use in a way that led to an efficient
> crossmatch, and that the 1=CONTAINS(POINT,CIRCLE) business was
> the recommended way to specify a performant crossmatch in ADQL.
> However, I don't know anywhere that's written down in a standard,
> and maybe I'm just wrong about it.  I'm not at all knowledgeable
> about the DB end of this, so I'm largely in the dark about what
> makes sense here from an implementation point of view.
>
> My interest is that I want to be able to write example ADQL
> queries and provide documentation to my ADQL-using users that
> tell them how to perform a spatial crossmatch on the sky,
> without too much ugly syntax.
>
> Mark
>
> --
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/


-- 
Juan Gonzalez			    juan.gonzalez at sciops.esa.int
ESAC Science Archive Team
European Space Agency (ESA) - SERCO

European Space Astronomy Centre (ESAC)
28691 Villanueva de la Cañada		   Tel: +34 91 813 14 82
P.O. Box 78, Madrid, SPAIN		   Fax: +34 91 813 13 22
----------------------------------------------------------------

