ADQL XMATCH
Juan Gonzalez
juan.gonzalez at sciops.esa.int
Wed Feb 10 14:50:00 CET 2016
Dear DAL WG,
In the context of the Gaia archive we are also evaluating this kind of
extension for integrating crossmatch in ADQL. Our view is that we shall
concentrate the syntax extensions on what is common to all possible
crossmatch computations, and leave the peculiarities of what is
implemented on each infrastructure to a set of separate
functions. It is very likely that we will have different crossmatch
functionalities on each service, and the ADQL syntax shall not constrain
them or their future evolution. This interesting
discussion thread seems like the best proof of that.
Our current view goes in this direction (with create table statements,
supported by services like ours with persistent storage):
    CREATE TABLE g10_gal_tycho2_1s AS
    CROSSMATCH g10_gal AS g WITH tycho2 AS t
    ESTIMATOR spherical_distance(g, t) AS distance
    ON distance < 1.0/3600
That is, the syntax includes:
- A set of ESTIMATOR clauses that define the likelihood estimators and
  other measures, providing accuracy estimates based on functions that
  exist in that service's DB.
- An ON clause that sets the cut criteria based on the
  estimators/measures computed by the previous clause.
Depending on how the service implements the computation, some of
these may be mandatory; for instance, if the computation is based on a
first positional cut, as in our service, one distance measure shall be
mandatory. Implemented functions can be declared in the Registry record
for the service, so that apps are aware of them.
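To make the intended semantics concrete, here is a minimal Python sketch of what a single distance estimator plus an ON cut would compute. It is only an illustration, not how our service does it: the brute-force loop, the row layout (id/ra/dec in degrees) and the helper names are assumptions made for this example, and a real service would rely on spatial indices instead.

```python
import math

def spherical_distance(ra1, dec1, ra2, dec2):
    """Angular separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def crossmatch(left, right, max_distance):
    """Brute-force pairing: keep (left id, right id, distance) for pairs within the cut."""
    matches = []
    for g in left:
        for t in right:
            # ESTIMATOR: one measure computed per candidate pair
            distance = spherical_distance(g["ra"], g["dec"], t["ra"], t["dec"])
            # ON: cut applied to the estimator value
            if distance < max_distance:
                matches.append((g["id"], t["id"], distance))
    return matches

# Hypothetical rows standing in for the two catalogues
g10_gal = [{"id": "g1", "ra": 10.0, "dec": 20.0},
           {"id": "g2", "ra": 50.0, "dec": -5.0}]
tycho2 = [{"id": "t1", "ra": 10.0001, "dec": 20.0001},
          {"id": "t2", "ra": 80.0, "dec": 30.0}]

result = crossmatch(g10_gal, tycho2, 1 / 3600)  # 1 arcsec expressed in degrees
```

With these sample rows only the pair (g1, t1) survives the cut, since the two positions are about half an arcsecond apart.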
We have recently been discussing this with Tamas, who has referred us
to the work that you, Alex and Laszlo, have done as part of SkyQuery. Your
service has tackled crossmatching syntax in the most comprehensive way,
and I strongly believe in Alex's statement that we shall think about how
to adapt a powerful syntax to simpler use cases, and not the other
way around.
This syntax may be extended as follows, incorporating some additional
estimators as well as other ideas from SkyQuery:
    CREATE TABLE g10_gal_tycho2_1s AS
    CROSSMATCH tycho2 AS t WITH g10_gal AS g
    ESTIMATOR spherical_distance(g, t) AS distance,
              ABS(ABS(g.g_mag)-ABS(t.g_mag)) AS mag_diff,
              bayesian_likelihood(g, t) AS likelihood,
              position_estimate(g, t) AS pos_est
    ON distance < 1.0/3600 AND likelihood > 0.5
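As a rough sketch of how several named estimators and a combined ON condition fit together, the following Python treats each estimator as a function of a candidate pair and the ON clause as a predicate over the named results. Everything here is an assumption for illustration: the row layout, the simplified mag_diff, and in particular toy_likelihood, which is a made-up Gaussian stand-in and not the bayesian_likelihood function of any service.

```python
import math

def spherical_distance(g, t):
    """Angular separation in degrees between two rows with 'ra'/'dec' in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (g["ra"], g["dec"], t["ra"], t["dec"]))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def toy_likelihood(g, t, sigma=0.5 / 3600):
    """Hypothetical stand-in for a likelihood: Gaussian fall-off with separation."""
    return math.exp(-0.5 * (spherical_distance(g, t) / sigma) ** 2)

# ESTIMATOR clause: name -> function of the candidate pair
ESTIMATORS = {
    "distance": spherical_distance,
    "mag_diff": lambda g, t: abs(g["mag"] - t["mag"]),
    "likelihood": toy_likelihood,
}

def crossmatch(left, right, estimators, on):
    """Evaluate every estimator per pair and keep the pairs accepted by `on`."""
    rows = []
    for g in left:
        for t in right:
            est = {name: f(g, t) for name, f in estimators.items()}
            if on(est):  # ON clause: cut over the named estimator values
                rows.append({"g_id": g["id"], "t_id": t["id"], **est})
    return rows

# Hypothetical sample rows
g10_gal = [{"id": "g1", "ra": 10.0, "dec": 20.0, "mag": 11.2}]
tycho2 = [{"id": "t1", "ra": 10.0001, "dec": 20.0001, "mag": 11.0},
          {"id": "t2", "ra": 10.0002, "dec": 20.0002, "mag": 11.1}]

rows = crossmatch(
    g10_gal, tycho2, ESTIMATORS,
    on=lambda e: e["distance"] < 1 / 3600 and e["likelihood"] > 0.5,
)
```

In this sample both Tycho-2 rows fall within one arcsecond of g1, but only t1 also passes the likelihood cut, so the output table keeps a single row carrying all the estimator values as columns.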
Best Regards,
Juan
On 02/10/2016 10:14 AM, Mark Taylor wrote:
> On Tue, 9 Feb 2016, Tom McGlynn (NASA/GSFC Code 660.1) wrote:
>
>> If I understand it, the xmatch function proposed here will return a 1 or 0
>> based purely upon two positions and a radius. Presumably the function returns
>> 1 if the two positions are within the specified radius of each other and 0
>> otherwise, but maybe something else has been discussed. No other information
>> is used in the xmatch.
> yes.
>
>> In that context I would prefer to use four real variables so that coordinate
>> system is irrelevant. If there are functions that can create point objects
>> from coordinates or get the coordinates from point objects, then the two
>> approaches are equivalent in the functionality they provide to users, but
>> using reals is simpler to implement since we can simply assume that whatever
>> the coordinate system is, all four of the values are using the same one.
> agree.
>
>> However I think this overall approach is flawed. If we want to create a
>> logical function then we should do that. In our implementations of existing
>> geometry functions at the HEASARC we've found that the Postgres query optimizer
>> is confused by this idiom where we use a
>> function() = integer
>> substitution for a logical value. If you really want logical values, then I
>> think we should just implement xmatch that way. Of course given that we've
>> already implemented other functions this way that's probably a boat that's
>> already sailed.
> A logical function certainly makes more sense here; 1=XMATCH(..) is
> clunky and unintuitive. However, as I understand it, there is no
> logical type defined in ADQL, so it's not possible to define a new
> function like that without significant changes to the ADQL syntax.
>
>> However if the xmatch function is doing what I indicated above, I think the
>> whole function is superfluous.
>>
>> Rather than
>> (xmatch(ra1,dec1,ra2,dec2,rad) = 1)
>> or if we use a logical value
>> (xmatch(ra1,dec1,ra2,dec2,rad))
>>
>> it seems far more natural to use
>> (distance(ra1,dec1,ra2,dec2) < rad)
>>
>> This is clear and easily implemented. In our experience it can be translated
>> into functions that can take advantage of spatial indices.
> From a user point of view, I think that would be absolutely fine;
> in fact as you say better than a dedicated XMATCH function
> because it's more transparent and more flexible. I was under
> the impression that constraints written like that were difficult
> for TAP implementors to use in a way that led to an efficient
> crossmatch, and that the 1=CONTAINS(POINT,CIRCLE) business was
> the recommended way to specify a performant crossmatch in ADQL.
> However, I don't know anywhere that's written down in a standard,
> and maybe I'm just wrong about it. I'm not at all knowledgeable
> about the DB end of this, so I'm largely in the dark about what
> makes sense here from an implementation point of view.
>
> My interest is that I want to be able to write example ADQL
> queries and provide documentation to my ADQL-using users that
> tell them how to perform a spatial crossmatch on the sky,
> without too much ugly syntax.
>
> Mark
>
> --
> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776 http://www.star.bris.ac.uk/~mbt/
--
Juan Gonzalez juan.gonzalez at sciops.esa.int
ESAC Science Archive Team
European Space Agency (ESA) - SERCO
European Space Astronomy Centre (ESAC)
28691 Villanueva de la Cañada Tel: +34 91 813 14 82
P.O. Box 78, Madrid, SPAIN Fax: +34 91 813 13 22