ADQL XMATCH

Grégory Mantelet gmantele at ari.uni-heidelberg.de
Wed Feb 10 14:41:12 CET 2016


Hi DAL,

> I see multiple benefits in using
>
>     distance(ra1,dec1,ra2,dec2) < something
>
> 1) That is intuitive for users
> 2) The operation actually done is unambiguous. That wouldn't be the 
> case with a function named Xmatch, for instance, since doing a cross 
> match can refer to a wide range of algorithms or processing
> 3) It gets rid of the pseudo boolean operator.
> 4) It is quite flexible: no constraint on both operator and operand 
> (e.g. < something, > somethingelse, = function(anything)... are valid)
> 5) One can also point out that this function could express a simple 
> cone search in a readable form
> e.g. distance(ra1,dec1,12.7,-13.8) < 10


     I agree with all these points and particularly the first one: it's 
clearly more intuitive for users.

     However, this kind of expression is evaluated with a sequential scan 
in the database. As far as I know, there is no way to index or optimize 
such a constraint in a database (but I may be wrong, so correct me if 
needed). In contrast, "contains(point, circle)" can use an index 
(using PgSphere+Postgres for instance). So, I agree, it is ugly, but it 
is more efficient.

     Then, maybe it is also possible to use some trick like detecting 
"distance(ra1,dec1,ra2,dec2) < something" inside the ADQL query and 
translating it into the equivalent of "contains(point,circle)" in 
SQL... but it is really an ugly trick and may not be so trivial to implement.
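
     To make this trick concrete, here is a minimal Python sketch of such a 
rewrite. It is only an illustration under stated assumptions: the function 
name rewrite_distance_constraint and the regular expression are hypothetical, 
it only recognises plain column names and numeric literals (no nested 
expressions), and the target SQL assumes PgSphere's spoint(lng, lat) and 
scircle(center, radius) constructors, which take radians, while the ADQL 
arguments are taken to be in degrees. A real translator would rather work on 
the parsed query tree than on the query text.

```python
import re

# One argument: a (possibly table-qualified) column name or a numeric literal.
# Assumption: nested expressions and function calls are NOT supported.
_ARG = r"\s*([A-Za-z_][\w.]*|[-+]?\d+(?:\.\d+)?)\s*"

# Matches: distance(a, b, c, d) < radius   (strict "<" only)
_DISTANCE_LT = re.compile(
    r"\bdistance\(" + _ARG + "," + _ARG + "," + _ARG + "," + _ARG + r"\)"
    r"\s*<\s*([-+]?\d+(?:\.\d+)?|[A-Za-z_][\w.]*)",
    re.IGNORECASE,
)


def rewrite_distance_constraint(adql: str) -> str:
    """Replace 'distance(ra1,dec1,ra2,dec2) < r' by a PgSphere containment test.

    Hypothetical helper: degrees are converted with radians() because
    PgSphere works in radians. Boundary handling (strict '<' versus
    containment) is ignored in this sketch.
    """

    def _to_pgsphere(match: re.Match) -> str:
        ra1, dec1, ra2, dec2, radius = match.groups()
        return (
            f"spoint(radians({ra1}), radians({dec1})) @ "
            f"scircle(spoint(radians({ra2}), radians({dec2})), radians({radius}))"
        )

    return _DISTANCE_LT.sub(_to_pgsphere, adql)
```

     With this sketch, the cone search "distance(ra1,dec1,12.7,-13.8) < 10" 
would come out as "spoint(radians(ra1), radians(dec1)) @ 
scircle(spoint(radians(12.7), radians(-13.8)), radians(10))". Anything the 
pattern does not recognise is left untouched, which is exactly why this 
approach is fragile.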

     All that said, I would prefer the syntax "xmatch(ra1, dec1, 
ra2, dec2, radius)" or "xmatch(point1, point2, radius)". Both can be 
easily translated into the equivalent of "contains(point,circle)" in SQL 
(i.e. "spoint @ scircle" for those who know PgSphere).

     However, considering all the discussions there have been about having a 
"better" crossmatch function, maybe the name of this function should be 
revised to make a clear distinction between this "cheap" crossmatch and 
the "proper" crossmatch (which may come in the future). Sorry, but I do 
not have any suggestion...

     About what this cheap "xmatch(...)" function should return, I also 
agree that a boolean value would be much better and more intuitive... but 
as Mark said, this type does not exist in ADQL, and adding it implies 
several changes in the ADQL language, which would not be something to do 
in a minor revision of the standard.
     Then, as to whether it should return an integer (0 or 1) or a 
floating-point number (double) for a likelihood, I also think that 
returning a likelihood makes more sense. However, from a database point of 
view, it means either adapting the database extension in use (at least 
PgSphere for Postgres) or creating a new function. So, I am not at all 
against the idea of returning a likelihood, but it should be kept in mind 
that this small change may delay the implementations that support it.

     About the parameters - "point" or "ra, dec" -, I would go for "ra, 
dec" because coordinates are always (?) provided in astronomical 
databases. If we choose to have parameters of type POINT, it will 
force the user to always create a POINT, as is done for CONTAINS and 
INTERSECTS... which is a bit painful for the user, in addition to being a 
source of syntax errors. So "xmatch(ra1, dec1, ra2, dec2, radius)" 
is my favorite function signature (except for the name, which could be 
changed, as said above).


> But: I think there is a conflict with the DISTANCE(POINT,POINT) 
> function since ADQL functions can not be overloaded. Is that right?


As a User Defined Function, it may indeed not be possible, since DISTANCE 
is a reserved keyword. But if this function is defined by the ADQL language 
itself, it should not be a problem, in my opinion, because the types and 
the number of parameters are not the same.

Cheers,
Grégory

PS: As a reminder:
     - "distance(ra1, dec1, ra2, dec2)" is equivalent to 
"distance(point('', ra1,dec1), point('', ra2, dec2))"... the latter is more 
verbose and painful to write, but it is exactly the same function.
     - similarly, ADQL provides functions to get the coordinates of a 
point: coord1(point) and coord2(point) (and also coordsys(point) for 
fans of coordinate systems in ADQL).
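
     For readers who want to see numerically what such a distance() and the 
"cheap" crossmatch test amount to, here is a small Python sketch. It assumes 
coordinates and radius in degrees and uses the haversine formula; the 
function names angular_distance_deg and within_radius are hypothetical, and 
this is not meant to describe how any database or TAP service actually 
implements it. As discussed in the thread, all four values are assumed to 
be in the same coordinate system.

```python
import math


def angular_distance_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: well behaved for the small separations
    # that are typical of positional crossmatches.
    h = (
        math.sin((dec2 - dec1) / 2) ** 2
        + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2
    )
    # min() guards against rounding pushing sqrt(h) slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def within_radius(ra1, dec1, ra2, dec2, radius_deg):
    """The 'distance(ra1,dec1,ra2,dec2) < radius' test as a plain boolean."""
    return angular_distance_deg(ra1, dec1, ra2, dec2) < radius_deg
```

     Evaluated row by row like this, the test has to visit every candidate 
pair, which is the sequential-scan concern raised above; an index-aware 
operator avoids that.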


> On 10/02/2016 10:14, Mark Taylor wrote:
>> On Tue, 9 Feb 2016, Tom McGlynn (NASA/GSFC Code 660.1) wrote:
>>
>>> If I understand it, the xmatch function proposed here will return a 
>>> 1 or 0
>>> based purely upon two positions and a radius.  Presumably the 
>>> function returns
>>> 1 if the two positions are within the specified radius of each other 
>>> and 0
>>> otherwise, but maybe something else has been discussed.  No other 
>>> information
>>> is used in the xmatch.
>>
>> yes.
>>
>>> In that context I would prefer to use four real variables so that 
>>> coordinate
>>> system is irrelevant. If there are functions that can create point 
>>> objects
>>> from coordinates or get the coordinates from point objects, then the 
>>> two
>>> approaches are equivalent in the functionality they provide to 
>>> users, but
>>> using reals is simpler to implement since we can simply assume that 
>>> whatever
>>> the coordinate system is, all four of the values are using the same 
>>> one.
> I'm a bit disturbed by this sentence: at the level of the language 
> definition, we cannot "assume that all four values are using the same 
> coordinate system". Making this assumption true is the responsibility 
> of the query author. The distance() function must be CooSys-neutral 
> in the sense that it just computes the distance between the 2 points 
> given by the parameters, without consideration of their frames.
> In the case of matching 2 tables with different frames, the ADQL 
> distance(POINT, POINT) should indeed be used.
>
>
> Cheers
> Laurent
>>
>> agree.
>>
>>> However I think this overall approach is flawed.  If we want to 
>>> create a
>>> logical function then we should do that.  In our implementations of 
>>> existing
>>> geometry functions at the HEASARC we've found that Postgres query  
>>> optimizer
>>> is confused by this idiom where we use a
>>>       function() = integer
>>> substitution for a logical value.  If you really want logical 
>>> values, then I
>>> think we should just implement xmatch that way.  Of course given 
>>> that we've
>>> already implemented other functions this way that's probably a boat 
>>> that's
>>> already sailed.
>>
>> A logical function certainly makes more sense here; 1=XMATCH(..) is
>> clunky and unintuitive.  However, as I understand it, there is no
>> logical type defined in ADQL, so it's not possible to define a new
>> function like that without significant changes to the ADQL syntax.
>>
>>> However if the xmatch function is doing what I indicated above, I 
>>> think the
>>> whole function is superfluous.
>>>
>>> Rather than
>>>      (xmatch(ra1,dec1,ra2,dec2,rad) = 1)
>>> or if we use a logical value
>>>     (xmatch(ra1,dec1,ra2,dec2,rad))
>>>
>>> it seems far more natural to use
>>>      (distance(ra1,dec1,ra2,dec2) < rad)
>>>
>>> This is clear and easily implemented.  In our experience it can be 
>>> translated
>>> into functions that can take advantage of spatial indices.
>>
>>  From a user point of view, I think that would be absolutely fine;
>> in fact as you say better than a dedicated XMATCH function
>> because it's more transparent and more flexible.  I was under
>> the impression that constraints written like that were difficult
>> for TAP implementors to use in a way that led to an efficient
>> crossmatch, and that the 1=CONTAINS(POINT,CIRCLE) business was
>> the recommended way to specify a performant crossmatch in ADQL.
>> However, I don't know anywhere that's written down in a standard,
>> and maybe I'm just wrong about it.  I'm not at all knowledgeable
>> about the DB end of this, so I'm largely in the dark about what
>> makes sense here from an implementation point of view.
>>
>> My interest is that I want to be able to write example ADQL
>> queries and provide documentation to my ADQL-using users that
>> tell them how to perform a spatial crossmatch on the sky,
>> without too much ugly syntax.
>>
>> Mark
>>
>> -- 
>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>> m.b.taylor at bris.ac.uk +44-117-9288776 http://www.star.bris.ac.uk/~mbt/
>>
>


