ADQL 2.1: Preferred crossmatch syntax

Patrick Dowler pdowler.cadc at gmail.com
Wed Nov 8 00:56:18 CET 2017


Laszlo,

I'm not a big fan of using DISTANCE for this, but I am in favour of
putting it in the JOIN rather than the WHERE clause. Can you explain
which exactly is causing the join order to be prescribed? I certainly
don't see an obvious way to translate this into efficient SQL that
uses an index, but it looks to me like the use of DISTANCE is making
that hard, not the location in the JOIN clause.

Pat

On 7 November 2017 at 05:06, Francois-Xavier Pineau
<francois-xavier.pineau at astro.unistra.fr> wrote:
> Laszlo, DAL,
>
> Before a more complete answer, just one question: with your notation, is it
> possible to ask for all sources of table1 matching or not the result of a
> cross-match between table 2 and table t3?
>
> I.e. it is possible to ask for
>
> t1 LEFT JOIN (t2 INNER JOIN t3) or, equivalently t2 INNER JOIN t3 RIGHT
> OUTER JOIN t3
>
> e.g. using a sub-crossmatch in a IN without having to create a (temporary)
> table?
>
>
> fx
>
>
>
> Le 04/11/2017 à 19:30, Dobos, Laszlo a écrit :
>>
>> Hi guys,
>>
>> I'm not on the DAL list but Tamas Budavari forwarded the mail to me so let
>> me chip in. In the new skyquery, we put the cross-match definition into
>> the
>> FROM clause but not as a list of joined tables but rather something like a
>> table-valued function. Putting it into the FROM clause is much better than
>> in the WHERE clause, not because it's much easier to parse but because if
>> it's in the WHERE clause, it's quite hard to sort the other conditions out
>> if there's complex boolean algebra allowed in the clause. If xmatching has
>> some special, custom syntax, then the possibility of writing queries that
>> cannot be executed can be minimized. Also, putting it into the xmatch
>> clause
>> makes query rewriting much easier because you simply replace the xmatch
>> part
>> with the output of the xmatch operation and leave all other joins in place
>> to execute the final query.
>>
>> Here's an example:
>>
>> SELECT s.objid, s.ra, s.dec, g.objid, g.ra, g.dec, x.ra, x.dec
>> INTO twowayxmatch
>> FROM XMATCH
>>       (MUST EXIST IN SDSSDR7:PhotoObjAll AS s WITH(POINT(s.ra, s.dec),
>> ERROR(0.1, 0.1, 0.1)),
>>        MUST EXIST IN GALEXGR6:PhotoObjAll AS g WITH(POINT(g.ra, g.dec),
>> ERROR(0.2, 0.2, 0.2)),
>>        LIMIT BAYESFACTOR TO 1e3) AS x
>> WHERE s.ra BETWEEN 0 AND 5 AND s.dec BETWEEN 0 AND 5
>>      AND g.ra BETWEEN 0 AND 5 AND g.dec BETWEEN 0 AND 5
>>
>> The MUST EXIST will soon be extended to MAY EXIST and MUST NOT EXIST
>> (drop-outs, but for those, you need the footprints). This syntax is quite
>> extensible, because you can add more catalog-specific options to the
>> WITH()
>> part. For instance, we could support extended source matching by replacing
>> POINT with ELLIPSE, etc. LIMIT BAYESFACTOR could be replaced with LIMIT
>> SEPARATION etc. to use a different algorithm.
>>
>> We also support region-based filtering with the REGION clause. This is
>> intentionally not on defined on a per-table basis to avoid query plan
>> complexities.
>>
>> SELECT s.objid, s.ra, s.dec, g.objid, g.ra, g.dec, x.ra, x.dec
>> INTO twowayxmatch
>> FROM XMATCH
>>       (MUST EXIST IN SDSSDR7:PhotoObjAll AS s WITH(POINT(s.ra, s.dec),
>> ERROR(0.1, 0.1, 0.1)),
>>        MUST EXIST IN GALEXGR6:PhotoObjAll AS g WITH(POINT(g.ra, g.dec),
>> ERROR(0.2, 0.2, 0.2)),
>>        LIMIT BAYESFACTOR TO 1e3) AS x
>> REGION 'CIRCLE J2000 0 0 5'
>>
>> You can try it here: http://test.voservices.net/skyquery . It doesn't
>> support adql yet but I'm working on that.
>>
>> -Laszlo
>>
>> -----Original Message-----
>> From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of Mark
>> Taylor
>> Sent: Friday, November 3, 2017 5:27 PM
>> To: Francois-Xavier PINEAU <francois-xavier.pineau at astro.unistra.fr>
>> Cc: dal at ivoa.net
>> Subject: Re: ADQL 2.1: Preferred crossmatch syntax
>>
>>
>> On Fri, 3 Nov 2017, Francois-Xavier PINEAU wrote:
>>
>>> Dear Mark and DAL,
>>>
>>> Although I agree with the approach, there is one point that bothers me.
>>>
>>> I know that not everyone agrees on the matter, but I personally prefer
>>> to declare the cross-match condition in the JOIN (like in your
>>> InteropOct2015DAL
>>> presentation) rather than in the WHERE.
>>>
>>> In fact, in relational algebra, I prefer to see the cross-match as a
>>> theta-join, not as a selection on the result of a cross-product.
>>
>> OK, I don't really have an opinion on that, I never really know about what
>> syntax is best for joins, or whether it makes a difference.
>> [And I'm afraid I wouldn't know a theta-join if one trod on my foot]
>>
>> Is it likely to make a difference in performance terms which of these two
>> ways it's done?  I.e. do we need to specify here what's the preferred way
>> to
>> use a constraint like DISTANCE()<threshold, or is this something we can
>> harmlessly leave to the taste of the query author?
>>
>> Mark
>>
>> --
>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
>
>



-- 
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada


More information about the dal mailing list