ADQL 2.1: Preferred crossmatch syntax

Francois-Xavier Pineau francois-xavier.pineau at astro.unistra.fr
Tue Nov 7 14:06:05 CET 2017


Laszlo, DAL,

Before a more complete answer, just one question: with your notation, is 
it possible to ask for all sources of table1 matching or not the result 
of a cross-match between table 2 and table t3?

I.e. it is possible to ask for

t1 LEFT JOIN (t2 INNER JOIN t3) or, equivalently t2 INNER JOIN t3 RIGHT 
OUTER JOIN t3

e.g. using a sub-crossmatch in a IN without having to create a 
(temporary) table?


fx


Le 04/11/2017 à 19:30, Dobos, Laszlo a écrit :
> Hi guys,
>
> I'm not on the DAL list but Tamas Budavari forwarded the mail to me so let
> me chip in. In the new skyquery, we put the cross-match definition into the
> FROM clause but not as a list of joined tables but rather something like a
> table-valued function. Putting it into the FROM clause is much better than
> in the WHERE clause, not because it's much easier to parse but because if
> it's in the WHERE clause, it's quite hard to sort the other conditions out
> if there's complex boolean algebra allowed in the clause. If xmatching has
> some special, custom syntax, then the possibility of writing queries that
> cannot be executed can be minimized. Also, putting it into the xmatch clause
> makes query rewriting much easier because you simply replace the xmatch part
> with the output of the xmatch operation and leave all other joins in place
> to execute the final query.
>
> Here's an example:
>
> SELECT s.objid, s.ra, s.dec, g.objid, g.ra, g.dec, x.ra, x.dec
> INTO twowayxmatch
> FROM XMATCH
>       (MUST EXIST IN SDSSDR7:PhotoObjAll AS s WITH(POINT(s.ra, s.dec),
> ERROR(0.1, 0.1, 0.1)),
>        MUST EXIST IN GALEXGR6:PhotoObjAll AS g WITH(POINT(g.ra, g.dec),
> ERROR(0.2, 0.2, 0.2)),
>        LIMIT BAYESFACTOR TO 1e3) AS x
> WHERE s.ra BETWEEN 0 AND 5 AND s.dec BETWEEN 0 AND 5
>      AND g.ra BETWEEN 0 AND 5 AND g.dec BETWEEN 0 AND 5
>
> The MUST EXIST will soon be extended to MAY EXIST and MUST NOT EXIST
> (drop-outs, but for those, you need the footprints). This syntax is quite
> extensible, because you can add more catalog-specific options to the WITH()
> part. For instance, we could support extended source matching by replacing
> POINT with ELLIPSE, etc. LIMIT BAYESFACTOR could be replaced with LIMIT
> SEPARATION etc. to use a different algorithm.
>
> We also support region-based filtering with the REGION clause. This is
> intentionally not on defined on a per-table basis to avoid query plan
> complexities.
>
> SELECT s.objid, s.ra, s.dec, g.objid, g.ra, g.dec, x.ra, x.dec
> INTO twowayxmatch
> FROM XMATCH
>       (MUST EXIST IN SDSSDR7:PhotoObjAll AS s WITH(POINT(s.ra, s.dec),
> ERROR(0.1, 0.1, 0.1)),
>        MUST EXIST IN GALEXGR6:PhotoObjAll AS g WITH(POINT(g.ra, g.dec),
> ERROR(0.2, 0.2, 0.2)),
>        LIMIT BAYESFACTOR TO 1e3) AS x
> REGION 'CIRCLE J2000 0 0 5'
>
> You can try it here: http://test.voservices.net/skyquery . It doesn't
> support adql yet but I'm working on that.
>
> -Laszlo
>
> -----Original Message-----
> From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of Mark
> Taylor
> Sent: Friday, November 3, 2017 5:27 PM
> To: Francois-Xavier PINEAU <francois-xavier.pineau at astro.unistra.fr>
> Cc: dal at ivoa.net
> Subject: Re: ADQL 2.1: Preferred crossmatch syntax
>
>
> On Fri, 3 Nov 2017, Francois-Xavier PINEAU wrote:
>
>> Dear Mark and DAL,
>>
>> Although I agree with the approach, there is one point that bothers me.
>>
>> I know that not everyone agrees on the matter, but I personally prefer
>> to declare the cross-match condition in the JOIN (like in your
>> InteropOct2015DAL
>> presentation) rather than in the WHERE.
>>
>> In fact, in relational algebra, I prefer to see the cross-match as a
>> theta-join, not as a selection on the result of a cross-product.
> OK, I don't really have an opinion on that, I never really know about what
> syntax is best for joins, or whether it makes a difference.
> [And I'm afraid I wouldn't know a theta-join if one trod on my foot]
>
> Is it likely to make a difference in performance terms which of these two
> ways it's done?  I.e. do we need to specify here what's the preferred way to
> use a constraint like DISTANCE()<threshold, or is this something we can
> harmlessly leave to the taste of the query author?
>
> Mark
>
> --
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/



More information about the dal mailing list