ADQL 2.1: Preferred crossmatch syntax

Dobos, Laszlo dobos at complex.elte.hu
Sat Nov 4 19:30:20 CET 2017


Hi guys,

I'm not on the DAL list but Tamas Budavari forwarded the mail to me so let
me chip in. In the new skyquery, we put the cross-match definition into the
FROM clause but not as a list of joined tables but rather something like a
table-valued function. Putting it into the FROM clause is much better than
in the WHERE clause, not because it's much easier to parse but because if
it's in the WHERE clause, it's quite hard to sort the other conditions out
if there's complex boolean algebra allowed in the clause. If xmatching has
some special, custom syntax, then the possibility of writing queries that
cannot be executed can be minimized. Also, putting it into the xmatch clause
makes query rewriting much easier because you simply replace the xmatch part
with the output of the xmatch operation and leave all other joins in place
to execute the final query.

Here's an example:

SELECT s.objid, s.ra, s.dec, g.objid, g.ra, g.dec, x.ra, x.dec
INTO twowayxmatch
FROM XMATCH
     (MUST EXIST IN SDSSDR7:PhotoObjAll AS s WITH(POINT(s.ra, s.dec),
ERROR(0.1, 0.1, 0.1)),
      MUST EXIST IN GALEXGR6:PhotoObjAll AS g WITH(POINT(g.ra, g.dec),
ERROR(0.2, 0.2, 0.2)),
      LIMIT BAYESFACTOR TO 1e3) AS x
WHERE s.ra BETWEEN 0 AND 5 AND s.dec BETWEEN 0 AND 5
    AND g.ra BETWEEN 0 AND 5 AND g.dec BETWEEN 0 AND 5

The MUST EXIST will soon be extended to MAY EXIST and MUST NOT EXIST
(drop-outs, but for those, you need the footprints). This syntax is quite
extensible, because you can add more catalog-specific options to the WITH()
part. For instance, we could support extended source matching by replacing
POINT with ELLIPSE, etc. LIMIT BAYESFACTOR could be replaced with LIMIT
SEPARATION etc. to use a different algorithm.

We also support region-based filtering with the REGION clause. This is
intentionally not on defined on a per-table basis to avoid query plan
complexities.

SELECT s.objid, s.ra, s.dec, g.objid, g.ra, g.dec, x.ra, x.dec
INTO twowayxmatch
FROM XMATCH
     (MUST EXIST IN SDSSDR7:PhotoObjAll AS s WITH(POINT(s.ra, s.dec),
ERROR(0.1, 0.1, 0.1)),
      MUST EXIST IN GALEXGR6:PhotoObjAll AS g WITH(POINT(g.ra, g.dec),
ERROR(0.2, 0.2, 0.2)),
      LIMIT BAYESFACTOR TO 1e3) AS x
REGION 'CIRCLE J2000 0 0 5'

You can try it here: http://test.voservices.net/skyquery . It doesn't
support adql yet but I'm working on that.

-Laszlo

-----Original Message-----
From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of Mark
Taylor
Sent: Friday, November 3, 2017 5:27 PM
To: Francois-Xavier PINEAU <francois-xavier.pineau at astro.unistra.fr>
Cc: dal at ivoa.net
Subject: Re: ADQL 2.1: Preferred crossmatch syntax


On Fri, 3 Nov 2017, Francois-Xavier PINEAU wrote:

> Dear Mark and DAL,
> 
> Although I agree with the approach, there is one point that bothers me.
> 
> I know that not everyone agrees on the matter, but I personally prefer 
> to declare the cross-match condition in the JOIN (like in your 
> InteropOct2015DAL
> presentation) rather than in the WHERE.
> 
> In fact, in relational algebra, I prefer to see the cross-match as a 
> theta-join, not as a selection on the result of a cross-product.

OK, I don't really have an opinion on that, I never really know about what
syntax is best for joins, or whether it makes a difference.
[And I'm afraid I wouldn't know a theta-join if one trod on my foot]

Is it likely to make a difference in performance terms which of these two
ways it's done?  I.e. do we need to specify here what's the preferred way to
use a constraint like DISTANCE()<threshold, or is this something we can
harmlessly leave to the taste of the query author?

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/




More information about the dal mailing list