ADQL 2.1: Preferred crossmatch syntax

Mark Taylor m.b.taylor at bristol.ac.uk
Sun Nov 5 11:40:41 CET 2017


Laszlo,

thank you for this input.  It might be a promising direction for
future major revisions of ADQL, but since I'm no kind of database
expert, I'm not qualified to say.

However, it's addressing a somewhat different problem than I've
raised in this thread, which is proposing one of the several
existing equivalent options within the existing ADQL 2.1 syntax 
to use as a preferred form for a common type of crossmatch.
Your suggestion would be a significant new addition to the ADQL 
syntax which I doubt (? unless the authors disagree) is 
appropriate for a minor ADQL update, especially near the end of the
revision process.

I'll leave it to others to consider whether this looks like
something worth pursuing for ADQL v3.

Best wishes,

Mark

On Sat, 4 Nov 2017, Dobos, Laszlo wrote:

> Hi guys,
> 
> I'm not on the DAL list but Tamas Budavari forwarded the mail to me so let
> me chip in. In the new skyquery, we put the cross-match definition into the
> FROM clause but not as a list of joined tables but rather something like a
> table-valued function. Putting it into the FROM clause is much better than
> in the WHERE clause, not because it's much easier to parse but because if
> it's in the WHERE clause, it's quite hard to sort the other conditions out
> if there's complex boolean algebra allowed in the clause. If xmatching has
> some special, custom syntax, then the possibility of writing queries that
> cannot be executed can be minimized. Also, putting it into the xmatch clause
> makes query rewriting much easier because you simply replace the xmatch part
> with the output of the xmatch operation and leave all other joins in place
> to execute the final query.
> 
> Here's an example:
> 
> SELECT s.objid, s.ra, s.dec, g.objid, g.ra, g.dec, x.ra, x.dec
> INTO twowayxmatch
> FROM XMATCH
>      (MUST EXIST IN SDSSDR7:PhotoObjAll AS s WITH(POINT(s.ra, s.dec),
> ERROR(0.1, 0.1, 0.1)),
>       MUST EXIST IN GALEXGR6:PhotoObjAll AS g WITH(POINT(g.ra, g.dec),
> ERROR(0.2, 0.2, 0.2)),
>       LIMIT BAYESFACTOR TO 1e3) AS x
> WHERE s.ra BETWEEN 0 AND 5 AND s.dec BETWEEN 0 AND 5
>     AND g.ra BETWEEN 0 AND 5 AND g.dec BETWEEN 0 AND 5
> 
> The MUST EXIST will soon be extended to MAY EXIST and MUST NOT EXIST
> (drop-outs, but for those, you need the footprints). This syntax is quite
> extensible, because you can add more catalog-specific options to the WITH()
> part. For instance, we could support extended source matching by replacing
> POINT with ELLIPSE, etc. LIMIT BAYESFACTOR could be replaced with LIMIT
> SEPARATION etc. to use a different algorithm.
> 
> We also support region-based filtering with the REGION clause. This is
> intentionally not on defined on a per-table basis to avoid query plan
> complexities.
> 
> SELECT s.objid, s.ra, s.dec, g.objid, g.ra, g.dec, x.ra, x.dec
> INTO twowayxmatch
> FROM XMATCH
>      (MUST EXIST IN SDSSDR7:PhotoObjAll AS s WITH(POINT(s.ra, s.dec),
> ERROR(0.1, 0.1, 0.1)),
>       MUST EXIST IN GALEXGR6:PhotoObjAll AS g WITH(POINT(g.ra, g.dec),
> ERROR(0.2, 0.2, 0.2)),
>       LIMIT BAYESFACTOR TO 1e3) AS x
> REGION 'CIRCLE J2000 0 0 5'
> 
> You can try it here: http://test.voservices.net/skyquery . It doesn't
> support adql yet but I'm working on that.
> 
> -Laszlo
> 
> -----Original Message-----
> From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of Mark
> Taylor
> Sent: Friday, November 3, 2017 5:27 PM
> To: Francois-Xavier PINEAU <francois-xavier.pineau at astro.unistra.fr>
> Cc: dal at ivoa.net
> Subject: Re: ADQL 2.1: Preferred crossmatch syntax
> 
> 
> On Fri, 3 Nov 2017, Francois-Xavier PINEAU wrote:
> 
> > Dear Mark and DAL,
> > 
> > Although I agree with the approach, there is one point that bothers me.
> > 
> > I know that not everyone agrees on the matter, but I personally prefer 
> > to declare the cross-match condition in the JOIN (like in your 
> > InteropOct2015DAL
> > presentation) rather than in the WHERE.
> > 
> > In fact, in relational algebra, I prefer to see the cross-match as a 
> > theta-join, not as a selection on the result of a cross-product.
> 
> OK, I don't really have an opinion on that, I never really know about what
> syntax is best for joins, or whether it makes a difference.
> [And I'm afraid I wouldn't know a theta-join if one trod on my foot]
> 
> Is it likely to make a difference in performance terms which of these two
> ways it's done?  I.e. do we need to specify here what's the preferred way to
> use a constraint like DISTANCE()<threshold, or is this something we can
> harmlessly leave to the taste of the query author?
> 
> Mark
> 
> --
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
> 
> 
> 

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/


More information about the dal mailing list