Two very loose ends of the ADQL 2.1 PR

Mark Taylor m.b.taylor at bristol.ac.uk
Thu May 31 20:02:41 CEST 2018


László,

thanks for that clarification, sounds entirely reasonable.

Mark

On Thu, 31 May 2018, Dobos, László wrote:

> Hi Mark,
> 
> Could these be good definitions?
> 
> - Join is a database term and means an operator between two tables. If one has three tables, they need two joins.
> - Cross-match is an astronomy term and is, in general, defined on multiple tables with spatial coordinates and errors. Dropouts and varying resolution make it a very different problem (both conceptually and in implementation) from spatial joins.
> 
> I'm suggesting this seemingly insignificant change with a future cross-match syntax extension in mind.
> 
> Cheers,
> 
> -Laszlo
> 
> -----Original Message-----
> From: Mark Taylor [mailto:m.b.taylor at bristol.ac.uk] 
> Sent: Thursday, May 31, 2018 6:03 PM
> To: Arnold Rots <arots at cfa.harvard.edu>
> Cc: Dobos, László <dobos at complex.elte.hu>; Dave Morris <dmr at roe.ac.uk>; DAL mailing list <dal at ivoa.net>
> Subject: Re: Two very loose ends of the ADQL 2.1 PR
> 
> I have to admit I'm not sure what the definitions of the terms are or where they come from, but I've no objection to changing the language in this section to avoid the term "crossmatch".
> "Spatial join" sounds reasonable. 
> 
> On Thu, 31 May 2018, Arnold Rots wrote:
> 
> > Laszlo makes an excellent point.
> > Aside from the fact that cross-matching is not restricted to pairs of 
> > catalogs, "spatial join" is a much more accurate description of most 
> > services (including the operation we are talking about here) that are 
> > advertised as cross-matching.
> > Proper cross-matching involves more than just looking for matching 
> > > objects within a radius of *n* arcsec - that operation is indeed truly 
> > a spatial join.
> > 
> > Cheers,
> > 
> >   - Arnold
> > 
> > -------------------------------------------------------------------------------------------------------------
> > > Arnold H. Rots                                          Chandra X-ray Science Center
> > > Smithsonian Astrophysical Observatory                   tel:  +1 617 496 7701
> > > 60 Garden Street, MS 67                                 fax:  +1 617 495 7356
> > > Cambridge, MA 02138                                     arots at cfa.harvard.edu
> > > USA                                                     http://hea-www.harvard.edu/~arots/
> > > -------------------------------------------------------------------------------------------------------------
> > 
> > 
> > On Tue, May 29, 2018 at 12:28 PM, Dobos, László 
> > <dobos at complex.elte.hu>
> > wrote:
> > 
> > > Hi everyone,
> > >
> > > I've been working on SkyQuery for years at JHU and it became clear 
> > > quite early that cross-matching is not an operator like join because 
> > > > it can be defined between more than two tables. So a different 
> > > > wording, maybe "spatial join", would be better, reserving
> > > > "cross-match" for later use.
> > >
> > > -Laszlo
> > >
> > > -----Original Message-----
> > > From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf 
> > > Of Markus Nullmeier
> > > Sent: Saturday, May 26, 2018 4:17 AM
> > > To: dal at ivoa.net
> > > Cc: Dave Morris <dmr at roe.ac.uk>
> > > Subject: Two very loose ends of the ADQL 2.1 PR
> > >
> > > Hello list,
> > >
> > > Problem A:
> > >
> > > I have been made aware of Section 4.2.7: "Preferred crossmatch syntax"
> > > of the ADQL 2.1 PR. As one of the maintainers of pgSphere, which is 
> > > actually used by many a data centre to run various other software on 
> > > top of it to implement ADQL, I claim to have some, if indirect, 
> > > insight on real-world deployment of ADQL.
> > >
> > > While I do not have an opinion on ADQL syntax, I find the following 
> > > sentence to be highly problematic:
> > >   "Clients posing crossmatch-like queries are advised to phrase them
> > >    this way rather than semantically equivalent alternatives, and
> > >    services are encouraged to ensure that this form of join is executed
> > >    efficiently;"
> > > For, in the real world, quite a few existing and very important 
> > > services will virtually certainly, for ages to come, refrain from 
> > > the effort to upgrade the ADQL implementations they are using with 
> > > the necessary updates to rewrite queries accordingly -- however 
> > > small and seemingly simple these changes appear to be.
> > > But the net result of that sentence will be that some users or even 
> > > client implementers are going to pick up that "good advice", giving 
> > > them a spectacularly bad VO experience on many real services, where 
> > > > the underlying database software (whatever it is) will use 
> > > > sequential scans instead of index scans, with the latter of course
> > > > being orders of magnitude faster.
> > > > Besides, there is a rather odd mismatch between the quite strong 
> > > > choice of "advised" for users / clients and the much weaker word
> > > > "encouraged" for services.
> > >
> > > My recommendation is therefore to remove the words
> > >   "Clients posing crossmatch-like queries are advised to phrase them
> > >    this way rather than semantically equivalent alternatives"
> > > for good. Also, if the net effect of pushing some kind of crossmatch 
> > > syntax (by the way, what about cone search?) is to have any hope, 
> > > > then a sentence such as "This syntax MUST be handled as efficiently
> > > > as, or more efficiently than, semantically equivalent queries" would
> > > > be in order.
> > > > But I do believe enacting such a "quick fix" would be premature, 
> > > > because it is the antithesis of meaningful interoperability:
> > >
> > > The proposed preferred syntax of Section 4.2.7 did not exist before 
> > > ADQL 2.1. Thus, the most efficient crossmatch syntax of older 
> > > services is necessarily something else. But by following the old 
> > > rule "be liberal in what you accept, be conservative in what you 
> > > > send", new ADQL services should really make sure that _any_ 
> > > > efficient crossmatch syntax that had a significant following in
> > > > the past would be executed in the most efficient way.
> > > However, I personally lack the data about extant efficient ADQL 
> > > crossmatch queries. Lacking this data, it may be wise to postpone 
> > > Section 4.2.7 to ADQL 3.0.
> > >
> > >
> > > Problem B:
> > >
> > > To the best of my knowledge, the TAP 1.1 PR does not mention ADQL 
> > > boxes. I wonder what still having boxes in ADQL means in this context.
> > > But be that as it may, this is not the chief problem of ADQL's box.
> > >
> > > > I know that all the problems with box are inherited from a long time 
> > > > ago, but still, any "dot-one" release should probably fix 
> > > > the following misfeatures. First, let me quote the relevant parts of 
> > > > the PR text (Section 4.2.9) below to give everybody reading this 
> > > > full context:
> > >
> > >       The BOX function expresses a box on the sky. A BOX is a special
> > >       case of POLYGON, defined purely for convenience, and
> > >       it corresponds semantically to the equivalent term, Box,
> > >       defined in the STC specification.
> > >       It is specified by a center position and size (in both axes)
> > >       defining a cross centered on the center position and with arms
> > >       extending, parallel to the coordinate axes at the center position,
> > >       for half the respective sizes on either side. The box’s sides are
> > >       line segments or great circles intersecting the arms of the cross
> > >       in its end points at right angles with the arms.
> > >
> > > A small nitpick for warming up: the phrase
> > >    "[...] it corresponds semantically to the equivalent term, Box,
> > >     defined in the STC specification"
> > > > is a bit weird, because the ADQL box is a specialised syntax to 
> > > > specify a spherical polygon (with great circle segments as edges), 
> > > > but somehow STC's box probably allows for other kinds of edges than
> > > > great circle segments.
> > > > This also applies to other ADQL geometries(!). Maybe one could 
> > > > somehow, more correctly, state that ADQL geometries are a subset of 
> > > > the possible geometries envisaged by STC, rather than being
> > > > "equivalent".
> > >
> > > Now, the real problem is that "box" is perfectly ill-defined. The 
> > > text speaks of
> > >   "arms extending, parallel to the coordinate axes at the center
> > >    position",
> > > > but nowhere does it define what these "arms" should be. There are at 
> > > least two equally plausible interpretations that easily come to mind:
> > >
> > > a) The "arms" are great circles. A very good argument for that is
> > >    that the text requires them to be parallel to the coordinate
> > >    axes only at the so-called centre position of the box.
> > >
> > > b) The "arms" are circles with constant RA, or constant DEC,
> > >    respectively. A very good argument for that is that the text
> > >    speaks of the actual edges of an ADQL box as
> > >      "line segments or great circles",
> > >    presumably another category than "arms".
> > >           [By the way, the "line segment" expression, nowhere else
> > >            to be found in the ADQL 2.1 PR, is obviously a very old
> > >            copy-and-paste leftover from the then current STC
> > >            document, where it alludes to curves on the unit sphere
> > >            that are not great circles.] (Note that a) and b) are 
> > > different only for the "arms parallel to  the coordinate axes" 
> > > tangential to circles of constant DEC.)
> > >
> > > I guess other interpretations might have their merits, too. Anyway, 
> > > the question now is what may be done with this bane to interoperability.
> > >
> > > First, everybody reading the above carefully should agree that ADQL 
> > > 2.1 must NOT pass with "box" being in this dire state.
> > > Second, from the above follows that because of its woefully 
> > > incomplete specification, there has _never_ been a compliant 
> > > implementation of ADQL's box, by anybody.
> > >        [Now, I actually have spoken to people who did claim to have had
> > >         to-the-spec implementations at some point in time, but there was
> > >         not sufficient time to discuss which of the above (or even
> > >         another) interpretation they implemented, and if they thus had
> > >         implemented the same interpretation.
> > >         Also, they interestingly had no intention at all to put these
> > >         implementations forward in any way, they rather had a good laugh
> > >         when a conclusion along the lines 'happy to have that time
> > >         wasted' came up.]
> > >
> > > In the real world, nowadays actually many ADQL implementations just 
> > > offer coordinate box semantics for "box" [they are a far cry from 
> > > the ADQL text, especially because they are _not_ spherical polygons].
> > >
> > > Coordinate boxes are, from what I understand, requested by a 
> > > > sizeable fraction of users. But, for what it's worth, they are not 
> > > > universally appreciated by ADQL implementers (see also the TAP 1.1
> > > > issue above).
> > > > At least one widely used ADQL implementation _does_ create a 
> > > > four-sided polygon for "box", but it implements very simple 
> > > > calculations that are totally incompatible with _any_ interpretation 
> > > > of the ADQL text, 2.0 or 2.1 PR.
> > >
> > > Probably the original motivation for the failed attempt of ADQL's 
> > > box was the idea to have "something similar to a coordinate box that 
> > > works around the poles". I wonder if there is any meaningful use 
> > > case for that -- one can always use a spherical circle to probe the 
> > > neighbourhood of a coordinate. But even if the case for a use case 
> > > could be made, those who are proposing such a thing should come up 
> > > with a sound definition of "box", or whatever else such a 
> > > convenience polygon construction function would be called.
> > > Furthermore, there really should be a freely licensed and 
> > > sufficiently documented reference implementation for that before 
> > > > such a thing would be standardised, because the numerical 
> > > > calculations for either of the above interpretations a) or b) are
> > > > quite involved and proportionally error-prone.
> > >
> > > > The most correct solution for the "box" problem can thus only be
> > > >  1. to remove it from ADQL 2.0 via an erratum, because of its complete
> > > >     ill-specification (see above).
> > >  2. to remove it from extant ADQL implementations, 2.0 or otherwise,
> > >     with error messages that clearly explain the problem at hand.
> > >     This is actually a very good service to users, who in many cases
> > >     _today_ are getting results form their queries that are very much
> > >     different from what they expect. "Be arbitrary in what you send"
> > >     is not to be recommended for interoperability.
> > >  3. to discuss the necessity to specify coordinate boxes, imperatively
> > >     with a different name, such as "cbox". For a start, remember that
> > >     these kinds of geometries are optional ADQL features.
> > >  4. if 3. is answered in the positive, to discuss putting coordinate
> > >     boxes into ADQL 2.1 and TAP 1.1 to accommodate for users who appear
> > >     to have a need for them.
> > >
> > > Best regards,
> > > Markus Nullmeier
> > >
> > >
> > >
> > 
> 
> --
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
> 
> 
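
For readers of the thread, here is a minimal sketch of the operation that Arnold describes above as "truly a spatial join": pairing rows of exactly two tables whose positions lie within a radius of n arcsec. It is not taken from any of the messages or from any ADQL implementation. The function names, the (ra, dec) tuple layout in degrees, and the brute-force loop are assumptions made for illustration only.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp against rounding so asin never sees a value above 1.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def spatial_join(table_a, table_b, radius_arcsec):
    """Return index pairs (i, j) with table_a[i] within the radius of table_b[j].

    Hypothetical helper: each table is a list of (ra, dec) tuples in degrees.
    This is a brute-force O(N*M) loop with no spatial index.
    """
    radius_deg = radius_arcsec / 3600.0
    pairs = []
    for i, (ra_a, dec_a) in enumerate(table_a):
        for j, (ra_b, dec_b) in enumerate(table_b):
            if angular_separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                pairs.append((i, j))
    return pairs
```

The sketch takes exactly two tables and uses no positional errors, which is the distinction László draws: a cross-match in his sense may span more than two tables and has to cope with errors, dropouts and varying resolution. A real service would also rely on an index rather than the nested loop, which is the sequential-scan behaviour Markus warns about.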

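The "coordinate box semantics" that Markus says many implementations offer for "box" can be sketched as a plain range test on RA and Dec. This is one possible reading, not the behaviour of any particular service. The function name, the argument order, and the choice to measure the width directly in RA degrees (without any cos(Dec) scaling) are assumptions made for illustration.

```python
def in_coordinate_box(ra, dec, center_ra, center_dec, width, height):
    """Test whether (ra, dec) lies in a coordinate box; all values in degrees.

    Assumed semantics: Dec within center_dec +/- height/2 and RA within
    center_ra +/- width/2, with the RA difference wrapped across 0/360.
    """
    if abs(dec - center_dec) > height / 2.0:
        return False
    # Wrap the RA difference into [-180, 180) so boxes straddling RA = 0 work.
    delta_ra = (ra - center_ra + 180.0) % 360.0 - 180.0
    return abs(delta_ra) <= width / 2.0
```

As the quoted message points out, a region like this is bounded by circles of constant Dec rather than great circles, so it is not a spherical polygon. The sketch also makes no attempt to handle boxes that reach a pole.
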
--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/

