Two very loose ends of the ADQL 2.1 PR

Arnold Rots arots at cfa.harvard.edu
Thu May 31 17:16:25 CEST 2018


Laszlo makes an excellent point.
Aside from the fact that cross-matching is not restricted to pairs of
catalogs, "spatial join" is a much more accurate description of most
services (including the operation we are talking about here) that are
advertised as cross-matching.
Proper cross-matching involves more than just looking for matching objects
within a radius of *n* arcsec - that operation is indeed truly a spatial
join.
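
To make the distinction concrete, here is a minimal sketch of such a radius-only spatial join. It is a brute-force illustration, not how SkyQuery, pgSphere or any TAP service does it; the function names and the (id, ra, dec) tuple layout are made up for the example.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance in degrees between two (RA, Dec) positions
    given in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def spatial_join(left, right, radius_arcsec):
    """Return every (left_id, right_id) pair separated by at most
    radius_arcsec. Rows are hypothetical (id, ra_deg, dec_deg) tuples.
    Nested loop, no index: for illustration only."""
    radius_deg = radius_arcsec / 3600.0
    return [(lid, rid)
            for lid, lra, ldec in left
            for rid, rra, rdec in right
            if angular_separation_deg(lra, ldec, rra, rdec) <= radius_deg]

# Made-up example rows.
left = [("a1", 10.0, 20.0), ("a2", 50.0, -30.0)]
right = [("b1", 10.0002, 20.0001), ("b2", 10.0001, 19.9999), ("b3", 120.0, 5.0)]
pairs = spatial_join(left, right, 2.0)
```

With these rows, a1 pairs with both b1 and b2. The join simply reports every pair inside the radius; deciding which one (if any) is the true counterpart is the part that makes a proper cross-match.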

Cheers,

  - Arnold

--------------------------------------------------------------------------
Arnold H. Rots                          Chandra X-ray Science Center
Smithsonian Astrophysical Observatory   tel:  +1 617 496 7701
60 Garden Street, MS 67                 fax:  +1 617 495 7356
Cambridge, MA 02138                     arots at cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------


On Tue, May 29, 2018 at 12:28 PM, Dobos, László <dobos at complex.elte.hu>
wrote:

> Hi everyone,
>
> I've been working on SkyQuery for years at JHU and it became clear quite
> early that cross-matching is not an operator like join because it can be
> defined between more than two tables. So a different wording, maybe
> "spatial join", would be better, reserving "cross-match" for later use.
>
> -Laszlo
>
> -----Original Message-----
> From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of
> Markus Nullmeier
> Sent: Saturday, May 26, 2018 4:17 AM
> To: dal at ivoa.net
> Cc: Dave Morris <dmr at roe.ac.uk>
> Subject: Two very loose ends of the ADQL 2.1 PR
>
> Hello list,
>
> Problem A:
>
> I have been made aware of Section 4.2.7: "Preferred crossmatch syntax"
> of the ADQL 2.1 PR. As one of the maintainers of pgSphere, which is
> actually used by many a data centre to run various other software on top of
> it to implement ADQL, I claim to have some, if indirect, insight on
> real-world deployment of ADQL.
>
> While I do not have an opinion on ADQL syntax, I find the following
> sentence to be highly problematic:
>   "Clients posing crossmatch-like queries are advised to phrase them
>    this way rather than semantically equivalent alternatives, and
>    services are encouraged to ensure that this form of join is executed
>    efficiently;"
> For, in the real world, quite a few existing and very important services
> will virtually certainly, for ages to come, refrain from the effort to
> upgrade the ADQL implementations they are using with the necessary updates
> to rewrite queries accordingly -- however small and seemingly simple these
> changes appear to be.
> But the net result of that sentence will be that some users or even client
> implementers are going to pick up that "good advice", giving them a
> spectacularly bad VO experience on many real services, where the underlying
> database software (whatever it is) will use sequential scans instead of
> index scans, with the latter of course being orders of magnitude faster.
> Besides, there is a rather odd mismatch between the quite strong choice of
> "advised" for users / clients and the much weaker word "encouraged"
> for services.
>
> My recommendation is therefore to remove the words
>   "Clients posing crossmatch-like queries are advised to phrase them
>    this way rather than semantically equivalent alternatives"
> for good. Also, if the net effect of pushing some kind of crossmatch
> syntax (by the way, what about cone search?) is to have any hope, then a
> sentence such as "This syntax MUST be handled at least as efficiently as
> semantically equivalent queries" would be in order.
> But I do believe enacting such a "quick fix" would be premature, because
> it is the antithesis of meaningful interoperability:
>
> The proposed preferred syntax of Section 4.2.7 did not exist before ADQL
> 2.1. Thus, the most efficient crossmatch syntax of older services is
> necessarily something else. But by following the old rule "be liberal in
> what you accept, be conservative in what you send", new ADQL services
> should really make sure that _any_ efficient crossmatch syntax that has had
> a significant following in the past would be executed in the most efficient way.
> However, I personally lack the data about extant efficient ADQL crossmatch
> queries. Lacking this data, it may be wise to postpone Section 4.2.7 to
> ADQL 3.0.
>
>
> Problem B:
>
> To the best of my knowledge, the TAP 1.1 PR does not mention ADQL boxes. I
> wonder what still having boxes in ADQL means in this context.
> But be that as it may, this is not the chief problem of ADQL's box.
>
> I know that all the problems with box are inherited from a long time ago,
> but still, any "dot-one" release should really fix the
> following misfeatures. First, let me quote the relevant parts of the PR
> text (Section 4.2.9) below to give everybody reading this full
> context:
>
>       The BOX function expresses a box on the sky. A BOX is a special
>       case of POLYGON, defined purely for convenience, and
>       it corresponds semantically to the equivalent term, Box,
>       defined in the STC specification.
>       It is specified by a center position and size (in both axes)
>       defining a cross centered on the center position and with arms
>       extending, parallel to the coordinate axes at the center position,
>       for half the respective sizes on either side. The box’s sides are
>       line segments or great circles intersecting the arms of the cross
>       in its end points at right angles with the arms.
>
> A small nitpick for warming up: the phrase
>    "[...] it corresponds semantically to the equivalent term, Box,
>     defined in the STC specification"
> is a bit weird, because the ADQL box is a specialised syntax to specify a
> spherical polygon (with great circle segments as edges), but somehow STC's
> box probably allows for other kinds of edges than great circle segments.
> This also applies to other ADQL geometries(!). Maybe one could somehow,
> more correctly, state that ADQL geometries are a subset of the possible
> geometries envisaged by STC, rather than being "equivalent".
>
> Now, the real problem is that "box" is perfectly ill-defined. The text
> speaks of
>   "arms extending, parallel to the coordinate axes at the center
>    position",
> but nowhere does it define what these "arms" should be. There are at least two
> equally plausible interpretations that easily come to mind:
>
> a) The "arms" are great circles. A very good argument for that is
>    that the text requires them to be parallel to the coordinate
>    axes only at the so-called centre position of the box.
>
> b) The "arms" are circles with constant RA, or constant DEC,
>    respectively. A very good argument for that is that the text
>    speaks of the actual edges of an ADQL box as
>      "line segments or great circles",
>    presumably another category than "arms".
>           [By the way, the "line segment" expression, nowhere else
>            to be found in the ADQL 2.1 PR, is obviously a very old
>            copy-and-paste leftover from the then current STC
>            document, where it alludes to curves on the unit sphere
>            that are not great circles.]
> (Note that a) and b) are different only for the "arms parallel to the
> coordinate axes" tangential to circles of constant DEC.)
>
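
A rough numerical sketch of how interpretations a) and b) diverge for the east-west arm. Nothing here comes from the ADQL text: the function names are hypothetical, and for b) the code assumes that the half-size is measured as arc length along the circle of constant Dec, which is only one of several possible readings. Positions at the poles are not handled.

```python
import math

def east_arm_end_great_circle(ra0, dec0, half_size):
    """Interpretation a): walk half_size degrees along the great circle that
    leaves (ra0, dec0) heading due east. Returns (ra, dec) in degrees."""
    d = math.radians(half_size)
    p0 = math.radians(dec0)
    # Destination-point formula on the unit sphere for an initial bearing of 90 deg.
    dec1 = math.asin(math.sin(p0) * math.cos(d))
    dra = math.atan2(math.sin(d) * math.cos(p0),
                     math.cos(d) - math.sin(p0) * math.sin(dec1))
    return (ra0 + math.degrees(dra)) % 360.0, math.degrees(dec1)

def east_arm_end_constant_dec(ra0, dec0, half_size):
    """Interpretation b): stay on the circle of constant Dec.
    ASSUMPTION: half_size is arc length along that small circle, so the
    RA offset is half_size / cos(dec0)."""
    dra = half_size / math.cos(math.radians(dec0))
    return (ra0 + dra) % 360.0, dec0

# Same centre and half-size under both readings (made-up values).
end_a = east_arm_end_great_circle(30.0, 60.0, 5.0)
end_b = east_arm_end_constant_dec(30.0, 60.0, 5.0)
```

On the equator the two end points coincide. At Dec = 60 deg with a half-size of 5 deg, the great-circle arm ends at a Dec somewhat below 60 deg, while the constant-Dec arm stays at exactly 60 deg, so the resulting boxes cannot be the same polygon.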
> I guess other interpretations might have their merits, too. Anyway, the
> question now is what may be done with this bane to interoperability.
>
> First, everybody reading the above carefully should agree that ADQL 2.1
> must NOT pass with "box" being in this dire state.
> Second, it follows from the above that, because of its woefully incomplete
> specification, there has _never_ been a compliant implementation of ADQL's
> box, by anybody.
>        [Now, I actually have spoken to people who did claim to have had
>         to-the-spec implementations at some point in time, but there was
>         not sufficient time to discuss which of the above (or even
>         another) interpretation they implemented, and if they thus had
>         implemented the same interpretation.
>         Also, they interestingly had no intention at all to put these
>         implementations forward in any way, they rather had a good laugh
>         when a conclusion along the lines 'happy to have that time
>         wasted' came up.]
>
> In the real world, nowadays actually many ADQL implementations just offer
> coordinate box semantics for "box" [they are a far cry from the ADQL text,
> especially because they are _not_ spherical polygons].
>
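
For comparison, a sketch of what "coordinate box semantics" could look like in practice. This is an assumed reading (plain ranges in RA and Dec around the centre, with RA wrapping at 360 degrees), not the behaviour of any particular implementation; the function name is hypothetical.

```python
def in_coordinate_box(ra, dec, ra0, dec0, width, height):
    """True if (ra, dec) lies inside the coordinate box centred on
    (ra0, dec0). All values in degrees. ASSUMPTION: width and height are
    plain coordinate ranges in RA and Dec, not angular sizes on the sky."""
    # Signed RA difference folded into [-180, 180) so the test survives
    # the wrap-around at RA = 0/360.
    dra = (ra - ra0 + 180.0) % 360.0 - 180.0
    return abs(dra) <= width / 2.0 and abs(dec - dec0) <= height / 2.0

# Made-up checks: one across the RA wrap, one close to the pole.
wraps = in_coordinate_box(359.5, 0.0, 0.5, 0.0, 4.0, 2.0)
near_pole = in_coordinate_box(10.0, 89.0, 0.0, 89.0, 30.0, 1.0)
```

The edges of such a box at constant Dec are small circles rather than great-circle segments, which is why it is not a spherical polygon. Close to the pole the mismatch with any on-sky size becomes obvious: a 30 deg range in RA at Dec = 89 deg spans only about half a degree on the sky.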
> Coordinate boxes are, from what I understand, requested by a sizeable
> fraction of users. But, for what it's worth, they are not universally
> appreciated by ADQL implementers (see also the TAP 1.1 issue above).
> At least one widely used ADQL implementation _does_ create a four-sided
> polygon for "box", but it implements very simple calculations that are
> totally incompatible with _any_ interpretation of the ADQL text, 2.0 or
> 2.1 PR.
>
> Probably the original motivation for the failed attempt of ADQL's box was
> the idea to have "something similar to a coordinate box that works around
> the poles". I wonder if there is any meaningful use case for that -- one
> can always use a spherical circle to probe the neighbourhood of a
> coordinate. But even if the case for a use case could be made, those who
> are proposing such a thing should come up with a sound definition of "box",
> or whatever else such a convenience polygon construction function would be
> called.
> Furthermore, there really should be a freely licensed and sufficiently
> documented reference implementation for that before such a thing would be
> standardised, because the numerical calculations for either of the above
> interpretations a) or b) are quite involved and proportionally error-prone.
>
> The most correct solution for the "box" problem can thus only be
>  1. to remove it from ADQL 2.0 via an erratum, because of its complete
>     ill-specification (see above).
>  2. to remove it from extant ADQL implementations, 2.0 or otherwise,
>     with error messages that clearly explain the problem at hand.
>     This is actually a very good service to users, who in many cases
>     _today_ are getting results from their queries that are very much
>     different from what they expect. "Be arbitrary in what you send"
>     is not to be recommended for interoperability.
>  3. to discuss the necessity to specify coordinate boxes, imperatively
>     with a different name, such as "cbox". For a start, remember that
>     these kinds of geometries are optional ADQL features.
>  4. if 3. is answered in the positive, to discuss putting coordinate
>     boxes into ADQL 2.1 and TAP 1.1 to accommodate users who appear
>     to have a need for them.
>
> Best regards,
> Markus Nullmeier
>
>
>