ADQL DISTANCE argument?

Gregory MANTELET gregory.mantelet at astro.unistra.fr
Mon Feb 24 11:20:33 CET 2020


Dear DAL,

With the goal of getting ADQL-2.1 *finally* released, I would say that we 
keep the version of DISTANCE between two points, instead of between any 
two geometries. The latter, as Markus said, would require more careful 
definitions of what the distance should be between, for instance, a 
polygon and a circle (should it be the distance between their 
"centroids" or the closest distance between the boundaries of the two 
geometries? This is a debate that, I think, is out of scope for 
ADQL-2.1).

However, as Pat commented, CENTROID should be allowed as a valid 
argument of DISTANCE. It should not cost much to add that to the 
grammar. Besides, it would *indirectly* allow computing a distance 
between two geometries by writing something like: 
DISTANCE( CENTROID(POLYGON(....)), CENTROID(CIRCLE(...)) ).

About the version of DISTANCE with 4 numeric arguments, I am neither 
especially in favor of it nor against it. As Ger pointed out, it is just 
syntactic sugar, which, as Markus said, may introduce a bit of 
complexity, and hence bugs... but I cannot really anticipate which ones. 
So, I am fairly neutral on this point.
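
To illustrate the "syntactic sugar" remark, here is a minimal Python
sketch (not ADQL, and not taken from any implementation). It assumes
that coordinates are given in degrees, that the result is the
great-circle distance in degrees, and that the four numeric arguments
are ordered (lon1, lat1, lon2, lat2). The function names are
hypothetical.

```python
import math


def distance_points(p1, p2):
    """Great-circle distance in degrees between two (lon, lat) points.

    Assumption: both points are (longitude, latitude) pairs in degrees.
    Uses the haversine formula.
    """
    lon1, lat1 = (math.radians(v) for v in p1)
    lon2, lat2 = (math.radians(v) for v in p2)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def distance(*args):
    """Hypothetical dispatcher mirroring the two forms of DISTANCE."""
    if len(args) == 2:
        # Two-point form: DISTANCE(point, point)
        return distance_points(args[0], args[1])
    if len(args) == 4:
        # Four-number form: rewritten into the two-point form
        lon1, lat1, lon2, lat2 = args
        return distance_points((lon1, lat1), (lon2, lat2))
    raise TypeError("distance() takes 2 points or 4 numbers")
```

In this sketch the four-argument form adds no behaviour of its own: it
only repackages its arguments and calls the two-point form.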

To sum up my thoughts:

/(for better readability here, I did not replace the parentheses and 
commas with their BNF equivalents)/
---------------------------------------------------------------------
<distance> ::=
     DISTANCE(<coord_value>, <coord_value>)
   | DISTANCE(<numeric_value_expression>, <numeric_value_expression>,
<numeric_value_expression>, <numeric_value_expression>)

<coord_value> ::= <point_value> | <column_reference>

<point_value> ::= <point> | <centroid>
---------------------------------------------------------------------

I am aware that adding <centroid> to <point_value> affects more than 
just <distance>, but I looked at the other places where <point_value> 
is used and I do not see why it would be inappropriate or error-prone 
there. Just tell me if it is.
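
As a rough cross-check of the productions above, the following Python
sketch models which argument kinds each form of DISTANCE would accept.
The string labels and the function name are illustrative assumptions;
a real parser would of course work on tokens, and the extra constraint
that a column reference must be POINT-typed is not modelled here.

```python
# <point_value> ::= <point> | <centroid>
POINT_VALUE = {"point", "centroid"}

# <coord_value> ::= <point_value> | <column_reference>
COORD_VALUE = POINT_VALUE | {"column_reference"}


def accepts_distance(arg_kinds):
    """Return True if the argument kinds match one form of <distance>."""
    kinds = list(arg_kinds)
    if len(kinds) == 2:
        # DISTANCE(<coord_value>, <coord_value>)
        return all(k in COORD_VALUE for k in kinds)
    if len(kinds) == 4:
        # DISTANCE with four <numeric_value_expression> arguments
        return all(k == "numeric_value_expression" for k in kinds)
    return False
```

With this model, ["centroid", "centroid"] is accepted while
["polygon", "circle"] is rejected, which is the intended effect: other
geometries only get in through CENTROID.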

I can start a GitHub PR with these changes and Markus's suggestions, if 
you want.

Cheers,
Grégory



On 19/02/2020 16:14, Markus Demleitner wrote:
> Dear DAL,
>
> Current ADQL says:
>
>    Functions like AREA, COORD1, COORD2 and DISTANCE accept a
>    geometry and return a calculated numeric value.
>
>    The specification defines two versions of the DISTANCE function, one
>    that accepts two geometries, and one that accepts four separate numeric
>    values, both forms return a numeric value.
>
> Both statements would indicate that DISTANCE should accept general
> geometries, i.e., including circles and polygons.
>
> The later definition of DISTANCE then says
>
>    The specification defines two versions of the DISTANCE function,
>    one that accepts two POINT values, and a second that accepts four
>    separate numeric values.
>
> -- which is clear enough, had it not been for the previous statement,
> and the later statement
>
>    If the geometric arguments are expressed ...
>
> which might again be understood as saying the arguments can be more
> general.
>
> Finally, the grammar says, for the geometry case:
>
>    DISTANCE <left_paren> <coord_value> <comma>
>      <coord_value> <right_paren>
>
> where
>
>    <coord_value> ::= <point> | <column_reference>
>
> I *think* all this works out to say that over and above the grammar,
> for distance there's the additional constraint that column_reference
> must be POINT-typed.[1]
>
> Being general here is a pain in the neck (actually, that's why I ran
> into this question).  For one, you'll need to define distance
> much more carefully for such geometries, and if (as I think we ought
> to) we choose "minimum of the distances between all points in arg 1 and
> arg 2", I doubt we'll see many correct implementations of that.  Also
> I'll want to map a lot of DISTANCE calls into contains(point,
> circle) statements (because that's much easier on the query planner),
> and that's a pain if one of the points could actually be, say, a
> polygon.
>
> So... do we agree that DISTANCE only accepts POINT-s?
>
> If so, I'd suggest to just drop the sentence:
>
>    Functions like AREA, COORD1, COORD2 and DISTANCE accept a
>    geometry and return a calculated numeric value.
>
> Then change
>
>    The specification defines two versions of the DISTANCE function,
>    one that accepts two geometries, and one that accepts four
>    separate...
>
> to
>
>    This specification defines two versions of the DISTANCE function,
>    one that accepts two POINTs, and one that accepts four
>    separate...
>
> And then add in 4.2.16 in some appropriate location something like
>
>    Note that when <column reference>s[2] are passed into DISTANCE, the
>    operation is only defined for POINT-typed values.  Behaviour for
>    other geometries is undefined at this point (but may be defined
>    later).
>
> Would anyone veto a PR to this effect?  Would anyone prefer something
> completely different?  Would anyone volunteer for doing the PR?
>
>             -- Markus
>
> [1] Incidentally, the grammar rules are incompatible with the
> statement in the 4.2.16 that "[t]he DISTANCE function may be applied
> to any expression that returns a geometric POINT value"; I see why it
> was put in, but unless we fix the grammar, we should remove the
> prose.
>
> [2] or <geometry_value_expression>s, depending on how you think about
> [1]
