ADQL DISTANCE argument?

Gregory MANTELET gregory.mantelet at astro.unistra.fr
Tue Feb 25 18:51:35 CET 2020


Dear Alberto,


On 25/02/2020 10:43, alberto micol wrote:
> Dear Gregory,
>
> I fully commend your will to finish ADQL-2.1 asap.
> Still, one should be careful in defining things in a way that does not 
> block developers, providers, use cases, and user’s uptake.


I completely agree. But we should also be careful with the existing 
implementations (which leads me to the points below).


>> On 24. Feb 2020, at 11:20, Gregory MANTELET 
>> <gregory.mantelet at astro.unistra.fr 
>> <mailto:gregory.mantelet at astro.unistra.fr>> wrote:
>>
>> Dear DAL,
>>
>> With the goal of getting ADQL-2.1 *finally* released, I would say that 
>> we keep the version of DISTANCE between 2 points, instead of between 
>> any two geometries. The latter, as Markus said, would require more 
>> careful definitions of what the distance should be between, for 
>> instance, a polygon and a circle (should it be between their 
>> "centroids" or the closest distance between the boundaries of each 
>> geometry? This is a debate that, I think, is out of scope for ADQL-2.1).
>
> I do not see the need for any debate, as the world is already using 
> one and only one definition:


PostGIS and the geometry types of SQLServer are not yet used by every 
database in the VO, and neither is the OGC standard.

If I take my personal case as an example: I am using PostgreSQL + PgSphere 
but not PostGIS (maybe I should, that's something to think about... but 
not now). PgSphere is used by other major TAP services. However, it does 
not allow the computation of a distance between anything other than 
circles and points.

Changing that cannot be done easily on the implementation side, and one 
should think about how to define it properly in ADQL-x.x so that existing 
implementations have a way to deal with such new possibilities (the most 
reasonable option would probably be to throw an error if a distance 
between 2 complex geometries is not supported).
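
To illustrate the "throw an error" option, here is a minimal Python sketch 
of how a service could reject unsupported geometry pairs before running a 
query. Everything in it is hypothetical: the names SUPPORTED_PAIRS, 
UnsupportedDistanceError and check_distance_arguments are invented for 
this example, and the set of supported pairs is only an assumption loosely 
based on the points-and-circles limitation mentioned above, not on the 
actual PgSphere API.

```python
# Hypothetical sketch: none of these names come from ADQL, PgSphere or
# any existing TAP library.

# Assumed capability of an imaginary backend: distances only between
# points and circles (unordered pairs, hence frozenset).
SUPPORTED_PAIRS = {
    frozenset({"POINT"}),
    frozenset({"POINT", "CIRCLE"}),
    frozenset({"CIRCLE"}),
}


class UnsupportedDistanceError(ValueError):
    """Raised when DISTANCE is requested for an unsupported geometry pair."""


def check_distance_arguments(left_type, right_type):
    """Raise an error if the backend cannot measure this pair of geometries."""
    pair = frozenset({left_type.upper(), right_type.upper()})
    if pair not in SUPPORTED_PAIRS:
        raise UnsupportedDistanceError(
            "DISTANCE between %s and %s is not supported by this service"
            % (left_type.upper(), right_type.upper())
        )
```

With such a check, a query using DISTANCE on a polygon and a circle would 
fail early with an explicit message instead of being passed to a database 
that cannot evaluate it.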

If we really want to adopt the OGC standard in ADQL, I suspect we 
should do it for all geometries and geometrical functions, and not only 
for DISTANCE: either we do it completely or we do not do it at all. 
Currently I do not know if everything in ADQL is compatible with the OGC 
standard, and because of that, I prefer to be careful and wait a bit in 
order to study this evolution in detail, rather than rushing into it with 
the goal of releasing ADQL-2.1 around May this year.


> (1) the distance between two neighbouring countries, take Germany and 
> France, is zero,
> otherwise they would not be neighbouring at all !


I agree.


> (2) The definition already exists in the universally-adopted OGC standard:
> /     "the distance between two geometries is the shortest distance 
> between any two points in those two geometries"/
>     (see: OpenGIS® Implementation Standard for Geographic information 
> - Simple feature access - Part 1: Common architecture
> available at: http://portal.opengeospatial.org/files/?artifact_id=25355 )


It sounds interesting. It should definitely be something to think about 
for the future of ADQL. As I said, if we do so, we would probably have to 
review all other geometries to make everything consistent.
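
To make the difference between the two candidate definitions concrete, 
here is a small Python sketch for the simplest non-trivial case: two 
circles on the sphere. It compares the distance between the centres with 
the OGC-style shortest distance between any two points of the two 
circles. The function names and the (ra, dec, radius) tuples in degrees 
are assumptions made for this example; they are not ADQL or OGC 
definitions.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def centre_distance_deg(circle1, circle2):
    """Distance between the centres of two circles given as (ra, dec, radius)."""
    return angular_separation_deg(circle1[0], circle1[1], circle2[0], circle2[1])


def shortest_distance_deg(circle1, circle2):
    """Shortest distance between any two points of the two circles.

    Touching or overlapping circles give zero, as in the example of two
    neighbouring countries.
    """
    gap = centre_distance_deg(circle1, circle2) - circle1[2] - circle2[2]
    return max(0.0, gap)
```

For two circles of radius 1 degree centred 3 degrees apart on the equator, 
the first definition gives about 3 degrees and the second about 1 degree, 
so a service and its users really need to agree on which one DISTANCE 
means.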

Besides, doing so will probably lead to another (annoying but 
unfortunately necessary) question: ADQL-2.2 or ADQL-3.0? (and I am not 
starting this discussion here)


> (3) Many DBMSes already operationally use such definition (e.g., 
> PostGIS, SQLServer, ORACLE).


...and some existing databases (and extensions such as PgSphere) behind 
TAP services would have to evolve to follow such a standard, which may 
(unfortunately) take time.


> (4) Adopting any different definition would only cause confusion to 
> everybody.


...it would be confusing only to people already using such geometries in 
databases, which may not be the case for the majority of our users.
But yes, I agree, it would be much better to follow an existing 
worldwide standard.


> (5) Adopting existing standards can only speed up our VO work.
>
> With that definition we would be fine and ready for the future (for 
> some of the implementations), or ready for the present (for some other 
> implementations).
>
> With the above definition, and with the grammar that I already 
> proposed in an earlier email,
> we are ready to proceed with no further delays to the publication of 
> the ADQL2.1.


To conclude, I would propose that support for the OGC standard be 
postponed to the next version of ADQL (2.2 or 3.0). Do not think that I 
do not like the idea; it is just that I prefer the evolution of ADQL to 
be as smooth as possible, otherwise we risk breaking some existing 
services, and we definitely do not want that.

Cheers,
Grégory


PS: I will report everything said in this email thread in a GitHub issue 
so that we do not forget it and can continue this discussion in the future.



> Regarding centroids, yes, I’m in!
>
> Many thanks,
> Alberto
>
>> However, as Pat commented, CENTROID should be allowed as a valid 
>> argument of DISTANCE. It should not cost much to add that to the 
>> grammar. Besides, it would *indirectly* allow the computation of a 
>> distance between two geometries by writing something like: DISTANCE( 
>> CENTROID(POLYGON(....)), CENTROID(CIRCLE(...)) ).
>>
>> About the version of DISTANCE with 4 numeric arguments, I am not 
>> especially in favor of or against it. As Ger pointed out, it is just 
>> syntactic sugar, which, as Markus said, may introduce a bit of 
>> complexity, and so bugs... but I cannot really anticipate which 
>> ones. So, I am fairly neutral on this point.
>>
>> To sum up my thoughts:
>>
>> /(for more readability here, I did not replace the parentheses and 
>> commas with their BNF equivalents)/
>> ---------------------------------------------------------------------
>> <distance> ::=
>>     DISTANCE(<coord_value>, <coord_value>)
>>   | DISTANCE(<numeric_value_expression>, <numeric_value_expression>,
>> <numeric_value_expression>, <numeric_value_expression>)
>>
>> <coord_value> ::= <point_value> | <column_reference>
>>
>> <point_value> ::= <point> | <centroid>
>> ---------------------------------------------------------------------
>>
>> I am aware that adding <centroid> into <point_value> has an impact 
>> beyond <distance>, but I looked at the other places where it is 
>> used and I do not see why it would be inappropriate or error prone. 
>> Just tell me if it is.
>>
>> I can start a GitHub's PR with these and the suggestions of Markus, 
>> if you want to.
>>
>> Cheers,
>> Grégory
>
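
As a rough illustration of the two DISTANCE forms in the grammar quoted 
above, the following Python sketch classifies a call by counting its 
top-level arguments: 2 for the point form, 4 for the numeric form. It is 
not an ADQL parser: the function names are invented for this example, 
quoted strings containing commas or parentheses are not handled, and the 
argument types are not checked.

```python
def split_top_level_args(call):
    """Split the arguments of a function call on commas that are not
    nested inside inner parentheses."""
    start = call.index("(")
    end = call.rindex(")")
    args, current, depth = [], [], 0
    for ch in call[start + 1:end]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # A comma at depth 0 separates two arguments of the outer call.
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    args.append("".join(current).strip())
    return args


def distance_form(call):
    """Return "points" for the 2-argument form and "coordinates" for the
    4-argument form; raise ValueError for any other argument count."""
    count = len(split_top_level_args(call))
    if count == 2:
        return "points"
    if count == 4:
        return "coordinates"
    raise ValueError("DISTANCE expects 2 or 4 arguments, got %d" % count)
```

For instance, a call such as 
DISTANCE(CENTROID(POLYGON(1, 1, 2, 1, 2, 2)), CENTROID(CIRCLE(10, 20, 1))) 
is classified as the point form, because the commas inside POLYGON and 
CIRCLE are nested and therefore ignored.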
