ADQL DISTANCE argument?

alberto micol amicol.ivoa at googlemail.com
Tue Mar 10 17:38:27 CET 2020


Dear Gregory,

In your answer you spent quite some time on the idea of supporting the OGC standard. 
In conclusion you said (I paraphrase) that supporting OGC would take a long time, and hence that we should not change the definition of the ADQL DISTANCE function.

But that was not at all what I was saying!

I mentioned the OGC standard in my second bullet item, as an example, to point out that the distance between two geometries has already been defined, and that there is one and only one definition used by the entire world. Hence there is no need to debate its definition. Far be it from me to suggest adopting the entire OGC standard (well, not at this point in time)!

When I said, and only in my last bullet item, "(5) Adopting existing standards can only speed up our VO work.", that was a generic statement that is always good to keep in mind. But I agree that I should have written "Adopting existing standard definitions can only speed up our VO work" instead. I was only referring to the definition of "distance", as Markus was thinking of possible distances other than the one normally used.

A standard cannot be based on what pgsphere can or cannot do.
Let’s not block the VO development because some (oldish) software component cannot do better.

My conclusions are:
- The definition of distance between two geometries is well defined and used worldwide.
- There is no need to think of a new ADQL version (2.2 or 3.0); ADQL2.1 can be achieved in May.
- It is only a matter of allowing those who can do more to do more.
- If old implementations cannot change, well, they won’t. An error message will be shown; it is a matter of documenting this in the ADQL2.1 standard.
- I have already provided all that is needed, including the definition and small grammar changes, for a speedy implementation of “distance” in ADQL2.1. Nothing else is needed.
Cheers,
Alberto
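
For illustration, the definition discussed above ("the shortest distance between any two points in those two geometries") can be sketched for the simple case of points and circles on the sphere. This is only a minimal sketch under stated assumptions: the tuple encodings ("POINT", ra, dec) and ("CIRCLE", ra, dec, radius), the function names, and the choice of NotImplementedError for unsupported shapes are all hypothetical and are not taken from ADQL, pgsphere, or any TAP implementation. Angles are in degrees, and the radii are assumed small enough that the two circles do not wrap around the sphere.

```python
import math

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two points given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, numerically stable for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def as_circle(geom):
    """Normalise a hypothetical POINT or CIRCLE tuple to (ra, dec, radius)."""
    kind = geom[0]
    if kind == "POINT":
        return geom[1], geom[2], 0.0
    if kind == "CIRCLE":
        return geom[1], geom[2], geom[3]
    # Assumed behaviour: a backend that cannot handle a shape reports an error.
    raise NotImplementedError(f"DISTANCE is not supported for {kind} in this sketch")

def distance(a, b):
    """Shortest distance between any two points of a and b; 0 if they touch or overlap."""
    ra1, dec1, r1 = as_circle(a)
    ra2, dec2, r2 = as_circle(b)
    gap = angular_separation(ra1, dec1, ra2, dec2) - r1 - r2
    return max(0.0, gap)
```

With this sketch, two touching or overlapping circles have distance zero (as with two neighbouring countries), two points fall back to the usual point-to-point separation, and a shape the sketch cannot handle (for example a polygon) raises an error instead of returning a value.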



> On 25. Feb 2020, at 18:51, Gregory MANTELET <gregory.mantelet at astro.unistra.fr> wrote:
> 
> Dear Alberto,
> 
> 
> On 25/02/2020 10:43, alberto micol wrote:
>> Dear Gregory,
>> 
>> I fully commend your will to finish ADQL-2.1 asap.
>> Still, one should be careful to define things in a way that does not block developers, providers, use cases, and user uptake.
> 
> 
> I completely agree. But we should also be careful with the existing implementations. [which leads me to the points below]
> 
> 
>>> On 24. Feb 2020, at 11:20, Gregory MANTELET <gregory.mantelet at astro.unistra.fr> wrote:
>>> 
>>> Dear DAL,
>>> 
>>> With the goal of getting ADQL-2.1 *finally* released, I would say that we keep the version of DISTANCE between 2 points, instead of between any two geometries. The latter, as Markus said, would require more careful definitions of what the distance should be between, for instance, a polygon and a circle (should it be between their "centroids" or the closest distance between the boundaries of each geometry? This is a debate that, I think, is out of scope for ADQL-2.1).
>> 
>> I do not see the need for any debate, as the world is already using one and only one definition:
> 
> 
> PostGIS and the geometries of SQLServer are not yet used by every database in the VO, and neither is the OGC standard.
> 
> If I take my personal case as example: I am using PostgreSQL + PgSphere but not PostGIS (maybe I should, that's something to think of....but not now). PgSphere is used by other major TAP services. However, it does not allow the computation of distance between anything else than circles and points.
> 
> Changing that cannot be done easily on the implementation side, and one should think about how to define it properly in ADQL-x.x so that existing implementations have a way to deal with such new possibilities (the most reasonable would probably be to throw an error if a distance between 2 complex geometries is not supported).
> 
> If we really want to adopt the OGC standard in ADQL, I suspect we should do it for all geometries and geo. functions, not just for DISTANCE: either we do it completely or we do not do it at all. Currently I do not know whether everything in ADQL is compatible with the OGC standard, and because of that, I prefer to be careful and wait a bit in order to study this evolution in detail, rather than rushing into it with the goal of releasing ADQL-2.1 around May this year.
> 
> 
>> (1) the distance between two neighbouring countries, take Germany and France, is zero,
>> otherwise they would not be neighbouring at all ! 
> 
> 
> I agree.
> 
> 
>> (2) The definition already exists in the universally-adopted OGC standard:
>>      "the distance between two geometries is the shortest distance between any two points in those two geometries"
>>     (see: OpenGIS® Implementation Standard for Geographic information - Simple feature access - Part 1: Common architecture,
>>     available at: http://portal.opengeospatial.org/files/?artifact_id=25355 )
> 
> 
> It sounds interesting. It should definitely be something to think of for the future of ADQL. As I said, if we do so, we would probably have to review all other geometries to make everything consistent.
> 
> Besides, doing so will probably lead to another (annoying but unfortunately necessary) question: ADQL-2.2 or ADQL-3.0? (and I am not starting this discussion here)
> 
> 
>> (3) Many DBMSes already operationally use such definition (e.g., PostGIS, SQLServer, ORACLE). 
> 
> 
> ...and some existing databases (and extensions such as PgSphere) behind TAP services would have to evolve to follow such standard...that may take (unfortunately) time.
> 
> 
>> (4) Adopting any different definition would only cause confusion to everybody.
> 
> 
> ...it would be confusing only to people already using such geometries in databases....which may not be the case for the majority of our users.
> But yes, I agree, it would be much better to follow an existing worldwide standard.
> 
> 
>> (5) Adopting existing standards can only speed up our VO work.
>> 
>> With that definition we would be fine and ready for the future (for some of the implementations), or ready for the present (for some other implementations).
>> 
>> With the above definition, and with the grammar that I already proposed in an earlier email,
>> we are ready to proceed with no further delays to the publication of the ADQL2.1.
> 
> 
> To conclude my thoughts, I would propose that the possibility of supporting the OGC standard be postponed to the next version of ADQL (2.2 or 3.0). Do not think that I do not like the idea....it is just that I prefer an evolution of ADQL that is as smooth as possible; otherwise we risk breaking some existing related services, and we definitely do not want that.
> 
> Cheers,
> Grégory
> 
> 
> PS: I will report everything said in this email thread in a GitHub issue so that we do not forget and that we can continue this discussion in future.
> 
> 
> 
>> Regarding centroids, yes, I’m in!
>> 
>> Many thanks,
>> Alberto
>> 
>>> However, as Pat commented, CENTROID should be allowed as a valid argument of DISTANCE. It should not cost much to add that to the grammar. Besides, it would *indirectly* allow the computation of the distance between two geometries by writing something like: DISTANCE( CENTROID(POLYGON(....)), CENTROID(CIRCLE(...)) ).
>>> 
>>> About the version of DISTANCE with 4 numeric arguments, I am not especially in favor of or against it. As Ger pointed out, it is just syntactic sugar, which, as Markus said, may introduce a bit of complexity, and so bugs...but I cannot really anticipate which ones. So, I am fairly neutral on this point.
>>> 
>>> To sum up my thoughts:
>>> 
>>> (for more readability here, I did not replace the parentheses and commas with their BNF equivalents)
>>> ---------------------------------------------------------------------
>>> <distance> ::=
>>>     DISTANCE(<coord_value>, <coord_value>)
>>>   | DISTANCE(<numeric_value_expression>, <numeric_value_expression>,
>>>              <numeric_value_expression>, <numeric_value_expression>)
>>> 
>>> <coord_value> ::= <point_value> | <column_reference>
>>> 
>>> <point_value> ::= <point> | <centroid>
>>> ---------------------------------------------------------------------
>>> 
>>> I am aware that adding <centroid> to <point_value> does not only affect <distance>, but I looked at the other places where it is used and I do not see why it would be inappropriate or error prone. Just tell me if it is.
>>> 
>>> I can start a GitHub PR with these and the suggestions of Markus, if you want.
>>> 
>>> Cheers,
>>> Grégory
>> 
> 



More information about the dal mailing list