Two very loose ends of the ADQL 2.1 PR

Mon May 28 20:40:04 CEST 2018

Hello all,

On 26/05/18 11:21, Mark Taylor wrote:
>> I have been made aware of Section 4.2.7: "Preferred crossmatch syntax"
>> of the ADQL 2.1 PR.

> This is a new item in ADQL 2.1, so it does not affect those
> implementing or using ADQL 2.0 services.

Fair enough, yes.

> If that point isn't clear, it could be made explicit in the text
> by adding a comment like
>  
>    "While ADQL 2.0 services are also encouraged to implement this syntax
>     efficiently, clients should be aware there is no general expectation
>     that such queries will execute efficiently on such legacy services."

I think it would help, because there will certainly be important sites
sticking to ADQL 2.0 for a _very_ long time to come.

> or maybe just writing instead
>   "Clients posing crossmatch-like queries in ADQL 2.1 are advised ..."

Well, I believe there is a difference between clients and users, i.e.,
people who write ADQL directly. I think it should be made clearer
who is responsible for what: a) the implementers of client software,
or b) their users.

> The idea is that both users/clients and service providers will
> pay attention to the advice, so that crossmatches can run efficiently
> without too much implementation effort or guesswork on either side.
> Since this advice only applies to a new version of the standard
> (not legacy services), that doesn't seem too far-fetched.

I do not want to deny the good intentions of the section in question :-)
However, I suspect that users will very likely be unaware of
the version number of the ADQL implementation they are sending their
queries to. Granted, as far as I understand, the recommended crossmatch
syntax of ADQL 2.1 will in all likelihood not work on 2.0 services at
all...

>> Besides, there is a rather odd mismatch between the quite strong choice
>> of "advised" for users / clients and the much weaker word "encouraged"
>> for services.
> 
> I may have put too much nuance into the language here.  The intention
> is not stronger/weaker force, but that users are free to take or
> ignore the "advice" (since doing it wrong will mostly affect themselves),
> but services have a kind of moral obligation to do this, in order
> to provide good service to their users.  If non-native English
> speakers would like to offer a clearer or less ambiguous form of
> words, I don't object.

I see. But then again, I fear that moral obligations will quite often
go unnoticed, and even more often simply be ignored. Therefore,

>> a sentence such as "This syntax MUST be handled as efficient or better
>> as semantically equivalent queries" would be in order.
> 
> I am reluctant to put a MUST relating to performance details.
> It's probably untestable and unenforceable.

Acknowledging the problems with testability and enforceability, I
still argue for a SHOULD here, because one can (or should!) assume that
the actual implementers of ADQL have a pretty good understanding
of how the underlying database software can carry out
a reasonably efficient crossmatch.

Anyway, in this case I also argue for renaming this section, in order
to make the more relaxed ADQL 2.1 implementers aware of the fact that
the text deals not only with syntax, but also with expected ("SHOULD")
performance requirements.
Otherwise, it is all too likely that the fears of my original post
will still come true, and too many people will just implement the
new syntax without giving any thought to the new performance
requirements.

> Of course it's good if services can implement all crossmatch
> syntax variants efficiently.  But at present there is a large and
> ill-defined set of these, so it's a heavy requirement to put on
> services, and many of them don't or can't do that, so many
> clients have a poor experience.

Well, since it is virtually guaranteed that any ADQL 2.1 implementation
will need a mildly sophisticated query-rewriting infrastructure to make
the preferred crossmatch syntax of the ADQL 2.1 PR perform as expected,
the actual number of rewriting rules would probably not be so large a
burden. But since I am not working on that myself, I cannot say more
there.

> This new section is an effort to address that problem.
> I still think it's a good idea, and that it will result in a better
> TAP user experience at a low cost for service implementors.

Well, conceding that I threw around that "MUST" in my previous
message somewhat prematurely, another point here is that most
software customarily used to implement ADQL, such as RDBMSs, probably
does _not_ offer a fixed SQL (or whatever QL) syntax that would be
automatically and internally optimised to yield the fastest possible
crossmatch execution.

Most importantly, ADQL 2.1 implementations would have to take into
account the sizes of the tables being crossmatched, reordering
the tables as necessary. And of course DBAs will have to make sure that
all tables that merit it are properly indexed, whatever that means
for the specific database software used.
Thus, my assessment is "reasonable costs" rather than "low costs".
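Why table sizes and indexes matter so much can be illustrated with a crude cost model. This is a minimal Python sketch under loudly stated assumptions: TableInfo, estimated_cost and plan_crossmatch are hypothetical names, an indexed lookup is assumed to cost about log2(N) steps, and an unindexed lookup a full scan of N rows. It is not how any particular RDBMS planner works; it only shows that choosing which table to scan and which to probe dominates the cost.

```python
import math
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class TableInfo:
    name: str
    row_count: int
    has_spatial_index: bool  # hypothetical flag: some positional index on the coordinates

def estimated_cost(outer: TableInfo, inner: TableInfo) -> float:
    """Crude cost: for every row of `outer`, look up matches in `inner`."""
    if inner.has_spatial_index:
        per_probe = math.log2(max(inner.row_count, 2))  # assumed index lookup cost
    else:
        per_probe = float(inner.row_count)  # full scan of `inner` per outer row
    return outer.row_count * per_probe

def plan_crossmatch(t1: TableInfo, t2: TableInfo) -> Tuple[TableInfo, TableInfo]:
    """Return (outer, inner): the ordering with the lower estimated cost."""
    return min([(t1, t2), (t2, t1)], key=lambda pair: estimated_cost(*pair))

# Usage with made-up table sizes: a small, unindexed upload against a large indexed catalogue.
upload = TableInfo("user_upload", 10_000, has_spatial_index=False)
catalog = TableInfo("big_catalog", 1_000_000_000, has_spatial_index=True)
outer, inner = plan_crossmatch(catalog, upload)
print(outer.name, "->", inner.name)  # scan the upload, probe the catalogue's index
```

With these made-up numbers the wrong ordering costs about 10^13 steps against roughly 3 x 10^5 for the right one, and without the index on the large table no ordering would be cheap, which is why both the reordering and the DBA's indexing work are needed.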

Best regards,
Markus Nullmeier