ADQL Erratum 2

Mon May 29 11:23:28 CEST 2017

Dear Markus, DAL,

As in DaCHS, the ADQL library treats the parameter of rand(x) as
optional.

I think I originally overlooked the prose description, where the
parameter is not marked as optional, and relied instead on the BNF
grammar, where the parameter is optional:

	RAND <left_paren> [ <unsigned_integer> ] <right_paren>

So the ADQL 2.0 document really does seem to be inconsistent on that
point.
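
To make the BNF reading concrete, here is a minimal Python sketch of a
recognizer for that production. It is only an illustration, not code
from the ADQL library or from DaCHS; the names RAND_CALL and
parse_rand_seed are made up for this example, and it assumes that
<unsigned_integer> is a plain run of decimal digits.

```python
import re
from typing import Optional

# RAND <left_paren> [ <unsigned_integer> ] <right_paren>
# Assumption: <unsigned_integer> is one or more decimal digits.
RAND_CALL = re.compile(r"\s*RAND\s*\(\s*(\d+)?\s*\)\s*", re.IGNORECASE)


def parse_rand_seed(expr: str) -> Optional[int]:
    """Return the seed of a rand([x]) call, or None if no seed is given.

    Raises ValueError if expr does not match the BNF production.
    """
    match = RAND_CALL.fullmatch(expr)
    if match is None:
        raise ValueError(f"not a valid RAND call: {expr!r}")
    seed = match.group(1)
    return int(seed) if seed is not None else None
```

With this reading, both rand() and rand(42) are accepted, while a
signed or non-integer argument such as rand(-1) is rejected.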

So, to answer your questions:

> (1) Does anyone actually implement random(int) (rather than just
> falling back to random())?  And if so, what do you do?

For PostgreSQL, I also translate the ADQL rand(x) into the SQL random(),
so I completely ignore the seed parameter if one is given.

However, SQL Server seems to provide rand([x]) as described in
ADQL 2.0, so the SQL Server translator of my library passes the call
through exactly as written in ADQL.

Similarly, MySQL and H2 (but not SQLite) also accept the same optional
parameter: rand([x]).
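
The per-backend behaviour described above can be sketched roughly as
follows. This is a hedged Python illustration, not the library's actual
translator code; the function translate_rand, the dialect names and the
SEED_AWARE_DIALECTS set are hypothetical and only encode what is stated
in this message.

```python
from typing import Optional

# Hypothetical dialect names; these backends are the ones said above
# to accept rand([x]) natively.
SEED_AWARE_DIALECTS = {"sqlserver", "mysql", "h2"}


def translate_rand(dialect: str, seed: Optional[int] = None) -> str:
    """Translate ADQL rand([seed]) into SQL text for one backend."""
    dialect = dialect.lower()
    if dialect == "postgresql":
        # The seed, if any, is silently dropped.
        return "random()"
    if dialect in SEED_AWARE_DIALECTS:
        # Pass the call through as written in ADQL.
        return "rand()" if seed is None else f"rand({seed:d})"
    raise ValueError(f"no rand() translation sketched for {dialect!r}")
```

For example, translate_rand("postgresql", 42) yields random(), while
translate_rand("mysql", 42) yields rand(42).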

> (2) Wouldn't it be preferable if we said, in the erratum, as the new
> text:
>
>    rand([x]) -- Returns a random value between 0.0 and 1.0.  The
>    argument was initially intended to provide a random seed, if given.
>    It turned out, however, that in concept and implementation, it is
>    hard to attach stable semantics to this notion.  Hence, while an
>    argument is accepted for backward compatibility, clients should
>    expect that the 1-argument function behaves exactly like the
>    0-argument one.
>
> Or something like this?

I completely agree with making the seed parameter optional.

But since some DBMSs used by existing TAP implementations accept this
seed parameter, I am not entirely convinced that we should rule out
using it. That said, I don't have a strong opinion on whether
regenerating the same random numbers from a seed is actually needed
when working with a database.

So why not say that this optional parameter may be ignored by some
ADQL implementations?
That way it works for everybody, and I don't think a client/user will
really notice the difference or complain... but I may be wrong here.

Cheers,
Grégory

On 05/29/2017 10:10 AM, Markus Demleitner wrote:
> Dear DAL,
>
> There's currently ADQL Erratum 2,
> http://wiki.ivoa.net/twiki/bin/view/IVOA/ADQL-2_0-Err-2 under
> discussion.
>
> While I believe the main content is essentially uncontentious (I
> personally would prefer square brackets around optional arguments),
> I'm not so sure about:
>
>    rand(x)
>    Returns a random value between 0.0 and 1.0, *where x is an
>    optional seed value*.
>
> Frankly, I believe we need to say a bit more about this if we expect
> it to work.
>
> I *think* what the authors intended here was:
>
>    "If an argument to RAND is given, a single call to a setseed-like
>    function should be performed in the transaction that will later be
>    used to execute the query itself."
>
> It certainly makes no sense to set the seed as part of the query
> itself (you'd then get *very* unrandom numbers indeed).
>
> Full disclosure: DaCHS currently does neither: random(n) is
> translated to the same query as random().  My rationale is that I
> doubt you'll get any sort of reproducibility (which setseed is about)
> either way, given that it's not clear if the PRNG is per-transaction
> or might be pushed along by queries executed in parallel (Postgres
> docs aren't clear here) and that, with set calculus and the query
> planner in the background, you can't really expect a particular
> sequence of rows and hence a particular sequence of random numbers by
> rows.
>
> Of course, it's also a pain to implement that extra query one would
> need for halfway reasonable behaviour.
>
> So:
>
> (1) Does anyone actually implement random(int) (rather than just
> falling back to random())?  And if so, what do you do?
>
> (2) Wouldn't it be preferable if we said, in the erratum, as the new
> text:
>
>    rand([x]) -- Returns a random value between 0.0 and 1.0.  The
>    argument was initially intended to provide a random seed, if given.
>    It turned out, however, that in concept and implementation, it is
>    hard to attach stable semantics to this notion.  Hence, while an
>    argument is accepted for backward compatibility, clients should
>    expect that the 1-argument function behaves exactly like the
>    0-argument one.
>
> Or something like this?
>
>            -- Markus