ADQL evolution: OFFSET?

François-Xavier Pineau francois-xavier.pineau at astro.unistra.fr
Mon Jan 11 16:46:15 CET 2016


Dear Markus, dear all,

Here after our (Laurent Michel and I) feelings about this.

On 01/11/2016 11:31 AM, Markus Demleitner wrote:
> Dear DAL list,
>
> One of the users of DaCHS' ADQL engine requested the addition of
> OFFSET, and indeed I think there's at least three good reasons to do
> so:
>
> (1) It aids harvesting, which in particular with obscore is IMHO a
>      very nice thing to have
We are not DBMS experts but we are slightly concerned by the possible 
consequences of the usage of
OFFSET to harvest large tables, e.g a catalogue like 2MASS.

First point (not specific to large tables) underlined by Gerard: to 
obtain consistent results, one have to use OFFSET together with
ORDER BY (at least in PostgresSQL), see PSQL doc here: 
http://www.postgresql.org/docs/9.5/static/queries-limit.html.

Then (specific to large tables), like it is also mentioned in the PSQL doc:
"The rows skipped by an OFFSET clause still have to be computed inside 
the server; therefore a large OFFSET might be inefficient."

So using LIMIT, ORDER BY and OFFSET may not be the best way to harvest a 
large table (huge server load),
except (maybe) if ORDER BY is made on an indexed column, without 
filtering rows (no 'WHERE'), and
if the table is clustered according to the indexed column.

> (2) It enables pagination UIs bolted on top of an ADQL service
Except for large results (let's say > 100 000 rows for a web browser), 
the pagination can be made on the client side.
And above 100 000 rows, you will not look at all values but probably 
want to order then according to a given column
or put additional or more restrictive constraints.
> (3) The OFFSET 0 hack that helps in several database systems to set a
>      confused query planner right.
May OFFSET 0 confuse the query planner?
If not, I would vote for systematically add it on the server side (the 
user do not have to know about it, it is internal cuisine).

Having said this does not means that we are against its implementation 
in ADQL.

Cheers,

François-Xavier Pineau (CDS) and Laurent Michel (OAS)

>
> Given that, and given I've missed OFFSET even back in 2010 or so
> (when I wrote the example
> http://dc.g-vo.org/tap/examples#Crossmatchforaguidestar), I'm a bit
> surprised OFFSET hasn't been part of my TAP implementation notes.
>
> I don't remember why I dropped it -- perhaps because, equally
> surprisingly, OFFSET is not part of SQL92.  Nevertheless, I'd like to
> propose the addition of (at first optional, mandatory starting 3.0)
> support for OFFSET in ADQL 2.1.
>
> This would entail giving a new rule
>
>    <offset_spec> ::= "OFFSET" <unsigned_integer>
>
> and changing the query_specification production to
>
>    <query_specification> ::=
>      "SELECT" [ <set_quantifier> ] [ <set_limit> ]
>        <select_list> <table_expression> [ <offset_spec> ]
>
> -- which would correspond to the postgres syntax of offset.
>
> Since it's not part of SQL 92, a bit of prose would be required in
> the body of the spec, which I'd be happy to contribute if people
> don't protest.
>
> But: How does everyone else feel about this?  In particular: Is this
> an issue for non-postgres databases (and I'm mainly thinking of MySQL
> and Oracle here; SQLlite has the feature and even syntax as proposed
> here)?
>
> Cheers,
>
>            Markus


-- 
François-Xavier Pineau
CDS, Observatoire Astronomique de Strasbourg
11, rue de l'Université
F - 67000 Strasbourg


More information about the dal mailing list