ADQL evolution: OFFSET?

Mon Jan 11 21:16:17 CET 2016

Hi Theresa and all,

> -----Original Message-----
> From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of
> Theresa Dower
> Sent: Monday, January 11, 2016 2:02 PM
> To: francois-xavier.pineau at astro.unistra.fr; dal at ivoa.net
> Subject: RE: ADQL evolution: OFFSET?
> 
> All,
> 
> Hello! As a note, while newer versions of MSSQL Server handle OFFSET (and also
> require the order by, as a possible performance issue), I have found no evidence
> of support for LIMIT, only very ugly kludges that will require ADQL query
> rewriting. Since the kludges do exist, and are less bad than the issues we've had
> with NATURAL JOINs, I'm not going to argue too hard against this, but I did want
> to make the issue known.
>
>From the SQLServer 2012 documentation it seems that one cannot use TOP together with OFFSET, but that instead FETCH should be used to limit the result set. And in the OFFSET context it seems to be basically equivalent to LIMIT in for example Postgres. SQLServer would have something like

SELECT select_list
  FROM table_expression
 [ORDER BY ... [OFFSET number ROWS [FETCH FIRST number ROWS ONLY]]

(Note, OFFSET 0 ROWS FETCH FIRST number ROWS ONLY seems to be equivalent in SQLServer to SELECT TOP number, as one would hope.)

Where Postgres would have 

SELECT select_list
    FROM table_expression
    [ ORDER BY ... ]
    [ LIMIT { number | ALL } ] [ OFFSET number ]

Assuming that we have to do some query rewriting anyway (e.g. would we want to keep the ROWS and ONLY keywords from SQLServer?) , if ADQL follows either of these patterns (*) it will map quite easily to the other (**). 
More easily I think than the current rewrite that is (still?) required  from ADQL's 'SELECT TOP number FROM ... ' to Postgres' 'SELECT ... FROM ... LIMIT number' 

Cheers
Gerard

(*) should the LIMIT/FETCH pattern be included? If so I'd vote for using the term FETCH rather than LIMIT and that we'd include the restriction that TOP must not be combined with OFFSET...FETCH (as in SQLServer).
(**) assuming ADQL will allow OFFSET only in the ORDER BY clause, otherwise mapping to SQLServer would become too ugly, and arguably it only makes sense in that context.

> --Theresa
> 
> -----Original Message-----
> From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of
> François-Xavier Pineau
> Sent: Monday, January 11, 2016 10:46 AM
> To: dal at ivoa.net
> Subject: Re: ADQL evolution: OFFSET?
> 
> Dear Markus, dear all,
> 
> Here after our (Laurent Michel and I) feelings about this.
> 
> On 01/11/2016 11:31 AM, Markus Demleitner wrote:
> > Dear DAL list,
> >
> > One of the users of DaCHS' ADQL engine requested the addition of
> > OFFSET, and indeed I think there's at least three good reasons to do
> > so:
> >
> > (1) It aids harvesting, which in particular with obscore is IMHO a
> >      very nice thing to have
> We are not DBMS experts but we are slightly concerned by the possible
> consequences of the usage of OFFSET to harvest large tables, e.g a catalogue
> like 2MASS.
> 
> First point (not specific to large tables) underlined by Gerard: to obtain
> consistent results, one have to use OFFSET together with ORDER BY (at least in
> PostgresSQL), see PSQL doc here:
> http://www.postgresql.org/docs/9.5/static/queries-limit.html.
> 
> Then (specific to large tables), like it is also mentioned in the PSQL doc:
> "The rows skipped by an OFFSET clause still have to be computed inside the
> server; therefore a large OFFSET might be inefficient."
> 
> So using LIMIT, ORDER BY and OFFSET may not be the best way to harvest a
> large table (huge server load), except (maybe) if ORDER BY is made on an
> indexed column, without filtering rows (no 'WHERE'), and if the table is clustered
> according to the indexed column.
> 
> > (2) It enables pagination UIs bolted on top of an ADQL service
> Except for large results (let's say > 100 000 rows for a web browser), the
> pagination can be made on the client side.
> And above 100 000 rows, you will not look at all values but probably want to
> order then according to a given column or put additional or more restrictive
> constraints.
> > (3) The OFFSET 0 hack that helps in several database systems to set a
> >      confused query planner right.
> May OFFSET 0 confuse the query planner?
> If not, I would vote for systematically add it on the server side (the user do not
> have to know about it, it is internal cuisine).
> 
> Having said this does not means that we are against its implementation in ADQL.
> 
> Cheers,
> 
> François-Xavier Pineau (CDS) and Laurent Michel (OAS)
> 
> >
> > Given that, and given I've missed OFFSET even back in 2010 or so (when
> > I wrote the example
> > http://dc.g-vo.org/tap/examples#Crossmatchforaguidestar), I'm a bit
> > surprised OFFSET hasn't been part of my TAP implementation notes.
> >
> > I don't remember why I dropped it -- perhaps because, equally
> > surprisingly, OFFSET is not part of SQL92.  Nevertheless, I'd like to
> > propose the addition of (at first optional, mandatory starting 3.0)
> > support for OFFSET in ADQL 2.1.
> >
> > This would entail giving a new rule
> >
> >    <offset_spec> ::= "OFFSET" <unsigned_integer>
> >
> > and changing the query_specification production to
> >
> >    <query_specification> ::=
> >      "SELECT" [ <set_quantifier> ] [ <set_limit> ]
> >        <select_list> <table_expression> [ <offset_spec> ]
> >
> > -- which would correspond to the postgres syntax of offset.
> >
> > Since it's not part of SQL 92, a bit of prose would be required in the
> > body of the spec, which I'd be happy to contribute if people don't
> > protest.
> >
> > But: How does everyone else feel about this?  In particular: Is this
> > an issue for non-postgres databases (and I'm mainly thinking of MySQL
> > and Oracle here; SQLlite has the feature and even syntax as proposed
> > here)?
> >
> > Cheers,
> >
> >            Markus
> 
> 
> --
> François-Xavier Pineau
> CDS, Observatoire Astronomique de Strasbourg 11, rue de l'Université F - 67000
> Strasbourg