ADQL evolution: OFFSET?
Gerard Lemson
glemson1 at jhu.edu
Mon Jan 11 23:29:31 CET 2016
Hi Tom
> -----Original Message-----
> From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of Tom
> McGlynn (NASA/GSFC Code 660.1)
> Sent: Monday, January 11, 2016 3:30 PM
> To: dal at ivoa.net
> Subject: Re: ADQL evolution: OFFSET?
>
> In this discussion it seems to be implicit than using an ORDER BY expression
> ensures that the rows will be returned in some defined order. While that is a
> necessary condition it is not sufficient. The value of the expression must be
> unique in each row to guarantee a fixed ordering.
Yes, that is true. A unique (set of) column(s) would need to be included.
> Given this below are a few cogitations on the idea of whether we should couple
> OFFSET and ORDER BY.
>
> - So why do we want to have the OFFSET explicitly tied to an ORDER BY?
>
> If we want to guarantee that we can do pagination correctly, this won't do it.
> If we want to enable users to do pagaination correctly they will still need to
> carefully pick the expression they use in the order by clause. So it's not really
> much easier.
No, but it is still tied to an ORDER BY.
>
> - Are there costs to requiring order by?
>
> Order by's can dramatically slow down queries, so it's not entirely free. Since
> some static databases actually do return rows in a consistent order (for a given
> query), this could be a performance hit.
Often depends on the content of the cache. If someone just ran a query that put some rows in cache, the DB may decide to start returning those while retrieving other rows from disk. But it is true I suppose that 'order by' a non-unique column can cause different orders.
I am curious to the use case for 'OFFSET number' without ORDER BY.
> And of course if the database is not static, then even with an order by in the
> query one is not guananteed to get consistent results since new rows may have
> been added or deleted.
>
I suppose the only way in which we could be sure we'd get what we want, i.e. consistent, non-repeating pagination, is if we could open a cursor and fetch results from it. I am assuming a cursor to be transactionally stable.
I don't think this solution would work.
> - What about supporting OFFSET where the underlying database requires the
> OFFSET to be in an ORDER BY clause: Can we handle lone OFFSETs?
>
> I'm not familiar with the exact syntax required, but perhaps something like:
> ORDER BY 'x' OFFSET 100
> could be used if there is no existing ORDER BY clause in the query. Any constant
> expression would do except that I believe some databases use simple integer
> constants to refer to a column number in the result. But this would need to be
> tested.
>
I just now tied this on CasJobs' SQLServer and the error message says: 'A constant expression was encountered in the ORDER BY list'
What was accepted was ' ORDER BY 0*<numerical column> OFFSET 10 ROWS'
>
> - So my feeling is that we shouldn't go out of our way to couple OFFSET and
> ORDER BY.
>
> - The SQL 2008 standard construct is
>
> OFFSET start ROW[S] (at least according to POSTGRESQL which supports
> many variants here)
>
Same for MS SQLServer.
> While ADQL is based upon an earlier version of the SQL standard, it seems
> gratuitously silly to not follow the agreed standard when we're adding
> something new. It sets a tone that we don't care about other standards
> development.
>
> - In the vein of the previous comment, we might also consider adding support
> for the standard
> FETCH FIRST count ROWS ONLY
> as a standard if verbose synonym for TOP. [I don't like the idea of
> adding support for LIMIT since it is another non-standard keyword, though since
> the HEASARC's TAP is running in Postgres it's easy enough for us to support.]
>
For what it's worth, I agree.
But I would now like to see the reason for introducing this clause at all without ORDER BY.
Gerard
> Tom McGlynn
>
> François-Xavier Pineau wrote:
> > Dear Markus, dear all,
> >
> > Here after our (Laurent Michel and I) feelings about this.
> >
> > On 01/11/2016 11:31 AM, Markus Demleitner wrote:
> >> Dear DAL list,
> >>
> >> One of the users of DaCHS' ADQL engine requested the addition of
> >> OFFSET, and indeed I think there's at least three good reasons to do
> >> so:
> >>
> >> (1) It aids harvesting, which in particular with obscore is IMHO a
> >> very nice thing to have
> > We are not DBMS experts but we are slightly concerned by the possible
> > consequences of the usage of OFFSET to harvest large tables, e.g a
> > catalogue like 2MASS.
> >
> > First point (not specific to large tables) underlined by Gerard: to
> > obtain consistent results, one have to use OFFSET together with ORDER
> > BY (at least in PostgresSQL), see PSQL doc here:
> > http://www.postgresql.org/docs/9.5/static/queries-limit.html.
> >
> > Then (specific to large tables), like it is also mentioned in the PSQL
> > doc:
> > "The rows skipped by an OFFSET clause still have to be computed inside
> > the server; therefore a large OFFSET might be inefficient."
> >
> > So using LIMIT, ORDER BY and OFFSET may not be the best way to harvest
> > a large table (huge server load), except (maybe) if ORDER BY is made
> > on an indexed column, without filtering rows (no 'WHERE'), and if the
> > table is clustered according to the indexed column.
> >
> >> (2) It enables pagination UIs bolted on top of an ADQL service
> > Except for large results (let's say > 100 000 rows for a web browser),
> > the pagination can be made on the client side.
> > And above 100 000 rows, you will not look at all values but probably
> > want to order then according to a given column or put additional or
> > more restrictive constraints.
> >> (3) The OFFSET 0 hack that helps in several database systems to set a
> >> confused query planner right.
> > May OFFSET 0 confuse the query planner?
> > If not, I would vote for systematically add it on the server side (the
> > user do not have to know about it, it is internal cuisine).
> >
> > Having said this does not means that we are against its implementation
> > in ADQL.
> >
> > Cheers,
> >
> > François-Xavier Pineau (CDS) and Laurent Michel (OAS)
> >
> >>
> >> Given that, and given I've missed OFFSET even back in 2010 or so
> >> (when I wrote the example
> >> http://dc.g-vo.org/tap/examples#Crossmatchforaguidestar), I'm a bit
> >> surprised OFFSET hasn't been part of my TAP implementation notes.
> >>
> >> I don't remember why I dropped it -- perhaps because, equally
> >> surprisingly, OFFSET is not part of SQL92. Nevertheless, I'd like to
> >> propose the addition of (at first optional, mandatory starting 3.0)
> >> support for OFFSET in ADQL 2.1.
> >>
> >> This would entail giving a new rule
> >>
> >> <offset_spec> ::= "OFFSET" <unsigned_integer>
> >>
> >> and changing the query_specification production to
> >>
> >> <query_specification> ::=
> >> "SELECT" [ <set_quantifier> ] [ <set_limit> ]
> >> <select_list> <table_expression> [ <offset_spec> ]
> >>
> >> -- which would correspond to the postgres syntax of offset.
> >>
> >> Since it's not part of SQL 92, a bit of prose would be required in
> >> the body of the spec, which I'd be happy to contribute if people
> >> don't protest.
> >>
> >> But: How does everyone else feel about this? In particular: Is this
> >> an issue for non-postgres databases (and I'm mainly thinking of MySQL
> >> and Oracle here; SQLlite has the feature and even syntax as proposed
> >> here)?
> >>
> >> Cheers,
> >>
> >> Markus
> >
> >
More information about the dal
mailing list