ADQL evolution: OFFSET?
gerard.lemson at gmail.com
gerard.lemson at gmail.com
Mon Jan 11 23:30:04 CET 2016
Hi Tom
> -----Original Message-----
> From: dal-bounces at ivoa.net [mailto:dal-bounces at ivoa.net] On Behalf Of
> Tom McGlynn (NASA/GSFC Code 660.1)
> Sent: Monday, January 11, 2016 3:30 PM
> To: dal at ivoa.net
> Subject: Re: ADQL evolution: OFFSET?
>
> In this discussion it seems to be implicit than using an ORDER BY
> expression ensures that the rows will be returned in some defined
> order. While that is a necessary condition it is not sufficient. The
> value of the expression must be unique in each row to guarantee a fixed ordering.
Yes, that is true. A unique (set of) column(s) would need to be included.
> Given this below are a few cogitations on the idea of whether we
> should couple OFFSET and ORDER BY.
>
> - So why do we want to have the OFFSET explicitly tied to an ORDER BY?
>
> If we want to guarantee that we can do pagination correctly, this won't do it.
> If we want to enable users to do pagaination correctly they will still
> need to carefully pick the expression they use in the order by clause.
> So it's not really much easier.
No, but it is still tied to an ORDER BY.
>
> - Are there costs to requiring order by?
>
> Order by's can dramatically slow down queries, so it's not entirely
> free. Since some static databases actually do return rows in a
> consistent order (for a given query), this could be a performance hit.
Often depends on the content of the cache. If someone just ran a query that put some rows in cache, the DB may decide to start returning those while retrieving other rows from disk. But it is true I suppose that 'order by' a non-unique column can cause different orders.
I am curious to the use case for 'OFFSET number' without ORDER BY.
> And of course if the database is not static, then even with an order
> by in the query one is not guananteed to get consistent results since
> new rows may have been added or deleted.
>
I suppose the only way in which we could be sure we'd get what we want, i.e. consistent, non-repeating pagination, is if we could open a cursor and fetch results from it. I am assuming a cursor to be transactionally stable.
I don't think this solution would work.
> - What about supporting OFFSET where the underlying database requires
> the OFFSET to be in an ORDER BY clause: Can we handle lone OFFSETs?
>
> I'm not familiar with the exact syntax required, but perhaps something like:
> ORDER BY 'x' OFFSET 100
> could be used if there is no existing ORDER BY clause in the query.
> Any constant expression would do except that I believe some databases
> use simple integer constants to refer to a column number in the
> result. But this would need to be tested.
>
I just now tied this on CasJobs' SQLServer and the error message says: 'A constant expression was encountered in the ORDER BY list'
What was accepted was ' ORDER BY 0*<numerical column> OFFSET 10 ROWS'
>
> - So my feeling is that we shouldn't go out of our way to couple
> OFFSET and ORDER BY.
>
> - The SQL 2008 standard construct is
>
> OFFSET start ROW[S] (at least according to POSTGRESQL which
> supports many variants here)
>
Same for MS SQLServer.
> While ADQL is based upon an earlier version of the SQL standard, it
> seems gratuitously silly to not follow the agreed standard when we're
> adding something new. It sets a tone that we don't care about other
> standards development.
>
> - In the vein of the previous comment, we might also consider adding
> support for the standard
> FETCH FIRST count ROWS ONLY
> as a standard if verbose synonym for TOP. [I don't like the idea of
> adding support for LIMIT since it is another non-standard keyword,
> though since the HEASARC's TAP is running in Postgres it's easy enough
> for us to support.]
>
For what it's worth, I agree.
But I would now like to see the reason for introducing this clause at all without ORDER BY.
Gerard
> Tom McGlynn
>
> François-Xavier Pineau wrote:
> > Dear Markus, dear all,
> >
> > Here after our (Laurent Michel and I) feelings about this.
> >
> > On 01/11/2016 11:31 AM, Markus Demleitner wrote:
> >> Dear DAL list,
> >>
> >> One of the users of DaCHS' ADQL engine requested the addition of
> >> OFFSET, and indeed I think there's at least three good reasons to
> >> do
> >> so:
> >>
> >> (1) It aids harvesting, which in particular with obscore is IMHO a
> >> very nice thing to have
> > We are not DBMS experts but we are slightly concerned by the
> > possible consequences of the usage of OFFSET to harvest large
> > tables, e.g a catalogue like 2MASS.
> >
> > First point (not specific to large tables) underlined by Gerard: to
> > obtain consistent results, one have to use OFFSET together with
> > ORDER BY (at least in PostgresSQL), see PSQL doc here:
> > http://www.postgresql.org/docs/9.5/static/queries-limit.html.
> >
> > Then (specific to large tables), like it is also mentioned in the
> > PSQL
> > doc:
> > "The rows skipped by an OFFSET clause still have to be computed
> > inside the server; therefore a large OFFSET might be inefficient."
> >
> > So using LIMIT, ORDER BY and OFFSET may not be the best way to
> > harvest a large table (huge server load), except (maybe) if ORDER BY
> > is made on an indexed column, without filtering rows (no 'WHERE'),
> > and if the table is clustered according to the indexed column.
> >
> >> (2) It enables pagination UIs bolted on top of an ADQL service
> > Except for large results (let's say > 100 000 rows for a web
> > browser), the pagination can be made on the client side.
> > And above 100 000 rows, you will not look at all values but probably
> > want to order then according to a given column or put additional or
> > more restrictive constraints.
> >> (3) The OFFSET 0 hack that helps in several database systems to set a
> >> confused query planner right.
> > May OFFSET 0 confuse the query planner?
> > If not, I would vote for systematically add it on the server side
> > (the user do not have to know about it, it is internal cuisine).
> >
> > Having said this does not means that we are against its
> > implementation in ADQL.
> >
> > Cheers,
> >
> > François-Xavier Pineau (CDS) and Laurent Michel (OAS)
> >
> >>
> >> Given that, and given I've missed OFFSET even back in 2010 or so
> >> (when I wrote the example
> >> http://dc.g-vo.org/tap/examples#Crossmatchforaguidestar), I'm a bit
> >> surprised OFFSET hasn't been part of my TAP implementation notes.
> >>
> >> I don't remember why I dropped it -- perhaps because, equally
> >> surprisingly, OFFSET is not part of SQL92. Nevertheless, I'd like
> >> to propose the addition of (at first optional, mandatory starting
> >> 3.0) support for OFFSET in ADQL 2.1.
> >>
> >> This would entail giving a new rule
> >>
> >> <offset_spec> ::= "OFFSET" <unsigned_integer>
> >>
> >> and changing the query_specification production to
> >>
> >> <query_specification> ::=
> >> "SELECT" [ <set_quantifier> ] [ <set_limit> ]
> >> <select_list> <table_expression> [ <offset_spec> ]
> >>
> >> -- which would correspond to the postgres syntax of offset.
> >>
> >> Since it's not part of SQL 92, a bit of prose would be required in
> >> the body of the spec, which I'd be happy to contribute if people
> >> don't protest.
> >>
> >> But: How does everyone else feel about this? In particular: Is
> >> this an issue for non-postgres databases (and I'm mainly thinking
> >> of MySQL and Oracle here; SQLlite has the feature and even syntax
> >> as proposed here)?
> >>
> >> Cheers,
> >>
> >> Markus
> >
> >
More information about the dal
mailing list