Proposals for VOSpace (paginated response)

Patrick Dowler pdowler.cadc at gmail.com
Fri Jun 18 23:00:15 CEST 2021


The current spec does support pagination when listing child nodes of a
container (*uri* and *limit* params), but implementing it is complex. We
have two VOSpace implementations that illustrate this quite well.

Impl 1: relational database + object store
Here, it is easy enough to implement pagination because it only takes a
couple of extra clauses injected into the SQL query to the DB. The server
picks the default order, but we also added support for a custom optional
param so the client can control the order: name, lastModified date, or
contentLength.
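
As a rough illustration of the "extra clauses" idea, here is a minimal
sketch using sqlite3. The table layout (node with parent and name columns)
and the function name are made up for the example and are not taken from
either implementation; it also assumes the start value is inclusive, so the
last row of one page is repeated as the first row of the next.

```python
import sqlite3

def list_children(conn, parent, start_name=None, limit=None):
    """List child names of a container, optionally starting at start_name."""
    sql = "SELECT name FROM node WHERE parent = ?"
    args = [parent]
    if start_name is not None:
        # hypothetical mapping of the *uri* param: start at this child (inclusive)
        sql += " AND name >= ?"
        args.append(start_name)
    sql += " ORDER BY name"  # server-chosen default order
    if limit is not None:
        # the *limit* param
        sql += " LIMIT ?"
        args.append(limit)
    return [row[0] for row in conn.execute(sql, args)]

# small in-memory demo with a hypothetical schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (parent TEXT, name TEXT, PRIMARY KEY (parent, name))")
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [("/data", n) for n in ["d", "a", "c", "b", "e"]],
)
page1 = list_children(conn, "/data", limit=3)
page2 = list_children(conn, "/data", start_name=page1[-1], limit=3)
```

Because the DB can seek directly to the start value through an index, each
page costs about the same no matter how far into the listing it is.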

Impl 2: only a POSIX file system
Here, it is really hard to implement pagination because the POSIX directory
listing APIs don't have any concept of order (iirc, I determined it lists
in inode order, so you could get some strangeness if an inode is re-used --
rename? -- during listing). It also looks more or less impossible to scale
paginated listing to containers with many children: with each request, you
have to start at the beginning of the list and skip over previously seen
entries, so it gets slower and slower with each "page" of children. This
service cannot support custom sorting on the server side either.
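
The cost problem can be seen in a small sketch using os.scandir. This is
not how the service above works: the in-memory sort is an assumption added
here only to get a stable order for the example, and the function name is
made up. The point is that every page, first or last, has to read the
whole directory before it can return anything.

```python
import os
import tempfile

def list_page(directory, start_name=None, limit=None):
    """Return (page, scanned): one page of child names plus the number of
    directory entries that had to be read to produce it."""
    # os.scandir yields entries in arbitrary order, so read them all
    with os.scandir(directory) as entries:
        names = [entry.name for entry in entries]
    scanned = len(names)
    names.sort()  # assumption: impose name order in memory
    if start_name is not None:
        # skip everything before the requested start (inclusive)
        names = [n for n in names if n >= start_name]
    if limit is not None:
        names = names[:limit]
    return names, scanned

# demo on a throwaway directory
with tempfile.TemporaryDirectory() as root:
    for name in ["d", "a", "c", "b", "e"]:
        open(os.path.join(root, name), "w").close()
    page1, scanned1 = list_page(root, limit=3)
    page2, scanned2 = list_page(root, start_name=page1[-1], limit=3)
```

Both calls scan all five entries even though each returns only three, so
the work per page grows with the size of the container rather than with
the size of the page.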

So, I would also like to improve the spec here, but I would like to see
something where a service that cannot support pagination (and just streams
output) can still be used effectively: clients will need to be able to
figure out which behaviour to expect, or at least whether or not they got
all the rows. That really means support for the *uri* parameter would be
optional, and maybe just responding with an error with a specified "fault"
term would suffice. The *limit* param is easy enough to implement (like
MAXREC in DAL standards) in both cases.
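
To make the client side of that idea concrete, here is a minimal sketch.
Everything in it is hypothetical: the PaginationNotSupported fault name,
the callable standing in for the service, and the rule that a page shorter
than the limit means the listing is complete are assumptions for
illustration, not anything the spec currently defines.

```python
class PaginationNotSupported(Exception):
    """Hypothetical fault raised by a service that ignores the *uri* param."""

def fetch_all(list_children, page_size):
    """Collect all child names, paging if possible, streaming otherwise."""
    if page_size < 2:
        # with an inclusive start, a page of 1 would never make progress
        raise ValueError("page_size must be at least 2")
    try:
        names, start = [], None
        while True:
            page = list_children(start=start, limit=page_size)
            # the start row is inclusive, so drop the repeat on later pages
            fresh = page[1:] if start is not None and page[:1] == [start] else page
            names.extend(fresh)
            if len(page) < page_size:
                return names  # short page: assume we have all the rows
            start = page[-1]
    except PaginationNotSupported:
        # fall back to a single streamed listing with no limit
        return list(list_children(start=None, limit=None))

CHILDREN = ["a", "b", "c", "d", "e"]

def paged_service(start=None, limit=None):
    """Stand-in for a service that supports both params."""
    rows = [n for n in CHILDREN if start is None or n >= start]
    return rows if limit is None else rows[:limit]

def streaming_service(start=None, limit=None):
    """Stand-in for a service that supports *limit* only."""
    if start is not None:
        raise PaginationNotSupported("uri parameter not supported")
    return CHILDREN if limit is None else CHILDREN[:limit]

paged_result = fetch_all(paged_service, 2)
streamed_result = fetch_all(streaming_service, 2)
```

Both calls end up with the same complete listing; the client only learns
which kind of service it is talking to when the fault comes back.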

--
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada


On Wed, 16 Jun 2021 at 22:31, Dave Morris <dave.morris at metagrid.co.uk>
wrote:

> Hi Sonia,
>
> You raised several good suggestions in your email. To avoid confusion
> I'll reply to each one in a separate email thread.
>
> On 2021-06-11 13:31, Zorba, Sonia wrote:
> > 7. On the getNode endpoint add parameters to perform paginated
> > requests.
> > Useful for nodes having too many children.
>
> Paginated response sounds simple, but it turns out to be complicated to
> implement.
>
> We would need to define a design that does not put a heavy load on the
> server, can reliably handle the insertion or deletion of nodes between
> requests without producing duplicate rows in the results, and does not
> require the use of a relational database to implement it.
>
> As far as I know, everyone who has looked at this has decided that it is
> easier to do it on the client side than on the server side. Perhaps
> someone would like to look at this again and propose a definition for
> how a paginated response could work?
>
> For me, I see pagination as a client side display function rather than a
> server side data access function. Is there a strong use case for doing
> this on the server side ?
>
> Bear in mind that even if we did define a new property for pagination,
> existing version 2.1 services would not understand it. So unless we make
> the new property mandatory, everyone adopts the new standard, and we
> deprecate the version 2.1 standard, clients would still have to cope
> with large responses from version 2.1 services.
>
> Cheers
> -- Dave
>
> --------
> Dave Morris
> Research Software Engineer
> Wide Field Astronomy Unit
> Institute for Astronomy
> University of Edinburgh
> --------
>
>