Proposals for VOSpace (paginated response)

Dave Morris dave.morris at metagrid.co.uk
Sun Jun 20 05:30:29 CEST 2021


Apologies, I forgot this was in the specification.
The text describing how this works is buried in the Response section of 
the getNode method, which makes it easy to miss.

As a start, I've added an issue to revise the text to make this clearer.
https://github.com/ivoa-std/VOSpace/issues/4

This wouldn't change the technical definition, just promote the 
description of how pagination works into a separate sub-section clearly 
labelled 'pagination'.

If we want to take it further, possibly by adding the exception that Pat 
proposes below, then that would be a new issue.
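
To make the trade-off concrete, here is a rough sketch of how a client
might page through a listing and cope with a service that rejects the
*uri* parameter. It is only an illustration, not text from the
specification: the names list_page, list_all and UriNotSupported are
made up for this example, the "service" is an in-memory list standing in
for a directory scan, and it assumes that a page begins at the child
named by *uri* (so the client drops the repeated first entry).

```python
from typing import List, Optional, Sequence, Tuple


class UriNotSupported(Exception):
    """Hypothetical fault for a service that cannot honour the uri parameter."""


def list_page(entries: Sequence[str],
              uri: Optional[str] = None,
              limit: Optional[int] = None,
              supports_uri: bool = True) -> Tuple[List[str], int]:
    """Stand-in for a file-system backed listing.

    Scans from the start of the listing on every call, skips entries until
    `uri` is reached, then returns up to `limit` names together with the
    number of entries that had to be scanned.
    """
    if uri is not None and not supports_uri:
        raise UriNotSupported("uri parameter not supported")
    page: List[str] = []
    scanned = 0
    started = uri is None
    for name in entries:
        scanned += 1
        if not started:
            if name != uri:
                continue          # still skipping previously seen entries
            started = True
        page.append(name)
        if limit is not None and len(page) >= limit:
            break
    return page, scanned


def list_all(entries: Sequence[str], limit: int,
             supports_uri: bool = True) -> Tuple[List[str], int]:
    """Client loop: fetch pages of `limit` entries, or fall back to one
    unpaginated request if the service reports the hypothetical fault."""
    if limit < 2:
        raise ValueError("limit must be at least 2 when pages overlap by one entry")
    seen: List[str] = []
    total_scanned = 0
    uri: Optional[str] = None
    while True:
        try:
            page, scanned = list_page(entries, uri, limit, supports_uri)
        except UriNotSupported:
            # Service can only stream: ask once for everything instead.
            page, scanned = list_page(entries, None, None, supports_uri)
            return page, total_scanned + scanned
        total_scanned += scanned
        # A continuation page starts with the entry we already have.
        seen.extend(page[1:] if uri is not None else page)
        if len(page) < limit:
            return seen, total_scanned
        uri = page[-1]
```

The second value returned by list_all counts how many entries the
stand-in service had to walk over in total, which grows with every page
because each request starts again from the beginning of the listing.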

-- Dave

On 2021-06-18 22:00, Patrick Dowler wrote:
> The current spec does support pagination when listing child nodes of a
> container (*uri* and *limit* params), but implementation is complex. We
> have two VOSpace implementations that illustrate this quite well.
> 
> Impl 1: relational database + object store
> Here, it is easy enough to implement pagination because it is just a
> couple of extra things injected into the SQL query to the DB. The server
> picks the default order, but we also added support for a custom optional
> param so the client could control the order: name, lastModified date, or
> contentLength.
> 
> Impl 2: only a posix file system
> Here, it is really hard to implement pagination because the posix
> directory listing APIs don't have any concept of order (iirc, I
> determined it lists in inode order so you could get some strangeness if
> an inode is re-used -- rename? -- during listing). It also looks more or
> less impossible to scale paginated listing with many children: with each
> request, you have to start at the beginning of the list and skip over
> previously seen entries so it gets slower and slower with each "page" of
> children. This service cannot support the custom sorting on the server
> side either.
> 
> So, I would also like to improve the spec here but would like to see
> something where a service that cannot support pagination (just stream
> output) can be effectively used: clients will need to be able to figure
> out which to expect or at least if they got all the rows or not. That
> really means support for the *uri* parameter would be optional and maybe
> just responding with an error with a specified "fault" term would
> suffice. The *limit* param is easy enough to implement (like MAXREC in
> DAL standards) in both cases.
> 
> --
> Patrick Dowler
> Canadian Astronomy Data Centre
> Victoria, BC, Canada
> 
> 
> On Wed, 16 Jun 2021 at 22:31, Dave Morris <dave.morris at metagrid.co.uk>
> wrote:
> 
>> Hi Sonia,
>> 
>> You raised several good suggestions in your email. To avoid confusion
>> I'll reply to each one in a separate email thread.
>> 
>> On 2021-06-11 13:31, Zorba, Sonia wrote:
>> > 7. On the getNode endpoint add parameters to perform paginated
>> > requests.
>> > Useful for nodes having too many children.
>> 
>> Paginated response sounds simple, but it turns out to be complicated to
>> implement.
>> 
>> We would need to define a design that does not put a heavy load on the
>> server, can reliably handle the insertion or deletion of nodes between
>> requests without producing duplicate rows in the results, and does not
>> require the use of a relational database to implement it.
>> 
>> As far as I know, everyone who has looked at this has decided that it is
>> easier to do it on the client side than on the server side. Perhaps
>> someone would like to look at this again and propose a definition for
>> how a paginated response could work?
>> 
>> For me, I see pagination as a client side display function rather than a
>> server side data access function. Is there a strong use case for doing
>> this on the server side?
>> 
>> Bear in mind that even if we did define a new property for pagination,
>> existing version 2.1 services would not understand it. So unless we make
>> the new property mandatory, everyone adopts the new standard, and we
>> deprecate the version 2.1 standard, clients would still have to cope
>> with large responses from version 2.1 services.
>> 
>> Cheers
>> -- Dave
>> 
>> --------
>> Dave Morris
>> Research Software Engineer
>> Wide Field Astronomy Unit
>> Institute for Astronomy
>> University of Edinburgh
>> --------
>> 
>> 