Proposals for VOSpace (paginated response)

Zorba, Sonia sonia.zorba at inaf.it
Mon Jun 21 10:57:34 CEST 2021


Sorry, obviously I missed this too.

Expanding the first example might also help.
I would add a first call with limit=100 and no uri parameter, to show that
if the response contains 100 child nodes, the client knows there may be
more and can fetch the next results by passing the URI of the last child
node as the uri parameter of the next call.
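
To make that concrete, here is a rough Python sketch of the client loop. It
is only an illustration: fetch_page is a hypothetical stand-in for the
getNode call (not a real client API), and it assumes the listing starts at
the node named in uri, so that node shows up again as the first entry of
the follow-up page.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for getNode: (uri, limit) -> child node URIs.
# Assumption: the listing starts AT the given uri (inclusive).
FetchPage = Callable[[Optional[str], int], List[str]]


def list_all_children(fetch_page: FetchPage, limit: int = 100) -> List[str]:
    """Collect every child URI by following full pages."""
    if limit < 2:
        # With an inclusive uri, limit=1 would return the same node forever.
        raise ValueError("limit must be at least 2")
    children: List[str] = []
    uri: Optional[str] = None  # first call: no uri parameter
    while True:
        page = fetch_page(uri, limit)
        # On follow-up calls the first entry repeats the last node seen.
        if uri is not None and page and page[0] == uri:
            children.extend(page[1:])
        else:
            children.extend(page)
        if len(page) < limit:
            break  # short page: nothing more to fetch
        uri = page[-1]  # full page: continue from the last child
    return children


def make_fake_server(names: List[str]):
    """In-memory fake used only to exercise the loop; records each call."""
    calls: List[Optional[str]] = []

    def fetch_page(uri: Optional[str], limit: int) -> List[str]:
        calls.append(uri)
        start = 0 if uri is None else names.index(uri)
        return names[start:start + limit]

    return fetch_page, calls
```

With the fake server, a container holding 250 children and limit=100 takes
three calls, because each follow-up page only brings 99 new nodes.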

What I don't like about this approach is that if there were exactly 100
child nodes, the next call would return only a single result, so that call
could have been avoided. It would be nice to have a "total count" parameter
in the response, to know the exact number of remaining pages, but I don't
know whether this would complicate the current implementations too much.
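
As a small illustration of what a total count would buy the client, here is
a hedged Python sketch. The total_count value is hypothetical (nothing like
it exists in the current response), and the arithmetic assumes that each
follow-up page repeats the node named in uri, so it only brings limit - 1
new children.

```python
def calls_needed(total_count: int, limit: int) -> int:
    """Number of getNode calls needed to list total_count children.

    Assumes a hypothetical total_count reported by the server and an
    inclusive uri parameter (each follow-up page repeats one node).
    """
    if limit < 2:
        raise ValueError("limit must be at least 2")
    if total_count <= limit:
        return 1  # everything fits in the first page, no probe call needed
    remaining = total_count - limit
    new_per_page = limit - 1
    # Integer ceiling division of remaining by new_per_page.
    return 1 + (remaining + new_per_page - 1) // new_per_page
```

With exactly 100 children and limit=100 this gives a single call, which is
the extra request the count would let the client skip.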

Cheers,
Sonia


On Sun, 20 Jun 2021 at 05:30, Dave Morris <
dave.morris at metagrid.co.uk> wrote:

> Apologies, I forgot this was in the specification.
> The text describing how this works is buried in the Response section of
> the getNode method, which makes it easy to miss.
>
> As a start, I've added an issue to revise the text to make this clearer.
> https://github.com/ivoa-std/VOSpace/issues/4
>
> This wouldn't change the technical definition, just promote the
> description of how pagination works into a separate sub-section clearly
> labelled 'pagination'.
>
> If we want to take it further, possibly by adding the exception that Pat
> proposes below, then that would be a new issue.
>
> -- Dave
>
> On 2021-06-18 22:00, Patrick Dowler wrote:
> > The current spec does support pagination when listing child nodes of a
> > container (*uri* and *limit* params), but the implementation is complex.
> > We have two VOSpace implementations that illustrate this quite well.
> >
> > Impl 1: relational database + object store
> > Here, it is easy enough to implement pagination because it is just a
> > couple of extra things injected into the SQL query to the DB. The server
> > picks the default order, but we also added support for a custom optional
> > param so the client could control the order: name, lastModified date, or
> > contentLength.
> >
> > Impl 2: only a posix file system
> > Here, it is really hard to implement pagination because the posix
> > directory listing APIs don't have any concept of order (iirc, I
> > determined it lists in inode order so you could get some strangeness if
> > an inode is re-used -- rename? -- during listing). It also looks more or
> > less impossible to scale paginated listing with many children: with each
> > request, you have to start at the beginning of the list and skip over
> > previously seen entries so it gets slower and slower with each "page" of
> > children. This service cannot support the custom sorting on the server
> > side either.
> >
> > So, I would also like to improve the spec here but would like to see
> > something where a service that cannot support pagination (just stream
> > output) can be effectively used: clients will need to be able to figure
> > out which to expect or at least if they got all the rows or not. That
> > really means support for the *uri* parameter would be optional and maybe
> > just responding with an error with a specified "fault" term would
> > suffice. The *limit* param is easy enough to implement (like MAXREC in
> > DAL standards) in both cases.
> >
> > --
> > Patrick Dowler
> > Canadian Astronomy Data Centre
> > Victoria, BC, Canada
> >
> >
> > On Wed, 16 Jun 2021 at 22:31, Dave Morris <dave.morris at metagrid.co.uk>
> > wrote:
> >
> >> Hi Sonia,
> >>
> >> You raised several good suggestions in your email. To avoid confusion
> >> I'll reply to each one in a separate email thread.
> >>
> >> On 2021-06-11 13:31, Zorba, Sonia wrote:
> >> > 7. On the getNode endpoint add parameters to perform paginated requests.
> >> > Useful for nodes having too many children.
> >>
> >> Paginated response sounds simple, but it turns out to be complicated to
> >> implement.
> >>
> >> We would need to define a design that does not put a heavy load on the
> >> server, can reliably handle the insertion or deletion of nodes between
> >> requests without producing duplicate rows in the results, and does not
> >> require the use of a relational database to implement it.
> >>
> >> As far as I know, everyone who has looked at this has decided that it is
> >> easier to do it on the client side than on the server side. Perhaps
> >> someone would like to look at this again and propose a definition for
> >> how a paginated response could work?
> >>
> >> For me, I see pagination as a client side display function rather than a
> >> server side data access function. Is there a strong use case for doing
> >> this on the server side?
> >>
> >> Bear in mind that even if we did define a new property for pagination,
> >> existing version 2.1 services would not understand it. So unless we make
> >> the new property mandatory, everyone adopts the new standard, and we
> >> deprecate the version 2.1 standard, clients would still have to cope
> >> with large responses from version 2.1 services.
> >>
> >> Cheers
> >> -- Dave
> >>
> >> --------
> >> Dave Morris
> >> Research Software Engineer
> >> Wide Field Astronomy Unit
> >> Institute for Astronomy
> >> University of Edinburgh
> >> --------
> >>
> >>
>