Proposals for VOSpace (paginated response)

Patrick Dowler pdowler.cadc at gmail.com
Tue Jun 22 20:54:52 CEST 2021


The clients (a browser storage UI and "vls" command line) always get all
pages :-)
The pagination mechanism allowed us to constrain the server resources used (to
a single-page result set from the query) because our current vospace server
library doesn't stream the result... I'd prefer streaming the result but it
will take some work to refactor parts of the server-side code to do it.
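
For illustration only, here is a rough sketch of the "get all pages" loop
such a client might run. It is not our actual client code: fetch_page is a
hypothetical callable standing in for a getNode request with the *uri* and
*limit* params, and it assumes the page returned for a given uri starts
with that child (so the first entry of each later page is a repeat).

```python
from typing import Callable, Iterator, List, Optional

# Hypothetical signature: fetch_page(start_uri, limit) -> child URIs,
# beginning at start_uri (inclusive), or at the first child if None.
FetchPage = Callable[[Optional[str], int], List[str]]


def list_all_children(fetch_page: FetchPage, limit: int = 100) -> Iterator[str]:
    """Yield every child URI of a container by walking the pages in order."""
    if limit < 2:
        # With limit=1 each page would only repeat the resume point.
        raise ValueError("limit must be at least 2")
    start: Optional[str] = None
    while True:
        page = fetch_page(start, limit)
        # When resuming, drop the first entry if it repeats the last
        # child of the previous page.
        if start is not None and page and page[0] == start:
            fresh = page[1:]
        else:
            fresh = page
        yield from fresh
        if len(page) < limit:
            return  # a short page means there is nothing left to fetch
        start = page[-1]
```

Note that when a container holds an exact multiple of a full page, the loop
still makes one final call that returns nothing new; that is the extra
request discussed further down the thread.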

Also, if the client side uses DOM-based xml parsing, they likely prefer
pages (modest-sized documents) to huge xml payloads even if they intend to
list everything... yeah, they could have built on an event-based xml parser
instead for scalability, but that's a tough retrofit if you already use a
DOM library.
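
To make the event-based option concrete, here is a minimal sketch using
Python's standard xml.etree.ElementTree.iterparse. The namespace and the
document layout (a container node element with nested child node elements
carrying a uri attribute) are assumptions for illustration; check them
against the actual service response.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator

# Assumed namespace of the node document; adjust to match the service.
VOS_NS = "http://www.ivoa.net/xml/VOSpace/v2.0"
NODE_TAG = f"{{{VOS_NS}}}node"


def iter_child_uris(stream: BinaryIO) -> Iterator[str]:
    """Yield child node URIs as they are parsed, without building a full DOM."""
    depth = 0  # nesting level of node elements; the container itself is 1
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if elem.tag != NODE_TAG:
            continue
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1:
            # A direct child of the container has just been closed.
            uri = elem.get("uri")
            if uri is not None:
                yield uri
            # Drop the child's attributes and sub-elements; only an
            # empty element stays attached to its parent.
            elem.clear()
```

The caller can pass any binary file-like object, for example the open
response body, so a long listing is consumed incrementally instead of
being loaded whole.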

--
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada


On Tue, 22 Jun 2021 at 11:39, Dave Morris <dave.morris at metagrid.co.uk>
wrote:

> In your implementation, do you know how many clients request the second
> or third page ?
>
> I suspect many clients will ask for the first page, using limit, but few
> will actually ask for the subsequent pages.
>
> Interesting to know how many.
>
> -- Dave
>
> On 2021-06-22 19:08, Patrick Dowler wrote:
> > yeah, clients always need to make that extra call and it does seem a
> > little clunky... that's the same pattern (and necessity) in several
> > object store APIs I have been working with so vospace isn't alone here.
> >
> > On the other hand, over in DAL services we put a bit of metadata at the
> > end of a VOTable (after the rows) to say that the result was truncated
> > (DALI "overflow")... so the caller knows whether there are more records
> > they didn't see.
> >
> > --
> > Patrick Dowler
> > Canadian Astronomy Data Centre
> > Victoria, BC, Canada
> >
> >
> > On Mon, 21 Jun 2021 at 01:57, Zorba, Sonia <sonia.zorba at inaf.it> wrote:
> >
> >> Sorry, obviously I missed this too.
> >>
> >> Maybe also expanding the first example could help.
> >> I would add a first call with limit=100 and no uri parameter, to
> >> explain that if the response contains 100 child nodes the client knows
> >> that it can fetch the next results by setting the last child node URI
> >> in the uri parameter of the next call.
> >>
> >> What I don't like about this approach is that if there were exactly
> >> 100 child nodes the next call would return a single result, so that
> >> call could have been avoided. It would be nice to have a "total count"
> >> parameter in the response, to know the exact number of remaining pages,
> >> but I don't know if this complicates the current implementations too
> >> much.
> >>
> >> Cheers,
> >> Sonia
> >>
> >>
> >> On Sun, 20 Jun 2021 at 05:30, Dave Morris
> >> <dave.morris at metagrid.co.uk> wrote:
> >>
> >>> Apologies, I forgot this was in the specification.
> >>> The text describing how this works is buried in the Response section
> >>> of the getNode method, which makes it easy to miss.
> >>>
> >>> As a start, I've added an issue to revise the text to make this
> >>> clearer.
> >>> https://github.com/ivoa-std/VOSpace/issues/4
> >>>
> >>> This wouldn't change the technical definition, just promote the
> >>> description of how pagination works into a separate sub-section
> >>> clearly labelled 'pagination'.
> >>>
> >>> If we want to take it further, possibly by adding the exception that
> >>> Pat proposes below, then that would be a new issue.
> >>>
> >>> -- Dave
> >>>
> >>> On 2021-06-18 22:00, Patrick Dowler wrote:
> >>> > The current spec does support pagination when listing child nodes of
> >>> > a container (*uri* and *limit* params), but implementation is complex.
> >>> > We have two VOSpace implementations that illustrate this quite well.
> >>> >
> >>> > Impl 1: relational database + object store
> >>> > Here, it is easy enough to implement pagination because it is just a
> >>> > couple of extra things injected into the SQL query to the DB. The
> >>> > server picks the default order, but we also added support for a custom
> >>> > optional param so the client could control the order: name,
> >>> > lastModified date, or contentLength.
> >>> >
> >>> > Impl 2: only a posix file system
> >>> > Here, it is really hard to implement pagination because the posix
> >>> > directory listing APIs don't have any concept of order (iirc, I
> >>> > determined it lists in inode order so you could get some strangeness
> >>> > if an inode is re-used -- rename? -- during listing). It also looks
> >>> > more or less impossible to scale paginated listing with many children:
> >>> > with each request, you have to start at the beginning of the list and
> >>> > skip over previously seen entries so it gets slower and slower with
> >>> > each "page" of children. This service cannot support the custom
> >>> > sorting on the server side either.
> >>> >
> >>> > So, I would also like to improve the spec here but would like to see
> >>> > something where a service that cannot support pagination (just stream
> >>> > output) can be effectively used: clients will need to be able to
> >>> > figure out which to expect or at least if they got all the rows or
> >>> > not. That really means support for the *uri* parameter would be
> >>> > optional and maybe just responding with an error with a specified
> >>> > "fault" term would suffice. The *limit* param is easy enough to
> >>> > implement (like MAXREC in DAL standards) in both cases.
> >>> >
> >>> > --
> >>> > Patrick Dowler
> >>> > Canadian Astronomy Data Centre
> >>> > Victoria, BC, Canada
> >>> >
> >>> >
> >>> > On Wed, 16 Jun 2021 at 22:31, Dave Morris
> >>> > <dave.morris at metagrid.co.uk> wrote:
> >>> >
> >>> >> Hi Sonia,
> >>> >>
> >>> >> You raised several good suggestions in your email. To avoid
> >>> >> confusion I'll reply to each one in a separate email thread.
> >>> >>
> >>> >> On 2021-06-11 13:31, Zorba, Sonia wrote:
> >>> >> > 7. On the getNode endpoint add parameters to perform paginated
> >>> >> > requests.
> >>> >> > Useful for nodes having too many children.
> >>> >>
> >>> >> Paginated response sounds simple, but it turns out to be complicated
> >>> >> to implement.
> >>> >>
> >>> >> We would need to define a design that does not put a heavy load on
> >>> >> the server, can reliably handle the insertion or deletion of nodes
> >>> >> between requests without producing duplicate rows in the results, and
> >>> >> does not require the use of a relational database to implement it.
> >>> >>
> >>> >> As far as I know, everyone who has looked at this has decided that
> >>> >> it is easier to do it on the client side than on the server side.
> >>> >> Perhaps someone would like to look at this again and propose a
> >>> >> definition for how a paginated response could work?
> >>> >>
> >>> >> For me, I see pagination as a client side display function rather
> >>> >> than a server side data access function. Is there a strong use case
> >>> >> for doing this on the server side ?
> >>> >>
> >>> >> Bear in mind that even if we did define a new property for
> >>> >> pagination, existing version 2.1 services would not understand it. So
> >>> >> unless we make the new property mandatory, everyone adopts the new
> >>> >> standard, and we deprecate the version 2.1 standard, clients would
> >>> >> still have to cope with large responses from version 2.1 services.
> >>> >>
> >>> >> Cheers
> >>> >> -- Dave
> >>> >>
> >>> >> --------
> >>> >> Dave Morris
> >>> >> Research Software Engineer
> >>> >> Wide Field Astronomy Unit
> >>> >> Institute for Astronomy
> >>> >> University of Edinburgh
> >>> >> --------
> >>> >>
> >>> >>
> >>>
> >>
>