Proposals for VOSpace (paginated response)

Dave Morris dave.morris at metagrid.co.uk
Tue Jun 22 20:39:07 CEST 2021


In your implementation, do you know how many clients request the second 
or third page ?

I suspect many clients will ask for the first page, using limit, but few 
will actually ask for the subsequent pages.

It would be interesting to know how many.

-- Dave

On 2021-06-22 19:08, Patrick Dowler wrote:
> yeah, clients always need to make that extra call and it does seem a little
> clunky... that's the same pattern (and necessity) in several object store
> APIs I have been working with so vospace isn't alone here.
> 
> On the other hand, over in DAL services we put a bit of metadata at the end
> of a VOTable (after the rows) to say that the result was truncated (DALI
> "overflow")... so the caller knows whether there are more records they
> didn't see.
> 
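A minimal sketch of how a caller might check for the DALI overflow marker, assuming the usual form of an INFO element with name="QUERY_STATUS" and value="OVERFLOW" placed after the table. The sample document and the was_truncated helper are made up for illustration, not taken from any service discussed in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written VOTable used only for this sketch.
SAMPLE = """<VOTABLE xmlns="http://www.ivoa.net/xml/VOTable/v1.3" version="1.3">
  <RESOURCE type="results">
    <INFO name="QUERY_STATUS" value="OK"/>
    <TABLE>
      <FIELD name="id" datatype="int"/>
      <DATA><TABLEDATA><TR><TD>1</TD></TR></TABLEDATA></DATA>
    </TABLE>
    <INFO name="QUERY_STATUS" value="OVERFLOW"/>
  </RESOURCE>
</VOTABLE>"""


def was_truncated(votable_xml: str) -> bool:
    """Return True if any INFO element reports QUERY_STATUS=OVERFLOW."""
    root = ET.fromstring(votable_xml)
    for element in root.iter():
        # Compare on the local name so the VOTable namespace version
        # does not matter for this check.
        local_name = element.tag.rsplit("}", 1)[-1]
        if (local_name == "INFO"
                and element.get("name") == "QUERY_STATUS"
                and element.get("value") == "OVERFLOW"):
            return True
    return False


if __name__ == "__main__":
    print(was_truncated(SAMPLE))
```

The point of the pattern is that the caller learns about truncation from the response it already has, without making a second request.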
> --
> Patrick Dowler
> Canadian Astronomy Data Centre
> Victoria, BC, Canada
> 
> 
> On Mon, 21 Jun 2021 at 01:57, Zorba, Sonia <sonia.zorba at inaf.it> wrote:
> 
>> Sorry, obviously I missed this too.
>> 
>> Maybe also expanding the first example could help.
>> I would add a first call with limit=100 and no uri parameter, to explain
>> that if the response contains 100 child nodes the client knows that it can
>> fetch the next results by setting the last child node URI in the uri
>> parameter of the next call.
>> 
>> What I don't like about this approach is that if there were exactly 100
>> child nodes the next call would return a single result, so that call could
>> have been avoided. It would be nice to have a "total count" parameter in the
>> response, to know the exact number of remaining pages, but I don't know if
>> this would complicate the current implementations too much.
>> 
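A rough sketch of the client-side loop described above, assuming (as the exactly-100-nodes case implies) that the uri parameter names the first child to return, so each follow-up page starts with a node the client has already seen. The make_fake_server and list_all_children names and the example URIs are hypothetical; the fake server stands in for a real getNode call.

```python
from typing import Callable, List, Optional, Tuple

PageFetcher = Callable[[Optional[str], int], List[str]]


def make_fake_server(children: List[str]) -> PageFetcher:
    """Hypothetical stand-in for getNode with uri and limit parameters."""
    def get_page(uri: Optional[str], limit: int) -> List[str]:
        start = 0 if uri is None else children.index(uri)
        return children[start:start + limit]
    return get_page


def list_all_children(get_page: PageFetcher,
                      limit: int = 100) -> Tuple[List[str], int]:
    """Fetch every child page by page; return (children, calls made)."""
    if limit < 2:
        # With limit=1 every follow-up page would only repeat the last node.
        raise ValueError("limit must be at least 2")
    results: List[str] = []
    calls = 0
    start: Optional[str] = None
    while True:
        page = get_page(start, limit)
        calls += 1
        # A follow-up page begins with the node we asked for; drop it.
        if start is not None and page and page[0] == start:
            results.extend(page[1:])
        else:
            results.extend(page)
        if len(page) < limit:
            break  # a short page means there is nothing more to fetch
        start = page[-1]
    return results, calls


if __name__ == "__main__":
    nodes = [f"vos://example.org!vospace/data/file{i:03d}" for i in range(100)]
    found, calls = list_all_children(make_fake_server(nodes), limit=100)
    print(len(found), calls)
```

With exactly 100 children and limit=100 this makes two calls, and the second returns only the node the client already has, which is the extra round trip in question.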
>> Cheers,
>> Sonia
>> 
>> 
>> On Sun, 20 Jun 2021 at 05:30, Dave Morris <
>> dave.morris at metagrid.co.uk> wrote:
>> 
>>> Apologies, I forgot this was in the specification.
>>> The text describing how this works is buried in the Response section of
>>> the getNode method, which makes it easy to miss.
>>> 
>>> As a start, I've added an issue to revise the text to make this clearer.
>>> https://github.com/ivoa-std/VOSpace/issues/4
>>> 
>>> This wouldn't change the technical definition, just promote the
>>> description of how pagination works into a separate sub-section clearly
>>> labelled 'pagination'.
>>> 
>>> If we want to take it further, possibly by adding the exception that Pat
>>> proposes below, then that would be a new issue.
>>> 
>>> -- Dave
>>> 
>>> On 2021-06-18 22:00, Patrick Dowler wrote:
>>> > The current spec does support pagination when listing child nodes of a
>>> > container (*uri* and *limit* params), but implementation is complex. We
>>> > have two VOSpace implementations that illustrate this quite well.
>>> >
>>> > Impl 1: relational database + object store
>>> > Here, it is easy enough to implement pagination because it is just a couple
>>> > extra things injected into the SQL query to the DB. The server picks the
>>> > default order, but we also added support for a custom optional param so the
>>> > client could control the order: name, lastModified date, or contentLength.
>>> >
>>> > Impl 2: only a posix file system
>>> > Here, it is really hard to implement pagination because the posix directory
>>> > listing APIs don't have any concept of order (iirc, I determined it lists
>>> > in inode order so you could get some strangeness if an inode is re-used --
>>> > rename? -- during listing). It also looks more or less impossible to scale
>>> > paginated listing with many children: with each request, you have to start
>>> > at the beginning of the list and skip over previously seen entries so it
>>> > gets slower and slower with each "page" of children. This service cannot
>>> > support the custom sorting on the server side either.
>>> >
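To illustrate the scaling problem, here is a sketch of one way a file-system-backed service could serve a page. It is not the implementation described above: it assumes the service imposes a name order by scanning and sorting the whole directory on every request, and the list_page and demo names are hypothetical.

```python
import os
import tempfile
from bisect import bisect_right
from typing import List, Optional


def list_page(directory: str, last_seen: Optional[str], limit: int) -> List[str]:
    """Return up to `limit` names that sort after `last_seen`.

    os.scandir yields entries in arbitrary order, so every request reads
    the whole directory and sorts it before the page can be cut out.
    """
    with os.scandir(directory) as entries:
        names = sorted(entry.name for entry in entries)
    begin = 0 if last_seen is None else bisect_right(names, last_seen)
    return names[begin:begin + limit]


def demo() -> List[List[str]]:
    """Page through a small temporary directory, two names at a time."""
    pages: List[List[str]] = []
    with tempfile.TemporaryDirectory() as directory:
        for name in ["d", "b", "e", "a", "c"]:
            open(os.path.join(directory, name), "w").close()
        last_seen: Optional[str] = None
        while True:
            page = list_page(directory, last_seen, 2)
            if not page:
                break
            pages.append(page)
            last_seen = page[-1]
    return pages


if __name__ == "__main__":
    print(demo())
```

Every page costs a full scan of the directory, so listing a container with N children in pages of size k does on the order of N * N / k work in total, and a file created or renamed between requests can still shift what the client sees.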
>>> > So, I would also like to improve the spec here but would like to see
>>> > something where a service that cannot support pagination (just stream
>>> > output) can be effectively used: clients will need to be able to figure out
>>> > which to expect or at least if they got all the rows or not. That really
>>> > means support for the *uri* parameter would be optional and maybe just
>>> > responding with an error with a specified "fault" term would suffice. The
>>> > *limit* param is easy enough to implement (like MAXREC in DAL standards) in
>>> > both cases.
>>> >
>>> > --
>>> > Patrick Dowler
>>> > Canadian Astronomy Data Centre
>>> > Victoria, BC, Canada
>>> >
>>> >
>>> > On Wed, 16 Jun 2021 at 22:31, Dave Morris <dave.morris at metagrid.co.uk>
>>> > wrote:
>>> >
>>> >> Hi Sonia,
>>> >>
>>> >> You raised several good suggestions in your email. To avoid confusion
>>> >> I'll reply to each one in a separate email thread.
>>> >>
>>> >> On 2021-06-11 13:31, Zorba, Sonia wrote:
>>> >> > 7. On the getNode endpoint add parameters to perform paginated requests.
>>> >> > Useful for nodes having too many children.
>>> >>
>>> >> Paginated response sounds simple, but it turns out to be complicated to
>>> >> implement.
>>> >>
>>> >> We would need to define a design that does not put a heavy load on the
>>> >> server, can reliably handle the insertion or deletion of nodes between
>>> >> requests without producing duplicate rows in the results, and does not
>>> >> require the use of a relational database to implement it.
>>> >>
>>> >> As far as I know, everyone who has looked at this has decided that it is
>>> >> easier to do it on the client side than on the server side. Perhaps
>>> >> someone would like to look at this again and propose a definition for
>>> >> how a paginated response could work?
>>> >>
>>> >> For me, I see pagination as a client side display function rather than a
>>> >> server side data access function. Is there a strong use case for doing
>>> >> this on the server side ?
>>> >>
>>> >> Bear in mind that even if we did define a new property for pagination,
>>> >> existing version 2.1 services would not understand it. So unless we make
>>> >> the new property mandatory, everyone adopts the new standard, and we
>>> >> deprecate the version 2.1 standard, clients would still have to cope
>>> >> with large responses from version 2.1 services.
>>> >>
>>> >> Cheers
>>> >> -- Dave
>>> >>
>>> >> --------
>>> >> Dave Morris
>>> >> Research Software Engineer
>>> >> Wide Field Astronomy Unit
>>> >> Institute for Astronomy
>>> >> University of Edinburgh
>>> >> --------
>>> >>
>>> >>
>>> 
>> 

