Proposals for VOSpace (paginated response)

Patrick Dowler pdowler.cadc at gmail.com
Tue Jun 22 20:08:33 CEST 2021


Yeah, clients always need to make that extra call and it does seem a little
clunky... that's the same pattern (and necessity) in several object store
APIs I have been working with, so VOSpace isn't alone here.
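
A minimal sketch of that client loop, to make the extra call concrete. It
assumes the semantics described further down the thread: the *uri* parameter
names the child to start from (inclusive), so each follow-up page begins with
the last child already seen. The names list_children, make_fake_server and
list_all are hypothetical, and the "server" is just an in-memory list, not a
real VOSpace client.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature of a getNode-style listing call: (uri, limit) -> child URIs.
ListCall = Callable[[Optional[str], int], List[str]]


def make_fake_server(children: List[str]) -> ListCall:
    """In-memory stand-in for a service that supports uri + limit."""
    def list_children(uri: Optional[str], limit: int) -> List[str]:
        # Assumption: the page starts AT the given uri (inclusive).
        start = 0 if uri is None else children.index(uri)
        return children[start:start + limit]
    return list_children


def list_all(list_children: ListCall, limit: int) -> Tuple[List[str], int]:
    """Fetch every child page by page; return (children, number of calls)."""
    if limit < 2:
        # With an inclusive uri, limit=1 would return the same child forever.
        raise ValueError("limit must be at least 2")
    seen: List[str] = []
    calls = 0
    uri: Optional[str] = None
    while True:
        page = list_children(uri, limit)
        calls += 1
        # After the first page, the first entry repeats the last child we saw.
        seen.extend(page if uri is None else page[1:])
        if len(page) < limit:
            # A short page is the only signal that the listing is complete.
            break
        uri = page[-1]
    return seen, calls


if __name__ == "__main__":
    children = [f"vos://example/dir/n{i:03d}" for i in range(100)]
    nodes, calls = list_all(make_fake_server(children), limit=100)
    print(len(nodes), calls)  # 100 children, 2 calls
```

With exactly 100 children and limit=100, the second call returns only the
child the client already has, which is the clunky case.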

On the other hand, over in DAL services we put a bit of metadata at the end
of a VOTable (after the rows) to say that the result was truncated (DALI
"overflow")... so the caller knows whether there are more records they
didn't see.
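
For comparison, a rough sketch of how a caller can detect that marker. It
assumes the DALI convention of a trailing INFO element with
name="QUERY_STATUS" and value="OVERFLOW"; the function name is_truncated and
the sample document (including its vos:// URI) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up minimal VOTable: one row, then the overflow marker after the table.
SAMPLE = """<VOTABLE version="1.3" xmlns="http://www.ivoa.net/xml/VOTable/v1.3">
  <RESOURCE type="results">
    <INFO name="QUERY_STATUS" value="OK"/>
    <TABLE>
      <FIELD name="uri" datatype="char" arraysize="*"/>
      <DATA><TABLEDATA><TR><TD>vos://example/dir/a</TD></TR></TABLEDATA></DATA>
    </TABLE>
    <INFO name="QUERY_STATUS" value="OVERFLOW"/>
  </RESOURCE>
</VOTABLE>"""


def is_truncated(votable_xml: str) -> bool:
    """Return True if the document carries a QUERY_STATUS=OVERFLOW INFO."""
    root = ET.fromstring(votable_xml)
    for elem in root.iter():
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace, if any
        if (tag == "INFO"
                and elem.get("name") == "QUERY_STATUS"
                and elem.get("value") == "OVERFLOW"):
            return True
    return False


if __name__ == "__main__":
    print(is_truncated(SAMPLE))
```

The caller learns from the one response whether rows were cut off, with no
follow-up request needed.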

--
Patrick Dowler
Canadian Astronomy Data Centre
Victoria, BC, Canada


On Mon, 21 Jun 2021 at 01:57, Zorba, Sonia <sonia.zorba at inaf.it> wrote:

> Sorry, obviously I missed this too.
>
> Maybe also expanding the first example could help.
> I would add a first call with limit=100 and no uri parameter, to explain
> that if the response contains 100 child nodes the client knows that it can
> fetch the next results by setting the last child node URI in the uri
> parameter of the next call.
>
> What I don't like about this approach is that if there were exactly 100
> child nodes, the next call would return a single result, a call that
> could otherwise be avoided. It would be nice to have a "total count"
> parameter in the response, to know the exact number of remaining pages,
> but I don't know if this would complicate the current implementations
> too much.
>
> Cheers,
> Sonia
>
>
> On Sun, 20 Jun 2021 at 05:30, Dave Morris <dave.morris at metagrid.co.uk>
> wrote:
>
>> Apologies, I forgot this was in the specification.
>> The text describing how this works is buried in the Response section of
>> the getNode method, which makes it easy to miss.
>>
>> As a start, I've added an issue to revise the text to make this clearer.
>> https://github.com/ivoa-std/VOSpace/issues/4
>>
>> This wouldn't change the technical definition, just promote the
>> description of how pagination works into a separate sub-section clearly
>> labelled 'pagination'.
>>
>> If we want to take it further, possibly by adding the exception that Pat
>> proposes below, then that would be a new issue.
>>
>> -- Dave
>>
>> On 2021-06-18 22:00, Patrick Dowler wrote:
>> > The current spec does support pagination when listing child nodes of a
>> > container (*uri* and *limit* params), but implementation is complex. We
>> > have two VOSpace implementations that illustrate this quite well.
>> >
>> > Impl 1: relational database + object store
>> > Here, it is easy enough to implement pagination because it is just a
>> > couple extra things injected into the SQL query to the DB. The server
>> > picks the default order, but we also added support for a custom
>> > optional param so the client could control the order: name,
>> > lastModified date, or contentLength.
>> >
>> > Impl 2: only a posix file system
>> > Here, it is really hard to implement pagination because the posix
>> > directory listing APIs don't have any concept of order (iirc, I
>> > determined it lists in inode order so you could get some strangeness
>> > if an inode is re-used -- rename? -- during listing). It also looks
>> > more or less impossible to scale paginated listing with many
>> > children: with each request, you have to start at the beginning of
>> > the list and skip over previously seen entries so it gets slower and
>> > slower with each "page" of children. This service cannot support the
>> > custom sorting on the server side either.
>> >
>> > So, I would also like to improve the spec here but would like to see
>> > something where a service that cannot support pagination (just stream
>> > output) can be effectively used: clients will need to be able to
>> > figure out which to expect or at least if they got all the rows or
>> > not. That really means support for the *uri* parameter would be
>> > optional and maybe just responding with an error with a specified
>> > "fault" term would suffice. The *limit* param is easy enough to
>> > implement (like MAXREC in DAL standards) in both cases.
>> >
>> > --
>> > Patrick Dowler
>> > Canadian Astronomy Data Centre
>> > Victoria, BC, Canada
>> >
>> >
>> > On Wed, 16 Jun 2021 at 22:31, Dave Morris <dave.morris at metagrid.co.uk>
>> > wrote:
>> >
>> >> Hi Sonia,
>> >>
>> >> You raised several good suggestions in your email. To avoid confusion
>> >> I'll reply to each one in a separate email thread.
>> >>
>> >> On 2021-06-11 13:31, Zorba, Sonia wrote:
>> >> > 7. On the getNode endpoint add parameters to perform paginated
>> >> > requests.
>> >> > Useful for nodes having too many children.
>> >>
>> >> Paginated response sounds simple, but it turns out to be complicated
>> >> to implement.
>> >>
>> >> We would need to define a design that does not put a heavy load on the
>> >> server, can reliably handle the insertion or deletion of nodes between
>> >> requests without producing duplicate rows in the results, and does not
>> >> require the use of a relational database to implement it.
>> >>
>> >> As far as I know, everyone who has looked at this has decided that it
>> >> is easier to do it on the client side than on the server side. Perhaps
>> >> someone would like to look at this again and propose a definition for
>> >> how a paginated response could work?
>> >>
>> >> For me, I see pagination as a client side display function rather than
>> >> a server side data access function. Is there a strong use case for
>> >> doing this on the server side?
>> >>
>> >> Bear in mind that even if we did define a new property for pagination,
>> >> existing version 2.1 services would not understand it. So unless we
>> >> make the new property mandatory, everyone adopts the new standard, and
>> >> we deprecate the version 2.1 standard, clients would still have to
>> >> cope with large responses from version 2.1 services.
>> >>
>> >> Cheers
>> >> -- Dave
>> >>
>> >> --------
>> >> Dave Morris
>> >> Research Software Engineer
>> >> Wide Field Astronomy Unit
>> >> Institute for Astronomy
>> >> University of Edinburgh
>> >> --------
>> >>
>> >>
>>
>