Featherweight Publishing Registries

Walter Landry wlandry at caltech.edu
Sat Nov 5 18:09:39 CET 2016


Markus Demleitner wrote:
> On Fri, Oct 28, 2016 at 09:09:55PM -0700, Walter Landry wrote:
> > Just to be clear, Atom feeds are described by an IETF RFC [1], so it
> > is just as standardized as OAI-PMH.  In addition, Atom feed clients
> > are ubiquitous, there are a wide variety of Atom tools, and, of
> > course, Atom has far, far larger adoption than OAI-PMH.
> 
> ...but then it does something rather different.  Unless we completely
> overturn the way the Registry has worked, we need both full and
> incremental harvesting, and I can't see how either is possible with
> Atom (where the originating server determines what records it puts
> into its feed, and the harvester has no way of selecting "all",
> "yesterday's", "last week's", or whatever -- right?)

Atom supports this by requiring an <updated> element.  Fetching things
by time is a principal use case of Atom.  So I am confused by your
statement.
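
To make that concrete, here is a minimal sketch of a harvester picking
out recently changed records by their <updated> timestamps.  The feed
content, the identifiers, and the function names are made up for
illustration, and it assumes the feed lists every record the harvester
cares about.  The filtering happens on the client side here.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace of Atom elements as defined by RFC 4287.
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed content, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example registry feed</title>
  <id>urn:example:registry-feed</id>
  <updated>2016-11-04T12:00:00Z</updated>
  <entry>
    <id>ivo://example/resource-a</id>
    <title>Resource A</title>
    <updated>2016-10-20T08:00:00Z</updated>
  </entry>
  <entry>
    <id>ivo://example/resource-b</id>
    <title>Resource B</title>
    <updated>2016-11-04T12:00:00Z</updated>
  </entry>
</feed>
"""


def parse_timestamp(text):
    """Parse a simple RFC 3339 timestamp such as 2016-11-04T12:00:00Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def entries_updated_since(feed_xml, since):
    """Return the <id> of every entry whose <updated> is later than `since`."""
    root = ET.fromstring(feed_xml)
    changed = []
    for entry in root.findall(ATOM + "entry"):
        updated = parse_timestamp(entry.findtext(ATOM + "updated"))
        if updated > since:
            changed.append(entry.findtext(ATOM + "id"))
    return changed


if __name__ == "__main__":
    last_harvest = datetime(2016, 11, 1, tzinfo=timezone.utc)
    print(entries_updated_since(SAMPLE_FEED, last_harvest))
```

A full harvest is then just the same call with a very early `since`.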

> From your other, Fri, 28 Oct 2016 07:37:35 -0700 (PDT), mail:
> 
> > Harvesting Vizier's records takes more than a day.  That does not fit
> > my definition of "works well".  IRSA's implementation is also
> 
> Nah, not at all.  The whole VO Registry, including VizieR, can
> be fully re-harvested in a good deal less than an hour (ok, it takes a bit
> longer if you don't use sets=ivo_managed, but few components would
> have a reason to do that).  Incremental harvesting takes minutes at
> worst.  As a registry operator (both ends, publishing and harvesting)
> I'd maintain that it does work well.
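
For readers not familiar with the protocol, the full and incremental
harvests being discussed correspond to OAI-PMH ListRecords requests
with and without a "from" argument.  The sketch below only builds the
request URLs.  The base URL is a placeholder, and the function name is
hypothetical.  Note that the OAI-PMH argument is spelled "set", and
"ivo_vor" is the metadata prefix I am assuming for VOResource records.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="ivo_vor",
                     set_spec="ivo_managed", from_date=None):
    """Build an OAI-PMH ListRecords URL (hypothetical helper).

    Without from_date this asks for a full harvest; with from_date
    (a UTC date such as "2016-11-04") it asks only for records changed
    on or after that date.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://registry.example.org/oai"  # placeholder endpoint
    print(list_records_url(base))                          # full harvest
    print(list_records_url(base, from_date="2016-11-04"))  # incremental
```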

Theresa Dower told me in Stellenbosch that it takes a day to fully
harvest VizieR.  As another data point, we are doing some iterations
on our registry right now, and it takes 20 minutes for the RofR to
create a report.

I do not doubt that these times could be improved.  Alberto
Accomazzi's experiences with arXiv show that it can be done.  I am
sure that I could make a service that returns null updates in less
than 100 milliseconds.  I would rather spend that effort on something
easier for publishing operators to implement [1].

> Anyway, we can talk here all day: It seems, Walter, that OAI-PMH is
> an itch that mainly you feel.

Apparently :(

Cheers,
Walter Landry

[1] As a not-so-random sample, there is exactly one package in Debian
    stable that deals with OAI-PMH.  In contrast, there are more than
    50 packages that deal with Atom.

