Featherweight Publishing Registries

Menelaos Perdikeas mperdikeas at gmail.com
Mon Nov 7 15:07:01 CET 2016


I thought I might contribute a couple of data points to this discussion,
even though I am not wearing the EuroVO Registry hat any more:
[1] A full harvest from VizieR used to take a couple of hours (and that was
with a Java implementation populating a substantial database with all sorts
of audit trails, so things could be made more lightweight if one wished).
[2] While it took me more than "one good afternoon" to implement OAI-PMH,
I'd like to speak in favour of it. The specification is concise and very
clear, and being domain-agnostic makes it possible to produce a compliant
implementation without requiring an immersion in IVOA concepts and its
ecosystem of recommendations.
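
For readers unfamiliar with the protocol, the full and incremental harvests
discussed in this thread boil down to HTTP requests of roughly the following
shape. This is only a minimal sketch, not taken from any actual harvester:
the base URL and the function names are hypothetical, and the defaults
(metadataPrefix "ivo_vor", set "ivo_managed") are assumed to follow the usual
VO Registry conventions. Only the OAI-PMH arguments themselves (verb,
metadataPrefix, set, from, until, resumptionToken) come from the
specification.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="ivo_vor",
                     set_spec="ivo_managed", from_date=None, until_date=None):
    """Build an OAI-PMH ListRecords request URL.

    With no dates this asks for a full harvest; passing from_date
    (and optionally until_date) asks for an incremental one.
    The default prefix and set are assumptions, see the text above.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date      # e.g. "2016-11-01"
    if until_date:
        params["until"] = until_date
    return base_url + "?" + urlencode(params)


def resume_url(base_url, resumption_token):
    """Build the follow-up request for the next chunk of a long response.

    In OAI-PMH, resumptionToken is an exclusive argument: it is sent
    with the verb only, without metadataPrefix, set, from or until.
    """
    params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    base = "http://registry.example.org/oai"  # hypothetical endpoint
    print(list_records_url(base))                          # full harvest
    print(list_records_url(base, from_date="2016-11-01"))  # incremental
    print(resume_url(base, "tok123"))                      # next chunk
```

A harvester would remember the time of its last successful run and pass it
as from_date on the next one, following resumption tokens until the server
stops returning them.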
Cheers,
Menelaus Perdikeas.


On Sat, Nov 5, 2016 at 1:09 PM, Walter Landry <wlandry at caltech.edu> wrote:

> Markus Demleitner wrote:
> > On Fri, Oct 28, 2016 at 09:09:55PM -0700, Walter Landry wrote:
> > > Just to be clear, Atom feeds are described by an IETF RFC [1], so it
> > > is just as standardized as OAI-PMH.  In addition, Atom feed clients
> > > are ubiquitous, there are a wide variety of Atom tools, and, of
> > > course, Atom has far, far larger adoption than OAI-PMH.
> >
> > ...but then it does something rather different.  Unless we completely
> > overturn the way the Registry has worked, we need both full and
> > incremental harvesting, and I can't see how either is possible with
> > Atom (where the originating server determines what records it puts
> > into its feed, and the harvester has no way of selecting "all",
> > "yesterday's", "last week's", or whatever -- right?)
>
> Atom supports this by requiring an <updated> element.  Fetching things
> by time is a principal use case of Atom.  So I am confused by your
> statement.
>
> > From your other, Fri, 28 Oct 2016 07:37:35 -0700 (PDT), mail:
> >
> > > Harvesting Vizier's records takes more than a day.  That does not fit
> > > my definition of "works well".  IRSA's implementation is also
> >
> > Nah, not at all.  The whole VO Registry, including VizieR, can
> > be fully re-harvested in a good deal less than an hour (ok, it takes a bit
> > longer if you don't use sets=ivo_managed, but few components would
> > have a reason to do that).  Incremental harvesting takes minutes at
> > worst.  As a registry operator (both ends, publishing and harvesting)
> > I'd maintain that it does work well.
>
> Theresa Dower told me in Stellenbosch that it takes a day to fully
> harvest Vizier.  As another data point, we are doing some iterations
> on our registry right now, and it takes 20 minutes for the RofR to
> create a report.
>
> I do not doubt that these times could be improved.  Alberto
> Accomazzi's experiences with arXiv show that it can be done.  I am
> sure that I could make a service that returns null updates in less
> than 100 milliseconds.  I would rather spend that effort on something
> easier for publishing operators to implement [1].
>
> > Anyway, we can talk here all day: It seems, Walter, that OAI-PMH is
> > an itch that mainly you feel.
>
> Apparently :(
>
> Cheers,
> Walter Landry
>
> [1] As a not-so-random sample, there is exactly one package in Debian
>     stable that deals with OAI-PMH.  In contrast, there are more than
>     50 packages that deal with Atom.
>