<div dir="ltr"><div><div><div><div>I thought I might contribute a couple of data points to this discussion, even though I no longer wear the EuroVO Registry hat:<br></div>[1] A full harvest from VizieR used to take a couple of hours (and that was a Java implementation populating a substantial database with all sorts of audit trails, so things could be made more lightweight if one wished).<br></div>[2] Although it took me more than "one good afternoon" to implement OAI-PMH, I'd like to speak in its favour. The specification is concise and very clear, and because it is domain-agnostic, one can produce a compliant implementation without first immersing oneself in IVOA concepts and the ecosystem of IVOA Recommendations.<br></div>Cheers,<br></div>Menelaus Perdikeas.<br><div><div><div><div><div><div><div><br></div></div></div></div></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Nov 5, 2016 at 1:09 PM, Walter Landry <span dir="ltr"><<a href="mailto:wlandry@caltech.edu" target="_blank">wlandry@caltech.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Markus Demleitner wrote:<br>
> On Fri, Oct 28, 2016 at 09:09:55PM -0700, Walter Landry wrote:<br>
> > Just to be clear, Atom feeds are described by an IETF RFC [1], so it<br>
> > is just as standardized as OAI-PMH. In addition, Atom feed clients<br>
> > are ubiquitous, there are a wide variety of Atom tools, and, of<br>
> > course, Atom has far, far larger adoption than OAI-PMH.<br>
><br>
> ...but then it does something rather different. Unless we completely<br>
> overturn the way the Registry has worked, we need both full and<br>
> incremental harvesting, and I can't see how either is possible with<br>
> Atom (where the originating server determines what records it puts<br>
> into its feed, and the harvester has no way of selecting "all",<br>
> "yesterday's", "last week's", or whatever -- right?)<br>
<br>
Atom supports this by requiring an <updated> element. Fetching entries<br>
by time is a principal use case of Atom. So I am confused by your<br>
statement.<br>
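A minimal sketch of selecting Atom entries by time on the client side, assuming Python 3 and only its standard library; the function names are hypothetical, and the input is whatever feed document the server chose to publish. It returns the ids of entries whose atom:updated timestamp (an RFC 3339 date, which RFC 4287 requires on every entry) is at or after a given instant:<br>
<pre>
```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace in ElementTree notation


def parse_rfc3339(text: str) -> datetime:
    """Parse an Atom timestamp; rewrite a trailing 'Z' so Python before 3.11 accepts it."""
    text = text.strip()
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def entries_updated_since(feed_xml: str, since: datetime) -> list:
    """Return the atom:id of every entry whose atom:updated is at or after `since`.

    Hypothetical helper: only entries present in this one feed document are
    examined. A naive `since` is assumed to be UTC.
    """
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    root = ET.fromstring(feed_xml)
    selected = []
    for entry in root.findall(ATOM + "entry"):
        updated = entry.findtext(ATOM + "updated")
        if updated is None:
            continue  # malformed entry: RFC 4287 requires atom:updated
        if parse_rfc3339(updated) >= since:
            selected.append(entry.findtext(ATOM + "id"))
    return selected
```
</pre>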
<br>
> From your other mail (Fri, 28 Oct 2016 07:37:35 -0700 (PDT)):<br>
><br>
> > Harvesting VizieR's records takes more than a day. That does not fit<br>
> > my definition of "works well". IRSA's implementation is also<br>
><br>
> Nah, not at all. The whole VO Registry, including VizieR, can<br>
> be fully re-harvested in a good deal less than an hour (ok, it takes a bit<br>
> longer if you don't use set=ivo_managed, but few components would<br>
> have a reason to do that). Incremental harvesting takes minutes at<br>
> worst. As a registry operator (both ends, publishing and harvesting)<br>
> I'd maintain that it does work well.<br>
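A minimal sketch of the two OAI-PMH request patterns mentioned here, assuming Python 3's standard library and the ivo_vor metadata prefix of the IVOA Registry Interfaces; the base URL and function names are hypothetical. Omitting the from argument gives a full harvest, supplying a datestamp gives an incremental one, and set=ivo_managed restricts the harvest to records the registry itself manages. Later pages are fetched by following resumptionToken until it is empty or absent:<br>
<pre>
```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH namespace


def list_records_url(base_url: str, from_date: Optional[str] = None,
                     oai_set: Optional[str] = "ivo_managed",
                     metadata_prefix: str = "ivo_vor") -> str:
    """First ListRecords request of a harvest (hypothetical helper).

    from_date=None gives a full harvest; a datestamp such as "2016-11-04"
    (or "2016-11-04T00:00:00Z" if the repository supports seconds
    granularity) asks only for records changed since then.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if oai_set:
        params["set"] = oai_set
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def resume_url(base_url: str, token: str) -> str:
    """Follow-up request; resumptionToken is an exclusive argument, so nothing else is sent."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "resumptionToken": token})


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumptionToken of a ListRecords response, or None once the list is complete."""
    root = ET.fromstring(response_xml)
    elem = root.find(OAI + "ListRecords/" + OAI + "resumptionToken")
    if elem is None or not (elem.text or "").strip():
        return None  # absent or empty token: no more pages
    return elem.text.strip()
```
</pre>
A harvester would request list_records_url once, then keep requesting resume_url with the result of next_token until that returns None.<br>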
<br>
Theresa Dower told me in Stellenbosch that it takes a day to fully<br>
harvest VizieR. As another data point, we are doing some iterations<br>
on our registry right now, and it takes 20 minutes for the RofR to<br>
create a report.<br>
<br>
I do not doubt that these times could be improved. Alberto<br>
Accomazzi's experience with arXiv shows that it can be done. I am<br>
sure that I could make a service that returns null updates in less<br>
than 100 milliseconds. I would rather spend that effort on something<br>
easier for publishing operators to implement [1].<br>
<br>
> Anyway, we can talk here all day: It seems, Walter, that OAI-PMH is<br>
> an itch that mainly you feel.<br>
<br>
Apparently :(<br>
<br>
Cheers,<br>
Walter Landry<br>
<br>
[1] As a not-so-random sample, there is exactly one package in Debian<br>
stable that deals with OAI-PMH. In contrast, there are more than<br>
50 packages that deal with Atom.<br>
</blockquote></div><br></div>