<div dir="ltr">Hi Markus,<div><br></div><div>I think what you propose will be quite useful for the many smaller providers that don't want the hassle of setting up their own OAI server, so +1 from me. In terms of implementation, there is at least one gateway-based OAI setup that I'm aware of: <a href="http://srepod.sourceforge.net/">http://srepod.sourceforge.net/</a></div><div><br></div><div>As to the larger issue of whether OAI-PMH should remain at the core of metadata sharing for the registry: it's no secret that the OAI-PMH authors themselves have acknowledged that it is not perfect and has some flaws (Markus highlighted one). Nonetheless, it is still widely used, so I think scrapping it now, before we have settled on another widely supported standard, would be a mistake. I mentioned ResourceSync because, as far as I know, it is the most likely candidate to replace the functionality of OAI-PMH.</div><div><br></div><div>-- Alberto</div><div><br></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Oct 22, 2016 at 7:58 AM, Markus Demleitner <span dir="ltr"><<a href="mailto:msdemlei@ari.uni-heidelberg.de" target="_blank">msdemlei@ari.uni-heidelberg.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear Registry folks,<br>
<span class=""><br>
On Fri, Oct 21, 2016 at 11:42:59PM +0200, Accomazzi, Alberto wrote:<br>
> Even if we decided that OAI-PMH is now getting a bit long in the tooth, I<br>
> would suggest considering more modern and widespread standards designed for<br>
> this purpose rather than reinventing the wheel. Many of us (I think)<br>
> already support web crawlers by using the sitemap protocol (<br>
> <a href="http://www.sitemaps.org/protocol.html" rel="noreferrer" target="_blank">http://www.sitemaps.org/<wbr>protocol.html</a>). We could go even further and use<br>
> ResourceSync which simply builds on top of sitemap and is expected to be<br>
> the successor to OAI-PMH: <a href="http://www.openarchives.org/rs/toc" rel="noreferrer" target="_blank">http://www.openarchives.org/<wbr>rs/toc</a><br>
<br>
</span>Given that OAI-PMH has at least one trap that implementors regularly<br>
step into (the dateUpdated in the record vs. the date a record<br>
actually appears in whatever OAI-PMH service publishes it), I'd be<br>
interested in following the upstream developments, and I'd be happy<br>
if someone from the VO community could participate in whatever group<br>
pursues them (and perhaps report on them in our WG meetings during<br>
the Interops).<br>
<br>
Having said that, I don't think this will scratch Walter's itch.<br>
While a working OAI-PMH interface is highly desirable for a place<br>
the size of IRSA, I have indeed been planning for a while to offer a<br>
more lightweight process, in particular to operators that run only a<br>
few services but still want to keep their resource records on their<br>
local systems (rather than in the web interfaces offered by ESAVO and<br>
STScI): a proxy publishing registry (let's call it purx for now).<br>
<br>
Essentially, data providers would submit a URL from which purx pulls<br>
a resource record; after validation, purx adds this record to its<br>
ivo_managed resource records, so regular OAI-PMH harvesters will<br>
find it. Then, once a day or so, purx will check whether anything<br>
has changed on the remote side and, if so, re-download the record and<br>
push it out to incremental harvesters. If the record becomes<br>
invalid, mail will be sent to the contact person given in the<br>
registry record; if it vanishes, purx will push out a deleted record.<br>
<br>
In that way, harvesters still only need to talk to OAI-PMH<br>
services and do not need to hit hundreds or thousands of machines<br>
with a handful of records each, while small data centers don't have<br>
to bother with OAI-PMH and can still generate their resource records<br>
programmatically. There is, of course, a small downside: the ivoids<br>
of the services will have to be managed by purx, and all of these<br>
records will be under one authority. That would actually be by<br>
design: registering and properly managing the authority record is<br>
another chore I'd rather spare the small data centers.<br>
<br>
There are two reasons why this doesn't exist yet:<br>
<br>
(1) I've always hoped someone else would build a service like this<br>
(any takers?)<br>
<br>
(2) I never had a concrete candidate who'd be using it (although I<br>
strongly suspect that once it's there and properly documented,<br>
there'd be quite a few)<br>
<br>
So -- if you can do something about (1) or (2) -- by all means do<br>
speak up.<br>
<span class="HOEnZb"><font color="#888888"><br>
-- Markus<br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>Dr. Alberto Accomazzi<br>Principal Investigator</div><div>NASA Astrophysics Data System - <a href="http://ads.harvard.edu" target="_blank">http://ads.harvard.edu</a><br>Harvard-Smithsonian Center for Astrophysics - <a href="http://www.cfa.harvard.edu" target="_blank">http://www.cfa.harvard.edu</a><br>60 Garden St, MS 83, Cambridge, MA 02138, USA</div></div></div>
</div>