Featherweight Publishing Registries

Accomazzi, Alberto aaccomazzi at cfa.harvard.edu
Sat Oct 22 09:58:59 CEST 2016


Hi Markus,

I think what you propose will be quite useful for a lot of smaller providers
that don't want the hassle of setting up their own OAI server, so +1 from
me.  In terms of implementation, there is at least one gateway-based OAI
setup that I'm aware of: http://srepod.sourceforge.net/

As to the larger issue of whether OAI-PMH should still be at the core of
metadata sharing for the registry, I think it's no secret that the
authors themselves have acknowledged that it's not perfect and has some
flaws (Markus highlighted one).  Nonetheless, it's still widely used, so I
think scrapping it now without having decided on another widely supported
standard would be a mistake.  I mentioned ResourceSync because, from what I
know, it is the most likely candidate protocol to replace the
functionality of OAI-PMH.

-- Alberto




On Sat, Oct 22, 2016 at 7:58 AM, Markus Demleitner <
msdemlei at ari.uni-heidelberg.de> wrote:

> Dear Registry folks,
>
> On Fri, Oct 21, 2016 at 11:42:59PM +0200, Accomazzi, Alberto wrote:
> > Even if we decided that OAI-PMH is now getting a bit long in the tooth, I
> > would suggest considering more modern and widespread standards designed for
> > this purpose rather than reinventing the wheel.  Many of us (I think)
> > already support web crawlers by using the sitemap protocol (
> > http://www.sitemaps.org/protocol.html).  We could go even further and use
> > ResourceSync which simply builds on top of sitemap and is expected to be
> > the successor to OAI-PMH: http://www.openarchives.org/rs/toc
>
> Given that OAI-PMH has at least one trap many implementors regularly
> step into (the dateUpdated in the record vs. the date a record
> actually appears in the OAI-PMH service), I'd be interested
> in following the upstream developments, and I'd be happy if someone
> from the VO community could participate in whatever group pursues
> these (and perhaps report on them in our WG meetings during the
> interops).
>
> Having said that, I don't believe this will scratch Walter's itch.
> While I believe that for a place the size of IRSA, a working OAI-PMH
> interface is highly desirable, I have indeed been planning for a
> while to offer a more lightweight process, in particular to
> operators that only run very few services but still want to keep
> the resource records on their local systems (rather than in the web
> interfaces offered by ESAVO and STScI): a proxy publishing registry
> (let's call it purx for now).
>
> Essentially, the data providers would submit a URL that purx pulls a
> resource record from, and after validation, purx puts this record
> into its ivo_managed resource records, so regular OAI-PMH harvesters
> will find it.  Purx then will, once a day or so, check if anything
> has changed on the remote side, and if so, re-download the record and
> push it out to incremental harvesters.  If the record becomes
> invalid, mails will be sent to the contact person in the registry
> record; if it vanishes, a deleted record will be pushed out by purx.
>
> In that way, the harvesters still only need to talk to OAI-PMH
> services and need not hit hundreds or thousands of machines with a
> handful of records each, while small data centers don't have to
> bother with OAI-PMH and can still programmatically generate their
> resource records.  There is, of course, a small downside: The ivoids
> of the services will have to be managed by purx, and all of these
> records will be under one authority.  That would actually be by
> design: Registering and properly managing the authority record is
> another chore I'd rather spare the small data centers.
>
> There are two reasons why this doesn't exist:
>
> (1) I've always hoped someone else would build a service like this
> (any takers?)
>
> (2) I never had a concrete candidate who'd be using it (although I
> strongly suspect that once it's there and properly documented,
> there'd be quite a few)
>
> So -- if you can do something about (1) or (2) -- by all means do
> speak up.
>
>         -- Markus
>
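
To make the proposed polling cycle concrete, here is a rough sketch of one
check per submitted URL, as I read Markus's description above.  Everything
in it is hypothetical: the names ProxiedRecord and poll_once, the injected
fetch and validate callables, and the use of a SHA-256 digest to detect
changes are my assumptions, not part of any existing purx code.  Note that
the datestamp is set when the proxy's copy changes rather than taken from
the record itself, which is meant to sidestep the trap Markus mentions.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class ProxiedRecord:
    """Per-URL state a proxy registry might keep (hypothetical layout)."""
    url: str
    digest: Optional[str] = None          # SHA-256 of the last accepted record
    datestamp: Optional[datetime] = None  # when the proxy's copy last changed
    deleted: bool = False


def poll_once(
    rec: ProxiedRecord,
    fetch: Callable[[str], Optional[bytes]],  # returns None if the URL vanished
    validate: Callable[[bytes], bool],        # e.g. schema validation
    now: Optional[datetime] = None,
) -> str:
    """Run one daily check; return 'unchanged', 'updated', 'invalid' or 'deleted'."""
    now = now or datetime.now(timezone.utc)
    body = fetch(rec.url)

    if body is None:
        if rec.deleted:
            return "unchanged"
        # Record vanished: expose a deleted record to incremental harvesters.
        rec.deleted = True
        rec.datestamp = now
        return "deleted"

    digest = hashlib.sha256(body).hexdigest()
    if digest == rec.digest and not rec.deleted:
        return "unchanged"

    if not validate(body):
        # Keep the last good copy; the caller would mail the contact person.
        return "invalid"

    rec.digest = digest
    rec.deleted = False
    rec.datestamp = now
    return "updated"
```

A scheduler would call poll_once for each registered URL about once a day
and act on the returned status, for instance sending mail on "invalid".
As written, an invalid record is re-reported on every poll until it is
fixed; whether that is desirable is a design choice I have left open.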



-- 
Dr. Alberto Accomazzi
Principal Investigator
NASA Astrophysics Data System - http://ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics - http://www.cfa.harvard.edu
60 Garden St, MS 83, Cambridge, MA 02138, USA