Featherweight Publishing Registries

Markus Demleitner msdemlei at ari.uni-heidelberg.de
Sat Oct 22 07:58:45 CEST 2016


Dear Registry folks,

On Fri, Oct 21, 2016 at 11:42:59PM +0200, Accomazzi, Alberto wrote:
> Even if we decided that OAI-PMH is now getting a bit long in the tooth, I
> would suggest considering more modern and widespread standards designed for
> this purpose rather than reinventing the wheel.  Many of us (I think)
> already support web crawlers by using the sitemap protocol (
> http://www.sitemaps.org/protocol.html).  We could go even further and use
> ResourceSync which simply builds on top of sitemap and is expected to be
> the successor to OAI-PMH: http://www.openarchives.org/rs/toc

Given that OAI-PMH has at least one trap many implementors regularly
step into (the dateUpdated in the record vs. the date a record
actually appears in the OAI-PMH service), I'd be interested
in following the upstream developments, and I'd be happy if someone
from the VO community could participate in whatever group pursues
these (and perhaps report on them in our WG meetings during the
interops).
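
To illustrate the trap with a toy example: the sketch below assumes a
publishing registry that filters incremental harvests either on the
date written inside the record or on the date the record actually
showed up in the service.  The record layout, the field names
"updated" and "ingested", and the list_records helper are all made up
for illustration; this is not any real registry's code.

```python
from datetime import date

# Hypothetical records: 'updated' is the date written inside the resource
# record, 'ingested' is the date the publishing registry started serving it.
records = [
    {"ivoid": "ivo://example/a", "updated": date(2016, 10, 1),
     "ingested": date(2016, 10, 1)},
    {"ivoid": "ivo://example/b", "updated": date(2016, 10, 5),
     "ingested": date(2016, 10, 20)},
]


def list_records(records, from_, key):
    """Return the ivoids whose date in field `key` is on or after `from_`."""
    return [r["ivoid"] for r in records if r[key] >= from_]


# A harvester that last harvested on Oct 15 asks for everything newer.
last_harvest = date(2016, 10, 15)

# Filtering on the in-record date silently drops record b ...
missed = list_records(records, last_harvest, "updated")
# ... while filtering on the date it appeared in the service returns it.
found = list_records(records, last_harvest, "ingested")
```

Record b carries an older in-record date than the harvester's last
visit but only became visible afterwards, so a service filtering on
the in-record date never hands it to an incremental harvester.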

Having said that, I don't believe this will scratch Walter's itch.
While I believe that for a place the size of IRSA, a working OAI-PMH
interface is highly desirable, I have indeed been planning for a
while to offer a more lightweight process, in particular to
operators that only run very few services but still want to keep
the resource records on their local systems (rather than in the web
interfaces offered by ESAVO and STScI): a proxy publishing registry
(let's call it purx for now).

Essentially, the data providers would submit a URL that purx pulls a
resource record from, and after validation, purx puts this record
into its ivo_managed resource records, so regular OAI-PMH harvesters
will find it.  Purx then will, once a day or so, check if anything
has changed on the remote side, and if so, re-download the record and
push it out to incremental harvesters.  If the record becomes
invalid, mails will be sent to the contact person in the registry
record; if it vanishes, a deleted record will be pushed out by purx.
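
A minimal sketch of what one such daily pass might look like, just to
make the cycle concrete.  Nothing here exists yet: the Entry layout,
poll_once, and the fetch/validate/notify callables are hypothetical
names, and treating "fetch returned None" as "the record vanished" is
a simplifying assumption (a real service would probably want to see
several failed attempts before declaring a record deleted).

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Entry:
    """State kept for one submitted URL (hypothetical layout)."""
    digest: Optional[str] = None          # SHA-256 of the last accepted record
    datestamp: Optional[datetime] = None  # datestamp served to harvesters
    deleted: bool = False


def poll_once(entries, fetch, validate, notify, now=None):
    """Run one pass over all submitted URLs.

    fetch(url) returns the record as bytes, or None if it is gone;
    validate(body) returns True for an acceptable record;
    notify(url) stands in for mailing the contact person.
    """
    now = now or datetime.now(timezone.utc)
    for url, entry in entries.items():
        body = fetch(url)
        if body is None:
            # Record vanished: push out a deleted record, once.
            if not entry.deleted:
                entry.deleted = True
                entry.datestamp = now
            continue
        if not validate(body):
            # Keep serving the last good version and tell the contact.
            notify(url)
            continue
        digest = hashlib.sha256(body).hexdigest()
        if digest != entry.digest or entry.deleted:
            # New or changed record: bump the datestamp so incremental
            # harvesters pick it up on their next visit.
            entry.digest = digest
            entry.deleted = False
            entry.datestamp = now
```

The datestamp is set to the time of the pass rather than to whatever
date the remote record claims, so incremental harvesters see every
change no matter what the data provider wrote into the record.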

In that way, the harvesters still only need to talk to OAI-PMH
services and need not hit hundreds or thousands of machines with a
handful of records each, while small data centers don't have to
bother with OAI-PMH and can still programmatically generate their
resource records.  There is, of course, a small downside: The ivoids
of the services will have to be managed by purx, and all of these
records will be under one authority.  That would actually be by
design: Registering and properly managing the authority record is
another chore I'd rather spare the small data centers.

There are two reasons why this doesn't exist:

(1) I've always hoped someone else would build a service like this
(any takers?)

(2) I never had a concrete candidate who'd be using it (although I
strongly suspect that once it's there and properly documented,
there'd be quite a few)

So -- if you can do something about (1) or (2) -- by all means do
speak up.

        -- Markus

