Featherweight Publishing Registries

Thu Oct 27 11:06:32 CEST 2016

Hi Walter,

I guess I feel something of a need to speak up for OAI-PMH here...

On Tue, Oct 25, 2016 at 09:49:55AM -0700, Walter Landry wrote:
> The only reason I am looking for a new approach is because the current
> approach does not work that well.  Part of that is because the

Well, as someone who's harvesting both using OAI-PMH and, in my case,
plain TAP, I have to say I think it works well, and much better for
this purpose than plain TAP.  That shouldn't come as a surprise,
since (incremental) synchronisation is what OAI-PMH was designed for,
and...

> protocol is overly complicated for what we are doing.  That makes

...is, I also have to say, pretty well designed, so I'd deny the "overly
complicated".  The DaCHS implementation of OAI-PMH is 400 lines,
comments included.  And that does paging, which you wouldn't
have to do.

There's significant complexity in coming up with the resource
records themselves, yes, but that doesn't change whether you're
doing OAI-PMH or any other thing that transmits a few of these
records in one go.

As to OAI-PMH itself, the Identify, ListMetadataFormats, and ListSets
operations just require you to push out a static document after
checking arguments; that seems to me a small price to pay for
interoperability with the wider world of bibliography.  So, the only
things you'll have to write code for are GetRecord (which is trivial:
stick your record in an almost-static envelope and go), and
ListIdentifiers and ListRecords.  The interfaces for the latter two
are almost identical, and I'd be hard pressed to see how you could
further shrink them if you want to support incremental harvesting,
which I think is a must.
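
To make the size of the job concrete, here is a minimal sketch of the
three verbs that need actual code.  It is not the DaCHS implementation;
the RECORDS store, the base URL, and all function names are made up for
illustration, and argument checking, error responses (idDoesNotExist,
noRecordsMatch), sets, and paging are left out.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

# Hypothetical store: identifier -> (datestamp, serialized metadata).
# The metadata strings are stand-ins, not valid VOResource records.
RECORDS = {
    "ivo://example/a": ("2016-10-01T00:00:00Z",
                        "<Resource><title>A</title></Resource>"),
    "ivo://example/b": ("2016-10-25T12:00:00Z",
                        "<Resource><title>B</title></Resource>"),
}


def envelope(verb, body, base_url="http://example.org/oai"):
    """Wrap a verb-specific body in the almost-static OAI-PMH envelope."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f'<OAI-PMH xmlns="{OAI_NS}">'
        f"<responseDate>{stamp}</responseDate>"
        f'<request verb="{verb}">{escape(base_url)}</request>'
        f"<{verb}>{body}</{verb}>"
        "</OAI-PMH>"
    )


def header(identifier, datestamp):
    return (f"<header><identifier>{escape(identifier)}</identifier>"
            f"<datestamp>{datestamp}</datestamp></header>")


def record(identifier):
    datestamp, metadata = RECORDS[identifier]
    return (f"<record>{header(identifier, datestamp)}"
            f"<metadata>{metadata}</metadata></record>")


def select(from_=None, until=None):
    """Return (identifier, datestamp) pairs within the inclusive bounds.

    UTC timestamps of one granularity compare correctly as strings,
    which is all incremental harvesting needs here.
    """
    return sorted(
        (ident, stamp) for ident, (stamp, _) in RECORDS.items()
        if (from_ is None or stamp >= from_)
        and (until is None or stamp <= until))


def get_record(identifier):
    return envelope("GetRecord", record(identifier))


def list_identifiers(from_=None, until=None):
    return envelope("ListIdentifiers",
                    "".join(header(i, s) for i, s in select(from_, until)))


def list_records(from_=None, until=None):
    return envelope("ListRecords",
                    "".join(record(i) for i, _ in select(from_, until)))
```

A harvester that last ran on 2016-10-20 would then call something like
list_records(from_="2016-10-20T00:00:00Z") and get back only the records
changed since then.  Note that ListIdentifiers and ListRecords differ
only in what they emit per selected record.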

Sure, it'd be better still if we had a *different* interface for
determining the changes since the last harvest, probably based on a
VCS-like revision number, but that'd certainly be no simplification.

Finally, there's one thing that actually does add complexity: support
for the oai_dc metadata schema.  Yes, I grant you that's a service to
the outside world that's not giving any added value to our community
at all and makes things a bit more complicated because you have to
either keep two versions of your records or transform them on the
fly, and both options suck a bit.

So, if we were alone in the world, I'd say let's get rid of oai_dc
tomorrow.

But we aren't, and oai_dc is providing a friendly face to the rest of
the open data/electronic library/whatever communities.  It's a
moderate price to pay, in particular since the transformation from
ivo_vor to oai_dc is doable with an XSL stylesheet.
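
To illustrate how small that transformation is, here is a sketch of the
same kind of mapping.  It deliberately is not an XSL stylesheet: it uses
Python's standard ElementTree so it runs without extra libraries.  The
element paths chosen, the MAPPING table, and the to_oai_dc name are
assumptions for illustration and cover only a handful of fields.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# (path below the resource record's root, Dublin Core element name).
# An illustrative subset, not a complete or authoritative mapping.
MAPPING = [
    ("title", "title"),
    ("identifier", "identifier"),
    ("curation/creator/name", "creator"),
    ("curation/publisher", "publisher"),
    ("content/subject", "subject"),
    ("content/description", "description"),
]


def to_oai_dc(vor_xml):
    """Build an oai_dc:dc element from a serialized resource record."""
    src = ET.fromstring(vor_xml)
    dc = ET.Element(f"{{{OAI_DC}}}dc")
    for path, dc_name in MAPPING:
        # findall copes with repeated elements such as several subjects
        for node in src.findall(path):
            if node.text and node.text.strip():
                ET.SubElement(dc, f"{{{DC}}}{dc_name}").text = node.text.strip()
    return ET.tostring(dc, encoding="unicode")


# A made-up record, just enough structure to exercise the mapping.
SAMPLE = (
    '<ri:Resource xmlns:ri="http://www.ivoa.net/xml/RegistryInterface/v1.0">'
    "<title>Demo</title>"
    "<identifier>ivo://example/demo</identifier>"
    "<curation><publisher>Example DC</publisher>"
    "<creator><name>Doe, J.</name></creator></curation>"
    "<content><subject>surveys</subject>"
    "<description>A demo.</description></content>"
    "</ri:Resource>"
)

if __name__ == "__main__":
    print(to_oai_dc(SAMPLE))
```

Doing this on the fly at response time avoids keeping two versions of
each record, at the cost of running the mapping on every oai_dc request.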

Bottom line: OAI-PMH for a moderately-sized data center (i.e., no
paging required) is so simple I'd probably not bother to look for an
existing implementation (I've not looked at mod_oai, though) if
you're writing your software yourself otherwise.  Just write the
couple of lines and avoid the few pitfalls (e.g., over-complicated
namespace management, abusing @updated in OAI-PMH).

         -- Markus