Featherweight Publishing Registries
Theresa Dower
dower at stsci.edu
Fri Oct 28 23:20:14 CEST 2016
I've been stewing over thoughts on this for a while and doing a bit of research before weighing in. Hello.
It seems like most of the specific complaints about OAI-PMH are about the speed of individual implementations, and issues with implementation do not necessarily warrant inventing new standards and creating whole new implementation issues. As for the standard itself, I see "it's complicated", and yes, it is; synchronizing distributed record metadata and retaining its history and provenance is a complicated thing. I also see notes that the standard itself, and implementation packages for it outside of the VO, have not been updated recently. Does that mean they're abandoned, or are they *stable*? It seems to be a bit of both; the Open Archives Initiative have been developing new standards for sharing metadata of specifically web-based resources, and perhaps that is a thing one should look into more closely if one is set on moving away from PMH (I honestly haven't looked deeply enough at it to know if it's a standard for the transport system or for metadata descriptions like VOResource). But in mulling over an idea like that, it is very important to note that the Registry is one of very few places in the IVOA where we utilize the expertise of other disciplines instead of rolling our own solutions from scratch. This is economically and technically political, as Francoise said: we claim that interoperability is key in the IVOA, and using standards from outside astronomy here gives us this interoperability; some folks' funding comes from outside astronomy as well, and having hooks into other disciplines like library science allows us to work with archives we might not otherwise be able to. Standards that we adopt without rolling our own come with pre-existing reference implementations, validators, and a community. These are not things to be let go just because a particular implementation is slow, or because the very hard synchronization problem gets buggy enough to need human wrangling once or twice a year.
(As Markus noted, one of our biggest ongoing problems is differences in record creation and publication times, which are, again, an implementation issue, and can be resolved using larger margins of error in the sort of incremental harvesting the standard was built for.)
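To make the "larger margins of error" idea concrete, here is a minimal sketch of what a harvester might do, not a description of any existing registry's code. It backs the OAI-PMH `from` argument off by a safety margin, so records whose datestamp is earlier than their actual publication time still get picked up, and then merges by identifier, since the overlapping window will re-deliver records already seen. The function names, the 24-hour default margin, and the example base URL are all made up for illustration; the `ivo_vor` prefix and `ivo_managed` set are what I understand IVOA harvesters to use, so treat them as assumptions too.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def incremental_harvest_url(base_url, last_harvest,
                            margin=timedelta(hours=24),
                            metadata_prefix="ivo_vor",
                            oai_set="ivo_managed"):
    """Build a ListRecords URL whose 'from' is last_harvest minus a margin.

    last_harvest must be a timezone-aware datetime. The margin is an
    assumed value; pick one that covers the worst lag you have seen
    between a record's datestamp and its actual publication.
    """
    start = (last_harvest - margin).astimezone(timezone.utc)
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        # OAI-PMH datestamps are UTC, here in seconds granularity
        "from": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    if oai_set:
        params["set"] = oai_set
    return base_url + "?" + urlencode(params)


def merge_records(store, records):
    """Merge (identifier, datestamp, payload) tuples into store.

    The overlapping window re-delivers old records, so keep only the
    newest datestamp per identifier. UTC datestamps of the same
    granularity sort correctly as plain strings.
    """
    for identifier, datestamp, payload in records:
        if identifier not in store or datestamp >= store[identifier][0]:
            store[identifier] = (datestamp, payload)
    return store


# Hypothetical usage; example.org stands in for a real publishing registry.
last = datetime(2016, 10, 28, 12, 0, 0, tzinfo=timezone.utc)
url = incremental_harvest_url("http://example.org/oai", last)
```

The cost of a wider margin is only some redundant transfer on each harvest, which the merge step absorbs.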
So. What don't we like about OAI-PMH? Speed? Can we write faster implementations? Can we take the developer time to do so? I would very much love the time to rework STScI's implementation, as it was not intended for the distributed, load-balanced back end that currently runs it, and I am planning to find that time this year. Are there newer library standards that have supplanted it, with their own community, validators, help? I don't know; I only spent a bit of time yesterday looking into this, but I'm very interested to know what anyone else has found that may be so much better than PMH as to be worth the effort of every registry publisher and maintainer in the IVOA ecosystem adopting it. I get the feeling that at the standards level, what we have works, and the fact that it has worked for a rather long time without being touched is a thing to be celebrated rather than laughed at. We have a registry ecosystem including 20-some publishing registries and the RofR, each separately maintained, and I'd argue that moving toward another standard for communicating between registries either means committing the entire registry ecosystem to the change or is wasted effort. (Moving toward other standards for search is great! We need more good ones, and more good front ends on them!) So. Let's look at what any large open projects in digital archives outside of astronomy might be doing, but not lose sight of the time and effort needed to move the whole registry ecosystem, or of the danger of fracturing it into separate incompatible systems.
--Theresa