OAI as VO Harvesting Interface
Ray Plante
rplante at poplar.ncsa.uiuc.edu
Tue Sep 9 10:43:17 PDT 2003
Hi,
As promised, here are a few more words about OAI as a harvesting
interface. First, for those not familiar with OAI, you can find a
discussion of its use within a VO registry framework at
http://rai.ncsa.uiuc.edu/~rplante/VO/metadata/evaloai.html.
The major advantages of using OAI:
1. it is an existing, field-tested standard that we do not have to
reinvent.
2. we can leverage existing tools for harvesting.
3. it allows us to expose our records beyond the VO community.
Points 1 and 2 are the strongest arguments for using OAI. Point 2 was
important for both prototype publishing registries at Caltech and NCSA.
(The OAI support in our VORegistry-in-a-Box package is an existing CGI
script from Virginia Tech; we just added the specific support for our
resource metadata.)
The primary disadvantage is that the OAI interface is not a SOAP-based Web
Service; it is defined in terms of HTTP GET requests.
A possible variation that gets around this disadvantage while retaining
some of the advantages is to define a WSDL version of the OAI operations.
(I believe Gretchen & Wil did something like this.) This would allow us to
reuse the OAI design. It would not be difficult to use a generic
HTTP-GET-to-Web-Service adapter layer, which would allow us to fully
retain advantages 2 and 3.
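
To make the adapter idea a little more concrete, here is a minimal Python
sketch of what such a layer might look like. It is only an illustration
under stated assumptions: the class name OAIAdapter, its method names, and
the base URL are hypothetical, and nothing here is taken from an existing
registry implementation. Only the verb names (Identify, ListRecords) and
the argument names (metadataPrefix, from, until, set) come from the OAI
protocol itself. The transport function is injectable so the sketch can be
exercised without a network connection.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get(url):
    """Default transport: a plain HTTP GET returning the response body."""
    with urlopen(url) as response:
        return response.read()


class OAIAdapter:
    """Hypothetical adapter exposing OAI verbs as ordinary method calls."""

    def __init__(self, base_url, fetch=http_get):
        self.base_url = base_url
        self.fetch = fetch  # injectable so the sketch can run offline

    def request_url(self, verb, **args):
        # OAI passes everything as query parameters; unset ones are dropped.
        params = {"verb": verb}
        params.update({k: v for k, v in args.items() if v is not None})
        return self.base_url + "?" + urlencode(params)

    def identify(self):
        return self.fetch(self.request_url("Identify"))

    def list_records(self, metadata_prefix, from_date=None, until=None,
                     set_spec=None):
        # "from" is a Python keyword, so the arguments travel in a dict.
        args = {"metadataPrefix": metadata_prefix, "from": from_date,
                "until": until, "set": set_spec}
        return self.fetch(self.request_url("ListRecords", **args))


# Usage with a stand-in transport that just echoes the URL it would fetch.
adapter = OAIAdapter("http://registry.example.org/oai", fetch=lambda url: url)
print(adapter.list_records("oai_dc", from_date="2003-09-01"))
# prints http://registry.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2003-09-01
```

A SOAP front end could presumably expose the same method signatures in
WSDL and delegate to a layer like this one, so that existing OAI
harvesters and Web Service clients would both see the same records.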
Clive brought up an interesting point about supporting the hierarchical
nature of our resources. In the OAI model, each resource it exposes is
described by a node of XML data. The interface itself has no inherent
support for hierarchical relationships; these are encoded, if necessary,
within the domain-specific metadata. The recently proposed resource
metadata provides ways to express various relationships between resources.
(I would use the "Manager" item to refer to parental containment, but
others may disagree.) The important point is that the OAI model for
harvesting doesn't need to know about hierarchies; it's just about
synchronizing the contents of one registry with another.
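
As a rough illustration of why synchronization does not need to know
about hierarchies, here is a small Python sketch of merging a batch of
harvested records into a local copy. It is an assumption-laden sketch, not
a description of any existing harvester: the function names and the
dictionary layout of a record are hypothetical. What it borrows from OAI
is the idea that each record carries an identifier, a datestamp, and
possibly a deleted status. The metadata payload, which is where any
hierarchy would be encoded, is never inspected.

```python
def apply_harvest(local, harvested):
    """Merge harvested records into the local copy, keyed by identifier."""
    for record in harvested:
        current = local.get(record["identifier"])
        # ISO 8601 datestamps of equal granularity compare correctly as text.
        if current is None or record["datestamp"] > current["datestamp"]:
            # Deleted records are kept as tombstones so that an older copy
            # arriving later cannot resurrect them.
            local[record["identifier"]] = record
    return local


def active_records(local):
    """Return only the records that are not marked as deleted."""
    return {i: r for i, r in local.items() if not r.get("deleted")}


# Hypothetical records; "metadata" stands in for the opaque XML payload.
local = {"a": {"identifier": "a", "datestamp": "2003-09-01", "metadata": "old"}}
harvested = [
    {"identifier": "a", "datestamp": "2003-09-05", "metadata": "new"},
    {"identifier": "b", "datestamp": "2003-09-02", "deleted": True},
]
apply_harvest(local, harvested)
print(sorted(active_records(local)))
# prints ['a']
```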
Does the Harvesting interface need to know about hierarchies? You might
answer yes if you want to control harvesting based on where resources
exist within a hierarchy (e.g. only harvest three levels deep). I suspect
that in practice, this will not be so important. Given that OAI can
control harvesting based on registry-defined categories, date, and
metadata format, it is likely that sufficiently similar filtering can be
done through another mechanism. I'd be interested to hear ideas to the
contrary.
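
One such mechanism might be a filter applied on the harvester's side after
the records arrive. The Python sketch below is only a guess at how that
could look: it assumes each harvested record has been reduced to a
dictionary with a hypothetical "parent" entry (derived, for instance, from
whichever relationship item in the resource metadata ends up expressing
containment), and it keeps only the records within a given number of
levels of a root. None of these names come from OAI or from the proposed
metadata.

```python
def depth(identifier, parent_of):
    """Count levels from a record up to its root; a root is level 1."""
    level = 1
    seen = {identifier}
    while parent_of.get(identifier) is not None:
        identifier = parent_of[identifier]
        if identifier in seen:
            raise ValueError("cycle in parent links")
        seen.add(identifier)
        level += 1
    return level


def within_depth(records, max_depth):
    """Keep the records that sit at most max_depth levels deep."""
    parent_of = {r["identifier"]: r.get("parent") for r in records}
    return [r for r in records
            if depth(r["identifier"], parent_of) <= max_depth]


# Hypothetical four-level chain: organisation > archive > catalog > table.
records = [
    {"identifier": "org"},
    {"identifier": "archive", "parent": "org"},
    {"identifier": "catalog", "parent": "archive"},
    {"identifier": "table", "parent": "catalog"},
]
print([r["identifier"] for r in within_depth(records, 3)])
# prints ['org', 'archive', 'catalog']
```

The same effect could presumably also be had on the publishing side, by
defining registry categories that correspond to levels in the hierarchy,
which would let the existing OAI filtering do the work.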
All in all, I see tremendous advantage to adopting OAI for Harvesting in
some form. If we don't adopt it outright, we should at least borrow
heavily from it as it addresses the common issues associated with
efficient synching of information systems.
cheers,
Ray