OAI as VO Harvesting Interface

Ray Plante rplante at poplar.ncsa.uiuc.edu
Tue Sep 9 10:43:17 PDT 2003


Hi,

As promised, here are a few more words about OAI as a Harvesting 
interface.  First, for those not familiar with OAI, you can find a 
discussion of the use of OAI within a VO registry framework at 
http://rai.ncsa.uiuc.edu/~rplante/VO/metadata/evaloai.html.

The major advantages of using OAI:
  1.  it is an existing, field-tested standard that we do not have to 
      reinvent.
  2.  we can leverage existing tools for harvesting.
  3.  it allows us to expose our records beyond the VO community.

1 and 2 are the strongest arguments for using OAI.  2 was important for 
both prototype publishing registries at Caltech and NCSA.  (The OAI 
support in our VORegistry-in-a-Box package is an existing CGI script from 
Virginia Tech; we just added the specific support for our resource 
metadata.)

The primary disadvantage is that the OAI interface is not a SOAP-based Web 
Service; it's defined in terms of HTTP GET requests.

A possible variation that gets around the disadvantage and retains some of 
the advantages is to define a WSDL version of the OAI operations.  (I 
believe Gretchen & Wil did something like this.)  This would allow us to 
reuse the OAI design.  It would not be difficult to use a generic 
HTTP-Get-to-Web-Service adapter layer which would allow us to fully retain 
the advantages of 2 & 3.
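
To make the adapter idea concrete, here is a minimal sketch in Python.  It 
is only an illustration under stated assumptions, not what Gretchen & Wil 
built: the class name OAIGetAdapter, the base URL 
http://registry.example.org/oai, and the injected fetch function are all 
hypothetical.  The verbs (Identify, ListRecords) and the metadataPrefix 
argument come from the OAI protocol itself.  A SOAP layer could call 
methods like these while the existing OAI GET interface does the work 
underneath.

```python
from urllib.parse import urlencode


class OAIGetAdapter:
    """Hypothetical adapter: OAI verbs as method calls, HTTP GETs underneath."""

    def __init__(self, base_url, fetch):
        self.base_url = base_url
        # fetch is any callable that takes a URL and returns the response body
        self.fetch = fetch

    def call(self, verb, **arguments):
        # An OAI request is the base URL plus key=value arguments,
        # one of which is always the verb.
        query = urlencode({"verb": verb, **arguments})
        return self.fetch(self.base_url + "?" + query)

    def identify(self):
        return self.call("Identify")

    def list_records(self, metadata_prefix):
        return self.call("ListRecords", metadataPrefix=metadata_prefix)


# Stand-in for a real HTTP client, so the sketch runs without a network.
requested = []

def fake_fetch(url):
    requested.append(url)
    return "<OAI-PMH/>"

adapter = OAIGetAdapter("http://registry.example.org/oai", fake_fetch)
adapter.identify()
adapter.list_records("oai_dc")
```

In real use, fetch could be something like 
lambda url: urllib.request.urlopen(url).read(); the point is only that the 
adapter layer is thin.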

Clive brought up an interesting point about supporting the hierarchical 
nature of our resources.  In the OAI model, each resource it exposes is 
described by a node of XML data.  To the interface, there is no inherent 
support for hierarchical relationships; this is encoded, if necessary, 
within the domain-specific metadata.  The recently proposed resource 
metadata provides ways to express various relationships between resources.  
(Personally, I would use the "Manager" item to refer to parental
containment, but others may disagree.)  The important point is, the OAI 
model for harvesting doesn't need to know about hierarchies; it's just 
about synchronizing the contents of one registry with another.  
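
As a rough illustration of that last point, the sketch below merges 
harvested records into a local registry keyed by identifier, keeping 
whichever copy has the newer datestamp.  The function name, the record 
layout (plain dicts), the identifiers, and the sample metadata strings are 
all assumptions made up for the example.  Note that the merge never looks 
inside the metadata, which is where any hierarchy would be encoded.

```python
def synchronize(local, harvested):
    """Merge harvested records into local; the newer datestamp wins.

    Each record is a dict with 'identifier', 'datestamp' (ISO 8601 date
    string, so plain string comparison orders them), and 'metadata',
    which is treated as an opaque blob.
    """
    for record in harvested:
        current = local.get(record["identifier"])
        if current is None or record["datestamp"] > current["datestamp"]:
            local[record["identifier"]] = record
    return local


# Hypothetical records; the metadata content is illustrative only.
local_registry = {
    "example:archive": {
        "identifier": "example:archive",
        "datestamp": "2003-08-01",
        "metadata": "<Resource>old description</Resource>",
    },
}
harvested_records = [
    {
        "identifier": "example:archive",
        "datestamp": "2003-09-01",
        "metadata": "<Resource>new description</Resource>",
    },
    {
        "identifier": "example:archive/survey",
        "datestamp": "2003-09-05",
        # The parent relationship lives inside the metadata, unseen by the merge.
        "metadata": "<Resource><Manager>example:archive</Manager></Resource>",
    },
]

synchronize(local_registry, harvested_records)
```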

Does the Harvesting interface need to know about hierarchies?  You might 
answer yes if you want to control harvesting based on where resources
exist within a hierarchy (e.g. only harvest three levels deep).  I suspect 
that in practice, this will not be so important.  Given that OAI can 
control harvesting based on registry-defined categories, date, and 
metadata format, it is likely that sufficiently similar filtering can be 
done through another mechanism.  I'd be interested to hear ideas to the 
contrary.
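
For reference, the three controls mentioned above map onto the OAI 
request arguments set, from/until, and metadataPrefix.  Here is a small 
Python sketch of a selective harvesting request; the function name, the 
base URL, and the set name "surveys" are hypothetical, and a registry 
would define its own sets.

```python
from urllib.parse import urlencode


def selective_harvest_url(base_url, metadata_prefix,
                          from_date=None, until_date=None, set_spec=None):
    """Build a ListRecords request limited by date range and/or set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # 'from' is a Python keyword, so the optional arguments are added by key.
    if from_date is not None:
        params["from"] = from_date
    if until_date is not None:
        params["until"] = until_date
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Only records in the (hypothetical) "surveys" set changed since 2003-09-01.
url = selective_harvest_url(
    "http://registry.example.org/oai",
    "oai_dc",
    from_date="2003-09-01",
    set_spec="surveys",
)
```

A depth limit like "three levels deep" has no direct equivalent here, but 
a registry could approximate it by defining sets that follow its 
hierarchy.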

All in all, I see tremendous advantage to adopting OAI for Harvesting in 
some form.  If we don't adopt it outright, we should at least borrow 
heavily from it as it addresses the common issues associated with 
efficient synching of information systems. 

cheers,
Ray






