Harvesting from multiple registries

Ray Plante rplante at poplar.ncsa.uiuc.edu
Wed Jan 28 06:03:33 PST 2004


Hi Matthew,

On Tue, 27 Jan 2004, Matthew Graham wrote:
> We want to harvest from CDS and NCSA but how do we handle records from CDS
> that NCSA already harvested? I suppose that they will still retain their
> CDS identifier and so we can then ignore duplicate records. However, the
> data will be more recent than that from CDS and so how do we distinguish this
> from a modified record that has been directly published in NCSA, which we
> do not want to ignore? Or do we just ignore anything from another registry
> that we find in NCSA?

First, NCSA doesn't harvest from any other registry, so you don't have to 
worry about this specifically.

In general, though, one would only ingest records from NCSA that 
originated from NCSA.  Here's a theoretical recipe; I don't think 
anyone has tried it, and it could probably use refinement:

  1.  On first harvesting, get my registry record by doing a ListRecords 
      with metadataPrefix=ivo_vor and set=Registry

  2.  Parse my Registry record to extract the authority IDs that I manage; 
      these are listed in the ManagedAuthority elements.

  3.  Now get all my records using ListRecords with 
      metadataPrefix=ivo_vor.  Only load those records whose AuthorityID
      matches one of the authority IDs declared in my Registry record.
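
For what it's worth, the three steps above might be sketched in Python 
roughly as follows.  This is untested against any real registry and rests 
on assumptions: the function names are made up for illustration, elements 
are matched by local name only (ignoring namespaces), the first 
AuthorityID found inside each OAI record is taken to be that record's 
identifier, and OAI resumption tokens (paging) are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="ivo_vor", set_name=None):
    """Build an OAI ListRecords request URL (steps 1 and 3)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_name is not None:
        params["set"] = set_name
    return base_url + "?" + urlencode(params)


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def managed_authorities(registry_xml):
    """Step 2: collect the authority IDs the registry says it manages."""
    root = ET.fromstring(registry_xml)
    return {
        el.text.strip()
        for el in root.iter()
        if local_name(el.tag) == "ManagedAuthority"
        and el.text and el.text.strip()
    }


def records_to_load(records_xml, authorities):
    """Step 3: keep only records whose AuthorityID is a managed one."""
    root = ET.fromstring(records_xml)
    keep = []
    for record in root.iter():
        if local_name(record.tag) != "record":
            continue
        ids = [
            el.text.strip()
            for el in record.iter()
            if local_name(el.tag) == "AuthorityID" and el.text
        ]
        # Assumption: the first AuthorityID belongs to the record's
        # own identifier.
        if ids and ids[0] in authorities:
            keep.append(record)
    return keep
```

One would fetch list_records_url(base, set_name="Registry") first, pass 
the response body to managed_authorities(), then fetch 
list_records_url(base) and pass that body, along with the set of 
authority IDs, to records_to_load().  Anything carrying another 
registry's authority ID gets dropped.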

I reviewed this recipe against our registry and found some errors, which 
hopefully I've fixed.  Let Ramon and/or me know if you have trouble.

cheers,
Ray

PS: for onlookers, our OAI interface is at 
http://nvo.ncsa.uiuc.edu/cgi-bin/nvo/oai.pl



