Harvesting from multiple registries
Ray Plante
rplante at poplar.ncsa.uiuc.edu
Wed Jan 28 06:03:33 PST 2004
Hi Matthew,
On Tue, 27 Jan 2004, Matthew Graham wrote:
> We want to harvest from CDS and NCSA but how do we handle records from CDS
> that NCSA already harvested? I suppose that they will still retain their
> CDS identifier and so we can then ignore duplicate records. However, the
> data will be more recent than that from CDS and so how do we distinguish this
> from a modified record that has been directly published in NCSA, which we
> do not want to ignore? Or do we just ignore anything from another registry
> that we find in NCSA?
First, we don't harvest from anywhere else, so you don't have to worry
about this specifically.
In general, though, one would only ingest from NCSA those records that
originated at NCSA. Here's a theoretical recipe; I don't think anyone has
tried it, and it could probably use refinement:
1. On first harvesting, get my Registry record by issuing a ListRecords
request with metadataPrefix=ivo_vor and set=Registry.
2. Parse my Registry record to extract the authority IDs that I manage;
these are listed in the ManagedAuthority elements.
3. Now get all my records using ListRecords with
metadataPrefix=ivo_vor. Only load those records whose AuthorityID
matches those I declared in my Registry record.
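A minimal Python sketch of steps 2 and 3, assuming the responses to the two ListRecords requests have already been fetched as XML strings. The function names are hypothetical. Since the exact ivo_vor namespace URI is not assumed here, the code matches ManagedAuthority and AuthorityID by local element name only, and it assumes the first AuthorityID in each record belongs to the resource's own identifier. Records under another authority (e.g. ones NCSA harvested from CDS) and deleted records, which carry no metadata, are skipped.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope (<record>, <header>, ...).
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def _local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def _find_all_local(elem, name):
    """Yield elem and its descendants whose local tag name equals `name`."""
    for e in elem.iter():
        if _local(e.tag) == name:
            yield e


def managed_authorities(registry_response_xml):
    """Step 2: collect the ManagedAuthority values from the Registry record.

    ASSUMPTION: matched by local name, ignoring the (unstated) ivo_vor namespace.
    """
    root = ET.fromstring(registry_response_xml)
    return {e.text.strip()
            for e in _find_all_local(root, "ManagedAuthority") if e.text}


def records_to_load(list_records_xml, authorities):
    """Step 3: keep only <record>s whose AuthorityID is one we manage.

    ASSUMPTION: the first AuthorityID in a record is the resource's own.
    Deleted records have no <metadata>, hence no AuthorityID, and are dropped.
    """
    root = ET.fromstring(list_records_xml)
    keep = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        ids = [e.text.strip()
               for e in _find_all_local(rec, "AuthorityID") if e.text]
        if ids and ids[0] in authorities:
            keep.append(rec)
    return keep
```

Matching on local names keeps the sketch tolerant of schema namespace changes, at the cost of also matching any same-named element from another schema embedded in a record.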
I reviewed this recipe against our registry and found some errors, which
hopefully I've fixed. Let Ramon and/or me know if you have trouble.
cheers,
Ray
PS: for onlookers, our OAI interface is at
http://nvo.ncsa.uiuc.edu/cgi-bin/nvo/oai.pl
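For onlookers who want to script against that interface, here is a hedged sketch of forming the ListRecords requests from the recipe and following OAI-PMH 2.0 resumptionToken paging, which the recipe above does not spell out. The helper names and the injected `fetch` callable (URL in, response body out, e.g. a wrapper around `urllib.request.urlopen(url).read()`) are hypothetical; the rule that a resumptionToken request carries no metadataPrefix or set, and that an empty or missing token marks the last page, comes from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
NCSA_OAI = "http://nvo.ncsa.uiuc.edu/cgi-bin/nvo/oai.pl"


def list_records_url(base_url, metadata_prefix=None, set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument: it must not be combined
    with metadataPrefix or set.
    """
    params = {"verb": "ListRecords"}
    if resumption_token is not None:
        if metadata_prefix or set_spec:
            raise ValueError("resumptionToken excludes metadataPrefix/set")
        params["resumptionToken"] = resumption_token
    else:
        if not metadata_prefix:
            raise ValueError("metadataPrefix is required on the first request")
        params["metadataPrefix"] = metadata_prefix
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def harvest(fetch, base_url, metadata_prefix, set_spec=None):
    """Yield every <record> element, following resumptionTokens page by page."""
    url = list_records_url(base_url, metadata_prefix, set_spec)
    while True:
        root = ET.fromstring(fetch(url))
        yield from root.iter(f"{{{OAI_NS}}}record")
        tok = root.find(f".//{{{OAI_NS}}}resumptionToken")
        # A missing or empty token means this was the last page.
        if tok is None or not (tok.text or "").strip():
            return
        url = list_records_url(base_url, resumption_token=tok.text.strip())
```

For step 1, `list_records_url(NCSA_OAI, "ivo_vor", "Registry")` gives `http://nvo.ncsa.uiuc.edu/cgi-bin/nvo/oai.pl?verb=ListRecords&metadataPrefix=ivo_vor&set=Registry`; step 3 drops the `set` argument. Injecting `fetch` keeps the paging logic testable without network access.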