ownedAuthority element

Ray Plante rplante at ncsa.uiuc.edu
Wed Apr 6 10:42:54 PDT 2005


Hi Kevin,

On Wed, 6 Apr 2005, KevinBenson wrote:
> In general since we have several other areas with curation and things going
> on I would like to see it kept real simple and effectively have these 3
> rules.
> 
> 1.) harvestee when returning records in OAI returns records for authority
> ids managed or owned (from there Registry).

I take it from your concluding comment that you are indeed distinguishing 
between managed and owned.  If so, how does a harvestee know whether it 
should be returning "managed" records or "owned" records?  Are you 
assuming the use of sets or not?

> 2.) harvester *may* desire to skip harvesting certain registries if
> discovered an authority id has already been received from another registry
> that has that particular authority id as being "managed".
> 3.) If harvester is a "Full Registry" then at an unknown time (presumably
> when it kicks off its harvests), it MUST harvest any registry type that it
> has been put as a <managedAuthority> and being managed by its own registry.

Okay, I'm not understanding what 3 is saying.  

Here's how I think 3 is supposed to work based on how <managedAuthority>
and <ownedAuthority> are (supposed to be) defined.  Assume a full registry 
has been seeded somehow, e.g. it's copied all records from another full 
registry.  It looks at each Registry record in its holdings.  If it sees 
that a record has <managedAuthority> elements, it will add that registry to
the list it will harvest from.  It will also have to note in its
list that it will need to harvest "managed" records as well as which
authority IDs are listed as managed.  

Next, the full registry looks through the Registry records again, this
time looking at the <ownedAuthority> elements.  If an authorityID was 
not listed as a <managedAuthority> in one of the registries in the
list to harvest from, the registry described by the record is added to
the list of registries to harvest.  It also notes that it will have to
harvest only "owned" records from this registry.
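
To make the two passes concrete, here is a rough Python sketch of how I
read the procedure.  The RegistryRecord class, its field names, and
build_harvest_plan are all made up for illustration; nothing like them is
defined in the schema.  I am also assuming that a registry already listed
for "managed" harvesting is not revisited in the second pass.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryRecord:
    """Hypothetical stand-in for a Registry record in the local holdings."""
    identifier: str
    managed: list = field(default_factory=list)  # <managedAuthority> values
    owned: list = field(default_factory=list)    # <ownedAuthority> values


def build_harvest_plan(records):
    """Return {registry identifier: {"mode": ..., "authorities": set}}."""
    plan = {}
    covered = set()  # authority IDs some listed registry claims to manage

    # Pass 1: every registry with <managedAuthority> elements is listed
    # for "managed" harvesting, along with the authority IDs it manages.
    for rec in records:
        if rec.managed:
            plan[rec.identifier] = {"mode": "managed",
                                    "authorities": set(rec.managed)}
            covered.update(rec.managed)

    # Pass 2: a registry owning an authority ID that nobody in the list
    # manages is added for "owned" harvesting.
    for rec in records:
        if rec.identifier in plan:
            continue  # assumption: already harvested as "managed"
        if any(auth not in covered for auth in rec.owned):
            # All of its owned records come back, not just the missing ones.
            plan[rec.identifier] = {"mode": "owned",
                                    "authorities": set(rec.owned)}
    return plan
```

Note that a registry added in the second pass hands over all of its
"owned" records, including any whose authority IDs were already covered
in the first pass.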

At harvest time, it cycles through its harvestee list.  When it
harvests, it has to tell the harvestee (somehow) that it wants
*either* only "managed" records or "owned" records.  If we don't have
this capability (e.g. two OAI sets), then the harvester will have to
sort them out by looking at the identifiers, classifying them by their
authority IDs in order to determine what to take and what to throw
away.  
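
Without sets, that sorting step might look something like the sketch
below.  It relies only on the fact that an IVOA identifier has the form
ivo://<authority ID>/<resource key>; the function names and the example
identifiers are hypothetical.

```python
def authority_of(ivoid):
    """Extract the authority ID from an ivo://authority/key identifier."""
    prefix = "ivo://"
    if not ivoid.startswith(prefix):
        raise ValueError("not an IVOA identifier: %r" % ivoid)
    # Everything between the prefix and the first slash is the authority ID.
    return ivoid[len(prefix):].split("/", 1)[0]


def sort_records(identifiers, wanted_authorities):
    """Split harvested identifiers into (keep, discard) by authority ID."""
    keep, discard = [], []
    for ivoid in identifiers:
        if authority_of(ivoid) in wanted_authorities:
            keep.append(ivoid)
        else:
            discard.append(ivoid)
    return keep, discard
```

The harvester has to run something like this over every record it pulls,
which is exactly the per-record bookkeeping a set would avoid.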

Have I got this right?  

If so, I don't think it's so simple because:
  o  the harvesting registry has to keep track of all these authority
       IDs to figure out who and how to harvest.  (This, of course,
       assumes that harvesting is done from all harvestees together at
       once, whatever that means.) 
  o  it does not guarantee that you won't get duplicates.  What if
       only *some* of a registry's "owned" records are listed as
       "managed" in another Registry record?  If you add a registry to
       pick up the missing authority IDs, you'll also get ones you
       already got.  The harvester then has to decide which to take.  
  o  if we don't have OAI sets, the harvester has to sort out "managed"
       from "owned" after pulling the records over.
  o  it requires interactive coordination between registry
       administrators to set up the hierarchy which is simply not
       necessary.   
  o  it is prone to error as the "managed" and "owned" records must
       continually be kept up to date as new authority IDs are created.
       (And I'm concerned about what synchronization latency will do.)

Clearly using sets is much simpler than this.  

In contrast, consider these 2 rules:
  1) if a full registry is harvesting from a registry in its own 
     VO project, it does so with set=ivo_managed.   
  2) if a full registry is harvesting from a full registry from another VO 
     project, it does so with set=ivo_voproject.  

How does the full registry know which harvestees are in its project?
The <voProject> value in the harvestee's Registry record says so.  How
does the full registry know which harvestees can provide records for
its whole project?  The harvestee's metadata says it supports the
ivo_voproject set.  
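
By comparison, the whole decision under these two rules fits in a few
lines.  This is only a sketch: the dictionary keys (voProject, isFull,
sets), the function names, and the base URL are invented for
illustration, and I use oai_dc as the metadataPrefix just because every
OAI repository has to support it.  I am also assuming that a registry in
another project that is not full is simply not harvested directly.

```python
from urllib.parse import urlencode


def choose_set(my_project, harvestee):
    """Pick the OAI set to request from a harvestee, or None to skip it."""
    # Rule 1: same VO project -> take what that registry itself manages.
    if harvestee["voProject"] == my_project:
        return "ivo_managed"
    # Rule 2: full registry of another project that advertises the
    # ivo_voproject set -> take that project's records in one go.
    if harvestee["isFull"] and "ivo_voproject" in harvestee["sets"]:
        return "ivo_voproject"
    # Assumption: anything else reaches us through its project's full registry.
    return None


def list_records_url(base_url, set_name, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request restricted to one set."""
    query = urlencode({"verb": "ListRecords",
                       "metadataPrefix": metadata_prefix,
                       "set": set_name})
    return base_url + "?" + query
```

No authority IDs appear anywhere in this; the harvester only reads the
harvestee's project name and its list of supported sets.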

Unless a harvestee screws up in its implementation, the harvester will not 
get any duplicate records.  Not only does the harvester not have to track 
authority IDs, it doesn't really have to look at the contents of the 
records at all, apart from what it takes to load them into the local DB.  
<managedAuthority> and <ownedAuthority> are not needed.  There's no
single "managing" registry; just "projects" that registries join.  No
human coordination is required; a publishing registry administrator
just indicates which project it's to be associated with.  Want to change 
projects?  Change the <voProject> value.  

To see how our approaches are similar, your <managedAuthority> is
replaced with <voProject>.  There will be fewer projects than
authorities, so this should be more "manageable".  

cheers,
Ray