ownedAuthority element
Ray Plante
rplante at ncsa.uiuc.edu
Wed Apr 6 10:42:54 PDT 2005
Hi Kevin,
On Wed, 6 Apr 2005, KevinBenson wrote:
> In general since we have several other areas with curation and things going
> on I would like to see it kept real simple and effectively have these 3
> rules.
>
> 1.) harvestee when returning records in OAI returns records for authority
> ids managed or owned (from there Registry).
I take it from your concluding comment that you are indeed distinguishing
between managed and owned. If so, how does a harvestee know whether it
should be returning "managed" records or "owned" records? Are you
assuming the use of sets or not?
> 2.) harvester *may* desire to skip harvesting certain registries if
> discovered an authority id has already been received from another registry
> that has that particular authority id as being "managed".
> 3.) If harvester is a "Full Registry" then at an unknown time (presumably
> when it kicks off its harvests), it MUST harvest any registry type that it
> has been put as a <managedAuthority> and being managed by its own registry.
Okay, I'm not understanding what 3 is saying.
Here's how I think 3 is supposed to work based on how <managedAuthority>
and <ownedAuthority> are (supposed to be) defined. Assume a full registry
has been seeded somehow, e.g. it's copied all records from another full
registry. It looks at each Registry record in its holdings. If it sees
that a record has <managedAuthority> elements, it will add that registry
to the list of registries it will harvest from. It will also have to note
in its list that it will need to harvest "managed" records, as well as
which authority IDs are listed as managed.
Next, the full registry looks through the Registry records again, this
time looking at the <ownedAuthority> elements. If an authorityID was
not listed as a <managedAuthority> in one of the registries in the
list to harvest from, the registry described by the record is added to
the list of registries to harvest. It also notes that it will have to
harvest only "owned" records from this registry.
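To make the two passes concrete, here is a minimal sketch of how a harvester might build its harvest list. The RegistryRecord class, its field names, and build_harvest_plan are hypothetical stand-ins for whatever a real registry stores; the sketch also assumes that a registry already listed for "managed" harvesting is not added a second time for "owned" records.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class RegistryRecord:
    """Hypothetical, simplified view of a Registry record."""
    identifier: str
    managed_authorities: List[str] = field(default_factory=list)
    owned_authorities: List[str] = field(default_factory=list)


def build_harvest_plan(records: List[RegistryRecord]) -> Dict[str, Tuple[str, Set[str]]]:
    """Map each registry to harvest onto (mode, authority IDs to expect)."""
    plan: Dict[str, Tuple[str, Set[str]]] = {}
    managed_seen: Set[str] = set()

    # Pass 1: any registry with <managedAuthority> elements is harvested
    # for "managed" records, and its managed authority IDs are noted.
    for rec in records:
        if rec.managed_authorities:
            plan[rec.identifier] = ("managed", set(rec.managed_authorities))
            managed_seen.update(rec.managed_authorities)

    # Pass 2: a registry is added for "owned" records only if it owns at
    # least one authority ID that no listed registry manages.
    for rec in records:
        if rec.identifier in plan:
            continue  # assumption: already covered by the "managed" harvest
        if set(rec.owned_authorities) - managed_seen:
            plan[rec.identifier] = ("owned", set(rec.owned_authorities))

    return plan


records = [
    RegistryRecord("ivo://full.a/registry", ["full.a", "pub.b"], ["full.a"]),
    RegistryRecord("ivo://pub.b/registry", [], ["pub.b"]),
    RegistryRecord("ivo://pub.c/registry", [], ["pub.c"]),
]
plan = build_harvest_plan(records)
```

In this made-up example pub.b is skipped because full.a already manages its authority ID, while pub.c is added for an "owned" harvest.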
At harvest time, it cycles through its harvestee list. When it
harvests, it has to tell the harvestee (somehow) that it wants
*either* only "managed" records or "owned" records. If we don't have
this capability (e.g. two OAI sets), then the harvester will have to
sort them out by looking at the identifiers, classifying them by their
authority IDs in order to determine what to take and what to throw
away.
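Without sets, the sorting step might look something like the sketch below. It assumes identifiers of the form ivo://<authorityID>/<resourceKey>; the function names and the sample identifiers are made up for illustration.

```python
from typing import Iterable, List, Set

IVO_PREFIX = "ivo://"


def authority_of(identifier: str) -> str:
    """Return the authority ID part of an ivo:// identifier."""
    if not identifier.startswith(IVO_PREFIX):
        raise ValueError("not an IVOA identifier: %r" % identifier)
    # Everything between "ivo://" and the next "/" is the authority ID.
    return identifier[len(IVO_PREFIX):].split("/", 1)[0]


def keep_wanted(identifiers: Iterable[str], wanted: Set[str]) -> List[str]:
    """Keep only the records whose authority ID the harvester asked for."""
    return [i for i in identifiers if authority_of(i) in wanted]


harvested = ["ivo://pub.b/res1", "ivo://pub.c/res2", "ivo://pub.c/res3"]
kept = keep_wanted(harvested, {"pub.c"})
```

Everything outside the wanted set is pulled over the wire and then thrown away, which is the cost being pointed out here.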
Have I got this right?
If so, I don't think it's so simple because:
o the harvesting registry has to keep track of all these authority
IDs to figure out who and how to harvest. (This, of course,
assumes that harvesting is done from all harvestees together at
once, whatever that means.)
o it does not guarantee that you won't get duplicates. What if
only *some* of a registry's "owned" records are listed as
"managed" in another Registry record? If you add a registry to
pick up the missing authority IDs, you'll also get ones you
already got. The harvester then has to decide which to take.
o if we don't have OAI sets, the harvester has to sort out "managed"
from "owned" after pulling the records over.
o it requires interactive coordination between registry
administrators to set up the hierarchy which is simply not
necessary.
o it is prone to error as the "managed" and "owned" records must
continually be kept up to date as new authority IDs are created.
(And I'm concerned about what synchronization latency will do.)
Clearly using sets is much simpler than this.
In contrast, consider these 2 rules:
1) if a full registry is harvesting from a registry in its own
VO project, it does so with set=ivo_managed.
2) if a full registry is harvesting from a full registry from another VO
project, it does so with set=ivo_voproject.
How does the full registry know which harvestees are in its project?
The <voProject> value in the harvestee's Registry record says so. How
does the full registry know which harvestees can provide records for
its whole project? The harvestee's metadata says it supports the
ivo_voproject set.
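As a rough sketch of the two rules, the harvester's decision reduces to something like the following. The function names, the project names, and the example base URL are hypothetical, and the "ivo_vor" metadataPrefix is an assumption; only the set names and the standard OAI-PMH ListRecords arguments (verb, metadataPrefix, set) come from the discussion.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode


def choose_set(my_project: str, harvestee_project: str,
               harvestee_sets: Iterable[str]) -> Optional[str]:
    """Pick the OAI set to request from a harvestee, or None to skip it."""
    # Rule 1: same VO project, so take what the harvestee manages itself.
    if harvestee_project == my_project:
        return "ivo_managed"
    # Rule 2: another project, and the harvestee advertises that it can
    # serve records for its whole project.
    if "ivo_voproject" in harvestee_sets:
        return "ivo_voproject"
    return None


def list_records_url(base_url: str, set_name: str,
                     metadata_prefix: str = "ivo_vor") -> str:
    """Build an OAI-PMH ListRecords request for the chosen set."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,  # assumed prefix
        "set": set_name,
    })
    return "%s?%s" % (base_url, query)


same = choose_set("NVO", "NVO", [])
other = choose_set("NVO", "AstroGrid", ["ivo_voproject"])
url = list_records_url("http://example.org/oai", same)
```

Note that nothing here looks at authority IDs at all; the harvester only needs the <voProject> value and the list of supported sets.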
Unless a harvestee screws up in its implementation, the harvester will not
get any duplicate records. Not only does the harvester not have to track
authority IDs, it doesn't really have to look at the contents of the
records at all, apart from what it takes to load them into the local DB.
<managedAuthority> and <ownedAuthority> are not needed. There's no
single "managing" registry; just "projects" that registries join. No
human coordination is required; a publishing registry administrator
just indicates which project it's to be associated with. Want to change
projects? Change the <voProject> value.
To see how our approaches are similar, note that your <managedAuthority>
is replaced with <voProject>. There will be fewer projects than
authorities, so this should be more "manageable".
cheers,
Ray