ownedAuthority element

KevinBenson kmb at mssl.ucl.ac.uk
Thu Apr 7 03:38:25 PDT 2005


Hope this e-mail eventually makes it; I've been having network problems.

(Read below for a possible deeper issue; I would like to know how NVO
(Carnivore, STScI) handles it.)
(I need to think about your voProject idea; it seems like it might be a good
one.)

I thought I would try a scenario to explain this better.

Following scenario:
Carnivore in its registry type:
	<ownedAuthority>carnivore1</ownedAuthority><managedAuthority>carnivore1</managedAuthority>
	<ownedAuthority>carnivore2</ownedAuthority><managedAuthority>carnivore2</managedAuthority>
	<managedAuthority>heasarc</managedAuthority>
HEASARC registry type:
	<ownedAuthority>heasarc</ownedAuthority>

-When a harvester harvests Carnivore, it will receive metadata for
"carnivore1", "carnivore2", and "heasarc".
-When a harvester harvests HEASARC, it will receive metadata for "heasarc".

-Another registry, such as Astrogrid, would prefer to harvest the full
registries first before going to the publishing registries.  If it sees the
heasarc authority ID coming from Carnivore, it looks up the registry that
owns that authority ID and realizes it can skip harvesting HEASARC.

-When Carnivore does its own harvesting of other registries, it checks its
own Registry type for the authority IDs it manages, and looks for a Registry
type with a matching <ownedAuthority>.  It then harvests that registry, in
this case "heasarc", and continues on to its other harvests.
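A minimal sketch of the two checks in this scenario, assuming each Registry record has been reduced to a hypothetical dict with a `name` and sets of `owned` and `managed` authority IDs. The function names are illustrative only, not taken from Astrogrid, Carnivore or any other registry.

```python
def registries_to_skip(full_registries, candidates):
    """Names of candidate registries a harvester may skip because every
    authority ID they own is already managed by one of the full registries
    it harvests (the Astrogrid case above)."""
    # Union of all <managedAuthority> values across the full registries.
    covered = set().union(*(r["managed"] for r in full_registries))
    return sorted(r["name"] for r in candidates
                  if r["owned"] and r["owned"] <= covered)


def harvest_targets(me, registry_records):
    """Names of registries the full registry `me` must harvest itself:
    those owning an authority ID that `me` manages but does not own
    (the Carnivore case above)."""
    foreign = me["managed"] - me["owned"]
    return sorted(r["name"] for r in registry_records
                  if r["name"] != me["name"] and r["owned"] & foreign)


carnivore = {"name": "carnivore", "owned": {"carnivore1", "carnivore2"},
             "managed": {"carnivore1", "carnivore2", "heasarc"}}
heasarc = {"name": "heasarc", "owned": {"heasarc"}, "managed": set()}

print(registries_to_skip([carnivore], [heasarc]))        # ['heasarc']
print(harvest_targets(carnivore, [carnivore, heasarc]))  # ['heasarc']
```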

The sets could stay the same (we might consider a name change if we still
wanted to go this route), but the "ivo_managed" set would return heasarc for
HEASARC, and carnivore1, carnivore2, and heasarc for Carnivore.

*The only big problem I can see with the above is if HEASARC owned several
authority IDs and did not let Carnivore manage all of them.  But surely you
would not want to spread your managed authority IDs around between
different full registries.

*Matthew's last e-mail might have a good point about relational DBs and
extensions, although I would think you would normally pick a full registry
you know will work.  I am tempted to say we might want to move extensions
out onto their own OAI set, since they are non-IVOA.

If we think this might be too much trouble for this round, then we could
drop it for now and go with the flat model a little longer.  Of course,
another type of solution might become available during Kyoto.  Read below
about tracking authority IDs; I think you need to do this in full
registries.

----
Deeper issue (we might want to put this on another thread, or you might
want to e-mail me separately):
You mentioned keeping track of managed authority IDs, and it sounded like
your registry does not keep track of them.
I must say I think this is something that needs to happen.  Do you do that
in your registries?  Because if you don't, you will start getting a major
problem with registries creating and using the same authority IDs (and
possibly the same identifiers).  Already once or twice I have caught a
conflict in our Astrogrid registries and did the transfer.

* In fact, I might need to check my logs; I've been busy with demos.  I
think I am seeing a conflict with authority IDs on NVO, but I need to
recheck; it might be nothing.

Let me explain a few of the rules Astrogrid currently uses:
*First comment: We use a HashMap that contains some objects.  It is not very
big, but it contains each managedAuthority ID and its owner (which is the
authority ID in the <identifier> element of the Registry type).

* All updates to the registry are checked to make sure they use an
authority ID managed by that registry.
* All Registry types must have at least one managedAuthority, and one of
them must be the authority ID in the <identifier>.
* When a full registry harvests another registry and receives a Registry
type, it checks that record's managedAuthority elements; if there is a
conflict, it stops harvesting that registry until the conflict is resolved.
(We need to do this or we will run into problems.)
For example: the Astrogrid registry harvests HEASARC, then harvests NCSA.
NCSA has a new Registry type with a managedAuthority ID that is the same as
one of HEASARC's, so all harvesting of NCSA stops until the conflict is
resolved.

* On updates, if the record is an Authority type and no other registry has
its ID as a managed authority ID, the registry adds it as a
managedAuthority to its own Registry type.
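The update rules above could look roughly like this.  This is a sketch, not
Astrogrid's code: `owners` is a hypothetical dict standing in for the
managed-authority HashMap (authority ID -> owning registry's authority ID),
and the identifier parsing assumes the ivo://<authority>/<resource-key> form.

```python
def authority_of(identifier):
    """Authority ID of an identifier of the form ivo://<authority>/<key>."""
    prefix = "ivo://"
    if not identifier.startswith(prefix):
        raise ValueError(f"not an IVOA identifier: {identifier!r}")
    return identifier[len(prefix):].split("/", 1)[0]


def valid_registry_record(identifier, managed):
    """A Registry record needs at least one managedAuthority, and its own
    authority ID must be among them."""
    return bool(managed) and authority_of(identifier) in managed


def accept_update(resource_type, identifier, my_authority, owners):
    """Return True if this registry accepts the update, False otherwise.

    owners is updated in place when a new Authority record is accepted.
    """
    auth = authority_of(identifier)
    # A new Authority record claims its ID for this registry, provided no
    # other registry already manages it.
    if resource_type == "Authority" and auth not in owners:
        owners[auth] = my_authority
    # Every update must fall under an authority ID this registry manages.
    return owners.get(auth) == my_authority
```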

-The only check I do not do on a harvest is for other Resource metadata
types; I just update them blindly.  I am assuming registries will return
only the metadata they manage, which I know is sometimes not the case (even
though it should be).  This means that if HEASARC were to return NCSA or
CDS data, it would be updated into the registry blindly.
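A minimal sketch of this harvest-side behaviour: Registry records are
checked for managedAuthority clashes (as in the NCSA/HEASARC example), and
every other resource type is stored without any check.  The tuple layout,
the `owners` and `store` dicts, and the `AuthorityConflict` exception are
all hypothetical.

```python
class AuthorityConflict(Exception):
    """A harvested Registry record claims an authority ID that another
    registry already manages."""


def ingest_harvest(records, owners, store):
    """Apply one harvest batch from a single source registry.

    records: (resource_type, identifier, managed_ids) tuples in harvest
             order; managed_ids only matters for "Registry" records.
    owners:  managed authority ID -> owning registry's authority ID,
             updated in place.
    store:   identifier -> resource_type, standing in for the local DB.
    Raises AuthorityConflict at the first clash, so the caller can stop
    harvesting this source until the conflict is resolved.
    """
    for resource_type, identifier, managed_ids in records:
        if resource_type == "Registry":
            # Owner is the authority ID in the Registry's own identifier.
            reg_auth = identifier[len("ivo://"):].split("/", 1)[0]
            clashes = sorted(a for a in managed_ids
                             if owners.get(a, reg_auth) != reg_auth)
            if clashes:
                raise AuthorityConflict(f"{identifier} claims {clashes}")
            for a in managed_ids:
                owners[a] = reg_auth
        # Other resource types are not checked: they are updated blindly.
        store[identifier] = resource_type
```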

In case you are curious, here is an example of the HashMap key=value entries:
Key = Object with Managed Authority id and a Version Number,
Value = Managed Authority ID, Version Number, and Owner
{uk.portsmouth},{0.10}={uk.portsmouth},{0.10},{uk.ac.le.star(owner)}
{uk.ac.le.star},{0.10}={uk.ac.le.star},{0.10},{uk.ac.le.star(owner)}
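Astrogrid's map is a Java HashMap; a rough Python analogue of the two
entries above might look like this.  Reading "0.10" as a version string
(for instance a schema version) is an assumption, as are the type and
function names.

```python
from typing import NamedTuple, Optional


class AuthKey(NamedTuple):
    authority_id: str
    version: str


class AuthEntry(NamedTuple):
    authority_id: str
    version: str
    owner: str  # authority ID in the <identifier> of the owning Registry


managed_authorities = {
    AuthKey("uk.portsmouth", "0.10"):
        AuthEntry("uk.portsmouth", "0.10", "uk.ac.le.star"),
    AuthKey("uk.ac.le.star", "0.10"):
        AuthEntry("uk.ac.le.star", "0.10", "uk.ac.le.star"),
}


def owner_of(authority_id: str, version: str) -> Optional[str]:
    """Owning registry's authority ID, or None if the ID is not managed."""
    entry = managed_authorities.get(AuthKey(authority_id, version))
    return entry.owner if entry is not None else None
```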


(I would like to add Full or Publishing to this HashMap; unfortunately, we
don't currently distinguish those in our Registry types.)

cheers,
Kevin


-----Original Message-----
From: owner-registry at eso.org [mailto:owner-registry at eso.org]On Behalf Of
Ray Plante
Sent: 06 April 2005 18:43
To: Registry List
Subject: RE: ownedAuthority element


Hi Kevin,

On Wed, 6 Apr 2005, KevinBenson wrote:
> In general since we have several other areas with curation and things
> going on I would like to see it kept real simple and effectively have
> these 3 rules.
>
> 1.) harvestee when returning records in OAI returns records for authority
> ids managed or owned (from their Registry).

I take it from your concluding comment that you are indeed distinguishing
between managed and owned.  If so, how does a harvestee know whether it
should be returning "managed" records or "owned" records?  Are you
assuming the use of sets or not?

> 2.) harvester *may* desire to skip harvesting certain registries if
> discovered an authority id has already been received from another registry
> that has that particular authority id as being "managed".
> 3.) If harvester is a "Full Registry" then at an unknown time (presumably
> when it kicks off its harvests), it MUST harvest any registry type that it
> has been put as a <managedAuthority> and being managed by its own
> registry.

Okay, I'm not understanding what 3 is saying.

Here's how I think 3 is supposed to work based on how <managedAuthority>
and <ownedAuthority> are (supposed to be) defined.  Assume a full registry
has been seeded somehow, e.g. it's copied all records from another full
registry.  It looks at each Registry record in its holdings.  If it sees
that it has <managedAuthority> elements, it will add that registry to
the list of registries it will harvest from.  It will also have to note in
its list that it will need to harvest "managed" records as well as which
authority IDs are listed as managed.

Next, the full registry looks through the Registry records again, this
time looking at the <ownedAuthority> elements.  If an authorityID was
not listed as a <managedAuthority> in one of the registries in the
list to harvest from, the registry described by the record is added to
the list of registries to harvest.  It also notes that it will have to
harvest only "owned" records from this registry.
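The procedure just described amounts to a two-pass plan over the Registry
records.  A sketch, assuming the same kind of hypothetical record dicts
(`name` plus `owned` and `managed` sets of authority IDs):

```python
def build_harvest_plan(registry_records):
    """Return {registry name: (mode, authority IDs)}, mode "managed"/"owned"."""
    plan = {}
    managed_somewhere = set()
    # Pass 1: every registry with <managedAuthority> elements is harvested
    # for its "managed" records.
    for r in registry_records:
        if r["managed"]:
            plan[r["name"]] = ("managed", set(r["managed"]))
            managed_somewhere |= r["managed"]
    # Pass 2: a registry owning an authority ID that nobody in the plan
    # manages is harvested for its "owned" records.
    for r in registry_records:
        uncovered = r["owned"] - managed_somewhere
        if uncovered and r["name"] not in plan:
            # Ask for all of its "owned" records, not just the uncovered ones.
            plan[r["name"]] = ("owned", set(r["owned"]))
    return plan
```

Because pass 2 asks for all of a registry's owned records, a registry whose
IDs are only partly managed elsewhere yields some records twice, which is
the duplicate case raised in the list that follows.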

At harvest time, it cycles through its harvestee list.  When it
harvests, it has to tell the harvestee (somehow) that it wants
*either* only "managed" records or "owned" records.  If we don't have
this capability (e.g. two OAI sets), then the harvester will have to
sort them out by looking at the identifiers, classifying them by their
authority IDs in order to determine what to take and what to throw
away.
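Without sets, the sorting step might look like this sketch; `wanted` is a
hypothetical set of authority IDs the harvester chose to take from this
harvestee, and identifiers are assumed to have the
ivo://<authority>/<resource-key> form.

```python
def authority_of(identifier):
    """Authority ID of an identifier of the form ivo://<authority>/<key>."""
    prefix = "ivo://"
    if not identifier.startswith(prefix):
        raise ValueError(f"not an IVOA identifier: {identifier!r}")
    return identifier[len(prefix):].split("/", 1)[0]


def split_by_authority(identifiers, wanted):
    """Partition harvested identifiers into (kept, thrown_away) lists."""
    kept, thrown_away = [], []
    for ident in identifiers:
        target = kept if authority_of(ident) in wanted else thrown_away
        target.append(ident)
    return kept, thrown_away
```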

Have I got this right?

If so, I don't think it's so simple because:
  o  the harvesting registry has to keep track of all these authority
       IDs to figure out who and how to harvest.  (This, of course,
       assumes that harvesting is done from all harvestees together at
       once, whatever that means.)
  o  it does not guarantee that you won't get duplicates.  What if
       only *some* of a registry's "owned" records are listed as
       "managed" in another Registry record?  If you add a registry to
       pick up the missing authority IDs, you'll also get ones you
       already got.  The harvester then has to decide which to take.
  o  if we don't have OAI sets, the harvester has to sort out "managed"
       from "owned" after pulling the records over.
  o  it requires interactive coordination between registry
       administrators to set up the hierarchy which is simply not
       necessary.
  o  it is prone to error as the "managed" and "owned" records must
       continually be kept up to date as new authority IDs are created.
       (And I'm concerned about what synchronization latency will do.)

Clearly using sets is much simpler than this.

In contrast, consider these 2 rules:
  1) if a full registry is harvesting from a registry in its own
     VO project, it does so with set=ivo_managed.
  2) if a full registry is harvesting from a full registry from another VO
     project, it does so with set=ivo_voproject.

How does the full registry know which harvestees are in its project?
The <voProject> value in the harvestee's Registry record says so.  How
does the full registry know which harvestees can provide records for
its whole project?  The harvestee's metadata says it supports the
ivo_voproject set.
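The two rules reduce to a small decision function.  In this sketch the
harvestee dict (its `voProject`, a hypothetical `full` flag, and the set
names it advertises, e.g. via OAI-PMH ListSets) is assumed, and the
metadataPrefix value is a placeholder since none is named here.

```python
from urllib.parse import urlencode


def choose_set(my_project, harvestee):
    """Pick the OAI set to request from a harvestee under the two rules."""
    if harvestee["voProject"] == my_project:
        return "ivo_managed"            # rule 1: same VO project
    if harvestee["full"] and "ivo_voproject" in harvestee["sets"]:
        return "ivo_voproject"          # rule 2: full registry, other project
    return None                         # neither rule applies


def list_records_url(base_url, oai_set, metadata_prefix):
    """Build an OAI-PMH ListRecords request restricted to one set."""
    query = urlencode({"verb": "ListRecords",
                       "metadataPrefix": metadata_prefix,
                       "set": oai_set})
    return f"{base_url}?{query}"
```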

Unless a harvestee screws up in its implementation, the harvester will not
get any duplicate records.  Not only does the harvester not have to track
authority IDs, it doesn't really have to look at the contents of the
records at all, apart from what it takes to load them into the local DB.
<managedAuthority> and <ownedAuthority> are not needed.  There's no
single "managing" registry; just "projects" that registries join.  No
human coordination is required; a publishing registry administrator
just indicates which project it's to be associated with.  Want to change
projects?  Change the <voProject> value.

To see how our approaches are similar, your <ManagedAuthority> is
replaced with <voProject>.  There will be fewer projects than
authorities, so this should be more "manageable".

cheers,
Ray