Question: harvesting managed vs. all resource records

Ray Plante rplante at ncsa.uiuc.edu
Mon Apr 4 22:45:33 PDT 2005


On Mon, 4 Apr 2005, KevinBenson wrote:
>     Now your last couple of statements about "some cases in the NVO" and how
> the "harvestFrom field changes on multiple hops" sort of concerns me, this
> is sounding like the Resource metadata may be different between our
> registries (hence our harvestFrom is different between our registries) this
> does not sound good.  Or do I have that wrong?   

The attributes for the vr:Resource complex type are special.  They
are:
   created
   updated
   status
The values are not provided by the resource publisher, but rather,
they are set by the registry that holds the record.  In particular,
it is intended that any registry might update a record's status to
"inactive" based on its own tests of liveness.  

In addition to these attributes, we have discussed adding an attribute 
called verificationLevel to aid with registry curation.  The value would 
be assigned to a resource record by a registry to indicate quality of the 
resource metadata (not the resource itself).  Registries would set their 
own standards for what earns the highest quality rating; thus, they would 
feel free to override the value that might already be in there when the 
value is harvested.  

These attributes are the only place where values can differ across 
registries (really, only status and verificationLevel).  A harvestedFrom 
attribute would be added to these attributes.  The rest of the 
record--that is, the information held in the Resource type's child 
elements--should NOT be changed by anyone other than the original 
publisher.  
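
To make the rule concrete, here is a minimal sketch in Python of how a 
harvesting registry might localize a record.  It is an illustration only, 
not part of any registry implementation: the dictionary layout ("attrs" 
for the Resource attributes, "content" for the child elements), the 
function name localize, and the identifier ivo://example/registry are all 
assumptions made for this example, and verificationLevel and harvestedFrom 
are still only proposals.

```python
# Attributes a harvesting registry may set on its own copy of a record.
# (verificationLevel and harvestedFrom are proposals, not part of the schema.)
REGISTRY_SET = ("status", "verificationLevel", "harvestedFrom")


def localize(harvested: dict, **overrides) -> dict:
    """Return a copy of a harvested record with registry-set attributes changed.

    Hypothetical layout: harvested["attrs"] holds the Resource attributes,
    harvested["content"] holds the child elements owned by the publisher.
    """
    illegal = set(overrides) - set(REGISTRY_SET)
    if illegal:
        # Anything else belongs to the original publisher.
        raise ValueError(f"only the publisher may change: {sorted(illegal)}")
    local = dict(harvested)
    local["attrs"] = {**harvested["attrs"], **overrides}
    return local


record = {
    "attrs": {"created": "2005-01-01", "updated": "2005-03-01", "status": "active"},
    "content": {"title": "Example resource"},
}
mine = localize(record, status="inactive", harvestedFrom="ivo://example/registry")
```

The copy in mine differs from record only in its attributes; an attempt 
such as localize(record, title="new title") is rejected.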

> A record really should not have multiple hops, there should just be a brief
> amount of time where a record is not the same as the originating publishing
> registry because a harvest was not performed.

You mentioned an example earlier in which a harvester might get all 
records from a single registry the first time it harvests but then just 
get managed records on subsequent harvests.  Presumably, this is because 
after getting all records, it now knows about the other registries it can 
harvest from directly.  With that first call, many of the records it gets 
back--i.e. the "non-managed" ones--will be making their 2nd hop, yes?  

Currently, it is not possible for records to make multiple hops.  Assuming 
we implement the "ivo_managed" set, it will be possible but not necessarily 
standard practice.  If we think being able to harvest all records is 
useful, then tracking harvesting source separately from record origin will 
tighten up our record keeping.  
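
As a rough sketch of what that record keeping could look like, the Python 
below tags each harvested record with the registry it was pulled from and 
separates records that came straight from their managing registry from 
those that were relayed (i.e. making a 2nd hop).  The "origin" key, the 
function name split_by_hop, and the ivo://example/... identifiers are 
assumptions for illustration; how the managing registry is actually 
identified in a record is not settled here.

```python
def split_by_hop(records, source_registry):
    """Partition records harvested from source_registry into direct and relayed.

    Each record is a dict with a hypothetical "origin" key naming the
    registry that manages it.  Every returned copy gains a "harvestedFrom"
    key naming the registry it was pulled from.
    """
    direct, relayed = [], []
    for rec in records:
        tagged = {**rec, "harvestedFrom": source_registry}
        if rec["origin"] == source_registry:
            direct.append(tagged)    # first hop: managed by the source
        else:
            relayed.append(tagged)   # second hop: source harvested it elsewhere
    return direct, relayed


harvest = [
    {"id": "ivo://example/a", "origin": "ivo://example/reg1"},
    {"id": "ivo://example/b", "origin": "ivo://example/reg2"},
]
direct, relayed = split_by_hop(harvest, "ivo://example/reg1")
```

Here direct holds the record managed by reg1 and relayed holds the one 
reg1 had itself harvested from reg2; the origins of the relayed records 
are the registries a harvester could go to directly next time.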

cheers,
Ray




