Question: harvesting managed vs. all resource records
Ray Plante
rplante at ncsa.uiuc.edu
Mon Apr 4 22:45:33 PDT 2005
On Mon, 4 Apr 2005, KevinBenson wrote:
> Now your last couple of statements about "some cases in the NVO" and how
> the "harvestFrom field changes on multiple hops" sort of concerns me, this
> is sounding like the Resource metadata may be different between our
> registries (hence our harvestFrom is different between our registries) this
> does not sound good. Or do I have that wrong?
The attributes for the vr:Resource complex type are special. They
are:
created
updated
status
The values are not provided by the resource publisher, but rather,
they are set by the registry that holds the record. In particular,
it is intended that any registry might update a record's status to
"inactive" based on its own tests of liveness.
In addition to these attributes, we have discussed adding an attribute
called verificationLevel to aid with registry curation. The value would
be assigned to a resource record by a registry to indicate quality of the
resource metadata (not the resource itself). Registries would set their
own standards for what earns the highest quality rating; thus, they would
feel free to override the value that might already be in there when the
value is harvested.
These attributes are the only place where values can differ across
registries (really, only status and verificationLevel). A harvestedFrom
attribute would be added to these attributes. The rest of the record--that is, the
information held in the Resource type's child elements--should NOT be
changed by anyone other than the original publisher.
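
To make the rule concrete, here is a rough sketch of how a harvesting
registry might ingest a record: it may reset status and verificationLevel,
but it refuses to touch anything else. This is only an illustration under
assumptions of my own. The ingest() function, the sample record, its
identifier, and the verificationLevel value "2" are all hypothetical, and
namespaces are left out for brevity.

```python
import copy
import xml.etree.ElementTree as ET

# Attributes a holding registry may set on its own copy (assumption drawn
# from the discussion: only status and verificationLevel may differ).
REGISTRY_SET = {"status", "verificationLevel"}

def ingest(harvested_xml, local_overrides):
    """Return a local copy of a harvested record, changing only registry-set attributes."""
    local = copy.deepcopy(ET.fromstring(harvested_xml))
    for name, value in local_overrides.items():
        if name not in REGISTRY_SET:
            raise ValueError(f"{name} may not be changed by a harvesting registry")
        local.set(name, value)
    return local

# Hypothetical record; element names and values are illustrative only.
record = """<Resource created="2005-01-10" updated="2005-03-01" status="active">
  <title>Example Catalog</title>
  <identifier>ivo://example.org/catalog</identifier>
</Resource>"""

# The harvesting registry applies its own liveness result and quality rating.
local = ingest(record, {"status": "inactive", "verificationLevel": "2"})
```

The child elements (title, identifier, and so on) come through unchanged,
which is the point: only the original publisher gets to edit those.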
> A record really should not have multiple hops, there should just be a brief
> amount of time where a record is not the same as the originating publishing
> registry because a harvest was not performed.
You mentioned an example earlier in which a harvester might get all
records from a single registry the first time it harvests but then just
get managed records on subsequent harvests. Presumably, this is because
after getting all records, it now knows about the other registries it can
harvest from directly. With that first call, many of the records it gets
back--i.e. the "non-managed" ones--will be making their 2nd hop, yes?
Currently, it is not possible for records to make multiple hops. Assuming
we implement the "ivo_managed" set, it will be possible but not necessarily
standard practice. If we think being able to harvest all records is
useful, then tracking harvesting source separately from record origin will
tighten up our record keeping.
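
As a sketch of what I mean by tracking the two separately (the Record
class, its field names, the harvest() function, and the registry names are
all made up for illustration, not anything in the current schema):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Record:
    identifier: str
    origin: str          # registry where the publisher registered the record
    harvested_from: str  # registry this copy was most recently pulled from

def harvest(source_name, source_records, managed_only):
    """Copy records from a source registry, stamping where each copy came from."""
    selected = [r for r in source_records
                if not managed_only or r.origin == source_name]
    # origin is never rewritten; only the harvesting source changes per hop.
    return [replace(r, harvested_from=source_name) for r in selected]

# Hypothetical registries A and B, each managing one record of its own.
reg_a = [Record("ivo://a.example/r1", origin="A", harvested_from="A")]
reg_b = [Record("ivo://b.example/r2", origin="B", harvested_from="B")]
reg_b += harvest("A", reg_a, managed_only=True)          # first hop for r1

# A third registry pulls from B: either everything, or managed records only.
all_from_b = harvest("B", reg_b, managed_only=False)     # r1 makes its 2nd hop
managed_from_b = harvest("B", reg_b, managed_only=True)  # only B's own r2
```

With the all-records harvest, the copy of r1 ends up with origin "A" but
harvested_from "B", so the second hop stays visible in the record keeping.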
cheers,
Ray