Question: harvesting managed vs. all resource records

KevinBenson kmb at mssl.ucl.ac.uk
Tue Apr 5 01:48:02 PDT 2005


Wow, you definitely threw some new things at me that I did not know about.
Maybe I missed the thread, but I think we will have a nightmare if you're
saying these attributes may differ between our registries.  I feel
they should come from the originating registry.  Was there some other thread
saying these attributes are allowed to differ between
registries?  How does a resource ever go to "deleted" if everybody is keeping
a different status?  And now our "oai_full" set will be completely different
between registries (unless we keep 2 different copies of resources, yuck).
The verificationLevel you're sliding in there could also be quite a lot of
trouble if it changes between registries.  Could you also
give me an example of the "harvestFrom" attribute?  I could see some worth in
a last harvest date, but again not on the actual Resource record.

I feel that all changes should come from the originating (managed or owned)
Registry; otherwise things are going to get very jumbled.

A POSSIBLE SOLUTION OR COMPROMISE WITH STAMPING:
Some months ago Wil suggested the idea of stamping; it
could be something we all welcome back.  We decided at the time it might be
a little difficult for this first pass, but maybe now is the time to
resurrect it.  If my memory serves me correctly, changes are not made on a
particular Resource record (hence the Record stays the same).  Instead,
internally (or possibly even as a separate Resource type if we want to go
that way), a Registry may stamp other Resource records.

From our recent e-mail exchanges we might have something like this:
<stamprecord>
<stampedidentifier>CDS/Vizier/II/2A</stampedidentifier>
<stamp>
  <approvedby>JVO</approvedby>
  <itsverificationLevel>Good/5</itsverificationLevel>
  <liveliness>active|inactive</liveliness>
</stamp>
<harvestedFrom>Some registry</harvestedFrom>
<lastHarvestDate>Last Harvest Date</lastHarvestDate>
</stamprecord>

The above is just a first thought; we might want to dig up the original
stamping idea, since it was clearer.  In general, the way I
remember it, there would be one or two additional web service interface
methods such as getStampByIdentifier.  Anyway, we now have the notion of what
was approved by a registry, the verification level for that Record at that
particular Registry, plus the liveliness at that Registry that you wanted.
Finally, we can throw in some other statistics such as harvestedFrom,
lastHarvestDate, and others.  The original Resource record will remain
consistent throughout our Registries.
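
To make the idea concrete, here is a minimal sketch in Python of a registry
keeping its stamps beside, not inside, the harvested Resource records.  Only
the name getStampByIdentifier and the stamp fields come from the sketch above;
the Stamp and StampStore classes and the in-memory dictionary are assumptions
made purely for illustration, not an agreed interface.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Stamp:
    """One registry's local opinion of a resource record (hypothetical shape)."""
    stamped_identifier: str
    approved_by: str
    verification_level: str
    liveliness: str                      # "active" or "inactive"
    harvested_from: Optional[str] = None
    last_harvest_date: Optional[str] = None


class StampStore:
    """Keeps stamps separate so the Resource record itself is never edited."""

    def __init__(self) -> None:
        self._stamps: Dict[str, Stamp] = {}

    def stamp(self, stamp: Stamp) -> None:
        # Re-stamping replaces this registry's earlier stamp for the identifier.
        self._stamps[stamp.stamped_identifier] = stamp

    def get_stamp_by_identifier(self, identifier: str) -> Optional[Stamp]:
        # Returns None when this registry has never stamped the record.
        return self._stamps.get(identifier)


store = StampStore()
store.stamp(Stamp("CDS/Vizier/II/2A", approved_by="JVO",
                  verification_level="Good/5", liveliness="active"))
found = store.get_stamp_by_identifier("CDS/Vizier/II/2A")
missing = store.get_stamp_by_identifier("no/such/record")
```

Because the stamp lives in its own store, each registry can hold a different
stamp for the same identifier while the Resource record stays byte-for-byte
what the originating registry published.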

Cheers,
Kevin

-----Original Message-----
From: owner-registry at eso.org [mailto:owner-registry at eso.org]On Behalf Of
Ray Plante
Sent: 05 April 2005 06:46
To: registry at ivoa.net
Subject: RE: Question: harvesting managed vs. all resource records


On Mon, 4 Apr 2005, KevinBenson wrote:
>     Now your last couple of statements about "some cases in the NVO" and
> how the "harvestFrom field changes on multiple hops" sort of concerns me,
> this is sounding like the Resource metadata may be different between our
> registries (hence our harvestFrom is different between our registries)
> this does not sound good.  Or do I have that wrong?

The attributes for the vr:Resource complex type are special.  They
are:
   created
   updated
   status
The values are not provided by the resource publisher, but rather,
they are set by the registry that holds the record.  In particular,
it is intended that any registry might update a record's status to
"inactive" based on their own tests of liveliness.

In addition to these attributes, we have discussed adding an attribute
called verificationLevel to aid with registry curation.  The value would
be assigned to a resource record by a registry to indicate quality of the
resource metadata (not the resource itself).  Registries would set their
own standards for what earns the highest quality rating; thus, they would
feel free to override the value that might already be in there when the
value is harvested.

These attributes are the only place where values can differ across
registries (really, only status and verificationLevel).  A harvestedFrom
attribute could be added to these attributes.  The rest of the record--that
is, the
information held in the Resource type's child elements--should NOT be
changed by anyone other than the original publisher.
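
As an illustration of that rule, the following Python sketch checks whether
two registries' copies of a record differ only in the attributes that are
allowed to differ.  The element names, the identifier ivo://example/r1, and
the helper names are hypothetical, and the comparison assumes both copies are
serialized with identical whitespace and no namespace prefixes; it is a sketch
under those assumptions, not a validator for real VOResource records.

```python
import xml.etree.ElementTree as ET

# Attributes a holding registry may set to its own value (per the text above).
MAY_DIFFER = ("status", "verificationLevel")


def publisher_content(resource_xml: str) -> bytes:
    """Serialize a record with the registry-specific attributes removed."""
    root = ET.fromstring(resource_xml)
    for name in MAY_DIFFER:
        root.attrib.pop(name, None)
    return ET.tostring(root)


def same_publisher_content(a: str, b: str) -> bool:
    """True when two copies agree on everything the publisher controls."""
    return publisher_content(a) == publisher_content(b)


ours = ('<Resource created="2005-01-01" updated="2005-04-01" status="active">'
        '<identifier>ivo://example/r1</identifier><title>Example</title>'
        '</Resource>')
theirs = ('<Resource created="2005-01-01" updated="2005-04-01" status="inactive">'
          '<identifier>ivo://example/r1</identifier><title>Example</title>'
          '</Resource>')
edited = ('<Resource created="2005-01-01" updated="2005-04-01" status="active">'
          '<identifier>ivo://example/r1</identifier><title>Changed</title>'
          '</Resource>')

status_only_differs = same_publisher_content(ours, theirs)
child_element_differs = same_publisher_content(ours, edited)
```

Here the first comparison passes because only status differs, while the second
fails because a child element was changed by someone other than the publisher.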

> A record really should not have multiple hops, there should just be a
> brief amount of time where a record is not the same as the originating
> publishing registry because a harvest was not performed.

You mentioned an example earlier in which a harvester might get all
records from a single registry the first time it harvests but then just
get managed records on subsequent harvests.  Presumably, this is because
after getting all records, it now knows about the other registries it can
harvest from directly.  With that first call, many of the records it gets
back--i.e. the "non-managed" ones--will be making their 2nd hop, yes?

Currently, it is not possible for records to make multiple hops.  Assuming
we implement the "ivo_managed" set, it will be possible but not necessarily
standard practice.  If we think being able to harvest all records is
useful, then tracking harvesting source separately from record origin will
tighten up our record keeping.
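
A small Python sketch of what tracking the two separately might look like
follows.  The HarvestedRecord class, its field names, and the registry names
RegA and RegB are all hypothetical, chosen only to show the bookkeeping; they
are not part of any agreed schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class HarvestedRecord:
    identifier: str
    managed_by: str       # registry that publishes the record (its origin)
    harvested_from: str   # registry this copy was actually pulled from

    @property
    def made_extra_hop(self) -> bool:
        # The copy came via a registry other than the one that manages it.
        return self.managed_by != self.harvested_from


def registries_to_harvest_directly(records: Iterable[HarvestedRecord]) -> List[str]:
    """Origins learned about indirectly, to be harvested directly next time."""
    return sorted({r.managed_by for r in records if r.made_extra_hop})


# First harvest: everything is pulled from RegA, including RegB's records.
first_harvest = [
    HarvestedRecord("ivo://RegA/r1", managed_by="RegA", harvested_from="RegA"),
    HarvestedRecord("ivo://RegB/r7", managed_by="RegB", harvested_from="RegA"),
]
next_targets = registries_to_harvest_directly(first_harvest)
```

After the first harvest, next_targets names the registries whose managed
records arrived second-hand, which is the list a harvester would switch to
harvesting from directly.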

cheers,
Ray





