Question: harvesting managed vs. all resource records

KevinBenson kmb at mssl.ucl.ac.uk
Mon Apr 4 15:37:50 PDT 2005


Okay, thanks Ray.  I will reply working backwards.
1.) Yep, that is consistent with what I was thinking as well.
2.) With xs:date you are allowed an optional time, from what I can tell, but
a change to dateTime would probably be good. See
http://www.w3.org/TR/xmlschema-2/#date

3.) Hmmm, I do see a small advantage in not having to query for the
Registry-type record.
    I currently don't allow an authority ID to be entered unless the
Registry-type record lists it as a managedAuthority, but I know we currently
don't restrict that.
    Now, your last couple of statements about "some cases in the NVO" and how
the "harvestFrom field changes on multiple hops" concern me somewhat. It
sounds as if the Resource metadata may differ between our registries (hence
our harvestFrom values differ between our registries), which does not sound
good.  Or do I have that wrong?  (I will say that you might do some internal
things to get something to work and change a Resource that you don't manage,
but this should stay internal and quite temporary until the curator can fix
the entries, and it should be a very rare occurrence.)

A record really should not make multiple hops; there should only be a brief
period during which a record differs from the copy in the originating
publishing registry, because a harvest has not yet been performed.
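
The check described in point 3 (refusing an authority ID that the local Registry-type record does not list as a managedAuthority) can be sketched in Python. This is a minimal illustration under stated assumptions, not AstroGrid's actual code: the function names, the example identifiers, and holding the managedAuthority values in a plain Python set are all hypothetical.

```python
from urllib.parse import urlparse


def authority_of(identifier: str) -> str:
    """Return the authority part of an ivo:// identifier,
    e.g. "example.org" for "ivo://example.org/my/resource"."""
    parsed = urlparse(identifier)
    if parsed.scheme != "ivo" or not parsed.netloc:
        raise ValueError(f"not an ivo:// identifier: {identifier!r}")
    return parsed.netloc


def accept_new_record(identifier: str, managed_authorities: set) -> bool:
    """Hypothetical policy: accept a new record only if its authority ID
    is listed as a managedAuthority in this registry's own Registry record."""
    return authority_of(identifier) in managed_authorities


# Hypothetical managedAuthority values, for illustration only
MANAGED = {"example.org"}
```

With this policy, `accept_new_record("ivo://other.org/my/resource", MANAGED)` returns False, because other.org is not among the authorities this registry manages.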

Yes, I will try to start an "ownedAuthority" thread tomorrow morning.  This was
something we agreed on back at Harvard but did not get into the 0.10
schema, so I will kick it off again tomorrow.

cheers,
Kevin

-----Original Message-----
From: owner-registry at eso.org [mailto:owner-registry at eso.org] On Behalf Of Ray Plante
Sent: 04 April 2005 22:41
To: KevinBenson
Cc: registry at ivoa.net
Subject: RE: Question: harvesting managed vs. all resource records


Hey Kevin,

On Mon, 4 Apr 2005, KevinBenson wrote:
> As you say on your wiki page Ray, you can discover who the curator is by the
> Registry type of who is managing that authority id, so I am not quite sure
> what the "harvestFrom" gains you.

In principle, I admit the difference is probably subtle, but in practice,
it can make a noticeable difference.  Here's what I think harvestFrom
gains you:

  o  You don't have to do an additional query to find out where the record
     came from.

  o  You are protected against the possibility that the Registry record is
     either not up to date (i.e. doesn't contain the authority ID) or is
     otherwise inconsistent (e.g. corrupted, missing, etc.).

  o  You can trace records that make multiple harvesting stops.  Note that
     what is recorded in the Registry record is not exactly what
     harvestFrom holds.  The latter will be the registry that the
     harvester got the record from.  That registry may have gotten that
     record from another registry (which would happen if the harvester
     grabs all records, rather than just the managed ones).

     We noticed some cases in the NVO in which the records exported by a
     registry were not exactly what was originally published (and we're
     talking about the resource metadata here).  Tracking down a problem
     like this would benefit from harvestFrom if the record actually makes
     multiple hops from its originator.
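
To illustrate the multi-hop case, here is a minimal Python sketch that follows harvestFrom links back toward the publishing registry. It assumes, hypothetically, that each registry's stored harvestFrom value for one particular record has been collected into a dictionary keyed by registry URL; the URLs and the function name are invented for the example.

```python
from typing import Dict, List, Optional


def trace_hops(start: str, harvested_from: Dict[str, Optional[str]]) -> List[str]:
    """Follow harvestFrom links from `start` toward the originating registry.

    `harvested_from` maps a registry URL to the harvestFrom value it holds
    for one record; None marks the registry where the record was published.
    A registry missing from the mapping is treated as the end of the known chain.
    """
    path = [start]
    seen = {start}
    current = start
    while True:
        source = harvested_from.get(current)
        if source is None:
            return path  # originator (or end of what we know) reached
        if source in seen:
            raise ValueError(f"harvesting loop detected at {source}")
        path.append(source)
        seen.add(source)
        current = source


# Hypothetical chain: C harvested from B, which harvested from A (the publisher)
CHAIN = {
    "http://regC.example.org/oai": "http://regB.example.org/oai",
    "http://regB.example.org/oai": "http://regA.example.org/oai",
    "http://regA.example.org/oai": None,
}
```

Comparing the record's metadata at each registry on the returned path would then show at which hop it stopped matching what was originally published.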

I think the fact that two working registries felt compelled to record this
information internally suggests that it's a good idea.

> Now we do need to talk about the notion
> again of <ownedAuthority> but that is later (this deals with full-full
> harvesting only so we don't keep harvesting every registry around).

Agreed.  We should bring this up in a separate thread.

> xs:date to my knowledge is okay with time values and in fact astrogrid does
> it with a "time" with a "Z" ending and xerces seems to be okay with it.  So
> I think date should be okay, we probably should make sure status and updated
> are required attributes; possibly created as well.

Technically, including time in an xs:date is not correct.  Given your
practice, I'll put supporting dateTime on the list of proposed changes to
VOResource.  It will be backward-compatible.
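
For reference, XML Schema Part 2 defines the lexical form of xs:date as a calendar date with an optional time zone suffix (such as "Z" or "+01:00"), while a time of day requires xs:dateTime. The Python sketch below checks only those lexical shapes with simplified regular expressions; it does not reject out-of-range values such as month 13, and the function names are hypothetical.

```python
import re

# Optional time zone: "Z" or +hh:mm / -hh:mm
_TZ = r"(?:Z|[+-]\d{2}:\d{2})?"
# xs:date: [-]yyyy-mm-dd plus optional time zone, no time of day
_XS_DATE = re.compile(r"-?\d{4,}-\d{2}-\d{2}" + _TZ)
# xs:dateTime: date, "T", hh:mm:ss with optional fraction, optional time zone
_XS_DATETIME = re.compile(r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?" + _TZ)


def is_xs_date(value: str) -> bool:
    """True if `value` has the lexical shape of an xs:date."""
    return _XS_DATE.fullmatch(value) is not None


def is_xs_datetime(value: str) -> bool:
    """True if `value` has the lexical shape of an xs:dateTime."""
    return _XS_DATETIME.fullmatch(value) is not None
```

Under these rules "2005-04-04Z" is a valid xs:date, whereas "2005-04-04T15:37:50Z" is valid only as an xs:dateTime, which is why supporting dateTime matters for values that carry a time of day.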

> Also I am now coming around on OAI sets.  Originally I was not too keen on
> them, and thought you could just do everything with ListRecords, but I do
> see where using a set to get everything the first time could be very good
> and is probably not too hard to implement, plus adding an oai_managed set would
> be just as easy.  I do think ListRecords need to only be managed Resources
> each time though.

Could you clarify this last sentence?  I think I hear you say that you're
okay with defining a standard set called "ivo_managed" to just get the
managed resources; is that right?  This could be used as an
argument to ListRecords (as well as ListIdentifiers).  If no set argument
were provided, all records would be returned.  In practice then, IVOA
harvesters would usually provide set=ivo_managed as an argument to
ListRecords.  Is this consistent with what you are thinking?
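
This proposal maps onto the standard OAI-PMH request syntax, in which set is an optional argument to ListRecords and ListIdentifiers, and a resumptionToken (used to fetch later pages of a large response) is an exclusive argument. The Python sketch below builds such request URLs; the base URL and the metadataPrefix value "ivo_vor" are assumptions for illustration, and "ivo_managed" is only the set name proposed in this thread.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str,
                     set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # Later pages: resumptionToken is the only argument besides verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            # Restrict the harvest to one set, e.g. only managed records.
            params["set"] = set_spec
        # With no set argument, the repository returns all of its records.
    return base_url + "?" + urlencode(params)


BASE = "http://registry.example.org/oai"  # hypothetical endpoint
managed = list_records_url(BASE, "ivo_vor", set_spec="ivo_managed")
everything = list_records_url(BASE, "ivo_vor")
```

A routine IVOA harvest would request `managed`, while an initial full harvest would use `everything`.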

cheers,
Ray





