Question: harvesting managed vs. all resource records
Ray Plante
rplante at ncsa.uiuc.edu
Mon Apr 4 14:40:44 PDT 2005
Hey Kevin,
On Mon, 4 Apr 2005, KevinBenson wrote:
> As you say on your wiki page Ray, you can discover who the curator is by the
> Registry type of who is managing that authority id, so I am not quite sure
> what the "harvestFrom" gains you.
In principle, I admit the difference is probably subtle, but in practice,
it can make a noticeable difference. Here's what I think harvestFrom
gains you:
o You don't have to do an additional query to find out where the record
came from.
o You are protected against the possibility that the Registry record is
either not up to date (i.e. doesn't contain the authority ID) or is
otherwise inconsistent (e.g. corrupted, missing, etc.).
o You can trace records that make multiple harvesting stops. Note that
what is recorded in the Registry record is not exactly what
harvestFrom holds. The latter will be the registry that the
harvester got the record from. That registry may have gotten that
record from another registry (which would happen if the harvester
grabs all records, rather than just the managed ones).
We noticed some cases in the NVO in which the records exported by a
registry are not exactly what was originally published (and we're
talking about the resource metadata here). Tracking down a problem
like this would benefit from harvestFrom if the record actually makes
multiple hops from its originator.
I think the fact that two working registries felt compelled to record this
information internally suggests that it's a good idea.
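To illustrate the third point above, here is a rough sketch of how a record's path could be traced back through multiple harvesting stops. It assumes each registry stores, per record, the ID of the registry it harvested that record from (None if it published the record itself). The function name, the `holdings` structure, and the identifiers are all hypothetical, not part of any registry interface.

```python
def trace_hops(record_id, start_registry, holdings):
    """Follow harvestFrom-style links from start_registry back to the origin.

    holdings is a hypothetical lookup: registry ID -> {record ID -> ID of the
    registry the record was harvested from, or None if published locally}.
    Returns the list of registries visited, starting with start_registry.
    """
    path = [start_registry]
    seen = {start_registry}
    current = start_registry
    while True:
        source = holdings[current][record_id]
        if source is None:
            # This registry published the record itself: end of the trail.
            return path
        if source in seen:
            # Guard against inconsistent data that points back on itself.
            raise ValueError("harvesting loop detected at " + source)
        path.append(source)
        seen.add(source)
        current = source


# Made-up example: registry C harvested from B, which harvested from A.
holdings = {
    "ivo://a.example/registry": {"ivo://a.example/res": None},
    "ivo://b.example/registry": {"ivo://a.example/res": "ivo://a.example/registry"},
    "ivo://c.example/registry": {"ivo://a.example/res": "ivo://b.example/registry"},
}
hops = trace_hops("ivo://a.example/res", "ivo://c.example/registry", holdings)
```

Comparing the copy of the record held at each registry in `hops` would show at which stop the metadata changed.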
> Now we do need to talk about the notion
> again of <ownedAuthority> but that is later (this deals with full-full
> harvesting only so we don't keep harvesting every registry around).
Agreed. We should bring this up in a separate thread.
> xs:date to my knowledge is okay with time values and in fact astrogrid does
> it with a "time" with a "Z" ending and xerces seems to be okay with it. So
> I think date should be okay, we probably should make sure status and updated
> are required attributes; possibly created as well.
Technically, including time in an xs:date is not correct. Given your
practice, I'll put supporting dateTime on the list of proposed changes to
VOResource. It will be backward-compatible.
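As a rough illustration of the distinction, the sketch below classifies a string by its lexical form. The patterns are simplified (they do not range-check months, days, or hours) and the function name is made up for this example. A bare date, with or without a trailing "Z", matches xs:date; anything carrying a "T" and a time only matches xs:dateTime. Accepting either form is what keeps the change backward-compatible.

```python
import re

# Simplified lexical patterns; no range checks on month, day, or time fields.
_TZ = r"(Z|[+-]\d{2}:\d{2})?"
XS_DATE = re.compile(r"-?\d{4,}-\d{2}-\d{2}" + _TZ)
XS_DATETIME = re.compile(r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?" + _TZ)


def classify(value):
    """Return 'dateTime', 'date', or 'invalid' for the given string."""
    if XS_DATETIME.fullmatch(value):
        return "dateTime"
    if XS_DATE.fullmatch(value):
        return "date"
    return "invalid"


plain = classify("2005-04-04")               # existing records stay valid
with_time = classify("2005-04-04T14:40:44Z") # the practice Kevin describes
```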
> Also I am now coming around on OAI sets, originally I was not too keen on
> them, and thought you could just do everything with ListRecords, but I do
> see where using a set to get everything the first time could be very good
> and is probably not too hard to implement plus adding oai_managed set would
> be just as easy. I do think ListRecords needs to only be managed Resources
> each time though.
Could you clarify this last sentence? I think I hear you say that you're
okay with defining a standard set called "ivo_managed" to just get the
managed resources; is that right? This could be used as an
argument to ListRecords (as well as ListIdentifiers). If no set argument
were provided, all records would be returned. In practice then, IVOA
harvesters would usually provide set=ivo_managed as an argument to
ListRecords. Is this consistent with what you are thinking?
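For concreteness, here is a minimal sketch of the two requests as a harvester might build them. The base URL is a placeholder and the "ivo_vor" metadataPrefix is an assumption made for this example; "verb", "metadataPrefix", and "set" are standard OAI-PMH arguments, and "ivo_managed" is the set name proposed above.

```python
from urllib.parse import urlencode


def list_records_url(base_url, managed_only=True, metadata_prefix="ivo_vor"):
    """Build an OAI-PMH ListRecords request URL.

    base_url and the default metadata_prefix are assumptions for illustration.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if managed_only:
        # Restrict the harvest to records the registry itself manages.
        params["set"] = "ivo_managed"
    return base_url + "?" + urlencode(params)


BASE = "http://registry.example.org/oai"  # placeholder endpoint
managed_url = list_records_url(BASE)                  # usual harvester call
all_url = list_records_url(BASE, managed_only=False)  # no set: all records
```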
cheers,
Ray