OAI as VO Harvesting Interface

Ray Plante rplante at poplar.ncsa.uiuc.edu
Wed Sep 10 13:27:30 PDT 2003


Hi Clive,

On Wed, 10 Sep 2003, Clive Page wrote:
> The other blind spot is that it assumes a flat
> namespace for each resource, which is especially odd when they have
> adopted XML, a hierarchical structure if ever there was one, for their
> output.  I can hardly think of even one useful astronomical resource which
> doesn't have at least a couple of levels of structure, and as I pointed
> out yesterday some have ~4 levels.  I'd have thought this was pretty
> generally true for scientific data resources.  

Could you clarify, perhaps with an example, the nature of the hierarchical 
structure that needs to be captured?  (There are different possible types 
of hierarchies, and I want to make sure we're thinking about the same 
one.)  

I think the way to look at this is from a query perspective.  A query to a 
registry--be it a complex search or just a simple harvesting request--is 
going to return a bunch of matched things.  The question is, what should 
those things be?  The current model under discussion is that these things 
are resources, each having an (XML) description and a unique identifier.  

Now we may have resources within resources.  Say, a data center has a 
bunch of data collections, and a data collection contains a bunch of 
images.  If we do a search that matches at one of these levels, what should 
we return?  The topmost level?  How much of the lower-level 
descriptions should be included?

In the current model, we've been talking about describing hierarchies as a 
type of relationship (see Tony's discussion); a resource description can 
refer to another via its identifier.  This makes the above questions 
straightforward:  we return those "levels" that match the query.  If 
information is desired about another level, one can access it via its 
identifier.  (Whether one has to go back to the registry for those other 
levels is a detail of the registry interface.)  Note that I would expect 
most queries to include a constraint on the type of resource desired; 
thus, if you're looking specifically for data collections, you won't 
also get back the data center (i.e. organisation) that curates them.  
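
To make the referencing idea concrete, here is a rough sketch in Python.  
Everything in it is made up for illustration and is not part of any agreed 
schema or registry interface: the Resource fields, the "parent" relationship 
name, the search() and resolve() helpers, and the ivo://example/... 
identifiers are all hypothetical.  Each level is a separate record that 
points to its container by identifier, and a query returns only the records 
that match, optionally constrained by type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    identifier: str                # unique identifier (hypothetical form)
    rtype: str                     # kind of resource, e.g. "DataCollection"
    title: str
    parent: Optional[str] = None   # identifier of the containing resource


# A toy registry: three levels, each stored as its own flat record.
REGISTRY = {r.identifier: r for r in [
    Resource("ivo://example/center", "Organisation", "Example Data Center"),
    Resource("ivo://example/survey", "DataCollection", "Example Sky Survey",
             parent="ivo://example/center"),
    Resource("ivo://example/survey/img1", "Image", "Example Survey image 1",
             parent="ivo://example/survey"),
]}


def search(keyword, rtype=None):
    """Return only the records that match, optionally limited to one type."""
    kw = keyword.lower()
    return [r for r in REGISTRY.values()
            if kw in r.title.lower() and (rtype is None or r.rtype == rtype)]


def resolve(identifier):
    """Fetch another level by following its identifier."""
    return REGISTRY.get(identifier)


# All three titles contain "example", but the type constraint keeps only
# the data collection; the curating organisation is one lookup away.
hits = search("example", rtype="DataCollection")
curator = resolve(hits[0].parent)
```

Here hits contains just the survey record, and curator is the data center 
record, fetched only because the caller asked for it.  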

This model simplifies harvesting:  registries are simply exchanging 
descriptions of resources.  Each registry chooses which kinds of resources 
it wants, and thereby selects the level of detail.  (The model also makes 
it easier to "flatten" the metadata into an RDB table.)  
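
As a sketch of what that could look like (again in Python, and again with 
assumed names throughout: the record fields, the harvest() and store() 
helpers, and the table layout are hypothetical, and this is not the OAI 
protocol itself), a harvesting registry filters the offered records by type 
and stores each one as a single row, with the hierarchy kept as a "parent" 
column holding an identifier:

```python
import sqlite3

# Records as a publishing registry might offer them (illustrative data only).
RECORDS = [
    {"identifier": "ivo://example/center", "type": "Organisation",
     "title": "Example Data Center", "parent": None},
    {"identifier": "ivo://example/survey", "type": "DataCollection",
     "title": "Example Sky Survey", "parent": "ivo://example/center"},
    {"identifier": "ivo://example/survey/img1", "type": "Image",
     "title": "Example Survey image 1", "parent": "ivo://example/survey"},
]


def harvest(records, wanted_types):
    """Keep only the kinds of resource this registry cares about."""
    return [r for r in records if r["type"] in wanted_types]


def store(conn, records):
    """Flatten the records into one table; hierarchy is just a column."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resource ("
        "identifier TEXT PRIMARY KEY, type TEXT, title TEXT, parent TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO resource "
        "VALUES (:identifier, :type, :title, :parent)",
        records,
    )
    conn.commit()


# This registry wants organisations and collections, not individual images.
conn = sqlite3.connect(":memory:")
store(conn, harvest(RECORDS, {"Organisation", "DataCollection"}))
rows = conn.execute(
    "SELECT identifier, parent FROM resource ORDER BY identifier"
).fetchall()
```

The image record is never stored, so the level of detail follows directly 
from the choice of types; the parent column still lets one walk from the 
collection up to its data center with an ordinary self-join.  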

So the question is, is the referencing scheme sufficient for capturing 
hierarchies?  Is anything more needed to support harvesting?

cheers,
Ray






