OAI as VO Harvesting Interface

Ray Plante rplante at poplar.ncsa.uiuc.edu
Fri Sep 12 08:07:09 PDT 2003


Hi Clive,

Thanks for the excellent examples.

On Fri, 12 Sep 2003, Clive Page wrote:
> I think there are likely to be several types of query, more or less
> corresponding to different levels of these structures:
> 
> (a) Can you find me a nearby mirror of Vizier?
> (b) Which catalogues does Vizier have giving radio polarisation?
> (c) Which XMM exposures when source X was in the field have an exposure
>     time over 10 kiloseconds?

I'm going to take a crack at what these queries might look like.
First, let's assume that the registry has enough detailed information 
about these resources to answer the question.  (c) in particular looks
like it would be out of scope of the registry; at the level of
locating exposures, one would go to a specific service at an XMM
archive.  Second, I'm going to assume some things about how the
resources are classified and what information is included; the actual
curators may have a better idea.  Finally, the queries are written in
pseudocode, not in any agreed query language.

The purpose of these examples is to show how we support the
hierarchical nature of resources in queries.

(a) Can you find me a nearby mirror of Vizier?

    (I need to assume some things about how we describe mirrors.)

    select [Resources] where Title~'Vizier' or MirrorOf/Title~'Vizier'
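
    As a rough illustration only, the query above could be evaluated
    over in-memory records as sketched below.  The record layout, the
    identifiers, and the reading of '~' as a case-insensitive substring
    match are all assumptions made for the example; "nearby" is not
    handled at all.

```python
# Hypothetical registry records; field names follow the pseudo-query above.
# "MirrorOf" holds the identifier of another record (a reference, not a copy).
RESOURCES = {
    "ivo://example/vizier": {"Title": "VizieR Catalogue Service"},
    "ivo://example/vizier-mirror": {
        "Title": "CDS catalogue mirror",
        "MirrorOf": "ivo://example/vizier",
    },
    "ivo://example/other": {"Title": "Some other archive"},
}


def matches(text, pattern):
    """Assumed meaning of '~': case-insensitive substring match."""
    return pattern.lower() in text.lower()


def find_with_mirrors(resources, pattern):
    """Evaluate: Title~pattern or MirrorOf/Title~pattern."""
    hits = []
    for ident, rec in resources.items():
        # Dereference MirrorOf; records without it yield an empty target.
        target = resources.get(rec.get("MirrorOf"), {})
        if matches(rec["Title"], pattern) or matches(target.get("Title", ""), pattern):
            hits.append(ident)
    return sorted(hits)


print(find_with_mirrors(RESOURCES, "vizier"))
```

    The mirror is found even though its own title never mentions
    Vizier, because the match is made through the MirrorOf reference.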

(b) Which catalogues does Vizier have giving radio polarisation?

    select [Catalogs] where Manager/Title~'Vizier' and 
      VOTableColumns/FIELD/@UCD=[some ucd for radio polarization]

    Note: Catalog would be a class of resource that is not yet
    defined.  This level of resource, and thus the query, might be too
    fine-grained for a registry, but it still fits within the model.

(c) Which XMM exposures when source X was in the field have an exposure
    time over 10 kiloseconds?

    select [Images] where Subject/Facility~'XMM' and
       Overlaps(Coverage/Spatial,[position of source X]) and 
       Coverage/Temporal/Exposure > 10000

    Note: Image is also a resource class yet to be defined; it, too,
    is probably too fine-grained for the registry.

(c) is interesting because it points out that Facility is part of the
description of an Image.  It would also be part of the description of
the XMM data collection as a whole.  Thus, because this model is not
inherently hierarchical (that is, each level can stand on its own,
independent of any other level), some information may get repeated.  I
think the advantage of connecting levels via references (e.g. as
MirrorOf does in (a) and Manager does in (b)) is that it does not
lock in a particular hierarchy where, say, Facility can only exist at
one level.
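
To make the idea of reference-linked levels concrete, here is a small
sketch of a path resolver that follows references such as Manager.  The
identifiers, the set of reference-valued fields, and the use of Manager
to link an image to its collection are assumptions for illustration;
nothing here is an agreed schema.

```python
# Fields assumed to hold the identifier of another record.
REFERENCE_FIELDS = {"MirrorOf", "Manager"}

# Facility is repeated at both levels, so each record stands on its own.
RECORDS = {
    "xmm-archive": {
        "Title": "XMM data collection",
        "Subject": {"Facility": "XMM"},
    },
    "xmm-image-1": {
        "Title": "Exposure 1",
        "Subject": {"Facility": "XMM"},
        "Manager": "xmm-archive",
    },
}


def resolve(records, ident, path):
    """Follow a path like 'Manager/Title', dereferencing reference fields."""
    value = records[ident]
    for step in path.split("/"):
        if not isinstance(value, dict) or step not in value:
            return None
        value = value[step]
        if step in REFERENCE_FIELDS:
            # Replace the identifier with the record it points to.
            value = records.get(value)
    return value


print(resolve(RECORDS, "xmm-image-1", "Subject/Facility"))
print(resolve(RECORDS, "xmm-image-1", "Manager/Title"))
```

The first path is answered from the image record alone; the second
crosses one reference to reach the collection level.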

Queries that span non-adjacent levels would be more complex but still
doable.  However, I think that if the levels are sufficiently
self-describing, such queries should be rare.

The more inherently hierarchical our resource model is, the harder it
is to map to an RDB model; we'll effectively restrict ourselves to an
XML database.  Given the ubiquity of RDBs in our community and the
relative immaturity of XML-based DBs, I'm not sure it's wise to force
ourselves into that corner unless we can show that we really need
it.
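
For comparison, here is a minimal sketch of how a reference-based model
might map onto an RDB, using query (a) as the example.  The table and
column names are hypothetical, and SQLite's LIKE stands in for '~'; the
point is only that a reference becomes a foreign key and the path
MirrorOf/Title becomes a self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource (
        id        TEXT PRIMARY KEY,
        title     TEXT NOT NULL,
        mirror_of TEXT REFERENCES resource(id)  -- a reference, not nesting
    );
    INSERT INTO resource VALUES
        ('vizier', 'VizieR', NULL),
        ('vizier-mirror', 'CDS mirror', 'vizier'),
        ('other', 'Some other archive', NULL);
""")

# Title~'Vizier' or MirrorOf/Title~'Vizier', expressed as a self-join.
rows = conn.execute("""
    SELECT r.id
    FROM resource AS r
    LEFT JOIN resource AS m ON m.id = r.mirror_of
    WHERE r.title LIKE '%vizier%' OR m.title LIKE '%vizier%'
    ORDER BY r.id
""").fetchall()
ids = [row[0] for row in rows]
print(ids)
```

Each additional level crossed in a query adds one more join, which is
consistent with multi-level queries being more complex but still doable.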

cheers,
Ray



