OAI as VO Harvesting Interface

Clive Page cgp at star.le.ac.uk
Fri Sep 12 01:06:31 PDT 2003


On Wed, 10 Sep 2003, Ray Plante wrote:

> Could you clarify, perhaps with an example, the nature of the hierarchical
> structure that needs to be captured?  (There are different possible types
> of hierarchies, and I want to make sure we're thinking about the same
> one.)

I can think of two examples (I thought I'd already posted one; if these
aren't clear, let me know):

An archive of reduced/published data, such as CDS:

CDS has 3 main services: Simbad, Aladin, Vizier
  Vizier has ~10,000 catalogues
    A typical catalogue has ~10 columns
      Each column has attributes: name, datatype, units, description
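
As a rough sketch only, the CDS example above could be modelled as nested
record types. The class names (DataCentre, Service, Catalogue, Column) and
the sample column values are hypothetical, made up for illustration; they
are not taken from any real Vizier schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    # The four attributes listed for each column.
    name: str
    datatype: str
    units: str
    description: str


@dataclass
class Catalogue:
    title: str
    columns: List[Column] = field(default_factory=list)


@dataclass
class Service:
    name: str
    catalogues: List[Catalogue] = field(default_factory=list)


@dataclass
class DataCentre:
    name: str
    services: List[Service] = field(default_factory=list)


# Illustrative values only; not real Vizier metadata.
example_catalogue = Catalogue(
    title="example radio catalogue",
    columns=[
        Column("RA", "double", "deg", "Right ascension"),
        Column("Dec", "double", "deg", "Declination"),
        Column("Flux", "float", "mJy", "Flux density"),
    ],
)

cds = DataCentre(
    "CDS",
    [Service("Simbad"), Service("Aladin"), Service("Vizier", [example_catalogue])],
)


def count_columns(centre: DataCentre) -> int:
    """Walk all levels below the data centre and count column records."""
    return sum(len(cat.columns) for svc in centre.services for cat in svc.catalogues)
```

A registry that stored only the top one or two levels of such a structure
would never see the Column records at all.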

An observational archive such as XMM-Newton or HST:

XMM-Newton archive contains a few thousand observations
  Observation (pointing period) has data on 3 instruments
    X-ray camera data divided into exposures (period with same config)
      Each exposure has data from 3 cameras (MOS1, MOS2, PN)
        Camera data divided into data types (event-list, image, cal...)

In practice there's some flattening of these essentially hierarchical
structures.  Vizier has a flat list of ~10k tables, but they are grouped by
waveband and in other ways, which is useful for the user, so there _could_
be at least one more level than is actually the case at present.

The public XMM-Newton archive (at Vilspa) makes metadata accessible down
to at least the 4th level of the hierarchy, but (I think) data can only be
retrieved in chunks at the observation level.  That's partly
because the archive is run on top of a relational DBMS (Oracle) which
encourages flattening.  Our internal XMM database is deeper, as it's run
on an object-oriented DBMS, which encourages more baroque structures.
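
To make the flattening concrete, here is a minimal sketch that turns a
nested structure into flat rows, each carrying the full path from the top
of the hierarchy.  The identifiers "obs-0001" and "exposure-1" and the
nested-dict layout are assumptions for illustration; this is not how the
Vilspa archive or our internal database is actually organised.

```python
def flatten(node, path=()):
    """Yield one (path, leaf) row for each leaf of a nested dict."""
    for key, child in node.items():
        here = path + (key,)
        if isinstance(child, dict):
            # Descend one level, remembering how we got here.
            yield from flatten(child, here)
        else:
            yield here, child


# Hypothetical fragment: observation > instrument > exposure > camera > data types
archive = {
    "obs-0001": {
        "X-ray camera": {
            "exposure-1": {
                "MOS1": ["event-list", "image"],
                "MOS2": ["event-list", "image"],
                "PN": ["event-list", "image"],
            }
        }
    }
}

rows = list(flatten(archive))
for path, data_types in rows:
    print(" / ".join(path), "->", ", ".join(data_types))
```

Each row is self-contained, which suits a relational table, but the
grouping by observation or exposure then has to be recovered from the
path columns rather than from the structure itself.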

> I think the way to look at this is from a query perspective.  A query to a
> registry--be it a complex search or just a simple harvesting request--is
> going to return a bunch of matched things.  The question is, what should
> those things be?

I think there are likely to be several types of query, more or less
corresponding to different levels of these structures:

(a) Can you find me a nearby mirror of Vizier?
(b) Which catalogues does Vizier have giving radio polarisation?
(c) Which XMM exposures with source X in the field have an exposure
    time over 10 kiloseconds?

And I'm sure you can think of more.  A totally flat registry _could_
perhaps just about handle all of these, but I'm doubtful.  Or perhaps some
of these queries would be beyond what a registry would be able to answer
and some or all of the query would have to be forwarded to the archive
itself.
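
One way to picture that last option is a registry that answers a query
itself only when the query needs metadata no deeper than the registry
holds, and otherwise forwards it.  This is only a sketch: the depth
numbers, the query labels, and the route() function are all hypothetical,
not part of OAI or any agreed registry design.

```python
# Assumed: how many levels of each hierarchy the registry stores.
REGISTRY_DEPTH = 2

# Assumed depth of metadata each kind of query needs, loosely following
# queries (a), (b) and (c) above.
QUERY_LEVELS = {
    "find mirror of service": 1,            # (a) service level
    "catalogues by waveband": 2,            # (b) catalogue level
    "exposures by source and duration": 3,  # (c) exposure level
}


def route(query_kind, registry_depth=REGISTRY_DEPTH):
    """Say who answers: the registry, or the archive it forwards to."""
    level = QUERY_LEVELS[query_kind]
    if level <= registry_depth:
        return "registry"
    return "forward to archive"


for kind in QUERY_LEVELS:
    print(kind, "->", route(kind))
```

With these assumed numbers (a) and (b) stay in the registry and (c) is
forwarded; a deeper registry would shift that boundary.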

Regards

-- 
Clive Page
Dept of Physics & Astronomy,
University of Leicester,    Tel +44 116 252 3551
Leicester, LE1 7RH,  U.K.   Fax +44 116 252 3311


