How do we get metadata into the registry network?
Elizabeth Auden
eca at mssl.ucl.ac.uk
Fri Jun 25 10:09:01 PDT 2004
Hi Clive,
> One notable omission from the Registry interfaces document dated June 16th
> is any way of loading metadata from the primary sources, which are the
> files or databases in each data centre. At present all we have is ways of
> harvesting one registry from another
I had a go at registering column metadata for the 5 datasets registered as
TabularSkyServices from the Astrogrid GDW databases: USNO-B, 1XMM, 2MASS,
INT-WFS, and FIRST. I wrote a shell script that first gathered the column
names from one dataset held in the Leicester database, then used sed
and awk to generate the table section of the TabularSkyService XML
document, and finally merged this text into an XML file containing a
TabularSkyService entry generated from the VOResource schemas.
That generated a TabularSkyService entry containing sufficient metadata
for the registry to return column names. The time-consuming part was
entering the descriptions, units and relevant UCDs for each column - that
part had to be done by hand (but I had some time to kill after my plane
got delayed returning from the Garching AVO / Astrogrid meeting... :)
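For anyone wanting to try something similar, the steps above (take a list of column names, generate the table section, merge it into an existing entry) could be sketched roughly as below. This is a Python illustration, not the original sed/awk shell script. The element names (table, column, name, description, unit, ucd), the sample column names and the stub entry are all made up for the example and are not taken from the VOResource schemas.

```python
import xml.etree.ElementTree as ET


def build_table_section(table_name, column_names):
    """Build a <table> element with one <column> child per column name.

    Element names here are hypothetical placeholders, not schema names.
    """
    table = ET.Element("table")
    ET.SubElement(table, "name").text = table_name
    for col in column_names:
        column = ET.SubElement(table, "column")
        ET.SubElement(column, "name").text = col
        # Left empty on purpose: description, unit and UCD are the parts
        # that still have to be filled in by hand afterwards.
        ET.SubElement(column, "description")
        ET.SubElement(column, "unit")
        ET.SubElement(column, "ucd")
    return table


def merge_into_entry(entry_xml, table):
    """Append the table section to an existing entry and return XML text."""
    root = ET.fromstring(entry_xml)
    root.append(table)
    return ET.tostring(root, encoding="unicode")


# Hypothetical stub entry and column names, for illustration only.
entry = "<TabularSkyService><title>Example catalogue</title></TabularSkyService>"
columns = ["ra", "dec", "mag"]
merged = merge_into_entry(entry, build_table_section("example", columns))
print(merged)
```

In practice the column list would come from a query against the database rather than a hard-coded list, and the output would need checking against the real schema.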
The completed XML file for each dataset's TabularSkyService was copied
directly into the Astrogrid registry; Kevin made a copy of the file that
was picked up during a registry harvest. I am *not* suggesting that this
is a good method to use with datasets like Vizier, or even datasets with
similar numbers of columns to 1XMM or 2MASS. However, for small datasets
(or at least datasets with few columns) I think entering the extra column
information by hand is feasible, with the caveat that entering metadata for
a million tables of 5 columns each would be infeasible.
An interface that allows registry entry authors to upload a
completed XML registry entry could be a useful addition to the current
text box entry forms on a registry admin page. This does not solve the
problem of gathering metadata from the non-tabular sources you mention,
e.g. time series and images.
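As a rough idea of what such an upload interface might check before accepting a file, here is a minimal Python sketch. It only tests that the upload is well-formed XML and that a few top-level elements are present and non-empty. The function name and the required element names (title, identifier) are assumptions for illustration; a real registry would presumably validate against the actual schemas instead.

```python
import xml.etree.ElementTree as ET


def check_uploaded_entry(xml_text, required=("title", "identifier")):
    """Return a list of problems; an empty list means the entry passed.

    The required element names are hypothetical, not schema names.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # Reject anything that is not well-formed XML straight away.
        return [f"not well-formed XML: {err}"]
    # Only direct children of the root element are checked here.
    return [
        f"missing or empty <{tag}>"
        for tag in required
        if not (root.findtext(tag) or "").strip()
    ]
```

An upload would be accepted only when the returned list is empty; otherwise the messages could be shown back to the entry author.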
cheers,
Elizabeth