UCDs, metadata and AstroOntology
Anita Richards
amsr at jb.man.ac.uk
Wed Oct 16 06:37:48 PDT 2002
> Not to be left out, I've jumped in with:
> http://wiki.astrogrid.org/bin/view/Astrogrid/AreUCDsMetadata
> arguing that some of these atomic combinations are embedding metadata
> into the column names when it should be kept separate and distinct.
Hi Tony et al.,
I think Tony is right that in some cases the UCDs appear to have too much
detail. However, if I understand his first two questions correctly, the
reason why I suggested the prefixes (or separate but necessary atoms,
whatever order they are in) is that, for example,
    INST_FREQ  value Radio    would be all the possible observing
                              frequency ranges of an instrument
    OBS_FREQ   value 1.6 GHz  would be the observing frequency for a
                              particular observation
and at present the flux density of a source in that observation is
given by
    PHOT_RADIO_1.6G
or, another example,
    OBSTY_POS   would be the geographical location of an observatory
    OBS_POS     would be the pointing position of the instrument for a
                particular observation
    SOURCE_POS  would be the position of a source within that field of
                view.
Similarly, if I have understood Guy's example, LENGTH_WAVELENGTH means
something like BANDWIDTH but WAVELENGTH could also be the peak of a
spectral line, or possibly also used as a measure of other quantities like
power spectra formed by Fourier transforming almost anything, or in
wavelet analysis (I am not sure about this exactly, but I do know that
'power spectrum' does not necessarily mean a distribution of energy...).
This does not invalidate Tony's arguments (although it does mean that we
should all - including astronomers - realise that other parts of the field
use terms by 'custom and practice' which are far removed from their formal
meanings). We do have to keep the original column names, as occasionally
the UCD allocation will go wrong whatever rules you use.
If I understand correctly, what Tony is saying is that e.g. _if_ a
catalogue has the observing frequency in the header then the source
flux density just needs to be PHOT.
To me that is still using atomised UCDs as metadata; it simply allows
their location to be split up. To execute a query such as 'find sources
within (cone) with a flux density >10 mJy at 1.6 GHz', the query engine
would first select catalogues which covered the necessary position range
and which contained 1.6 GHz measurements, ideally by searching metadata
in the catalogue headers. It would then select sources and return their
properties by applying the required constraints to the values held in
column entries identified by the column metadata. I would call these
UCDs.
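
To make the two steps concrete, here is a minimal sketch in Python. It is
only an illustration under stated assumptions: the catalogue layout (a dict
with a 'header' and a list of 'rows'), the catalogue names, the flux values
and the function find_sources are all made up for this example and are not
part of any UCD proposal. The cone (position) test is left out to keep it
short.

```python
# Hypothetical catalogues: the observing frequency sits in the header,
# so the flux density column only needs the atom PHOT.
CATALOGUES = [
    {
        "name": "cat_a",
        "header": {"OBS_FREQ": 1.6e9},  # Hz
        "rows": [
            {"SOURCE_POS": (10.0, 20.0), "PHOT": 0.025},  # Jy
            {"SOURCE_POS": (10.1, 20.1), "PHOT": 0.004},
        ],
    },
    {
        "name": "cat_b",
        "header": {"OBS_FREQ": 5.0e9},
        "rows": [{"SOURCE_POS": (10.0, 20.0), "PHOT": 0.050}],
    },
]


def find_sources(catalogues, freq_hz, min_flux_jy):
    """Select catalogues from header metadata, then sources from column values."""
    hits = []
    for cat in catalogues:
        # Step 1: header search; skip catalogues at the wrong frequency.
        if cat["header"].get("OBS_FREQ") != freq_hz:
            continue
        # Step 2: constrain the values in the column identified as PHOT.
        for row in cat["rows"]:
            if row["PHOT"] > min_flux_jy:
                hits.append((cat["name"], row["SOURCE_POS"], row["PHOT"]))
    return hits


# Sources brighter than 10 mJy at 1.6 GHz.
result = find_sources(CATALOGUES, 1.6e9, 0.010)
print(result)
```

Only the first row of cat_a passes both steps; cat_b is rejected from its
header alone, without any of its rows being read.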
In practice, catalogues might contain any information anywhere. For
example, a catalogue might have in the header 'radio sources > 100 mJy'
but not contain any flux densities in the catalogue itself. Or it might
contain no frequency information in the header but be a catalogue of a
particular type of source, and for each source have flux density
measurements at a number of frequencies, hence PHOT_RADIO_1.6G etc.
Unless we are going to reconstruct every catalogue so that certain
information comes in the header there is no rigid structure which can tell
you where to find which metadata. Of course, we can make recommendations,
because it will be much more efficient to be able to select catalogues
from their headers as a first step, and maybe convert or encourage the
authors to convert major catalogues to the most convenient format, but (at
least at first) I am not sure how efficient it would be to get the headers
'right' for every catalogue, or even how useful it would be in the above
example, if a catalogue had measurements for active stars from radio to
x-ray...
I think the point of atomised UCDs is that they could come anywhere in a
catalogue, e.g. every atom in the header would be taken to prefix every
appropriate column description atom, and if columns contained arrays then
they would in turn be prefixed by the higher levels (for example a pair of
columns containing a value and an error could be implicitly thought of as
an array), and the query engine would have to know what was 'appropriate'.
And whatever happens there will have to be the first step of converting
the weird things people call columns and the ways they describe catalogues
into a common 'language'. As I understand it, the UCDs _are_ the metadata
for each column but they are dynamic, i.e. built up out of atoms from
the column or the header or possibly higher levels depending on the query.
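
A rough Python sketch of that idea follows. Everything in it is an
assumption made for illustration: the APPLIES_TO table stands in for
whatever rules would tell the query engine which header atoms are
'appropriate' for which column atoms, and a UCD is modelled as an unordered
set of atoms, since the order of atoms is left open above.

```python
# Hypothetical rule table: header atom -> column atoms it may be attached to.
APPLIES_TO = {
    "RADIO": {"PHOT"},
    "1.6G": {"PHOT"},
}


def effective_atoms(header_atoms, column_atoms):
    """Return the column's own atoms plus every appropriate header atom."""
    atoms = set(column_atoms)
    for atom in header_atoms:
        # Inherit a header atom only if the rules say it fits this column.
        if APPLIES_TO.get(atom, set()) & set(column_atoms):
            atoms.add(atom)
    return frozenset(atoms)


header = ["RADIO", "1.6G"]

# Frequency held in the header, column described only as PHOT.
flux = effective_atoms(header, ["PHOT"])

# The position column does not inherit the frequency atoms.
pos = effective_atoms(header, ["SOURCE", "POS"])

# Same information held entirely in the column, with an empty header.
flux_in_column = effective_atoms([], ["PHOT", "RADIO", "1.6G"])

print(sorted(flux), sorted(pos), flux == flux_in_column)
```

With these made-up rules, a PHOT column under a RADIO, 1.6G header and a
PHOT_RADIO_1.6G column under an empty header end up with the same set of
atoms, which is the sense in which the location of the atoms can be split
up.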
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Dr. Anita M. S. Richards, AVO Astronomer
MERLIN/VLBI National Facility, University of Manchester,
Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, U.K.
tel +44 (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax).