Approach to metadata for Spectral Data at the Planetary Data System (PDS)

Anne Catherine Raugh araugh at umd.edu
Wed Jun 2 17:32:24 CEST 2021


Apologies for taking so long to respond - we have a major proposal due next
week and I have been somewhat distracted. I wanted to have some time to
organize my thoughts, rather than just producing a “brain dump”.

Some brief background.

I work for the Planetary Data System (PDS), where we archive data primarily
from remote sensing missions to planetary targets. I work at the Small
Bodies Node, where we archive data from missions like Rosetta, Hayabusa,
and OSIRIS-REx.

For the first 25 years the PDS existed, the metadata in our labels was such
that the only way to search for data of a specific type, like “linear
spectrum”, was to know which instrument on which spacecraft took that sort
of data. (PDS was designed as a “Mission Data Archive”, so perhaps that is
not surprising, however mistaken it seems in retrospect.) There was a data
type called “SPECTRUM”, but it was so poorly defined that most spectral data
taken after 1990 were not delivered in that format; instead, they were
labeled as “TABLE”, “IMAGE”, or “QUBE”, depending on the dimensionality of
the data file.

When we sat down 10 years ago to redesign the standards so that searches
could be based on inherent properties of the data rather than just their
origin, spectral and imaging data presented a challenge because of the wide
variation in dimensionality found in those disciplines. In my own node, we
have at least five different types of spectral data (and that’s ignoring
non-light spectra like mass spectra and time-of-flight spectra):

   1. Tabulated spectra, in which each row of a table contains a 1D spectrum
   2. 1D spectral tables, where each row provides the data for one spectral
   bin
   3. 2D spectra, where one dimension is spatial and one spectral
   4. 3D spectral cubes, where one dimension is spectral, but the other two
   might be (spatial, spatial) or (spatial, temporal)
   5. 2D images of focal planes on which a spectrum (like an echelle
   spectrum) is projected, where multiple orders might be present and the
   image axes do not, in general, align with the spectral axes

I have heard that there is an instrument in the works that will return
spectral movies: rapid time series of 3D spectral cubes with (spectral,
spatial, spatial) dimensions. And of course any one of these structures
might be used to record wavelength, frequency, or energy spectra.
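The structures above differ both in how many axes they have and in what each axis means. The sketch below is a minimal illustration, using hypothetical axis-role names (not PDS label keywords) and a deliberately simplified stand-in for the old dimensionality-based labeling. It shows that such a label cannot tell a 2D spectrum from an echelle focal-plane image, that the echelle image cannot even be recognized as spectral from its axes alone, and that a spectral movie fits none of the old labels.

```python
from enum import Enum


class Axis(Enum):
    """Hypothetical roles an array axis can play (illustrative only)."""
    RECORD = "record"      # one table row per spectrum
    SPECTRAL = "spectral"
    SPATIAL = "spatial"
    TEMPORAL = "temporal"
    DETECTOR = "detector"  # focal-plane axis not aligned with the spectral axes


# The five structures listed above, plus the "spectral movie", by axis role.
STRUCTURES = {
    "tabulated spectra": (Axis.RECORD, Axis.SPECTRAL),
    "1D spectral table": (Axis.SPECTRAL,),
    "2D spectrum": (Axis.SPATIAL, Axis.SPECTRAL),
    "3D cube (spatial, spatial)": (Axis.SPECTRAL, Axis.SPATIAL, Axis.SPATIAL),
    "3D cube (spatial, temporal)": (Axis.SPECTRAL, Axis.SPATIAL, Axis.TEMPORAL),
    "echelle focal-plane image": (Axis.DETECTOR, Axis.DETECTOR),
    "spectral movie": (Axis.TEMPORAL, Axis.SPECTRAL, Axis.SPATIAL, Axis.SPATIAL),
}

# The spectral quantity is an independent choice for any of the structures.
SPECTRAL_QUANTITIES = ("wavelength", "frequency", "energy")


def legacy_label(axes):
    """Simplified stand-in for choosing a label from dimensionality alone."""
    if Axis.RECORD in axes or len(axes) == 1:
        return "TABLE"
    return {2: "IMAGE", 3: "QUBE"}.get(len(axes))  # None: no label fits


for name, axes in STRUCTURES.items():
    print(f"{name:28} -> {legacy_label(axes)}")
```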

The problem we needed to solve in our metadata was how to present all these
data structures to a user without requiring the user to know (or guess) our
terminology for distinguishing these various spectral formats. Our general
search design is based on using faceted searches to drill down through
large initial query result sets. That is, we expect users to enter broad
search terms like “spectrum”, and we need to offer ways for a user to
easily filter those results down to the data they are interested in and can
use.
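The drill-down pattern can be sketched in a few lines: a broad keyword search, a count of results under each facet value, and a refinement step that applies whatever values the user selects. This is a minimal sketch; the index entries, field names, and helper functions are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical index entries; field names and values are illustrative only.
INDEX = [
    {"id": "a", "keywords": {"spectrum"}, "format": "tabulated", "quantity": "wavelength"},
    {"id": "b", "keywords": {"spectrum"}, "format": "2D", "quantity": "wavelength"},
    {"id": "c", "keywords": {"spectrum"}, "format": "3D cube", "quantity": "frequency"},
    {"id": "d", "keywords": {"spectrum"}, "format": "2D", "quantity": "energy"},
    {"id": "e", "keywords": {"image"}, "format": "2D", "quantity": None},
]


def search(index, term):
    """Broad first pass: every entry tagged with the search term."""
    return [r for r in index if term in r["keywords"]]


def facets(results, names):
    """Count how many results fall under each value of each named facet."""
    return {name: Counter(r[name] for r in results) for name in names}


def refine(results, **chosen):
    """Keep only results matching every facet value the user has selected."""
    return [r for r in results if all(r[k] == v for k, v in chosen.items())]


hits = search(INDEX, "spectrum")
print(facets(hits, ["format", "quantity"]))
print([r["id"] for r in refine(hits, format="2D")])                    # ['b', 'd']
print([r["id"] for r in refine(hits, format="2D", quantity="energy")])  # ['d']
```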

In order to do this, we created a set of attributes that describe the data
in terms of their own characteristics, independent of their source. And, to
handle the multiple formats used for spectroscopy and imaging data in
particular, this set of attributes separates science discipline (imaging,
spectroscopy) from format (table, image, cube, ...).
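A minimal sketch of why the separation matters, assuming a hypothetical record layout and made-up product identifiers. A single combined type label finds only the spectra that happened to be labeled “SPECTRUM”. Separate discipline and format attributes let each question be asked on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Hypothetical search record; field names are illustrative, not PDS keywords."""
    product_id: str
    legacy_type: str   # single data-type label, as in the old scheme
    discipline: str    # science discipline, independent of structure
    data_format: str   # structure, independent of discipline


PRODUCTS = [
    Product("s1", "SPECTRUM", "spectroscopy", "table"),
    Product("s2", "TABLE", "spectroscopy", "table"),
    Product("s3", "IMAGE", "spectroscopy", "image"),
    Product("s4", "QUBE", "spectroscopy", "cube"),
    Product("i1", "IMAGE", "imaging", "image"),
]

# Old-style query: only products that happened to use the SPECTRUM type.
by_legacy_type = [p.product_id for p in PRODUCTS if p.legacy_type == "SPECTRUM"]
# Discipline query: every spectral product, whatever its structure.
by_discipline = [p.product_id for p in PRODUCTS if p.discipline == "spectroscopy"]
# Format query: answers a different question and mixes disciplines.
by_format = [p.product_id for p in PRODUCTS if p.data_format == "image"]

print(by_legacy_type)  # ['s1']
print(by_discipline)   # ['s1', 's2', 's3', 's4']
print(by_format)       # ['s3', 'i1']
```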

So, in theory (we are still developing registries to make use of this level
of detail), a user will be able to enter “spectrum”, “spectroscopy”,
“spectral”, or similar terms, and get a result set that contains all
spectra of any format anywhere in our archive. Then, to the side, they will
be offered various facets they can select to narrow the results, including
spectral type (wavelength, frequency, energy) and data format (tabulated,
1D, 2D, etc.). We can provide brief descriptions of jargon like “Tabulated
Spectra” as mouse-over text, so that users can decode our jargon when we
must resort to it for brevity.
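The two user-facing pieces can be sketched as small lookup tables. This minimal sketch assumes a hypothetical synonym table that maps several search terms to one canonical discipline, and a table of short facet descriptions for mouse-over text. The format descriptions paraphrase the list of spectral structures earlier in this message; the keys and names are illustrative, not PDS vocabulary.

```python
# Hypothetical synonym table mapping user terms onto one canonical discipline.
SYNONYMS = {
    "spectrum": "spectroscopy",
    "spectra": "spectroscopy",
    "spectral": "spectroscopy",
    "spectroscopy": "spectroscopy",
    "image": "imaging",
    "imaging": "imaging",
}

# Short descriptions for format facet values, shown as mouse-over text.
FORMAT_HELP = {
    "tabulated": "Each row of a table contains a 1D spectrum.",
    "1D": "Each row of a table provides the data for one spectral bin.",
    "2D": "One dimension is spatial and one is spectral.",
    "3D cube": "One dimension is spectral; the other two are "
               "(spatial, spatial) or (spatial, temporal).",
}


def normalize(term):
    """Map a free-text search term to a canonical discipline, or None if unknown."""
    return SYNONYMS.get(term.strip().lower())


def tooltip(format_value):
    """Mouse-over text for a format facet value, with a fallback."""
    return FORMAT_HELP.get(format_value, "No description available.")


print(normalize("Spectral"))  # spectroscopy
print(tooltip("tabulated"))   # Each row of a table contains a 1D spectrum.
```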

The important breakthrough for us was realizing that the data structure is
just another independent variable, like wavelength or spectral measurement
type, used to describe the data content. By decoupling it from the science
discipline aspects of the description, we avoid pre-selecting data for
users based on criteria the user did not specify, and it becomes easy to
incorporate new formats of spectral data as they are created - presenting
them to users who may be unaware of their existence.
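The decoupling can be sketched as a query that constrains only the attributes the user actually chose, with the list of format values read from the index itself rather than hard-coded. This is a minimal illustration under assumed, hypothetical field names (“discipline”, “format”, “quantity”), not the PDS registry implementation.

```python
def matches(record, constraints):
    """True if the record satisfies every constraint the user actually set."""
    return all(record.get(key) == value for key, value in constraints.items())


def facet_values(records, facet):
    """Values offered for a facet, derived from the records rather than hard-coded."""
    return sorted({r[facet] for r in records})


# Hypothetical index entries; field names and values are illustrative only.
index = [
    {"id": "x1", "discipline": "spectroscopy", "format": "2D", "quantity": "wavelength"},
    {"id": "x2", "discipline": "spectroscopy", "format": "3D cube", "quantity": "frequency"},
]

# A newly ingested structure is just a new attribute value: no query changes needed.
index.append({"id": "x3", "discipline": "spectroscopy",
              "format": "spectral movie", "quantity": "wavelength"})

# The user only asked about discipline, so format and quantity stay unconstrained.
hits = [r["id"] for r in index if matches(r, {"discipline": "spectroscopy"})]
print(hits)                           # ['x1', 'x2', 'x3']
print(facet_values(index, "format"))  # ['2D', '3D cube', 'spectral movie']
```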

Regards,

Anne Raugh
PDS Small Bodies Node
University of Maryland
College Park, MD 20742-2421
301-405-6855