Approach to metadata for Spectral Data at the Planetary Data System (PDS)

Mon Jun 14 09:32:53 CEST 2021

Dear Anne,

On Wed, Jun 02, 2021 at 11:32:24AM -0400, Anne Catherine Raugh wrote:
> week and I have been somewhat distracted. I wanted to have some time to
> organize my thoughts, rather than just producing a “brain dump”.

Much appreciated, thank you.

Perhaps as a rough outline for people who were not present at the
interop talk, the product-type vocabulary (draft at
http://www.ivoa.net/rdf/product-type) is guided by two use cases (or
so I claim):

(1) obscore case: "For my research, I need time-resolved data of
source X (or an image, or a spectrum, or whatever)"; constraints such
as resolution or spectral band are in different pieces of metadata.

(2) datalink case: "I have a piece of data, and my (datalink, say)
client now needs to pick an application that can work with it."

Even these two use cases might already be fairly conflicting, and of
course it'll never be perfect anyway.  For instance, several spectral
clients in use in the VO cannot deal with IRAF-style spectra (primary
FITS arrays); avoiding "cannot open" errors in these cases is
probably beyond what is reasonably doable.

> data structures to a user without requiring the user to know (or guess) our
> terminology for distinguishing these various spectral formats. Our general

Here, I suppose we in the VO can assume client support (or
researchers just looking up the terms at the well-known place above).
So, I'd rather make terminology explicit in general, in particular
because the sort of "loose matching" that you can do on a specific
website becomes an interoperability nightmare as different services
or clients do the loose matching in different ways.

> In order to do this, we created a set of attributes that describe the data
> in terms of the characteristics of the data distinct from its source. And,
> to handle the multiple formats available for spectroscopy and imaging data,
> in particular, in this set of attributes we separated science discipline
> (imaging, spectroscopy) from format (table, image, cube,...).

Yes, having what I'd call "axes" (time, spectrum, space,
polarisation, and, once solar system data and simulations come in,
potentially many others) separate from "dimensionality" (or the
distinction between relational and array-like data) would seem wise.

However, there are already quite a few obscore tables out there, and
I don't think it's realistic to ignore the existing terms and, in
particular, the existing practice, which is what
http://www.ivoa.net/rdf/product-type largely represents.  If I got to
start again, I'd probably say we ought to have array1, array2,
array3, array4, and relational on the "format" side, and denote the
data content through combinations of terms for spectral (s), time
(t), space (l as in location), polarisation (p), etc.  A spectral
cube would then be s#l; there's a nice ADQL user defined function
(UDF), ivo_hashlist_has, that would enable reasonably elegant and
potentially even indexable operations on this.
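
To make the hashlist idea a bit more concrete, here is a minimal
Python sketch of the kind of membership test I have in mind.  It is
an illustration only: the function hashlist_has and the sample rows
are made up for this mail, and the case-insensitive, 1-or-0 behaviour
is my assumption about how ivo_hashlist_has works, not a copy of any
actual implementation.

```python
def hashlist_has(hashlist: str, item: str) -> int:
    """Return 1 if item is one of the '#'-separated words in hashlist, else 0.

    The comparison is case-insensitive (assumed to mirror ivo_hashlist_has).
    """
    words = [w.strip().lower() for w in hashlist.split("#")]
    return 1 if item.strip().lower() in words else 0


# Hypothetical rows: (identifier, format, content axes as a hashlist)
rows = [
    ("obs-1", "array3", "s#l"),      # spectral cube
    ("obs-2", "array1", "s"),        # plain 1D spectrum
    ("obs-3", "relational", "t#l"),  # time series of positions
]

# "Give me everything with a spectral axis", whatever the format.
spectral = [ident for ident, _fmt, axes in rows if hashlist_has(axes, "s")]
```

The point is that the format column and the content hashlist can be
constrained independently of each other.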

Alas, as I said, we have all the existing practice out there; still,
perhaps allowing "hashlists" in the datalink and obscore fields would
get us most of the way there without having to throw away existing
practice entirely.  Something like "cube#spectrum#image", perhaps?

Semantically, that's a pain, though, as you'd have two independent
hierarchies in one vocabulary, and one would also need extra UDFs to
enable semantic operations on such hashlists of terms.  But it is at
least something we ought to think about.

> So, in theory (we are still developing registries to make use of this level
> of detail), a user will be able to enter “spectrum”, “spectroscopy”,
> “spectral”, or similar terms, and get a return set that contains all
> spectra of any format anywhere in our archive. Then, to the side, they will
> be offered various facets they can select on to narrow results, including
> spectral type (wavelength, frequency, energy) and data format (tabulated,
> 1D, 2D, etc.). We can provide brief descriptions of jargon like “Tabulated
> Spectra” in mouse-over functions, so that users can decode our jargon when
> we must resort to it for brevity.

It can't quite work like this in the VO, because web pages aren't the
main UI (and there's no such thing as "the" UI anyway); but enabling
this kind of functionality for clients that want to provide something
like this definitely is part of the obscore use case, I'd say.

> The important break for us was realizing that the data structure is just
> another independent variable, like wavelength or spectral measurement type,
> used to describe the data content. By decoupling it from the science

Yes -- I think that is a very valuable insight.  The question for
product-type is what we make of it based on what we already have and
probably won't want to tear down.  Hm.

Well, thanks again for sharing these thoughts.

The actual lists of the tags you're assigning would probably help us
figure out what we'll have to expect as more solar-system data enters
the VO.  Are these public?

Thanks,

            Markus