Approach to metadata for Spectral Data at the Planetary Data System (PDS)

Anne Catherine Raugh araugh at umd.edu
Mon Jun 14 16:44:49 CEST 2021


Hello Markus,

You are quite right that there is a fundamental difference between our
primary goal of locating data for download (our current paradigm, which may
be changing) and the IVOA goal of directing data to an
analysis environment. I now frequently find myself considering how to get
enough metadata into our labels so that the next layer of an interface can
do something that makes the data more usable, such as automated
transformations.

The tags are documented as part of the PDS4 Information Model. They belong
to the label taxonomy of the information model, in what we call the
"Primary_Result_Summary" class. The formal definition from the Information
Model (IM) is here:

https://pds.nasa.gov/datastandards/documents/dd/current/PDS4_PDS_DD_1G00.html#d5e15413


That document (the PDS4 Data Dictionary) describes the IM in terms of the structural hierarchy of
metadata in the major product types. Each "Product_" root node defines a
label structure for something in the archive. You can also find the same
information in a different format in the "Information Model" document (this
document presents the IM indexed on several different levels of the
hierarchy, and is a bit more congenial for knowledgeable label designers):

https://pds.nasa.gov/datastandards/documents/im/current/index_1G00.html#10.34%C2%A0%C2%A0class_pds_primary_result_summary


A more practical, how-do-I-fill-this-out description for label designers is
here:

https://sbnwiki.astro.umd.edu/wiki/Filling_Out_the_Observation_Area_Classes#.3CPrimary_Result_Summary.3E


...in a wiki I maintain that gives step-by-step instructions for creating
the most common types of labels.

It was a struggle to get metadata like this into the PDS4 Information
Model, because historically PDS has only ever described its data in terms
of its source (which instrument, which spacecraft, which mission) and its
target (which planet, and even non-planet targets were problematic). So I
view it as a start, but I hope we can do better for version 2.0.

-Anne.


On Mon, Jun 14, 2021 at 3:42 AM Markus Demleitner <msdemlei at ari.uni-heidelberg.de> wrote:

> Dear Anne,
>
> On Wed, Jun 02, 2021 at 11:32:24AM -0400, Anne Catherine Raugh wrote:
> > week and I have been somewhat distracted. I wanted to have some time to
> > organize my thoughts, rather than just producing a “brain dump”.
>
> Much appreciated, thank you.
>
> Perhaps as a rough outline for people who weren't present at the
> Interop talk, the product-type vocabulary (draft at
> http://www.ivoa.net/rdf/product-type) is guided by two use cases (or
> so I claim):
>
> (1) obscore case: "For my research, I need time-resolved data of
> source X (or an image, or a spectrum, or whatever)"; constraints such
> as resolution or spectral band are in different pieces of metadata.
>
> (2) datalink case: "I have a piece of data, and my (datalink, say)
> client now needs to pick an application that can work with it."
>
> Even these two use cases might already be fairly conflicting, and of
> course it'll never be perfect anyway.  For instance, several spectral
> clients in use in the VO cannot deal with IRAF-style spectra (primary
> FITS arrays); avoiding "cannot open" errors in these cases is
> probably beyond what is reasonably doable.
>
> > data structures to a user without requiring the user to know (or guess) our
> > terminology for distinguishing these various spectral formats. Our general
>
> Here, I suppose we in the VO can assume client support (or
> researchers just looking up the terms at the well-known place above).
> So, I'd rather make terminology explicit in general, in particular
> because the sort of "loose matching" that you can do on a specific
> website becomes an interoperability nightmare as different services
> or clients do the loose matching in different ways.
>
> > In order to do this, we created a set of attributes that describe the data
> > in terms of the characteristics of the data distinct from its source. And,
> > to handle the multiple formats available for spectroscopy and imaging data,
> > in particular, in this set of attributes we separated science discipline
> > (imaging, spectroscopy) from format (table, image, cube,...).
>
> Yes, having what I'd call "axes" (time, spectrum, space, polarisation
> and (solar system, simulations) potentially many others) separate from
> "dimensionality" (or the distinction between relational or array-like
> data) would seem wise.
>
> However, there are already quite a few obscore tables out there, and
> I don't think it's realistic to ignore the existing terms and, in
> particular, the existing practice, which is what
> http://www.ivoa.net/rdf/product-type largely represents.  If I got to
> start again, I'd probably say we ought to have array1, array2,
> array3, array4, and relational on the "format" side, and denote the
> data content through combinations of terms from spectral (s), time
> (t), space (l as in location), polarisation (p), etc., and then have a
> spectral cube be s#l; there's a nice ADQL user defined function (UDF)
> ivo_hashlist_has that would enable reasonably elegant and potentially
> even indexable operations with this.
>
> Alas, as I said, we have all the existing practice out there; still,
> perhaps allowing "hashlists" in the datalink and obscore fields would
> get us most of the way to where we might want to go without having to throw
> away existing practice entirely.  "cube#spectrum#image"?
>
> Semantically, that's a pain, though, as you'd have two independent
> hierarchies in one vocabulary, and one would also need extra UDFs to
> enable semantic operations on such hashlists of terms.  But it is at
> least something we ought to think about.
>
> > So, in theory (we are still developing registries to make use of this level
> > of detail), a user will be able to enter “spectrum”, “spectroscopy”,
> > “spectral”, or similar terms, and get a return set that contains all
> > spectra of any format anywhere in our archive. Then, to the side, they will
> > be offered various facets they can select on to narrow results, including
> > spectral type (wavelength, frequency, energy) and data format (tabulated,
> > 1D, 2D, etc.). We can provide brief descriptions of jargon like “Tabulated
> > Spectra” in mouse-over functions, so that users can decode our jargon when
> > we must resort to it for brevity.
>
> It can't quite work like this in the VO, because web pages aren't the
> main UI (and there's no such thing as "the" UI anyway); but enabling
> this kind of functionality for clients that want to provide something
> like this definitely is part of the obscore use case, I'd say.
>
> > The important break for us was realizing that the data structure is just
> > another independent variable, like wavelength or spectral measurement type,
> > used to describe the data content. By decoupling it from the science
>
> Yes -- I think that is a very valuable insight.  The question for
> product-type is what we make of it based on what we already have and
> probably won't want to tear down.  Hm.
>
> Well, thanks again for sharing these thoughts.
>
> The actual lists of the tags you're assigning would probably help us
> figure out what we'll have to expect as more solar-system data enters
> the VO.  Are these public?
>
> Thanks,
>
>             Markus
>
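The "hashlist" idea in Markus's message can be illustrated with a minimal Python sketch. It only mimics, under stated assumptions, what a membership test such as the ADQL UDF ivo_hashlist_has does. The sketch splits a '#'-separated product-type value like "cube#spectrum#image" and checks whether a given term is one of its items. The function names hashlist_has and select_by_term, the case-insensitive comparison, and the sample rows are all hypothetical illustrations. They do not come from the UDF specification or from any real obscore table.

```python
"""Sketch of hashlist matching for product-type values such as
"cube#spectrum#image". Assumptions (hypothetical, for illustration):
items are separated by '#' and compared case-insensitively. This only
mimics the idea behind the ADQL UDF ivo_hashlist_has; it is not its
reference implementation."""


def hashlist_has(hashlist: str, item: str) -> int:
    """Return 1 if item is one of the '#'-separated terms, else 0."""
    if not hashlist:
        return 0
    # Normalise each term so "Spectrum" and "spectrum" compare equal (assumption).
    terms = {term.strip().lower() for term in hashlist.split("#")}
    return 1 if item.strip().lower() in terms else 0


# Hypothetical obscore-like rows: (obs_id, dataproduct_type).
ROWS = [
    ("obs-1", "cube#spectrum#image"),
    ("obs-2", "spectrum"),
    ("obs-3", "image"),
    ("obs-4", "timeseries#spectrum"),
]


def select_by_term(rows, term):
    """Emulate a filter like: WHERE ivo_hashlist_has(dataproduct_type, term) = 1."""
    return [obs_id for obs_id, ptype in rows if hashlist_has(ptype, term) == 1]


if __name__ == "__main__":
    print(select_by_term(ROWS, "spectrum"))  # ['obs-1', 'obs-2', 'obs-4']
    print(select_by_term(ROWS, "image"))     # ['obs-1', 'obs-3']
```

With plain string equality (dataproduct_type = 'spectrum'), only obs-2 would match. The membership test also finds obs-1 and obs-4, which is what would let combined values coexist with existing single-term practice. As Markus notes, it does nothing semantic. It will not, for example, infer that one term implies another, so that would still need extra UDFs.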