<div dir="ltr">Hello Markus,<div><br></div><div>You are quite right that there is a fundamental difference between our primary goal of locating data for download (our current paradigm that may be changing), and the IVOA goal of directing data to an analysis environment. I frequently find myself now considering how to get enough metadata into our labels so that the next layer of an interface can do something that makes the data more usable - like automated transformations.</div><div><br></div><div>The tags are documented as part of the PDS4 Information model. They are part of the information model label taxonomy in what we call the "Primary_Result_Summary" class. The formal definition from the Information Model (IM) is here:</div><div><br></div><div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><a href="https://pds.nasa.gov/datastandards/documents/dd/current/PDS4_PDS_DD_1G00.html#d5e15413">https://pds.nasa.gov/datastandards/documents/dd/current/PDS4_PDS_DD_1G00.html#d5e15413</a><br></div></blockquote><div><br></div>
That document describes the IM in terms of the structural hierarchy of metadata in the major product types. Each "Product_" root node defines a label structure for something in the archive.
You can also find the same information in a different format in the "Information Model" document
(this document presents the IM indexed on several different levels of the hierarchy, and is a bit more congenial for knowledgeable label designers):</div><div><br></div><div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><a href="https://pds.nasa.gov/datastandards/documents/im/current/index_1G00.html#10.34%C2%A0%C2%A0class_pds_primary_result_summary">https://pds.nasa.gov/datastandards/documents/im/current/index_1G00.html#10.34%C2%A0%C2%A0class_pds_primary_result_summary</a><br></div></blockquote><div><br></div></div><div>The "Information Model" document describes the IM in terms of the structural hierarchy of metadata in the major product types. Each "Product_" root node defines a label structure for something in the archive.</div><div><br></div><div>A more practical, how-do-I-fill-this-out description for label designers is here:</div><div><br></div><div><blockquote style="margin:0 0 0 40px;border:none;padding:0px"><div><a href="https://sbnwiki.astro.umd.edu/wiki/Filling_Out_the_Observation_Area_Classes#.3CPrimary_Result_Summary.3E">https://sbnwiki.astro.umd.edu/wiki/Filling_Out_the_Observation_Area_Classes#.3CPrimary_Result_Summary.3E</a><br></div></blockquote><br></div><div>...in a wiki I maintain that gives step-by-step instructions for creating the most common types of labels.</div><div><br></div><div>It was a struggle to get metadata like this into the PDS4 Information Model, because historically PDS has only ever described its data in terms of its source (which instrument, which spacecraft, which mission) and its target (which planet, and even non-planet targets were problematic). 
So I view it as a start, but I hope we can do better for version 2.0.</div><div><br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr">-Anne.</div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jun 14, 2021 at 3:42 AM Markus Demleitner &lt;<a href="mailto:msdemlei@ari.uni-heidelberg.de">msdemlei@ari.uni-heidelberg.de</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear Anne,<br>
<br>
On Wed, Jun 02, 2021 at 11:32:24AM -0400, Anne Catherine Raugh wrote:<br>
> week and I have been somewhat distracted. I wanted to have some time to<br>
> organize my thoughts, rather than just producing a “brain dump”.<br>
<br>
Much appreciated, thank you.<br>
<br>
Perhaps as a rough outline for people who've not been present at the<br>
interop talk, the product-type vocabulary (draft at<br>
<a href="http://www.ivoa.net/rdf/product-type" rel="noreferrer" target="_blank">http://www.ivoa.net/rdf/product-type</a>) is guided by two use cases (or<br>
so I claim):<br>
<br>
(1) obscore case: "For my research, I need time-resolved data of<br>
source X (or an image, or a spectrum, or whatever)"; constraints such<br>
as resolution or spectral band are in different pieces of metadata.<br>
<br>
(2) datalink case: "I have a piece of data, and my (datalink, say)<br>
client now needs to pick an application that can work with it."<br>
<br>
Even these two use cases might already be fairly conflicting, and of<br>
course it'll never be perfect anyway. For instance, several spectral<br>
clients in use in the VO cannot deal with IRAF-style spectra (primary<br>
FITS arrays); avoiding "cannot open" errors in these cases is<br>
probably beyond what is reasonably doable.<br>
<br>
> data structures to a user without requiring the user to know (or guess) our<br>
> terminology for distinguishing these various spectral formats. Our general<br>
<br>
Here, I suppose we in the VO can assume client support (or<br>
researchers just looking up the terms at the well-known place above).<br>
So, I'd rather make terminology explicit in general, in particular<br>
because the sort of "loose matching" that you can do on a specific<br>
website becomes an interoperability nightmare as different services<br>
or clients do the loose matching in different ways.<br>
<br>
> In order to do this, we created a set of attributes that describe the data<br>
> in terms of the characteristics of the data distinct from its source. And,<br>
> to handle the multiple formats available for spectroscopy and imaging data,<br>
> in particular, in this set of attributes we separated science discipline<br>
> (imaging, spectroscopy) from format (table, image, cube,...).<br>
<br>
Yes, having what I'd call "axes" (time, spectrum, space, polarisation<br>
and, for solar system data and simulations, potentially many others) separate from<br>
"dimensionality" (or the distinction between relational or array-like<br>
data) would seem wise.<br>
<br>
However, there are already quite a few obscore tables out there, and<br>
I don't think it's realistic to ignore the existing terms and, in<br>
particular, the existing practice, which is what<br>
<a href="http://www.ivoa.net/rdf/product-type" rel="noreferrer" target="_blank">http://www.ivoa.net/rdf/product-type</a> largely represents. If I got to<br>
start again, I'd probably say we ought to have array1, array2,<br>
array3, array4, and relational on the "format" side, and denote the<br>
data content through combinations of terms from spectral (s), time<br>
(t), space (l as in location), polarisation (p), etc, and then have a<br>
spectral cube be s#l; there's a nice ADQL user defined function (UDF)<br>
ivo_hashlist_has that would enable reasonably elegant and potentially<br>
even indexable operations with this.<br>
<br>
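To make the hashlist idea concrete, here is a minimal Python sketch of the<br>
matching involved. It assumes that ivo_hashlist_has does a case-insensitive<br>
membership test on '#'-separated words and returns 1 or 0; the Python name<br>
hashlist_has and the sample product records are made up for illustration.<br>
<br>
<pre>
```python
def hashlist_has(hashlist, item):
    """Return 1 if item is one of the '#'-separated words in hashlist, else 0.

    The comparison is case-insensitive (an assumption about the UDF's behaviour).
    """
    words = {word.lower() for word in hashlist.split("#") if word}
    return 1 if item.lower() in words else 0


# Hypothetical records: (name, format term, content hashlist).
products = [
    ("spectral cube", "array3", "s#l"),
    ("light curve", "relational", "t"),
    ("tabulated spectrum", "relational", "s"),
]

# Everything with a spectral axis, whatever its format.
spectral = [name for name, fmt, content in products
            if hashlist_has(content, "s")]
print(spectral)
```
</pre>
<br>
The format term stays a plain column here, so "all spectral data" and "all<br>
relational data" remain two independent constraints.<br>
<br>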
Alas, as I said, we have all the existing practice out there; still,<br>
perhaps allowing "hashlists" in the datalink and obscore fields would<br>
give us most of where we might want to go without having to throw<br>
away existing practice entirely. "cube#spectrum#image"?<br>
<br>
Semantically, that's a pain, though, as you'd have two independent<br>
hierarchies in one vocabulary, and one would also need extra UDFs to<br>
enable semantic operations on such hashlists of terms. But it is at<br>
least something we ought to think about.<br>
<br>
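As a rough illustration of what such a "semantic" operation might involve,<br>
the Python sketch below expands a query term to its narrower terms before<br>
testing the hashlist. The NARROWER table and both function names are<br>
hypothetical; they do not reflect the actual product-type vocabulary or any<br>
existing UDF.<br>
<br>
<pre>
```python
# Hypothetical narrower-term relations mixing a "content" and a "format" hierarchy.
NARROWER = {
    "spectrum": ["sed"],
    "array": ["image", "cube"],
}


def expand(term):
    """Return the term together with all transitively narrower terms."""
    found = set()
    todo = [term.lower()]
    while todo:
        current = todo.pop()
        if current not in found:
            found.add(current)
            todo.extend(NARROWER.get(current, []))
    return found


def hashlist_matches(hashlist, term):
    """True if any word of the '#'-separated hashlist is term or narrower."""
    words = {word.lower() for word in hashlist.split("#") if word}
    return not words.isdisjoint(expand(term))


print(hashlist_matches("cube#sed", "spectrum"))
print(hashlist_matches("cube#sed", "array"))
```
</pre>
<br>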
> So, in theory (we are still developing registries to make use of this level<br>
> of detail), a user will be able to enter “spectrum”, “spectroscopy”,<br>
> “spectral”, or similar terms, and get a return set that contains all<br>
> spectra of any format anywhere in our archive. Then, to the side, they will<br>
> be offered various facets they can select on to narrow results, including<br>
> spectral type (wavelength, frequency, energy) and data format (tabulated,<br>
> 1D, 2D, etc.). We can provide brief descriptions of jargon like “Tabulated<br>
> Spectra” in mouse-over functions, so that users can decode our jargon when<br>
> we must resort to it for brevity.<br>
<br>
It can't quite work like this in the VO, because web pages aren't the<br>
main UI (and there's no such thing as "the" UI anyway); but enabling<br>
this kind of functionality for clients that want to provide something<br>
like this definitely is part of the obscore use case, I'd say.<br>
<br>
> The important break for us was realizing that the data structure is just<br>
> another independent variable, like wavelength or spectral measurement type,<br>
> used to describe the data content. By decoupling it from the science<br>
<br>
Yes -- I think that is a very valuable insight. The question for<br>
product-type is what we make of it based on what we already have and<br>
probably won't want to tear down. Hm.<br>
<br>
Well, thanks again for sharing these thoughts.<br>
<br>
The actual lists of the tags you're assigning would probably help us<br>
figure out what we'll have to expect as more solar-system data enters<br>
the VO. Are these public?<br>
<br>
Thanks,<br>
<br>
Markus<br>
</blockquote></div>