[SPECTRA] Some thoughts on the spectral model

Wed Sep 17 10:40:22 PDT 2003

As those of you who were at the recent NVO meeting know, I have been
taking a look at the datasets referred to in the spectral use survey led
by Doug Tody. Here I note a grab-bag of issues from this study that we
should keep in mind for the spectral data model. In general, the model
for spectra presented earlier covers most of these cases, but it doesn't
have a good place to put things like exposure and response information.

1) The study re-emphasized the importance of clearly describing the
observable (pixel value). More observables to add to the list:
  - Antenna temperature (e.g. SWAS)
  - Ratio of two objects (e.g. Arcturus over telluric)
    (Does this need extra metadata?)

2) We also need, separately I believe, to describe corrections
made to the observable that do not change its units or overall
interpretation:
  - absorption (atmospheric, Galactic, ...)
  - fit, model, ...
  - continuum-subtracted
  - lines removed
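To make the separation in items 1 and 2 concrete, here is a minimal sketch, assuming a hypothetical `Observable` record and `Correction` vocabulary (neither is part of any agreed model). The observable carries the quantity and its units; corrections are a separate list that can grow without touching either. The `numerator`/`denominator` fields are one possible answer to whether a ratio needs extra metadata.

```python
from dataclasses import dataclass, field, replace
from enum import Enum


class Correction(Enum):
    """Hypothetical vocabulary of corrections that keep the observable's units."""
    ABSORPTION_ATMOSPHERIC = "absorption:atmosphere"
    ABSORPTION_GALACTIC = "absorption:galactic"
    MODEL_FIT = "fit/model"
    CONTINUUM_SUBTRACTED = "continuum-subtracted"
    LINES_REMOVED = "lines-removed"


@dataclass
class Observable:
    """What a pixel value means. All field names are illustrative assumptions."""
    quantity: str                # e.g. "antenna temperature", "flux density", "ratio"
    unit: str                    # "" for a dimensionless ratio
    numerator: str = ""          # for a ratio: the object divided ...
    denominator: str = ""        # ... by this one
    corrections: list[Correction] = field(default_factory=list)

    def is_ratio(self) -> bool:
        return bool(self.numerator and self.denominator)

    def with_correction(self, c: Correction) -> "Observable":
        # Record the correction alongside the observable; quantity and unit are unchanged.
        return replace(self, corrections=[*self.corrections, c])


# Examples modelled on the survey cases (metadata values are illustrative).
swas = Observable(quantity="antenna temperature", unit="K")
arcturus_ratio = Observable(quantity="ratio", unit="",
                            numerator="Arcturus", denominator="telluric")
flux = Observable(quantity="flux density", unit="erg/s/cm2/Angstrom")
```

Applying `flux.with_correction(Correction.ABSORPTION_GALACTIC)` returns a new record with the same quantity and unit and one entry in `corrections`, which is exactly the property item 2 asks for.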

3) Spectra versus wavenumber are present in the archives
   (e.g. Arcturus), so we need to make sure the bandpass/frequency
   object supports this axis type.
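Supporting wavenumber mostly means the bandpass/frequency object must record which kind of spectral coordinate it holds. The sketch below assumes hypothetical kind labels and converts each to wavelength in Angstrom using sigma = 1/lambda and nu = c/lambda; it ignores air-versus-vacuum distinctions.

```python
C_CM_PER_S = 2.99792458e10   # speed of light in cm/s (exact SI value)
ANGSTROM_PER_CM = 1e8


def to_wavelength_angstrom(value: float, kind: str) -> float:
    """Convert a spectral coordinate to wavelength in Angstrom.

    `kind` is one of these hypothetical labels:
      "wavelength_A"     already a wavelength in Angstrom
      "wavenumber_cm-1"  wavenumber sigma = 1/lambda, in cm^-1
      "frequency_Hz"     frequency nu = c/lambda, in Hz
    """
    if kind == "wavelength_A":
        return value
    if kind == "wavenumber_cm-1":
        return ANGSTROM_PER_CM / value                 # lambda = 1/sigma
    if kind == "frequency_Hz":
        return C_CM_PER_S / value * ANGSTROM_PER_CM    # lambda = c/nu
    raise ValueError(f"unknown spectral axis kind: {kind!r}")
```

For example, 5000 cm^-1 maps to 20000 Angstrom (2 microns). Note that an axis increasing in wavenumber runs in decreasing wavelength, so tools that assume a monotonically increasing wavelength axis will need to check the ordering after conversion.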

4) NOAO arc lamp spectra: what metadata should these have to
  characterize them? More generally, what metadata should calibration
  data have? Should this also cover real observational data that
  are used as calibration? For example, a spectrum of a standard star
  might carry calibration metadata saying 'this is a template for a
  K2IV star' as well as the usual this-is-just-data metadata, while an
  arc lamp spectrum should have metadata saying 'this is a KPNO HeNeAr
  lamp covering the following range'. How structured should this
  metadata be?
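One possible middle ground on "how structured" is a few optional structured fields plus a mandatory free-text description, attached to an otherwise ordinary spectrum. This is a sketch only; `CalibrationInfo`, `Spectrum`, and every field name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CalibrationInfo:
    """Hypothetical calibration metadata: a few structured fields plus free text."""
    role: str                                   # e.g. "arc lamp", "spectral template"
    description: str                            # human-readable fallback, always present
    lamp: Optional[str] = None                  # e.g. "HeNeAr"
    site: Optional[str] = None                  # e.g. "KPNO"
    spectral_type: Optional[str] = None         # e.g. "K2IV" for a template star
    coverage_A: Optional[Tuple[float, float]] = None  # wavelength range covered, Angstrom


@dataclass
class Spectrum:
    """An ordinary spectrum that may additionally carry calibration metadata."""
    target: str
    calibration: Optional[CalibrationInfo] = None   # None means "just data"

    def is_calibration(self) -> bool:
        return self.calibration is not None


arc = Spectrum("arc lamp", CalibrationInfo(role="arc lamp", description="KPNO HeNeAr lamp",
                                           lamp="HeNeAr", site="KPNO"))
star = Spectrum("standard star", CalibrationInfo(role="spectral template",
                                                 description="template for a K2IV star",
                                                 spectral_type="K2IV"))
science = Spectrum("program object")
```

The standard star stays a normal `Spectrum` with its usual data metadata; the calibration block is purely additive, so tools that ignore it lose nothing.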

5) A lot of archives include spectral line identification tables,
   i.e. catalogs of lines, each with parameters such as equivalent
   width (EW). I propose, as I think we agreed at Cambridge, that such
   data do not fall under the purview of 'spectrum'; a line list is a
   separate object, possibly a special case of, or spectral analog to,
   'source list', possibly with a standard method to convert it to a
   spectrum object. The counter-position would be to say that it is just
   an unusual way to store a spectrum, one with no continuum, but I think
   the extra metadata associated with specific atomic lines argues
   against this.
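A "standard method to convert it to a spectrum object" could look like the sketch below. It assumes Gaussian profiles of a single width on a unit continuum, scaled so each profile's integrated depth equals the line's EW; the `SpectralLine` record and the example line are hypothetical. The conversion is lossy: the species label has nowhere to go in the output flux array, which is the argument for keeping line lists as a separate object.

```python
import math
from dataclasses import dataclass


@dataclass
class SpectralLine:
    species: str          # atomic metadata a plain spectrum has no slot for
    wavelength_A: float   # line centre, Angstrom
    ew_A: float           # equivalent width, Angstrom (positive = absorption)


def line_list_to_spectrum(lines, grid_A, sigma_A):
    """Render a line list as a continuum-normalized absorption spectrum.

    Assumes Gaussian profiles of standard deviation sigma_A, each normalized
    so that its integrated depth equals the line's equivalent width.
    """
    norm = 1.0 / (sigma_A * math.sqrt(2.0 * math.pi))
    flux = []
    for lam in grid_A:
        depth = sum(
            ln.ew_A * norm * math.exp(-0.5 * ((lam - ln.wavelength_A) / sigma_A) ** 2)
            for ln in lines
        )
        flux.append(max(0.0, 1.0 - depth))   # clip: flux cannot go negative
    return flux


# Illustrative values only: one weak line rendered on a 0.01 Angstrom grid.
lines = [SpectralLine("Fe I", 5000.0, 0.1)]
grid = [4990.0 + 0.01 * i for i in range(2001)]
flux = line_list_to_spectrum(lines, grid, sigma_A=0.5)
```

Summing `(1 - f) * 0.01` over this grid recovers an EW of about 0.1 Angstrom, as long as no line is strong enough to be clipped at zero flux.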

6) Some data are stored as several different spectra sharing the
   same wavelength axis, e.g. a table with 4 columns,
         lambda, spec1, spec2, spec3
   In some cases the spectra refer to different objects,
   in others to different corrections (data, error, bitmask),
   and often to different observables (data1, data2, ratio).
   Should the spectral model treat these as a spectral array
   (a vector-valued spectrum with a single wavelength axis)
   or as an array of spectra (for a table with n columns, n-1 separate
   spectrum objects, each replicating the wavelength information)?
   - Error and bitmask columns are tightly related to the actual
   data column and will have explicit places to live in the model.
   - In the case of two objects and their ratio, I believe
   interoperability will be better served by making data providers
   expose these to the VO as three different spectra. The
   cost is that applications will have to do extra work to recognize
   that the spectra have compatible wavelength axes. But particularly
   in the case of the ratio, where the units are different
   (dimensionless instead of flux), handling them as a vector could be
   messy.
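The "array of spectra" option, and the extra work it pushes onto applications, can be sketched as follows. The table layout, `Spectrum1D`, and the tolerance are assumptions for illustration: each column becomes its own spectrum with its own units and its own copy of the wavelength axis, and a client then has to test whether two axes match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spectrum1D:
    wavelength: tuple   # each spectrum carries its own copy of the axis
    values: tuple
    unit: str           # units of the observable; "" for a dimensionless ratio
    label: str


def split_columns(wavelength, columns, units):
    """Turn a table (lambda, spec1, ..., spec_{n-1}) into n-1 independent spectra.

    `columns` and `units` are dicts keyed by column name (hypothetical layout).
    """
    wl = tuple(wavelength)
    return [Spectrum1D(wl, tuple(vals), units[name], name) for name, vals in columns.items()]


def compatible_axes(a, b, rtol=1e-9):
    """The check an application needs to notice that two spectra share an axis."""
    return len(a.wavelength) == len(b.wavelength) and all(
        abs(x - y) <= rtol * max(abs(x), abs(y)) for x, y in zip(a.wavelength, b.wavelength)
    )


# Illustrative table: two objects and their ratio on one wavelength axis.
wl = [4000.0, 4001.0, 4002.0]
cols = {"data1": [2.0, 4.0, 6.0], "data2": [1.0, 2.0, 3.0], "ratio": [2.0, 2.0, 2.0]}
units = {"data1": "erg/s/cm2/Angstrom", "data2": "erg/s/cm2/Angstrom", "ratio": ""}
spectra = split_columns(wl, cols, units)
```

Because each spectrum has its own `unit`, the dimensionless ratio sits alongside the flux spectra without forcing one set of units on a vector-valued column.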

 7) Although this case is no different from the data model's point
    of view, I'm particularly concerned by the SDSS spectral FITS files.
    These have an n x 4 FITS image, where n is the number of
    wavelength points and the 4 layers are different observables
    (data, continuum-subtracted data, error, mask). This breaks
    the FITS paradigm (e.g. BUNIT is meaningless, since the 4th
    plane has different units from the other 3), whereas four
    n x 1 images would have been perfectly legal FITS and would have
    allowed meaningful metadata for each. SDSS is far from the only
    offender in this respect. Data providers will have to take particular
    care in describing such datasets to the VO, and I suspect the
    data will have to be reformatted before being sent over the wire
    if the generic VO spectral tools still to be developed are to have
    a chance at swallowing these data.
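The reformatting step could be as simple as splitting the stacked image into one 1-D image per plane, each with its own header. This sketch assumes the n x 4 image has NAXIS1 = n and NAXIS2 = 4, so it reads as 4 rows of n values; the plane names and unit strings are illustrative, while EXTNAME, BUNIT, NAXIS, and NAXIS1 are standard FITS keywords.

```python
def split_planes(image, plane_meta):
    """Split a stacked 4 x n array (one row per plane) into separate 1-D images.

    image:      list of planes, each a list of n values (hypothetical layout)
    plane_meta: list of (name, bunit) pairs, one per plane
    Returns one header-plus-data dict per plane, each with its own BUNIT.
    """
    if len(image) != len(plane_meta):
        raise ValueError("need one metadata entry per plane")
    n = len(image[0])
    out = []
    for plane, (name, bunit) in zip(image, plane_meta):
        if len(plane) != n:
            raise ValueError("all planes must share the wavelength axis length")
        header = {"EXTNAME": name, "BUNIT": bunit, "NAXIS": 1, "NAXIS1": n}
        out.append({"header": header, "data": list(plane)})
    return out


# Illustrative only: unit strings and values are placeholders, not SDSS's.
flux_unit = "erg/s/cm2/Angstrom"
meta = [("DATA", flux_unit), ("CONTSUB", flux_unit), ("ERROR", flux_unit), ("MASK", "")]
stack = [[1.0, 2.0, 3.0], [0.5, 1.0, 1.5], [0.1, 0.1, 0.1], [0, 0, 1]]
planes = split_planes(stack, meta)
```

Each output plane can then be written as its own image extension, so a generic tool reading BUNIT gets the right answer for every plane, including the mask.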

  - Jonathan