RFC initiated for Simple Spectral Access protocol

Tue Jun 26 00:07:42 PDT 2007

Hi All -

I am traveling now and for another day or so, so I will comment only
briefly, on the issue of active mediation vs native data pass-through.
Jesus asked a number of questions which I think it is probably simplest
to address by looking at the general approach.  This is repeating a
fundamental architectural discussion which already took place several
years ago, but is perhaps worth revisiting now.

To fully deal with the issue of multiwavelength analysis of data from
many sources, which in the case of spectra involves data that can be
represented external to the VO in many different ways (including no
serialized representation at all, as in the case of an RDBMS or dynamic
generation of data), SSA provides as its major interface a mechanism to
actively mediate spectra to a standard data model.

At the simplest level this is really not all that complicated: some
general metadata plus spectral coordinate, flux, and error vectors.
Of course the full model is more complex than that, but the essential
bits are not terribly complicated.  Once we have gone to the trouble of
identifying the essential data elements and their units in a standard
form, why not go ahead and provide the vector data as well?
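
As an illustration only, the "essential bits" could be sketched in Python
as below.  The class name, field names, and unit strings are assumptions
made for this example; they are not the names defined by the SSA or
Spectrum data model documents.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimpleSpectrum:
    """Hypothetical minimal container: general metadata plus three vectors."""
    spectral_coord: List[float]   # spectral coordinate samples (e.g. wavelength)
    flux: List[float]             # flux value at each spectral sample
    error: List[float]            # uncertainty on each flux value
    spectral_unit: str            # unit string for spectral_coord
    flux_unit: str                # unit string for flux and error
    metadata: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # All three vectors describe the same samples, so lengths must agree.
        n = len(self.spectral_coord)
        if len(self.flux) != n or len(self.error) != n:
            raise ValueError("spectral_coord, flux and error must have equal length")


# Illustrative values only; not taken from any real data set.
spectrum = SimpleSpectrum(
    spectral_coord=[5000.0, 5001.0, 5002.0],
    flux=[1.2e-16, 1.3e-16, 1.1e-16],
    error=[2.0e-18, 2.0e-18, 2.0e-18],
    spectral_unit="Angstrom",
    flux_unit="erg cm**-2 s**-1 Angstrom**-1",
    metadata={"title": "example spectrum"},
)
```

The point of the sketch is only that the vectors travel together with the
metadata that identifies them and their units.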

This is a general solution which will work for essentially all 1-D
spectra.  On the other hand, if we try to describe how to map these
standard data model elements onto some arbitrary external data format,
in the general case the problem is intractable.  Sure, one can do it for a
simple enough model and various common table formats (assuming the client
supports all of these), but the external format can in general be
anything.  By having the data provider actively mediate
the data on the other hand, we have a straightforward 1-1 transformation,
performed by software which has full knowledge of the native project data.
The client sees only the standard representation, so from the client
perspective it is quite simple.
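
A rough sketch of what provider-side mediation might look like follows.
Everything here is hypothetical: "Project X", its native keys (lambda_nm,
counts, sigma_counts, calib, obs_id), and the keys of the standard form
are invented for the example and are not part of SSA.

```python
from typing import Any, Dict


def mediate_project_x(native: Dict[str, Any]) -> Dict[str, Any]:
    """Map a made-up 'Project X' native record onto a standard form.

    The provider knows the native conventions (nanometre wavelengths,
    raw counts with a single calibration factor), so the mapping is a
    direct one-to-one transformation.
    """
    # Native wavelengths are in nanometres; the standard form here uses Angstrom.
    spectral_coord = [nm * 10.0 for nm in native["lambda_nm"]]
    # Native fluxes are raw counts; apply the project's calibration factor.
    scale = native["calib"]
    flux = [c * scale for c in native["counts"]]
    error = [s * scale for s in native["sigma_counts"]]
    return {
        "spectral_coord": spectral_coord,
        "flux": flux,
        "error": error,
        "spectral_unit": "Angstrom",
        "flux_unit": "erg cm**-2 s**-1 Angstrom**-1",
        "metadata": {"title": native["obs_id"]},
    }


# Illustrative native record; the values are not real data.
native_record = {
    "obs_id": "X-0001",
    "lambda_nm": [500.0, 500.5],
    "counts": [100, 120],
    "sigma_counts": [10, 11],
    "calib": 0.5,
}
standard = mediate_project_x(native_record)
```

The client only ever handles the returned standard form; the knowledge of
nanometres, counts, and calibration stays with the provider.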

Hence, at least for spectra (and probably also for time series), general
multi-wavelength analysis requires mediation to a standard data model.

Pass-through of native project data is also important.  This is not so
much to make things easier for the data provider as because information
can be lost in the process of mediating data to a standard data model.
If the client software knows about data from a specific project it may
be able to do a more sophisticated analysis working directly with the
native data.  In the general case of course, the client may not be able
to deal directly with native data.

Hence, SSA provides both active mediation (on the server side) to a
standard data model and pass-through of unmodified native data.
This supports general multi-wavelength analysis as well as direct
access to native data.

What was implemented earlier is an intermediate approach, where native
data is allowed to be in several standard formats (all of which must
be supported by the client), and a simple data model with four terms
is used to identify the spectral coordinate and flux vectors and their
dimensional units.
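
For comparison, the intermediate approach might look roughly like this on
the client side.  The four term names used here (spectral_axis, flux_axis,
spectral_units, flux_units), the "text/csv" format label, and the column
names are assumptions for illustration, not the actual parameter names of
the earlier interface.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple


def read_csv_columns(text: str) -> Dict[str, List[float]]:
    """Read a CSV table with a header row into named numeric columns."""
    reader = csv.DictReader(io.StringIO(text))
    columns: Dict[str, List[float]] = {name: [] for name in reader.fieldnames or []}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns


# The client needs one reader per native format it is expected to support.
READERS: Dict[str, Callable[[str], Dict[str, List[float]]]] = {
    "text/csv": read_csv_columns,
}


def extract_vectors(
    native_text: str, data_format: str, terms: Dict[str, str]
) -> Tuple[List[float], List[float]]:
    """Use the axis terms to pick the spectral and flux columns."""
    if data_format not in READERS:
        raise ValueError(f"client has no reader for format {data_format!r}")
    columns = READERS[data_format](native_text)
    return columns[terms["spectral_axis"]], columns[terms["flux_axis"]]


# Hypothetical four-term description of a native table.
terms = {
    "spectral_axis": "WAVE",
    "flux_axis": "FLUX",
    "spectral_units": "Angstrom",
    "flux_units": "erg cm**-2 s**-1 Angstrom**-1",
}
native_table = "WAVE,FLUX\n5000.0,1.5\n5001.0,2.5\n"
wave, flux = extract_vectors(native_table, "text/csv", terms)
```

Note that the client must carry a reader for every permitted format, and
anything the four terms do not describe (errors, calibration, and so on)
is left to project-specific knowledge.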

While this is simple and works to some extent for conformant data,
the problem is lack of generality and a too-limited model.  It works
only for some data formats, and puts more burden on the client, which
must understand all possible native data formats.  The dimensional units
lack generality and do not address all the cases; the main alternative,
the FITS OGIP syntax as used in Spectrum and SSA, is more general (if
more complex) and is also a broader standard.

In any case this four-parameter model is very simple.  To fully
understand native data will require project-specific information on the
part of the client.  This is not unusual for major data collections,
and VO should support this by native data pass-through, but the only
way we can hope to provide uniformity and standardization for data from
many sources is by developing a more complex generic data model - as
we have already done for SSA.  Some loss of information will occur if
data is mediated, but that is always the case when data is combined in
a more general common analysis, and the native data is always accessible
if required.

A possible compromise here might be to restore SpectralAxis and FluxAxis
as optional attributes, to go along with the dimensional units for these
two axes.  To do this we would have to specify what the values mean
and what data formats they refer to.  These could be useful to improve
the support for native data pass-through.  However, it would be good to
recognize that this is not a general solution, and if we were to expand
upon this approach we would likely be reinventing the spectral data model.
To simplify an increasingly complex client-side mapping we might find
we needed to perform the transformation on the server side and include
the data vectors in with the metadata, in which case we would be right
back where we are today with what SSA already provides.

 	- Doug