Handling data cubes in VO

Mon Dec 26 23:24:31 PST 2005

I will attempt to summarize the discussions thus far, as well as propose
a way this could work.

Regarding data cubes vs IFU/MOS data - although loosely speaking these
are both cases of "3D" data, they are really two different problems.
The primary use case we are exploring now is the 3D "image", i.e., a
regularly sampled array with an associated WCS.  This was prompted mainly
by the radio data cube case, but an XY-time cube is much the same thing.

As Francois mentions, MOS/IFU data is probably best handled as an
aggregation of extracted, calibrated 1D spectra.  This is logical for
various reasons including the presence of measuring apertures, possibly
irregularly spaced, instead of a regular grid of pixels.  Specialized
analysis and visualization tools can still treat this as a special
kind of partially filled "3D" object if desired.  The "cutout" type of
service can still work fine for large IFU/MOS type spectral datasets.
Large amounts of data can be passed efficiently using the FITS binary
table, which SSA already supports.

For the case of the image data cube we already have a usable data model,
widely used within astronomy today - the FITS image with a FITS WCS.  Yes,
we need a more powerful data model to provide the additional metadata
that drives uniform DAL queries and data access, but the basic data model
for a "3D image" already exists.  VO may eventually define something better,
but the ND astronomical "image" is currently our basic data model for
working with this type of data.
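
As a rough illustration of what "a regularly sampled array with an
associated WCS" means along one axis, here is a minimal Python sketch.
It assumes a purely linear axis (e.g., frequency or time) described by
the usual FITS keywords CRPIX, CRVAL and CDELT; celestial axes also need
a projection, which is not shown.  The class name and the sample values
are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class LinearAxis:
    """One axis of an N-D image with a purely linear FITS-style WCS."""
    ctype: str    # axis type, e.g. "FREQ"
    naxis: int    # number of pixels along this axis
    crpix: float  # reference pixel (1-based, FITS convention)
    crval: float  # world coordinate at the reference pixel
    cdelt: float  # world coordinate increment per pixel

    def pixel_to_world(self, pixel: float) -> float:
        # world = CRVAL + CDELT * (pixel - CRPIX)
        return self.crval + self.cdelt * (pixel - self.crpix)

    def world_to_pixel(self, world: float) -> float:
        # Inverse of the linear relation above.
        return self.crpix + (world - self.crval) / self.cdelt


# Illustrative values only: 64 channels of 10 kHz starting at 1.420 GHz.
freq = LinearAxis(ctype="FREQ", naxis=64, crpix=1.0, crval=1.420e9, cdelt=1.0e4)
first_channel = freq.pixel_to_world(1.0)
last_channel = freq.pixel_to_world(float(freq.naxis))
```

A cube is then just three such axis descriptions plus the pixel array,
which is why existing FITS-aware clients can already consume it.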

The most interesting conclusion coming out of our earlier discussion of
use cases for analysis of radio data cubes was that the most fundamental
access operation needed for analysis of true 3D data is a client-specified
3D cutout or reprojection.  2D slices along the major image planes are
a special case of this and could be provided by any cutout service.
Slices at arbitrary angles and samplings are a form of reprojection.
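
To make the "2D slice is a special case of a 3D cutout" point concrete,
here is a small Python sketch operating on a cube held as nested lists
indexed [z][y][x].  The function names and the toy cube are invented for
the example, pixel ranges are 0-based and half-open, and slices at
arbitrary angles (which need interpolation) are deliberately left out.

```python
def clamp_range(start, stop, size):
    """Clamp a half-open pixel range [start, stop) to [0, size]."""
    start = max(0, min(start, size))
    stop = max(start, min(stop, size))
    return start, stop


def cutout(cube, z_range, y_range, x_range):
    """Return the sub-cube covering the given pixel ranges.

    The requested region is clipped to the bounds of the cube, so a
    client may ask for more than is available.
    """
    nz, ny, nx = len(cube), len(cube[0]), len(cube[0][0])
    z0, z1 = clamp_range(*z_range, nz)
    y0, y1 = clamp_range(*y_range, ny)
    x0, x1 = clamp_range(*x_range, nx)
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in cube[z0:z1]]


def plane_at(cube, z):
    """A 2D plane is a cutout that is one pixel thick along z."""
    ny, nx = len(cube[0]), len(cube[0][0])
    return cutout(cube, (z, z + 1), (0, ny), (0, nx))[0]


# Toy 4 x 3 x 5 cube whose pixel values encode their own position.
cube = [[[100 * z + 10 * y + x for x in range(5)]
         for y in range(3)]
        for z in range(4)]

sub = cutout(cube, (1, 3), (0, 2), (2, 10))  # x range is clipped to the cube
channel_map = plane_at(cube, 2)
```

A real service would do the same thing in world coordinates and on the
server side, so that only the requested subset crosses the network.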

In terms of access this can probably be achieved fairly simply by a
2-step process based on the coverage of the region of interest (ROI)
in N dimensions.  This is the N-D generalization of the POS,SIZE ROI
concept already provided by SIA.  Applications would make an initial
discovery query to find data of interest.  This first query would return
metadata describing the available data in detail, e.g., an image cube or
3D collection.  This is much as we already do now for 2D data, but it would
need to be generalized to 3D and would make use of the Characterisation model.

Given this detailed information on the available data a client could then
issue a more specific query to define the particular cutout/projection of
interest.  This could be a 3D subset, a 2D plane, or whatever.  A sequence
of these second-order queries could be used to fetch successive subsets
of a cube, comparable subsets of different cubes, and so forth.

The standard parameter-based SIA query can probably be used for both cases.
If desired, image generation parameters can be added to refine the query,
to control the size of the generated image in pixels, or define a specific
projection (if supported by the service).
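
The two-step pattern described above might look something like the
following Python sketch, which only builds the query URLs and does not
contact any service.  POS, SIZE, FORMAT, NAXIS and PROJ are existing SIA
parameters; the service URL, the BAND and TIME range parameters, and the
use of a three-element NAXIS are assumptions made purely for
illustration, not part of any agreed interface.

```python
from urllib.parse import urlencode

SERVICE_URL = "http://example.org/sia3d"  # hypothetical endpoint


def roi_params(ra, dec, size_deg, band=None, time=None):
    """Describe an N-D region of interest as query parameters."""
    params = {"POS": f"{ra},{dec}", "SIZE": f"{size_deg}"}
    if band is not None:
        params["BAND"] = f"{band[0]},{band[1]}"  # hypothetical spectral range
    if time is not None:
        params["TIME"] = f"{time[0]},{time[1]}"  # hypothetical time range
    return params


def build_query(params):
    # Keep commas and slashes readable in the generated URL.
    return SERVICE_URL + "?" + urlencode(params, safe=",/")


def discovery_query(ra, dec, size_deg, band=None, time=None):
    """Step 1: ask what data covers the region of interest."""
    return build_query(roi_params(ra, dec, size_deg, band, time))


def access_query(ra, dec, size_deg, band, naxis, proj=None):
    """Step 2: ask for one specific cutout, optionally resampled."""
    params = roi_params(ra, dec, size_deg, band)
    params["FORMAT"] = "image/fits"
    params["NAXIS"] = ",".join(str(n) for n in naxis)  # output size in pixels
    if proj is not None:
        params["PROJ"] = proj  # only if the service supports reprojection
    return build_query(params)


find_url = discovery_query(180.0, -30.0, 0.5, band=(0.2100, 0.2112))
get_url = access_query(180.0, -30.0, 0.1, (0.2105, 0.2107),
                       naxis=(128, 128, 32), proj="SIN")
```

A client would issue the first URL once, inspect the returned metadata,
and then issue as many URLs of the second kind as it needs subsets.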

In this way a client could discover data, get detailed metadata describing
the data of interest, and then issue a sequence of access operations to get
the actual data, all with something similar to the simple parameter-based,
uniform access interface we have now.

Image data would be returned in FITS, with a FITS WCS, to allow easy
use of the data by existing client software.  The SIA query will, however,
provide a way to return richer metadata (for example, FITS WCS does not
fully address time coordinates, whereas STC does attempt to do so, nor
does FITS fully address data characterisation).

	- Doug