Handling data cubes in VO

Mon Jan 9 07:30:51 PST 2006

Hi Anita -

From Anita:
> Having just seen Doug's summary, I am now wondering if we are talking more 
> about data discovery, or manipulation? Of course, they are not totally 
> separate - a tool which allows a quick non-quantitative look at the data is 
> useful to help decide whether to download a Gb FITS file... but we must not 
> lose sight of the VO goal to use standard models to allow data discovery and 
> manipulation to be integrated into workflows.

We need to do both; however, the main challenge with large data cubes is
what you call manipulation - dynamic access to cube data.

Basically what I am suggesting is that, once we have found a cube data
collection of interest which we wish to access (be it an individual cube
image or a set of them as for a survey) then most of what we want to do
can be done with dynamic 2/3-D cutouts or 2/3-D reprojections.  The 2
or 3D cutout is easy to understand and relatively easy to generate on
the server side.  The 2 or 3D reprojection also subsets the data, but
in addition allows rotation, axis transposition, and scale changes, by
specifying the WCS and optionally image geometry of the output image.
In particular, a 2D projection of a cube at a given position and with
a specified orientation provides the capability to produce an arbitrary
slice through the cube.
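
As a rough illustration of the two operations (a sketch only, not part of
any VO interface): the function names below are made up, the cube is a toy
nested list indexed [z][y][x], and positions are given in pixels with
nearest-neighbour sampling.  A real service would work in WCS coordinates
and interpolate.

```python
def cutout(cube, z_range, y_range, x_range):
    """Return the sub-cube covering the given half-open pixel ranges."""
    (z0, z1), (y0, y1), (x0, x1) = z_range, y_range, x_range
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in cube[z0:z1]]


def oblique_slice(cube, origin, u, v, shape, fill=None):
    """Sample a 2D plane through the cube by nearest-neighbour lookup.

    origin is the (z, y, x) pixel that maps to output pixel (0, 0);
    u and v are the (z, y, x) steps per output row and per output column.
    Output pixels that fall outside the cube are set to `fill`.
    """
    nz, ny, nx = len(cube), len(cube[0]), len(cube[0][0])
    rows, cols = shape
    out = []
    for i in range(rows):
        line = []
        for j in range(cols):
            z, y, x = (round(origin[k] + i * u[k] + j * v[k]) for k in range(3))
            inside = 0 <= z < nz and 0 <= y < ny and 0 <= x < nx
            line.append(cube[z][y][x] if inside else fill)
        out.append(line)
    return out


# Toy 4x4x4 cube whose pixel value encodes its own (z, y, x) position.
cube = [[[100 * z + 10 * y + x for x in range(4)] for y in range(4)]
        for z in range(4)]

# 3D cutout: planes 1-2, rows 0-1, columns 2-3.
sub = cutout(cube, (1, 3), (0, 2), (2, 4))

# 2D slice along the diagonal of the (y, x) plane, one output row per z.
diag = oblique_slice(cube, origin=(0, 0, 0), u=(1, 0, 0), v=(0, 1, 1),
                     shape=(4, 4))
```

The cutout only subsets the data, while the slice is a degenerate case of
reprojection: the output geometry (origin, orientation, size) is chosen
by the caller rather than inherited from the cube.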

In a typical scenario, following data discovery a client could download
a sub-cube for local analysis, probably in some existing VO-enabled tool
(several such tools are in common use within radio astronomy).  Or the
client could issue a series of requests to dynamically slice the cube, or
even extract 1D spectra through a synthetic aperture with SSA.  In effect
the SIA query (generalized to 3D), posed against a single collection or
dataset, becomes a sub-cube or slice generation function, with the query
parameters being the arguments to the function.  The "image generation
parameters" in SIA 1.0 are a 2D example of what I am suggesting.
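
To make the "query as function call" idea concrete, here is a minimal
sketch that assembles such a request.  POS, SIZE, FORMAT, NAXIS and
ROTANG are meant as the SIA 1.0 parameters of those names; the service
URL, the helper name cube_query, and the BAND parameter standing in for
the third axis are assumptions for illustration, not an agreed interface.

```python
from urllib.parse import urlencode


def cube_query(base_url, ra, dec, size, naxis=None, rotang=None, extra=None):
    """Build a cutout/slice request URL from the function arguments."""
    params = {
        "POS": f"{ra},{dec}",       # centre of the region, degrees
        "SIZE": str(size),          # angular size of the region, degrees
        "FORMAT": "image/fits",
    }
    if naxis is not None:           # output image size in pixels
        params["NAXIS"] = ",".join(str(n) for n in naxis)
    if rotang is not None:          # rotation of the output image, degrees
        params["ROTANG"] = str(rotang)
    if extra:                       # hypothetical 3D extension parameters
        params.update(extra)
    # Keep commas and slashes unescaped so the URL stays readable.
    return base_url + "?" + urlencode(params, safe=",/")


url = cube_query(
    "http://example.org/sia",       # placeholder service address
    180.0, -30.5, 0.2,
    naxis=(256, 256),
    rotang=45,
    extra={"BAND": "1.420e9/1.421e9"},   # hypothetical spectral range
)
print(url)
```

Each distinct set of arguments yields a different virtual image computed
on the server, so a client can step through a cube by issuing a series of
such requests.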

I think this would address most or all of the functionality Arnold
suggested.  It appears this scheme would also support functionality such
as in the existing CGPS/Aladin prototype by a combination of successive
2D cutouts plus possibly some extension metadata to provide a higher
level view of the cube.

> Datacubes (and higher dimensions) also highlight one of the bees in my bonnet 
> - in many cases, even for science-ready data, specialised software will be 
> best for manipulation - both due to the size of the cubes and the range of 
> ways to handle data.  Hence, as well as data discovery, we may need software 
> discovery - i.e., the user does not just need to find a cube to download or 
> view in a standard browser tool, but also to find software to translate 'show 
> linear polarization vectors' into 'if radio take arctan U/Q ...' or whatever 
> the equivalents are for other domains.  This is quite possible, but not much 
> discussed.

I agree, although the generic data access approach outlined above would
support much analysis.  I think what you describe here is a distributed
application.  Part of the application, at least the interactive user
interface, runs on the user desktop.  The "manipulation" part runs on
the remote server with high bandwidth access to the data.  This leads to
the component-framework type of approach.  We capture the functionality
of parts of the application in reusable components and execute them via
a distributed execution framework of some sort, supporting either local
or remote execution of the same components.  (This is what we have been
discussing in the Opticon context).

The software discovery bit is merely describing components in the registry.
This is the easy part.  The hard part is defining the execution framework,
and the container-component interface.
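
A toy sketch of the idea, with everything in it assumed for illustration
(the in-memory REGISTRY dictionary, the component decorator, and the
discover/execute helpers are not an existing VO registry or framework).
The one component is Anita's polarization example, using the standard
relation angle = 0.5 * arctan(U/Q).

```python
import math

REGISTRY = {}   # stand-in for component descriptions held in a registry


def component(name, description):
    """Register a function as a named, described component."""
    def register(func):
        REGISTRY[name] = {"description": description, "run": func}
        return func
    return register


@component("polarization_angle",
           "Linear polarization angle (radians) from Stokes Q and U")
def polarization_angle(q, u):
    # atan2 keeps the correct quadrant and copes with Q = 0.
    return 0.5 * math.atan2(u, q)


def discover(keyword):
    """Software discovery: find components whose description matches."""
    return [name for name, entry in REGISTRY.items()
            if keyword.lower() in entry["description"].lower()]


def execute(name, *args, remote=False):
    """Run a component; only local execution is sketched here."""
    if remote:
        # Shipping the call to a server next to the data is the part
        # that needs a real execution framework.
        raise NotImplementedError("remote execution is not sketched")
    return REGISTRY[name]["run"](*args)


names = discover("polarization")
angle = execute(names[0], 1.0, 1.0)   # Q = U, so the angle is pi/8
```

Discovery here is a keyword match over descriptions, which is why I call
it the easy part.  What the sketch leaves out is the substance: how
execute() would run the same component remotely, and what interface a
component must present to its container.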

 	- Doug