Handling data cubes in VO

Thu Dec 15 03:02:21 PST 2005

Dear Doug et al.,

It is very good news that progress is being made.  I have copied this
message (see below for the original) to several people in RadioNet and
elsewhere who are interested in 3-D data access (so apologies if you get
it twice).

I do have a few comments (which relate to some of Doug's notes):

* In the near future the MERLIN archive may contain datacubes.

* Dimensions:

a)
Depending on the instrument, even for cubes with RA, Dec and a
wavelength-related 3rd axis, that axis is only linear in the correct units
and under the correct convention (as set out at great length by Greisen et
al. 2003, 2003ASPC..295..403G).

b)
At some point we are going to want to extend the model to other 
combinations - transposed cubes, cubes with time as the 3rd axis, cubes 
with completely non-linear sampling (e.g. continuum images at 1.4, 1.6, 5, 
7, 22 GHz...), the polarization example given etc.

c)
There are also 4+D datasets (e.g. the standard MERLIN image has 4 axes - 
RA, Dec, Freq, Stokes).

If our first model is aimed at case a) only, we should not make it
impossible to extend it to b) and c) later (or to other cases, e.g.
non-image data).

* I have downloaded FITS files of several GB from various places in
Europe to JBO, and have also transferred data cubes to and from Mexico.
Hence VO access to whole cubes is not infeasible.

* I do agree, however, that it is very desirable to offer services at the
data centre which produce diagnostic plots and cut-outs (selected by the
user on the basis of the diagnostics), or even more sophisticated tools.
For some data, for example, this could be achieved by combining the
current JIVE and MERLIN software/prototypes.

* 2D visualisation can be useful for some data, e.g. the spectrum from the
visibilities or from the cube, or the moments of the cube (my experience is
mainly with interferometric data on masers and HI absorption).  Sometimes
these are indeed not useful, usually because a few channels are dynamic
range limited, or because of background continuum or lack of resolution
in one dimension.  But we need a sufficient variety of use cases to cover
the commonest cubes likely to be available in the near future.

Other points:

+ There is a prototype version of Aladin which allows selection and
visualisation of a cube using a slider - very neat!

+ A Python-based scripting language, parseltongue, is being developed by
RadioNet (cf. the van Langevelde talk and Kettenis poster at ADASS), and I
have used it to write a layer between AstroGrid and AIPS which extracts
images from uv data on demand.  This was amazingly easy to implement,
considering the pain of doing such a thing previously using POPS,
shell scripts, Perl, CGI...  At present, parseltongue is specialised for
talking to classic AIPS, but it could drive any software package (a Python
interface to IRAF, pyraf, already exists) and could be used to provide the
sort of on-demand cut-outs and other extractions Doug mentioned.
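Such on-demand cut-outs, together with Doug's point that the ability to query by velocity is important, come down to mapping a requested velocity range onto channel numbers and slicing out a smaller 3D cube. The Python sketch that follows is hypothetical: the function names, the cube[z][y][x] nested-list layout and the example axis values are assumptions, and it is not part of SIA, AIPS or parseltongue. It also assumes a linear FITS-style spectral axis, world = CRVAL + CDELT * (pixel - CRPIX) with 1-based pixels; per point a), a real service would first have to check that the axis really is linear in the requested velocity convention.

```python
# Hypothetical sketch of a velocity-range cutout from a spectral cube.
# Assumptions: cube[z][y][x] nested lists, one 2D plane per channel, and a
# spectral axis that is linear in velocity (FITS-style CRVAL/CDELT/CRPIX).
import math


def world_to_pixel(world: float, crval: float, cdelt: float, crpix: float) -> float:
    """Invert the linear FITS relation world = crval + cdelt * (pixel - crpix).
    Pixels are 1-based, as in FITS."""
    return crpix + (world - crval) / cdelt


def velocity_channel_range(v_lo, v_hi, crval, cdelt, crpix, nchan):
    """0-based half-open (start, stop) channel indices whose centres lie in
    [v_lo, v_hi], clipped to the cube; cdelt may be positive or negative."""
    p1 = world_to_pixel(v_lo, crval, cdelt, crpix)
    p2 = world_to_pixel(v_hi, crval, cdelt, crpix)
    lo_pix, hi_pix = min(p1, p2), max(p1, p2)
    # Channel k (0-based) has its centre at 1-based pixel k + 1.
    start = min(nchan, max(0, math.ceil(lo_pix) - 1))
    stop = min(nchan, math.floor(hi_pix))
    return start, max(start, stop)


def velocity_cutout(cube, v_range, y_range, x_range, crval, cdelt, crpix):
    """Sub-cube of cube[z][y][x]: channels inside v_range (same units as
    crval), rows y_range and columns x_range (0-based, half-open)."""
    start, stop = velocity_channel_range(v_range[0], v_range[1],
                                         crval, cdelt, crpix, len(cube))
    y0, y1 = y_range
    x0, x1 = x_range
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in cube[start:stop]]


if __name__ == "__main__":
    # Toy 10-channel, 4x4 cube in which every pixel holds its channel number.
    nchan, ny, nx = 10, 4, 4
    cube = [[[float(z)] * nx for _ in range(ny)] for z in range(nchan)]
    # Illustrative axis: pixel 1 at -20 km/s, 5 km/s per channel.
    sub = velocity_cutout(cube, (-10.0, 5.0), (1, 3), (0, 2),
                          crval=-20.0, cdelt=5.0, crpix=1.0)
    print(len(sub), len(sub[0]), len(sub[0][0]))  # 4 2 2
    print([plane[0][0] for plane in sub])         # [2.0, 3.0, 4.0, 5.0]
```

Returning a half-open channel range keeps the velocity query separate from the slicing, so the same range could equally be used to read channels from several archive files, as Doug suggests large cubes may be stored.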

Anyway, I am delighted with this announcement and would like to be
involved.

best wishes
a

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Dr. Anita M. S. Richards, AstroGrid Astronomer
MERLIN/VLBI National Facility, University of Manchester, 
Jodrell Bank Observatory, Macclesfield, Cheshire SK11 9DL, U.K. 
tel +44 (0)1477 572683 (direct); 571321 (switchboard); 571618 (fax).

On Wed, 14 Dec 2005, Doug Tody wrote:

> A small group of us met to discuss how to handle data cubes in VO, prompted
> by a query from Arecibo on how to publish data from an upcoming HI
> survey to the VO.  The conclusions from the meeting are summarized below.
>
> In short, since a data cube is a type of regularly gridded pixel array, it
> is probably best handled as an image by extending SIA to handle 3D images.
> Whole image access is generally impractical since the cubes are so large; 2D
> slices are occasionally useful but are generally not adequate for analysis.
> Hence the most common type of access is likely to be a 3D subset of the
> cube data, produced either as a cutout or by resampling.  Since cubes
> can be very large they may actually be stored as multiple data files
> in an archive, with the cutout generated from pieces of multiple files.
> In the case of fully processed cubes from a radio survey the Z-axis of
> the cube is most likely to be some form of velocity, hence the ability to
> query by velocity (relative to some specified reference frame) is important.
>
> A typical use-case would be for the user to use a tool such as the Karma
> kpvslice, running interactively on the user workstation, to visualize
> data coming from a 3D cutout or resampling service running remotely on the
> data server.  VO-Client tools could be used to locate and retrieve the data.
>
> Comments on this analysis are welcome.  One conclusion is that it is a
> priority to address 3D data in the next version of SIA.   - Doug
>
>
> ---------- Forwarded message ----------
> Date: Wed, 14 Dec 2005 12:56:18 -0700 (MST)
> From: Doug Tody <dtody at nrao.edu>
> To: Roy Williams <roy at cacr.caltech.edu>
> Cc: Steven Gibson <gibson at naic.edu>, John Benson <jbenson at nrao.edu>,
>    Arnold Rots <arots at head.cfa.harvard.edu>
> Subject: Re: VO for exposing Arecibo data
>
> For the record, some notes from our meeting:
>
>    o	SGPS (ATCA/Parkes) and CGPS are good current examples of the
> 	radio spectral data cubes of the sort we need to deal with.
>
> 	Interestingly, at NRAO we don't have much in the way of data cubes
> 	to publish to the VO.  It is more common to have "multi-band"
> 	data with 3-4 samples (e.g., Stokes I, Q, U) along the Z image axis.
> 	These are represented as 3D FITS images but are really multi-band
> 	data rather than a true 3D observation (in SIA we would probably
> 	represent them as 3-4 2D images forming a logical group).  Spectral line
> 	data from VLA/VLBA, or OTF scans from GBT can produce cubes,
> 	but at present generally only the PI sees this data.
>
> 	Most radio cube data we are likely to need to deal with has XY
> 	as the spatial axes and Z as the spectral axis.  Most commonly
> 	the observable is velocity in some defined standard of rest.
> 	Frequency or wavelength is also seen but mainly for observational
> 	data.  (Hence being able to query by velocity is quite important
> 	for this data).
>
>    o	In general true 3D cubes from modern instruments are impractical to
>    	retrieve over the network.
>
>    o	By far the most important form of access appears to be some form of
> 	cutout.  We can either cut out a smaller 3D cube, or dimensionally
> 	reduce the data to produce 2D slices aligned to the image axes.
> 	The ability to resample or reproject the data is also important.
> 	Both of these cases represent 3D generalizations of what is
> 	already done in SIA.
>
>    o	2D visualization of cubes is not generally very useful.  The most
> 	common use case is to pull out a smaller 3D cube and visualize
> 	or analyze it locally using 3D tools such as those Karma provides.
>
>    o	The ability to handle 3D data should be a priority for the next
>    	version of SIA.  Cube data is most naturally dealt with as a type
> 	of "image" data.
>
>    o	There are use cases where cubes with time on the Z axis are also
>    	important (the spectral axis and the time axis can both have
> 	arbitrarily many samples, as can the spatial axes).
>
>
>  - Doug
>
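Doug's note that 3-4 plane "multi-band" cubes (e.g. Stokes I, Q, U) would probably be represented in SIA as a logical group of 2D images, and point c) about the Stokes axis of MERLIN images, can be illustrated with a minimal Python sketch. It assumes nested lists in cube[s][y][x] order and a linear Stokes axis using the integer codes of FITS WCS Paper I (Greisen & Calabretta 2002): 1-4 for I, Q, U, V and negative values for circular and linear correlation products. The function names are hypothetical and this is not an SIA interface.

```python
# Hypothetical sketch: split a cube whose third axis is a FITS STOKES axis
# into a logical group of named 2D images. Assumed layout: cube[s][y][x].

# Stokes codes as defined in FITS WCS Paper I.
STOKES_CODES = {
    1: "I", 2: "Q", 3: "U", 4: "V",
    -1: "RR", -2: "LL", -3: "RL", -4: "LR",
    -5: "XX", -6: "YY", -7: "XY", -8: "YX",
}


def stokes_code(pixel: int, crval: float, cdelt: float, crpix: float) -> int:
    """Linear FITS axis value at a 1-based pixel, rounded to an integer code."""
    return round(crval + cdelt * (pixel - crpix))


def split_stokes(cube, crval=1.0, cdelt=1.0, crpix=1.0):
    """Return {stokes_name: 2D plane} for each plane along the Stokes axis."""
    group = {}
    for i, plane in enumerate(cube):
        code = stokes_code(i + 1, crval, cdelt, crpix)
        # Unknown codes are kept under a generic label rather than dropped.
        name = STOKES_CODES.get(code, f"code{code}")
        group[name] = plane
    return group


if __name__ == "__main__":
    # Toy 3-plane cube (I, Q, U), each plane a 2x2 image.
    cube = [[[1.0, 1.0], [1.0, 1.0]],
            [[0.1, 0.1], [0.1, 0.1]],
            [[0.2, 0.2], [0.2, 0.2]]]
    for name, plane in split_stokes(cube).items():
        print(name, plane)
```

Keying the planes by Stokes name rather than by index is one way to present them as a group of related 2D images instead of a single 3D image.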