SIAPv2 and datacubes

Fri Mar 20 04:17:00 PDT 2009

Thanks to Francois Bonnarel for pointing me at SIAP V2 0.2

These comments are informed by the work done for the EuroVO AIDA tasks
on Data Cubes and on Units as well as in the IVOA data model group,
especially on Polarization.

Overall, speaking as an astronomer-turned-data-provider, i.e. not a
software expert, it was great to see that the expansion is in the
direction of many things which I have seen raised in VO workshops
etc. It seemed reasonably intelligible, although a final version
should be more concise and not assume prior knowledge of SIAP v1.
Nonetheless, being garrulous and having prior knowledge, here are
some comments.  Sorry in advance if I have misunderstood any points.

2.2 Implications for SIAv2

Asynchronicity

It will be good to have explicit support for asynchronous queries.  In
fact, we have been operating asynchronous 'pseudo-SIAPs' for some
years, e.g. the MERLIN Imager and related services.  These require
SIAP-like inputs and return a VOTable intended to be SIAP-compliant,
but use CEA (now UWS) to manage long-running data processing to
produce the required images.  The main problems have been, firstly,
that I have invented conventions for parameters not yet specified
(minor) and, secondly, that the Registry (or at least previous
versions) could only describe the services as a type which could not
have associated coverage etc. (major).  That is, you had to look for
'MERLIN' etc.; the service would not show up if you looked for 'radio'
or 'image'.

What is returned?

This section describes the option of a 2-stage process to request and
generate the data, but it was less clear to me that returning the data
can also be 2-stage, i.e. the VO returns the metadata but the user
decides when and how to get the actual data.

As I understand it, it is acceptable (normal?)  that the actual images
are not returned directly to the astronomer (or to VOSpace), but
instead, a VOTable containing various metadata including an ACCREF is
returned. Tools such as Aladin recognise the ACCREF, which in the case
of the MERLINImager is the URL of where the newly-generated data
reside on the MERLIN server, or it can be extracted from the
VOTable. This means that the astronomer can download large datasets
directly, rather than via VOSpace, including putting them somewhere
different from where they do their VO work or sending them directly to
a different application.  Typical MERLINImager users send off a
request via VOExplorer from their laptop, have a quick look at images
in Aladin, then use wget and the ACCREF url to suck the images they
want onto a desktop machine running their favourite package.
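
As an illustration only, here is a minimal Python sketch of pulling the
access reference out of such a VOTable by hand.  The sample document and
its URL are hypothetical and heavily trimmed; the sketch assumes the
access reference column is marked with the SIAP v1 UCD
VOX:Image_AccessReference and that only the first table matters.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed SIAP-style response; real responses carry
# many more fields and usually declare an XML namespace.
SAMPLE_VOTABLE = """<VOTABLE>
 <RESOURCE type="results">
  <TABLE>
   <FIELD name="title" datatype="char" arraysize="*" ucd="VOX:Image_Title"/>
   <FIELD name="accref" datatype="char" arraysize="*"
          ucd="VOX:Image_AccessReference"/>
   <DATA><TABLEDATA>
    <TR><TD>Example image</TD>
        <TD>http://example.org/images/job123/image.fits</TD></TR>
   </TABLEDATA></DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def access_references(votable_text):
    """Return the access-reference URL found in every table row."""
    root = ET.fromstring(votable_text)
    fields = [e for e in root.iter() if local(e.tag) == "FIELD"]
    # Index of the column flagged as the access reference.
    column = next(i for i, f in enumerate(fields)
                  if f.get("ucd") == "VOX:Image_AccessReference")
    urls = []
    for row in root.iter():
        if local(row.tag) == "TR":
            cells = [c for c in row if local(c.tag) == "TD"]
            urls.append((cells[column].text or "").strip())
    return urls


for url in access_references(SAMPLE_VOTABLE):
    print(url)  # each URL could then be handed to wget or another tool
```

The point of the sketch is only that the URL is ordinary table data, so
any client, script or download tool can use it without going through
VOSpace.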

3.1 Basic capabilities

What does 'Updated query parameters' mean?  Is this intelligent
defaulting, e.g.  if I ask for an image at 8 GHz, the service will
know that the nearest available frequency is 6.8 GHz? It would be nice
to have two optional modes; either the user can 'use defaults' and let
a service decide what is the closest match even if it is outside the
query parameter range; or 'ask me' so the user is shown the nearest
offer.  I think that is what is being said, am I right?
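
To make the two modes concrete, here is a small Python sketch of how I
imagine them.  Everything in it is hypothetical: the function name, the
mode names 'use_defaults' and 'ask_me', and the list of available
frequencies are not taken from the draft.

```python
def match_frequency(requested_ghz, available_ghz, mode="ask_me"):
    """Find the available frequency closest to the requested one.

    mode="use_defaults": the service simply goes ahead with the nearest.
    mode="ask_me": the nearest value is only an offer, flagged so that
    the client can show it to the user before anything is generated.
    """
    if mode not in ("use_defaults", "ask_me"):
        raise ValueError("unknown mode: %r" % (mode,))
    nearest = min(available_ghz, key=lambda f: abs(f - requested_ghz))
    exact = nearest == requested_ghz
    return {
        "requested_ghz": requested_ghz,
        "nearest_ghz": nearest,
        "exact": exact,
        "needs_confirmation": mode == "ask_me" and not exact,
    }


available = [1.4, 1.6, 5.0, 6.8, 22.0]  # hypothetical service frequencies
print(match_frequency(8.0, available, mode="use_defaults"))
print(match_frequency(8.0, available, mode="ask_me"))
```

Either way the response reports both the requested and the offered value,
so the astronomer can always see that a substitution took place.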

3.2 Basic Whole-Image discovery and access

It is good to see support for non-positional queries; that is always a
bone of contention at workshops!

3.3 Image Cutouts and Mosaics
3.4 Cube Data Access

It seems to me that some of the introduction about cubes needs to come
at the start of or before 3.3.  As I understand it, 3.3 describes
services which may chop-up or combine data, including >2D, but does
not further modify it, and 3.4 describes more complicated services.

How about a separate brief description of what sorts of multi-D data
can be covered (and how the protocol might in future be extended - why
stop at 4?), followed by these two sub-sections renamed something like
'Cutouts and Mosaics' and 'Resampling'?

Also, I guess that the vast majority of 3D data will be
position-position-(freq/wave/velocity) or, as in the only (?) cubes
currently fully published to the VO, NDSS Stokes IQU. If we can start
by handling these cubes, returning simple cut-outs with no resampling,
or allowing access to data provider services, we are doing well.

Using the present structure:

3.3

Minor comment - don't be unnecessarily prescriptive unless there is a
handling problem e.g. why should Band and Time only apply to 3D data -
I might have a 2-D image of peak velocity drift with time to
investigate Keplerian motion.... I certainly have many 2D images with
position on one axis and velocity on the other, as do long-slit
spectroscopists.

Polarization:

  There is a summary of present usage, at least using radio
conventions, at ****, and at the Trieste IVOA I was told that there
was little or no other polarization data available for VO retrieval.
Most radio (incl.(sub-)mm) astronomers think in terms of total
intensity, linear or circular polarization, and derived products,
which is reflected in the ucd's currently available.  Hence, if I
requested circular polarization data, I would expect to get back
either one image in Stokes V, or a pair of images in LL and RR, or in
XY and YX, with this information in the metadata.  If an astronomer
requests a reprocessed polarization product (e.g. to obtain Stokes V
from XY YX data), the processing should be left to the data provider
since different arrays (or even the same array at different times) use
different sign conventions, as far as I can see. The formal Stokes
IQUV should all be the same, as should the parallel hands (LL etc.)
but the route from the cross hands (RL etc.) to Q U and V may differ.

Velocity etc.

I presume that in this section, the band axis supplied is in the same
units as the native data. STC defines exhaustively all the
possibilities for frequency, wavelength, energy, velocity conventions
and reference frames, but actual conversions are fraught.

* Frequency <> Wavelength
   Selection simple,
   Resampling (required in conversion of actual data) risky

Conversion is simple at the level of the region to be covered or cut
out (freq=c/wave) but it is usually undesirable to resample the axis
since this is non-linear in channel (i.e. pixel) width, which usually
screws up the data (some packages claim to be able to do this, I am
waiting to be convinced - anyway, if done at all, best done by the
data provider or the end user).
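
A short Python sketch of both halves of this point, with arbitrary
example numbers: converting the bounds is one line, but a frequency axis
that is regular in Hz has a different wavelength width in every channel.

```python
C_M_PER_S = 299792458.0  # speed of light in vacuum


def wavelength_bounds_to_frequency(wave_min_m, wave_max_m):
    """Convert a wavelength interval (m) to the frequency interval (Hz).

    The order swaps: the longest wavelength is the lowest frequency.
    """
    return C_M_PER_S / wave_max_m, C_M_PER_S / wave_min_m


def channel_widths_in_wavelength(freq_start_hz, freq_step_hz, n_channels):
    """Wavelength width (m) of each channel of a regular frequency axis."""
    widths = []
    for i in range(n_channels):
        lo = freq_start_hz + i * freq_step_hz
        hi = lo + freq_step_hz
        widths.append(C_M_PER_S / lo - C_M_PER_S / hi)
    return widths


# Selection: a 0.20-0.22 m request becomes a frequency range for the cut-out.
f_lo, f_hi = wavelength_bounds_to_frequency(0.20, 0.22)
print("cut out %.4e Hz to %.4e Hz" % (f_lo, f_hi))

# Resampling problem: equal 100 MHz channels are unequal in wavelength.
for width in channel_widths_in_wavelength(1.0e9, 1.0e8, 5):
    print("channel width %.4f m" % width)
```

The widths shrink steadily along the axis, which is why the cut-out can
safely be returned on its native frequency axis while regridding to
wavelength needs interpolation decisions.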

* Frequency <> Velocity of any type (data observed in freq units)
   Selection in one unit for data in another requires much metadata or
   approximation.
   Resampling for conversion is OK as long as metadata are available.

For full accuracy, conversion requires a knowledge not only of the
velocity reference frame and convention, but also of the line rest
frequency and the reference channel. The Heliocentric frame also
requires the observation date.  In the case of data supplied in
velocity, has the object's peculiar motion been subtracted?  It might
be acceptable to use defaults for data selection.  We have to decide
how far to go in modelling options for conversion or resampling before
it gets too complicated, and we have to just leave it to the data
provider's interface or to the user.
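
For illustration, the standard radio, optical and relativistic velocity
definitions can be sketched in Python as below.  The sketch deliberately
ignores the reference frame (no topocentric, heliocentric or LSR
correction), and the observed frequency is an arbitrary example; it only
shows that the same channel gets a different velocity label depending on
the convention and the assumed rest frequency.

```python
C_KM_PER_S = 299792.458       # speed of light
HI_REST_HZ = 1420405751.77    # rest frequency of the HI 21 cm line


def frequency_to_velocity(freq_hz, rest_freq_hz, convention="radio"):
    """Velocity (km/s) of a frequency under a given Doppler convention.

    No reference-frame correction is applied here.
    """
    if convention == "radio":
        return C_KM_PER_S * (rest_freq_hz - freq_hz) / rest_freq_hz
    if convention == "optical":
        return C_KM_PER_S * (rest_freq_hz - freq_hz) / freq_hz
    if convention == "relativistic":
        r2, f2 = rest_freq_hz ** 2, freq_hz ** 2
        return C_KM_PER_S * (r2 - f2) / (r2 + f2)
    raise ValueError("unknown convention: %r" % (convention,))


observed_hz = 1.415e9  # arbitrary example channel frequency
for name in ("radio", "optical", "relativistic"):
    v = frequency_to_velocity(observed_hz, HI_REST_HZ, name)
    print("%-12s %.2f km/s" % (name, v))
```

Without the rest frequency and the convention in the metadata, a service
can only convert velocity bounds approximately, which may be good enough
for selection but not for resampling.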

3.4

Sampling

How necessary/rigid is the definition 'regularly sampled data'?
Can this mean just that the 3rd axis has some monotonically
incrementing (or decrementing) value?

- If I want a cut-out from a cube with frequency as the 3rd axis, no
   resampling, but I give the bounds in wavelength units, the data
   returned will be in frequency units but the 3rd axis is thus
   non-linearly sampled in wavelength.

- SiO maser movie cubes are monotonically but irregularly sampled on
   the time axis

- A continuum SED cube has frequency as the 3rd axis; each image plane
   is at a different frequency increment, and has a different channel
   width, such that the channels may have gaps (or even overlap - but
   that is a problem scientifically as well as for the model so maybe
   we can ignore it? In optical work is that taken care of by
   calibration?)

- The third axis values are integer labels, as in a Stokes axis

Characterisation can cope with these eventualities. Some are not
accurately represented in FITS headers although FITS cubes can be made
and handled adequately.  The CASA image format can cope with
heterogeneous multiple spectral windows. It would be very useful if the
VO could provide a standard for data providers to publish such
metadata (optionally - default would be regular contiguous sampling)
and for the astronomer to get it intelligibly.
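
As a rough sketch of the distinction (the category names and tolerance
are my own, not from the draft or from Characterisation), a provider or
client could classify the coordinate values of an axis like this:

```python
def describe_sampling(values, rel_tol=1e-6):
    """Classify axis values as 'regular', 'monotonic' or 'unordered'.

    'regular'   : constant step between consecutive values
    'monotonic' : always increasing or always decreasing, uneven steps
    'unordered' : anything else (repeats or changes of direction)
    """
    if len(values) < 2:
        return "regular"
    steps = [b - a for a, b in zip(values, values[1:])]
    increasing = all(s > 0 for s in steps)
    decreasing = all(s < 0 for s in steps)
    if not (increasing or decreasing):
        return "unordered"
    first = steps[0]
    if all(abs(s - first) <= rel_tol * abs(first) for s in steps):
        return "regular"
    return "monotonic"


C = 299792458.0
freqs_hz = [1.0e9 + i * 1.0e8 for i in range(5)]   # regular in frequency
waves_m = [C / f for f in freqs_hz]                # same channels, in metres
epochs_days = [0.0, 13.0, 29.0, 41.0]              # unevenly spaced epochs

print(describe_sampling(freqs_hz))     # constant step
print(describe_sampling(waves_m))      # monotonic but uneven
print(describe_sampling(epochs_days))  # monotonic but uneven
```

Note that a check like this says nothing about channel widths, gaps or
overlaps, and integer labels such as a Stokes axis would pass as
'regular' even though they are not a sampled quantity; those cases need
explicit metadata.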

Velocity etc. and observables

We need to distinguish between what Char calls independent axes (RA,
Dec, Frequency etc.) and Observable quantities.  The latter usually
includes flux density, but also velocity dispersion and spectral index
(which are flux densities combined in various ways, weighted by the
positions on an axis), etc.  It would be possible to imagine a cube
with the third axis being spectral index and the observable of each
plane of the cube being flux density at a particular position and
spectral index.  However it would be far more common for spectral index to be
the value at each pixel of a 2-D image or of a cube with frequency as
the 3rd axis.

Do people publish cubes with redshift as the 3rd axis? If so, similar
conventions and caveats apply as for velocity conversions; supporting
simple cutouts (including converting the bounds, possibly only
approximately), but probably leaving resampling to the provider or end
user.

This also overlaps with the point on 'reduction', since generating
e.g. a velocity dispersion image involves collapsing a 3-D image with
RA, Dec and Velocity axes, and a flux density at each pixel of each
plane, into an RA-Dec image with the flux density weighted by velocity
difference at each pixel.
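
A minimal Python sketch of such a collapse, using the usual
flux-weighted second moment and a toy cube stored as nested lists
(cube[channel][y][x]); real services would work on masked arrays and
would need to deal with noise and negative values, which this ignores.

```python
import math


def velocity_dispersion(velocities, fluxes):
    """Flux-weighted velocity dispersion of one spectrum (one pixel)."""
    total = sum(fluxes)
    if total <= 0:
        return float("nan")  # no usable signal at this pixel
    mean = sum(v * s for v, s in zip(velocities, fluxes)) / total
    var = sum(s * (v - mean) ** 2
              for v, s in zip(velocities, fluxes)) / total
    return math.sqrt(max(var, 0.0))


def collapse_cube(cube, velocities):
    """Collapse cube[channel][y][x] into a 2-D dispersion image[y][x]."""
    ny, nx = len(cube[0]), len(cube[0][0])
    return [[velocity_dispersion(velocities,
                                 [plane[y][x] for plane in cube])
             for x in range(nx)]
            for y in range(ny)]


velocities = [-10.0, 0.0, 10.0]          # km/s, one value per plane
cube = [[[1.0, 0.0]],                    # plane at -10 km/s (1 x 2 pixels)
        [[2.0, 1.0]],                    # plane at   0 km/s
        [[1.0, 0.0]]]                    # plane at +10 km/s
print(collapse_cube(cube, velocities))
```

The output image has the same RA-Dec grid as the cube, but its
observable is now a velocity width rather than a flux density, so the
metadata have to change along with the data.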

Interpolation

Different schemes are appropriate in different regimes and I think
that (at least initially) we have to leave it to the data provider to
choose the most suitable.  If the astronomer knows what they are doing,
they can download the data without interpolation and do it themselves;
if they don't, then there is no point offering a choice.

Rotation

If the rotation is done by the data provider, then the restrictions
will be peculiar to their regime - i.e. it is a matter for metadata,
not a service standard?

Convolution and resolution

I think that the same arguments apply for some kinds of image
(let the provider decide or else let the astronomer do it off-line).
However, there is a related issue which should be spelt out
explicitly, which is that the resolution of an interferometry image is
not fixed.  If an image (or cube) is generated on-demand from
visibility data, the data weighting can be tweaked to change the
resolution in the sense of the restoring beam (PSF), by a factor of 2
or more.  This also affects the optimum image pixel size.  Offering a
range of resolutions for interferometry data is important, but I
suggest that how to achieve the resolution requested is best left to
the data provider (if you know enough to steer it, and care, you might
want a more specialised service or to download the visibilities).

Is that what is meant by 're-use legacy code'?

There is also the 'elephant in the room' that radio image flux density
units are often Jy/beam (not per pixel or per arcsec etc.) and the
beam size is rather hidden in FITS images.  Specialised packages know
where to find it, but data providers should be enabled to place this
information (which may be a range of values, for on-demand images) in
the metadata.
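
To show why the beam size matters, here is a Python sketch of the usual
conversion for an elliptical Gaussian restoring beam, whose area is
pi * major * minor / (4 ln 2) with the axes given as FWHM.  The beam and
pixel sizes are arbitrary example values, and the function names are my
own.

```python
import math


def beam_area_arcsec2(bmaj_arcsec, bmin_arcsec):
    """Area (arcsec^2) of an elliptical Gaussian beam from its FWHM axes."""
    return math.pi * bmaj_arcsec * bmin_arcsec / (4.0 * math.log(2.0))


def jy_per_beam_to_jy_per_pixel(value_jy_beam, bmaj_arcsec, bmin_arcsec,
                                pixel_arcsec):
    """Convert a Jy/beam pixel value to Jy/pixel for square pixels."""
    pixels_per_beam = (beam_area_arcsec2(bmaj_arcsec, bmin_arcsec)
                       / pixel_arcsec ** 2)
    return value_jy_beam / pixels_per_beam


# Example: 0.15 x 0.12 arcsec beam sampled with 0.04 arcsec pixels.
area = beam_area_arcsec2(0.15, 0.12)
print("beam area       %.5f arcsec^2" % area)
print("pixels per beam %.2f" % (area / 0.04 ** 2))
print("1 Jy/beam =     %.4f Jy/pixel"
      % jy_per_beam_to_jy_per_pixel(1.0, 0.15, 0.12, 0.04))
```

Without the beam axes in the metadata none of this can be done, and
summing pixel values over a source gives a number that is wrong by the
number of pixels per beam.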

4.1 Query parameters

It would be good to try some actual examples to see what is covered
and whether terms are interpreted unambiguously by providers and
astronomers.  As ever, we should minimise the number of compulsory
terms and/or allow for defaults.

One comment - SNR is mentioned, I presume that the actual noise rms is
also a parameter?

Digression on nature of 3-D cubes

Cubes with two position axes (here, X Y) and one wavelength or
frequency axis (here, 3rd axis) can have a quite different character
depending on their instrumental origin.

Radio (up to sub-mm) and possibly other spectral line cubes

Spectral line radio cubes generated by interferometry and/or single
dish mosaicing are usually regularly sampled on the X and Y axes and
on the 3rd axis in frequency.  That is, the cubes can be visualised as
a stack of 2-D images, each at a small frequency increment with
respect to the previous image plane.

The same probably applies to optical interferometry cubes and to
stacks of X-ray images in different bands (but with different 3rd axis
units).

Radio etc. cubes are usually fairly well calibrated since this has to
be mostly completed prior to the Fourier transformation or mosaicing
stages. Hence, calibration metadata could be supplied, and a link to
the processing history should be available, but it is not usually
vital for science use.  Simple translation to improve the astrometry
(or to shift the rest frequency of a velocity cube to a different
line, without changing the convention), or linear rescaling to improve
the photometry, are likely to be the most that is needed, placing no
requirements apart from the basic descriptive metadata.

Radio cubes can be visualised by Aladin, GAIA etc. although more
specialised packages are needed for most quantitative measurements
(e.g. an HI spectrum could be extracted from a velocity cube using a
VO tool, to give the approximate range and central velocity of a
galaxy, but you would probably want to use AIPS or CASA etc. to
measure the optical depth and deduce the spin temperature -- although
this might be a service offered by the provider).

IFU

I have tried to follow the Euro3D specification and the Char example.
If I understand correctly, these can be visualised as an array of
spectra which are not necessarily regularly positioned in X and Y, nor
is the XY plane fully sampled, and the individual spectra may cover
different ranges. Calibration and editing data are supplied as part of
the data, I presume because some kind of conditional judgement
depending on the science goal is required in their application.  It
follows that specialised software is probably needed even to visualise
the data?

My only, rather selfish, concern is that it seems to me that
radio-type cubes are simpler to describe and query, and radio data
providers and queriers should not have to wrestle with too many
concepts which are not really needed or applicable.

Best wishes

Anita

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Dr. A.M.S. Richards, UK ARC Node/AstroGrid,
Jodrell Bank Centre for Astrophysics, Alan Turing Building, 
University of Manchester, M13 9PL
+44 (0)161 275 4124
and
MERLIN/VLBI National Facility, Jodrell Bank Observatory, 
Cheshire SK11 9DL, U.K. +44 (0)1477 571321 (tel) 571618 (fax)

"Socialism or barbarism?" Rosa Luxemburg (1915)