Handling data cubes in VO
Doug Tody
dtody at nrao.edu
Mon Jan 23 10:56:43 PST 2006
Hi Francois -
I think we are fairly close to agreement on this. The key points are:
1) The client needs some aid in understanding the query response,
in which a number of views or different types of access may be
provided for a single data object. This is what you call the
"access and views" data model. This information can go either
into the main QR table or into extension metadata; the main table
is preferred for anything required for routine access.
2) The actual access should, so far as possible, be based on standard
access methods which can be reused for many types of data.
What we are proposing for 2) is to generalize SIA to provide access to
multidimensional (ND) regularly sampled arrays with an associated WCS.
This is not a novel idea of course: it was suggested for DAL early on but
deferred to avoid trying to do too much with the initial implementations,
and much existing astronomical software already works this way.
In practical terms what this means is that SIA would be used for both 2D
and 3D data such as 2D sky projections, 3D velocity cubes, 3D time series
cubes, etc. By the time we have generalized things enough to support
3D we may as well implement a general WCS and support multidimensional
data too (the "ND" case), although in practice there is very little
actual data with more than 3 dimensions, not counting cases such as
degenerate axes or other techniques for adding metadata to describe
additional physical parameters such as polarization.
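As a rough illustration of what a "general WCS" for ND data involves in its simplest form, the Python sketch below models each axis with FITS-style CRPIX/CRVAL/CDELT values and maps pixel to world coordinates axis by axis. It is linear only: real sky axes also need a celestial projection (e.g. TAN) and possibly rotation, both deliberately omitted here, and the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LinearAxis:
    """One axis of a purely linear WCS, using FITS-style keywords."""
    ctype: str    # axis type label, e.g. "RA", "DEC", "VELO"
    crpix: float  # reference pixel (1-based, as in FITS)
    crval: float  # world coordinate at the reference pixel
    cdelt: float  # world-coordinate increment per pixel

    def pix_to_world(self, p: float) -> float:
        return self.crval + self.cdelt * (p - self.crpix)

    def world_to_pix(self, w: float) -> float:
        return self.crpix + (w - self.crval) / self.cdelt


@dataclass
class LinearWCS:
    """An N-dimensional WCS built from independent linear axes
    (no celestial projection, no rotation)."""
    axes: List[LinearAxis]

    def pix_to_world(self, pix: Sequence[float]) -> List[float]:
        if len(pix) != len(self.axes):
            raise ValueError("expected one pixel coordinate per axis")
        return [ax.pix_to_world(p) for ax, p in zip(self.axes, pix)]


# A 3D velocity cube: two sky axes (degrees) plus a velocity axis (km/s).
cube_wcs = LinearWCS([
    LinearAxis("RA", 1, 180.0, -0.001),
    LinearAxis("DEC", 1, 45.0, 0.001),
    LinearAxis("VELO", 1, -100.0, 2.0),
])
print(cube_wcs.pix_to_world([1, 1, 11]))  # [180.0, 45.0, -80.0]
```

The same structure covers 2D images, 3D cubes and higher-dimensional arrays simply by changing the number of axes, which is the point of going to the general ND case once 3D is supported.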
The essential core access methods are:
1) Whole dataset.
2) Cutout. The samples (pixels, voxels, etc.) are not modified. The
axes are not changed. Dimensional reduction is however possible.
This is simple and efficient, plus for many types of analysis we
do not want to resample the data since this degrades the data.
3) Reprojection. The samples are computed on the fly. Axes may
be transposed, the scale (sample size) may be modified, fully
general rotation is possible, dimensional reduction, etc.
A complex real-world example is a 2D cut through a 3D cube at an
arbitrary reference position and with an arbitrary orientation in
3D, i.e., a general 2D slice through a cube. Another example is a
projection through a cube which "squashes" 1 axis down to 1 pixel,
producing a 2D projection of the full cube. In general this is
a very powerful access method which can do all sorts of things.
In terms of interface, this case can be supported by defining the
WCS of the output data plus some image geometry parameters.
All of these are equally useful for both 2D and 3D data.
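The cutout and "squash" projection cases can be sketched in a few lines of Python on a cube stored as nested lists indexed cube[z][y][x]. This is an illustration under assumptions: index ranges are 0-based and half-open, the projection sums along z (a service might average or take a maximum instead), and none of these function names come from SIA. The cutout copies samples unchanged, whereas general reprojection would require resampling and is not shown.

```python
from typing import List, Tuple

Cube = List[List[List[float]]]  # indexed as cube[z][y][x]
Plane = List[List[float]]       # indexed as plane[y][x]
Range = Tuple[int, int]         # half-open (start, stop), 0-based


def cutout(cube: Cube, z: Range, y: Range, x: Range) -> Cube:
    """Extract a sub-cube. Voxel values are copied, never resampled,
    and the axes keep their original meaning and sampling."""
    return [[row[x[0]:x[1]] for row in plane[y[0]:y[1]]]
            for plane in cube[z[0]:z[1]]]


def drop_degenerate_z(cube: Cube) -> Plane:
    """Dimensional reduction: a cutout one plane thick in z becomes 2D."""
    if len(cube) != 1:
        raise ValueError("z axis is not degenerate")
    return cube[0]


def squash_z(cube: Cube) -> Plane:
    """Sum along z, collapsing that axis to a single pixel (a 2D projection)."""
    ny, nx = len(cube[0]), len(cube[0][0])
    return [[sum(plane[j][i] for plane in cube) for i in range(nx)]
            for j in range(ny)]


# Example cube with 2 planes of 3x4 samples; value encodes its own z, y, x.
cube = [[[100 * z + 10 * y + x for x in range(4)] for y in range(3)]
        for z in range(2)]
sub = cutout(cube, (1, 2), (0, 2), (1, 3))  # one plane thick in z
image = drop_degenerate_z(sub)
projection = squash_z(cube)
```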
A key thing to realize about SIA (even the current 1.0 version) is that
it supports both data discovery AND data access with the same uniform
interface. This is what we are looking for - ideally we want a simple,
powerful interface which can do all these things, rather than a lot of
special case solutions which rapidly increase complexity. We do need
metadata extension to be able to describe and understand complex data
objects (essentially subclassing the basic data model defined by a given
DAL interface), but the basic access methods should ideally work for an
entire class of data.
What I mean by supporting both data discovery and data access is the
following:
o Data discovery. In this case we pose a general query to a service
and it describes all the data known to the service which matches
the general criteria specified, including providing standard
characterisation, identification, access, and other metadata for
each candidate data object.
o Data access. Simple data access is provided by the data discovery
query - this is 99% of what is done in VO now. However we don't
have to stop there. Given the detailed dataset characterisation
provided by the discovery query (which tells us the axes, WCS,
sampling, etc.) it is possible to pose an "access query" which
specifies the exact data desired by the client, down to the level
of the exact WCS, sampling, dimensionality, file format, etc.
If desired this query can be posed against a single dataset
using the "datasetID" parameter returned by the discovery query,
providing real access to individual datasets.
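To make the discovery/access distinction concrete, here is a sketch of how a client might build the two kinds of query URL. POS, SIZE and FORMAT are SIA 1.0 parameters; datasetID is the parameter proposed above and is not part of SIA 1.0, and the base URL is a placeholder. Any further access parameters (output WCS, sampling, etc.) are passed through unchecked, since their names have not been defined.

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/sia"  # placeholder service endpoint


def discovery_query(ra: float, dec: float, size: float, fmt: str = "ALL") -> str:
    """General query: describe all data matching a region of the sky."""
    params = {"POS": f"{ra},{dec}", "SIZE": str(size), "FORMAT": fmt}
    return BASE_URL + "?" + urlencode(params, safe=",")


def access_query(dataset_id: str, ra: float, dec: float, size: float,
                 fmt: str = "image/fits", **extra: str) -> str:
    """Access query against a single dataset, identified by the (proposed,
    non-standard) datasetID returned by a discovery query."""
    params = {"datasetID": dataset_id, "POS": f"{ra},{dec}",
              "SIZE": str(size), "FORMAT": fmt}
    params.update(extra)  # hypothetical extra access parameters, unchecked
    return BASE_URL + "?" + urlencode(params, safe=",")
```

A client would first send the discovery query, pick a dataset identifier from the returned table, and then pose an access query against that dataset with the exact region and format it wants.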
In effect, for data access the query becomes nothing more than an access
method (function) with parameters. There is no need for a templated URL
because we already have a function with parameters which can specify things
down to the level addressed by the templated URL. We just need to know the
capabilities of the service - whether it supports cutouts or reprojection,
a range of output formats, and so forth.
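A client that knows the service capabilities could validate an access request before sending it. The sketch below assumes a hypothetical capabilities record with cutout and reprojection flags and a set of supported output formats; no such structure is defined by SIA 1.0, and whole-dataset access is assumed to be always available.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


# Hypothetical capabilities record; SIA 1.0 defines nothing like this.
@dataclass(frozen=True)
class ServiceCapabilities:
    cutout: bool = False
    reprojection: bool = False
    formats: FrozenSet[str] = frozenset({"image/fits"})


@dataclass(frozen=True)
class AccessRequest:
    method: str              # "whole", "cutout" or "reproject"
    fmt: str = "image/fits"  # requested output MIME type


KNOWN_METHODS = ("whole", "cutout", "reproject")


def check_request(caps: ServiceCapabilities, req: AccessRequest) -> List[str]:
    """Return the reasons the service cannot honour req (empty if it can)."""
    problems = []
    if req.method not in KNOWN_METHODS:
        problems.append(f"unknown access method: {req.method}")
    elif req.method == "cutout" and not caps.cutout:
        problems.append("service does not support cutouts")
    elif req.method == "reproject" and not caps.reprojection:
        problems.append("service does not support reprojection")
    if req.fmt not in caps.formats:
        problems.append(f"unsupported output format: {req.fmt}")
    return problems
```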
In principle we could go one step further and have some sort of "direct
access" parameter which could allow the generated virtual dataset to
be returned directly, and we would have achieved a general version of
what the templated URL seeks to provide. However, by using the standard
method it would still be possible to separate description (the query)
from access, so that we can get standard metadata for individual datasets,
provide standard support for asynchronous data staging, and so forth.
If we bypass the interface with a templated URL, probably none of these
things can be done in a standard way.
In summary I think what is needed is to generalize basic access as described
above, plus add the capability to describe more complex objects. The latter
is still needed to do things like associate preview images or velocity field
images with the data, associate multi-band groupings, etc. Basic data
access should be provided directly by the service in a uniform, standard
fashion, based upon a description of the service capabilities, and logical
association of related data products should be provided by the sort of
"access and views" model you describe. We may need to rethink some
things such as how output formats are handled in the current SIA, making
it possible to move some of this to the service capabilities description
and to the query parameters for a detailed "access" type query, rather
than enumerating every possibility in the query response (unless the
client requests this - it is still needed for some simple use-cases).
- Doug