WD-AccessData-1.0-20140312

Douglas Tody dtody at nrao.edu
Wed Sep 24 17:51:05 CEST 2014


On Mon, 22 Sep 2014, François Bonnarel wrote:

>      4 ) Marco and I organized a session at the interop in Banff dedicated 
> to The AccessData/SIV2/Datalink trilogy. I really encourage people to give
> talks or present demo there under the AccessData topic.
>            I specially encourage DataCube providers to give their feelings 
> experiences and thoughts as well as CSP members and other TCG members. Before 
> the meeting or during this session.

For AccessData what is especially important is what functionality is
required by client analysis programs, e.g., for advanced visualization
and analysis of large cube datasets.

The DAL interfaces up until now have been mostly concerned with
discovery and retrieval of whole datasets (except for the subtlety of
automated virtual data discovery in queryData, which has been there from
the beginning).  AccessData is different in nature, as it provides
direct interactive access to an arbitrarily large remote dataset, and
may provide some pretty advanced capabilities for filtering, subsetting,
or transforming the dataset.

For this to be successful, we need to provide the right capabilities to
client applications.  So it would be good to hear about the capabilities
required for advanced scientific analysis of large datasets - we need to
understand the consumer perspective, as well as the data provider
perspective.

So far there has been little discussion of actual accessData science
capabilities - we mostly argue about how to represent parameters in the
interface, rehashing matters that were discussed years ago.

The kinds of data access capabilities we should be looking at include
things like:

     o   Filtering, e.g., the multi-dimensional cutout expressed in world
         coordinates.  Filtering however is not necessarily limited to a
         simple cutout region; it could also be a list of ranges along
         the spectral or time axes.  One might, for example, filter the
         spectral or time axis and then compute a 2D projection of the
         filtered dataset.

     o   Pixel or array-space operations, e.g., extraction of an image
         section, or resampling along an axis (block operations, sum/avg
         etc.).  This is required for analysis applications that
         repeatedly access a single dataset, e.g., for visualization and
         analysis of images or large cubes.

     o   WCS reprojection or mosaicing, where the coverage and sampling
         of the output virtual image is fully specified by the client.
         Commonly used to match data from multiple sources.  May also be
         used to drive on-the-fly imaging.

     o   Slice at an arbitrary position and orientation in a
         multi-dimensional dataset, possibly combined with a 2D
         projection of the result.  This is like a filter or cutout, but
         need no longer be aligned with the sampling axes.

     o   Application of a function or transform, e.g., moment computation,
         spectral index, etc.  This one is somewhat open-ended, hence an
         extensible mechanism is needed.  There are two types of
         functions: well-defined generic or mathematical operations, and
         algorithms.  The former can be standardized but all we can do is
         classify the latter, and provide a standard method to describe
         and call them from applications.
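
As an illustration of the first two items above, here is a minimal sketch
in plain Python of filtering by a list of ranges along the spectral axis,
projecting the filtered cube to 2D, and block averaging along an axis.
The cube layout (nested lists indexed [channel][y][x]), the function
names, and the sample values are all assumptions made for this example;
they are not part of any AccessData proposal.

```python
from typing import List, Sequence, Tuple

Plane = List[List[float]]  # one 2D image plane, indexed [y][x]
Cube = List[Plane]         # hypothetical layout, indexed [channel][y][x]


def select_channels(axis_values: Sequence[float],
                    ranges: Sequence[Tuple[float, float]]) -> List[int]:
    """Return indices of channels whose world coordinate lies in any (lo, hi) range."""
    return [i for i, v in enumerate(axis_values)
            if any(lo <= v <= hi for lo, hi in ranges)]


def project_sum(cube: Cube, channels: Sequence[int]) -> Plane:
    """Collapse the selected channels to a 2D image by summing along the spectral axis."""
    ny, nx = len(cube[0]), len(cube[0][0])
    image = [[0.0] * nx for _ in range(ny)]
    for c in channels:
        for y in range(ny):
            for x in range(nx):
                image[y][x] += cube[c][y][x]
    return image


def block_average(values: Sequence[float], factor: int) -> List[float]:
    """Average consecutive blocks of `factor` samples; a trailing partial block is dropped."""
    nblocks = len(values) // factor
    return [sum(values[i * factor:(i + 1) * factor]) / factor
            for i in range(nblocks)]


# Made-up 4-channel, 2x2 cube: every pixel in channel c has the value c + 1.
freqs = [1.0, 1.1, 1.2, 1.3]  # illustrative spectral coordinates (GHz)
cube = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]

# Filter on two spectral ranges, then project the filtered cube to 2D.
channels = select_channels(freqs, [(1.0, 1.05), (1.2, 1.3)])
image = project_sum(cube, channels)
```

A remote service would do the same work on the server side, so that only
the small projected image crosses the network rather than the whole cube.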

The accessData model I presented at the Madrid interop addresses all of
this functionality, although I haven't tried to propose a parameterized
interface to do this (my VAO prototype does provide the filter and pixel
space terms).  Although the full thing sounds complicated, and it is,
the functionality required is already present in most large astronomy
packages.

A simple way to think about what is required to support real
applications is to look at current applications that do these things on
local (disk resident) datasets.  Our ongoing effort at NRAO to modify
the CASA viewer to work on remote datasets is an example of this, but
there are other such use cases (the Viewer is one example of a cube
visualization and analysis tool).  The basic requirement for an
interface like accessData is that it provide what is required to take
such an application and make it work on remote data in a distributed
fashion, without losing important capabilities.
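
One way to picture this requirement is an application that depends only
on a small data-access interface, so that a local backend can later be
swapped for a remote one.  The sketch below is a hypothetical
illustration in Python: the names CubeSource, LocalCube, read_section and
section_mean are invented here and do not describe the CASA viewer or any
existing accessData interface.

```python
from typing import List, Protocol, Tuple


class CubeSource(Protocol):
    """Hypothetical minimal interface an analysis tool needs from a cube."""

    def shape(self) -> Tuple[int, int, int]:
        ...

    def read_section(self, chan: int, y0: int, y1: int,
                     x0: int, x1: int) -> List[List[float]]:
        ...


class LocalCube:
    """Backend for an in-memory cube stored as nested lists [channel][y][x]."""

    def __init__(self, data: List[List[List[float]]]) -> None:
        self._data = data

    def shape(self) -> Tuple[int, int, int]:
        d = self._data
        return (len(d), len(d[0]), len(d[0][0]))

    def read_section(self, chan: int, y0: int, y1: int,
                     x0: int, x1: int) -> List[List[float]]:
        # Half-open pixel ranges [y0, y1) and [x0, x1) within one channel.
        return [row[x0:x1] for row in self._data[chan][y0:y1]]


def section_mean(source: CubeSource, chan: int,
                 y0: int, y1: int, x0: int, x1: int) -> float:
    """Application code: uses only the interface, never the storage details."""
    section = source.read_section(chan, y0, y1, x0, x1)
    values = [v for row in section for v in row]
    return sum(values) / len(values)


# Made-up 2-channel, 2x2 cube used only to exercise the sketch.
data = [[[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]]]
local = LocalCube(data)
mean_chan1 = section_mean(local, 1, 0, 2, 0, 2)
```

A remote backend would implement the same two methods by sending the
section request to a service; section_mean would not need to change.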

         - Doug

