Handling data cubes in VO

Thu Jan 12 11:34:52 PST 2006

On Wed, 11 Jan 2006, Anita Richards wrote:

From me earlier:
>> The software discovery bit is merely describing components in the registry.
>> This is the easy part.  The hard part is defining the execution framework,
>> and container-component interface.

> Yes indeed, although that is exactly what we are working on in the 
> Euro-VOTech and using the AstroGrid CEA - and it should become easier now 
> that service descriptions are being handled more thoroughly by the Registry 
> standards.
>
> In fact (digressing even more) the other thing which seems to be helping a 
> lot is adoption of python as a common scripting language -
> AstroGrid CEA <=> python <=>ParselTongue <=> AIPS seems to work fine, 
> including Registry entries etc. - hope to loose it on the world soon! (but so 
> far only 2-D).

I am glad to hear you guys are making progress on this.  I agree that CEA
provides a good way to plug applications into grid workflows such as
those AstroGrid provides.

The approach you describe of using Python-enabled AIPS (ParselTongue)
to build VO services is exactly the sort of thing I think is needed, and
is consistent with what we have been working on within NVO and Opticon
as well.  That is, conventional data processing and analysis software,
which can also be used for interactive desktop or pipeline processing,
is wrapped to produce services which can be called from the VO.
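
As a rough illustration of the wrapping idea, here is a minimal sketch in
Python.  Everything in it is hypothetical: run_task, handle_request and
LEGACY_TASK are names invented for this example, and the "legacy task" is
a Python one-liner standing in for a real AIPS or IRAF call.

```python
import json
import subprocess
import sys
from typing import Dict, List


def run_task(argv: List[str], timeout: float = 60.0) -> Dict[str, object]:
    """Run a command-line task and return a JSON-serializable result record."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return {
        "status": "ok" if proc.returncode == 0 else "error",
        "returncode": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def handle_request(argv: List[str]) -> str:
    """What a service endpoint might return: the result record as JSON text."""
    return json.dumps(run_task(argv))


# Stand-in for a legacy task (assumption): a Python one-liner, not real AIPS/IRAF.
LEGACY_TASK = [sys.executable, "-c", "print('task finished')"]
```

The same run_task function could be called from a desktop script, a
pipeline, or a Web service or CEA front end, which is the point of
wrapping the software once at this level.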

A similar existing example of this from NVO has been done by Mike
Fitzpatrick at NOAO.  In this case a new application is scripted up in IRAF
and interfaced to VO as a Web service.  Applications can also be exposed via
a Web browser interface.  See for example  http://nvo.noao.edu/wcsfixer/.
An application such as this could also easily expose a CEA interface.

The key point here is that for applications to be useful in a VO/Grid
type of environment one will often need to write a new application, rather
than merely interface an existing one.  The existing tasks or components
we have in data analysis systems tend to be finer grained, and make too
many assumptions about localized processing (e.g., the high-bandwidth
i/o capabilities of the environment they run in) to be used directly.
The conventional scripted, component-based data processing environment
running efficiently in a LAN is however well suited for constructing VO
applications (or services), and the same software can also be used for
desktop analysis and for pipelines.

So, we are starting to see data processing and analysis systems such as
ParselTongue (AIPS), IRAF, and probably others as well, being used to
construct VO services.  In the Opticon/NVO effort we are trying to go
one step further and develop some standard infrastructure beneath the
level of CEA.  Hence what we have is:

    [CEA, WS, etc.]
                        (interface to the outside world)
        VO-oriented application or service
                        (e.g., written in Python or Java)
            Execution Framework
                        (manages distributed processing, efficient, scalable)
                Container
                        (defines framework interface as seen by a component)
                    Component
                        (computational code, e.g., from a legacy system)

Some key interfaces are the component-container interface, the parameter
mechanism, and the binding of components into the applications layer, e.g.,
Python or Java (and potentially other things like Ruby in the future).
The wide adoption of Python as the scripting layer is also a type of
standard.

If we only standardize things down to the Python level that is still
quite useful, but standardizing things down to the level of the component
interface would allow us to mix and match components from different
origins within the same application.  That won't always be possible due
to semantic inconsistencies, but there are many cases where it would be
quite useful to be able to do so.
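
To make the component-container interface and the parameter mechanism a
bit more concrete, here is a minimal sketch in Python.  It is only an
illustration under my own assumptions, not the actual Opticon/NVO design:
the Component, Container and pipeline names, and the Scale and Clip
components, are all invented for this example.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class Component(ABC):
    """Hypothetical component interface: computational code behind a uniform API."""

    # Declared parameters with defaults: a stand-in for the parameter mechanism.
    params: Dict[str, Any] = {}

    @abstractmethod
    def run(self, data: Any, **params: Any) -> Any:
        ...


class Scale(Component):
    """Pretend this wraps a routine from one legacy system."""
    params = {"factor": 1.0}

    def run(self, data, **params):
        return [x * params["factor"] for x in data]


class Clip(Component):
    """Pretend this wraps a routine from a different legacy system."""
    params = {"lo": 0.0, "hi": 1.0}

    def run(self, data, **params):
        return [min(max(x, params["lo"]), params["hi"]) for x in data]


class Container:
    """Hypothetical container: resolves parameters, then invokes the component."""

    def __init__(self, component: Component):
        self.component = component

    def execute(self, data, **overrides):
        unknown = set(overrides) - set(self.component.params)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        resolved = {**self.component.params, **overrides}
        return self.component.run(data, **resolved)


def pipeline(data, steps):
    """Application layer: chain (container, overrides) pairs in order."""
    for container, overrides in steps:
        data = container.execute(data, **overrides)
    return data
```

Because both components are seen only through the same run() signature
and parameter declaration, the application layer can chain them without
knowing where each one came from, e.g.
pipeline(values, [(Container(Scale()), {"factor": 2.0}), (Container(Clip()), {})]).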

ParselTongue is an example of a simple execution framework (entirely
Python-based in this case).  The CASA (formerly AIPS++) guys have a completely
different one, but both bind legacy computational code into Python.  PyRAF
provides something similar for IRAF, as does PyMIDAS for MIDAS.

Also needed is what we have been calling the VO-Client, which is a
client-side interface to the DAL services, SkyNode, and the registry
query interface, for VO data access from within the analysis environment.
This would probably include an integrated VOStore to cache data retrieved
from the VO and to publish new data back to it.  The VO-Client would be
integrated into the local data analysis environment and would provide
efficient access to cached data for local computation.  I know that
AstroGrid already has something like this, and we are also working on
this within NVO.
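
The caching side of this can be sketched in a few lines of Python.  This
is a hypothetical illustration only, not the VO-Client or VOStore
interface: CachingClient and its fetch argument are invented names, and
the actual remote access (a DAL query, say) is left to whatever function
the caller passes in.  Publishing data back to the VO is not covered.

```python
import hashlib
from pathlib import Path
from typing import Callable


class CachingClient:
    """Hypothetical client-side cache in front of a remote data fetcher."""

    def __init__(self, cache_dir, fetch: Callable[[str], bytes]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # Injected fetcher (assumption): a real client would call a VO service here.
        self.fetch = fetch

    def _path_for(self, url: str) -> Path:
        # Key each cached file by a hash of its URL to get a safe file name.
        return self.cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()

    def get(self, url: str) -> bytes:
        path = self._path_for(url)
        if path.exists():
            # Cache hit: serve from local disk, no remote access.
            return path.read_bytes()
        data = self.fetch(url)
        path.write_bytes(data)
        return data
```

The first get() for a given URL goes to the remote service; later calls
for the same URL are served from local disk, which is what makes repeated
local computation on VO data efficient.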

 	- Doug