Reflections on SIA V2 and generic cutout services

Douglas Tody dtody at nrao.edu
Wed May 25 02:25:01 CEST 2016


Hi Francois -

This all sounds reasonable (see previous message for the full text);
I just want to expand upon the last bit.  This is of course beyond
the scope of the first version of SODA in preparation now, but I
think something like this is required eventually to provide useful
science data access for multiple classes of data.

On Wed, 25 May 2016, François Bonnarel wrote:

> Le 09/05/2016 02:49, Douglas Tody a écrit :
>> [...] But to support the
>> real world astronomical science research community we still need to
>> provide advanced capabilities for direct remote access to specific
>> classes of astronomical data, to enable distributed data analysis.
>> The use-cases, requirements, and capabilities required will differ
>> for each class of data.

> OK

>> A possible general solution here might be for SODA to define a generic
>> container service interface for data access services, providing
>> a generic WCS-based cutout mechanism as in the current proposal,
>> but enabling data-specific "plugins", based upon data models and
>> data-specific access methods for each class of data.  This could
>> support either standardized or experimental/domain-convention
>> extensions for advanced data access, developed by sectors of the
>> community, based upon the common data access framework.  This would
>> allow the system wonks to focus on the form of the interfaces and on
>> issues such as how service parameters are composed and represented,
>> while domain experts focus on capabilities for advanced data access
>> to actually support distributed/scalable end-user data analysis.

> I think these are good reflections for SODA 1.1. The key point for me in this
> "plugin" idea is that we are probably able to describe most of the simple and
> advanced data access operations in terms of data model attributes:
>
>     The result of a data access operation is a dataset which can be
> described by new values of data model attributes (ObsCore and, later, the
> Cube data model).
>
> In other words, data models can help to build a "data access" description
> language.

Yes, of course.  And the data model will differ somewhat for each
class of data, if sufficiently detailed, especially if extended to
cover access methods.
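
To illustrate François's point, here is a minimal Python sketch, assuming a dataset is summarized by a handful of ObsCore-style attributes (`s_ra`, `s_dec`, `s_fov`, `em_min`, `em_max`, with spectral bounds as wavelengths in metres). The helper `describe_band_cutout` and the example values are hypothetical; the sketch shows only that the result of a spectral cutout can be described as the same record with new attribute values, whereas a detailed model for a specific class of data would carry many more attributes and access-method details.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ObsCoreSummary:
    """A few ObsCore-style attributes (illustrative subset only)."""
    obs_id: str
    s_ra: float    # degrees
    s_dec: float   # degrees
    s_fov: float   # degrees
    em_min: float  # metres (wavelength)
    em_max: float  # metres (wavelength)


def describe_band_cutout(ds: ObsCoreSummary, band_min: float,
                         band_max: float) -> ObsCoreSummary:
    """Hypothetical helper: describe a BAND-style cutout's result.

    The output is the input description with its spectral coverage
    intersected with the requested interval; other attributes are unchanged.
    """
    lo = max(ds.em_min, band_min)
    hi = min(ds.em_max, band_max)
    if lo >= hi:
        raise ValueError("requested band does not overlap the dataset")
    return replace(ds, em_min=lo, em_max=hi)


# Hypothetical example values.
cube = ObsCoreSummary("cube-001", 150.1, 2.2, 0.5, 2.6e-3, 3.0e-3)
cut = describe_band_cutout(cube, 2.7e-3, 3.5e-3)
print(cut.em_min, cut.em_max)  # 0.0027 0.003
```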

The main point is that the service framework (SODA "container") would
provide the generic capabilities common to all classes of data, while
each plugin would provide what is required for a specific class of
data.  Spectral or time series data access might be much simpler than,
for example, advanced cube access, which could be far more complicated
and might need to support integration with back-end functionality
such as CASA for radio data or CIAO for X-ray/Chandra data.  Large cube
data access is likely the most demanding use case currently driving
the design of VO distributed/scalable data access capabilities:
individual cubes may be hundreds of GB in size, and the required data
access functionality is complex, specific to the class of data, and
requires integration with back-end data analysis systems developed
independently of VO.
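
A minimal Python sketch of the container/plugin split, under stated assumptions: the class names, the use of `dataproduct_type` as the dispatch key, and the `SPECRES` parameter are all hypothetical, and the generic WCS-based cutout is stubbed out as a string. The container validates a subset of the SODA parameters (`ID`, `POS`, `BAND`, `TIME`, `POL`) and performs the generic step; a registered plugin declares any extra parameters it accepts and refines the result. A real cube plugin would instead delegate to a back end such as CASA or CIAO, which is not shown.

```python
from typing import Dict, Optional, Protocol


class AccessPlugin(Protocol):
    """Hypothetical contract for a data-class-specific plugin."""

    def extra_params(self) -> Dict[str, str]:
        """Names and descriptions of parameters beyond the generic ones."""
        ...

    def process(self, result: str, params: Dict[str, str]) -> str:
        """Refine the container's generic result using class-specific logic."""
        ...


class SpectrumPlugin:
    """Toy plugin; SPECRES is a made-up parameter, not part of SODA."""

    def extra_params(self) -> Dict[str, str]:
        return {"SPECRES": "target spectral resolution (hypothetical)"}

    def process(self, result: str, params: Dict[str, str]) -> str:
        return f"{result}, resampled to R={params.get('SPECRES', 'native')}"


class Container:
    """Generic service: common parameters and cutout, then plugin dispatch."""

    GENERIC = {"ID", "POS", "BAND", "TIME", "POL"}

    def __init__(self) -> None:
        self._plugins: Dict[str, AccessPlugin] = {}

    def register(self, dataproduct_type: str, plugin: AccessPlugin) -> None:
        self._plugins[dataproduct_type] = plugin

    def handle(self, dataproduct_type: str, params: Dict[str, str]) -> str:
        if "ID" not in params:
            raise ValueError("ID is required")
        plugin: Optional[AccessPlugin] = self._plugins.get(dataproduct_type)
        # Accept generic parameters plus whatever the plugin declares.
        allowed = set(self.GENERIC)
        if plugin is not None:
            allowed |= set(plugin.extra_params())
        unknown = sorted(set(params) - allowed)
        if unknown:
            raise ValueError(f"unsupported parameters: {unknown}")
        # Generic WCS-based cutout step, stubbed out as a description string.
        result = f"cutout of {params['ID']}"
        return plugin.process(result, params) if plugin is not None else result


svc = Container()
svc.register("spectrum", SpectrumPlugin())
print(svc.handle("image", {"ID": "img-1", "POS": "CIRCLE 10 20 0.1"}))
print(svc.handle("spectrum", {"ID": "sp-7", "BAND": "5e-7 6e-7", "SPECRES": "2000"}))
```

Keeping parameter validation in the container gives every class of data the same error handling, while each plugin only has to state what it adds.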

Since the generic data access service framework would be common to
all classes of data, it would be justified to fully develop system
capabilities such as interface introspection and discovery; a real
interface introspection or parameter description mechanism might
actually need to be more complex than just the 3-factor/VOTable
approach.  I agree that DataLink, DALI, etc. already address part
of this.
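
As a sketch of what a richer parameter description might look like, the Python below assumes each parameter carries a datatype, unit, UCD, valid range, and the component that provides it, and lets the container merge its own generic parameters with those a plugin declares into one introspection record. The `ParamDesc` fields, `describe_service`, and `SPECRES` are hypothetical; the units follow SODA's conventions (BAND in metres, TIME as MJD in days), but this is not the DataLink or DALI service-descriptor format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ParamDesc:
    """Hypothetical parameter description, richer than a bare name."""
    name: str
    datatype: str
    unit: Optional[str] = None
    ucd: Optional[str] = None
    valid_range: Optional[Tuple[float, float]] = None
    required: bool = False
    provided_by: str = "container"


# Parameters the generic container would describe once for every data class.
GENERIC_PARAMS: List[ParamDesc] = [
    ParamDesc("ID", "char", ucd="meta.id;meta.dataset", required=True),
    ParamDesc("BAND", "double", unit="m", ucd="em.wl"),
    ParamDesc("TIME", "double", unit="d", ucd="time.epoch"),
]


def describe_service(plugin_params: Iterable[ParamDesc]) -> List[dict]:
    """Merge container and plugin parameters into one introspection record.

    A plugin may add parameters but may not redefine generic ones.
    """
    extra = list(plugin_params)
    generic_names = {p.name for p in GENERIC_PARAMS}
    clashes = sorted(p.name for p in extra if p.name in generic_names)
    if clashes:
        raise ValueError(f"plugin redefines generic parameters: {clashes}")
    return [asdict(p) for p in GENERIC_PARAMS + extra]


# Hypothetical parameter declared by a spectrum plugin.
spectrum_params = [
    ParamDesc("SPECRES", "double", valid_range=(100.0, 100000.0),
              provided_by="spectrum plugin"),
]
print(json.dumps(describe_service(spectrum_params), indent=2))
```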

The container-component/plugin architecture is normally an
implementation issue, but if it reflects the service architecture,
support would be required at the standards level, e.g., to define what
functionality is provided by the container and by each class of plugin.
It would be up to the implementation to define the details of the
physical implementation including language specifics.

I do not see how VO can address real world distributed data access
for multiple classes of data without something like this.  Either the
provided services will have too limited functionality to support
real distributed data analysis, or the time required to produce the
standards would exceed the patience (and possibly practical lifetime)
of the user community and individual projects.

For an effective effort to define standard "plugin" data access
functionality for a specific class of data, it would be critical to
involve the broader user community, in particular the people and
projects developing the analysis software used by astronomers.
This might be beyond the scope of the current IVOA structure, but it
is possible.

 	- Doug

