DAL2 interface consistency (was [VOSI] Re: TAP 1.0: sync vs async)
Douglas Tody
dtody at nrao.edu
Mon Jul 20 17:36:34 PDT 2009
Hi -
Thinking about this rehash of the TAP/DAL2 standards a bit more, I want
to expand on the importance of interface consistency, particularly
in the DAL interfaces. These recent discussions have been focused
very much on detailed technical issues such as how we provide async
and how hard it is to program a redirect. But perhaps we are losing
track of what we need to provide to the user.
For the most part, VO middleware is system software which the
user never sees. The DAL interfaces differ from most VO
middleware in that they are the main interface used to access science data,
in particular by client applications which are often written by users
(e.g. the folks who come to our summer schools).
So let's stand back and look at this from the point of view of a user
trying to write or adapt science applications to talk to the VO.
Such a user will primarily see things like:
- The types of services provided and the implicit classification
of data (catalog/table, image, spectrum, etc.). Such a
classification is fundamentally object oriented, i.e., a class
structure. It should resemble common practice within astronomy.
- The operations one can perform on each type of data, defining the
functionality provided by the data access services.
- The data models and metadata used to describe data (most of
the effort for the user in fact involves this, at least for
data access/analysis).
- The data objects which are returned.
A science user will rarely see things like the details of how
asynchronous operations are performed, or authorization etc.; usually
some higher level interface will need to be provided.
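As a rough illustration of such a higher level interface, the Python sketch below hides the choice between the /sync and /async endpoints behind a single call. It assumes a UWS-style job flow (a POST to /async creates a job and redirects to it, PHASE=RUN starts it, and the phase resource is polled) and a TAP-style result at results/result. The `run_query` function and the injected `transport` callable are hypothetical names, not part of any standard or library.

```python
import time
from typing import Callable, Dict, Tuple

# A transport takes (method, url, params) and returns (status, headers, body).
# It is injected so the sketch runs without a network; a real client would wrap
# an HTTP library here. This signature is hypothetical.
Transport = Callable[[str, str, Dict[str, str]], Tuple[int, Dict[str, str], str]]

# UWS phases after which a job will not change state on its own.
TERMINAL_PHASES = {"COMPLETED", "ERROR", "ABORTED"}


def run_query(base_url: str, params: Dict[str, str], transport: Transport,
              use_async: bool = False, poll_seconds: float = 1.0) -> str:
    """Run a query and return the result body, hiding sync vs async from the caller."""
    if not use_async:
        # Synchronous: one request, and the response body is the result.
        _, _, body = transport("GET", base_url + "/sync", params)
        return body

    # Asynchronous (UWS style): creating a job answers with a redirect to the job URL.
    _, headers, _ = transport("POST", base_url + "/async", params)
    job_url = headers["Location"]
    transport("POST", job_url + "/phase", {"PHASE": "RUN"})

    # Poll the job phase until it reaches a terminal state.
    while True:
        _, _, phase = transport("GET", job_url + "/phase", {})
        phase = phase.strip()
        if phase in TERMINAL_PHASES:
            break
        time.sleep(poll_seconds)

    if phase != "COMPLETED":
        raise RuntimeError(f"job {job_url} ended in phase {phase}")
    # Assumed result location, following the TAP convention of a result named "result".
    _, _, body = transport("GET", job_url + "/results/result", {})
    return body
```

A real client would supply a transport built on an HTTP library; injecting it keeps the sketch testable offline and lets the same call site serve both modes, which is the point: the science user only ever sees `run_query`.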
So with that in mind, let's go back and look at some of the issues which
have been discussed in the recent mail.
- Object model. Primary in the user interface. Since data
is inherently OO with a class structure (e.g., an image or
spectrum is an object) this is quite important. Data access
involves a data object with defined properties (metadata) and
operations which can be performed upon the object. In VO we are
dealing with virtual data not just static files or resources,
hence these operations can be nontrivial. REST has limited
capabilities for virtual data in that files can be dynamically
generated, but only if everything can be modeled as a resource
(a file-like hierarchy essentially). REST-like interfaces with
parameters can, however, work, since this basically provides a
class-with-methods capability. It is important to observe REST
semantics at the HTTP level for this to work well with the Web.
- Interface consistency. Since what we have is a class hierarchy
with a high degree of inheritance of both functionality and
metadata, 90% of the service interface is common to each member
of the family of services. Since users often program directly
at the HTTP level, they see these interfaces and write tools to
use these interfaces, and it is important to provide consistency
at this level. The details of what the interface looks like
are largely arbitrary but need to support the object model and
need to be standardized, otherwise we fail to provide consistency
(hence for DAL we tried to do this 3-4 years ago prior to the
roll out of all the DAL2 service interfaces).
- Sync/async. This is an important capability which users will
have to deal with at some level; however, it has nothing to do
with science data. Few users will need to understand the details
of how we define this interface, e.g., in terms of /sync and
/async HTTP endpoints, or how UWS is modeled. The resource
model works well for things like kernel/process/job state,
so it is reasonable to use at this level. For DAL it is important
for the services to be compliant with the GWS standards so that
we can share code, but the interface can look different from
what DAL defines for OO data access.
- Redirect of a static URL for load balancing. It takes more
time for us to discuss this here than it would take to write
the code to provide this at the service rather than applications
server level. In any case, we will never have a load problem with
something like getCapabilities or getAvailability. Where we
will have load issues is with long-running operations, for
which we already have UWS, which does not rely upon an
applications server to automate load balancing.
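To give a sense of how little code a service-level redirect needs, here is a minimal Python sketch of a WSGI application that sends each request for a static resource (such as availability or capabilities) to the next of several mirrors in round-robin order. The mirror URLs and the `make_redirect_app` name are hypothetical, and round-robin is simply the easiest policy to show, not a recommendation.

```python
import itertools
from typing import Callable, Iterable, List


def make_redirect_app(mirrors: List[str]) -> Callable:
    """Return a WSGI app that redirects each request to the next mirror in turn."""
    targets = itertools.cycle(mirrors)  # simple round-robin over the mirror list

    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Keep the requested path and query string when rewriting to the mirror.
        path = environ.get("PATH_INFO", "")
        query = environ.get("QUERY_STRING", "")
        location = next(targets) + path + ("?" + query if query else "")
        # 307 asks the client to repeat the same request, unchanged, at Location.
        start_response("307 Temporary Redirect",
                       [("Location", location), ("Content-Length", "0")])
        return [b""]

    return app
```

For a quick local test, the app can be served with `make_server("", 8000, app).serve_forever()` from the standard library's `wsgiref.simple_server`.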
In summary the object model, service functionality provided, and
interface consistency are primary for user-developers whereas the
more technical aspects of the VO middleware, while critical, will
rarely be seen by scientific users of the VO.
Getting back to the interface consistency issue, I am reminded of a
conversation we had with science users at a recent NVO summer school.
They were complaining that cone search uses RA,DEC,SR whereas SSA and
DAL2 use POS,SIZE, which is inconsistent. Our response was mainly
that this was due to the evolution of VO and that with DAL2 this
would all be resolved, at least for one generation of interfaces,
and everything would be standardized across all the second generation
interfaces so far as possible.
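The inconsistency the students complained about is easy to see when the same positional query is written in both conventions. This Python sketch assumes all values are in decimal degrees and, following SSA, treats SIZE as the diameter of the search region (so SIZE = 2*SR); the function names and URLs are hypothetical.

```python
from urllib.parse import urlencode


def cone_search_params(ra: float, dec: float, radius: float) -> dict:
    """First-generation Cone Search style: RA, DEC and search radius SR (degrees)."""
    return {"RA": ra, "DEC": dec, "SR": radius}


def pos_size_params(ra: float, dec: float, radius: float) -> dict:
    """SSA/DAL2 style: POS="ra,dec" and SIZE (degrees).

    SIZE is assumed here to be the diameter of the search region, as in SSA.
    """
    return {"POS": f"{ra},{dec}", "SIZE": 2 * radius}


def query_url(base_url: str, params: dict) -> str:
    # Both conventions are plain HTTP GET parameters appended to the service URL.
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode(params)
```

A client that talks to both generations of services ends up carrying a translation layer like this, which is exactly the kind of burden a consistent DAL2 parameter set is meant to remove.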
It would be very hard to explain to either such users or to the funding
agencies that all the interfaces look different because the mix of
people involved varied in each instance (or whatever the cause) and
thus the IVOA failed to successfully address such a basic concern.
Can we have a successful multi-year effort which addresses such
concerns or is it really a random walk depending upon whatever group
of people are actively involved in the discussions at a given time?
Cheers,
- Doug