roadmap 2010-2011

Douglas Tody dtody at NRAO.EDU
Sun Sep 12 14:12:01 PDT 2010


On Fri, 10 Sep 2010, Douglas Tody wrote:
> On Fri, 10 Sep 2010, Patrick Dowler wrote:
>> 1. remove all query parameters from SIAv2 and actively work on PQL
>
> As has been noted previously, the typed DAL services support actual
> data access (generation of virtual data), not just discovery (unlike
> ObsTAP for example which is a pure discovery/description interface).
> A generic PQL interface cannot support virtual data generation since
> this requires knowledge of the type of data being accessed.  Hence,
> PQL cannot replace the typed parameters in the DAL interfaces - this
> is fundamental in my opinion.  If all we want to do is data discovery
> and whole file data retrieval, then ObsTAP can be used instead of
> the typed data access interfaces.

I want to expand upon this as it is a key point affecting all the
DAL interfaces.

When we do data access we can do either whole file retrieval of
whatever is in the archive, or virtual data generation, producing
a derived data product of some sort.  Virtual data generation can be
complex, but a simple analogy is to existing interfaces like CFITSIO or
IRAF IMIO, except that the access is performed remotely (usually),
and what is returned is an object and not just a block of data.
In general virtual data generation can involve some combination of
subsetting, filtering, or transformation.
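
To make the distinction concrete, here is a minimal sketch of the
subsetting case, assuming a tiny in-memory "archive image" held as a
list of rows.  The function name cutout and the metadata keys are
hypothetical, chosen only for illustration; they are not part of any
DAL interface.  The point is that what comes back is a small described
object rather than the whole file.

```python
# Hypothetical illustration: a cutout standing in for server-side
# virtual data generation.  Nothing here is a real DAL API.

def cutout(image, row0, col0, nrows, ncols):
    """Return a sub-image plus minimal metadata describing it."""
    if row0 < 0 or col0 < 0 or nrows <= 0 or ncols <= 0:
        raise ValueError("origin must be >= 0 and size must be positive")
    if row0 + nrows > len(image) or col0 + ncols > len(image[0]):
        raise ValueError("requested region extends outside the image")
    # Subsetting: copy only the requested rows and columns.
    pixels = [row[col0:col0 + ncols] for row in image[row0:row0 + nrows]]
    # Return an object (data plus description), not just a block of data.
    return {"pixels": pixels, "origin": (row0, col0), "shape": (nrows, ncols)}


# A 5x6 stand-in for a (much larger) archive image.
archive_image = [[10 * r + c for c in range(6)] for r in range(5)]

# The client receives 6 pixels instead of the full 30.
small = cutout(archive_image, row0=1, col0=2, nrows=2, ncols=3)
```

Filtering and transformation would follow the same pattern, with the
service doing the work next to the data and returning only the result.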

This is vital to what we are trying to do with VO, to be able to scale
up to the very large datasets which are coming.  SIAv2 is a perfect
example of this since it will add the capability to deal with cube
data, and cubes can be many GB in size.  There are numerous other
examples where virtual data generation is needed, e.g. on-the-fly
imaging of radio or X-ray data, or computation of theoretical spectra.
There are also important use cases for conventional 2D images and
spectra, e.g. cutting out a small 2D image region (or hundreds of
them), reprojecting 2D image data, or cutting out a region around
a single spectral line in a high resolution spectrum.  We already
do many of these things with the existing SIAv1 and SSA interfaces.

We need both simple whole-file and virtual data access capabilities.
Virtual data access capabilities are essential to enabling distributed
data analysis (analysis performed directly on remote data without
first downloading the data), and to scaling up.

Discovering and downloading whole archive files for local processing
is of course a major use case - this is probably still the dominant
form of data access.  But the typed interfaces like SIA/SSA already
support this kind of access; it is the "simple" mode that these
interfaces provide.  Soon we will have ObsTAP with both ADQL and
PQL query interfaces, which will provide a simpler alternative for
whole file discovery and access, adding the capability to access and
associate any type of data, at the cost of some lost object-specific
metadata.

PQL does play an essential role and should be a high priority for
development; it is just that it is generic by intention - it implements
the generic dataset model (ObsDM).  This is both an advantage and a
limitation.  But that is OK, as we also have the typed DAL interfaces
to add knowledge of and advanced access capabilities for a specific
type of astronomical data.

The typed interfaces extend the generic query interface in important
ways for each type of data, and add virtual data generation
capabilities (the queryData response can describe virtual data).
In object modeling terms the typed interfaces in effect subclass
PQL to add support for a more specific data model (image, spectrum,
etc.), including defining object-specific semantics and additional
object-specific query/access parameters.  While the query may look
similar in each interface, these added semantics are extremely important
as they represent the difference between (for example) a catalog or
an image or a spectrum or a theoretical model.
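
The subclassing analogy can be sketched roughly as follows.  This is
only an illustration of the object-modeling idea: the class names
GenericQuery and ImageQuery, the method names, and the parameter keys
(pos, size, naxes) are all hypothetical and are not taken from the
PQL, SIA or SSA specifications.

```python
# Hypothetical sketch of "typed interfaces subclass the generic query".

class GenericQuery:
    """Stand-in for a generic (PQL-like) dataset query."""

    def __init__(self, pos, size):
        self.pos = pos      # (ra, dec) in degrees
        self.size = size    # region size in degrees

    def params(self):
        # Parameters every dataset type can understand.
        return {"pos": "%g,%g" % self.pos, "size": "%g" % self.size}

    def supports_virtual_data(self):
        # A generic query does not know the data type, so it can only
        # describe whole archive files.
        return False


class ImageQuery(GenericQuery):
    """Typed query: adds image-specific parameters and semantics."""

    def __init__(self, pos, size, naxes=None):
        super().__init__(pos, size)
        self.naxes = naxes  # requested output image size in pixels

    def params(self):
        p = super().params()
        if self.naxes is not None:
            # Only meaningful because we know the result is an image.
            p["naxes"] = "%d,%d" % self.naxes
        return p

    def supports_virtual_data(self):
        return True


generic = GenericQuery((180.0, -30.0), 0.5)
typed = ImageQuery((180.0, -30.0), 0.5, naxes=(256, 256))
```

Both queries share the generic part, which is why they look alike; the
extra parameter only has a defined meaning in the typed subclass.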

The primary role for PQL is in TAP (for catalog access) and ObsTAP (for
global data discovery of all data product types using the observation
data model).  Hence one might use ObsTAP with PQL to discover all the
data for a region on the sky, and then use SIA or SSA etc. for data
access more advanced than merely downloading entire archive files.
If whole file access is sufficient then ObsTAP alone might be enough.
More complex data analysis use cases require the capabilities of the
typed DAL interfaces with their customized parameter interfaces.
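
A rough sketch of that two-step pattern, assuming discovery has
already returned a few records.  The record fields, the 100 MB
threshold, and the function name plan_access are hypothetical and
purely illustrative; a real client would decide based on its own use
case and on what each service advertises.

```python
# Hypothetical sketch: generic discovery first, typed access only
# where more than a whole-file download is needed.

# Pretend these came back from a generic discovery query.
DISCOVERED = [
    {"id": "obs-1", "type": "image", "size_mb": 4000},
    {"id": "obs-2", "type": "spectrum", "size_mb": 2},
    {"id": "obs-3", "type": "image", "size_mb": 12},
]

# Which typed interface handles which kind of data product.
TYPED_SERVICES = {"image": "SIA", "spectrum": "SSA"}


def plan_access(record, need_subset):
    """Choose whole-file download or a typed service for one record."""
    if not need_subset:
        return ("download", record["id"])
    service = TYPED_SERVICES.get(record["type"])
    if service is None:
        # No typed interface for this data type: fall back to the file.
        return ("download", record["id"])
    return (service, record["id"])


# Arbitrary illustrative rule: only subset datasets larger than 100 MB.
plans = [plan_access(r, need_subset=r["size_mb"] > 100) for r in DISCOVERED]
```

Here only the large image is routed to a typed service; the small
datasets are simply downloaded whole.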

 	- Doug

