WD-SIA-2.0, going forward

Douglas Tody dtody at nrao.edu
Tue Jan 14 09:32:01 PST 2014


Hi Pat -

The traffic on the DAL list last week got messed up; our earlier
messages came through out of order and one earlier message (your initial
posting) was missed.  I did not notice until later that this message
from you below was a new one.

Anyway, to get back to the issue of the automated virtual data
generation (AVDG) capability, if the query can describe virtual data
products that will be computed only if accessed, and the service
implementation chooses to use accessData to compute those data products
(reasonable since accessData likely provides the functionality to do
so), then of course there can be coupling between the query and
accessData.  However, this is not done explicitly, i.e. there is no
explicit coupling.  The client just sees an acref URL, which may or may
not be invoking accessData internally within the URL.

AVDG is more than just an optimization.  Yes, in principle the client
might be able to do the same thing by 1) discovering the image dataset
or datasets of interest; 2) querying those for their image metadata; 3)
querying the metadata for the image service to determine whether an
accessData capability exists and what its capabilities are; 4) invoking
accessData to compute the metadata for a virtual image corresponding to
the original discovery query; and 5) invoking the accessData request to
compute a virtual image.

The above is what AVDG does, all in a single operation.  Steps 1-4 are
part of the query, and 5) is hidden within the acref for the virtual
image, if indeed accessData is used to generate the specific image
product.  The service implementation could also use something other than
accessData, or what is used might be image-specific.
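
To make the "hidden within the acref" point concrete, here is a rough
sketch of how a service might describe a virtual image in a query
response.  The endpoint URL, the ID parameter, the POS value format, and
the record layout are all assumptions made up for illustration; they are
not taken from the draft or from any implementation.  The only point is
that nothing is computed at query time, and the client sees just an
opaque acref.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, for illustration only.
ACCESS_DATA_URL = "http://example.org/sia2/accessData"


def describe_virtual_image(dataset_id, pos, creation_type="cutout"):
    """Build one query-response record for a virtual image.

    Nothing is computed here; the image would be generated only if the
    client later fetches the acref.
    """
    # "ID" is an assumed parameter name; "POS" is passed through as-is.
    params = {"ID": dataset_id, "POS": pos}
    return {
        # Plays the role of DataID.CreationType in the data model.
        "creation_type": creation_type,
        # The client treats this URL as opaque; here it happens to
        # point at accessData, but it need not.
        "acref": f"{ACCESS_DATA_URL}?{urlencode(params)}",
    }


record = describe_virtual_image("survey-image-42", "180.0,2.5,0.1")
print(record["creation_type"], record["acref"])
```

A service that generates the image some other way would simply put a
different URL in the acref; the client code would not change.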

Done on the client side, this is far more than a single operation: the
client has to know a great deal about the data and the service to do
all this.  With AVDG, the high degree of information hiding gives the
service scope to do things behind the scenes that would not be possible
via the public service API.  Even
then, if the client does all that, we are only talking about accessing a
single image via a single service.  If we then scale up to querying
multiple image datasets, data collections, or services, it is basically
unworkable.  AVDG makes automated virtual data generation simple for the
client and provides scalability, by moving most of the logic to the
server side, where many things which would otherwise have to be queried
via public interfaces are known intrinsically.

To summarize, a simple discovery query just finds static archival images
and tells the client what is available.  AVDG takes it a step further;
the client describes the image data it wants for analysis, and the
service automatically computes how close it can come to what is
requested by generating virtual data.

SIAV1 has always had this capability; however, it was done by defining
different subclasses of image services: archival, cutout, mosaic.  It
turned out to be awkward to have different service subtypes (a service
that can do virtual data generation can also easily return archival
images); adding the mode parameter allows a single service to provide
the full range of capabilities it implements.
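
As a minimal sketch of what this looks like from the client side, the
following builds a query URL carrying a mode value.  The three mode
values are the ones I proposed earlier ("archival", "cutout", "match");
the base URL, the spelling of the parameter names, and the POS value
format are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Mode values as proposed in the earlier message quoted below.
MODES = ("archival", "cutout", "match")


def build_query_url(base_url, pos, mode="archival"):
    """Return a discovery query URL that asks for the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Parameter spellings are assumed, not taken from the draft.
    return f"{base_url}?{urlencode({'POS': pos, 'mode': mode})}"


# Hypothetical service URL.
url = build_query_url("http://example.org/sia2/queryData",
                      "180.0,2.5,0.1", mode="cutout")
print(url)
```

The same service endpoint handles all three cases; only the mode value
changes, which is the advantage over separate service subtypes.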

AccessData also does virtual data generation, but comes at it from the
complete opposite direction: explicit client-directed access instead of
automated data generation.  But client-directed access is what we need
for a different type of use-case, repeated precision access to a single
large image cube being a prime example.

Our VAO SIAV2 prototype already provides all of this: simple archival
image retrieval, AVDG via queryData and the mode parameter, and
accessData, although the accessData functionality provided is limited to
simple cutout generation at this point.

 	- Doug



On Tue, 7 Jan 2014, Patrick Dowler wrote:

>
> This sounds exactly like the scope of AccessData, where the caller decides on 
> the operations they want performed. So I will ask the question again in a 
> different way:
>
> Is the purpose of this "mode" parameter at the query stage to cause the 
> result to only include records where a specific access data operation is 
> available? And to pre-compute and include the access data URL directly in the 
> query result?
>
> If so, this is an optimisation of the general case that comes at the cost of 
> coupling this mode parameter in the query capability to the operations 
> available in the access data capability.
>
> Pat
>
> On 07/01/14 12:18, Douglas Tody wrote:
>> The main thing missing at this point is a "mode" parameter of some sort,
>> to give the client control over automated virtual data generation.  Also
>> possibly some clarity on the use of POS to define the ROI for image
>> generation, and a description of the AVDG capability at the registry /
>> VOSI level so that a client can determine if a service has this
>> capability.
>>
>> For mode, what I had before was "archival" (whole images), "cutout" (do
>> not interpolate or otherwise recompute pixels), and "match" (do
>> everything the service can do to match the ideal image, e.g.,
>> reprojection, mosaicing, etc.).
>> 
>> We don't want to just always do AVDG as it is expensive and will slow
>> down the query, so a mode parameter or some such is needed to enable the
>> feature.  Whether or not new pixels are computed is critical or
>> irrelevant depending upon the client application, so control is needed.
>> 
>> Other than that we appear to have the critical elements, since virtual
>> images can be described in the query response and the provided acref can
>> point to AccessData to generate the image.  DataID.CreationType, which
>> we already have in the data model, can tell whether the described image
>> is archival, a cutout, etc.
>> 
>> Most of this logic applies to Spectra and TimeSeries as well, and
>> would want to be included in a future version of these services.
>
> -- 
>
> Patrick Dowler
> Canadian Astronomy Data Centre
> National Research Council Canada
> 5071 West Saanich Road
> Victoria, BC V9E 2M7
>
> 250-363-0044 (office) 250-363-0045 (fax)
>

