Reflections on SIA V2 and generic cutout services

Mon May 9 02:49:35 CEST 2016

Hi Tom, all -

What is required here for this specific SkyView-type use case is
for SIAV2 to support automated discovery of virtual data, e.g.,
the service would describe a virtual image that best matches what
the client requested (mainly this involves the spatial constraints
POS+SIZE but it could use other constraints as well).  This was
supported by the proposed MODE parameter in an earlier SIAV2
draft spec.  That is, MODE=archival|cutout|match, with "archival"
being pure discovery as at present in the basic SIA V2.0, "cutout"
meaning crop the dataset but return only original pixels/voxels,
"match" being full image generation such as in SkyView, including
features such as reprojection.  The beauty of this is that the basic
SIAV2 interface would be unchanged and the client would not need
to know about SODA, DataLink, etc., or the details of the specific
image collection, to be able to get an ideal result back in one query
(discovery followed by URL-based retrieval).
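
As a rough illustration of the single-query pattern described above, the
sketch below builds a discovery URL carrying a MODE value.  It rests on
assumptions: the base URL and the build_query helper are hypothetical,
MODE comes from the earlier draft mentioned above and is not part of the
approved SIA V2.0, and the region is expressed with the POS=CIRCLE form
rather than POS+SIZE.

```python
from urllib.parse import urlencode

# Hypothetical service endpoint, for illustration only.
BASE_URL = "https://example.org/sia2/query"

# MODE values from the earlier SIAV2 draft (not in the approved SIA V2.0).
MODES = ("archival", "cutout", "match")


def build_query(ra_deg, dec_deg, radius_deg, mode="archival"):
    """Return a discovery URL for a circular region with a draft-style MODE."""
    if mode not in MODES:
        raise ValueError(f"unknown MODE: {mode!r}")
    params = {
        "POS": f"CIRCLE {ra_deg} {dec_deg} {radius_deg}",
        "MODE": mode,
    }
    return f"{BASE_URL}?{urlencode(params)}"


# One query asks for a SkyView-style generated image around a position.
url = build_query(83.63, 22.01, 0.25, mode="match")
print(url)
```

The client only changes the MODE value to move from pure discovery to
cutout or full image generation; the rest of the query stays the same.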

AccessData/SODA is (or should be) much more powerful, providing
advanced client-directed access to a dataset, similar to, for example,
the image I/O capabilities of classical data analysis systems, but
operating in a distributed/scalable/multiwavelength fashion.  This is
essential, for example, for advanced remote access to large image
cubes, since it becomes impractical to download and locally manipulate
the datasets when they are hundreds of GB or larger.  In the specific
case of automated virtual data discovery via SIAV2, a local SODA
image service (called from the colocated SIAV2 implementation) could
compute the metadata for the virtual image that would be generated and
returned, and the SIAV2 discovery service would return the description
of this virtual image.  Upon later client-directed access, the SODA
service would generate and return the actual virtual image dataset.
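
The split between describing a virtual image at discovery time and
generating it at access time might be sketched as follows.  This is only
a toy model under stated assumptions: the endpoint, the VirtualImage
record, the describe/generate functions, and the RA/DEC/SIZE query names
are all hypothetical placeholders, and the pixel generation is a stand-in
that returns zeros.

```python
import math
from dataclasses import dataclass
from urllib.parse import urlencode

# Hypothetical SODA-style endpoint; not a real service.
SODA_URL = "https://example.org/soda/sync"


@dataclass(frozen=True)
class VirtualImage:
    """Metadata for an image that does not exist until it is requested."""
    ra_deg: float
    dec_deg: float
    size_deg: float
    naxis: int        # pixels per side of the square image
    access_url: str   # URL that would trigger generation on retrieval


def describe(ra_deg, dec_deg, size_deg, scale_deg=0.001):
    """Discovery step: compute metadata only; no pixels are produced."""
    naxis = max(1, math.ceil(size_deg / scale_deg))
    # Placeholder parameter names, not taken from any specification.
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SIZE": size_deg})
    return VirtualImage(ra_deg, dec_deg, size_deg, naxis, f"{SODA_URL}?{query}")


def generate(image):
    """Access step: stand-in for the real pixel computation."""
    return [[0.0] * image.naxis for _ in range(image.naxis)]


img = describe(10.0, 20.0, 1.0, scale_deg=0.25)   # cheap, metadata only
pixels = generate(img)                            # deferred, expensive part
```

The discovery response can carry the VirtualImage description, while the
cost of building the pixels is paid only if the client follows the URL.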

The current SODA proposal is becoming a much more generic dataset
access protocol, not that much different than DataLink, merely directed
to dataset data access, a specific class of DataLink service.  As such
it is difficult for it to provide advanced client-server data access
to specific classes of data, e.g. for 2D access use cases like
SkyView, advanced image cube data access, or other things in the
future such as to support distributed spectral or time series analysis.

Much of the DAL discussion recently has focused on the form of
the generic SODA interface, with very little attention to actual
data access functionality, e.g. for advanced access to large image
cubes as one example.  Certainly it is useful to have a robust
and well specified service interface for generic data access, even
including features such as a general parameter mechanism supporting
capabilities for interface introspection, common to all types of
data and supporting custom service parameters.  But to support the
real world astronomical science research community we still need to
provide advanced capabilities for direct remote access to specific
classes of astronomical data, to enable distributed data analysis.
The use-cases, requirements, and capabilities required will differ
for each class of data.

A possible general solution here might be for SODA to define a generic
container service interface for data access services, providing
a generic WCS-based cutout mechanism as in the current proposal,
but enabling data-specific "plugins", based upon data models and
data-specific access methods for each class of data.  This could
support either standardized or experimental/domain-convention
extensions for advanced data access, developed by sectors of the
community, based upon the common data access framework.  This would
allow the system wonks to focus on the form of the interfaces and on
issues such as how service parameters are composed and represented,
while domain experts focus on capabilities for advanced data access
to actually support distributed/scalable end-user data analysis.
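
One way to picture such a container is sketched below.  Everything here
is hypothetical: the AccessContainer class, the "spectrum" data class,
and the "rebin" operation are invented for illustration, and the generic
cutout is reduced to an index slice standing in for a WCS-based cutout.

```python
class AccessContainer:
    """Generic access service with a built-in cutout and pluggable extras."""

    def __init__(self):
        # Maps (data_class, operation) to a handler function.
        self._plugins = {}

    def register(self, data_class, operation, handler):
        """Add a data-specific operation for one class of data."""
        self._plugins[(data_class, operation)] = handler

    def operations(self, data_class):
        """Introspection: the generic cutout is always available."""
        extra = sorted(op for (dc, op) in self._plugins if dc == data_class)
        return ["cutout"] + extra

    def access(self, data_class, operation, dataset, **params):
        """Dispatch to the generic cutout or to a registered plugin."""
        if operation == "cutout":
            return dataset[params["start"]:params["stop"]]
        try:
            handler = self._plugins[(data_class, operation)]
        except KeyError:
            raise ValueError(
                f"{operation!r} is not supported for {data_class!r}"
            ) from None
        return handler(dataset, **params)


def rebin(dataset, factor):
    """Example plugin: average consecutive groups of `factor` samples."""
    return [
        sum(dataset[i:i + factor]) / factor
        for i in range(0, len(dataset) - factor + 1, factor)
    ]


container = AccessContainer()
container.register("spectrum", "rebin", rebin)
```

The container owns the interface form (dispatch, parameters,
introspection), while each plugin carries only the domain-specific
capability.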

 	- Doug

On Wed, 2 Mar 2016, Tom McGlynn (NASA/GSFC Code 660.1) wrote:

> Now that SIA V2 has been approved I've been contemplating if and how I
> might implement it for access to the SkyView services managed at the HEASARC. 
> I'm sharing some ideas with the DAL group since I think some of these 
> thoughts may have more general relevance.
>
> I expect the ability to select SkyView surveys based upon coverage, bandpass
> and resolution should be very helpful.  However there are two aspects that 
> may be somewhat problematic.
>
> 1. SkyView is a cutout and mosaicking service.  So in terms of retrieving an 
> image SkyView needs two kinds of inputs: those that select the survey or
> surveys we are interested in (e.g., bandpass and resolution) and the WCS 
> parameters that define the region to be generated.  Even in SIA v1 it was 
> unclear how to convey this information, but since the standard required a 
> position/size input it was pretty straightforward to implement a 'reasonable' 
> approach. SIA V2 is far more flexible.  It not only doesn't require a 
> positional constraint at all, it allows users to define regions that are a 
> union of a variety of shapes.  There seem to be three options here for going 
> forward:
>  a. Use the inputs to the SIA V2 service purely for survey selection and 
> return no actual pointers to data.  Instead return datalink requests where 
> the user will be prompted for the actual bounds of the images desired for a 
> survey which meets the requirement.  I.e., the positional inputs would be 
> used only to define a region in which the survey is to have some coverage, 
> but the user would later have to input the exact bounds for the subset to be 
> created.
>  I'm not clear if datalink can be used this way: to get additional data from 
> the user.  Even if it can, it seems clumsy and makes the V2 interface take an 
> extra step compared to v1.
>
> b. Use the positional constraint (all-sky if not specified) in both the 
> coverage request and the specification of the image to be created.  This is 
> essentially what we do in v1, but we need to understand what to do with 
> multiple POS fields, and with POS fields that aren't easily transformed to a 
> rectangle on the sky.  We can treat each POS field as a separate request, or 
> we can contemplate the region defined by their union.
>
> c. Use either fixed values for the WCS parameters, or pass them using 
> non-standard parameters in the SIA call.  We did some of this in the SIA v1 
> version where users could override defaults like the resampling method and 
> map projection this way.  However if the user needs to specify critical 
> features like the image center and field of view of the image this way, then 
> they are often going to be duplicating information, and they won't be able to 
> use the SkyView SIA normally.
>
>
> Option b seems best, but it requires some more or less arbitrary decisions. 
> My initial thought is to treat each POS field separately (perhaps with a 
> non-standard parameter to request the union).  The field of view would be the 
> smallest rectangle that encloses the requested region.  This isn't perfect 
> but I think it will meet most users' needs.  Since there are many cutout
> services out there, some general guidance on how such services should provide 
> SIA2 access would be helpful.
>
> 2. The second issue has to do with a general problem that we have in what
> might be called 'container' services that host a number of distinct datasets. 
> IRSA's and the HEASARC's TAP services which host tables from dozens of 
> missions are other examples.  SkyView hosts ~100 different survey datasets. 
> Suppose we have a SIA2 survey that supports all of them -- that certainly 
> seems like the right way to go to harness the power of the SIA selection 
> parameters.  Where does the survey metadata go?  We want to have nice 
> descriptions of the surveys and the copyrights and the appropriate references 
> and all of that good stuff.  We don't seem to have a place for it in the 
> registry anymore.  So a user searching the registry for a given survey might 
> not find it even though it's fully available through SkyView.  In the case
> of the TAP services, Markus has defined a way whereby we can annotate
> separate table entries in the registry and note that they are served by the 
> TAP service, but I don't know how I'd do that for the image survey data sets 
> we have in SkyView since I don't think there is an image counterpart to a
> general TabularSkyService.  Maybe there is and if so someone like Markus may 
> need to define the appropriate structure for a resource which does not itself 
> provide VO image services but does represent an image capability that is 
> referenced by some other VO service.
>
> This issue did not arise in SIA V1.  There it's just as easy to register a 
> separate SIA capability for each survey so that's what I did.  The ability to 
> search by bandpass and such did not exist. While I could still do that in V2, 
> it really seems like that's not the right way to go.
>
>    Tom
>
>
> For the nonce this isn't a big issue but
>