Reflections on SIA V2 and generic cutout services

Wed May 25 00:35:19 CEST 2016

Hi Doug, Tom, all

    In Stellenbosch, near Cape Town, SA, some final progress was made 
toward convergence on the very first version of SODA.

On 09/05/2016 02:49, Douglas Tody wrote:
> Hi Tom, all -
>
> What is required here for this specific SkyView-type use case is
> for SIAV2 to support automated discovery of virtual data, e.g.,
> the service would describe a virtual image that best matches what
> the client requested (mainly this involves the spatial constraints
> POS+SIZE but it could use other constraints as well).  This was
> supported by the proposed MODE parameter in an earlier SIAV2
> draft spec.  That is, MODE=archival|cutout|match, with "archival"
> being pure discovery as at present in the basic SIA V2.0, "cutout"
> meaning crop the dataset but return only original pixels/voxels,
> "match" being full image generation such as in SkyView, including
> features such as reprojection.  The beauty of this is that the basic
> SIAV2 interface would be unchanged and the client would not need
> to know about SODA, DataLink, etc., or the details of the specific
> image collection, to be able to get an ideal result back in one query
> (discovery followed by URL-based retrieval).
I personally think that such an evolution is quite possible in the 
next version of SIAV2 (and maybe also in ObsTAP, although that is probably 
more difficult because it would mean including "virtual" tables in a TAP 
service; I don't know whether it is possible to "force" TAP to do that).
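
To make the idea concrete, here is a minimal sketch of what a client-side query could look like if MODE were reintroduced. It is only an illustration: MODE comes from the earlier draft quoted above and is not part of SIA V2.0, and the endpoint URL and the function name build_query are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; not a real service.
BASE_URL = "https://example.org/sia2/query"

# MODE values from the earlier SIAV2 draft quoted above.
# MODE is NOT part of the approved SIA V2.0 interface.
MODES = ("archival", "cutout", "match")


def build_query(ra, dec, radius, mode="archival", base_url=BASE_URL):
    """Build a SIAV2-style query URL, optionally with the draft MODE parameter."""
    if mode not in MODES:
        raise ValueError(f"unknown MODE: {mode!r}")
    # POS=CIRCLE <ra> <dec> <radius> (degrees) is the SIA V2.0 syntax.
    params = [("POS", f"CIRCLE {ra} {dec} {radius}")]
    # "archival" is plain discovery, so the basic query stays unchanged.
    if mode != "archival":
        params.append(("MODE", mode))
    return f"{base_url}?{urlencode(params)}"


url = build_query(10.0, -5.0, 0.5, mode="match")
```

With mode="archival" the URL is an ordinary SIA V2.0 discovery query, which is the point Doug makes: the basic interface would not change.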
>
> AccessData/SODA is (or should be) much more powerful, providing
> advanced client-directed access to a dataset, similar to for example
> the image IO capabilities of classical data analysis systems, but
> operating in a distributed/scalable/multiwavelength fashion.  This is
> essential for example for advanced remote access to large image
> cubes since it becomes impractical to download and locally manipulate
> the datasets when they are hundreds of GB or larger.  In the specific
> case of automated virtual data discovery via SIAv2, a local SODA
> image service could (called from the colocated SIAV2 implementation)
> compute the metadata for the virtual image that would be computed and
> returned, and the SIAV2 discovery service would return the description
> of this virtual image.  Upon later client-directed access, the SODA
> service would generate and return the actual virtual image dataset.
Yes. To properly take into account all the data axes in their interfaces, 
SIAV2 and SODA had to take a step back and start by providing very basic 
functionality, but on all the cube axes together.
What we lost in complexity of functionality we gained in the number of 
axes supported.
>
> The current SODA proposal is becoming a much more generic dataset
> access protocol, not that much different than DataLink, merely directed
> to dataset data access, a specific class of DataLink service.  As such
> it is difficult for it to provide advanced client-server data access
> to specific classes of data, e.g. for 2D access use cases like
> SkyView, advanced image cube data access, or other things in the
> future such as to support distributed spectral or time series analysis.
>
> Much of the DAL discussion recently has focused on the form of
> the generic SODA interface, with very little attention to actual
> data access functionality, e.g. for advanced access to large image
> cubes as one example. 
Advanced access will definitely come in future versions of SODA.
> Certainly it is useful to have a robust
> and well specified service interface for generic data access, even
> including features such as a general parameter mechanism supporting
> capabilities for interface introspection, common to all types of
> data and supporting custom service parameters.
Since the Stellenbosch decisions, SODA descriptors and generic parameter 
descriptions are now context dependent: the closer SODA is to the 
discovery phase, the less we need to describe these parameters, because 
discovery already gives a lot of knowledge about the SODA domains.

I personally think a full description of generic parameter generation, 
based on "three factor semantics" and including standard service 
self-description, could take place in DataLink 1.1, beside the description 
of the "{links}" resource and of the basic "service descriptor" 
definition we already have in DataLink 1.0.
> But to support the
> real world astronomical science research community we still need to
> provide advanced capabilities for direct remote access to specific
> classes of astronomical data, to enable distributed data analysis.
> The use-cases, requirements, and capabilities required will differ
> for each class of data.
OK
>
> A possible general solution here might be for SODA to define a generic
> container service interface for data access services, providing
> a generic WCS-based cutout mechanism as in the current proposal,
> but enabling data-specific "plugins", based upon data models and
> data-specific access methods for each class of data.  This could
> support either standardized or experimental/domain-convention
> extensions for advanced data access, developed by sectors of the
> community, based upon the common data access framework.  This would
> allow the system wonks to focus on the form of the interfaces and on
> issues such as how service parameters are composed and represented,
> while domain experts focus on capabilities for advanced data access
> to actually support distributed/scalable end-user data analysis.
I think these are good reflections for SODA 1.1. The key point for me in 
this "plugin" idea is that we are probably able to describe most of the 
simple and advanced data access operations in terms of data model 
attributes:

      The result of a data access operation is a dataset whose 
description can be expressed with new values of data model attributes 
(ObsCore and later the Cube data model).

       In other words, data models can help to build a "data access" 
description language.
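
As a rough sketch of that idea, a simple cutout can be described purely as a set of new values for ObsCore attributes of the parent dataset. The function name describe_cutout and the dictionary layout are hypothetical; only the attribute names (s_ra, s_dec, s_fov, em_min, em_max) come from ObsCore. The sketch assumes the requested circle and band overlap the parent dataset and does not check this.

```python
def describe_cutout(parent, ra, dec, radius, band=None):
    """Return ObsCore-style metadata describing a circular cutout of `parent`.

    `parent` is a dict of ObsCore attributes (angles in degrees,
    wavelengths in metres). The parent dict is left unmodified.
    """
    result = dict(parent)  # unchanged attributes are inherited from the parent
    result["s_ra"] = ra
    result["s_dec"] = dec
    # s_fov is a diameter; the cutout cannot be larger than the parent.
    result["s_fov"] = min(2 * radius, parent["s_fov"])
    if band is not None:
        lo, hi = band
        # The spectral range is the intersection of request and parent.
        result["em_min"] = max(lo, parent["em_min"])
        result["em_max"] = min(hi, parent["em_max"])
    return result


parent = {"obs_id": "cube-1", "s_ra": 10.0, "s_dec": -5.0,
          "s_fov": 1.0, "em_min": 4e-7, "em_max": 7e-7}
cutout = describe_cutout(parent, 10.1, -5.1, 0.2, band=(5e-7, 9e-7))
```

A more advanced operation (reprojection, rebinning) would change other attributes, such as the resolution, in the same way.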

Regards
François
>
>
>     - Doug
>
>
> On Wed, 2 Mar 2016, Tom McGlynn (NASA/GSFC Code 660.1) wrote:
>
>> Now that SIA V2 has been approved I've been contemplating if and 
>> how I might implement it for access to the SkyView services managed 
>> at the HEASARC. I'm sharing some ideas with the DAL group since I 
>> think some of these thoughts may have more general relevance.
>>
>> I expect the ability to select SkyView surveys based upon coverage, 
>> bandpass and resolution should be very helpful. However there are two 
>> aspects that may be somewhat problematic.
>>
>> 1. SkyView is a cutout and mosaicking service.  So in terms of 
>> retrieving an image SkyView needs  two kinds of inputs: those that 
>> select the survey or surveys we are interested in (e.g., bandpass and 
>> resolution) and the WCS parameters that define the region to be 
>> generated.  Even in SIA v1 it was unclear how to convey this 
>> information, but since the standard required a position/size input it 
>> was pretty straightforward to implement a 'reasonable' approach. SIA 
>> V2 is far more flexible.  It not only doesn't require a positional 
>> constraint at all, it allows users to define regions that are a union 
>> of a variety of shapes. There seem to be three options here for going 
>> forward:
>>  a. Use the inputs to the SIA V2 service purely for survey selection 
>> and return no actual pointers to data.  Instead return datalink 
>> requests where the user will be prompted for the actual bounds of the 
>> images desired for a survey which meets the requirement.  I.e., the 
>> positional inputs would be used only to define a region in which the 
>> survey is to have some coverage, but the user would later have to 
>> input the exact bounds for the subset to be created.
>>  I'm not clear if datalink can be used this way: to get additional 
>> data from the user.  Even if it can, it seems clumsy and makes the V2 
>> interface take an extra step compared to v1.
>>
>> b. Use the positional constraint (all-sky if not specified) in both 
>> the coverage request and the specification of the image to be 
>> created.  This is essentially what we do in v1, but we need to 
>> understand what to do with multiple POS fields, and with POS fields 
>> that aren't easily transformed to a rectangle on the sky.  We can 
>> treat each POS field as a separate request, or we can contemplate the 
>> region defined by their union.
>>
>> c. Use either fixed values for the WCS parameters, or pass them using 
>> non-standard parameters in the SIA call.  We did some of this in the 
>> SIA v1 version where users could override defaults like the 
>> resampling method and map projection this way.  However if the user 
>> needs to specify critical features like the image center and field of 
>> view of the image this way, then they are often going to be 
>> duplicating information, and they won't be able to use the SkyView 
>> SIA normally.
>>
>>
>> Option b seems best, but it requires some more or less arbitrary 
>> decisions. My initial thought is to treat each POS field separately 
>> (perhaps with a non-standard parameter to request the union).  The 
>> field of view would be the smallest rectangle that encloses the 
>> requested region.  This isn't perfect but I think it will meet most 
>> users' needs.   Since there are many cutout services out there, some 
>> general guidance on how such services should provide SIA2 access 
>> would be helpful.
>>
>> 2. The second issue has to do with a general problem that we have in 
>> what might be called 'container' services that host a number of 
>> distinct datasets. IRSA's and the HEASARC's TAP services which host 
>> tables from dozens of missions are other examples.  SkyView hosts 
>> ~100 different survey datasets. Suppose we have a SIA2 survey that 
>> supports all of them -- that certainly seems like the right way to go 
>> to harness the power of the SIA selection parameters.  Where does the 
>> survey metadata go?  We want to have nice descriptions of the surveys 
>> and the copyrights and the appropriate references and all of that 
>> good stuff.  We don't seem to have a place for it in the registry 
>> anymore.  So a user searching the registry for a given survey might 
>> not find it even though it's fully available through SkyView.    In 
>> the case of the TAP services, Markus has defined a way whereby 
>> we can annotate separate table entries in the registry and note that 
>> they are served by the TAP service, but I don't know how I'd do that 
>> for the image survey data sets we have in SkyView since I don't think 
>> there is an image counterpart to a general TabularSkyService.  Maybe 
>> there is and if so someone like Markus may need to define the 
>> appropriate structure for a resource which does not itself provide VO 
>> image services but does represent an image capability that is 
>> referenced by some other VO service.
>>
>> This issue did not arise in SIA V1.  There it's just as easy to 
>> register a separate SIA capability for each survey so that's what I 
>> did.  The ability to search by bandpass and such did not exist. While 
>> I could still do that in V2, it really seems like that's not the 
>> right way to go.
>>
>>    Tom
>>
>>
>> For the nonce this isn't a big issue but
>>
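
For reference, Tom's option b (treat each POS value separately and take the smallest rectangle enclosing the requested region as the field of view) could be sketched roughly as below. This is only an illustration under simplifying assumptions: the function name bounding_box is hypothetical, the box is computed in plain RA/Dec coordinates, and RA wrap-around at 0/360, the cos(Dec) factor and the poles are all ignored.

```python
def bounding_box(pos):
    """Return (center_ra, center_dec, width, height) in degrees for one POS value.

    Accepts the three SIA V2.0 shapes: CIRCLE, RANGE and POLYGON.
    Simplification: coordinates are treated as flat, with no RA wrap-around.
    """
    parts = pos.split()
    shape = parts[0].upper()
    values = [float(v) for v in parts[1:]]
    if shape == "CIRCLE":
        ra, dec, radius = values
        return (ra, dec, 2 * radius, 2 * radius)
    if shape == "RANGE":
        ra1, ra2, dec1, dec2 = values
        ras, decs = [ra1, ra2], [dec1, dec2]
    elif shape == "POLYGON":
        # Vertices are given as alternating ra, dec values.
        ras, decs = values[0::2], values[1::2]
    else:
        raise ValueError(f"unsupported shape: {shape}")
    return ((min(ras) + max(ras)) / 2, (min(decs) + max(decs)) / 2,
            max(ras) - min(ras), max(decs) - min(decs))


# One image request per POS value, as in option b.
requests = [bounding_box(p) for p in
            ["CIRCLE 10 20 0.5", "POLYGON 10 20 12 20 12 24 10 24"]]
```

Requesting the union of several POS values instead would amount to computing one box over all of the shapes together.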