Multi-dimensional Data Access minimal requirements

Wed Mar 12 09:52:47 PDT 2014

OK...  We're now all agreed that we want to be able to specify 
circular cut-out regions...  I'd like to return to my sense of why 
this came up, answer Doug's question as to what distinguishes SIA from 
TAP (though he likely intended it rhetorically), and to suggest that 
it makes sense to consider the AccessData (or a part of it at least) 
as a full and separate interface used by but independent of SIA.

I think this confusion comes about because we instinctively know that 
an image cut-out is not a circle.  The act of cutting out the image 
creates a rectangle (or traditionally has; HEALPix and HTM pixel lists 
could be counterexamples).  So when we say we are specifying a 
circular cutout, what we really mean is that we are providing inputs 
to a procedure which, given an image instance, will calculate the 
actual cutout parameters and, if requested, do the cutting out.

The part of this that belongs within the SIA standard is suggested by 
the previous sentence: Where do we refer to images specifically? When 
we need to calculate the actual cutout parameters.  SIA handles all 
the aspects of the retrieval that require understanding images.  So 
anything involving WCS projections or coordinates or whatever should 
be done within the SIA service.   After the SIA service has determined 
that a particular image should be returned in the results, then if it 
supports subsetting and the user has requested a subset, it should 
know enough to immediately calculate the appropriate limits and be 
able to express them in terms of the structural parameters of that 
particular image.  Maybe it's some kind of dynamically created image 
from a photon database, maybe it's just a simple FITS image, maybe we've 
decided to support HDF.  The SIA service knows how to create the 
appropriate cutout in the native terms of the image.
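
As a rough sketch of that calculation (an illustration only, not taken 
from any draft): the function below turns a circular region into 
inclusive pixel limits for one image.  It assumes a small, unrotated 
image described by FITS-style CRVAL/CRPIX/CDELT values, uses a 
flat-sky approximation, and ignores RA wrap-around.  The function name 
and argument layout are hypothetical; a real service would use a 
proper WCS library.

```python
import math


def circle_to_pixel_limits(ra, dec, radius, crval, crpix, cdelt, naxis):
    """Return [(lo, hi), (lo, hi)] pixel limits (1-based, inclusive)
    of the box enclosing a circle, clipped to the image, or None if
    the circle misses the image entirely.

    ASSUMPTIONS (illustrative only): no rotation, flat-sky
    approximation, no RA wrap-around; all angles in degrees.
    """
    # Offsets of the circle centre from the reference position.
    dx = (ra - crval[0]) * math.cos(math.radians(dec))
    dy = dec - crval[1]

    limits = []
    for offset, ref_pix, scale, size in zip((dx, dy), crpix, cdelt, naxis):
        centre = ref_pix + offset / scale   # circle centre in pixels
        half = radius / abs(scale)          # circle radius in pixels
        lo = max(1, math.floor(centre - half))
        hi = min(size, math.ceil(centre + half))
        if lo > hi:
            return None                     # no overlap on this axis
        limits.append((lo, hi))
    return limits
```

The point is that the circle is only an input: what comes out is a 
rectangle expressed in the image's own pixel coordinates, and for an 
image the circle does not touch there is nothing to cut out at all.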

In the great majority of cases we will be talking about FITS images. 
We're going to return a URL to the user that will enable them to 
retrieve the specific cutout.  In the case of a FITS image there's 
no reason why this can't simply specify the axes' ranges, and if we do 
that we have a lot of advantages:
   - This kind of generic subsetting capability is already implemented 
-- in lots of places I suspect. E.g., using the FITS filter 
capabilities of CFITSIO, both array and table subsetting are fully 
supported in commands like FCOPY.  I think IRAF has similar features. 
STILTS offers a lot of table filtering for both FITS tables and 
VOTables.  A site could just have a CGI script that allowed users to 
read a local FITS file, input the filter parameters, and write the 
output to the web.  This could be a few lines of Perl calling FCOPY.
   - Adding subsetting to SSA or SLAP or any other service that is 
going to index FITS data becomes straightforward.
   - The subsetting capability is usable outside of the SxA context to 
do anything users want.  E.g., maybe I want to get a subset of rows 
from a photon list.  I think this would be immensely useful to have 
generally, and since I think it's relatively easy to do (see FCOPY 
above) separating it out makes it easy to promote -- even to sites 
that have no intention of providing SIA services (perhaps they don't 
have images!).
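
To make the FCOPY idea concrete, here is a minimal sketch (in Python 
rather than Perl) of how axis ranges or a row filter could be turned 
into a CFITSIO extended file name and an fcopy command line.  The 
helper names and the file names in the usage note are hypothetical, 
and a real CGI script would have to validate its inputs before 
running anything.

```python
def image_section(path, limits, hdu=None):
    """Build a CFITSIO extended file name selecting an image section,
    e.g. 'img.fits[1][40:60,80:100]' (pixel limits 1-based, inclusive)."""
    ext = f"[{hdu}]" if hdu is not None else ""
    ranges = ",".join(f"{lo}:{hi}" for lo, hi in limits)
    return f"{path}{ext}[{ranges}]"


def row_filter(path, hdu, expression):
    """Build an extended file name selecting table rows,
    e.g. 'events.fits[EVENTS][PI > 100]'."""
    return f"{path}[{hdu}][{expression}]"


def fcopy_command(source, destination):
    """Argument list for the FTOOLS 'fcopy infile outfile' command.
    Passing a list (not a string) to subprocess.run avoids shell
    quoting problems with the square brackets."""
    return ["fcopy", source, destination]
```

For example, fcopy_command(image_section("img.fits", [(40, 60), (80, 
100)]), "cutout.fits") gives an argument list that could be handed to 
subprocess.run on a machine where FTOOLS is installed.  The same two 
helpers cover both the image cutout and the photon-list case.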

And it makes it very clear what defines SIA: the use of the image data 
model.  SSA is defined by use of the spectral data model. 
Implementations of these may very well use TAP in part, and I'm 
suggesting that they should also use a model independent AccessData 
protocol to manipulate results (e.g., do cutouts).  Here I mean 
model-independent to suggest independence from any understanding of 
the semantics of the data (Image data model, Spectral data model, ...) 
although maybe it would be appropriate for this protocol to be based upon 
a 'model' of abstract multi-dimensional arrays and tables which might 
then be specialized for the physical representations of these in FITS 
files and VOTables.
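
One way to picture such an abstract model, purely as a sketch: two 
small classes, one for array subsets and one for table subsets, 
neither of which knows anything about images or spectra.  Each can be 
rendered as a query string for an access URL.  The class names, the 
parameter names ID, SECTION and ROWS, and the example.org endpoint are 
all invented for illustration and are not part of any AccessData 
draft.

```python
from dataclasses import dataclass
from typing import Tuple
from urllib.parse import urlencode


@dataclass(frozen=True)
class ArraySubset:
    """Inclusive index ranges, one (lo, hi) pair per array axis."""
    ranges: Tuple[Tuple[int, int], ...]

    def params(self):
        section = ",".join(f"{lo}:{hi}" for lo, hi in self.ranges)
        return {"SECTION": section}      # hypothetical parameter name


@dataclass(frozen=True)
class TableSubset:
    """A boolean row-selection expression for a table."""
    row_expression: str

    def params(self):
        return {"ROWS": self.row_expression}  # hypothetical parameter name


def access_url(base, dataset_id, subset):
    """Build an access URL for a subset of one dataset."""
    query = {"ID": dataset_id, **subset.params()}
    return f"{base}?{urlencode(query)}"
```

An SIA service that had already worked out pixel limits for an image 
could then hand back something like access_url( 
"http://example.org/accessdata", "img.fits", ArraySubset(((40, 60), 
(80, 100)))), and a service indexing photon lists could do the same 
with a TableSubset.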

A standard that defined how such data subsetting and manipulation 
should be done in an archive, and that was easy to implement, would 
have a real chance for broad adoption -- there's a real problem with 
very large data files.  I think addressing this one area could be 
immensely valuable to the community.  If we keep it very small, 
which I think separating it out from the SxA standard does, then I 
think we could make rapid progress here.

	Tom

Douglas Tody wrote:
> On Tue, 11 Mar 2014, Ray Plante wrote:
>
>> Note that what you are calling "DataAccess" is the purpose of the
>> AccessData specification (now being written up by Pat).  And, yes, it
>> can be implemented independently of SIAv2.  In previous versions of
>> SIAv2 (as Doug mentioned), the access-data capabilities were not yet
>> split out.
>
> The queryData and accessData capabilities have always been separately
> implementable and callable service capabilities.  Both are still
> image-specific capabilities required for the full range of image access
> functionality.  What has changed is that they are now to be described in
> separate specifications, in part due to the complexity of accessData for
> things like advanced cube data access.
>
> Note, these capabilities can still be integrated together into a single
> image data access service.  The capabilities would still be separately
> callable, i.e., accessData could be used by itself if desired.
>
> See the older email excerpted below.
>
>      - Doug
>
>
> ----
>  From patrick.dowler at nrc-cnrc.gc.ca Tue Jan  7 12:52:59 2014
> Date: Tue, 7 Jan 2014 12:06:53 -0800
> From: Patrick Dowler <patrick.dowler at nrc-cnrc.gc.ca>
> To: "dal at ivoa.net" <dal at ivoa.net>
> Cc: Douglas Tody <dtody at nrao.edu>
> Subject: Re: WD-SIA-2.0, going forward
>
> On 07/01/14 10:21, Douglas Tody wrote:
>> My summary at this point would be that we have agreement on adding
>> additional query parameters, and on an integrated SIA service, but one
>> which is composed of separate capabilities described in more than one
>> document for logistical reasons
>
> If by "integrated SIA service" you mean that the implementer supports
> multiple capabilities in a single service, then yes we agree: an
> implementer can do that. The multiple capabilities could include:
>
> * query
> * metadata
> * datalink
> * access data
>
> for which the current plan is 3 documents. I expect the AccessData
> document to undergo several revisions as we include more access
> features and it may also be the right place to concentrate on
> self-describing custom services, in which case it could be somewhat
> heavy/abstract reading (we have to keep that reasonable to support
> take-up).
>
> And by integrated, I also include the flexibility feature that an
> implementer can implement a single web service resource and just use
> different REQUEST values *or* they can implement different services
> (as described in a previous message).
>
> [...]