Multi-dimensional Data Access minimal requirements

Thu Mar 13 00:12:10 PDT 2014

Hi all,

I am entering this debate quite late, but I am at a EuroVO meeting where this subject came up, and I made a suggestion similar to Tom’s. Given that the first iteration of cube access is supposed to be simple, and that re-sampling etc. is not in scope, allowing cut-out simply by pixel ranges along each axis should at least be an option in the AccessData stage. I can imagine, for instance, that a client presenting thumbnails to a user might want to show just the middle plane along each axis of the cube.
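
To make the pixel-range idea concrete, here is a minimal sketch in Python of how a client might pick the middle plane along one axis. The function names and the "lo:hi" text form are my own assumptions for illustration, not anything agreed for AccessData; ranges are 1-based and inclusive, as in FITS.

```python
def middle_plane_ranges(shape, axis):
    """Return 1-based inclusive (lo, hi) pixel ranges, one per axis,
    selecting the full extent of every axis except `axis`, which is
    reduced to its single middle plane."""
    ranges = []
    for i, n in enumerate(shape):
        if i == axis:
            mid = (n + 1) // 2  # middle pixel, 1-based
            ranges.append((mid, mid))
        else:
            ranges.append((1, n))
    return ranges


def format_ranges(ranges):
    """Render ranges as 'lo:hi,lo:hi,...' (hypothetical text form)."""
    return ",".join(f"{lo}:{hi}" for lo, hi in ranges)


# A 100 x 200 x 51 cube, middle plane along the third axis.
ranges = middle_plane_ranges((100, 200, 51), axis=2)
print(format_ranges(ranges))
```

The client needs only the axis lengths to do this, which is the point: no WCS is involved.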

Viewed on its own, requiring the client to work out at the AccessData stage which pixel ranges cover a particular patch of sky and frequency band is not attractive, and in any case the client needs the full metadata to do so. However, I think that the common use case of “I’m only interested in this small sub-cube” can be covered by the Query stage additionally returning the pixel ranges within each cube that satisfy the original discovery query, which the service will have had to work out with the full WCS anyway. Doing this allows the first iteration of AccessData to be as simple as Tom suggests, whilst still fulfilling the main data reduction use case of cut-outs.
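
As a rough sketch of what the service would do at the Query stage, the Python below converts a world-coordinate interval into a pixel range for one axis. It assumes a purely linear axis described by the FITS keywords CRVAL, CRPIX and CDELT (a frequency axis, say); real celestial axes need the full WCS projection, which is exactly why this belongs in the service. The function name and the return convention are hypothetical.

```python
import math


def pixel_range(world_lo, world_hi, crval, crpix, cdelt, naxis):
    """Return the 1-based inclusive (lo, hi) pixel range on a linear
    axis that overlaps [world_lo, world_hi], or None if there is no
    overlap. Pixel i is taken to span [i - 0.5, i + 0.5]."""
    if cdelt == 0:
        raise ValueError("cdelt must be non-zero")
    # Linear WCS: pixel = CRPIX + (world - CRVAL) / CDELT
    p1 = crpix + (world_lo - crval) / cdelt
    p2 = crpix + (world_hi - crval) / cdelt
    p_min, p_max = min(p1, p2), max(p1, p2)  # handles negative CDELT
    lo = max(1, math.floor(p_min + 0.5))
    hi = min(naxis, math.ceil(p_max - 0.5))
    return (lo, hi) if lo <= hi else None


# 100-channel axis starting at 1400 MHz with 1 MHz channels;
# the discovery query asked for 1410 to 1420 MHz.
print(pixel_range(1410.0, 1420.0, crval=1400.0, crpix=1.0, cdelt=1.0, naxis=100))
```

The Query response could then carry one such range per axis for each matching cube, and the client would pass them on unchanged.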

Paul.

On 2014-03-12, at 16:52, Tom McGlynn <Thomas.A.McGlynn at nasa.gov> wrote:

> OK...  We're now all agreed that we want to be able to specify circular cut-out regions...  I'd like to return to my sense of why this came up, answer Doug's question as to what distinguishes SIA from TAP (though he likely intended it rhetorically), and to suggest that it makes sense to consider the AccessData (or a part of it at least) as a full and separate interface used by but independent of SIA.
> 
> I think this confusion comes about because we instinctively know that an image cut-out is not a circle.  The act of cutting out the image creates a rectangle (or has traditionally; HEALPix and HTM pixel lists could be counterexamples).  So when we say we are specifying a circular cutout, what we really mean is that we are providing inputs to a procedure which, given an image instance, will calculate the actual cutout parameters and, if requested, do the cutting out.
> 
> The part of this that belongs within the SIA standard is suggested by the previous sentence: Where do we refer to images specifically? When we need to calculate the actual cutout parameters.  SIA handles all the aspects of the retrieval that require understanding images.  So anything involving WCS projections or coordinates or whatever should be done within the SIA service.  After the SIA service has determined that a particular image should be returned in the results, then if it supports subsetting and the user has requested a subset, it should know enough to immediately calculate the appropriate limits and be able to express them in terms of the structural parameters of that particular image.  Maybe it's some kind of dynamically created image from a photon database, maybe it's just a simple FITS image, maybe we've decided to support HDF.  The SIA service knows how to create the appropriate cutout in the native terms of the image.
> 
> In the great majority of cases we will be talking about FITS images. We're going to return a URL to the user that will enable them to retrieve the specific cutout.  In the case of a FITS image there's no reason why this can't simply specify the axes' ranges, and if we do that we have a lot of advantages:
>  - This kind of generic subsetting capability is already implemented -- in lots of places I suspect. E.g., using the FITS filter capabilities of CFITSIO both array and table subsetting are fully supported in commands like FCOPY.  I think IRAF has similar features.  STILTS has lots of filtering of tables including both FITS and VOTables.  A site could just have a CGI script that allowed users to read a local FITS file, input the filter parameters, and write the output to the web.  This could be a few lines of Perl calling FCOPY.
>  - Adding subsetting to SSA or SLAP or any other service that is going to index FITS data becomes straightforward.
>  - The subsetting capability is usable outside of the SxA context to do anything users want.  E.g., maybe I want to get a subset of rows from a photon list.  I think this would be immensely useful to have generally, and since I think it's relatively easy to do (see FCOPY above) separating it out makes it easy to promote -- even to sites that have no intention of providing SIA services (perhaps they don't have images!).
> 
> And it makes it very clear what defines SIA: the use of the image data model.  SSA is defined by use of the spectral data model. Implementations of these may very well use TAP in part, and I'm suggesting that they should also use a model-independent AccessData protocol to manipulate results (e.g., do cutouts).  Here I mean model-independent to suggest independence from any understanding of the semantics of the data (Image data model, Spectral data model, ...) although maybe it would be appropriate for this protocol to be based upon a 'model' of abstract multi-dimensional arrays and tables which might then be specialized for the physical representations of these in FITS files and VOTables.
> 
> A standard that defined how such data subsetting and manipulation should be done in an archive, and that was easy to implement, would have a real chance for broad adoption -- there's a real problem with very large data files.  I think addressing this one area could be immensely valuable to the community.  If we keep it very small -- which I think separating it out from the SxA standard does -- then I think we could make rapid progress here.
> 
> 	Tom
> 
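
As a footnote to Tom's FCOPY point above, here is a minimal sketch in Python (rather than Perl) of the core of such a script. It builds a CFITSIO extended-filename image section from per-axis pixel ranges and the matching fcopy argument list. The function names and file names are hypothetical, and actually running the command assumes the HEASoft fcopy tool is installed, so the sketch only constructs it.

```python
def section_filename(path, ranges):
    """Append a CFITSIO image-section specifier to `path`,
    e.g. cube.fits[1:100,1:200,26:26]. Ranges are 1-based inclusive."""
    parts = []
    for lo, hi in ranges:
        lo, hi = int(lo), int(hi)  # reject non-numeric user input
        if lo < 1 or hi < lo:
            raise ValueError(f"bad pixel range {lo}:{hi}")
        parts.append(f"{lo}:{hi}")
    return f"{path}[{','.join(parts)}]"


def fcopy_command(infile, ranges, outfile):
    """Argument list for fcopy; passing a list (not a shell string)
    to subprocess.run avoids shell interpretation of the brackets."""
    return ["fcopy", section_filename(infile, ranges), outfile]


cmd = fcopy_command("cube.fits", [(1, 100), (1, 200), (26, 26)], "cutout.fits")
print(cmd)
```

A web-facing script would pass the list to subprocess.run and stream the output file back; the integer conversion matters there because the ranges come from the request.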

Dr. Paul Harrison
JBCA, Manchester University
http://www.manchester.ac.uk/jodrellbank