World coordinates cutouts/versus pixel cutouts Re: Multi-dimensional Data Access minimal requirements

François Bonnarel francois.bonnarel at astro.unistra.fr
Fri Mar 14 04:13:28 PDT 2014


Hi Paul, Tom, all,
     Of course it would be nice to have this functionality, and it has
been discussed in the DAL group for AccessData version .... 1.1.
While it may seem simpler (and maybe it is, as far as syntax definition
is concerned), it actually is not. For a pixel cutout query to have any
scientific value, some a priori knowledge (even rough) of the Mapping
between the pixels and the world coordinates is required. This
knowledge has to be used either by a client or by the service itself to
prepare useful pixel cutout queries.
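
As a rough illustration of the kind of Mapping knowledge meant here, the
sketch below converts a world-coordinate interval into a pixel range on a
single axis. It assumes a purely linear FITS-style axis described by
CRVAL, CRPIX and CDELT values (for example a spectral axis); celestial
axes with a real projection need a full WCS library. The function name
and its parameters are hypothetical and are not taken from any DAL draft.

```python
import math


def world_range_to_pixels(w_lo, w_hi, crval, crpix, cdelt, naxis):
    """Map a world interval to a 1-based inclusive pixel range on one axis.

    Assumes a linear axis: world = crval + cdelt * (pixel - crpix).
    Returns None when the interval does not overlap the axis at all.
    """
    if cdelt == 0:
        raise ValueError("cdelt must be non-zero")
    # Invert the linear relation for both ends of the interval.
    p_a = crpix + (w_lo - crval) / cdelt
    p_b = crpix + (w_hi - crval) / cdelt
    # cdelt may be negative, so order the two pixel positions explicitly.
    lo, hi = sorted((p_a, p_b))
    # Round to the nearest pixel centre and clip to the axis length.
    first = max(1, math.floor(lo + 0.5))
    last = min(naxis, math.floor(hi + 0.5))
    if first > last:
        return None
    return first, last
```

Without CRVAL, CRPIX and CDELT (or an equivalent description) neither
the client nor the service can perform this step, which is the point
made above.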
      In the cube access caravan version 1.0, due for May this year,
what we have for discovery/description is ObsCore (obtained as an ObsTAP
response to an ADQL query, or as an SIAV2 query response obtained through
a PQL request), and ObsCore doesn't contain any Mapping information.
      And we cannot assume that people operating ObsTAP have the correct
mapping information stored independently from the cubes themselves to
prepare the pixel cutouts a priori.
      Version 1.1 of the caravan will have SIAV2 metadata resources
including Mapping (from ImageDM) and also (I hope) Virtual data
generation and discovery. This will allow pixel cutout queries to
make sense.
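
Once pixel ranges are known, the request itself can be very simple, as
Tom and Paul argue in the messages quoted below. The sketch that follows
builds a file reference in the style of the CFITSIO image-section syntax
that Tom mentions (`first:last` per axis, `*` for a whole axis), and
picks the middle plane along one axis as in Paul's thumbnail example. The
helper names and the file name `cube.fits` are hypothetical; this is an
assumption about how a client might format such a request, not a
proposed AccessData syntax.

```python
def image_section(filename, ranges):
    """Format per-axis pixel ranges as a CFITSIO-style image section.

    Each entry of `ranges` is either None (whole axis, written '*') or a
    (first, last) tuple of 1-based inclusive pixel indices.
    """
    parts = []
    for r in ranges:
        if r is None:
            parts.append("*")
            continue
        first, last = r
        if first < 1 or last < first:
            raise ValueError(f"invalid pixel range: {r}")
        parts.append(f"{first}:{last}")
    return f"{filename}[{','.join(parts)}]"


def middle_plane(naxes, axis):
    """Return ranges selecting the middle plane along `axis` (0-based).

    `naxes` is the sequence of axis lengths of the cube.
    """
    ranges = [None] * len(naxes)
    mid = (naxes[axis] + 1) // 2
    ranges[axis] = (mid, mid)
    return ranges
```

For a 100 x 100 x 41 cube, `image_section("cube.fits",
middle_plane((100, 100, 41), 2))` gives `cube.fits[*,*,21:21]`.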
Best regards
François
On 13/03/2014 08:12, Paul Harrison wrote:
> Hi all,
>
> I am entering this debate quite late, but I am at a EuroVO meeting where this subject came up, and I made a similar suggestion to Tom’s. Given that the first iteration of the cube access is supposed to be simple and re-sampling etc. is not in scope, I think that allowing cut-out simply by pixel ranges along each axis should at least be an option in the AccessData stage. I can imagine, for instance, that a client presenting thumbnails to a user might want to just show the middle plane in each axis of the cube.
>
> Viewed on its own, having the client work out at the AccessData stage what pixel ranges are needed to cover a particular patch of the sky and frequency band is not attractive, and anyway needs full metadata. However, I think that the common use case of “I’m only interested in this small sub-cube” can be covered by the Query stage additionally returning the pixel ranges within each cube that satisfy the original discovery query, which the service will have had to work out with full WCS anyway. I think that doing this does allow the first iteration of AccessData to be as simple as Tom suggests, whilst fulfilling the main data reduction use case of cut-outs.
>
> Paul.
>
> On 2014-03-12, at 16:52, Tom McGlynn <Thomas.A.McGlynn at nasa.gov> wrote:
>
>> OK...  We're now all agreed that we want to be able to specify circular cut-out regions...  I'd like to return to my sense of why this came up, answer Doug's question as to what distinguishes SIA from TAP (though he likely intended it rhetorically), and to suggest that it makes sense to consider the AccessData (or a part of it at least) as a full and separate interface used by but independent of SIA.
>>
>> I think this confusion comes about because we instinctively know that an image cut-out is not a circle.  The act of cutting out the image creates a rectangle (or has traditionally, HEALPix and HTM pixel lists could be a counter example).  So when we say we are specifying a circular cutout what we really mean is that we are providing inputs to a procedure which given an image instance will calculate the actual cutout parameters and if requested do the cutting out.
>>
>> The part of this that belongs within the SIA standard is suggested by the previous sentence: Where do we refer to images specifically? When we need to calculate the actual cutout parameters.  SIA handles all the aspects of the retrieval that require understanding images.  So anything involving WCS projections or coordinates or whatever should be done within the SIA service.  After the SIA service has determined that a particular image should be returned in the results, then if it supports subsetting and the user has requested a subset, it should know enough to immediately calculate the appropriate limits and be able to express them in terms of the structural parameters of that particular image.  Maybe it's some kind of dynamically created image from a photon database, maybe it's just a simple FITS image, maybe we've decided to support HDF.  The SIA service knows how to create the appropriate cutout in the native terms of the image.
>>
>> In the great majority of cases we will be talking about FITS images. We're going to return a URL to the user that will enable them to retrieve the specific cutout.  In the case of a FITS image there's no reason why this can't simply specify the axes' ranges, and if we do that we have a lot of advantages:
>>   - This kind of generic subsetting capability is already implemented -- in lots of places I suspect. E.g., using the FITS filter capabilities of CFITSIO both array and table subsetting are fully supported in commands like FCOPY.  I think IRAF has similar features.  STILTS has lots of filtering of tables including both FITS and VOTables.  A site could just have a CGI script that allowed users to read a local FITS file, input the filter parameters, and write the output to the web.  This could be a few lines of Perl calling FCOPY.
>>   - Adding subsetting to SSA or SLAP or any other service that is going to index FITS data becomes straightforward.
>>   - The subsetting capability is usable outside of the SxA context to do anything users want.  E.g., maybe I want to get a subset of rows from a photon list.  I think this would be immensely useful to have generally, and since I think it's relatively easy to do (see FCOPY above) separating it out makes it easy to promote -- even to sites that have no intention of providing SIA services (perhaps they don't have images!).
>>
>> And it makes it very clear what defines SIA: the use of the image data model.  SSA is defined by use of the spectral data model. Implementations of these may very well use TAP in part, and I'm suggesting that they should also use a model-independent AccessData protocol to manipulate results (e.g., do cutouts).  Here I mean model-independent to suggest independence from any understanding of the semantics of the data (Image data model, Spectral data model, ...), although maybe it would be appropriate for this protocol to be based upon a 'model' of abstract multi-dimensional arrays and tables which might then be specialized for the physical representations of these in FITS files and VOTables.
>>
>> A standard that defined how such data subsetting and manipulation should be done in an archive, and that was easy to implement, would have a real chance for broad adoption -- there's a real problem with very large data files.  I think addressing this one area could be immensely valuable to the community.  If we keep it very small -- which I think separating it out from the SxA standard does -- then I think we could make rapid progress here.
>>
>> 	Tom
>>
> Dr. Paul Harrison
> JBCA, Manchester University
> http://www.manchester.ac.uk/jodrellbank
>

