Multi-dimensional Data Access minimal requirements
Tom McGlynn
Thomas.A.McGlynn at nasa.gov
Wed Mar 12 09:52:47 PDT 2014
OK... We're now all agreed that we want to be able to specify
circular cutout regions. I'd like to return to my sense of why this
came up, answer Doug's question as to what distinguishes SIA from TAP
(though he likely intended it rhetorically), and suggest that it makes
sense to consider AccessData (or at least a part of it) as a full and
separate interface, used by but independent of SIA.
I think this confusion arises because we instinctively know that an
image cutout is not a circle. The act of cutting out the image creates
a rectangle (or traditionally has; HEALPix and HTM pixel lists could
be counterexamples). So when we say we are specifying a circular
cutout, what we really mean is that we are providing inputs to a
procedure which, given an image instance, will calculate the actual
cutout parameters and, if requested, do the cutting out.
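A minimal sketch of such a procedure, turning a circle on the sky into
a rectangular pixel range for one image. It assumes a hypothetical,
much simplified WCS (RA and Dec linear in pixels, no rotation, a small
circle away from the poles and the RA wrap-around) rather than a real
projection; a real service would use a proper WCS library. The names
`LinearWCS` and `circle_to_pixel_box` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class LinearWCS:
    """Hypothetical simplified WCS: RA/Dec linear in pixels, no rotation (degrees)."""
    crval1: float
    crval2: float
    crpix1: float
    crpix2: float
    cdelt1: float
    cdelt2: float

    def world_to_pixel(self, ra, dec):
        """Map (RA, Dec) in degrees to 1-based FITS pixel coordinates."""
        x = self.crpix1 + (ra - self.crval1) / self.cdelt1
        y = self.crpix2 + (dec - self.crval2) / self.cdelt2
        return x, y


def circle_to_pixel_box(ra, dec, radius, wcs, naxis1, naxis2):
    """Pixel ranges ((x1, x2), (y1, y2)) covering a circle, or None if it misses.

    Ranges are 1-based and inclusive. Assumes a small circle away from the
    poles and from the RA 0/360 wrap.
    """
    # Sky bounding box: the RA half-width grows as 1/cos(dec).
    dra = radius / math.cos(math.radians(dec))
    corners = [wcs.world_to_pixel(ra + sr * dra, dec + sd * radius)
               for sr in (-1, 1) for sd in (-1, 1)]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    # FITS pixel n spans n - 0.5 .. n + 0.5; round to the containing pixel, then clip.
    x1 = max(1, math.floor(min(xs) + 0.5))
    x2 = min(naxis1, math.floor(max(xs) + 0.5))
    y1 = max(1, math.floor(min(ys) + 0.5))
    y2 = min(naxis2, math.floor(max(ys) + 0.5))
    if x1 > x2 or y1 > y2:
        return None
    return (x1, x2), (y1, y2)
```

For example, with `wcs = LinearWCS(150.0, 0.0, 500, 500, -0.001, 0.001)`
on a 1000x1000 image, `circle_to_pixel_box(150.0, 0.0, 0.05, wcs, 1000, 1000)`
returns `((450, 550), (450, 550))`: the circle has become a rectangle of
pixel ranges, which is all a downstream cutout step needs to know.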
The part of this that belongs within the SIA standard is suggested by
the previous sentence. Where do we refer to images specifically? When
we need to calculate the actual cutout parameters. SIA handles all
the aspects of retrieval that require understanding images, so
anything involving WCS projections, coordinates, or the like should
be done within the SIA service. Once the SIA service has determined
that a particular image should be returned in the results, then, if
it supports subsetting and the user has requested a subset, it should
know enough to immediately calculate the appropriate limits and be
able to express them in terms of the structural parameters of that
particular image. Maybe it's some kind of image created dynamically
from a photon database, maybe it's just a simple FITS image, maybe
we've decided to support HDF. The SIA service knows how to create the
appropriate cutout in the native terms of the image.
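To make "native terms" concrete, here is a sketch that expresses one
pixel box for two kinds of data product using CFITSIO extended
filename syntax: an image section for a FITS image, and a row filter
for a photon event list. The class names are hypothetical, and the
`EVENTS` extension name and `X`/`Y` sky-pixel column names are
assumptions (common in X-ray event files), not anything SIA defines.

```python
from dataclasses import dataclass


@dataclass
class FitsImage:
    path: str

    def native_subset(self, xr, yr):
        """CFITSIO image section: axis 1 first, 1-based inclusive ranges."""
        return f"{self.path}[{xr[0]}:{xr[1]},{yr[0]}:{yr[1]}]"


@dataclass
class EventList:
    path: str
    extname: str = "EVENTS"  # hypothetical extension name

    def native_subset(self, xr, yr):
        """CFITSIO row filter on hypothetical X/Y sky-pixel columns.

        Pixel n spans n - 0.5 .. n + 0.5, so the event bounds are half-open.
        """
        expr = (f"X >= {xr[0] - 0.5} && X < {xr[1] + 0.5} && "
                f"Y >= {yr[0] - 0.5} && Y < {yr[1] + 0.5}")
        return f"{self.path}[{self.extname}][{expr}]"
```

The same request, pixels 450 to 550 by 1 to 60, becomes
`img.fits[450:550,1:60]` for the image but a filter on event columns
for the photon list; only the SIA side needs to know which applies.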
In the great majority of cases we will be talking about FITS images.
We're going to return a URL to the user that will enable them to
retrieve the specific cutout. In the case of a FITS image there's no
reason why this can't simply specify the range along each axis, and if
we do that we gain a lot of advantages:
- This kind of generic subsetting capability is already implemented
-- in lots of places, I suspect. E.g., using the FITS filtering
capabilities of CFITSIO, both array and table subsetting are fully
supported in commands like FCOPY. I think IRAF has similar features.
STILTS offers extensive filtering of tables in both FITS and VOTable
formats. A site could just have a CGI script that read a local FITS
file, accepted the filter parameters from the user, and wrote the
output to the web. This could be a few lines of Perl calling FCOPY.
- Adding subsetting to SSA or SLAP or any other service that is
going to index FITS data becomes straightforward.
- The subsetting capability is usable outside of the SxA context to
do anything users want. E.g., maybe I want to get a subset of rows
from a photon list. I think this would be immensely useful to have
generally, and since I think it's relatively easy to do (see FCOPY
above), separating it out makes it easy to promote -- even to sites
that have no intention of providing SIA services (perhaps they don't
have images!).
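The URL-plus-CGI arrangement described above could look roughly like
this. The message mentions Perl; Python is used here only to keep the
examples in one language. Everything is hypothetical: the endpoint
`https://archive.example.org/accessdata`, the `ID` and `AXISn` query
parameters, and the dataset table are invented for illustration.
`cutout_url` is the SIA side returning a URL that just lists per-axis
ranges; `fcopy_argv` is the archive side validating them and building a
CFITSIO image-section filename. The `fcopy infile outfile` form and the
use of `-` for stdout follow CFITSIO conventions but should be checked
against the local HEASoft installation.

```python
import re
import subprocess
import sys
from urllib.parse import parse_qs, urlencode

# Hypothetical endpoint and dataset table, for illustration only.
BASE_URL = "https://archive.example.org/accessdata"
DATASETS = {"img.fits": "/data/archive/img.fits"}
RANGE = re.compile(r"([0-9]+):([0-9]+)")


def cutout_url(dataset_id, ranges):
    """SIA side: ranges are (lo, hi) per axis, FITS order, 1-based inclusive."""
    params = [("ID", dataset_id)]
    for axis, (lo, hi) in enumerate(ranges, start=1):
        params.append((f"AXIS{axis}", f"{lo}:{hi}"))
    return BASE_URL + "?" + urlencode(params)


def fcopy_argv(query_string):
    """Archive side: validate AXIS1..AXISn and build an fcopy command line."""
    q = parse_qs(query_string)
    path = DATASETS[q["ID"][0]]  # unknown IDs raise KeyError; never accept raw paths
    sections = []
    n = 1
    while f"AXIS{n}" in q:
        m = RANGE.fullmatch(q[f"AXIS{n}"][0])
        if m is None or not 1 <= int(m[1]) <= int(m[2]):
            raise ValueError(f"invalid AXIS{n}")
        sections.append(f"{int(m[1])}:{int(m[2])}")
        n += 1
    infile = f"{path}[{','.join(sections)}]" if sections else path
    # CFITSIO treats an output file name of "-" as stdout.
    return ["fcopy", infile, "-"]


def serve_cgi(query_string):
    """Write the cutout as a CGI response (assumes fcopy is on PATH)."""
    argv = fcopy_argv(query_string)
    sys.stdout.write("Content-Type: application/fits\r\n\r\n")
    sys.stdout.flush()
    subprocess.run(argv, stdout=sys.stdout.buffer, check=True)
```

Passing an argument list to `subprocess.run` rather than going through
a shell, and mapping dataset IDs through a fixed table, keeps user input
away from shell interpretation and arbitrary file paths.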
And it makes very clear what defines SIA: the use of the image data
model. SSA is defined by use of the spectral data model.
Implementations of these may very well use TAP in part, and I'm
suggesting that they should also use a model-independent AccessData
protocol to manipulate results (e.g., do cutouts). By
model-independent I mean independent of any understanding of the
semantics of the data (Image data model, Spectral data model, ...),
although it might be appropriate for this protocol to be based upon
a 'model' of abstract multi-dimensional arrays and tables, which might
then be specialized for the physical representations of these in FITS
files and VOTables.
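As a rough illustration of what a semantics-free 'model' of arrays and
tables might amount to, the sketch below subsets nested-list arrays and
lists of rows with no notion of images or spectra. The function names
are hypothetical. Ranges follow the FITS convention (axis 1 varies
fastest, bounds 1-based and inclusive), so they are reversed before
walking a row-major nested list.

```python
def subset_array(array, ranges):
    """Cut an n-D nested-list array with per-axis (lo, hi) ranges.

    `ranges` is in FITS order: axis 1 varies fastest (the innermost list);
    bounds are 1-based and inclusive.
    """
    def cut(a, rs):  # rs is ordered outermost axis first
        if not rs:
            return a
        lo, hi = rs[0]
        return [cut(item, rs[1:]) for item in a[lo - 1:hi]]
    return cut(array, list(reversed(ranges)))


def subset_table(rows, lo, hi, columns=None):
    """Select rows lo..hi (1-based, inclusive) and optionally project columns."""
    selected = rows[lo - 1:hi]
    if columns is None:
        return list(selected)
    return [{c: row[c] for c in columns} for row in selected]
```

For `img = [[10 * y + x for x in range(1, 5)] for y in range(1, 4)]`,
`subset_array(img, [(2, 3), (1, 2)])` returns `[[12, 13], [22, 23]]`,
the same pixels a CFITSIO section `[2:3,1:2]` would select. A FITS or
VOTable specialization would only need to map files onto these
structures.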
A standard that defined how such data subsetting and manipulation
should be done in an archive, and that was easy to implement, would
have a real chance of broad adoption -- there's a real problem with
very large data files. I think addressing this one area could be
immensely valuable to the community. If we keep it very small --
which I think separating it out from the SxA standard does -- then I
think we could make rapid progress here.
Tom
Douglas Tody wrote:
> On Tue, 11 Mar 2014, Ray Plante wrote:
>
>> Note that what you are calling "DataAccess" is the purpose of the
>> AccessData specification (now being written up by Pat). And, yes, it
>> can be implemented independently of SIAv2. In previous versions of
>> SIAv2 (as Doug mentioned), the access-data capabilities were not yet
>> split out.
>
> The queryData and accessData capabilities have always been separately
> implementable and callable service capabilities. Both are still
> image-specific capabilities required for the full range of image access
> functionality. What has changed is that they are now to be described in
> separate specifications, in part due to the complexity of accessData for
> things like advanced cube data access.
>
> Note, these capabilities can still be integrated together into a single
> image data access service. The capabilities would still be separately
> callable, i.e., accessData could be used by itself if desired.
>
> See the older email excerpted below.
>
> - Doug
>
>
> ----
> From patrick.dowler at nrc-cnrc.gc.ca Tue Jan 7 12:52:59 2014
> Date: Tue, 7 Jan 2014 12:06:53 -0800
> From: Patrick Dowler <patrick.dowler at nrc-cnrc.gc.ca>
> To: "dal at ivoa.net" <dal at ivoa.net>
> Cc: Douglas Tody <dtody at nrao.edu>
> Subject: Re: WD-SIA-2.0, going forward
>
> On 07/01/14 10:21, Douglas Tody wrote:
>> My summary at this point would be that we have agreement on adding
>> additional query parameters, and on an integrated SIA service, but one
>> which is composed of separate capabilities described in more than one
>> document for logistical reasons.
>
> If by "integrated SIA service" you mean that the implementer supports
> multiple capabilities in a single service, then yes we agree: an
> implementer can do that. The multiple capabilities could include:
>
> * query
> * metadata
> * datalink
> * access data
>
> for which the current plan is 3 documents. I expect the AccessData
> document to undergo several revisions as we include more access
> features and it may also be the right place to concentrate on
> self-describing custom services, in which case it could be somewhat
> heavy/abstract reading (we have to keep that reasonable to support
> take-up).
>
> And by integrated, I also include the flexibility feature that an
> implementer can implement a single web service resource and just use
> different REQUEST values *or* they can implement different services
> (as described in a previous message).
>
> [...]