World coordinates cutouts/versus pixel cutouts Re: Multi-dimensional Data Access minimal requirements

Fri Mar 14 12:03:06 PDT 2014

Hi Francois,

There is no doubt that there needs to be a capability that is able to 
translate subset parameters expressed in celestial coordinates to a 
parametrization in pixel space.  And we clearly need to be able to 
extract subsets, not just describe them.

What I am suggesting is that these are potentially separable 
capabilities and that there could be substantial benefits to doing 
this separation.  Since I frequently get confused by the abstract 
discussion I'd like to illustrate this with an example.

Let's start with my example of an SIA request for a distant galaxy 
where we want a 5" cutout region.

The user invokes a request something like:

    http://host/getSIA.pl?POS=167,57&SIZE=0.0013888&CUTOUT=true

(I'm not worrying too much about the details here).

Hopefully the user gets back a VOTable with one or more rows.  For 
each row there is a URL that gets the cutout the user requested.  The 
returned URL implements the standard way we present cutouts to the user.

The question is: what does this URL look like?

Is it

    http://host/getSIACutout?file=baseFile&POS=167,57&SIZE=0.00138888

or

    http://host/getFitsSubset?file=baseFile&XR=1000..1020&YR=1400..1420

?

In the first case, all the initial SIA service request does is pass 
the subsetting parameters to some other service after it ascertains 
that there is coverage for the particular file.  The program 
getSIACutout knows something about images and so it's going to calculate 
the appropriate axes ranges and then return the subset to the user. 
One way that it could do this (and this overall approach would be fine 
with me) is to calculate the actual image subranges and then do a 
redirect to the appropriate call to the getFitsSubset in the second 
choice.
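
As a rough illustration of that calculation, here is a minimal Python 
sketch that turns POS and SIZE into the XR/YR ranges of the second URL. 
It assumes a small, unrotated field described only by the FITS keywords 
CRVAL, CRPIX and CDELT; a real service would use a full WCS library. 
The function names and the WCS values in the usage note are 
hypothetical.

```python
import math
from urllib.parse import urlencode


def pixel_ranges(ra, dec, size, wcs, naxis):
    """Approximate 1-based pixel box for a square cutout.

    ra, dec, size are in degrees.  Assumption: a linear, unrotated WCS
    (CRVAL/CRPIX/CDELT only), with the RA offset scaled by cos(dec).
    """
    dx = (ra - wcs["CRVAL1"]) * math.cos(math.radians(dec)) / wcs["CDELT1"]
    dy = (dec - wcs["CRVAL2"]) / wcs["CDELT2"]
    xc = wcs["CRPIX1"] + dx
    yc = wcs["CRPIX2"] + dy
    half_x = abs(size / wcs["CDELT1"]) / 2
    half_y = abs(size / wcs["CDELT2"]) / 2

    def clip(lo, hi, n):
        # Round to whole pixels and keep the box inside the image.
        return max(1, round(lo)), min(n, round(hi))

    xr = clip(xc - half_x, xc + half_x, naxis[0])
    yr = clip(yc - half_y, yc + half_y, naxis[1])
    return xr, yr


def subset_url(base, file, xr, yr):
    """Build the pixel-space request (parameter names as in the example URL)."""
    query = urlencode({
        "file": file,
        "XR": f"{xr[0]}..{xr[1]}",
        "YR": f"{yr[0]}..{yr[1]}",
    })
    return f"{base}?{query}"
```

With a made-up image of 0.25" pixels whose reference pixel (1010, 1410) 
sits at POS=167,57, a 5" cutout comes out 20 pixels wide on each axis, 
and getSIACutout could redirect to the URL that subset_url returns.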

In the second approach it is the explicit responsibility of the SIA 
service to do the image-based calculation right away.  This makes 
sense to me since then we have the Image service handling the 
image-based calculations and returning something that is now usable in 
contexts that don't necessarily understand images and WCS's as such. 
We've normalized the code so that all the image stuff happens in the 
same place.  But if we want to have an intermediate subsetting layer 
that converts from WCS to pixels I can live with that.

What I think is a bad idea is tightly coupling the calculations of the 
actual data subset range with the extraction of that range from the 
data files, i.e., having getSIACutout directly returning the cutout 
FITS file.

I've said this above, so to reiterate: if we define the actual 
extraction step as a separately implementable interface then not only 
can we immediately think about supporting subsetting in VO interfaces 
other than SIA, we free our community to use subsetting however they 
like.
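
To make the separation concrete, here is a minimal Python sketch of an 
extraction step that knows nothing about WCS: it only parses ranges 
such as 1000..1020 and slices an array.  The function names are 
hypothetical, the array is a plain list of rows standing in for image 
data, and pixel indices are taken to be 1-based and inclusive as in 
FITS.

```python
def parse_range(text):
    """Parse 'lo..hi' into a (lo, hi) pair of 1-based inclusive indices."""
    lo_text, hi_text = text.split("..")
    lo, hi = int(lo_text), int(hi_text)
    if lo < 1 or hi < lo:
        raise ValueError(f"bad pixel range: {text!r}")
    return lo, hi


def extract_subset(rows, xr, yr):
    """Return the sub-array selected by XR and YR range strings.

    rows is a list of image rows (y axis), each a list of pixel values
    (x axis).  No coordinate system is involved, only array indices.
    """
    x0, x1 = parse_range(xr)
    y0, y1 = parse_range(yr)
    # Convert 1-based inclusive ranges to Python's 0-based slices.
    return [row[x0 - 1:x1] for row in rows[y0 - 1:y1]]
```

Because nothing here is image-specific, the same call would work on 
any array-like data once the caller has worked out the index ranges.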

Even if we just consider images this would be very useful.

E.g., a few years back I created some mosaics using ROSAT PSPC data. 
For some I wanted to try to maximize the resolution which degrades 
rapidly off-center for the PSPC.  If I were doing something like this with 
the PSPC images I could just request the center fraction of the image 
with a given pixel boundary.  Don't need to worry about WCS, just the 
fixed pixel locations.  There are lots of cases where the actual pixel 
locations are important for a given set of images.
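
A pixel-only request like this needs very little arithmetic.  The 
Python sketch below computes the 1-based inclusive ranges covering the 
central fraction of an image; the function name and the 512x512 size in 
the usage note are illustrative assumptions, not properties of the PSPC 
products.

```python
def central_box(nx, ny, fraction):
    """Pixel ranges covering the central `fraction` of each axis.

    Returns ((x_lo, x_hi), (y_lo, y_hi)), 1-based and inclusive.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    def axis(n):
        width = max(1, round(n * fraction))
        lo = (n - width) // 2 + 1
        return lo, lo + width - 1

    return axis(nx), axis(ny)
```

For a 512x512 image, central_box(512, 512, 0.5) gives 129..384 on both 
axes, which could be passed straight to a pixel-based subset service.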

For reasons that I'm not aware of, some GALEX images are provided with 
a circular field of view within the square image frame where the FOV 
is not centered.  You can get the pixel center from the database.  If I 
wanted to retrieve more centered GALEX data I could have used this 
information to get a subset in which the FOV sits at the same place in 
every image rather than wandering over the frame.

And we've freed users to use the subsetting when they already have 
the ability to do the image calculations.  There's lots of WCS-aware 
software out there.  What isn't there is the ability to extract only a 
subset of a file over the web.  We can make it easy for lots of tools 
to fetch only the data they need.

If we have a simple and generic capability users can build lots of 
non-image tools that extract data from photon lists, time series, 
object lists, anything and everything that's described by tables or 
arrays.

And last but not least, for the case of FITS and VOTables all the 
software to do the extraction already exists and we just need to 
define how it is to be invoked -- not a trivial task but one which is 
easier if we limit the functionality rather than trying to support 
semantic data models.

	Tom

François Bonnarel wrote:
> Hi Markus, all
> On 14/03/2014 12:56, Markus Demleitner wrote:
>> On Fri, Mar 14, 2014 at 12:13:28PM +0100, François Bonnarel wrote:
>>> Hi Paul, Tom, all,
>>>      Of course it would be nice to have this functionality and it has
>>> been discussed in the DAL group for AccessData version .... 1.1.
>>> While it may seem simpler (and maybe it is, as far as syntax
>>> definition is concerned), it is actually not. For a pixel cutout
>>> query to have any scientific value, some a priori knowledge (even
>>> rough) of the mapping between the pixels and the world coordinates
>>> is needed. This knowledge has to be used either by a client or by
>>> the service itself to prepare useful pixel cutout queries.
>> Knowing full well I'm getting on everyone's nerves, I'd still like to
>> point out that in the structured parameters approach --
>>
>> http://www.ivoa.net/pipermail/dal/2013-December/006602.html, chapter
>> 6.1 "common parameters"
>>
>> --, pixel-wise cutouts aren't in any way special and are cheap to
>> implement for both clients and servers (although, as I said, I'd now
>> use PIX(n) and PIX(n)_WIDTH, although for pixels, MIN and MAX are
>> just as appropriate).
> Sure, it's possible to easily define a pixel cutout syntax. An
> alternative to the parameters you are proposing is one parameter
> (PIXCUTOUT or whatever) with cfitsio syntax, which is very general for
> all n-d arrays of values. But my point here was about the availability
> of the mapping information necessary to build useful pixel limits.
>
> Cheers
> François
>> François is right, though, that for most interesting use cases,
>> operating on pixel coordinates requires knowledge of the mapping.  At
>> least for common FITS images, that's again easy for clients and
>> servers with the proposed mechanism, too: clients would use KIND
>> (trivial to implement) and get back the FITS header they can already
>> interpret.
>>
>> Not quite as general as the full DM approach, but very cheap measured
>> in implementation effort and, I would claim, effective in terms of
>> "Wow!" potential in our clients' users.
>>
>> Cheers,
>>
>>           Markus