World coordinates cutouts/versus pixel cutouts Re: Multi-dimensional Data Access minimal requirements

Mon Mar 17 02:08:50 PDT 2014

Hi Tom, all

      I fully agree that having a pixel cutout facility EVENTUALLY would 
be great, exactly for the use cases you have been emphasizing.
But the problem we have with the scheduling is the following:
      We want to go to recommendation for the minimal requirements in 
our "Cube access caravan" ASAP, as close to the May Interop as 
possible.  Each piece in version 1.0 has to be consistent with the 
status of the others (I am speaking of SIAV2, DataLink and AccessData, 
and of course ObsTAP, which is already there. ImageDM is in the background).
       * So for data discovery we have two paths:
                          - ObsTAP
                          - SIAV2 1.0
       The difference is that the first is TAP-based and accepts 
ADQL queries (and only those).
           The second (which in its initial version will not have any 
virtual data discovery) does not require the TAP infrastructure and 
accepts only PQL queries.
       The description of the datacubes available in the query response 
is the ObsCore view in both cases.
        This view doesn't include anything about the mapping or the pixel 
structure of the dataset.
         The basic requirement doesn't even imply that the cube is 
stored in the same environment as the database. The ObsCore metadata 
could be built from an observation log, for example, and the only 
thing you are required to "own" is the URL where the dataset can be found.
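
As a rough sketch of the two discovery paths, assuming hypothetical 
endpoints (`http://host/tap/sync` and `http://host/sia2/query` are not 
from this thread) and assuming the parameter query reuses the POS and 
SIZE names from the example quoted below:

```python
from urllib.parse import urlencode

# Hypothetical service endpoints, for illustration only.
TAP_SYNC = "http://host/tap/sync"
SIA2_QUERY = "http://host/sia2/query"


def obstap_url(ra, dec):
    """Discovery through ObsTAP: an ADQL query on the ivoa.obscore table."""
    adql = (
        "SELECT obs_publisher_did, access_url, access_format "
        "FROM ivoa.obscore "
        "WHERE dataproduct_type = 'cube' "
        f"AND CONTAINS(POINT('ICRS', {ra}, {dec}), s_region) = 1"
    )
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return TAP_SYNC + "?" + urlencode(params)


def siav2_url(ra, dec, size):
    """Discovery through a parameter query; no TAP infrastructure needed.

    POS and SIZE are assumed parameter names taken from the thread's example.
    """
    return SIA2_QUERY + "?" + urlencode({"POS": f"{ra},{dec}", "SIZE": size})


print(obstap_url(167, 57))
print(siav2_url(167, 57, 0.0013888))
```

In both cases the response rows would carry only ObsCore metadata, so 
nothing in them says how pixels map to world coordinates.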

       * The linking to AccessData functionalities (cutout, and later 
others) is done by declaring the service in the query response 
or in a DataLink response.

       * AccessData is the service or resource providing the cutout. 
This service has to have direct access to the datasets; it is in 
direct contact with them.

             The end of the discussion can be found below.
On 14/03/2014 20:03, Tom McGlynn wrote:
> Hi Francois,
>
> There is no doubt that there needs to be a capability that is able to 
> translate subset parameters expressed in celestial coordinate to a 
> parametrization in pixel space.  And we clearly need to be able to 
> extract subsets, not just describe them.
>
> What I am suggesting is that these are potentially separable 
> capabilities and that there could be substantial benefits to doing 
> this separation.  Since I frequently get confused by the abstract 
> discussion I'd like to illustrate this with an example.
>
> Let's start with my example of an SIA request for a distant galaxy 
> where we want a 5" cutout region.
>
> The user invokes a request something like:
>
>    http://host/getSIA.pl?POS=167,57&SIZE=0.0013888&CUTOUT=true
>
> (I'm not worrying too much about the details here).
>
> Hopefully the user gets back a VOTable with one or more rows.  For 
> each row there is a URL that gets the cutout the user requested.  The 
> returned URL implements the standard way we present cutouts to the user.
>
> The question is: what does this URL look like?
>
> Is it
>    http://host/getSIACutout?file=baseFile&POS=167,57&SIZE=0.00138888
>
> or
>
>    http://host/getFitsSubset?file=baseFile&XR=1000..1020&YR=1400..1420
>
> ?
>
>
> In the first case, all the initial SIA service request does is pass 
> the subsetting parameters to some other service after it ascertains 
> that there is coverage for the particular file.  The program 
> getSIACutout knows something about images and so it's going to calculate 
> the appropriate axes ranges and then return the subset to the user. 
> One way that it could do this (and this overall approach would be fine 
> with me) is to calculate the actual image subranges and then do a 
> redirect to the appropriate call to the getFitsSubset in the second 
> choice.
>
> In the second approach it is the explicit responsibility of the SIA 
> service to do the image-based calculation right away.  This makes 
> sense to me since then we have the Image service handling the 
> image-based calculations and returning something that is now usable in 
> contexts that don't necessarily understand images and WCS's as such. 
> We've normalized the code so that all the image stuff happens in the 
> same place.  But if we want to have an intermediate subsetting layer 
> that converts from WCS to pixels I can live with that.
>
> What I think is a bad idea is tightly coupling the calculations of the 
> actual data subset range with the extraction of that range from the 
> data files, i.e., having getSIACutout directly returning the cutout 
> FITS file.
>
This I don't really understand. When you read the dataset for the 
extraction you can also read the header and do the world-coordinate to 
pixel transformation for just the dataset concerned. That's exactly how 
our CDS prototype, demonstrated in Heidelberg and Hawaii, works.
On the other hand, ObsTAP doesn't provide anything to manage the 
information needed for the mapping, and if you want SIAV2 to manage 
it, that is a strong additional requirement for version 1.0, because you 
have to manage this information and store it somewhere (in a database or 
whatever) independently from the datasets. This will be needed anyway if 
we want the "metadata" resource in SIAV2 later, so it will come, but for 
now I don't think we can force people to implement it in SIAV2 1.0 
just to provide pixel cutout URLs.
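
A minimal sketch of such a header-based transformation, assuming a plain 
linear WCS (CRVAL/CRPIX/CDELT, no rotation or distortion) and the 
small-angle approximation. The header values and the helpers 
`pixel_ranges` and `subset_url` are hypothetical; the URL form is the 
getFitsSubset one from Tom's example:

```python
import math


def pixel_ranges(header, ra, dec, size):
    """Convert a world-coordinate cutout to 1-based inclusive pixel ranges.

    header: dict with CRVAL1/2, CRPIX1/2, CDELT1/2 (degrees) and NAXIS1/2.
    ra, dec, size: cutout centre and full width in degrees.
    Assumes a linear, unrotated WCS and a small field of view.
    """
    cos_dec = math.cos(math.radians(header["CRVAL2"]))
    # Pixel position of the requested centre.
    x_c = header["CRPIX1"] + (ra - header["CRVAL1"]) * cos_dec / header["CDELT1"]
    y_c = header["CRPIX2"] + (dec - header["CRVAL2"]) / header["CDELT2"]
    # Half-width of the cutout in pixels along each axis.
    half_x = abs(size / 2 / header["CDELT1"])
    half_y = abs(size / 2 / header["CDELT2"])
    # Clip to the image boundaries.
    x_lo = max(1, math.floor(x_c - half_x))
    x_hi = min(header["NAXIS1"], math.ceil(x_c + half_x))
    y_lo = max(1, math.floor(y_c - half_y))
    y_hi = min(header["NAXIS2"], math.ceil(y_c + half_y))
    if x_lo > x_hi or y_lo > y_hi:
        raise ValueError("requested region does not overlap the image")
    return (x_lo, x_hi), (y_lo, y_hi)


def subset_url(file_id, xr, yr):
    """Build a pixel-cutout URL in the form used in the quoted example."""
    return (f"http://host/getFitsSubset?file={file_id}"
            f"&XR={xr[0]}..{xr[1]}&YR={yr[0]}..{yr[1]}")


# Hypothetical header: 0.25 arcsec pixels, reference pixel at (1010, 1410).
header = {"CRVAL1": 167.0, "CRVAL2": 57.0, "CRPIX1": 1010.0, "CRPIX2": 1410.0,
          "CDELT1": -0.25 / 3600, "CDELT2": 0.25 / 3600,
          "NAXIS1": 2048, "NAXIS2": 2048}
xr, yr = pixel_ranges(header, 167.0, 57.0, 0.0013888)
print(subset_url("baseFile", xr, yr))
```

The calculation needs only header keywords, so it can run wherever the 
header is readable: inside the extraction service, or in a discovery 
service that has stored the same keywords separately.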

  In conclusion, we can probably add the pixel syntax 
easily to the AccessData rec, but it will probably be of little use 
if people do not implement some extra complexity in their SIAV2 service.
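
One candidate for such a pixel syntax, mentioned further down in the 
thread, is a cfitsio-style image section. A sketch of parsing a small 
subset of that syntax (only `first:last` and `*` per axis; the function 
name `parse_section` is hypothetical) might look like:

```python
def parse_section(section, shape):
    """Parse a section such as '[1000:1020,1400:1420]' into slices.

    section: 1-based, inclusive ranges, one per axis, in FITS axis order.
    shape: axis lengths in the same order.
    Returns 0-based Python slices in the same axis order.
    """
    text = section.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("section must be enclosed in square brackets")
    parts = text[1:-1].split(",")
    if len(parts) != len(shape):
        raise ValueError("section and array have different dimensions")
    slices = []
    for part, n in zip(parts, shape):
        part = part.strip()
        if part == "*":
            # '*' selects the whole axis.
            slices.append(slice(0, n))
            continue
        first, last = (int(v) for v in part.split(":"))
        if not 1 <= first <= last <= n:
            raise ValueError(f"range {part!r} is outside 1..{n}")
        # Convert 1-based inclusive to 0-based half-open.
        slices.append(slice(first - 1, last))
    return slices


print(parse_section("[1000:1020,1400:1420]", (2048, 2048)))
```

Nothing here depends on a WCS, which is why the syntax is easy to define; 
choosing scientifically useful ranges is the part that needs the mapping.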

Best regards
François
> I've said this above, so to reiterate: if we define the actual 
> extraction step as a separately implementable interface then not only 
> can we immediately think about supporting subsetting in VO interfaces 
> other than SIA, we free our community to use subsetting however they 
> would like it.
>
>
> Even if we just consider images this would be very useful.
>
> E.g., a few years back I created some mosaics using ROSAT PSPC data. 
> For some I wanted to try to maximize the resolution, which degrades 
> rapidly off-center for the PSPC.  If I was doing something like this with 
> the PSPC images I could just request the center fraction of the image 
> with a given pixel boundary.  Don't need to worry about WCS, just the 
> fixed pixel locations.  There are lots of cases where the actual pixel 
> locations are important for a given set of images.
>
> For reasons that I'm not aware of some GALEX images are provided with 
> a circular field of view within the square image frame where the FOV 
> is not centered.  You can get the pixel center from the database. If I 
> wanted to retrieve more centered GALEX data I could have used this 
> information to get a nice subset where all of my GALEX data would 
> actually be the same rather than wandering over the image.
>
> And we've unlocked users to use the subsetting when they already have 
> the ability to do the image calculations.  There's lots of WCS-aware 
> software out there.  What isn't there is the ability to extract only a 
> subset of a file over the web.  We can make it easy for lots of tools 
> to take full advantage of the Web and extract only what they 
> need.
>
> If we have a simple and generic capability users can build lots of 
> non-image tools that extract data from photon lists, time series, 
> object lists, anything and everything that's described by tables or 
> arrays.
>
> And last but not least, for the case of FITS and VOTables all the 
> software to do the extraction already exists and we just need to 
> define how it is to be invoked -- not a trivial task but one which is 
> easier if we limit the functionality rather than trying to support 
> semantic data models.
>
>     Tom
>
>
>
>
> François Bonnarel wrote:
>> Hi Markus, all
>> On 14/03/2014 12:56, Markus Demleitner wrote:
>>> On Fri, Mar 14, 2014 at 12:13:28PM +0100, François Bonnarel wrote:
>>>> Hi Paul, Tom, all,
>>>>      Of course it would be nice to have this functionality, and it has
>>>> been discussed in the DAL group for AccessData version .... 1.1.
>>>> While it may seem simpler (and maybe it is, as far as syntax
>>>> definition is concerned), it actually is not. For a
>>>> pixel cutout query to have any scientific value, some a priori
>>>> knowledge (even rough) of the mapping between the pixels and the
>>>> world coordinates is needed. This knowledge has to be used either by
>>>> a client or by the service itself to prepare useful pixel cutout queries.
>>> Knowing full well I'm getting on everyone's nerves, I'd still like to
>>> point out that in the structured parameters approach --
>>>
>>> http://www.ivoa.net/pipermail/dal/2013-December/006602.html, chapter
>>> 6.1 "common parameters"
>>>
>>> --, pixel-wise cutouts aren't in any way special and are cheap to
>>> implement for both clients and servers (although, as I said, I'd now
>>> use PIX(n) and PIX(n)_WIDTH, although for pixels, MIN and MAX are
>>> just as appropriate).
>> Sure, it's possible to define a pixel cutout syntax easily. An
>> alternative to the parameters you are proposing is one parameter
>> (PIXCUTOUT or whatever) with cfitsio syntax, which is very general for
>> all n-d arrays of values. But my point here was about the availability
>> of the mapping information necessary to build useful pixel limits.
>>
>> Cheers
>> François
>>> François is right, though, that for most interesting use cases,
>>> operating on pixel coordinates requires knowledge of the mapping.  At
>>> least for common FITS images, that's again easy for clients and
>>> servers with the proposed mechanism, too: clients would use KIND
>>> (trivial to implement) and get back the FITS header they can already
>>> interpret.
>>>
>>> Not quite as general as the full DM approach, but very cheap measured
>>> in implementation effort and, I would claim, effective in terms of
>>> "Wow!" potential in our clients' users.
>>>
>>> Cheers,
>>>
>>>           Markus