Multi-dimensional Data Access minimal requirements

Tue Mar 11 14:03:33 PDT 2014

On Tue, 11 Mar 2014, Tom McGlynn wrote:

> So is the DataAccess method independent of the SIA in the sense that I could 
> implement it by itself?  Could I use it for arbitrary images where I mean 
> image in the FITS sense of an n-dimensional rectangle? Or is it tied to the 
> data model?  Enabling our archives with this capability alone would be 
> extraordinarily valuable.

SIA means "simple image access" - it is supposed to provide actual image
access (access directly into an image), not merely discovery, or we
might as well just do a database query with TAP.  But yes, queryData and
accessData are separate capabilities, and it would be possible to
implement the accessData capability by itself.

> I'm envisaging something like supporting
>
> http://myarchive.data.edu?file=localImageName.fits?filter=subset&ELEMENT=0&CRPIX1=1000&NAXIS1=300&CRPIX2=2000&NAXIS2=500&compress=f
>
> to get the subset x=1000:1299, y=2000:2499 from the primary HDU.

This is equivalent to a pixel-space cutout via accessData.
Pixel-space cutouts are like the image cutouts in CFITSIO or IRAF
(or Python NumPy for that matter).
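
To make the NumPy analogy concrete, here is a minimal sketch of the
pixel-space cutout from the example URL above.  The function name
pixel_cutout and its arguments are hypothetical and are not part of any
accessData specification.  It assumes 1-based, inclusive FITS-style
pixel coordinates (as in CFITSIO section syntax) and the usual NumPy
[y, x] index order for a 2-D image.

```python
import numpy as np


def pixel_cutout(image, start1, size1, start2, size2):
    """Return a rectangular sub-array of a 2-D image.

    start1/size1 describe FITS axis 1 (x), start2/size2 describe FITS
    axis 2 (y).  Start pixels are assumed to be 1-based, as in FITS.
    """
    x0 = start1 - 1  # convert 1-based FITS pixel to 0-based NumPy index
    y0 = start2 - 1
    # NumPy stores FITS axis 2 (y) as the first array index.
    return image[y0:y0 + size2, x0:x0 + size1]


# x=1000:1299, y=2000:2499 from an image large enough to contain it
image = np.zeros((2600, 1400))
cut = pixel_cutout(image, 1000, 300, 2000, 500)
print(cut.shape)  # (500, 300)
```

The result is a NumPy view rather than a copy; a real service would
also have to write the subset out with updated header keywords.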

> I've not been able to track down any details on AccessData to see if that's 
> what's intended.

See section 3.3.2 of the Cube Whitepaper for the original concept for
AccessData (http://wiki.ivoa.net/internal/IVOA/SiaInterface/CubeDataInVO.pdf).

Very briefly, the full access model is:

     [data] -> filter -> wcs-transform -> image-cutout -> function -> [out]

where all terms are optional.  A simple cutout is just the filter
term, or could optionally be done with the image-cutout term if one
knows the image geometry and can work in pixel rather than world
space.  The image-cutout term can also be used for things like
block averaging or projecting the n-D image along an axis.
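
As a rough illustration of those two pixel-space operations, the NumPy
sketch below block-averages a 2-D image and collapses an n-D array
along one axis.  The helper names block_average and project are
hypothetical, and summing is only one possible choice of projection.

```python
import numpy as np


def block_average(image, factor):
    """Average a 2-D image over non-overlapping factor x factor blocks.

    Trailing rows/columns that do not fill a whole block are dropped.
    """
    ny, nx = image.shape
    ny_trim = ny - ny % factor
    nx_trim = nx - nx % factor
    trimmed = image[:ny_trim, :nx_trim]
    # Split each axis into (blocks, factor) and average over the factor axes.
    blocks = trimmed.reshape(ny_trim // factor, factor,
                             nx_trim // factor, factor)
    return blocks.mean(axis=(1, 3))


def project(cube, axis):
    """Collapse an n-D array along one axis by summing (assumed choice)."""
    return cube.sum(axis=axis)


small = np.arange(16.0).reshape(4, 4)
print(block_average(small, 2))              # 2x2 array of block means
print(project(np.ones((3, 4, 5)), 0).shape)  # (4, 5)
```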

Something like reprojection is a WCS-transform.

More complex things like computing moments are done with the function
term.
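
The chain of optional terms can be pictured as a simple composition of
stages.  This is only a sketch under strong assumptions: the name
access_data and its keyword arguments are hypothetical, each stage is
reduced to a plain callable on a NumPy array, and real stages would
also need to carry WCS and other metadata along with the pixels.

```python
import numpy as np


def access_data(data, filter_=None, wcs_transform=None,
                image_cutout=None, function=None):
    """Apply the optional stages in the fixed order of the access model.

    Any stage left as None is skipped, so calling with no stages
    returns the input unchanged.
    """
    out = data
    for stage in (filter_, wcs_transform, image_cutout, function):
        if stage is not None:
            out = stage(out)
    return out


# Example: pixel-space cutout of a cube, then a zeroth moment
# (sum along the first axis) as the "function" term.
cube = np.arange(24.0).reshape(2, 3, 4)
result = access_data(
    cube,
    image_cutout=lambda a: a[:, 0:2, 1:3],
    function=lambda a: a.sum(axis=0),
)
print(result.shape)  # (2, 2)
```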

Note that in general, the input [data] does not have to be a FITS image,
or a pixellated image at all, i.e., it could be some other data format,
or even an event or visibility dataset.

Your table example is probably better handled by TAP, unless you are
referring to an image stored as a FITS binary table in which case SIA
would work fine.

 	- Doug

> 	Tom
>
> Douglas Tody wrote:
>> On Tue, 11 Mar 2014, Tom McGlynn wrote:
>> 
>>> However it suggests to me that it would make sense to have this
>>> separated into two protocols.  The SIA protocol would take the
>>> requested region and WCS of the image and calculate the actual image
>>> subset that meets this requirement at whatever level the standard
>>> and implementation decided upon.
>>> A second lower level protocol (Maybe this is the DataAccess layer.
>>> For the nonce I'll assume so) is invoked to actually get the subset.
>> 
>> This is exactly what was specified in the Sept-2013 SIAV2 draft, and
>> what is currently implemented in our VAO SIAV2 prototype.
>>
>>      -   The SIAV2 query, with mode=cutout (or mode=match), "take[s] the
>>          requested region and WCS of the image and calculate[s] the
>>          actual image subset that meets this requirement".
>>
>>      -   The access reference URL in the query response is a call back to
>>          the service (actually to the accessData service request not a
>>          different protocol, but it amounts to the same thing).  The
>>          information required to generate the virtual image is passed
>>          internally to the accessData capability.
>>
>>      -   When the client GETs the virtual image, the accessData method
>>          "is invoked to actually get the subset".
>> 
>> If the client knows enough about the image to be accessed (e.g., via a
>> prior queryData and/or getMetadata) then it can instead just call the
>> accessData request on the desired image.  This is how we do things like
>> interactive image cube visualization and analysis, where the same image
>> is repeatedly accessed.  It is too low level however for simple
>> automated virtual image generation.
>>
>>          - Doug
>> 
>> 
>> On Tue, 11 Mar 2014, Tom McGlynn wrote:
>> 
>>> To my mind there is a bit of a confusion here between what should be
>>> two levels of the interface.  I'm not sure what AccessData is and it
>>> may be that its development is addressing my concerns.  If not and
>>> all of this is out of left field I will return to my box....
>>> 
>>> It seems like the data interfaces, SIA, SSA, whatever, talk to
>>> users in the terms of the data model on which they are based.  So in
>>> SIA we specify some set of geometric terms that describe a region in
>>> the sky that we wish for a cutout.
>>> 
>>> When we retrieve a cutout we are going to retrieve a subset of a
>>> larger image, where as I understand it we are limiting the subset to
>>> n-dimensional sub-box of the original image.
>>> 
>>> There is no requirement that the specification of the cutout in the
>>> SIA request have any relationship to the coordinate system,
>>> orientation, ... of the actual data (again as I understand it).
>>> 
>>> So, the request might be a simple RA/Dec box a couple of arcminutes
>>> on a side.  But if the image being cutout is oriented in Galactic
>>> coordinates, then the service will not provide data in the requested
>>> box to the user.  As I understand it the intent is that what will be
>>> returned is the smallest box (in Galactic coordinates) which fully
>>> includes the requested region.
>>> 
>>> In this framework supporting a circular region makes a lot of sense
>>> to me.  I suspect it's easier to calculate the appropriate bounds
>>> for true circular regions (i.e., circles on the sphere not some
>>> particular projection plane) than it will be for an RA/Dec rectangle.
>>> 
>>> However it suggests to me that it would make sense to have this
>>> separated into two protocols.  The SIA protocol would take the
>>> requested region and WCS of the image and calculate the actual image
>>> subset that meets this requirement at whatever level the standard
>>> and implementation decided upon.
>>> 
>>> A second lower level protocol (Maybe this is the DataAccess layer.
>>> For the nonce I'll assume so) is invoked to actually get the
>>> subset.  Any service implementing a cutout SIA service would be
>>> required to implement (or provide access to someone who implements)
>>> the DA protocol where one can specify at the data level that one wishes
>>> a particular extraction of a given file.  The DA level knows nothing
>>> about WCS's or data models or such.  In FITS terms the only thing it
>>> cares about are the NAXISn keywords (well it would update the
>>> CRPIX's too I guess).  Say the DA level only supports simple subsets
>>> of arrays.  That handles our image subsetting of course, but it can
>>> also be used to extract regions in a spectrum or rows in a table.
>>> Upgraded versions of the DA could support skips between pixels
>>> returned, or averages or other filters defined purely in terms of
>>> the array indices.
>>> 
>>> More importantly, for me, the DA could be accessible directly
>>> without going through the SIA service.  Now if some scientist wants
>>> to get subsets and she happens to know where the data is she can
>>> just grab the subset directly. Providing a generic capability of
>>> downloading subsets of data  -- regardless of whether we've attached
>>> them to some lovely data model -- would be an invaluable
>>> contribution to the community.
>>> 
>>> Note, by the by, that my vision of a data access service isn't
>>> limited to FITS data.  An implementation could get a row subset of a
>>> VOTable just as easily.  Manifestly the protocol would need to be
>>> able to deal with multiHDU FITS data and multi-table VOTables, but
>>> that's easy enough to do.  And it would be fine for a service to
>>> respond with 'I don't know how to do that' when invoked
>>> inappropriately.
>>> 
>>> Just my two cents...
>>> 
>>>
>>>     Tom
>>> 
>>> Ray Plante wrote:
>>>> On Tue, 11 Mar 2014, Robert J. Hanisch wrote:
>>>>> Remember, too, we are talking about the query that gets sent from the
>>>>> interface to a service.  SIAP queries will most likely result from web
>>>>> forms or programmatic interfaces in which user-friendly inputs can be
>>>>> specified.  So we need not make the range specification so dumbed-down
>>>>> as "circle".
>>>> 
>>>> Recalling that this question arose from the requirement for supporting
>>>> simple cut-outs, we should clarify where the use of circle/range would
>>>> appear.  I'm gathering from Pat's response that this is something that
>>>> goes specifically into AccessData, and would not affect image
>>>> searching, which is handled by SIAv2.  Is this correct?
>>>> 
>>>> This reminds me of another related question.  SIAv1 had the feature
>>>> that allowed a service to bill itself specifically as a "cutout"
>>>> service, which meant that the search queries would specifically return
>>>> images that are cut-outs matching (as close as possible) the search
>>>> region.  Is this expected to be allowed/supported by SIAv2?
>>>> 
>>>> cheers,
>>>> Ray
>>>> 
>>> 
>