gzipped images in SIAP 1.0 (fwd)

Guy Rixon guyrixon at gmail.com
Wed May 30 23:23:41 PDT 2007


Hi folks,

it seems to me that we are conflating three use cases:

   1. Service only has compressed images, won't serve anything else,
      and client needs to be told this.

   2. Images are available uncompressed, but client wants them
      compressed in delivery to save bandwidth.

   3. Images are available uncompressed, but client wants several
      images grouped into a zip set for delivery (and possibly
      compressed in the process).

I think these need different solutions.

For case 1, can we add a Content-Encoding column to the queryData
result? This would denote encoding and compression in the same way as
the HTTP header of the same name, except that we would need to add some
way to denote tile compression. If we knew that all the accrefs were
HTTP URLs we wouldn't need the column as we could just look at the HTTP
headers; but some accrefs may lead to FTP servers.
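
A rough sketch of how a client might use such a column, assuming its
values follow the HTTP Content-Encoding conventions. The row layout, the
"content_encoding" key and the example URLs are illustrative assumptions
only; nothing here is defined by SIAP 1.0, and tile compression is left
unhandled because no value for it has been agreed.

```python
import gzip
import zlib

# Hypothetical rows from a queryData response; "content_encoding" stands
# in for the proposed (currently non-standard) Content-Encoding column.
rows = [
    {"acref": "http://example.org/img/1.fits.gz", "content_encoding": "gzip"},
    {"acref": "ftp://example.org/img/2.fits", "content_encoding": ""},
]

def decode_image(payload: bytes, content_encoding: str) -> bytes:
    """Undo the encoding named in the (assumed) Content-Encoding column."""
    encoding = content_encoding.strip().lower()
    if encoding in ("", "identity"):
        return payload  # nothing to undo
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(payload)
    if encoding == "deflate":
        return zlib.decompress(payload)  # HTTP "deflate" is the zlib format
    # Anything else (e.g. a future token for tile compression) is rejected
    # rather than guessed at.
    raise ValueError(f"unsupported encoding: {content_encoding!r}")
```

Because the column travels with the query response, this works the same
way whether the accref is an HTTP or an FTP URL.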

Case 2 is compression in transport, a matter for the transport
protocol. If an accref is HTTP, then the implementations can negotiate
to do this and IVOA doesn't need to rule on it. If the accref is FTP
then no transport compression is available.
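
For illustration, HTTP clients usually negotiate this with an
Accept-Encoding request header, and the server signals what it did in
the Content-Encoding response header. A minimal sketch using only the
Python standard library follows; the function names are made up for this
example and the server is free to ignore the request.

```python
import gzip
import urllib.request

def build_request(acref: str) -> urllib.request.Request:
    """Ask the server for gzip in transit; it may ignore the request."""
    return urllib.request.Request(acref, headers={"Accept-Encoding": "gzip"})

def unpack(body: bytes, headers) -> bytes:
    """Decompress only if the server says it compressed the body."""
    if (headers.get("Content-Encoding") or "").lower() == "gzip":
        return gzip.decompress(body)
    return body

def fetch(acref: str) -> bytes:
    """Retrieve an HTTP accref, transparently undoing gzip if applied."""
    with urllib.request.urlopen(build_request(acref)) as response:
        return unpack(response.read(), response.headers)
```

Nothing in the query response has to change for this to work, which is
why it needs no ruling from IVOA.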

Case 3 is the interesting one. If the SIA is to do it then we need a
nice long argument about the details. However, VOSpace is already
aiming to do this: it's a planned (optional) feature on VOSpace 1.1 to
offer a view of a directory that is a zip of all the files in the
directory. We're also planning to allow data staging from an SIA to
VOSpace. If we do the latter, then let's use the VOSpace feature to zip
the files, rather than duplicating it.
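
To make the idea concrete, here is a sketch of a "zip view" of a
directory of staged images, written against a local filesystem. It is
not the VOSpace interface, whose details are still to be settled; the
function name and the flat (non-recursive) layout are assumptions.

```python
import zipfile
from pathlib import Path

def zip_directory(directory: Path, archive: Path) -> list:
    """Write every regular file in `directory` into one zip archive.

    Returns the member names in the order they were added.
    """
    names = []
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(directory.iterdir()):
            # Skip sub-directories and the archive itself if it lives here.
            if path.is_file() and path != archive:
                zf.write(path, arcname=path.name)
                names.append(path.name)
    return names
```

Using ZIP_DEFLATED covers the "possibly compressed in the process" part
of case 3; ZIP_STORED would group the files without compressing them.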

Note that #2 works with existing SIAP. #1 could be done now but the
Content-Encoding column would be non-standard, so we should revise the
protocol to standardize it. #3 needs major changes to the protocol (but
changes we're planning anyway).

Cheers,
Guy

On 31 May 2007, at 00:42, Doug Tody wrote:

> Hi Pat -
>
> This issue has been discussed off and on for a long time now.  There
> are attractions to handling the format stuff at the getData stage; some
> implementations actually do this already, e.g.,
>
>     http://webtest.aoc.nrao.edu/ivoa-dal/JhuProxySsap?
>     REQUEST=getData&FORMAT=votable&
>     PubDID=ivo%3A%2F%2Fjhu%2Fsdss%2Fdr5%2380442261170552832
>
> (which is a real acref, and you can replace the "votable" with other
> formats such as "native" or "csv" and get the data today).  Maybe we
> will go this way in the future, but it is harder than it seems to
> formally standardize in a rigorous fashion, as a service can serve any
> kind of data, the available formats could differ depending upon the
> image or data collection, etc.  It can be done, but it requires more
> complexity (probably one would want to replace Access.Format in the
> query response with a list of some sort, and replace the acref with
> a template, hence this is the old issue of the templated access ref;
> it could work but has its own issues which we won't go into here).
>
> Just listing the available formats which match the query is a simple
> technique which always works.  The query response gets annoyingly
> bloated, but it is simple and it works, and handles all the odd cases
> (plus now we know how to compress it!).  We can use a MultiFormat
> Association to describe the multiple formats available for a dataset
> in the QR, and the Association mechanism used is general and can deal
> with any other type of logical association.
>
> I think we may want to consider promoting getData to a real operation
> at some point, but a simple opaque access reference URL has its
> advantages as well.
>
> 	- Doug
>
>
> On Wed, 30 May 2007, Patrick Dowler wrote:
>
>> On Wednesday 30 May 2007 14:59, Doug Tody wrote:
>>> The first question asked was what SIAP 1.0 intended, and this is what
>>> I have addressed above.  SIAP has always worked this way, and I am
>>> surprised that anyone is confused.
>>
>> I don't think the FORMAT thing in SIA 1.0 is confusing. For most people,
>> I expect, the collection they are serving is in one format or they decide
>> to serve one format, so if a query comes in asking for GIF and they only
>> have FITS, they return an empty VOTable (this is what we do).
>>
>> We have also toyed with on-the-fly conversion, which if deployed would
>> mean we could respond "yes" to any format and do the conversion in the
>> getData stage. This and the above approach are both kind of "a priori"
>> knowledge of the possible types, without doing a DB query.
>>
>> We also have in some cases pre-computed preview images in graphics
>> format, but I always found it kind of ugly for the observation catalog
>> to know about these different formats. Essentially, trying to support
>> this in a really general way brings in a very large database
>> denormalisation problem, especially when you have one set of systems for
>> storing all the files and another for enabling the querying of metadata.
>> I prefer to leave the file type stuff for the delivery mechanism to
>> handle (e.g. the getData method) since that can be largely independent
>> of the querying (except in this case of formats).
>>
>> If one had the on-the-fly conversion in place, then pre-computed
>> previews (for example) could be just an optimisation of the retrieval
>> process, which is again nice and clean and simple. But in general I
>> think this FORMAT thing is an optimisation that introduces some
>> complexity to implementing the service in some cases. It is easy enough
>> to avoid that by reducing the scope of the service (e.g. we have FITS
>> and JPG but only deliver FITS because we don't always have JPG on hand)
>> but that does reduce overall value.
>>
>> Summary: it's more complicated than it looks, but not confusing :-)
>>
>>


