gzipped images in SIAP 1.0 (fwd)

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Wed May 30 10:28:27 PDT 2007


On Tuesday 29 May 2007 11:16, Doug Tody wrote:
> On Tue, 22 May 2007, Roy Williams wrote:
> > I would like to know if it is possible for a compliant SIAP to return
> > *compressed* FITS images.
> Hence if the client app asks for FORMAT=image/fits, it is illegal
> for it to end up with a GZIP-compressed FITS file.  However, it is
> ok for the service to return a compressed file, so long as this is
> handled transparently at the level of the HTTP protocol.  That is,
> when we fetch the data (as others have already pointed out), the HTTP
> response headers can be
>
>     Content-Type: image/fits
>     Content-Encoding: gzip
>
> in which case the HTTP-level client code should transparently unzip
> the byte stream as it reads the data.  NOTE though, it may not be
> advisable for the service to do this unless the HTTP-level client
> code issues an Accept-Encoding header which explicitly states that
> the client code can handle optional stream-level compression.

This is not consistent with the HTTP standard, as I understand it. 
Transfer-level encoding (that which should be transparent to applications, 
e.g. HTTP libraries should decode it) uses the Transfer-Encoding header; if 
this is used, Content-Length refers to the non-encoded length. 
Content-Encoding is just a server-side declaration that the resource in 
question is encoded, but this generally does become visible to the application 
(e.g. HTTP libraries should not decode it).

Neither Transfer-Encoding nor Content-Encoding modifies the Content-Type 
(mimetype) of the resource. 
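
To make the distinction concrete, here is a minimal sketch of an application
that receives a Content-Encoding: gzip response and undoes the encoding
itself. The helper name decode_entity and the stand-in payload are
assumptions for illustration only, not part of any DAL spec or HTTP library;
the point is just that Content-Type is read unchanged while Content-Encoding
decides whether to gunzip.

```python
import gzip


def decode_entity(headers, body):
    """Return (content_type, payload) with any gzip Content-Encoding removed."""
    # Header names are case-insensitive in HTTP, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    content_type = lowered.get("content-type", "application/octet-stream")
    encoding = lowered.get("content-encoding", "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        body = gzip.decompress(body)
    elif encoding != "identity":
        raise ValueError("unsupported Content-Encoding: " + encoding)
    # The mimetype is returned as declared; the encoding never alters it.
    return content_type, body


# Stand-in for the first 80-byte card of a FITS header (not a complete file).
fits_bytes = (b"SIMPLE  =" + b"T".rjust(21)).ljust(80)
headers = {"Content-Type": "image/fits", "Content-Encoding": "gzip"}
content_type, payload = decode_entity(headers, gzip.compress(fits_bytes))
print(content_type, payload == fits_bytes)
```

Only gzip is handled here; anything else is rejected rather than guessed at.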

> On Tue, 22 May 2007, Roy Williams wrote:
> > -- Does anyone remember the intention of the comma-delimited list of MIME
> > types?  Should my code look for "application/x-gzip,image/fits"
> >   Or maybe the other way around?
>
> If this refers to the FORMAT query parameter, which takes a list
> of MIME types, then this tells the service to describe, in the
> query response, only images with the given MIME types.  If it can't
> generate the requested MIME type it should return nothing.  Hence if
> the client asks for image/jpeg and the service cannot return a JPEG,
> the response will be a null query (REQUEST_STATUS=OK and no data).

On the surface this looks useful, though I think it is somewhat in conflict 
with everything else I and others have posted. Will have to think about it... 
it basically moves the content-type negotiation from where it naturally 
goes (the download stage) to the query stage, which seems much more efficient 
when not all formats are available (probably the typical case).
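
As a rough sketch of the query-stage filtering Doug describes, a service
might do something like the following. The function name filter_by_format,
the record layout, and the example.org URLs are all hypothetical, and the
sketch ignores any special FORMAT values the SIAP spec may define; it only
handles a plain comma-delimited list of MIME types.

```python
def filter_by_format(available, format_param):
    """Keep only the image records whose MIME type was requested via FORMAT."""
    # FORMAT is a comma-delimited list; tolerate stray spaces and case.
    requested = {m.strip().lower() for m in format_param.split(",") if m.strip()}
    return [rec for rec in available if rec["format"].lower() in requested]


# Hypothetical holdings of a service; titles and URLs are made up.
holdings = [
    {"title": "field-1", "format": "image/fits", "url": "http://example.org/a.fits"},
    {"title": "field-1 preview", "format": "image/png", "url": "http://example.org/a.png"},
]

# Both formats are held, so both records are described in the query response.
print(len(filter_by_format(holdings, "image/fits,image/png")))

# No JPEG is held, so the result is empty (the "null query" case).
print(filter_by_format(holdings, "image/jpeg"))
```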

>
> As I mentioned earlier, SSAP adds a new parameter COMPRESS, which
> attempts to deal more explicitly with the issue of whole-file
> (gzip-style) compression.  Since this is a request parameter,
> this refers to compression *as seen by the client application*.
> If the client enables compression (it is disabled by default)
> then the service is permitted to return compressed files, or not,
> as it sees fit.  If the client app enables compression it has to
> be able to deal with the returned data optionally being returned
> using whole-file compression.  For this to work we have to limit the
> compression options to only widely-implemented algorithms such as gzip.
>
> In this case the data product itself (the Content-Type) is compressed,
> independent of what is happening to the data stream at the HTTP level.
> What the returned MIME type should be is not entirely clear: it could
> be application/x-gzip or maybe something like image/fits;encoding=gzip.

No, this must be (assuming the 2d image case, otherwise application/fits):

    Content-Type: image/fits
    Content-Encoding: gzip

gzip is not a mimetype, despite the horrible misuse of mimetypes on the net at 
large :(
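
For illustration, a service-side sketch that emits the header pair above only
when the client has advertised gzip support (Doug's caveat about
Accept-Encoding) could look like this. The names accepts_gzip and
build_response are hypothetical, and the Accept-Encoding parsing is
deliberately simplified: it assumes no "*" wildcard and looks only at the
gzip token and its q-value.

```python
import gzip


def accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding value lists gzip with q > 0."""
    for token in accept_encoding.split(","):
        parts = [p.strip().lower() for p in token.split(";")]
        if parts[0] != "gzip":
            continue
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    return float(param[2:]) > 0
                except ValueError:
                    return False
        return True  # gzip listed without a q-value
    return False


def build_response(payload, accept_encoding=""):
    """Return (headers, body); the mimetype stays image/fits either way."""
    headers = {"Content-Type": "image/fits"}
    body = payload
    if accepts_gzip(accept_encoding):
        body = gzip.compress(payload)
        headers["Content-Encoding"] = "gzip"
    # Content-Length describes the bytes actually sent, i.e. after encoding.
    headers["Content-Length"] = str(len(body))
    return headers, body


headers, body = build_response(b"\0" * 2880, "gzip, deflate")
print(headers["Content-Type"], headers.get("Content-Encoding"))
```

A client that sends no Accept-Encoding header gets the bytes back unencoded,
with no Content-Encoding header at all.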

> This feature allows compressed files to be passed through the protocol
> unchanged, and manipulated by applications in compressed form.  I think
> generic whole-file compression of this sort is a separate issue from
> something like tiled images with Rice compression; the latter is an
> internal feature of FITS.  We might enable it at the protocol level
> in the future, but currently there is no attempt to support this.

The way Rice compression (a la cfitsio) works, such a resource must be declared 
as

    Content-Type: image/fits

IFF this compression to bintable form is part of the FITS standard (and the 
content is a 2D image), and as

    Content-Type: application/fits

if it is not in the FITS standard (e.g. it is an arbitrary FITS file). There is 
no "external" encoding involved (there could be, e.g. you could have Rice on 
the inside and gzip on the outside), and despite that probably being dumb it 
is legal and easily characterised with the 2-3 HTTP headers mentioned here.

> Probably COMPRESS has not had enough discussion - even though it
> has been in the SSAP specification for about a year.  What do folks
> think: do we need both HTTP-level transparent data stream compression,
> and actual dataset-level whole file compression?

IMO it is important to separate encoding from mimetype, just as is done in the 
HTTP protocol. Further, transfer-encoding allows for some things that 
content-encoding does not (I recently saw an article about this by one of the 
HTTP spec authors - will try to dig up the link). It seems to me that if HTTP 
(the de facto protocol) covers areas like content-type, content-encoding, 
content-length, negotiation, chunking and resume (Range and Content-Range), 
etc., then we can save ourselves a lot of work by leaving those things out of 
the DAL specs and allowing services to use any and all of HTTP that they find 
useful. I am not advocating that HTTP be required or standard, but rather 
that HTTP is basic enough that if it covers a topic we should not duplicate 
that in DAL specs. If another protocol is used to deliver data, clients and 
services both have to understand it, and IMO adopting one means anything in 
there is legal (ftp, jparss, gridftp, transport-fu, etc).

Specifically for this topic, that approach implies that:

- services can use whatever content-encoding they like (some services may 
allow negotiation or just declare what they have) -- COMPRESS would be 
removed from DAL specs; it is up to service providers to be useful -- as 
always :)

- a DAL standard specifies which mimetypes (content-type) are 
allowed/preferred; the file format standards (typically FITS for us) define 
what can be in the file... in the spectral data model case, the serialization 
formats are a separate standard from SSA (specifically, FITS+VOTable+SDM), 
which is fine and consistent with this

my 2c,

-- 

Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)


