gzipped images in SIAP 1.0 (fwd)

Patrick Dowler patrick.dowler at nrc-cnrc.gc.ca
Thu May 31 11:34:37 PDT 2007


Content-Encoding is a perfectly valid way for the getData method to say "this
stuff is encoded with XYZ" (e.g. gzip). Applying an encoding, be it
Content-Encoding or Transfer-Encoding, does not change the Content-Type. Given
that, SIA says that services can (should) emit certain Content-Type(s):
image/fits plus some graphics formats. As it stands, it does not say getData
cannot deliver compressed data, but best practice in HTTP is to declare the
Content-Encoding if you do so.
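
To make that concrete, here is a minimal client-side sketch in Python. It
assumes the getData response headers are available as a plain dict; the
function name decode_body and the sample response are hypothetical and not
part of SIA. It undoes a declared gzip Content-Encoding and leaves the
Content-Type alone.

```python
import gzip


def decode_body(headers, body):
    """Undo a declared Content-Encoding; the Content-Type is not touched."""
    content_type = headers.get("Content-Type", "application/octet-stream")
    encoding = headers.get("Content-Encoding", "identity").strip().lower()
    if encoding in ("gzip", "x-gzip"):
        body = gzip.decompress(body)
    elif encoding != "identity":
        # Only gzip is handled in this sketch.
        raise ValueError(f"unsupported Content-Encoding: {encoding}")
    return content_type, body


# Hypothetical getData response: a gzipped FITS header block.
payload = b"SIMPLE  =                    T".ljust(2880)
response_headers = {"Content-Type": "image/fits", "Content-Encoding": "gzip"}
ctype, data = decode_body(response_headers, gzip.compress(payload))
```

After decoding, the client holds bytes of the declared type (image/fits
here), which is the whole point of declaring the encoding separately.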

Both Content-Encoding and Transfer-Encoding appear to have the semantics of
"once you undo this encoding, you will have something of the type specified by
Content-Type". Neither is transparent, except that Transfer-Encoding is
designed to be applied on the wire and negotiated (i.e. presumably an issue
for the HTTP libraries on both ends).

In this light, tile compression within the FITS file is a different beast
entirely. The data is not encoded in the sense of HTTP Content-Encoding
because you can read it with a FITS reader without undoing the encoding (and
application/fits is a correct type). Furthermore, if one interprets
image/fits strictly as "2D image in a data block" and not "tiled image in a
bintable", then tile compression appears to change the Content-Type
(inconsistent with HTTP principles), so it cannot be a Content-Encoding in
the HTTP sense. If we interpret image/fits loosely as "logically a 2D image,
encoded in FITS in one of these standard ways", then both forms are image/fits
and there is no Content-Encoding (in the HTTP sense).

** No matter how you slice it, there is no Content-Encoding in the HTTP sense 
for tile compressed files. **
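
As a rough illustration of why this lives at the FITS level and not the HTTP
level, the Python sketch below tells the two forms apart purely from header
cards. It relies on the ZIMAGE keyword from the FITS tiled image compression
convention, which is not discussed in this thread. The function names and the
sample headers are hypothetical, and the card parser is deliberately
simplified.

```python
def parse_header(block: bytes) -> dict:
    """Parse 80-byte FITS header cards up to END into a keyword -> value dict.

    Simplified: comments are cut at the first '/', so string values that
    contain a slash are not handled.
    """
    cards = {}
    for i in range(0, len(block), 80):
        card = block[i:i + 80].decode("ascii")
        key = card[:8].strip()
        if key == "END":
            break
        if card[8:10] == "= ":
            value = card[10:].split("/", 1)[0].strip()
            cards[key] = value.strip("'").strip()
    return cards


def classify_hdu(cards: dict) -> str:
    """Classify one HDU header as a plain or tile-compressed image."""
    if cards.get("XTENSION") == "BINTABLE" and cards.get("ZIMAGE") == "T":
        return "tiled image in a bintable"
    is_image_hdu = cards.get("SIMPLE") == "T" or cards.get("XTENSION") == "IMAGE"
    if is_image_hdu and int(cards.get("NAXIS", "0")) >= 2:
        return "image in a data block"
    return "other"


def card(key: str, value: str) -> bytes:
    """Build a test card; real files align values in fixed columns."""
    return f"{key:<8}= {value}".ljust(80).encode("ascii")


END = b"END".ljust(80)
plain = (card("SIMPLE", "T") + card("BITPIX", "16") + card("NAXIS", "2")
         + card("NAXIS1", "100") + card("NAXIS2", "100") + END)
tiled = (card("XTENSION", "'BINTABLE'") + card("BITPIX", "8")
         + card("NAXIS", "2") + card("ZIMAGE", "T")
         + card("ZCMPTYPE", "'RICE_1'") + END)

kinds = [classify_hdu(parse_header(h)) for h in (plain, tiled)]
```

Neither case involves any HTTP header: the same bytes arrive either way and
the difference only shows up once a FITS reader looks inside.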

I see no way or reason that we can or should say anything about what encoding 
(compression) is used in any DAL service -- in the sense of what is returned 
via the getData method (typically via http and declared/described by the 
Content-Encoding http header). Saying that SIA 1.0 cannot return compressed 
data is placing a (large and) unnecessary limitation/burden on services. 

Pat

PS: Just as with FORMAT vs Content-Type, it is sub-optimal to have to initiate
the getData method to find out you won't be able to understand/decode the
data. However, putting both format and encoding in the query and/or response
should be recognised as an optimisation -- maybe a very useful one -- but not
required for correctness per se. It is almost like the coarse vs fine-grained
registry debate in some sense...
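
A small Python sketch of that optimisation, assuming the query response rows
have already been parsed into dicts. The "content_encoding" key stands for
the non-standard column Guy suggests in the message quoted below; the key
names, the supported sets and the sample rows are all hypothetical.

```python
# What this (hypothetical) client is able to read and decode.
SUPPORTED_FORMATS = {"image/fits", "image/jpeg"}
SUPPORTED_ENCODINGS = {"identity", "gzip"}


def can_handle(row: dict) -> bool:
    """Decide from the query response alone whether getData is worth calling."""
    fmt = row.get("format")
    # A missing encoding column is treated as "not encoded".
    encoding = row.get("content_encoding") or "identity"
    return fmt in SUPPORTED_FORMATS and encoding in SUPPORTED_ENCODINGS


rows = [
    {"accref": "http://example.org/a", "format": "image/fits",
     "content_encoding": "gzip"},
    {"accref": "http://example.org/b", "format": "image/fits",
     "content_encoding": "compress"},
    {"accref": "http://example.org/c", "format": "image/gif"},
]
usable = [row["accref"] for row in rows if can_handle(row)]
```

Without the extra column the client can still get the right answer by calling
getData and inspecting the headers; it just finds out later.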


On Wednesday 30 May 2007 23:23, Guy Rixon wrote:
> Hi folks,
>
> it seems to me that we are conflating three use cases:
>
>    1. Service only has compressed images, won't serve anything else,
> and client needs to be told this.
>
>    2. Images are available uncompressed, but client wants them
> compressed in delivery to save bandwidth.
>
>    3. Images are available uncompressed, but client wants several
> images grouped into a zip set for delivery (and
>         possibly compressed in the process).
>
> I think these need different solutions.
>
> For case 1, can we add a Content-Encoding column to the queryData
> result? This would denote encoding
> and compression in the same way as the HTTP header of the same name,
> except that we would need to
> add some way to denote tile compression. If we knew that all the
> accrefs were HTTP URLs we wouldn't
> need the column as we could just look at the HTTP headers; but some
> accrefs may lead to FTP servers.
>
> Case 2 is Transfer-Encoding, a matter for the transport protocol. If
> an accref is HTTP, then the implementations
> can negotiate to do this and IVOA doesn't need to rule on it. If the
> accref is FTP then no transport
> compression is available.
>
> Case 3 is the interesting one. If the SIA is to do it then we need a
> nice long argument about the details.
> However, VOSpace is already aiming to do this: it's a planned
> (optional) feature on VOSpace 1.1 to
> offer a view of a directory that is a zip of all the files in the
> directory. We're also planning to allow
> data staging from an SIA to VOSpace. If we do the latter, then let's
> use the VOSpace feature to zip
> the files, rather than duplicating it.
>
> Note that #2 works with existing SIAP. #1 could be done now but the
> Content-Encoding column
> would be non-standard, so we should revise the protocol to
> standardize it. #3 needs major changes
> to the protocol (but changes we're planning anyway).
>
> Cheers,
> Guy
>
> On 31 May 2007, at 00:42, Doug Tody wrote:
> > Hi Pat -
> >
> > This issue has been discussed off and on for a long time now.  There
> > are attractions to handling the format stuff at the getData stage;
> > some
> > implementations actually do this already, e.g.,
> >
> >     http://webtest.aoc.nrao.edu/ivoa-dal/JhuProxySsap?
> >     REQUEST=getData&FORMAT=votable&
> >     PubDID=ivo%3A%2F%2Fjhu%2Fsdss%2Fdr5%2380442261170552832
> >
> > (which is a real acref, and you can replace the "votable" with other
> > formats such as "native" or "csv" and get the data today).  Maybe we
> > will go this way in the future, but it is harder than it seems to
> > formally standardize in a rigorous fashion, as a service can serve any
> > kind of data, the available formats could differ depending upon the
> > image or data collection, etc.  It can be done, but it requires more
> > complexity (probably one would want to replace Access.Format in the
> > query response with a list of some sort, and replace the acref with
> > a template, hence this is the old issue of the templated access ref;
> > it could work but has its own issues which we won't go into here).
> >
> > Just listing the available formats which match the query is a simple
> > technique which always works.  The query response gets annoyingly
> > bloated, but it is simple and it works, and handles all the odd cases
> > (plus now we know how to compress it!).  We can use a MultiFormat
> > Association to describe the multiple formats available for a dataset
> > in the QR, and the Association mechanism used is general and can deal
> > with any other type of logical association.
> >
> > I think we may want to consider promoting getData to a real operation
> > at some point, but a simple opaque access reference URL has its
> > advantages as well.
> >
> > 	- Doug
> >
> > On Wed, 30 May 2007, Patrick Dowler wrote:
> >> On Wednesday 30 May 2007 14:59, Doug Tody wrote:
> >>> The first question asked was what SIAP 1.0 intended, and this is
> >>> what
> >>> I have addressed above.  SIAP has always worked this way, and I am
> >>> surprised that anyone is confused.
> >>
> >> I don't think the FORMAT thing in SIA 1.0 is confusing. For most
> >> people, I
> >> expect, the collection they are serving is in one format or they
> >> decide to
> >> serve one format, so if a query comes in asking for GIF and they
> >> only have
> >> FITS, they return an empty VOTable (this is what we do).
> >>
> >> We have also toyed with on-the-fly conversion, which if deployed
> >> would mean we
> >> could respond "yes" to any format and do the conversion in the
> >> getData stage.
> >> This and the above approach are both kind of "a priori" knowledge
> >> of the
> >> possible types, without doing a DB query.
> >>
> >> We also have in some cases pre-computed preview images in graphics
> >> format, but
> >> I always found it kind of ugly for the observation catalog to know
> >> about
> >> these different formats. Essentially, trying to support this in a
> >> really
> >> general way brings in a very large database denormalisation problem,
> >> especially when you have one set of systems for storing all the files
> >> and another
> >> for enabling the querying of metadata. I prefer to leave the file
> >> type stuff
> >> for the delivery mechanism to handle (eg the getData method) since
> >> that can
> >> be largely independent of the querying (except in this case of
> >> formats).
> >>
> >> If one had the on-the-fly conversion in place, then pre-computed
> >> previews (for
> >> example) could be just an optimisation of the retrieval process,
> >> which is
> >> again nice and clean and simple. But in general I think this
> >> FORMAT thing is
> >> an optimisation that introduces some complexity to implementing
> >> the service
> >> in some cases. It is easy enough to avoid that by reducing the
> >> scope of the
> >> service (eg we have fits and jpg but only deliver fits because we
> >> don't
> >> always have jpg on hand) but that does reduce overall value.
> >>
> >> summary: it's more complicated than it looks, but not confusing :-)

-- 

Patrick Dowler
Tel/Tél: (250) 363-6914                  | fax/télécopieur: (250) 363-0045
Canadian Astronomy Data Centre   | Centre canadien de donnees astronomiques
National Research Council Canada | Conseil national de recherches Canada
Government of Canada                  | Gouvernement du Canada
5071 West Saanich Road               | 5071, chemin West Saanich
Victoria, BC                                  | Victoria (C.-B.)


