gzipped images in SIAP 1.0 (fwd)
dtody at nrao.edu
Tue May 29 11:16:46 PDT 2007
This is a resend of a message posted over the weekend; due to some
sort of mailer issue it appears to not have been distributed properly.
Apologies to anyone who gets two copies.
One thing I forgot to mention is that on-the-fly, transparent
HTTP-level compression can be useful not only for returning a dataset,
but also for the query response itself, which is highly compressible text.
In the case of a dataset, a spectrum in VOTable format compressed
with gzip for transport over the wire becomes comparable in data
size to the equivalent spectrum in FITS binary table format.
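As a rough illustration of why text tables compress so well, here is a
minimal sketch using Python's standard gzip module. The table content is
synthetic and only loosely VOTable-like (the element names and values are
made up for the example), so the ratio it prints says nothing precise
about real spectra.

```python
import gzip

# Synthetic stand-in for a text table; NOT a real VOTable document.
rows = "".join(
    f"<TR><TD>{4000.0 + 0.5 * i:.1f}</TD>"
    f"<TD>{1.0e-15 * (i % 7 + 1):.3e}</TD></TR>\n"
    for i in range(1000)
)
text = ("<TABLEDATA>\n" + rows + "</TABLEDATA>\n").encode("utf-8")

# Compress as an HTTP server might do for transport, then compare sizes.
packed = gzip.compress(text)
ratio = len(packed) / len(text)
print(f"{len(text)} bytes -> {len(packed)} bytes ({ratio:.0%} of original)")
```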
As Markus notes in a later email, further discussion won't affect
the promotion of SSAP to a PR, which should go forward shortly.
Nonetheless this is an important issue, especially for cases where
the protocol is text-based. - Doug
---------- Forwarded message ----------
Date: Sun, 27 May 2007 16:40:34 -0600 (MDT)
From: Doug Tody <dtody at nrao.edu>
To: dal at ivoa.net
Subject: Re: gzipped images in SIAP 1.0
Hi All -
I missed this conversation earlier, which occurred while I was still
on travel. This is relevant not only for SIAP 1.0, but also for
SSAP, for which the problem is exactly the same, and in principle can
affect any future DAL protocol, hence it is worthwhile to look at
this carefully now. Note also that SSAP adds a COMPRESS parameter
(discussed further below), and attempts to deal more carefully with
the issue of compression than did SIAP 1.0.
This message is a bit long as I deal with three distinct but related
issues (these are separated by double blank lines below).
On Tue, 22 May 2007, Roy Williams wrote:
> I would like to know if it is possible for a compliant SIAP to return
> *compressed* FITS images.
Having read through all the postings, I think we need to keep two
issues clearly separated here: 1) what the client application wants
to see, and 2) what happens at the level of the HTTP protocol.
Compression can be applied either to the data product which is returned
to the client application (this is visible to the client), or at the
level of the HTTP protocol, which already has built-in capabilities
for compression (this is handled at the HTTP client-server level and is
transparent to the client application). Hence we need to distinguish
between compression as seen by the "client" or "client application",
and compression as seen by the "HTTP-level client code": these are
not the same thing.
In general, anything which is a request parameter (such as FORMAT)
refers to *what the client app wants to see*. This is independent of
the underlying protocol. Currently we use HTTP, but if we were to
use something else, a request parameter such as FORMAT should still
have the same semantics.
Hence if the client app asks for FORMAT=image/fits, it is illegal
for it to end up with a GZIP-compressed FITS file. However, it is
ok for the service to return a compressed file, so long as this is
handled transparently at the level of the HTTP protocol. That is,
when we fetch the data (as others have already pointed out), the HTTP
response headers can be

    Content-Type: image/fits
    Content-Encoding: gzip

in which case the HTTP-level client code should transparently unzip
the byte stream as it reads the data. NOTE though, it may not be
advisable for the service to do this unless the HTTP-level client
code issues an Accept-Encoding header which explicitly states that
the client code can handle optional stream-level compression.
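To make the HTTP-level case concrete, here is a minimal sketch of the
negotiation in Python. The function names are hypothetical and not taken
from any SIAP or SSAP implementation; the parsing of Accept-Encoding is
simplified (it ignores the "*" wildcard), and real HTTP libraries
normally do all of this for you. The point is only that the Content-Type
stays image/fits and the application never sees the gzip layer.

```python
import gzip

def accepts_gzip(accept_encoding):
    """True if an Accept-Encoding value lists gzip with a nonzero q-value.

    Simplified: the "*" wildcard is not handled.
    """
    for item in (accept_encoding or "").split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() not in ("gzip", "x-gzip"):
            continue
        q = 1.0
        for p in parts[1:]:
            if p.lower().startswith("q="):
                try:
                    q = float(p[2:])
                except ValueError:
                    q = 0.0
        return q > 0
    return False

def encode_response(body, accept_encoding):
    """Service side: compress only if the client code said it can cope."""
    headers = {"Content-Type": "image/fits"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(body)
    return headers, body

def decode_response(headers, payload):
    """HTTP-level client code: undo stream compression transparently."""
    if headers.get("Content-Encoding", "").lower() in ("gzip", "x-gzip"):
        return gzip.decompress(payload)
    return payload
```

With this arrangement a client that sends no Accept-Encoding header gets
the bytes unmodified, which is the conservative behaviour suggested above.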
On Tue, 22 May 2007, Roy Williams wrote:
> -- Does anyone remember the intention of the comma-delimited list of MIME
> types? Should my code look for "application/x-gzip,image/fits"
> Or maybe the other way around?
If this refers to the FORMAT query parameter, which takes a list
of MIME types, then this tells the service to describe, in the
query response, only images with the given MIME types. If it can't
generate the requested MIME type it should return nothing. Hence if
the client asks for image/jpeg and the service cannot return a JPEG,
the response will be a null query (REQUEST_STATUS=OK and no data).
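The FORMAT semantics described above can be sketched as a simple filter
on the service side. This is an illustration only: the record layout, the
"status" key and the function name are assumptions made for the example,
and any special non-MIME values a specification may define for FORMAT are
ignored here.

```python
def select_by_format(available, format_param):
    """Keep only the records whose MIME type the client listed in FORMAT.

    'available' is a list of dicts with a "mime" key (hypothetical layout).
    A request that matches nothing is still a successful, empty response.
    """
    wanted = {m.strip().lower() for m in format_param.split(",") if m.strip()}
    matches = [rec for rec in available if rec["mime"].lower() in wanted]
    return {"status": "OK", "records": matches}

# A service that only holds FITS images, asked for JPEG, returns no rows.
holdings = [{"mime": "image/fits", "url": "http://example.org/img1.fits"}]
print(select_by_format(holdings, "image/jpeg"))
```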
As I mentioned earlier, SSAP adds a new parameter COMPRESS, which
attempts to deal more explicitly with the issue of whole-file
(gzip-style) compression. Since this is a request parameter,
this refers to compression *as seen by the client application*.
If the client enables compression (it is disabled by default)
then the service is permitted to return compressed files, or not,
as it sees fit. If the client app enables compression it has to
be able to deal with the returned data optionally being returned
using whole-file compression. For this to work we have to limit the
compression options to only widely-implemented algorithms such as gzip.
In this case the data product itself (the Content-Type) is compressed,
independent of what is happening to the data stream at the HTTP level.
What the returned MIME type should be is not entirely clear: it could
be application/x-gzip or maybe something like image/fits;encoding=gzip.
This feature allows compressed files to be passed through the protocol
unchanged, and manipulated by applications in compressed form. I think
generic whole-file compression of this sort is a separate issue from
something like tiled images with Rice compression; the latter is an
internal feature of FITS. We might enable it at the protocol level
in the future, but currently there is no attempt to support this.
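For the dataset-level case, a client application that enables COMPRESS
has to cope with the file arriving either compressed or not. One way to
do that, sketched here as an assumption rather than anything the SSAP
draft prescribes, is to check for the two-byte gzip signature instead of
trusting the returned MIME type. The function name is hypothetical.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def open_dataset(payload):
    """Return the uncompressed dataset bytes, whether or not the service
    chose to apply whole-file gzip compression."""
    if payload[:2] == GZIP_MAGIC:
        return gzip.decompress(payload)
    return payload
```

An application that wants to keep the file in compressed form can of
course skip this step and store the payload as received.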
Probably COMPRESS has not had enough discussion - even though it
has been in the SSAP specification for about a year. What do folks
think: do we need both HTTP-level transparent data stream compression,
and actual dataset-level whole file compression?