[fitsbits] start of Public Comment Period on compressed FITS image and tables

Thu Jun 25 00:31:48 EDT 2015

On 6/24/2015 6:44 AM, Mark Taylor wrote:
> Dear FITS,
>
> regarding the tiled image/table compression proposal, in the spirit
> of this clarification from Lucio:
>
> On Wed, 24 Jun 2015, Lucio Chiappetti wrote:
>
>> First of all I clarify that the purpose of the Public Comment Period is just
>> to assess the feeling of the community.
>
> I'd like to express my feeling that: it looks pretty complicated.

Yes, and no.  The basic convention is really quite simple: divide the 
image into a rectangular grid of tiles, like a checkerboard, and then 
copy each tile into the corresponding row of a vector column of a binary 
table, where the first tile is stored in the first row, ..., and the 
last tile goes in the last row.  It takes only simple bookkeeping to 
figure out which tile contains a particular pixel of interest, or to put 
all the tiles back together to recreate the original image.
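
As a rough illustration of that bookkeeping, here is a minimal Python 
sketch.  The function names are hypothetical (they are not CFITSIO 
routines), pixel coordinates are taken to be 0-based, and the tiles are 
assumed to be numbered with the first image axis varying fastest, 
starting from row 1 of the table.

```python
def tile_grid(naxis, tile_shape):
    """Number of tiles along each axis; tiles at the edge may be partial."""
    return [-(-n // t) for n, t in zip(naxis, tile_shape)]  # ceiling division


def tile_row_for_pixel(pixel, naxis, tile_shape):
    """Return (table_row, offset_within_tile) for a 0-based pixel coordinate.

    Assumption: tiles are numbered with the first axis varying fastest,
    and the first tile is stored in table row 1.
    """
    grid = tile_grid(naxis, tile_shape)
    row = 0
    stride = 1
    offset = []
    for p, t, g in zip(pixel, tile_shape, grid):
        row += (p // t) * stride   # index of the tile along this axis
        stride *= g                # tiles spanned by one step on the next axis
        offset.append(p % t)       # position of the pixel inside its tile
    return row + 1, tuple(offset)
```

For a 100 x 50 image divided into 32 x 32 tiles, the grid is 4 x 2 
tiles, and pixel (70, 40) falls in row 7 of the table at offset (6, 8) 
within that tile.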

The complications come from supporting several different options for 
compressing each tile of pixels before storing it in the binary table. 
Image compression is still an evolving art, and some improved 
compression methods (such as dithering) have been added to this 
convention since it was first developed in 2001.  It is also possible 
that yet more compression options will be added in the future in 
response to user requests and needs.

>
> One of the attractions of the FITS format (as opposed to formats
> such as HDF5 or HDS that are essentially defined by their data
> access libraries) is that access to the basic numeric data is very
> straightforward in implementation terms.  When I needed to
> write a data access library for BINTABLE from scratch I could
> do it without a great deal of effort.  Supporting tiled table
> compression would complicate the implementation task considerably.
> The same goes even more so (complication by an order of magnitude?)
> for tiled images, though those are less close to my current
> personal concerns.  That is particularly true if the implementation
> language does not have existing libraries for the various defined
> compression algorithms.

I would guess that there is already a suitable implementation of the 
gzip algorithm for just about every language.  One would then just need 
implementations of the Rice, Hcompress, and PLIO algorithms to complete 
the set of currently supported algorithms.  Rice and PLIO are actually 
pretty simple, so it would likely not be difficult to transcribe the 
existing C code into any other language that supports bit-level 
manipulations of arrays of bytes (for Rice, in particular).  The 
quantizing and dithering algorithms are also pretty simple.
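
To give a feel for how little is involved, here is a minimal Python 
sketch of quantizing floating-point pixel values to integers with 
subtractive dithering.  It is an illustration of the general technique 
only, not the exact procedure in the proposed text: the function names 
are hypothetical, and Python's own random number generator stands in 
for whatever reproducible random sequence a real implementation must 
use.

```python
import random


def quantize(values, scale, zero, seed=0):
    """Scale floats to integers, adding a random offset before rounding."""
    rng = random.Random(seed)        # stand-in generator (an assumption)
    ints = []
    for v in values:
        r = rng.random()             # uniform in [0, 1)
        ints.append(round((v - zero) / scale + r - 0.5))
    return ints


def restore(ints, scale, zero, seed=0):
    """Undo the scaling, subtracting the same random offsets."""
    rng = random.Random(seed)        # must replay the identical sequence
    floats = []
    for i in ints:
        r = rng.random()
        floats.append((i - r + 0.5) * scale + zero)
    return floats
```

Because the reader subtracts the same offsets that the writer added, 
each restored value differs from the original by at most half of the 
scale factor, and the errors are not all piled onto the same discrete 
levels.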

> The content of the proposed text seems to be reasonably clear,
> but three weeks is not enough time to attempt an implementation
> to check it does actually provide all the description required
> to implement these conventions.  The text looks like what it
> presumably is, a post-hoc codification of a series of experiments
> in compression (various different compression algorithms,
> dithering options), rather than a designed proposal for how to
> specify compression in a clean way.  Doing it that way is cheap
> for the implementation that served as the experimental testbed,
> but expensive for other implementations.

As mentioned above, the original design of the basic tiled image format 
has not changed, but a few new features have been added over the years 
to support some newer compression methods.

> Yes the implementation of this proposed addition to the standard
> is all there in CFITSIO, but that can't be used directly by
> non-C-friendly languages such as java, javascript, and who knows
> what future platforms might arise.  That means that for instance
> browser-based FITS image viewers which currently can display any
> legal FITS image would likely, following incorporation of this
> convention to the standard, find themselves unable to deal with
> some standard FITS image data if they are unable to afford
> considerable extra implementation effort.

This is beyond my area of expertise, but according to Eric Mandel's 
message earlier today, it may in fact be possible to incorporate C 
libraries into the languages that are used for Web applications.

> I am aware there are some committed advocates and strong arguments
> for use of this convention.  This message is not itself a call to block
> incorporation of this text in the standard.  But since it's a Public
> Comment Period, I wanted to make a public comment noting my
> reservations.
>
> Mark
>