[fitsbits] start of Public Comment Period on compressed FITS image and tables

Rob Seaman seaman at noao.edu
Wed Jul 22 02:14:06 EDT 2015


On Jul 21, 2015, at 10:21 PM, van Nieuwenhoven, Richard <Richard.vanNieuwenhoven at adesso.at> wrote:

> 1. since 2005 the maximum for BITPIX is -64 (double) and 64 (long).
> since the value range of keywords has been extended, should there not be
> support for -128 and 128? I do not know if that kind of precision is
> necessary or an over-kill?

Generally overkill for empirical data from physical processes and finite integration times. There might be some edge cases.

> (in the next ~10 years).

Such a caveat implies some kind of Moore’s law / looming singularity world-view, but screaming increases in precision are not really in the cards. For example, projects will often choose to use only 16 bits out of an 18-bit A/D converter due to trade-offs with full-well depth, binning, gain, etc. Another way to view this is that old data retains archival value.

Or to put it yet another way, the whole point of compression is to identify efficient representations of data, and for astronomical data this typically corresponds to a requirement of only a few bits per pixel. The uncompressed original is not ground truth (and indeed CFITSIO permits the original to be compressed).

> As a coder and a bad math-text reader could I ask somebody to take a
> look at the description of the compression algorithms if it needs
> extension for BITPIX=64 as I noticed the compression algorithm Rice in
> cfitsio (as an example) directly only supports 8/16 and 32 bit. If the
> need for -128 and 128 is present then these should be checked also.

The Rice algorithm compresses integer pixel values and the same 16-bit data written into 32-bit pixels will compress to the same final file size. This would apply to 64-bit input arrays, too, as long as the values being represented were the same. This isn’t specific to Rice, but is easy to comprehend for the simple Rice algorithm which replaces a vector of pixels with a scale factor and a sequence of unary numbers. If the values are the same on input, they’ll be the same on output no matter the input precision.
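For instance, a quick check with astropy (a sketch, assuming astropy and numpy are available; the simulated Poisson level and scratch filenames are arbitrary) shows the same values compressing to essentially the same size whether they arrive in 16-bit or 32-bit pixels:

    import os
    import numpy as np
    from astropy.io import fits

    rng = np.random.default_rng(0)
    values = rng.poisson(1000, size=(1024, 1024))         # simulated sky with Poisson noise

    for dtype in (np.int16, np.int32):
        name = f"rice_{np.dtype(dtype).name}.fits"        # hypothetical scratch file
        hdu = fits.CompImageHDU(values.astype(dtype), compression_type='RICE_1')
        hdu.writeto(name, overwrite=True)
        print(np.dtype(dtype).name, os.path.getsize(name), "bytes")

    # Rice codes the pixel values (strictly, the differences between successive
    # pixels), not the width of the container they were delivered in, so the two
    # compressed files come out nearly identical in size.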

Of course, the whole point of choosing a larger value for an integer BITPIX is to permit the representation of a larger data range, and for typical astronomical instruments this will include increased Poisson noise. Noise is entropy; entropy is incompressible.
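As a back-of-the-envelope (not tied to any particular instrument): a pixel dominated by roughly Gaussian noise of standard deviation sigma carries about log2(sigma) + 2 bits of information, however wide the pixel container is:

    import math
    sigma = 30.0                                              # made-up noise level in ADU
    bits = 0.5 * math.log2(2 * math.pi * math.e * sigma**2)   # entropy of unit-quantized Gaussian noise
    print(bits)                                               # ~7 bits/pixel, whether BITPIX is 16, 32 or 64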

Floating point data use lossy sigma-scaling to convert to the integers that are actually compressed. Assuming the mantissas are actually spread throughout the range in a way that requires the full size of the input representation, the question is how much precision you choose to retain. Critically scaled data (FPACK q=1) corresponds to 10:1 compression whatever the input image (Fig 1 of http://arxiv.org/pdf/1007.1179v1.pdf); this increases the noise by about 4%.

Which is to say that the quantization into integers adds that much noise, and then the integers compress as above.
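That 4% is just the usual uniform-quantization penalty; a one-line check (with q defined as in fpack, i.e. quantization step = sigma/q):

    import math
    q = 1.0                                               # fpack quantization parameter
    ratio = math.sqrt(1.0 + 1.0 / (12.0 * q**2))          # quantization adds variance (sigma/q)^2 / 12
    print(ratio)                                          # ~1.041, i.e. the ~4% noise increase above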

But floating-point data, for instance from processing pipelines, can’t manufacture precision. The usual rules for propagating precision (Bevington, etc.) apply: if you flat-field and bias-correct a 16-bit image, for instance, it doesn’t magically require 32-bit precision, whether floating point or integer, but remains near 16 bits even for fairly complex sequences of data manipulations. It is the additional noise from the flats and biases, for instance, that needs to be accommodated. Signal is generally sparse and negligible in astronomical data, and besides, flat-fielding an image doesn’t increase the original signal. The movie industry even uses a 16-bit floating-point representation. There is nothing magic about IEEE floating point and 2’s complement integers.
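Here is a minimal numerical sketch of that propagation argument, with an entirely made-up noise budget (the sky level, master-bias noise, and flat-field noise below are illustrative only):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1_000_000                                         # one simulated image, flattened

    sky, bias_level = 5000.0, 500.0                       # ADU (hypothetical)
    sigma_bias, sigma_flat = 1.0, 0.005                   # master-bias noise (ADU), relative flat noise

    raw  = rng.poisson(sky, n) + bias_level               # Poisson noise, sigma ~ sqrt(5000) ~ 71 ADU
    bias = bias_level + rng.normal(0.0, sigma_bias, n)
    flat = rng.normal(1.0, sigma_flat, n)

    corrected = (raw - bias) / flat

    predicted = np.sqrt(sky + sigma_bias**2 + (sky * sigma_flat)**2)
    print(np.std(corrected), predicted)                   # both ~75 ADU: still 16-bit-scale noise, not 32-bit precision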

Which is all to say that future discussions of increasing BITPIX would benefit from being cast in terms of Shannon entropy and quantization noise.

Rob Seaman
NOAO
