[fitsbits] reopening of Public Comment Period on the compression conventions

Fri Jan 15 13:45:58 EST 2016

Many scientists are hesitant at the prospect of using a lossy 
compression algorithm on their precious data, but a more careful 
analysis shows that these fears are often unwarranted. This is 
especially true for astronomical images stored in 32-bit floating-point 
format, which can store each pixel value with about 7 significant 
decimal digits of precision. In fact, a quantitative analysis of 32-bit 
astronomical images often shows that more than half of those bits are 
effectively being used to record useless random noise, which by 
definition cannot be compressed by any lossless algorithm. This explains 
why one often finds that floating-point images can only be losslessly 
compressed by a factor of 2 or less.
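
As a rough illustration of the point above, the following sketch 
compresses a synthetic noisy float32 "image" and a buffer of pure random 
bytes with zlib. The background level, noise sigma, and pixel count are 
made-up values chosen for the example, and zlib stands in for whatever 
lossless compressor one would really use on FITS data, so the exact 
ratios are only indicative.

```python
import random
import struct
import zlib


def lossless_ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, 9))


def noisy_float32_image(n_pixels: int, background: float, sigma: float,
                        seed: int = 0) -> bytes:
    """Pack a flat background plus Gaussian noise as big-endian float32."""
    rng = random.Random(seed)
    pixels = [rng.gauss(background, sigma) for _ in range(n_pixels)]
    return struct.pack(f">{n_pixels}f", *pixels)


# Hypothetical image: flat sky of 1000 counts with sigma = 10 noise.
image = noisy_float32_image(100_000, background=1000.0, sigma=10.0)

# Pure random bytes of the same length, for comparison.
byte_rng = random.Random(1)
noise = bytes(byte_rng.getrandbits(8) for _ in range(len(image)))

print(f"noisy float32 image: ratio = {lossless_ratio(image):.2f}")
print(f"pure random bytes:   ratio = {lossless_ratio(noise):.2f}")
```

The random buffer should not compress at all, and the noisy image should 
compress only modestly, because most of its mantissa bits are noise.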

Using a lossy compression algorithm enables one to remove some of the 
useless noise in the image while still preserving the scientifically 
interesting information. One can quantitatively predict how much the 
scientific information in an image will be degraded as a function of 
increasing compression ratio. This enables scientists to choose the 
maximum amount of information loss that is acceptable for a given data 
set. For example, if a 32-bit floating-point image is compressed by a 
factor of 6.0, this will only increase the measurement error on the 
brightness and position of the faintest objects in the image by a 
negligible 0.26%. (The errors on brighter objects will be even smaller.) 
Increasing the compression ratio to 8.0 or 10.0 will increase the 
measurement errors by 1.03% or 4.08%, respectively. (These numbers are 
based on the seminal work by Shannon (1948) on communication theory and 
have been verified by numerical experiments on actual astronomical images.)
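
The percentages above can be approximated with a short calculation, 
sketched here under stated assumptions. If pixel values are quantized 
with a step of sigma/q, where sigma is the background noise, the 
rounding adds roughly uniform noise with variance step^2/12, so the 
total noise grows by a factor of sqrt(1 + 1/(12 q^2)). The function name 
and the pairing of q = 4, 2, 1 with compression ratios of about 6, 8, 
and 10 are assumptions inferred from the quoted numbers, not something 
stated in this message.

```python
import math


def noise_increase(q: float) -> float:
    """Fractional increase in noise when quantizing with step sigma / q.

    Assumes the quantization error is uniform over one step, which adds
    a variance of (sigma / q) ** 2 / 12 to the original sigma ** 2.
    """
    return math.sqrt(1.0 + 1.0 / (12.0 * q * q)) - 1.0


# Assumed pairing of quantization level q with compression ratio.
for q, ratio in [(4, 6.0), (2, 8.0), (1, 10.0)]:
    percent = 100.0 * noise_increase(q)
    print(f"q = {q}, ratio ~ {ratio}: noise increases by {percent:.2f}%")
```

This gives increases of roughly 0.26%, 1.04%, and 4.08%, in line with 
the figures quoted above.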

Note that many papers and entire textbooks have been written on this 
topic. For further information, I suggest starting with our paper, 
"Optimal compression of floating-point FITS images" by Pence, White, and 
Seaman (http://adsabs.harvard.edu/abs/2010PASP..122.1065P).

-Bill

On 1/14/2016 3:32 PM, Demitri Muna wrote:
> Hi,
>
> I don’t have any opinions on compression algorithms, but I want to note
> that I’m opposed to the FITS format supporting any lossy compression as
> part of the format. FITS should be archival, and I think that any lossy
> representation of data (e.g. a thumbnail) should be created outside of
> the file. Lossy algorithms can improve dramatically over time with the
> increase of CPU speed (see the ~50% space improvement from H.264 to
> H.265 for the same quality). Thumbnails are extremely cheap to create
> and cache and become cheaper to do so over time with faster I/O and CPU.
> It’s common in archives to have jpg files next to FITS files for this
> reason - I don’t see a benefit to building this into the format. Keep
> things light.
>
> Cheers,
> Demitri
>
> _________________________________________
> Demitri Muna
>
> Department of Astronomy
> The Ohio State University
>
> Home page: http://muna.com
>
> My Projects:
> http://nightlightapp.io
> http://trillianverse.org
> http://scicoder.org
>
> _______________________________________________
> fitsbits mailing list
> fitsbits at listmgr.nrao.edu
> https://listmgr.nrao.edu/mailman/listinfo/fitsbits
>