[fitsbits] reopening of Public Comment Period on the compression conventions
William Pence
William.Pence at nasa.gov
Fri Jan 15 13:45:58 EST 2016
Many scientists are hesitant at the prospect of using a lossy
compression algorithm on their precious data, but a more careful
analysis shows that these fears are often unwarranted. This is
especially true for astronomical images stored in 32-bit floating-point
format, which can store each pixel value with about 7 significant
decimal digits of precision. In fact, a quantitative analysis of 32-bit
astronomical images often shows that more than half of those bits are
effectively being used to record useless random noise, which by
definition cannot be compressed by any algorithm. This explains why one
often finds that floating-point images can only be losslessly compressed
by a factor of 2 or less.
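As a rough illustration of this point, the sketch below fills a synthetic
32-bit floating-point "image" with pure Gaussian noise and compresses it
losslessly. The mean, noise level, and pixel count are made-up values, and
zlib is only a convenient stand-in for the lossless algorithms used with
FITS, so the exact ratio is illustrative only.

```python
import random
import struct
import zlib


def lossless_ratio(n_pixels=20000, mean=1000.0, sigma=10.0, seed=1):
    """Compress synthetic float32 noise with zlib and return raw/compressed size.

    All parameter values are hypothetical; this is not real image data.
    """
    rng = random.Random(seed)
    pixels = [rng.gauss(mean, sigma) for _ in range(n_pixels)]
    # Pack as little-endian 32-bit floats, 4 bytes per pixel.
    raw = struct.pack(f"<{n_pixels}f", *pixels)
    packed = zlib.compress(raw, 9)
    return len(raw) / len(packed)


ratio = lossless_ratio()
print(f"lossless compression ratio: {ratio:.2f}")
```

Only the sign, exponent, and leading mantissa bits repeat from pixel to
pixel; the remaining mantissa bits are random, which is why the ratio stays
well below 2.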
Using a lossy compression algorithm enables one to remove some of the
useless noise in the image while still preserving the scientifically
interesting information. One can quantitatively predict how much the
scientific information in an image will be degraded as a function of
increasing compression ratio. This enables scientists to choose the
maximum amount of information loss that is acceptable for a given data
set. For example, if a 32-bit floating-point image is compressed by a
factor of 6.0, this will only increase the measurement error on the
brightness and position of the faintest objects in the image by a
negligible 0.26%. (The errors on brighter objects will be even smaller.)
Increasing the compression ratio to 8.0 or 10.0 will increase the
measurement errors by 1.03% or 4.08%, respectively. (These numbers are
based on the seminal work by Shannon 1948 on communications theory and
have been verified by numerical experiments on actual astronomical images.)
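The percentages quoted above are consistent with the standard
quantization-noise model, in which rounding pixel values to a step size
delta adds a variance of delta^2/12 to the existing noise variance sigma^2.
The sketch below assumes that model, and it also assumes that compression
ratios of 6.0, 8.0, and 10.0 correspond to q = sigma/delta values of 4, 2,
and 1. Neither assumption is stated in this message, so treat it as a
plausibility check rather than the actual derivation.

```python
import math


def noise_increase_percent(q):
    """Percent increase in total noise after quantizing with step sigma / q.

    Assumes sigma_total = sqrt(sigma**2 + delta**2 / 12) with delta = sigma / q,
    i.e. the variance of a uniform rounding error added in quadrature.
    """
    return (math.sqrt(1.0 + 1.0 / (12.0 * q * q)) - 1.0) * 100.0


# Hypothetical pairing of compression ratio with q (an assumption, see above).
for ratio, q in ((6.0, 4), (8.0, 2), (10.0, 1)):
    print(f"ratio {ratio:4.1f}  q = {q}  noise increase = {noise_increase_percent(q):.2f}%")
```

Under these assumptions the computed increases agree with the quoted
figures to within rounding in the last digit.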
Note that many papers and entire textbooks have been written on this
topic. For further information, I suggest starting with our paper,
"Optimal compression of floating-point FITS images" by Pence, White, and
Seaman (http://adsabs.harvard.edu/abs/2010PASP..122.1065P).
-Bill
On 1/14/2016 3:32 PM, Demitri Muna wrote:
> Hi,
>
> I don’t have any opinions on compression algorithms, but I want to note
> that I’m opposed to the FITS format supporting any lossy compression as
> part of the format. FITS should be archival, and I think that any lossy
> representation of data (e.g. a thumbnail) should be created outside of
> the file. Lossy algorithms can improve dramatically over time with the
> increase of CPU speed (see the ~50% space improvement from H.264 to
> H.265 for the same quality). Thumbnails are extremely cheap to create
> and cache and become cheaper to do so over time with faster I/O and CPU.
> It’s common in archives to have jpg files next to FITS files for this
> reason - I don’t see a benefit to building this into the format. Keep
> things light.
>
> Cheers,
> Demitri
>
> _________________________________________
> Demitri Muna
>
> Department of Astronomy
> The Ohio State University
>
> Home page: http://muna.com
>
> My Projects:
> http://nightlightapp.io
> http://trillianverse.org
> http://scicoder.org
>
> _______________________________________________
> fitsbits mailing list
> fitsbits at listmgr.nrao.edu
> https://listmgr.nrao.edu/mailman/listinfo/fitsbits
>