[fitsbits] Potential new compression method for FITS tables

William Pence William.Pence at nasa.gov
Thu Oct 28 16:42:26 EDT 2010


For the past few months, several of us (Rob Seaman, Rick White, and 
I) have been experimenting with a new compression method for FITS 
binary tables that appears to be significantly more effective than the 
usual method of simply compressing the whole FITS file with gzip.  We 
have produced a document, available at 
http://fits.gsfc.nasa.gov/tiletable.pdf that describes this proposed 
convention in more detail;  here is a brief description from that document:

"This document describes a convention for compressing FITS binary
tables that is modeled after the FITS tiled-image compression method
(White et al. 2009) that has been in use for about a decade. The input
table is first optionally subdivided into tiles, each containing an
equal number of rows,  then every column of data within each tile is
compressed and stored as a variable-length array of bytes in the
output FITS binary table.  All the header keywords from the input
table are copied to the header of the  output table and remain
uncompressed for efficient access. The output compressed  table
contains the same number and order of columns as in the input
uncompressed binary table. There is one row in the output table
corresponding to each tile of rows in the input table.  In principle,
each column of data can be compressed using a different algorithm
that is optimized for the type of data within that column, however in
the prototype implementation described here, the gzip algorithm is
used to compress every column."
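
To make the layout concrete, here is a rough Python sketch of the
tiling and per-column compression steps described in that excerpt.
It is not the prototype code.  The function names, the representation
of a table as a list of fixed-width row byte strings, and the choice
to let the last tile be shorter than the others are all assumptions
made for illustration, and the sketch ignores the actual FITS header
and variable-length array bookkeeping.

```python
import gzip
import struct

def compress_table(rows, col_widths, tile_rows):
    """Hypothetical helper: 'rows' is a list of byte strings, each
    sum(col_widths) bytes long.  Returns one output row per tile, and
    each output row holds one gzip-compressed byte string per column."""
    offsets = [sum(col_widths[:i]) for i in range(len(col_widths))]
    out = []
    for start in range(0, len(rows), tile_rows):
        tile = rows[start:start + tile_rows]   # last tile may be short
        out_row = []
        for off, width in zip(offsets, col_widths):
            # Gather this column's bytes from every row in the tile
            column_bytes = b"".join(r[off:off + width] for r in tile)
            out_row.append(gzip.compress(column_bytes))
        out.append(out_row)
    return out

def decompress_table(tiles, col_widths):
    """Inverse of compress_table: rebuild the original rows."""
    rows = []
    for out_row in tiles:
        cols = [gzip.decompress(c) for c in out_row]
        nrows = len(cols[0]) // col_widths[0]
        for i in range(nrows):
            rows.append(b"".join(c[i * w:(i + 1) * w]
                                 for c, w in zip(cols, col_widths)))
    return rows

# Toy table: a 4-byte integer column and an 8-byte double column,
# stored big-endian as in FITS binary tables.
rows = [struct.pack(">id", i, 0.5 * i) for i in range(10)]
tiles = compress_table(rows, [4, 8], tile_rows=4)
print(len(tiles), "output rows,", len(tiles[0]), "columns each")
```

The output has the same number and order of columns as the input, and
one row per tile, which is the property the convention relies on.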

In experiments on a sample of FITS tables from the HEASARC archive, this 
new compression method saved about 50% more disk space than the simple 
"gzip-the-whole-file" method.  This compression improvement is mainly a 
result of a) compressing the table column by column, instead of on a 
row-by-row basis, and b) using a byte shuffling technique on numeric 
columns that regroups the bytes of the values in decreasing order of 
significance (all the most significant bytes first, then all the next 
most significant bytes, and so on).
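
As an illustration of the byte shuffling idea, the following Python
sketch regroups the bytes of a numeric column before compressing it.
The names shuffle and unshuffle and the test column of consecutive
integers are hypothetical and chosen only for this example.  The
prototype may differ in detail.

```python
import gzip
import struct

def shuffle(column_bytes, width):
    """Gather byte 0 of every value, then byte 1 of every value, etc.
    For big-endian FITS data, byte 0 is the most significant byte."""
    return b"".join(column_bytes[k::width] for k in range(width))

def unshuffle(shuffled, width):
    """Inverse of shuffle: restore the original byte order."""
    n = len(shuffled) // width
    out = bytearray(len(shuffled))
    for k in range(width):
        out[k::width] = shuffled[k * n:(k + 1) * n]
    return bytes(out)

# Hypothetical column of 1000 slowly varying 4-byte big-endian integers
column = b"".join(struct.pack(">i", v) for v in range(1000, 2000))

plain = gzip.compress(column)
shuffled = gzip.compress(shuffle(column, 4))
print("gzip only:       ", len(plain), "bytes")
print("shuffle then gzip:", len(shuffled), "bytes")
```

The shuffled column places the nearly constant high-order bytes in
long runs, which gzip can represent very compactly.  How much this
helps in practice depends on the data in each column.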

This is still a prototype, and we plan to do further testing before even 
considering using this compression method on any publicly available FITS 
files.  In the meantime, we would be interested in any comments or 
suggestions on this potential new FITS compression convention.  We are 
also interested in gathering a larger sample of representative FITS 
tables for test purposes, so I would appreciate any suggestions of 
suitable FITS files from different projects or observatories.

Bill Pence
-- 
____________________________________________________________________
Dr. William Pence                       William.Pence at nasa.gov
NASA/GSFC Code 662       HEASARC        +1-301-286-4599 (voice)
Greenbelt MD 20771                      +1-301-286-1684 (fax)




