[fitsbits] Potential new compression method for FITS tables
William Pence
William.Pence at nasa.gov
Thu Oct 28 16:42:26 EDT 2010
For the past few months, several of us (Rob Seaman, Rick White, and
I) have been experimenting with a new compression method for FITS
binary tables that appears to be significantly more effective than the
usual method of simply compressing the whole FITS file with gzip. We
have produced a document, available at
http://fits.gsfc.nasa.gov/tiletable.pdf that describes this proposed
convention in more detail; here is a brief description from that document:
"This document describes a convention for compressing FITS binary
tables that is modeled after the FITS tiled-image compression method
(White et al. 2009) that has been in use for about a decade. The input
table is first optionally subdivided into tiles, each containing an
equal number of rows, then every column of data within each tile is
compressed and stored as a variable-length array of bytes in the
output FITS binary table. All the header keywords from the input
table are copied to the header of the output table and remain
uncompressed for efficient access. The output compressed table
contains the same number and order of columns as in the input
uncompressed binary table. There is one row in the output table
corresponding to each tile of rows in the input table. In principle,
each column of data can be compressed using a different algorithm
that is optimized for the type of data within that column, however in
the prototype implementation described here, the gzip algorithm is
used to compress every column."
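To make the layout in the quoted description concrete, here is a minimal sketch in Python, not the prototype implementation itself. It assumes the table data is a single row-major byte string with fixed-width columns; the function names (compress_tiles, decompress_tiles) and the parameters col_widths and tile_rows are hypothetical, and the last tile is simply allowed to be shorter when the row count is not a multiple of the tile size. Writing the result into a real FITS binary table (variable-length array columns, header keywords) is not shown.

```python
import gzip


def compress_tiles(table, col_widths, tile_rows):
    """Split a row-major table into tiles and gzip each column of each tile.

    Returns a list of tiles; each tile is a list holding one compressed
    byte string per column, i.e. one output "row" per tile of input rows.
    """
    row_len = sum(col_widths)
    n_rows = len(table) // row_len
    tiles = []
    for start in range(0, n_rows, tile_rows):
        stop = min(start + tile_rows, n_rows)
        rows = [table[r * row_len:(r + 1) * row_len] for r in range(start, stop)]
        tile = []
        offset = 0
        for width in col_widths:
            # Gather this column's bytes from every row in the tile.
            column = b"".join(row[offset:offset + width] for row in rows)
            tile.append(gzip.compress(column))
            offset += width
        tiles.append(tile)
    return tiles


def decompress_tiles(tiles, col_widths):
    """Inverse of compress_tiles: rebuild the row-major byte string."""
    rows_out = []
    for tile in tiles:
        columns = [gzip.decompress(c) for c in tile]
        n_rows = len(columns[0]) // col_widths[0]
        for r in range(n_rows):
            rows_out.append(b"".join(
                col[r * w:(r + 1) * w] for col, w in zip(columns, col_widths)))
    return b"".join(rows_out)
```

For example, a 10-row table with a 4-byte and an 8-byte column, compressed with tile_rows=4, yields three tiles of two compressed columns each, and decompress_tiles restores the original bytes.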
In experiments on a sample of FITS tables from the HEASARC archive, this
new compression method produced about 50% more disk space savings than
the simple "gzip-the-whole-file" method. This compression improvement
is mainly a result of a) compressing the table column by column, instead
of on a row-by-row basis, and b) using a byte shuffling technique on
numeric columns that sorts the bytes in decreasing order of significance.
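As an illustration of point b), here is a minimal sketch of byte shuffling on one numeric column, written in Python rather than taken from our prototype code. It assumes the column is a byte string of fixed-width, big-endian values (the FITS byte order), so that byte 0 of each element is the most significant; the function names are hypothetical.

```python
def shuffle_bytes(column, width):
    """Group byte 0 of every element, then byte 1 of every element, etc.

    For big-endian values this puts all the most significant bytes
    first and all the least significant bytes last.
    """
    return b"".join(column[i::width] for i in range(width))


def unshuffle_bytes(shuffled, width):
    """Inverse of shuffle_bytes: restore the original element order."""
    n = len(shuffled) // width
    out = bytearray(len(shuffled))
    for i in range(width):
        # The i-th group of n bytes goes back to byte position i of each element.
        out[i::width] = shuffled[i * n:(i + 1) * n]
    return bytes(out)
```

The high-order bytes of neighboring values in a column are often identical, so after shuffling they form long runs of repeated bytes, which is the kind of input that gzip typically handles well. The shuffled column would be passed to the compressor, and unshuffle_bytes applied after decompression.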
This is still a prototype, and we plan to do further testing before even
considering using this compression method on any publicly available FITS
files. In the meantime, we would be interested in any comments or
suggestions on this potential new FITS compression convention. We are
also interested in gathering a larger sample of representative FITS
tables for test purposes, so I would appreciate any suggestions of
suitable FITS files from different projects or observatories.
Bill Pence
--
____________________________________________________________________
Dr. William Pence William.Pence at nasa.gov
NASA/GSFC Code 662 HEASARC +1-301-286-4599 (voice)
Greenbelt MD 20771 +1-301-286-1684 (fax)