[fitsbits] Rice compression from the command line

William Pence pence at milkyway.gsfc.nasa.gov
Wed Jul 12 18:35:54 EDT 2006


Rob Seaman wrote:
> updating an HDU does not necessarily update the checksums.  Failing 
> this, the checksum convention mandates that the CHECKSUM and DATASUM 
> keywords be deleted, but instead CFITSIO leaves stale keywords (which 
> remain stale even after restoring the uncompressed HDU, see #1).

Verifying or updating the checksum keywords can be an expensive operation 
for large images.  In a real-world application, the user should be able to 
control whether or not these keywords are computed.  I think the default 
should be to simply delete the keywords.

> Finally the FITS compression convention is incomplete.  It doesn't 
> actually express a coherent strategy for compressing and/or 
> uncompressing general FITS objects, but is limited to per-HDU issues.  
> For example, if an "SIF" file (that is, not an "MEF") is compressed, an 
> MEF is generated to contain the resulting binary table.  No information 
> is retained to describe the original file structure, so uncompressing 
> this file later generates an ambiguity about whether the original was 
> indeed an SIF or rather was an uncompressed MEF with a single IMAGE 
> extension. 

The tile compression format currently provides no way to distinguish 
between a compressed primary array image and a compressed image extension, 
since the original SIMPLE or XTENSION keyword is not preserved in the 
compressed file.  Also, the BLOCKED and EXTEND primary array keywords, if 
present, and the PCOUNT and GCOUNT image extension keywords in the original 
image are not preserved in the compressed image file (at least in the 
CFITSIO implementation).  This makes it impossible to reconstruct the 
original comment fields that may have been present in these keywords.  To 
solve these problems, I think the definition of the tile compressed image 
format needs to be extended to allow the following special keywords to be 
present in the compressed image header:

    ZSIMPLE
    ZEXTEND
    ZBLOCKED
    ZTENSION
    ZPCOUNT
    ZGCOUNT

These keywords will preserve an exact image of the corresponding header 
records in the original primary array or image extension.  The presence of 
either the ZSIMPLE or the ZTENSION keyword in the compressed image header 
will indicate whether the original image was in the primary array or in an 
IMAGE extension, respectively.
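
To illustrate how a reader of the compressed file might use these keywords, 
here is a minimal sketch in plain C.  It assumes the header is already 
available as an array of 80-character card strings.  The function names 
restore_z_card() and original_hdu_kind() are hypothetical and are not part 
of CFITSIO; only the six keyword names come from the proposal above.

```c
#include <stdio.h>
#include <string.h>

#define CARD_LEN 80

/* Proposed Z-keywords and the original keywords they stand for.
   Both columns are padded to the 8-character keyword field. */
static const char *const zmap[][2] = {
    {"ZSIMPLE ", "SIMPLE  "},
    {"ZEXTEND ", "EXTEND  "},
    {"ZBLOCKED", "BLOCKED "},
    {"ZTENSION", "XTENSION"},
    {"ZPCOUNT ", "PCOUNT  "},
    {"ZGCOUNT ", "GCOUNT  "},
};

/* Hypothetical helper: if `card` starts with one of the Z-keywords, write
   the restored 80-character record (value and comment unchanged) into
   `out` and return 1.  Otherwise return 0 and leave `out` untouched. */
int restore_z_card(const char *card, char out[CARD_LEN + 1])
{
    size_t i;
    if (strlen(card) < 8)
        return 0;
    for (i = 0; i < sizeof zmap / sizeof zmap[0]; i++) {
        if (strncmp(card, zmap[i][0], 8) == 0) {
            /* Copy the card, blank-padded or truncated to 80 characters. */
            snprintf(out, CARD_LEN + 1, "%-80.80s", card);
            /* Overwrite only the keyword name field. */
            memcpy(out, zmap[i][1], 8);
            return 1;
        }
    }
    return 0;
}

/* Hypothetical helper: 'P' if the original image was a primary array
   (ZSIMPLE present), 'X' if it was an IMAGE extension (ZTENSION present),
   '?' if neither keyword is found. */
char original_hdu_kind(const char *const cards[], size_t ncards)
{
    size_t i;
    for (i = 0; i < ncards; i++) {
        if (strncmp(cards[i], "ZSIMPLE ", 8) == 0)
            return 'P';
        if (strncmp(cards[i], "ZTENSION", 8) == 0)
            return 'X';
    }
    return '?';
}
```

Because the Z-keyword and the original keyword occupy the same 8-character 
field, the value and comment columns of the restored record line up exactly 
as they did in the original header.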

> For instance, I would be grateful if somebody could tell me how 
> to infer the compression status of an HDU using CFITSIO. 

There is an undocumented function in CFITSIO to test whether the HDU pointed 
to by the 'fitsfile' pointer contains a compressed image:

    int fits_is_compressed_image(fitsfile *fptr, int *status)

It returns 1 if the HDU is a compressed image; otherwise it returns 0.
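
For code that cannot call CFITSIO, the same question can be asked of the 
header directly, since the tile compression convention marks a compressed 
image with the keyword ZIMAGE = T in the binary table header.  The sketch 
below is not the CFITSIO implementation.  It assumes the header cards are 
already in memory as strings, and header_marks_compressed_image() is a 
hypothetical name.

```c
#include <string.h>

/* Return 1 if this header card is "ZIMAGE  = T", else 0. */
static int card_is_zimage_true(const char *card)
{
    const char *p;
    if (strncmp(card, "ZIMAGE  =", 9) != 0)
        return 0;
    p = card + 9;
    while (*p == ' ')   /* skip blanks before the logical value */
        p++;
    return *p == 'T';
}

/* Hypothetical helper: return 1 if any card in the header flags the HDU
   as a tile-compressed image, otherwise 0. */
int header_marks_compressed_image(const char *const cards[], size_t ncards)
{
    size_t i;
    for (i = 0; i < ncards; i++) {
        if (card_is_zimage_true(cards[i]))
            return 1;
    }
    return 0;
}
```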

> 2) What features should a general purpose command line FITS compression 
> tool have?  (For instance, should the checksums from the original file 
> be cached for later comparison to restored HDUs - whether on disk or in 
> memory?)

There are a couple of other parameters that users may need to specify:

a) the compression tile size (in pixel units).  By default, the image is 
compressed on a row by row basis (i.e., the tiles are NAXIS1 x 1 in size for 
a 2D image) but this may not be optimal, especially for small images.

b) when doing lossy compression of floating point format images, there is a 
parameter that controls the number of bits of 'noise' to preserve in the 
compressed image.  The default value is 4 in CFITSIO, but it can range from 
1 up to 16 (or maybe 32).  The smaller the value, the more the file is 
compressed.
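
To make the tile size trade-off in (a) concrete, here is a small sketch that 
counts the tiles for a 2D image.  The function names are hypothetical, and 
passing 0 for a tile dimension is just this sketch's way of asking for the 
row-by-row default described above.

```c
/* Number of tiles along one axis: naxis / tile, rounded up so that a
   partial tile at the edge is counted. */
long tiles_along_axis(long naxis, long tile)
{
    return (naxis + tile - 1) / tile;
}

/* Total number of tiles for a 2D image.  A tile dimension <= 0 selects
   the row-by-row default (tiles of NAXIS1 x 1 pixels). */
long tile_count_2d(long naxis1, long naxis2, long tile1, long tile2)
{
    if (tile1 <= 0)
        tile1 = naxis1;
    if (tile2 <= 0)
        tile2 = 1;
    return tiles_along_axis(naxis1, tile1) * tiles_along_axis(naxis2, tile2);
}
```

For a 100 x 100 image the default gives 100 tiles of only 100 pixels each, 
while 32 x 32 tiles give 16 tiles of up to 1024 pixels each.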

> 3) Should idempotency and correct checksum handling be the 
> responsibility of CFITSIO, or rather of the application?

If the additional 6 keywords listed above are added, then idempotency should 
be preserved in most cases (at least for lossless compression of integer 
images).  The exact order of the keywords may not be preserved, but at least 
this does not affect the checksum values.
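
The reason keyword order does not matter is that the checksum is a 32-bit 
ones'-complement sum, and each 80-byte header record is a whole number of 
4-byte words, so reordering records only permutes the words being added.  
A minimal sketch of this, assuming that form of the checksum (the function 
names are hypothetical, and the ASCII encoding of the CHECKSUM keyword 
value is not shown):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit ones'-complement sum of big-endian 4-byte words.
   `len` is expected to be a multiple of 4. */
uint32_t ones_complement_sum32(const unsigned char *buf, size_t len)
{
    uint64_t sum = 0;
    size_t i;
    for (i = 0; i + 4 <= len; i += 4) {
        uint32_t w = ((uint32_t)buf[i] << 24) | ((uint32_t)buf[i + 1] << 16) |
                     ((uint32_t)buf[i + 2] << 8) | (uint32_t)buf[i + 3];
        sum += w;
    }
    /* Fold the carries back in (end-around carry). */
    while (sum >> 32)
        sum = (sum & 0xFFFFFFFFu) + (sum >> 32);
    return (uint32_t)sum;
}

/* Hypothetical helper: lay out two header cards, blank-padded to 80
   bytes each, in a 160-byte buffer. */
void two_cards(unsigned char out[160], const char *a, const char *b)
{
    size_t na = strlen(a), nb = strlen(b);
    if (na > 80) na = 80;
    if (nb > 80) nb = 80;
    memset(out, ' ', 160);
    memcpy(out, a, na);
    memcpy(out + 80, b, nb);
}
```

Building one buffer with ZPCOUNT before ZGCOUNT and another with the two 
records swapped gives different bytes but the same sum.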

> 4) What logistical procedures and semantic structures need to be added 
> to the FITS compression convention to support real-world usage?
> 
> 5) Note that I have not talked about compression algorithms at all.  Has 
> any progress been made on these issues in the last few years that FITS 
> could benefit from?  The compression convention is intended to support 
> multiple algorithms, of course.

The Hcompress algorithm should probably be the next one to add.

Bill Pence
-- 
____________________________________________________________________
Dr. William Pence                       pence at milkyway.gsfc.nasa.gov
NASA/GSFC Code 662       HEASARC        +1-301-286-4599 (voice)
Greenbelt MD 20771                      +1-301-286-1684 (fax)




