[fitsbits] Rice compression from the command line

Arnold Rots arots at head.cfa.harvard.edu
Wed Jul 19 10:54:31 EDT 2006


One more issue: some images (I suspect largely empty but with subtle
variations) compress much better when converted to integer.
I had a recent case (working with background and PSF images) where the
compressed size went from 70% to 1%.
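
A minimal sketch of that effect, under loud assumptions: the image is
synthetic (a hypothetical smooth background, not real data), zlib from the
Python standard library stands in for whatever compressor is actually in
use, and the conversion to integer is plain rounding, which is lossy. The
exact ratios will differ from the case described above; only the direction
of the difference is the point.

```python
import array
import math
import zlib


def compression_ratio(raw: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(raw, 6)) / len(raw)


def synthetic_background(nx: int = 256, ny: int = 256) -> list:
    """Hypothetical nearly flat image with subtle, smooth variations."""
    return [
        100.0 + 3.0 * math.sin(x / 40.0) * math.cos(y / 55.0)
        for y in range(ny)
        for x in range(nx)
    ]


pixels = synthetic_background()

# Same image, two representations: floating point, and rounded to integers.
as_float = array.array("f", pixels).tobytes()
as_int = array.array("h", [round(p) for p in pixels]).tobytes()

float_ratio = compression_ratio(as_float)
int_ratio = compression_ratio(as_int)

print(f"float pixels: compressed to {float_ratio:.1%} of original size")
print(f"int pixels:   compressed to {int_ratio:.1%} of original size")
```

The low-order mantissa bits of the floating point pixels look like noise to
the compressor, while the rounded integers form long runs of identical
values.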

  - Arnold

Rob Seaman wrote:
> On Jul 18, 2006, at 9:05 PM, Mark Calabretta wrote:
> 
> > For the FITS binary table, 7zip is costly in CPU time for compression
> > but beats gzip and bzip2 handsomely in compression ratio.  However, 7zip
> > is not nearly so costly in elapsed time for decompression.  If these
> > results are typical then 7zip would have to be the compressor of choice
> > for FITS data distributed on the web.
> 
> Which raises the general question of constructing a figure of merit  
> for data compression.  Discussions like this usually focus on  
> compression ratio, the speed to compress and the speed to decompress,  
> but there are a number of important, less quantifiable, parameters:
> 
> 1) market penetration - gzip is a clear leader here
> 
> 2) openness of software - Both ends of the spectrum may have issues.   
> Patents held by some multi-national can quell our access (and  
> interest) if there is no loophole for educational licensing, but  
> navigating the intricacies of some extreme copyleft can do the same.
> 
> 3) applicability to a particular purpose - tiled Rice and PLIO are  
> very attractive, tiled gzip much less so (with default parameters)
> 
> 4) tailoring to data - a tile compressed FITS file is still a FITS file
> 
> 5) stability across a range of data sets - Even good ol' gzip varies  
> quite a bit in compression ratio from one file to the next.  For  
> example, the average gzip compression ratio over two years of NOAO  
> Mosaic II data is 0.586 +/- 0.0449.  Four and a half percent
> (1-sigma) may not seem like a very wide distribution, but it's all in
> the meaning of "average".  This is from 170 nights selected from 304
> total.  All nights with binned data were rejected.  All
> multi-instrument nights were rejected.  All nights with fewer than 10
> object exposures were rejected.  And more to the point, average here  
> means "the mean of nightly means".  Picking a random recent night,  
> the compression ratio varies between 0.33 and 0.79 across several  
> dozen overtly identical 140 MB files.  Calibrations at the low end,  
> of course, and object frames at the top.  Obviously there are issues  
> of information theory here and one could use the incompressibility of  
> the "science" data to gauge the skill of the observer :-)
> 
> 6) availability of software - if God hadn't created cfitsio, it would  
> have had to be invented.  (Those who might be thinking that the same  
> applies for the Devil and IRAF - shame on you!)
> 
> 7) community support - after 7 years one might have hoped that more  
> projects and software would support tile compression.
> 
> 8) <your feature here>
> 
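
The point in item 5 about "the mean of nightly means" can be illustrated
with a short sketch. The per-file ratios below are hypothetical values
chosen only to span the 0.33 to 0.79 range quoted above; they are not the
NOAO measurements. Averaging within each night first hides most of the
file-to-file spread.

```python
from statistics import mean, stdev

# Hypothetical per-file gzip ratios (compressed size / original size).
# night_1 mixes calibrations (low ratios) with object frames (high ratios).
nights = {
    "night_1": [0.33, 0.35, 0.79, 0.78, 0.77],
    "night_2": [0.60, 0.58],
}

# Statistic 1: the mean of nightly means, as described in item 5.
nightly_means = [mean(ratios) for ratios in nights.values()]

# Statistic 2: every file pooled together, ignoring night boundaries.
all_files = [r for ratios in nights.values() for r in ratios]

print(f"mean of nightly means: {mean(nightly_means):.3f} "
      f"+/- {stdev(nightly_means):.3f}")
print(f"per-file statistics:   {mean(all_files):.3f} "
      f"+/- {stdev(all_files):.3f}")
```
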
> In general, we often get bound up in theoretical discussions about  
> things like lossy compression, rather than focusing on pragmatic  
> issues of usability and suitability.  Meanwhile the LSST tidal wave  
> approaches, but there are going to be several smaller waves impacting  
> astronomy's shores first, including Pan-STARRS (however many  
> telescopes) and next generation instruments like the One-Degree  
> Imager and the Dark Energy Camera.
> 
> Features like #'s 1-8 can all be addressed through coordinated  
> community action - it might as well be the FITS community.  On the  
> other hand, the best way to understand the figure of merit parameters  
> of compression ratio, speed in, and speed out may be to focus not on  
> static archival holdings, but rather on the costs of bandwidth and  
> latency encountered when moving the data around.  After all, isn't  
> the point of the emerging Virtual Observatory to keep the pixels in  
> play, ever moving and interacting?  Even if we co-locate processing  
> with data, the data have to shuttle from a SAN across gigabit or  
> fiber channel to the Beowulf next door.  As Arnold just pointed out,  
> customer satisfaction (and thus our job security, I might add) depends  
> on the aggregate response of our systems.
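
One way to fold compression ratio, speed in, and speed out into a single
number that reflects bandwidth and latency is an end-to-end delivery time.
The sketch below is only an illustration: the `Codec` class, the
`delivery_seconds` function, and all throughput and ratio figures are
hypothetical and are not benchmarks of gzip, bzip2, 7zip, or Rice. The
140 MB file size is the one mentioned in item 5.

```python
from dataclasses import dataclass


@dataclass
class Codec:
    name: str
    ratio: float            # compressed size / original size
    compress_mb_s: float    # MB of input compressed per second
    decompress_mb_s: float  # MB of output restored per second


def delivery_seconds(codec: Codec, size_mb: float, link_mb_s: float,
                     latency_s: float = 0.0) -> float:
    """Time to compress, ship, and decompress one file."""
    compress = size_mb / codec.compress_mb_s
    transfer = latency_s + size_mb * codec.ratio / link_mb_s
    decompress = size_mb / codec.decompress_mb_s
    return compress + transfer + decompress


# Hypothetical codecs: one quick with a modest ratio, one slow but tighter.
fast = Codec("fast", ratio=0.6, compress_mb_s=50.0, decompress_mb_s=100.0)
tight = Codec("tight", ratio=0.4, compress_mb_s=5.0, decompress_mb_s=50.0)

for link_mb_s in (0.5, 100.0):
    best = min((fast, tight),
               key=lambda c: delivery_seconds(c, 140.0, link_mb_s))
    print(f"link {link_mb_s:6.1f} MB/s -> best codec: {best.name}")
```

With these made-up numbers the tighter codec wins on the slow link and the
faster codec wins on the fast one, which is why no single ranking of
compressors holds across all ways of moving the data.
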
> 
> I stumbled across a very interesting, very recent, paper on lossless  
> floating point compression:
> 
> 	http://www-static.cc.gatech.edu/~lindstro/papers/floatzip/paper.pdf
> 
> ...so recent it has yet to appear in either author's online  
> publication list.
> 
> As far as I can tell, there is nothing about any of the algorithms  
> referenced that would keep them from being used with astronomical  
> data.  The real question is how to turn academic advances into useful  
> tools for our community.  The FITS tile compression convention is one  
> step toward greasing the rails.
> 
> Bill Pence wants to add Hcompress to the cfitsio support for tile  
> compression.  Imagine, rather, supporting any and all of the
> algorithms mentioned above - perhaps using some sort of
> plug-in/component architecture.  We're never going to identify a single best
> compression scheme for all our data.  This was the subtext of the  
> tile compression proposal in the first place.  It's time to follow  
> through to the logical conclusion.  If any application could  
> transparently access data compressed a dozen different ways (perhaps  
> HDU by HDU in the same MEF), there would be no reason not to store  
> such heterogeneous representations or to convert the data on-the-fly  
> for task-specific purposes.  A suite of layered benchmark  
> applications would provide the tools to make these decisions.  Those  
> tools could even be automated to operate in adaptive ways within the  
> data handling components of our archives, pipelines, web services and  
> portals.
> 
> Sounds like a nifty ADASS abstract to me :-)  I'd already asked Bill  
> if he wanted to work on such a paper - anybody else want to pile on?
> 
> Rob
> 
--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 67                              fax:  +1 617 495 7356
Cambridge, MA 02138                             arots at head.cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------