[fitsbits] Rice compression from the command line
Arnold Rots
arots at head.cfa.harvard.edu
Wed Jul 19 10:54:31 EDT 2006
One more issue: some images (I suspect those that are largely empty but
have subtle variations) compress much better when converted to integer.
I had a recent case (working with background and PSF images) where the
compressed size went from 70% to 1%.
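
The effect can be reproduced in a rough sketch. Everything here is an assumption made for illustration: the synthetic background (level 100, noise sigma 0.05), the image size, and the use of zlib as a stand-in for Rice compression, which is not in the Python standard library. Rounding to integer also discards the sub-integer variations, so the conversion is lossy.

```python
import array
import random
import zlib


def compressed_fraction(buf: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(zlib.compress(buf, 6)) / len(buf)


def make_background(n_pixels: int = 65536, seed: int = 0) -> list:
    """Hypothetical background image: a flat level with subtle noise."""
    rng = random.Random(seed)
    return [100.0 + rng.gauss(0.0, 0.05) for _ in range(n_pixels)]


pixels = make_background()

# Same pixels stored two ways: 32-bit floats and rounded 16-bit integers.
as_float = array.array("f", pixels).tobytes()
as_int = array.array("h", [round(p) for p in pixels]).tobytes()

float_frac = compressed_fraction(as_float)
int_frac = compressed_fraction(as_int)
print(f"float32: {float_frac:.1%} of original, int16: {int_frac:.1%} of original")
```

The low-order mantissa bits of the floats are effectively noise, which a lossless compressor cannot remove; the rounded integers are nearly constant and compress to almost nothing.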
- Arnold
Rob Seaman wrote:
> On Jul 18, 2006, at 9:05 PM, Mark Calabretta wrote:
>
> > For the FITS binary table, 7zip is costly in CPU time for compression
> > but beats gzip and bzip2 handsomely in compression ratio. However,
> > 7zip is not nearly so costly in elapsed time for decompression. If
> > these results are typical then 7zip would have to be the compressor
> > of choice for FITS data distributed on the web.
>
> Which raises the general question of constructing a figure of merit
> for data compression. Discussions like this usually focus on
> compression ratio, the speed to compress and the speed to decompress,
> but there are a number of important, less quantifiable, parameters:
>
> 1) market penetration - gzip is a clear leader here
>
> 2) openness of software - Both ends of the spectrum may have issues.
> Patents held by some multi-national can quell our access (and
> interest) if there is no loophole for educational licensing, but
> navigating the intricacies of some extreme copyleft can do the same.
>
> 3) applicability to a particular purpose - tiled Rice and PLIO are
> very attractive, tiled gzip much less so (with default parameters)
>
> 4) tailoring to data - a tile compressed FITS file is still a FITS file
>
> 5) stability across a range of data sets - Even good ol' gzip varies
> quite a bit in compression ratio from one file to the next. For
> example, the average gzip compression ratio over two years of NOAO
> Mosaic II data is 0.586 +/- 0.0449. Four and a half percent
> (1-sigma) may not seem like a very wide distribution, but it's all in
> the meaning of "average". This is from 170 nights selected from 304
> total. All nights with binned data were rejected. All
> multi-instrument nights were rejected. All nights with fewer than 10
> object exposures were rejected. And more to the point, average here
> means "the mean of nightly means". Picking a random recent night,
> the compression ratio varies between 0.33 and 0.79 across several
> dozen overtly identical 140 MB files. Calibrations at the low end,
> of course, and object frames at the top. Obviously there are issues
> of information theory here and one could use the incompressibility of
> the "science" data to gauge the skill of the observer :-)
>
> 6) availability of software - if God hadn't created cfitsio, it would
> have had to be invented. (Those who might be thinking that the same
> applies for the Devil and IRAF - shame on you!)
>
> 7) community support - after 7 years one might have hoped that more
> projects and software would support tile compression.
>
> 8) <your feature here>
>
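
The point in item 5 about "the mean of nightly means" can be illustrated with a small sketch. The per-file ratios below are made up for the example (only the 0.33 to 0.79 range echoes the message); they are not NOAO measurements.

```python
from statistics import mean, stdev

# Hypothetical per-file gzip compression ratios, grouped by night.
nights = {
    "night_a": [0.33, 0.35, 0.78, 0.79],
    "night_b": [0.55, 0.60, 0.62],
    "night_c": [0.34, 0.76, 0.77, 0.78, 0.79],
}

# One number per night, then statistics over those numbers.
nightly_means = [mean(ratios) for ratios in nights.values()]
mean_of_means = mean(nightly_means)
spread_of_means = stdev(nightly_means)

# Statistics over every individual file, ignoring the grouping.
all_files = [r for ratios in nights.values() for r in ratios]
spread_of_files = stdev(all_files)

print(f"mean of nightly means: {mean_of_means:.3f} +/- {spread_of_means:.3f}")
print(f"per-file spread:       {spread_of_files:.3f} "
      f"(range {min(all_files):.2f} to {max(all_files):.2f})")
```

Averaging within each night first hides the file-to-file variation, so the quoted sigma describes how nights differ from one another, not how much a single file may deviate.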
> In general, we often get bound up in theoretical discussions about
> things like lossy compression, rather than focusing on pragmatic
> issues of usability and suitability. Meanwhile the LSST tidal wave
> approaches, but there are going to be several smaller waves impacting
> astronomy's shores first, including Pan-STARRS (however many
> telescopes) and next generation instruments like the One-Degree
> Imager and the Dark Energy Camera.
>
> Features like #'s 1-8 can all be addressed through coordinated
> community action - it might as well be the FITS community. On the
> other hand, the best way to understand the figure of merit parameters
> of compression ratio, speed in, and speed out may be to focus not on
> static archival holdings, but rather on the costs of bandwidth and
> latency encountered when moving the data around. After all, isn't
> the point of the emerging Virtual Observatory to keep the pixels in
> play, ever moving and interacting? Even if we co-locate processing
> with data, the data have to shuttle from a SAN across gigabit or
> fiber channel to the Beowulf next door. As Arnold just pointed out,
> customer satisfaction (and thus our job security, I might add) depends
> on the aggregate response of our systems.
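
One way to read the suggestion above is as an end-to-end delivery time: time to compress, plus time to move the compressed bytes, plus time to decompress. The sketch below assumes that simple additive model. The codec names, ratios, and throughputs are hypothetical placeholders, not benchmarks of any real compressor; only the 140 MB file size is taken from the message.

```python
from dataclasses import dataclass


@dataclass
class CodecProfile:
    name: str
    ratio: float            # compressed size / original size
    compress_mb_s: float    # MB of input compressed per second
    decompress_mb_s: float  # MB of output restored per second


def delivery_seconds(codec, size_mb, link_mb_s, latency_s=0.0, precompressed=False):
    """Seconds to deliver one file: compress + transfer + decompress."""
    t_compress = 0.0 if precompressed else size_mb / codec.compress_mb_s
    t_transfer = latency_s + size_mb * codec.ratio / link_mb_s
    t_decompress = size_mb / codec.decompress_mb_s
    return t_compress + t_transfer + t_decompress


def best_codec(codecs, size_mb, link_mb_s, **kwargs):
    """Name of the codec with the shortest delivery time."""
    return min(codecs, key=lambda c: delivery_seconds(c, size_mb, link_mb_s, **kwargs)).name


INF = float("inf")  # "none" costs no CPU time: size / inf == 0.0
CODECS = [
    CodecProfile("none", 1.0, INF, INF),
    CodecProfile("fast_codec", 0.6, 20.0, 60.0),   # hypothetical numbers
    CodecProfile("dense_codec", 0.4, 2.0, 30.0),   # hypothetical numbers
]

for link_mb_s in (1.0, 100.0):
    print(link_mb_s,
          best_codec(CODECS, 140.0, link_mb_s),
          best_codec(CODECS, 140.0, link_mb_s, precompressed=True))
```

Under these assumed numbers the preferred codec changes with the link speed and with whether the file was compressed ahead of time, which is the sense in which a figure of merit depends on how the data move rather than on compression ratio alone.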
>
> I stumbled across a very interesting, very recent, paper on lossless
> floating point compression:
>
> http://www-static.cc.gatech.edu/~lindstro/papers/floatzip/paper.pdf
>
> ...so recent it has yet to appear in either author's online
> publication list.
>
> As far as I can tell, there is nothing about any of the algorithms
> referenced that would keep them from being used with astronomical
> data. The real question is how to turn academic advances into useful
> tools for our community. The FITS tile compression convention is one
> step toward greasing the rails.
>
> Bill Pence wants to add Hcompress to the cfitsio support for tile
> compression. Imagine, rather, supporting any and all of the
> algorithms mentioned above - perhaps using some sort of
> plug-in/component architecture. We're never going to identify a single
> best compression scheme for all our data. This was the subtext of the
> tile compression proposal in the first place. It's time to follow
> through to the logical conclusion. If any application could
> transparently access data compressed a dozen different ways (perhaps
> HDU by HDU in the same MEF), there would be no reason not to store
> such heterogeneous representations or to convert the data on-the-fly
> for task-specific purposes. A suite of layered benchmark
> applications would provide the tools to make these decisions. Those
> tools could even be automated to operate in adaptive ways within the
> data handling components of our archives, pipelines, web services and
> portals.
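
A minimal sketch of the plug-in idea, assuming a simple name-to-codec registry plus a benchmark loop. This is not cfitsio's interface; the registry, the function names, and the sample data are hypothetical, and the standard-library compressors (zlib, bz2, lzma) only stand in for algorithms such as Rice, PLIO, or Hcompress.

```python
import bz2
import lzma
import time
import zlib
from typing import Callable, Dict, NamedTuple


class Codec(NamedTuple):
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]


# Hypothetical plug-in registry: any lossless codec can be added by name.
REGISTRY: Dict[str, Codec] = {}


def register(name: str, compress, decompress) -> None:
    REGISTRY[name] = Codec(compress, decompress)


register("zlib", zlib.compress, zlib.decompress)
register("bz2", bz2.compress, bz2.decompress)
register("lzma", lzma.compress, lzma.decompress)


def benchmark(data: bytes) -> dict:
    """Measure ratio, compress time, and decompress time for every codec."""
    results = {}
    for name, codec in REGISTRY.items():
        t0 = time.perf_counter()
        packed = codec.compress(data)
        t1 = time.perf_counter()
        unpacked = codec.decompress(packed)
        t2 = time.perf_counter()
        if unpacked != data:
            raise ValueError(f"{name} did not round-trip the input")
        results[name] = {
            "ratio": len(packed) / len(data),
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        }
    return results


def pick_smallest(data: bytes) -> str:
    """Adaptive choice: the registered codec with the best ratio on this data."""
    results = benchmark(data)
    return min(results, key=lambda name: results[name]["ratio"])


sample = bytes(range(256)) * 64  # hypothetical stand-in for one tile or HDU
report = benchmark(sample)
for name, row in report.items():
    print(f"{name:5s} ratio={row['ratio']:.3f}")
print("chosen:", pick_smallest(sample))
```

The selection rule here uses compression ratio only; an archive or pipeline could just as well weigh the measured timings, and could run the choice per tile or per HDU.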
>
> Sounds like a nifty ADASS abstract to me :-) I'd already asked Bill
> if he wanted to work on such a paper - anybody else want to pile on?
>
> Rob
--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 67                              fax:  +1 617 495 7356
Cambridge, MA 02138                           arots at head.cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------