<HTML><BODY style="word-wrap: break-word; -khtml-nbsp-mode: space; -khtml-line-break: after-white-space; "><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">Work on the next release of the NOAO Science Archive has caused me to revisit an earlier selection of gzip (which itself was the result of an exercise in "satisficing" the choice of compression). For all the obvious reasons (improved read/write speed, higher compression factors, transparent access) we're taking another look at FITS Rice compression. Not much seems to have changed over the past five years - except that it seems like the example imcopy program in the cfitsio distribution is actually being used in production environments. This program has several functional shortcomings, in addition to all the obvious logistical features that are missing in comparison to the unix gzip command, for instance.</SPAN></FONT><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">I've appended a quickly modified prototype that addresses some of those issues. (Compile and link as with imcopy.c.) If there are alternative FITS Rice compression tools already available, I would be delighted to hear about them. 
In the meantime, let me describe some of the issues I see with Rice compression, whether at the level of the FITS Convention, CFITSIO or imcopy:</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre"> </SPAN><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><A href="http://heasarc.gsfc.nasa.gov/docs/software/fitsio/compression.html">http://heasarc.gsfc.nasa.gov/docs/software/fitsio/compression.html</A></SPAN></FONT></DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre"> </SPAN><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><A href="http://heasarc.gsfc.nasa.gov/docs/software/fitsio/compression/compress_image.html">http://heasarc.gsfc.nasa.gov/docs/software/fitsio/compression/compress_image.html</A></SPAN></FONT></DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre"> </SPAN><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><A href="http://heasarc.gsfc.nasa.gov/docs/software/fitsio/fitsio.html">http://heasarc.gsfc.nasa.gov/docs/software/fitsio/fitsio.html</A></SPAN></FONT></DIV><DIV><SPAN class="Apple-tab-span" style="white-space:pre"> </SPAN><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><A href="http://heasarc.gsfc.nasa.gov/docs/software/fitsio/cexamples/imcopy.c">http://heasarc.gsfc.nasa.gov/docs/software/fitsio/cexamples/imcopy.c</A></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">Starting with the imcopy application, there are, as I say, many missing 
features. The two most obvious are the ability to compress "in-place" and to process a list of files. One of the primary use cases for compression is as a magic wand to wave over a file or a directory to shrink the disk usage. A compression utility that instead creates a second file misses the point that many users will be aiming for.</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">The next two issues appear to me to reflect limitations in the conceptual design of the CFITSIO interface. 1) A copy operation is not idempotent. Since the interface is semantically aware of the meaning, as well as the contents, of headers, a new copy may differ in various ways from the original. This is a problem for a compression application that wants to be able to restore a byte-for-byte copy of the original. 2) Updating an HDU does not necessarily update the checksums. Failing this, the checksum convention mandates that the CHECKSUM and DATASUM keywords be deleted, but instead CFITSIO leaves stale keywords (which remain stale even after restoring the uncompressed HDU, see #1).</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">(Tests indicate that the output file resulting from compressing and then uncompressing any input file may itself be idempotent. I don't know if this will hold up for all cases or for FITS interfaces other than CFITSIO. 
Such an action is something like the FITS equivalent of canonicalizing XML.)</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">Finally, the FITS compression convention is incomplete. It doesn't actually express a coherent strategy for compressing and/or uncompressing general FITS objects, but is limited to per-HDU issues. For example, if an "SIF" file (that is, not an "MEF") is compressed, an MEF is generated to contain the resulting binary table. No information is retained to describe the original file structure, so uncompressing this file later creates an ambiguity about whether the original was indeed an SIF or rather an uncompressed MEF with a single IMAGE extension. A complementary issue arises with MEF input if the primary HDU is not dataless. Does the "extra" extension resulting from compression become the first output extension or the last? How many extensions does such a restored file have? N or N+1?</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">Philosophically, FITS compression is not like gzip or other "opaque" compression. The output is itself a legal FITS object, and interfaces like CFITSIO or tools like imcopy can invisibly regard a compressed image array as equivalent to an uncompressed array. This is a great strength, but it doesn't remove the utility of other compression use cases. For instance, I would be grateful if somebody could tell me how to infer the compression status of an HDU using CFITSIO. Invisibility is nice, but Claude Rains tells us its limits. 
(Which are that the prototype doesn't currently uncompress, simply because it can't decide a priori whether the input is compressed to begin with. An obvious workaround is to have separate "grice" and "gunrice" commands. This might be desirable in any case for reasons I won't belabor here.)</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">Some questions to mull over:</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">1) Does a better alternative to the CFITSIO imcopy already exist? (Options don't have to be limited to ANSI C.) How best might we encourage wide adoption of a single standard across the astronomical community? Gzip is ubiquitous, but so is FITS.</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">2) What features should a general-purpose command-line FITS compression tool have? 
(For instance, should the checksums from the original file be cached for later comparison to restored HDUs - whether on disk or in memory?)</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">3) Should idempotency and correct checksum handling be the responsibility of CFITSIO, or rather of the application?</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">4) What logistical procedures and semantic structures need to be added to the FITS compression convention to support real-world usage?</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">5) Note that I have not talked about compression algorithms at all. Has any progress been made on these issues in the last few years that FITS could benefit from? The compression convention is intended to support multiple algorithms, of course.</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">Please take a look at the attached code. 
Please don't just take it and use it under battlefield conditions - this appears to be what happened with the original imcopy program :-) I've traded some email with Bill Pence about this issue, but would be delighted to hear additional feedback. If it turns out that further work is warranted on this prototype, I'll gladly donate the results to be incorporated into CFITSIO as Bill may deem appropriate. Folks interested in collaborating are always welcome.</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">Rob Seaman</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">NOAO</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;">--------</SPAN></FONT></DIV><DIV><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"><BR class="khtml-block-placeholder"></SPAN></FONT></DIV><DIV><SPAN><FONT class="Apple-style-span" size="3"><SPAN class="Apple-style-span" style="font-size: 13px;"></SPAN></FONT></SPAN></DIV></BODY></HTML>