[fitsbits] Potential new compression method for FITS tables

Mark Taylor m.b.taylor at bristol.ac.uk
Wed Dec 22 05:29:18 EST 2010


Bill and others,

On Tue, 21 Dec 2010, William Pence wrote:

> Thank you for carefully reading our document (describing a potential new
> compression method for FITS binary tables).  Here are a few more comments, in
> addition to the previous ones from Rob Seaman:

Thank you for considering my comments.  I have a couple of follow-ups:

> I agree with Mark's observation that this compressed table format is not very
> convenient for applications that need random access to the rows and columns of
> data.  This is no different, however, from the case where the entire FITS file
> is compressed with gzip.   In both cases, it is usually necessary to
> uncompress the table before the application reads or writes data in the table.

Quite true.  There is a significant difference in convenience/usability,
however, in that everybody understands what a .fits.gz file is and how
to uncompress it, whereas it will be much less obvious to people what
a tile-compressed table is, and how to make sense of it.  If the format
becomes widely used this issue will be ameliorated, but that would probably
take quite some time.

> This can be done either by explicitly creating an uncompressed copy of the
> FITS file (e.g., by using our fpack/funpack FITS file compression utility
> programs) which is then processed by the application program, or by having the
> FITS reader create an uncompressed virtual FITS file in memory, which is then
> accessed by the application program on the fly.  I'm planning to implement
> this latter approach in the CFITSIO library, similar to what has already been
> done to support the tiled-image compression format.  Application programs that
> use CFITSIO to access these compressed tables will be able to do so in exactly
> the same way as for normal uncompressed tables;  CFITSIO will transparently
> uncompress the table when necessary, and if the application modifies the
> table, then CFITSIO will automatically recompress it when the application is
> finished.

If I have time, and if this format looks like becoming widely used, I'd do
something similar in STIL/TOPCAT.  But for large tables, uncompressing
on the fly would still mean significantly longer processing time
than doing direct random access on an existing disk file.
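
To illustrate the difference in access pattern, here is a minimal Python
sketch.  It is only a toy: the row layout is an invented fixed-width
record rather than a real FITS table, it uses whole-stream gzip rather
than the proposed tile compression, and the names ROW_FORMAT,
read_row_direct and read_row_gzipped are made up for the example.

```python
import gzip
import io
import struct

# Toy fixed-width row: big-endian 4-byte int + 8-byte double (assumed layout).
ROW_FORMAT = ">id"
ROW_SIZE = struct.calcsize(ROW_FORMAT)


def make_rows(n):
    """Build n packed rows (i, i * 0.5) as one bytes object."""
    return b"".join(struct.pack(ROW_FORMAT, i, i * 0.5) for i in range(n))


def read_row_direct(stream, index):
    """Uncompressed data: seek straight to the row and read only its bytes."""
    stream.seek(index * ROW_SIZE)
    return struct.unpack(ROW_FORMAT, stream.read(ROW_SIZE))


def read_row_gzipped(compressed, index):
    """Gzipped data: the whole stream is inflated before one row can be read."""
    data = gzip.decompress(compressed)
    return struct.unpack_from(ROW_FORMAT, data, index * ROW_SIZE)


rows = make_rows(1000)
plain = io.BytesIO(rows)       # stands in for an uncompressed file on disk
packed = gzip.compress(rows)   # stands in for the compressed copy

print(read_row_direct(plain, 500))
print(read_row_gzipped(packed, 500))
```

Both calls return the same row, but the second has to inflate all 1000
rows to get at one of them, and that cost grows with the size of the table.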

My feeling is that, disk space being cheap, for most *user* contexts
the compression levels achievable with tile-compressed FITS will not
represent a good trade-off against the additional inconvenience of 
using them.  I am happy to admit however that for archives the reverse 
may well be true.

> Mark also expressed concerns about possible confusion between the compressed
> and uncompressed versions of the same table, by humans or by software that is
> unaware of this compression convention.  It is true that the headers of the
> compressed and uncompressed tables look quite similar, because only the
> NAXIS2, PCOUNT, and TFORMn keyword values must necessarily differ.  All the
> other keywords can remain unchanged.   I think this is largely a positive,
> because readers of the compressed table header (whether human or software) can
> quite easily understand the contents of the compressed table.   I don't think
> there is any danger that unsuspecting software could mistakenly process the
> compressed table and produce misleading scientific results, if for no other
> reason than because the compressed table will only contain a single row of
> data in most cases.  Mark suggested inventing a new extension type (instead of
> BINTABLE) for these compressed tables, but I don't think we want to encourage
> a proliferation of new extension types simply because the contents of the
> table are slightly different.  In any case, section 3.4.2 of the FITS standard
> says that only one extension format shall be approved for each type of data
> organization.

I do agree that this is not likely to lead to subtly inaccurate
scientific results.  I still think user confusion is quite likely,
but admit that this is a less serious issue.

> One possible improvement we could make is to add a few COMMENT keywords to the
> header of the compressed table to tell readers that table columns have been
> compressed, and include a link to further information about how to interpret
> the contents.

I think recommending this kind of additional annotation, along with 
some discussion in the document of the pros and cons of using this 
format in various contexts, would be an appropriate way to address 
my concerns.
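
As a rough idea of what such annotation might look like, the following
Python sketch formats a note as COMMENT cards (80 characters each, with
the keyword in columns 1-8 and free text after it).  The wording of the
note and the example.org URL are placeholders of my own, not anything
proposed in the document, and comment_cards is a made-up helper name.

```python
import textwrap

CARD_LENGTH = 80               # every FITS header card is 80 characters
TEXT_WIDTH = CARD_LENGTH - 8   # the keyword field takes columns 1-8


def comment_cards(text):
    """Wrap free text into a list of 80-character COMMENT cards."""
    # break_on_hyphens=False keeps URLs and hyphenated words in one piece.
    lines = textwrap.wrap(text, TEXT_WIDTH, break_on_hyphens=False)
    return [("COMMENT " + line).ljust(CARD_LENGTH) for line in lines]


# Placeholder wording and placeholder URL, for illustration only.
NOTE = ("The columns of this table have been tile-compressed. "
        "See http://example.org/compressed-tables for how to interpret "
        "the contents.")

cards = comment_cards(NOTE)
for card in cards:
    print(card)
```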

Best festive wishes,

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/



