[fitsbits] Potential new compression method for FITS tables
Tim Pearson
tjp at astro.caltech.edu
Wed Dec 22 11:32:36 EST 2010
As a FITS user, i.e., an astronomer, I am very uncomfortable with the idea that I may be sent a valid FITS bintable that I can't make sense of. I regularly use tools like TOPCAT and fv (and my own homegrown tools) to inspect binary tables, and I would very much prefer that the tools would tell me "I do not understand this file" rather than just displaying garbage. When considering new ways of storing data within binary tables, please keep in mind naive users like me!
My concerns could be simply addressed by using a new extension type instead of BINTABLE. Programs that understand the compression convention would accept such extensions transparently; those that do not would tell me something is wrong.
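The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `naive_viewer` is a hypothetical stand-in for a tool that predates the compression convention, headers are modelled as plain dicts rather than real FITS header objects, and the extension name `ZBINTBL` is invented here (the thread proposes no specific name).

```python
# Standard extension types that an older tool would already recognise.
KNOWN_TYPES = {"IMAGE", "TABLE", "BINTABLE"}


def naive_viewer(header):
    """Mimic a table viewer that knows nothing about table compression.

    `header` is a plain dict of keyword -> value (an assumption made to keep
    the sketch self-contained; a real tool would use a FITS library).
    """
    xtension = str(header.get("XTENSION", "")).strip()
    if xtension not in KNOWN_TYPES:
        # Unknown extension type: refuse rather than show garbage.
        return "I do not understand this file"
    # Known type: the tool goes ahead and displays whatever it finds.
    return "displaying %s with %d row(s)" % (xtension, header.get("NAXIS2", 0))


# A compressed table stored as an ordinary BINTABLE is displayed as-is,
# even though its single row holds compressed bytes.
as_bintable = {"XTENSION": "BINTABLE", "NAXIS2": 1}

# The same data under a hypothetical new extension type is rejected cleanly.
as_new_type = {"XTENSION": "ZBINTBL", "NAXIS2": 1}

print(naive_viewer(as_bintable))
print(naive_viewer(as_new_type))
```

Tools that do understand the convention would simply add the new type to their list of known types and decompress before display.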
Tim Pearson
On Dec 22, 2010, at 2:29 AM, Mark Taylor wrote:
> Bill and others,
>
> On Tue, 21 Dec 2010, William Pence wrote:
>
>> Thank you for carefully reading our document (describing a potential new
>> compression method for FITS binary tables). Here are a few more comments, in
>> addition to the previous ones from Rob Seaman:
>
> Thank you for considering my comments. I have a couple of follow-ups:
>
>> I agree with Mark's observation that this compressed table format is not very
>> convenient for applications that need random access to the rows and columns of
>> data. This is no different, however, from the case where the entire FITS file
>> is compressed with gzip. In both cases, it is usually necessary to
>> uncompress the table before the application reads or writes data in the table.
>
> Quite true. There is a significant difference in convenience/usability
> however, in that everybody understands what a .fits.gz file is and how
> to uncompress it, whereas it will be much less obvious to people what
> a tile-compressed table is, and how to make sense of it. If the format
> becomes widely used this issue will be ameliorated, but that would probably
> take quite some time.
>
>> This can be done either by explicitly creating an uncompressed copy of the
>> FITS file (e.g., by using our fpack/funpack FITS file compression utility
>> programs) which is then processed by the application program, or by having the
>> FITS reader create an uncompressed virtual FITS file in memory, which is then
>> accessed by the application program on the fly. I'm planning to implement
>> this latter approach in the CFITSIO library, similar to what has already been
>> done to support the tiled-image compression format. Application programs that
>> use CFITSIO to access these compressed tables will be able to do so in exactly
>> the same way as for normal uncompressed tables; CFITSIO will transparently
>> uncompress the table when necessary, and if the application modifies the
>> table, then CFITSIO will automatically recompress it when the application is
>> finished.
>
> If I have time, and if this format looks like becoming widely used, I'd do
> something similar in STIL/TOPCAT. But in the case of large tables,
> it would still equate to a significantly longer processing time
> than being able to do direct random access on an existing disk file.
>
> My feeling is that, disk space being cheap, for most *user* contexts
> the compression levels achievable with tile-compressed FITS will not
> represent a good trade-off against the additional inconvenience of
> using them. I am happy to admit however that for archives the reverse
> may well be true.
>
>> Mark also expressed concerns about possible confusion between the compressed
>> and uncompressed versions of the same table, by humans or by software that is
>> unaware of this compression convention. It is true that the headers of the
> >> compressed and uncompressed tables look quite similar, because only the
> >> NAXIS2, PCOUNT, and TFORMn keyword values must necessarily differ. All the
>> other keywords can remain unchanged. I think this is largely a positive,
>> because readers of the compressed table header (whether human or software) can
>> quite easily understand the contents of the compressed table. I don't think
> >> there is any danger that unsuspecting software could mistakenly process the
>> compressed table and produce misleading scientific results, if for no other
>> reason than because the compressed table will only contain a single row of
>> data in most cases. Mark suggested inventing a new extension type (instead of
>> BINTABLE) for these compressed tables, but I don't think we want to encourage
>> a proliferation of new extension types simply because the contents of the
>> table are slightly different. In any case, section 3.4.2 of the FITS standard
>> says that only one extension format shall be approved for each type of data
>> organization.
>
> I do agree that this is not likely to lead to subtly inaccurate
> scientific results. I still think user confusion is quite likely,
> but admit that this is a less serious issue.
>
>> One possible improvement we could make is to add a few COMMENT keywords to the
>> header of the compressed table to tell readers that table columns have been
>> compressed, and include a link to further information about how to interpret
>> the contents.
>
> I think recommending this kind of additional annotation, along with
> some discussion in the document of the pros and cons of using this
> format in various contexts, would be an appropriate way to address
> my concerns.
>
> Best festive wishes,
>
> Mark
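The point made in the quoted exchange, that only the NAXIS2, PCOUNT, and TFORMn values need differ between the compressed and uncompressed headers, can be illustrated with a small comparison. This is a sketch only: `differing_keywords` is a hypothetical helper, headers are modelled as plain dicts, and every keyword value below is invented for the example rather than taken from a real file or from the proposal document.

```python
def differing_keywords(header_a, header_b):
    """Return the sorted keywords whose values differ between two headers.

    A keyword present in only one header counts as differing.
    """
    all_keys = set(header_a) | set(header_b)
    return sorted(k for k in all_keys if header_a.get(k) != header_b.get(k))


# Invented example values; only the pattern of differences matters.
uncompressed = {
    "XTENSION": "BINTABLE",
    "NAXIS2": 1000,      # one row per table entry
    "PCOUNT": 0,         # no heap data
    "TFIELDS": 1,
    "TTYPE1": "FLUX",
    "TFORM1": "1E",
}
compressed = {
    "XTENSION": "BINTABLE",
    "NAXIS2": 1,         # a single row, as described in the thread
    "PCOUNT": 2048,      # compressed bytes live in the heap (made-up size)
    "TFIELDS": 1,
    "TTYPE1": "FLUX",
    "TFORM1": "1PB",     # variable-length byte array (illustrative)
}

print(differing_keywords(uncompressed, compressed))
```

Because everything else matches, a person or program scanning the header has little to signal that the contents are compressed, which is the case for the extra COMMENT keywords suggested above.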