[fitsbits] Potential new compression method for FITS tables
Mark Taylor
m.b.taylor at bristol.ac.uk
Thu Dec 16 12:15:25 EST 2010
On Thu, 16 Dec 2010, Rob Seaman wrote:
> > Although it doesn't say so explicitly, I presume since there's no
> > indication otherwise that tables encoded in the way described by this
> > document are still XTENSION = 'BINTABLE'.
>
> Yes.
>
> > Although a table encoded according to this convention
> > is syntactically a correct BINTABLE, if interpreted as a normal BINTABLE,
> > the contents will be garbage.
>
> This is true for the tiled-image convention, too. Bill can likely do the best job of discussing the trade-offs.
I think in practice the problems will be more serious in this case,
since the uncompressed and compressed HDUs have the same extension types.
If you pass a tile-compressed image (primary or IMAGE) HDU to an
application which is expecting an image, then if it doesn't understand
tile-compressed images, it will see a BINTABLE HDU and know it can't use it.
For tile-compressed tables on the other hand the uncompressed and
compressed HDUs have the same XTENSION type (a tile-compressed BINTABLE
is simply a BINTABLE with a few unfamiliar headers), so a
non-tile-compression-aware application has no clue that it's getting
something other than a normal table and is likely to attempt to process
it as normal, with results that are going to be useless and confusing,
if not worse.
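The asymmetry can be shown with a toy Python sketch of how software that knows nothing about tile compression usually classifies an HDU: by XTENSION alone. The headers are minimal dicts rather than real FITS parsing. ZIMAGE = T is the tiled-image convention's marker. Using ZTABLE as the marker for a compressed table is an assumption made here for illustration.

```python
def naive_kind(header):
    """Classify an HDU the way tile-compression-unaware software would:
    by XTENSION alone (a primary HDU has no XTENSION card)."""
    xtension = header.get("XTENSION", "PRIMARY")
    if xtension in ("PRIMARY", "IMAGE"):
        return "image"
    if xtension == "BINTABLE":
        return "table"
    return "unknown"


def true_kind(header):
    """Classify an HDU taking tile-compression marker keywords into account."""
    if header.get("ZIMAGE") is True:
        return "image"               # tile-compressed image stored as a BINTABLE
    if header.get("ZTABLE") is True:  # ASSUMED marker keyword for compressed tables
        return "compressed table"
    return naive_kind(header)


# Toy headers: only the cards relevant to the classification are shown.
compressed_image = {"XTENSION": "BINTABLE", "ZIMAGE": True}
compressed_table = {"XTENSION": "BINTABLE", "ZTABLE": True}

for name, hdr in [("compressed image", compressed_image),
                  ("compressed table", compressed_table)]:
    print(f"{name}: naive={naive_kind(hdr)}, actual={true_kind(hdr)}")
```

An application that wants an image gets "table" for the compressed image and refuses it. An application that wants a table also gets "table" for the compressed table, so it goes ahead. That second case is the one at issue.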
> > For instance the TDIMn header,
> > whose content is not changed under the proposed convention, will no
> > longer contain the shape of elements in the column, and TUNITn will no
> > longer contain its units.
>
> These are interpreted as applying to the decompressed elements (and have little utility for the compressed vectors).
Yes, that's how you have to interpret them if you understand the tiled-table
convention. But according to sec 7.3 of the FITS standard, that's
not how they are to be interpreted.
> > For this reason it seems to me that if the
> > proposal is to be adopted, it ought to propose a new XTENSION type for
> > tile-compressed tables, so that unaware software realises that it doesn't
> > know how to interpret such HDUs.
>
> I think there is a general concern about multiplying the number of XTENSION types. This would also guarantee that such software can't make heads or tails of any of these HDUs.
I sense I'm not going to win this argument, but I'd argue that a
mechanism that ensures a software item will fail to interpret an
HDU at all is better than one which allows it to make an interpretation
which is bound to be badly wrong.
> > I also have a concern that these tables are harder to use than
> > existing non-compressed BINTABLEs. There are two aspects to this.
> > Most obviously, tool/library authors who wish to support such files
> > will need to write additional code for uncompression and/or
> > compression.
>
> Yes. These files are the FITS equivalent of columnar database technology, which faces similar advantages and disadvantages.
Not really - you can have uncompressed columnar databases which are
amenable to random access. If you compress a columnar (or indeed
row-based) DB then you see these issues. But compressibility is not
the only reason to use a column-based arrangement (see e.g.
2008ASPC..394..422T).
> > Secondly, tables which have been compressed in
> > this way are unsuitable for random access, since unlike for a
> > normal BINTABLE, it's not possible to calculate the HDU offset
> > of a given row/column cell.
>
> Rather, the tiling works here as it does for random access to tiled images. The software can calculate the offset to the row containing the tile.
Yes. For data access patterns which require long runs of sequential
access, that would help a lot. For data access patterns which are very
scattered, it wouldn't.
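Here is a Python sketch of the contrast. In an uncompressed BINTABLE, the byte offset of any cell follows from NAXIS1 and the TFORMn widths alone. For the tiled case, the sketch simplifies: it assumes each tile holds a fixed number of consecutive rows (the hypothetical parameter tile_len) and that a tile must be decompressed whole. That is not necessarily the proposal's exact layout. Function names and sizes are illustrative. The offset ignores the header and the heap.

```python
import random
import re

# Bytes per element for BINTABLE TFORMn type codes (FITS standard, sec 7.3).
# 'X' bit arrays are handled separately; 'P'/'Q' are heap array descriptors.
TYPE_WIDTH = {"L": 1, "B": 1, "I": 2, "J": 4, "K": 8, "A": 1,
              "E": 4, "D": 8, "C": 8, "M": 16, "P": 8, "Q": 16}
TFORM_RE = re.compile(r"^\s*(\d*)([LXBIJKAEDCMPQ])")


def field_width(tform):
    """Byte width of one field in a row, given its TFORMn value."""
    m = TFORM_RE.match(tform)
    if m is None:
        raise ValueError(f"unrecognised TFORM: {tform!r}")
    repeat = int(m.group(1)) if m.group(1) else 1
    code = m.group(2)
    if code == "X":
        return (repeat + 7) // 8          # bits are packed into whole bytes
    if code in ("P", "Q"):
        return TYPE_WIDTH[code]           # descriptor only; data is in the heap
    return repeat * TYPE_WIDTH[code]


def cell_offset(tforms, naxis1, row, col):
    """Uncompressed BINTABLE: byte offset of cell (row, col), 0-based, from
    the start of the data unit. Rows are NAXIS1 bytes and fields are stored
    in column order, so no data has to be read to locate the cell."""
    return row * naxis1 + sum(field_width(t) for t in tforms[:col])


def rows_decompressed(rows, tile_len, nrows):
    """Tile-compressed table (simplified, hypothetical layout): each tile holds
    tile_len consecutive rows and is decompressed whole, so reading one row
    costs its entire tile. Returns total rows decompressed for the request."""
    tiles = {r // tile_len for r in rows}
    return sum(min(tile_len, nrows - t * tile_len) for t in tiles)


tforms = ["1J", "3E", "16A", "1D"]              # toy table: 4+12+16+8 bytes
naxis1 = sum(field_width(t) for t in tforms)    # 40
print(cell_offset(tforms, naxis1, row=1000, col=3))   # 40032

nrows, tile_len = 1_000_000, 10_000             # hypothetical sizes
sequential = range(250_000, 260_000)            # 10,000 consecutive rows
scattered = random.Random(42).sample(range(nrows), 100)  # 100 scattered rows
print(rows_decompressed(sequential, tile_len, nrows))    # 10000 (one tile)
print(rows_decompressed(scattered, tile_len, nrows))     # far more than 100
```

A contiguous run of 10,000 rows costs one tile. By contrast, 100 scattered single-row reads can each pull in a different tile. The decompression work then far exceeds the data actually wanted.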
Mark
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/