[fitsbits] Potential new compression method for FITS tables

Mark Taylor m.b.taylor at bristol.ac.uk
Thu Dec 16 12:15:25 EST 2010


On Thu, 16 Dec 2010, Rob Seaman wrote:

> > Although it doesn't say so explicitly, I presume since there's no
> > indication otherwise that tables encoded in the way described by this
> > document are still XTENSION = 'BINTABLE'.
> 
> Yes.
> 
> > Although a table encoded according to this convention
> > is syntactically a correct BINTABLE, if interpreted as a normal BINTABLE,
> > the contents will be garbage.
> 
> This is true for the tiled-image convention, too.  Bill can likely do the best job of discussing the trade-offs.

I think in practice the problems will be more serious for tables than for
images, since here the uncompressed and compressed HDUs have the same extension type.
If you pass a tile-compressed image (primary or IMAGE) HDU to an 
application which is expecting an image, then if it doesn't understand 
tile-compressed images, it will see a BINTABLE HDU and know it can't use it.
For tile-compressed tables on the other hand the uncompressed and 
compressed HDUs have the same XTENSION type (a tile-compressed BINTABLE
is simply a BINTABLE with a few unfamiliar headers), so a
non-tile-compression-aware application has no clue that it's getting
something other than a normal table and is likely to attempt to process 
it as normal, with results that are going to be useless and confusing,
if not worse.
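
A minimal Python sketch of the difference follows. It assumes a deliberately simplified header parser (no CONTINUE or HIERARCH cards, no '/' inside string values). ZIMAGE = T is the keyword the tiled-image convention uses to flag a compressed image; using ZTABLE = T as the marker for a compressed table is an assumption about the proposal's keywords. A reader that checks only XTENSION cannot tell the plain and compressed tables apart, whereas an image-expecting reader does reject the compressed image.

```python
def parse_cards(header_text):
    """Split a FITS header into {keyword: value} using 80-character cards.

    Simplified sketch: ignores CONTINUE/HIERARCH and assumes no '/' in strings.
    """
    cards = {}
    for i in range(0, len(header_text), 80):
        card = header_text[i:i + 80]
        key = card[:8].strip()
        if key == "END":
            break
        if card[8:10] != "= ":
            continue  # commentary card, no value
        value = card[10:].split("/", 1)[0].strip()
        if value.startswith("'"):
            value = value.strip("'").rstrip()
        cards[key] = value
    return cards

def make_header(*pairs):
    """Build a header string from (keyword, value) pairs, terminated by END."""
    return "".join(f"{k:<8}= {v}".ljust(80) for k, v in pairs) + "END".ljust(80)

def naive_kind(cards):
    """What a reader that only looks at XTENSION concludes."""
    return cards.get("XTENSION")

def aware_kind(cards):
    """What a reader that knows the compression conventions concludes."""
    if cards.get("XTENSION") != "BINTABLE":
        return cards.get("XTENSION")
    if cards.get("ZIMAGE") == "T":
        return "tile-compressed image"
    if cards.get("ZTABLE") == "T":  # assumed marker keyword for compressed tables
        return "tile-compressed table"
    return "BINTABLE"

plain_table = make_header(("XTENSION", "'BINTABLE'"), ("TFIELDS", "1"))
compressed_table = make_header(("XTENSION", "'BINTABLE'"), ("TFIELDS", "1"),
                               ("ZTABLE", "T"))
compressed_image = make_header(("XTENSION", "'BINTABLE'"), ("ZIMAGE", "T"))
```

Here naive_kind returns 'BINTABLE' for both plain_table and compressed_table, while for compressed_image it returns 'BINTABLE' rather than 'IMAGE', so an image reader knows to give up.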

> > For instance the TDIMn header,
> > whose content is not changed under the proposed convention, will no
> > longer contain the shape of elements in the column, and TUNITn will no
> > longer contain its units.
> 
> These are interpreted as applying to the decompressed elements (and have little utility for the compressed vectors).

Yes, that's how you have to interpret them if you understand the tiled-table
convention.  But according to Section 7.3 of the FITS standard, they describe
the columns as actually stored in the table, so that is not how a
standard-conforming reader will interpret them.
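
To make the TDIMn point concrete, here is a minimal Python sketch. It is simplified and hypothetical: it ignores the heap and array-descriptor machinery, ignores the fact that a compressed cell would hold a whole tile of the column rather than one row, and uses zlib only as a stand-in for whichever compression algorithm is chosen. A reader that trusts TDIMn = '(4,3)' gets the original 12 floats from an uncompressed cell, but 12 meaningless bytes of compressed stream from a compressed one, with no error raised.

```python
import math
import struct
import zlib

def tdim_shape(tdim):
    """Parse a TDIMn value such as '(4,3)' into a tuple of axis lengths."""
    return tuple(int(x) for x in tdim.strip().strip("()").split(","))

def naive_read(cell, fmt, tdim):
    """Decode a cell the way a convention-unaware reader would: trust TDIMn.

    fmt is the struct code for the element type implied by TFORMn
    ('f' for E, 'B' for B); FITS data are big-endian.
    """
    n = math.prod(tdim_shape(tdim))
    size = struct.calcsize(f">{n}{fmt}")
    if len(cell) < size:
        raise ValueError("cell too short for the TDIMn shape")
    return list(struct.unpack(f">{n}{fmt}", cell[:size]))

values = [float(i) for i in range(12)]
raw = struct.pack(">12f", *values)  # uncompressed cell: TFORMn = '12E', TDIMn = '(4,3)'
packed = zlib.compress(raw)         # hypothetical compressed cell, TDIMn left unchanged

plain_view = naive_read(raw, "f", "(4,3)")       # the original 12 floats
garbled_view = naive_read(packed, "B", "(4,3)")  # first 12 bytes of the zlib stream
```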

> > For this reason it seems to me that if the
> > proposal is to be adopted, it ought to propose a new XTENSION type for
> > tile-compressed tables, so that unaware software realises that it doesn't
> > know how to interpret such HDUs.
> 
> I think there is a general concern about multiplying the number of XTENSION types.  This would also guarantee that such software can't make heads or tails of any sort of the HDUs.

I sense I'm not going to win this argument, but I'd argue that a
mechanism that ensures a piece of software will fail to interpret an
HDU at all is better than one which allows it to make an interpretation
which is bound to be badly wrong.

> > I also have a concern that these tables are harder to use than
> > existing non-compressed BINTABLEs.  There are two aspects to this.
> > Most obviously, tool/library authors who wish to support such files 
> > will need to write additional code for uncompression and/or
> > compression.
> 
> Yes.  These files are the FITS equivalent of columnar database technology, which faces similar advantages and disadvantages.

Not really - you can have uncompressed columnar databases which are 
amenable to random access.  If you compress a columnar (or indeed 
row-based) DB then you see these issues.  But compressibility is not
the only reason to use a column-based arrangement (see e.g.
2008ASPC..394..422T).
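
As a minimal illustration that column-oriented storage on its own does not prevent random access, here is a Python sketch of a hypothetical uncompressed columnar layout (not a FITS format): each column's fixed-width values are stored back to back, so the offset of any cell is still a simple calculation. The names column_starts and columnar_cell_offset are made up for this example.

```python
import struct

def column_starts(widths, nrows):
    """Byte offset where each column begins when columns are stored back to back."""
    starts, pos = [], 0
    for w in widths:
        starts.append(pos)
        pos += w * nrows
    return starts

def columnar_cell_offset(widths, nrows, row, col):
    """Offset of cell (row, col) in an uncompressed column-oriented layout."""
    return column_starts(widths, nrows)[col] + row * widths[col]

# Example: 5 rows of a 4-byte int column followed by an 8-byte double column.
nrows = 5
ids = list(range(100, 105))
fluxes = [0.25 * i for i in range(nrows)]
data = struct.pack(f">{nrows}i", *ids) + struct.pack(f">{nrows}d", *fluxes)

flux_3 = struct.unpack_from(">d", data, columnar_cell_offset([4, 8], nrows, 3, 1))[0]
id_2 = struct.unpack_from(">i", data, columnar_cell_offset([4, 8], nrows, 2, 0))[0]
```

Compressing each column's bytes would typically shrink it, but then the offset formula above no longer holds, which is where the random-access issue comes from.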

> > Secondly, tables which have been compressed in
> > this way are unsuitable for random access, since unlike for a
> > normal BINTABLE, it's not possible to calculate the HDU offset
> > of a given row/column cell.
> 
> Rather the tiling works here as with random access for tiled-images.  The software can calculate the offset to the row containing the tile.

Yes.  For data access patterns which require large runs of sequential
access, that would help a lot.  For data access patterns which are very
scattered, it wouldn't.
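
A minimal Python sketch of the two cases, under stated assumptions. For a normal BINTABLE, the offset of a cell from the start of the data unit is row * NAXIS1 plus the widths of the preceding columns, derived from TFORMn (X bit arrays are omitted here). For the tiled case it assumes, as described above, that row t of the compressed table holds tile t, and that a fixed number of rows (the hypothetical parameter tile_len) is compressed together, so reading any row means decompressing its whole tile. Sequential reads cost one decompressed row per row wanted; scattered reads can cost a whole tile per row.

```python
import re
import struct

# Byte widths of the TFORMn type codes handled here (P/Q are heap descriptors).
TFORM_BYTES = {"L": 1, "B": 1, "I": 2, "J": 4, "K": 8, "A": 1,
               "E": 4, "D": 8, "C": 8, "M": 16, "P": 8, "Q": 16}

def field_width(tform):
    """Bytes occupied in each row by a column with the given TFORMn value."""
    m = re.match(r"\s*(\d*)([LBIJKAEDCMPQ])", tform)
    if m is None:
        raise ValueError(f"unsupported TFORM {tform!r}")
    repeat = int(m.group(1)) if m.group(1) else 1
    return repeat * TFORM_BYTES[m.group(2)]

def cell_offset(tforms, row, col):
    """Offset of cell (row, col) from the start of an uncompressed BINTABLE data unit."""
    widths = [field_width(t) for t in tforms]
    naxis1 = sum(widths)  # row length in bytes
    return row * naxis1 + sum(widths[:col])

def tile_for_row(row, tile_len):
    """(tile index, row within tile) when tile_len rows are compressed together."""
    return divmod(row, tile_len)

def rows_decompressed(rows, tile_len, nrows):
    """Rows that must be decompressed to read `rows`, decompressing each tile once."""
    tiles = {r // tile_len for r in rows}
    return sum(min(tile_len, nrows - t * tile_len) for t in tiles)

# Hypothetical 3-column table: TFORMn = 1J, 3E, 1D, so NAXIS1 = 24 bytes.
tforms = ["1J", "3E", "1D"]
table = b"".join(struct.pack(">i3fd", r, 0.0, 1.0, 2.0, r * 0.5) for r in range(5))
cell = struct.unpack_from(">d", table, cell_offset(tforms, 2, 2))[0]

sequential = rows_decompressed(range(200_000, 210_000), 10_000, 1_000_000)
scattered = rows_decompressed(range(0, 1_000_000, 100_000), 10_000, 1_000_000)
```

With these numbers, reading 10,000 consecutive rows decompresses 10,000 rows, while reading 10 widely spaced rows decompresses 100,000.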

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
