[fitsbits] Fastest way to read a binary table

Clive Page cgp at nospam.le.ac.uk
Fri Jul 12 04:55:23 EDT 2002


In article <mailman.1026326160.14706.fitsbits at listmgr.cv.nrao.edu>,
Phil Hodge  <hodge at stsci.edu> wrote:

>Yes, the STSDAS table format can be either row-ordered or column-ordered,
>but the column-ordered option was a mistake, in my opinion.  It's fine
>if you only read/write column-by-column, but many of the generic tools
>operate on tables row-by-row, e.g. selecting rows, or sorting when you
>get to the point of physically rearranging the data, or writing to a FITS
>file.  We found that these operations were extremely slow on large,
>column-ordered tables.  Suppose you have a table with five columns (for
>example) and many rows.  If the table is row ordered, reading it column-
>by-column requires five passes through the file, which might not be too
>much overhead.  If the table is column ordered, however, reading it row-
>by-row requires almost as many passes through the file as there are rows,
>and that was painful.

I think that depends on how clever the software is.  A case that is quite
common in my experience is having a FITS binary table with a lot of columns
(sometimes more than 100) but for a given application you only want data
from 2 or 3 of them.  If the table is in row order, you have no option but
to read the entire file from start to finish; if the table is in column
order you _should_ be able to read just the few columns of interest, which
amounts to just a few percent of the total number of disc blocks.
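To put rough numbers on that, here is a minimal sketch that counts the disc blocks touched in each layout.  The function name, the 4096-byte block size, and the example table (100 four-byte columns, a million rows) are all assumptions chosen for illustration, and the column-ordered layout is a hypothetical one, not the standard FITS binary table.

```python
import math

def blocks_touched(n_rows, col_widths, wanted, block_size=4096):
    """Return (row_ordered, column_ordered) counts of disc blocks read.

    col_widths: width in bytes of each column (illustrative values).
    wanted:     set of zero-based indices of the columns actually needed.
    Assumes a row is narrower than one block, so in row order the wanted
    fields are spread through every block of the data area.
    """
    total_bytes = n_rows * sum(col_widths)
    row_ordered = math.ceil(total_bytes / block_size)

    # In column order each column is one contiguous run of n_rows * width
    # bytes, so only the blocks overlapping the wanted runs are read.
    touched = set()
    offset = 0
    for index, width in enumerate(col_widths):
        length = n_rows * width
        if index in wanted and length > 0:
            first = offset // block_size
            last = (offset + length - 1) // block_size
            touched.update(range(first, last + 1))
        offset += length
    return row_ordered, len(touched)

if __name__ == "__main__":
    row_blocks, col_blocks = blocks_touched(1_000_000, [4] * 100, {0, 41, 99})
    print(row_blocks, col_blocks, f"{100 * col_blocks / row_blocks:.1f}%")
```

For that example, three columns out of a hundred come to about 3% of the blocks, which is the "few percent" referred to above.  The sketch ignores read-ahead and caching, which will change the timings but not the block counts.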

Of course, there are other cases in which you want only a few rows but all
the columns; here row-major order is obviously more efficient, provided you
have an index or some other way of locating just the rows you want.  That
proviso is rather an important one, whereas with a column-ordered table you
always have an efficient way of locating the offset to the column you want,
via the table headers.
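As a sketch of what locating a column via the headers might look like: assuming a hypothetical column-ordered data area (standard FITS binary tables are row-ordered) in which each column is stored as one contiguous big-endian run, the offset is just the row count times the summed widths of the preceding columns.  The column names, the COLUMNS list, and the helper functions below are all made up for illustration; the format codes are those of Python's struct module, not FITS TFORM values.

```python
import io
import struct

# Hypothetical header information: (column name, struct format code).
COLUMNS = [("ra", "d"), ("dec", "d"), ("flag", "h"), ("counts", "i")]

def column_offset(columns, n_rows, name):
    """Return (byte offset, format code) of a column in the data area."""
    offset = 0
    for col_name, fmt in columns:
        if col_name == name:
            return offset, fmt
        # Skip over the whole contiguous run of the preceding column.
        offset += n_rows * struct.calcsize(">" + fmt)
    raise KeyError(name)

def read_column(stream, columns, n_rows, name):
    """Seek straight to one column and read only its bytes."""
    offset, fmt = column_offset(columns, n_rows, name)
    stream.seek(offset)
    data = stream.read(n_rows * struct.calcsize(">" + fmt))
    return list(struct.unpack(f">{n_rows}{fmt}", data))

def write_column_ordered(columns, rows):
    """Pack a list of row tuples into column-ordered bytes (for the demo)."""
    n_rows = len(rows)
    return b"".join(
        struct.pack(f">{n_rows}{fmt}", *(row[i] for row in rows))
        for i, (_, fmt) in enumerate(columns)
    )

if __name__ == "__main__":
    rows = [(10.5, -3.25, 1, 100), (11.5, 4.0, 0, 200), (12.5, 5.5, 1, 300)]
    stream = io.BytesIO(write_column_ordered(COLUMNS, rows))
    print(read_column(stream, COLUMNS, len(rows), "counts"))
```

No index is needed here: one seek and one read fetch the column, and the other columns are never touched.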

By the way, I agree with Steve Allen's point that we can't expect CFITSIO
to emulate an RDBMS, but our FITS tables are getting larger, and we do need
to think about making access more efficient.


-- 
Clive Page   cgp at le.ac.uk


