[fitsbits] Fastest way to read a binary table

William Pence pence at tetra.gsfc.nasa.gov
Wed Jul 10 14:26:02 EDT 2002


This discussion of CFITSIO's internal memory management is perhaps of little
interest to most readers of this group, but here are some details for those
who are interested.  The CFITSIO User's Guide also discusses this issue, as
well as other optimization strategies (available online at
http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c_user/node112.html).

CFITSIO generally reads a FITS file in chunks of 2880 bytes at a time; these
chunks are first copied into a set of temporary buffers in CFITSIO.  Earlier
versions of CFITSIO allocated 25 such buffers (72000 bytes total) and more
recently this has been increased to 40 buffers (115200 bytes). These buffers
are shared among all the opened FITS files.  For example, when a program
reads a header keyword, CFITSIO reads the relevant 2880-byte block from the
FITS file into a buffer, then extracts the actual keyword value from the
buffer and passes that to the application.  If the application then reads
another keyword, CFITSIO first checks if the needed FITS block is already
loaded in a buffer. 
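
To make the lookup concrete, here is a toy Python model of a shared pool of
2880-byte block buffers.  It is only a sketch of the idea described above and
not CFITSIO's actual code: the class name, the method name, and the
"evict the oldest block" policy are all assumptions made for illustration.

```python
BLOCK_SIZE = 2880  # bytes in one FITS block


class BlockCache:
    """Toy model of a fixed pool of block buffers (not CFITSIO's real code)."""

    def __init__(self, nbuffers):
        self.nbuffers = nbuffers
        self.blocks = []      # block numbers currently held, oldest first
        self.disk_reads = 0   # number of simulated reads from the file

    def read_byte_offset(self, offset):
        """Return the block number holding `offset`, loading it if needed."""
        block = offset // BLOCK_SIZE
        if block in self.blocks:          # linear search through the buffers
            return block
        if len(self.blocks) == self.nbuffers:
            self.blocks.pop(0)            # assumed policy: evict the oldest
        self.blocks.append(block)
        self.disk_reads += 1
        return block


cache = BlockCache(nbuffers=40)
cache.read_byte_offset(80)     # first keyword: block 0 is read from disk
cache.read_byte_offset(160)    # second keyword: block 0 is already buffered
cache.read_byte_offset(2880)   # first byte of block 1: a new disk read
print(cache.disk_reads)        # 2
print(25 * BLOCK_SIZE, 40 * BLOCK_SIZE)  # 72000 115200
```

The second keyword read costs no extra disk access because its block is
already in the pool, and the last line reproduces the buffer totals quoted
above.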

Increasing the number of buffers allocated by CFITSIO (which is controlled
by the NIOBUF parameter in the fitsio2.h file) may improve the I/O speed of
some programs, especially if they read information scattered throughout the
file in more or less random order.  If, however, a program reads through a
file in close to sequential order, then increasing the number of buffers
will not help.  In fact, having too many buffers will hurt performance
because CFITSIO then spends more time searching to see if the needed FITS
block is already loaded in one of the buffers.  Recent tests on a 'typical'
program seemed to show a slight improvement in overall I/O speed by
increasing the number of buffers to a couple hundred, but increasing it to
several thousand caused the program to run slower.
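
The trade-off can be illustrated with a small simulation.  This is a rough
sketch under stated assumptions, not a measurement of CFITSIO: it assumes a
plain linear search over the buffers and an "evict the oldest" policy, and
the function name and access patterns are made up for the example.

```python
import random


def simulate(block_sequence, nbuffers):
    """Return (disk_reads, comparisons) for a toy pool of block buffers."""
    buffers = []
    disk_reads = 0
    comparisons = 0
    for block in block_sequence:
        found = False
        for held in buffers:              # linear search, one comparison each
            comparisons += 1
            if held == block:
                found = True
                break
        if not found:
            disk_reads += 1
            if len(buffers) == nbuffers:
                buffers.pop(0)            # assumed policy: evict the oldest
            buffers.append(block)
    return disk_reads, comparisons


nblocks = 500
sequential = list(range(nblocks))         # read each block once, in order
rng = random.Random(0)
scattered = [rng.randrange(nblocks) for _ in range(5000)]

for nbuf in (40, 200):
    print(nbuf, simulate(sequential, nbuf), simulate(scattered, nbuf))
```

In this model the sequential pass needs 500 disk reads no matter how many
buffers there are, while the number of comparisons grows from 19180 with 40
buffers to 79900 with 200.  Only the scattered pattern sees fewer disk reads
from the larger pool.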

Finally, there is one important exception to the above behavior:  If an
application asks CFITSIO to read (or write) a large contiguous block of data
from a FITS file (as when reading a whole FITS image at once) then CFITSIO
will bypass the temporary buffers and will read the data directly from the
FITS file into the array that is provided by the application program (doing
byte swapping if necessary).  This works well for reading or writing images,
and transfer rates of more than 50 MBytes/s have been measured on some new
machines with very fast disks.  This doesn't generally help when reading
tables, however, because the elements in a column are usually not contiguous
in the FITS file.  As Clive Page pointed out in a subsequent message,
writing FITS tables in 'column-major' order, so that there is only a single
row in the table and each column is a (fixed length) vector, would enable
CFITSIO to read and write large tables much faster (perhaps by a factor of 2
or 3).  CFITSIO can handle vector columns with up to 2**31 (about 2 x 10**9)
elements.  The main problem with tables in this format is that it is not very
convenient if the number of elements in the vector needs to be modified, or
if the final vector length is not known at the time the table is first
created. 
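
The difference between the two layouts comes down to where the bytes of one
column sit in the file.  The sketch below only does the offset arithmetic;
the row width, column offset, and element count are hypothetical values
chosen for illustration, and nothing here calls CFITSIO.

```python
def contiguous_runs(offsets, width):
    """Count the separate contiguous byte ranges covered by the elements."""
    runs = 0
    previous_end = None
    for start in offsets:
        if start != previous_end:         # gap before this element: new run
            runs += 1
        previous_end = start + width
    return runs


nelem = 1000          # hypothetical number of values in the column
width = 4             # bytes per value, e.g. a 32-bit integer
row_bytes = 24        # hypothetical NAXIS1 of the row-oriented table
col_offset = 8        # hypothetical byte offset of the column within a row

# Conventional layout: one value per row, so values are row_bytes apart.
row_major = [col_offset + r * row_bytes for r in range(nelem)]
# Single-row layout: the column is one vector, so its values are adjacent.
column_major = [col_offset + i * width for i in range(nelem)]

print(contiguous_runs(row_major, width))     # 1000
print(contiguous_runs(column_major, width))  # 1
```

With one value per row the column is spread over 1000 separate pieces,
whereas the single-row vector column is one contiguous range that could be
transferred in a single large read.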

-Bill Pence
-- 
____________________________________________________________________
Dr. William Pence                          pence at tetra.gsfc.nasa.gov
NASA/GSFC Code 662         HEASARC         +1-301-286-4599 (voice)     
Greenbelt MD 20771                         +1-301-286-1684 (fax)


Clive Page wrote:
> 
> In article <mailman.1026146161.6715.fitsbits at listmgr.cv.nrao.edu>,
> William Pence  <pence at tetra.gsfc.nasa.gov> wrote:
> 
> >The number of rows to read in each iteration, M, can be determined
> >by calling the 'fits_get_rowsize' routine in CFITSIO.  The value of M is not
> >very critical; as a rough rule of thumb you could try using M = 60000 /
> >NAXIS1 where 'NAXIS1' is the width of one row of your table in bytes.
> 
> Erm, 60000 seems quite a small buffer size nowadays, given that it's hard
> to find a computer around here with less than 256 megabytes.  So a couple
> of questions, if I may:
> 
> How did you choose that value?
> 
> Is it possible to recompile CFITSIO with a bigger buffer?
> 
> Would that make it more efficient at handling large binary tables?
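
For reference, the rule of thumb quoted above, M = 60000 / NAXIS1, works out
as follows.  This is only the arithmetic, written as a sketch: the function
names are made up for the example, the 1-based row numbering is an
assumption, and the code does not reproduce what 'fits_get_rowsize' computes.

```python
def rows_per_iteration(naxis1, buffer_bytes=60000):
    """Rule-of-thumb M = buffer_bytes / NAXIS1, rounded down, at least 1."""
    return max(1, buffer_bytes // naxis1)


def row_ranges(nrows, naxis1):
    """Yield (first_row, count) pairs covering rows 1..nrows in chunks of M."""
    m = rows_per_iteration(naxis1)
    first = 1                              # assumed 1-based row numbering
    while first <= nrows:
        count = min(m, nrows - first + 1)
        yield first, count
        first += count


print(rows_per_iteration(24))              # 2500
print(list(row_ranges(6000, 24)))
# [(1, 2500), (2501, 2500), (5001, 1000)]
```

A table with 24-byte rows would therefore be read about 2500 rows at a time,
with a shorter final chunk for whatever rows remain.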


