[fitsbits] Interpretation of repeat count in binary tables

Tue Apr 1 11:39:55 EDT 2008

Paddy Leahy wrote:
> On Mon, 31 Mar 2008, William Pence wrote:
>> In the example you cite below, the binary table contains 12288 rows of data, 
>> and each row contains 3 vectors, containing 1024 single precision floating 
>> point numbers each.  There is no explicit relationship between the vectors 
>> in different rows of the table.
> 
> "No explicit relationship" sounds worrying. Does this mean that different 
> FITS readers can legitimately interpret the same table in different ways?
> Or are there some keywords that clarify the relationship between rows?

The mandatory keywords in this case simply define a FITS table that 
consists of a sequence of 1024 element vectors.  There is no explicitly 
defined relationship between the vectors on different rows. Implicitly, 
however, the fact that all these vectors have been grouped together into 
a single table suggests that they are probably related in some way. 
There could be other mission-specific keywords in the header that 
explain the relationship.

> <snip>
>> The TDIMn keyword, if present in this case, would specify the dimensionality 
>> of each individual 1024 element vector in that column. For example, TDIM1 = 
>> '(16,64)', would mean that each vector in column 1 should be interpreted as a 
>> 16 x 64 2D array.  If there is no TDIMn keyword, then the vector would be 
>> interpreted as a 1024 element 1-D array.
> 
> So, let me be clear: a fits reader is asked to return the data from column 
> 2 of a table which has N_row rows and a repeat count of R for column 2. 
> This should definitely be returned as a 2-D array of (R, N_row) elements, 
> even if TDIM2 is not present? (Or maybe, given your "no explicit 
> relationship" comment, as something looser, like a linked list of arrays 
> of length R?).

It mainly depends on what the application program itself wants to read 
from the table.  Some applications might read the vectors sequentially 
one row at a time in a loop; other applications might ask for all the 
vectors at once, as a big array of length = (column vector count) * 
(NAXIS2 rows). The application could optionally interpret this as a 2D 
array if it desires, but the FITS keywords do not mandate this.
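
To make the options above concrete, here is a minimal NumPy sketch. It 
does not read a real FITS file: the small sizes, the variable names, and 
the made-up column values are assumptions for illustration only. It shows 
the same column data viewed row by row, as one long vector, as an 
optional 2-D array, and with a hypothetical TDIMn applied to one vector.

```python
import numpy as np

# Small stand-in sizes (assumed for illustration; the table discussed in
# this thread has R = 1024 and N_row = 12288).
R, N_ROW = 8, 5

# Pretend these are one column's values in storage order: the vector from
# row 0, then the vector from row 1, and so on.
flat = np.arange(R * N_ROW, dtype=np.float32)

# 1) Row-at-a-time access: one R-element vector per table row.
rows = [flat[i * R:(i + 1) * R] for i in range(N_ROW)]

# 2) Everything at once, as a single long vector of R * N_row elements.
long_vector = flat

# 3) Optional 2-D view. In C (row-major) order the shape is (N_row, R);
#    a Fortran-order reader would describe the same data as (R, N_row).
as_2d = flat.reshape(N_ROW, R)

# A hypothetical TDIMn = '(4,2)' turns each 8-element vector into a 4 x 2
# array. FITS lists the fastest-varying axis first, so the equivalent
# C-order shape is reversed: (2, 4).
tdim = (4, 2)
cell = rows[0].reshape(tdim[::-1])
```

All three views share the same underlying values; only the shape the 
application chooses to impose differs, which is the sense in which the 
keywords do not mandate a 2-D interpretation.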

> The point being that, I guess because some fits readers are highly 
> inefficient at reading tables with short rows, it has become common 
> practice at least in the CMB field to use a repeat count just as a way of 
> packing data in a way that can be read efficiently. The writer of my 
> example intended each column to be read as 1 long vector of R*N_row 
> elements. Is that definitely contrary to the standard?

No, it is not contrary to the standard. Projects are free to invent their 
own local conventions for storing the data in whatever structure is most 
convenient for them.  If, however, the intention is to pack three 1024 x 
12288 element 2-D arrays into a binary table, then a more explicit way 
to do this would be as follows:

NAXIS1  =            150994944 / width of table (1024*12288*3*4)
NAXIS2  =                    1 / table has only 1 row
...
TFIELDS =                    3
TFORM1  = '12582912E'
TDIM1   = '(1024,12288)'
TFORM2  = '12582912E'
TDIM2   = '(1024,12288)'
TFORM3  = '12582912E'
TDIM3   = '(1024,12288)'

In this structure, all the elements for each array are contiguous in the 
table, which would make it more efficient when reading the whole
2-D array at once.
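
The arithmetic behind these keyword values, and the contiguity argument, 
can be checked with a short Python sketch. The helper name `layout` and 
its dictionary keys are made up for this illustration. The row width of 
12288 bytes for the original table is derived from the description 
earlier in the thread (3 vectors of 1024 single-precision values per 
row), not quoted from its header.

```python
# Sizes taken from the thread; an 'E' element is 4 bytes.
R, N_ROW, N_COL, BYTES_PER_E = 1024, 12288, 3, 4


def layout(repeat, n_rows, n_cols, itemsize=BYTES_PER_E):
    """Describe a table whose columns all share the same repeat count.

    'run_bytes' is how many bytes of one column are contiguous before the
    next column's data begins; 'runs_per_column' is how many such separate
    runs must be visited to read the whole column.
    """
    return {
        "NAXIS1": repeat * n_cols * itemsize,  # bytes per table row
        "NAXIS2": n_rows,
        "run_bytes": repeat * itemsize,
        "runs_per_column": n_rows,
    }


# Original structure: 12288 rows, each holding 3 vectors of 1024 values.
original = layout(R, N_ROW, N_COL)

# Suggested structure: 1 row, each column holding one whole 2-D array.
single_row = layout(R * N_ROW, 1, N_COL)

tform = f"{R * N_ROW}E"   # repeat count for each column
tdim = f"({R},{N_ROW})"   # dimensionality of each column's array
```

Under these assumptions `single_row["NAXIS1"]` comes out as 150994944, 
matching the header above, and each column is a single run of 50331648 
bytes, whereas the original structure splits each column into 12288 
runs of 4096 bytes interleaved with the other two columns.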

Bill Pence
-- 
____________________________________________________________________
Dr. William Pence                       William.Pence at nasa.gov
NASA/GSFC Code 662       HEASARC        +1-301-286-4599 (voice)
Greenbelt MD 20771                      +1-301-286-1684 (fax)