[fitsbits] FITS 'P' descriptors: signed or unsigned?

Wed Jun 15 18:21:27 EDT 2005

This note concerns a relatively small technical issue in the larger proposal 
to add 64-bit integer support to FITS:

At issue is whether to reverse the recent decision to define the 'P'
variable-length array descriptors in FITS binary tables to be a pair of
'signed 32-bit integers', and make them 'unsigned 32-bit integers' instead.
The first integer gives the number of elements in the array, and the second
integer gives the byte offset in the heap to the first element of the array.
The practical consequence of this change is that it will double the
allowed heap size from about 2.1 GB to about 4.3 GB.
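
To make the arithmetic concrete, here is a minimal Python sketch of how the
same 8 descriptor bytes read under each interpretation.  The function name
and the sample values are made up for illustration; the only facts taken from
FITS are that a 'P' descriptor is two 32-bit integers and that FITS binary
data is big-endian.

```python
import struct

# Largest value each interpretation of a 32-bit descriptor field can hold.
SIGNED_MAX = 2**31 - 1      # 2147483647 bytes, about 2.1 GB
UNSIGNED_MAX = 2**32 - 1    # 4294967295 bytes, about 4.3 GB


def decode_p_descriptor(raw, unsigned=False):
    """Decode an 8-byte 'P' descriptor into (element count, heap byte offset).

    Hypothetical helper, not part of any FITS library.  The '>' in the
    format string selects big-endian byte order, as used by FITS.
    """
    fmt = ">II" if unsigned else ">ii"
    return struct.unpack(fmt, raw)


# Made-up descriptor: 1000 elements starting 3,000,000,000 bytes into the heap.
raw = struct.pack(">II", 1000, 3_000_000_000)

print(decode_p_descriptor(raw, unsigned=True))   # (1000, 3000000000)
print(decode_p_descriptor(raw, unsigned=False))  # (1000, -1294967296)
```

An offset beyond 2147483647 bytes is usable only under the unsigned reading;
read as signed, the same bytes give a negative and therefore meaningless
offset.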

This is not just a theoretical issue because there are existing applications
that can easily produce FITS files with binary table heaps larger than 2.1
GB (e.g., using the 'tiled' image compression convention, where the
compressed rows of an image are stored in a variable-length array table
column).  Allowing this extra factor of 2 in size will benefit software
applications that would otherwise need to be rewritten to use the proposed
'Q' 64-bit descriptors (assuming that the 'Q' type is eventually approved by
the FITS committees).  There are no technical reasons not to support
unsigned descriptor values: an element count or a heap offset can never
legitimately be negative, so the sign bit carries no information.  Forcing
the descriptors to be signed 32-bit integers artificially cuts in half the
potential size of the heap.

The main argument for keeping the descriptors as signed integers is that
FITS has never supported unsigned integers as a raw data type (although it
does support unsigned integers by applying an offset to the FITS signed
integer values).  Thus, it is argued, the definition of FITS remains more
'pure' if we don't introduce unsigned integers in this case.  There is,
however, a real distinction between the array descriptor values and the other
FITS table column data types, because the descriptor values themselves are
almost never directly accessible at the application software level.
Instead, the descriptor values are used only by the low-level FITS interface
software routines when accessing the arrays that the descriptors point to.
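
As a rough illustration of that division of labor, the Python sketch below
shows the kind of step a low-level routine performs: it consumes the
descriptor internally and hands the application only the array.  The
function name, the in-memory heap, and the choice of 16-bit integer elements
are all assumptions made for the example, and the descriptor is read as
unsigned, as proposed here; this is not how any particular FITS library is
implemented.

```python
import struct


def read_variable_length_array(heap, descriptor, code="h"):
    """Return the array that an 8-byte 'P' descriptor points to.

    Hypothetical helper.  heap is the heap as a bytes object, descriptor
    is the raw 8-byte descriptor, and code is a struct element code
    ('h' means a 16-bit signed integer).
    """
    # Read the descriptor as two unsigned big-endian 32-bit integers.
    count, offset = struct.unpack(">II", descriptor)
    size = struct.calcsize(">" + code)
    chunk = heap[offset:offset + count * size]
    return list(struct.unpack(">%d%s" % (count, code), chunk))


# Made-up heap holding five 16-bit integers.
heap = struct.pack(">5h", 10, 20, 30, 40, 50)
# Descriptor: 3 elements, starting 4 bytes into the heap.
descriptor = struct.pack(">II", 3, 4)

print(read_variable_length_array(heap, descriptor))  # [30, 40, 50]
```

The caller sees only the list of values; whether the count and offset were
decoded as signed or unsigned never surfaces at the application level.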

I don't consider this to be a major issue, but given a choice, I think the
practical advantages of doubling the allowed size of the heap outweigh the
more intangible 'purity of FITS' issue.

How do others feel about this issue?  Is there a clear consensus one way or 
the other?  Should the FITS committees be explicitly asked to vote on a 
preference?

This issue does not affect the proposed 'Q' 64-bit descriptors, because 
even signed 64-bit integers provide vastly more address space than could 
conceivably be used by any applications in the foreseeable future. 
Presumably the sign of the 'Q' descriptors should be defined to be the same 
as whatever is decided for the 'P' descriptors.

As a final note, to put this in historical perspective, the original FITS
binary table definition paper did not specify the sign of the descriptor
integers.  It was only when the variable-length array convention was
approved by the FITS committees earlier this year that the wording was made
more rigorous to define the sign.  The reason for choosing 'signed' rather
than 'unsigned' was mainly that at the time there did not exist any
software implementations that supported unsigned descriptor values.
Subsequently, some FITS libraries (e.g., CFITSIO) have been enhanced to
support unsigned descriptor values.  If we make this change now, it will
reverse a decision that was finally approved only in April 2005.  It will
not, however, invalidate any existing FITS files, because the positive,
signed descriptor values can always be treated as unsigned integers.
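
The compatibility claim is easy to check.  In the minimal Python sketch
below (helper names are made up for illustration), every non-negative signed
32-bit value decodes to the same number under both readings; the two
interpretations differ only when the high bit is set, which no valid
signed descriptor uses.

```python
import struct


def as_signed(raw):
    """Interpret 4 big-endian bytes as a signed 32-bit integer."""
    return struct.unpack(">i", raw)[0]


def as_unsigned(raw):
    """Interpret 4 big-endian bytes as an unsigned 32-bit integer."""
    return struct.unpack(">I", raw)[0]


# Non-negative signed values: both readings agree, so existing files stay valid.
for value in (0, 1, 2880, 2**31 - 1):
    raw = struct.pack(">i", value)
    assert as_signed(raw) == as_unsigned(raw) == value

# High bit set: the readings diverge.
raw = struct.pack(">I", 2**31)
print(as_unsigned(raw))  # 2147483648
print(as_signed(raw))    # -2147483648
```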

Bill Pence
-- 
____________________________________________________________________
Dr. William Pence                          William.D.Pence at nasa.gov
NASA/GSFC Code 662         HEASARC         +1-301-286-4599 (voice)
Greenbelt MD 20771                         +1-301-286-1684 (fax)