[fitsbits] FITS 'P' descriptors: signed or unsigned?
William Pence
William.D.Pence at nasa.gov
Wed Jun 15 18:21:27 EDT 2005
This note concerns a relatively small technical issue in the larger proposal
to add 64-bit integer support to FITS:
At issue is whether to reverse the recent decision to define the 'P'
variable-length array descriptors in FITS binary tables to be a pair of
'signed 32-bit integers', and make them 'unsigned 32-bit integers' instead.
The first integer gives the number of elements in the array, and the second
integer gives the byte offset in the heap to the first element of the array.
The practical consequence of this change is that it will double the
allowed heap size from about 2.1 GB (2^31 bytes) to about 4.3 GB (2^32 bytes).
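To make the factor of two concrete, here is a minimal Python sketch that decodes the 8-byte, big-endian 'P' descriptor under each interpretation. It is not taken from CFITSIO or any other FITS library, and the function and constant names are hypothetical. A heap offset of 3,000,000,000 bytes can be represented only when the two integers are read as unsigned.

```python
import struct
from typing import Tuple

# Largest heap byte offset each interpretation can express (hypothetical names).
MAX_SIGNED_OFFSET = 2**31 - 1    # 2,147,483,647 bytes, about 2.1 GB
MAX_UNSIGNED_OFFSET = 2**32 - 1  # 4,294,967,295 bytes, about 4.3 GB


def decode_p_descriptor(raw: bytes, signed: bool = True) -> Tuple[int, int]:
    """Decode an 8-byte 'P' descriptor into (element count, heap byte offset).

    FITS binary data are big-endian, so both 32-bit integers are read with
    '>' byte order: 'i' for signed, 'I' for unsigned.
    """
    if len(raw) != 8:
        raise ValueError("a 'P' descriptor is exactly 8 bytes long")
    nelem, offset = struct.unpack(">ii" if signed else ">II", raw)
    # Under the signed reading, offsets of 2**31 bytes or more come out negative.
    if nelem < 0 or offset < 0:
        raise ValueError("negative descriptor value")
    return nelem, offset


# A descriptor pointing 3,000,000,000 bytes into the heap.
raw = struct.pack(">II", 10, 3_000_000_000)
print(decode_p_descriptor(raw, signed=False))  # (10, 3000000000)
```

Calling `decode_p_descriptor(raw, signed=True)` on the same bytes raises `ValueError`, because the offset's top bit is set and the signed reading turns it into a negative number.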
This is not just a theoretical issue because there are existing applications
that can easily produce FITS files with binary table heaps larger than 2.1
GB (e.g., using the 'tiled' image compression convention where the
compressed rows of an image are stored in a variable-length array table
column). Allowing this extra factor of 2 in size will benefit software
applications that would otherwise need to be rewritten to use the proposed
'Q' 64-bit descriptors (assuming that the 'Q' type is eventually approved by
the FITS committees). There is no technical reason not to support
unsigned descriptor values, since a descriptor can never legitimately be
negative: neither an element count nor a heap offset can be less than zero.
Forcing the descriptors to be signed 32-bit integers artificially cuts the
potential size of the heap in half.
The main argument for keeping the descriptors as signed integers is that
FITS has never supported unsigned integers as a raw data type (although it
does support unsigned integers by applying an offset to the FITS signed
integer values). Thus, it is argued, the definition of FITS remains more
'pure' if we don't introduce unsigned integers in this case. There is,
however, a real distinction between the array descriptor values and the other
FITS table column data types because the descriptor values themselves are
almost never directly accessible at the application software level.
Instead, the descriptor values are used only by the low-level FITS interface
software routines when accessing the arrays that the descriptors point to.
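As a rough illustration of that low-level use, the Python sketch below turns a descriptor into the byte range it addresses. It relies on the binary table rule that the heap begins THEAP bytes after the start of the table's data unit, defaulting to NAXIS1 * NAXIS2 (directly after the main table) when THEAP is absent. The helper name is hypothetical, and bit arrays ('PX'), whose elements are not whole bytes, are deliberately left out.

```python
from typing import Optional, Tuple


def array_byte_range(nelem: int, offset: int, elem_size: int,
                     naxis1: int, naxis2: int,
                     theap: Optional[int] = None) -> Tuple[int, int]:
    """Return the (start, end) byte positions of one variable-length array,
    measured from the start of the binary table's data unit.

    nelem, offset: the two descriptor integers
    elem_size:     bytes per element (e.g. 4 for a 'PE' column)
    theap:         value of the THEAP keyword, if present
    """
    # Without THEAP, the heap starts immediately after the main table.
    heap_start = naxis1 * naxis2 if theap is None else theap
    start = heap_start + offset
    return start, start + nelem * elem_size


# 100 rows of 16 bytes; a 'PE' descriptor (25, 400) means 25 floats at heap byte 400.
print(array_byte_range(25, 400, 4, naxis1=16, naxis2=100))  # (2000, 2100)
```

Application code normally never sees the two integers themselves; it asks the library for "row N of column K" and the library performs this arithmetic internally.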
I don't consider this to be a major issue, but given a choice, I think the
practical advantage of doubling the allowed size of the heap outweighs the
more intangible 'purity of FITS' issue.
How do others feel about this issue? Is there a clear consensus one way or
the other? Should the FITS committees be explicitly asked to vote on a
preference?
This issue does not affect the proposed 'Q' 64-bit descriptors, because
even signed 64-bit integers provide vastly more address space than could
conceivably be used by any application in the foreseeable future.
Presumably the sign of the 'Q' descriptors should be defined to be the same
as whatever is decided for the 'P' descriptors.
As a final note, to put this in historical perspective, the original FITS
binary table definition paper did not specify the sign of the descriptor
integers. It was only when the variable-length array convention was
approved by the FITS committees earlier this year that the wording was made
more rigorous to define the sign. 'Signed' was chosen rather than
'unsigned' mainly because, at the time, no software implementations
supported unsigned descriptor values.
Subsequently, some FITS libraries (e.g., CFITSIO) have been enhanced to
support unsigned descriptor values. If we make this change now, it will
reverse a decision that was only finally approved in April 2005. Also, it
will not invalidate any existing FITS files, because non-negative signed
descriptor values can always be treated as unsigned integers.
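The backward-compatibility argument can be checked directly: a non-negative value written as a big-endian signed 32-bit integer has exactly the same bit pattern as the same value written as unsigned. A minimal Python sketch, with a hypothetical helper name:

```python
import struct


def signed_bytes_read_as_unsigned(value: int) -> int:
    """Write a value as a signed-descriptor writer would ('>i'), then read
    the same four bytes as an unsigned-descriptor reader would ('>I')."""
    return struct.unpack(">I", struct.pack(">i", value))[0]


# Every non-negative signed 32-bit descriptor value keeps its meaning.
for v in (0, 1, 2880, 2**31 - 1):
    assert signed_bytes_read_as_unsigned(v) == v
print("all sample values unchanged")
```

Only negative signed values would change meaning under the unsigned reading, and a valid descriptor never holds one.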
Bill Pence
--
____________________________________________________________________
Dr. William Pence William.D.Pence at nasa.gov
NASA/GSFC Code 662 HEASARC +1-301-286-4599 (voice)
Greenbelt MD 20771 +1-301-286-1684 (fax)