[fitsbits] FITS 'P' descriptors: signed or unsigned?

Wed Jun 15 18:41:16 EDT 2005

Hi Bill -

Unless someone can come up with a compelling reason why this causes a
technical problem, I would support it (using unsigned for 32-bit pointers).
The 2 GB data size limit is getting to be a major problem which we have
to deal with.  The real solution is 64-bit support, but a factor of 2 for
something like this makes a big difference.  The only issue I can see is
that older programs not expecting unsigned would interpret such offsets
as having a negative value and probably reject the file.  In the worst
case (software fails to check for a negative value) a pointer error could
occur and invalid data could be returned.
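
To make that failure mode concrete, here is a minimal sketch in Python.
It assumes the 8-byte 'P' descriptor layout described in the quoted note
below (element count, then heap byte offset, stored big-endian as FITS
integers are).  The function names read_p_descriptor and check_descriptor
are hypothetical and are not taken from CFITSIO or any other FITS library.

```python
import struct

def read_p_descriptor(raw, unsigned):
    """Decode an 8-byte 'P' descriptor into (element count, heap offset).

    Hypothetical helper: unsigned=False mimics an older reader that
    assumes signed 32-bit values.
    """
    fmt = ">II" if unsigned else ">ii"   # two big-endian 32-bit integers
    return struct.unpack(fmt, raw)

def check_descriptor(count, offset, elem_size, heap_size):
    """Reject descriptors that are negative or point outside the heap."""
    if count < 0 or offset < 0:
        raise ValueError("negative descriptor value")
    if offset + count * elem_size > heap_size:
        raise ValueError("array extends past the end of the heap")

# A descriptor whose heap offset lies beyond the signed 32-bit limit.
raw = struct.pack(">II", 1000, 3_000_000_000)

new_reader = read_p_descriptor(raw, unsigned=True)
old_reader = read_p_descriptor(raw, unsigned=False)

# Offsets below 2**31 decode identically under both interpretations.
small = struct.pack(">ii", 1000, 2_000_000_000)
same = read_p_descriptor(small, True) == read_p_descriptor(small, False)
```

A reader that runs something like check_descriptor rejects the file
cleanly; one that skips the check would seek to a bogus heap position.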

	- Doug

On Wed, 15 Jun 2005, William Pence wrote:

> This note concerns a relatively small technical issue in the larger proposal 
> to add 64-bit integer support to FITS:
> 
> At issue is whether to reverse the recent decision to define the 'P' 
> variable-length array descriptors in FITS binary tables to be a pair of 
> 'signed 32-bit integers', and make them 'unsigned 32-bit integers' instead. 
> The first integer gives the number of elements in the array, and the 2nd
> integer gives the byte offset in the heap to the first element of the array.
> The practical consequence of this change is that it will double the
> allowed heap size from about 2.1 GB to about 4.2 GB.
> 
> This is not just a theoretical issue because there are existing applications 
> that can easily produce FITS files with binary table heaps larger than 2.1
> GB (e.g., using the 'tiled' image compression convention where the 
> compressed rows of an image are stored in a variable length array table 
> column).  Allowing this extra factor of 2 in size will benefit software 
> applications that would otherwise need to be rewritten to use the proposed 
> 'Q' 64-bit descriptors (assuming that the 'Q' type is eventually approved by 
> the FITS committees).  There are no technical reasons not to support 
> unsigned descriptor values (e.g., it is impossible to have negative 
> descriptors).  Forcing the descriptors to be signed 32-bit integers 
> artificially cuts in half the potential size of the heap.
> 
> The main argument for keeping the descriptors as signed integers is that 
> FITS has never supported unsigned integers as a raw data type (although it 
> does support unsigned integers by applying an offset to the FITS signed 
> integer values).   Thus, it is argued, the definition of FITS remains more 
> 'pure' if we don't introduce unsigned integers in this case.   There is 
> however a real distinction between the array descriptor values and the other 
> FITS table column data types because the descriptor values themselves are 
> almost never directly accessible at the application software level. 
> Instead, the descriptor values are only used by the low-level FITS interface 
> software routines, when accessing the arrays that the descriptor points to.
> 
> I don't consider this to be a major issue, but given a choice, I think the 
> practical advantages of doubling the allowed size of the heap outweigh the
> more intangible 'purity of FITS' issue.
> 
> How do others feel about this issue?  Is there a clear consensus one way or 
> the other?  Should the FITS committees be explicitly asked to vote on a 
> preference?
> 
> This issue does not affect the proposed 'Q' 64-bit descriptors, because 
> even signed 64-bit integers provide vastly more address space than could 
> conceivably be used by any applications in the foreseeable future. 
> Presumably the sign of the 'Q' descriptors should be defined to be the same 
> as whatever is decided for the 'P' descriptors.
> 
> As a final note, to put this in historical perspective, the original FITS 
> binary table definition paper did not specify the sign of the descriptor 
> integers.   It was only when the variable-length array convention was 
> approved by the FITS committees earlier this year that the wording was made 
> more rigorous to define the sign.  The reason for choosing 'signed' rather 
> than 'unsigned' was mainly because at the time there did not exist any 
> software implementations that supported unsigned descriptor values. 
> Subsequently, some FITS libraries (e.g., CFITSIO) have been enhanced to 
> support unsigned descriptor values.  If we make this change now, it will 
> reverse a decision that was only finally approved in April 2005.  Also, it 
> will not invalidate any existing FITS files, because the positive, signed 
> descriptor values can always be treated as unsigned integers.
> 
> Bill Pence
>