[fitsbits] 64-bit integer comments

Wed May 11 15:43:29 EDT 2005

Here are some comments on several aspects of the 64-bit integer discussion:

1) On the suggestion to only add support for 64-bit integers in binary 
tables, and not in primary arrays:

From an abstract "data model" point of view, a primary array or image 
extension is identical to a table containing 1 row and 1 column (containing 
the image as a vector).  CFITSIO (and possibly other FITS software) takes 
advantage of this by internally treating a FITS image as a special case of a 
binary table.  Thus, support for 64-bit images comes at little extra cost if 
that data type is supported in tables, so I see little reason to exclude it.

Also, FITS images can be used to store n-dimensional arrays of any type of 
quantity; they are not just used for 2-D spatial 'counts' images.  It is not 
difficult to dream up only slightly contrived cases where one might want to 
store a 1D array of 64-bit pointers, or object ID numbers, or time offset 
values in a FITS primary array or image extension.

2) On the lack of support for unsigned integers in some software environments:

It will be possible to write unsigned 64-bit integers into FITS files by 
setting the BZERO or TZEROn keyword to the value 2**63.  Fortran does not 
natively support unsigned integers, but in the past this was not a major 
problem because unsigned 16-bit FITS values could be cast to signed 32-bit 
Fortran variables, and unsigned 32-bit values could be cast to 64-bit 
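floating point variables.  It is not as easy to support the full range of 
unsigned 64-bit integers, however, because there is no standard Fortran data 
type that has both the necessary range and precision: a signed 64-bit integer 
lacks the range, and a 64-bit floating point variable lacks the precision.  
I don't think this is a major 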
problem, but it could be challenging if Fortran programmers have to develop 
algorithms that support the full range of 64-bit unsigned values.
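
As a rough illustration of the offset convention described above, here is a 
minimal Python sketch (Python is used only because its integers have 
arbitrary precision; to_stored and to_physical are hypothetical helper names, 
not part of CFITSIO or any other FITS library).  It assumes the usual 
relation physical = stored + BZERO with a scale factor of 1:

```python
# Sketch only: to_stored/to_physical are hypothetical helper names.
BZERO = 2**63  # offset value given in the text above

INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
UINT64_MAX = 2**64 - 1


def to_stored(physical: int) -> int:
    """Unsigned 64-bit value -> signed 64-bit integer stored in the file."""
    if not 0 <= physical <= UINT64_MAX:
        raise ValueError("not an unsigned 64-bit value")
    return physical - BZERO


def to_physical(stored: int) -> int:
    """Signed 64-bit stored integer -> unsigned 64-bit physical value."""
    if not INT64_MIN <= stored <= INT64_MAX:
        raise ValueError("not a signed 64-bit value")
    return stored + BZERO


print(to_stored(0))            # -9223372036854775808
print(to_stored(UINT64_MAX))   # 9223372036854775807

# A 64-bit float cannot hold the top of the unsigned range exactly:
print(float(UINT64_MAX) == float(2**64))  # True
```

The last line shows why a double precision variable is not a usable 
stand-in here: the largest unsigned 64-bit value rounds to 2**64.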

3) On the practicality of using 64-bit floating point variables as a 
substitute for long integers:

Some have argued that there is little need for 64-bit integers because 
64-bit floating point values (with about 53 bits of precision) are 
sufficient.  The main worry I have with this is that floating point 
computational results may not be an exact integer.  In general, it is not a 
good idea to rely on the exact equality between computed floating point 
values because there may be platform-dependent differences in the least 
significant 1 or 2 bits of the value.  While it does seem fairly safe to 
assume that, e.g., 6.0 divided by 3.0 will always be exactly 2.0 (and IEEE 
754 arithmetic does guarantee this), in principle, on other platforms the 
result is only guaranteed to be accurate to about 15 decimal places, so the 
result could be 1.99999999999999 instead.  This becomes more 
problematic when doing a long sequence of operations on 'integer' values 
that are represented as floating point variables.  Thus, if I had a choice, 
I would always want to do integer arithmetic with 64-bit integer variables 
rather than 64-bit floating point variables, even if the latter have in 
principle more than enough dynamic range for my application.  If I have computed 
64-bit integer values, then I would also want to store them directly in a 
FITS file, rather than again have to worry about possible loss of precision 
if they have to be converted to 64-bit floating point values.
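
The precision limit is easy to demonstrate.  The following is a minimal 
Python sketch, assuming Python's float is an IEEE 754 64-bit double (true on 
all common platforms); survives_double is a hypothetical helper name used 
only for this illustration:

```python
# Sketch only: survives_double is a hypothetical helper name.
LIMIT = 2**53  # integers up to this magnitude are exact in a 64-bit float


def survives_double(n: int) -> bool:
    """True if n comes back unchanged after a round trip through a float."""
    return int(float(n)) == n


print(survives_double(LIMIT))       # True
print(survives_double(LIMIT + 1))   # False
print(survives_double(2**63 - 1))   # False

# The same loss shows up in arithmetic on 'integer' values held as floats:
print((LIMIT + 1) - LIMIT)                 # 1    (integer arithmetic)
print((float(LIMIT) + 1.0) - float(LIMIT)) # 0.0  (floating point)
```

Any 64-bit integer larger than 2**53 in magnitude may therefore change value 
when it is converted to a 64-bit floating point variable.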

Bill Pence
-
____________________________________________________________________
Dr. William Pence                          William.D.Pence at nasa.gov
NASA/GSFC Code 662         HEASARC         +1-301-286-4599 (voice)
Greenbelt MD 20771                         +1-301-286-1684 (fax)