[fitsbits] 16-bit floats

Mohammad Akhlaghi mohammad at akhlaghi.org
Wed Jul 23 09:31:20 EDT 2025


Thanks for the good suggestion!

In many cases (for both images/cubes and catalogs), 32-bit float 
precision is indeed more than is needed; I agree.

I guess the only issue is portability. For example, the section on 
"Floating point representations" [1] of the GNU C Language Introduction 
and Reference Manual says of the 16-bit float: "GNU C supports the 
16-bit floating point type _Float16 on some platforms". I haven't had a 
chance to check other compilers or C libraries, or the details of which 
platforms are supported. But if the unsupported platforms are not 
common in astronomy, it should be no problem.
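
As a rough illustration that a missing native type need not block 
reading such data: 16-bit floats can also be converted in software. The 
sketch below uses Python's standard "struct" module, whose "e" format 
code handles IEEE 754 binary16. The helper names are hypothetical, and 
the big-endian byte order is only an assumption (it matches existing 
FITS data, but nothing in this thread has settled it).

```python
import struct


def encode_half_be(value: float) -> bytes:
    """Pack a Python float into 2 bytes as big-endian IEEE 754 binary16."""
    return struct.pack(">e", value)


def decode_half_be(raw: bytes) -> float:
    """Unpack 2 big-endian bytes holding an IEEE 754 binary16 value."""
    return struct.unpack(">e", raw)[0]


# 1.0 is sign 0, biased exponent 01111, fraction 0, i.e. bytes 0x3C 0x00.
one = encode_half_be(1.0)

# 0x7BFF is the largest finite binary16 value.
largest = decode_half_be(b"\x7b\xff")

# 0.1 is not exactly representable; the round trip shows the rounding
# that an 11-bit significand (10 stored bits) introduces.
tenth = decode_half_be(encode_half_be(0.1))

print(one.hex(), largest, tenth)
```

The conversion here is done by the library in software, so it does not 
depend on the C compiler offering _Float16; whether that is fast enough 
for large cubes is a separate question.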

Cheers,
Mohammad

[1] 
https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Floating-Representations.html

On 7/23/25 3:24 PM, peter teuben via fitsbits wrote:
> We also need them for BINTABLE, where D and E are for 64 and 32 bit 
> floats.  Sadly can't expand the alphabet here. F is available, but C is 
> not.
> 
>   I guess L is available for 128-bit integer.
> 
> 
> FITS format code         Description                     8-bit bytes
> 
> L                        logical (Boolean)               1
> X                        bit                             *
> B                        Unsigned byte                   1
> I                        16-bit integer                  2
> J                        32-bit integer                  4
> K                        64-bit integer                  8
> A                        character                       1
> E                        single precision float (32-bit) 4
> D                        double precision float (64-bit) 8
> C                        single precision complex        8
> M                        double precision complex        16
> P                        array descriptor                8
> Q                        array descriptor                16
> 
> On 7/23/25 08:34, Thomas Robitaille via fitsbits wrote:
>> Hi everyone,
>>
>> As far as I understand, IEEE 754-2008 standardized the representation 
>> of 16-bit floats (as well as 128-bit floats). I was curious whether 
>> there is any interest in extending the FITS format to allow BITPIX=-16 
>> and BITPIX=-128?
>>
>> I am aware of some modern projects that would benefit from having 16- 
>> bit floats, since they consider it to be sufficient in precision to 
>> store very large datasets, and using 16-bit floats would perform a lot 
>> better than using compression on 32-bit floats for example, and 16-bit 
>> floats would allow a larger dynamic range than using 16-bit ints with 
>> BSCALE/BZERO.
>>
>> I'm curious to hear if this has been discussed before!
>>
>> Thanks,
>> Tom
>>
>> _______________________________________________
>> fitsbits mailing list
>> fitsbits at listmgr.nrao.edu
>> https://listmgr.nrao.edu/mailman/listinfo/fitsbits


