[fitsbits] [EXT]Re: Output array type when BZERO is an integer {External}

Tue Mar 12 01:39:41 EDT 2024

Part of what we're seeing is a consequence of a shift in programming languages and styles.  In the older style, the caller was typically responsible for allocating the array into which the data are read and for determining its datatype.  In dynamically typed languages like Python and Julia, it is just as natural for the caller to let a library allocate the array and return an appropriate datatype, unless the user _explicitly_ decides to override that choice.  (For instance, for the reason Bill suggested: although the input data are integral, the client expects to immediately modify them in floating-point ways.)

This shift has been obscured in the Python and Julia worlds by the initial implementation choice to call through to CFITSIO, but in a from-scratch native interpretation, the need to clarify the point becomes very apparent.

I strongly suspect that if the standard had been written in an era in which dynamic typing was a common paradigm, the language around integer storage models and BZERO/BSCALE would have been more explicit about how the data publisher could signal to such a library whether it wanted the array treated by default as an integer array, exactly matching the BITPIX and the on-disk data values, or rescaled to fractional values requiring floating-point storage.

In trying to decide what such a library should do, we should consider two viewpoints (at least):
  * data publishers of the past, without dynamically-typed libraries in mind, may not have made any systematic decisions about how to signal this, or imagined it mattered, so a client library that tries to over-interpret things like "BSCALE is missing" vs. "BSCALE is reported as '1.0'" may be processing noise, as it were;
  * data publishers of the present, with petabytes of new images soon to be released in FITS from numerous wide-area surveys, may very much want to send reliable and interoperable typing signals to such libraries.  I know I do, and I'm personally responsible for the data representations of two such surveys.
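
To make the first viewpoint concrete, here is a minimal sketch of one possible default-type policy for a dynamically typed reader. It is an illustration only, not something the standard or any existing library prescribes: the function name, the returned type names, and the decision to treat an absent keyword exactly like its default value are all assumptions of the sketch. The BZERO offsets in the table are the ones the standard lists for representing unsigned integers.

```python
from typing import Optional

# BZERO values the FITS standard lists for storing unsigned integers,
# keyed by BITPIX (BITPIX = 8 is unsigned on disk, so it is handled separately).
UNSIGNED_OFFSETS = {16: 2**15, 32: 2**31, 64: 2**63}


def default_output_type(bitpix: int,
                        bscale: Optional[float],
                        bzero: Optional[float]) -> str:
    """Hypothetical policy: choose a default array type from the header.

    A missing keyword (None) is treated exactly like its default value,
    so "BSCALE absent" and "BSCALE = 1.0" always give the same answer.
    """
    scale = 1.0 if bscale is None else float(bscale)
    zero = 0.0 if bzero is None else float(bzero)

    if bitpix < 0:
        # Floating-point data on disk stay floating point.
        return f"float{-bitpix}"
    if scale == 1.0 and zero == 0.0:
        # No scaling: return the on-disk integer type.
        return "uint8" if bitpix == 8 else f"int{bitpix}"
    if scale == 1.0 and bitpix == 8 and zero == -128.0:
        # Offset convention for signed bytes.
        return "int8"
    if scale == 1.0 and UNSIGNED_OFFSETS.get(bitpix) == zero:
        # Offset convention for unsigned 16-, 32-, and 64-bit integers.
        return f"uint{bitpix}"
    # Any other scaling is assumed to need floating-point storage;
    # choosing the float width is a separate question.
    return "float"
```

With this policy, `default_output_type(16, None, 32768)` and `default_output_type(16, 1.0, 32768.0)` both give `"uint16"`, so the reader does not try to extract a signal from how the publisher happened to write the keywords.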

I share your implied concern, Bill, that we may find it very hard to turn around a document in a reasonable time, but such things are somewhat self-fulfilling -- if we expect it to take forever, it surely will.  If we try to make the system work for us, maybe we'll succeed.  I don't think this will be the last such issue for FITS as the community moves to new languages and programming paradigms.

The conversation we've had on this thread, and similar ones I've had privately with Paul and others, have made clear to me that an existing CFITSIO-like library in which the output datatype is always user-specified simply doesn't care about this clarification as long as the revision doesn't change the required behavior in any such case.  Maybe that will make it easier to reach an agreement.

Whether the clarification makes it into the FITS standard itself, or whether it ends up as, say, a registered convention, is certainly debatable. I'd like to see it on a path to getting into the standard, but perhaps we can try out the language as a convention, at first, just to get something written down and referenceable?

Finally, I strongly support the FITS governing bodies formally tracking errata.

Gregory

________________________________________
From: fitsbits <fitsbits-bounces at listmgr.nrao.edu> on behalf of Barrett, Paul via fitsbits <fitsbits at listmgr.nrao.edu>
Sent: Monday, March 11, 2024 15:48
To: William Pence
Cc: Seaman, Robert Lewis - (rseaman); fitsbits at listmgr.nrao.edu
Subject: Re: [fitsbits] [EXT]Re: Output array type when BZERO is an integer {External}

Okay, I understand that the default behaviour for BZERO and BSCALE creates a floating point array because of the typical upconversion rules. However, I'm not clear about the data type for the special case where BZERO is an integer. In this case, it appears that BZERO is added first to the integer array before converting it to a floating point array, because BSCALE = 1.0 implies upconversion. Is this correct?
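
For reference, the standard defines the scaling as physical_value = BZERO + BSCALE × array_value. The sketch below (plain Python, with hypothetical function names) applies that formula to 16-bit values with BSCALE = 1 and BZERO = 32768, once in integer arithmetic and once in floating point. Under those assumptions the two give numerically identical values, so the order of operations affects only the type of the result, not the numbers.

```python
def offset_as_int(raw, bzero):
    """Add an integral BZERO using integer arithmetic only."""
    return [v + int(bzero) for v in raw]


def scale_as_float(raw, bscale=1.0, bzero=0.0):
    """Apply physical = BZERO + BSCALE * raw in floating point."""
    return [bzero + bscale * float(v) for v in raw]


# Signed 16-bit values as stored on disk, including both extremes.
raw = [-32768, -1, 0, 1, 32767]

as_int = offset_as_int(raw, 32768)            # values in the range 0..65535
as_float = scale_as_float(raw, 1.0, 32768.0)

# Every 16-bit value is exactly representable as a float, so both paths agree.
assert all(float(i) == f for i, f in zip(as_int, as_float))
```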

I do still think that the FITS standards document should be updated to account for the use of modern dynamically typed languages such as Python and Julia because they will be used more often in the future. It should be noted that users don't care what the data type of the array is as long as no precision is lost. If an image can be stored as a Float32 without losing precision, then it should be. The user can convert it to a Float64 if needed or just let the code do the upconversion during a calculation. This is a more general programming approach than assuming that all FITS files are accessed and all calculations are done using a statically typed language. That is not the future.

 -- Paul

On Mon, Mar 11, 2024 at 5:41 PM William Pence <wdpence2000 at yahoo.com> wrote:
I attempted to reply to Paul’s original question a couple days ago but it failed to get CCed back to FITSBITS. Here is an updated reply:

The FITS standard explicitly defines the datatype of every keyword in the document, and in particular the BZERO keyword “SHALL contain a floating point number” (section 4.4.2.5).  Note also that FITS follows the Fortran numerical notation, so in the case of floating point numbers the decimal point is optional if there is no fractional part to the number.  In that case a floating point value is indistinguishable from an integer value (section 4.2.4).
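
A small sketch of what this means for a reader written in a dynamically typed language: if the keyword is defined as floating point, the value string can be coerced to a float regardless of whether a decimal point is present. The helper name `parse_fits_float` is hypothetical. The replacement of `D` by `E` is needed because the standard also permits a `D` exponent letter, which Python's `float()` does not accept.

```python
def parse_fits_float(text: str) -> float:
    """Parse a FITS floating-point keyword value (Fortran-style notation)."""
    # Python's float() understands 'E' exponents but not Fortran's 'D'.
    cleaned = text.strip().upper().replace("D", "E")
    return float(cleaned)


# All of these spellings denote the same floating-point value.
for spelling in ("32768", "32768.", "32768.0", "3.2768E4", "3.2768D+04"):
    assert parse_fits_float(spelling) == 32768.0
```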

CFITSIO was designed to allow the calling C or Fortran program to specify whatever datatype it wants when reading data from a FITS file.  Regardless of the intrinsic datatype of the data in the FITS file, CFITSIO will convert it on the fly to whatever the program requests.  For example, even if a FITS image has an intrinsic 16-bit integer datatype, the application program may want the pixel values returned in a floating point array for further data processing.  Similarly when writing data to a FITS file, CFITSIO will convert the datatype of the input data if necessary to conform to the FITS datatype.
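
The caller-specified model described here can be sketched in a few lines of plain Python. This is not CFITSIO's API: the function `read_as` is hypothetical and uses the standard-library `array` type codes (`'h'` signed 16-bit, `'H'` unsigned 16-bit, `'f'` single, `'d'` double) to stand in for the requested datatype. Rounding to the nearest integer for integer outputs is an assumption of the sketch.

```python
from array import array


def read_as(raw: array, typecode: str,
            bscale: float = 1.0, bzero: float = 0.0) -> array:
    """Return scaled pixel values in whatever array type the caller requests."""
    scaled = (bzero + bscale * v for v in raw)
    if typecode in ("f", "d"):
        return array(typecode, scaled)
    # Integer output: round the scaled values (an assumption of this sketch).
    return array(typecode, (int(round(x)) for x in scaled))


stored = array("h", [-32768, 0, 32767])            # 16-bit integers on disk
as_double = read_as(stored, "d")                   # caller asks for doubles
as_uint16 = read_as(stored, "H", bzero=32768.0)    # caller asks for unsigned 16-bit
```

The point of the design is that the file never dictates the output type; the question raised in this thread only arises once the library, rather than the caller, has to pick `typecode`.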

IMHO, it is unnecessary (and maybe even inappropriate) to make recommendations in the Standards document about how integer values should be converted to single or double floats in various circumstances.

In regard to a couple of suggestions that it may be time to update the FITS Standard document, note that the last update was completed in 2018, after a painstaking 2-year process led by Malcolm Currie and Lucio Chiappetti. A summary of the mainly typographical changes is given in Appendix H.4.

Since the time of the last update at least 3 relatively minor factual discrepancies in the document have been pointed out. Perhaps the FITS governing bodies should establish and maintain an Official Errata List.

Bill

On Mar 11, 2024, at 1:39 PM, Seaman, Robert Lewis - (rseaman) via fitsbits <fitsbits at listmgr.nrao.edu> wrote:

Yes, don’t mess with unsigned integer support.

No, don’t muck about with CFITSIO, but files generated by CFITSIO (and other packages) should be readable using your Julia code, and vice versa.

Rob

On 3/11/24, 10:26 AM, "Barrett, Paul" wrote:

I'm not sure I understand what you mean about UInt16 and UInt32 remaining integers. Do you mean for the special case? If yes, then they will remain integers.

I'm sorry, but I am not writing an interface to CFITSIO. If you want such an interface, then use the wrapped version.

 -- Paul

On Mon, Mar 11, 2024 at 12:55 PM Seaman, Robert Lewis - (rseaman) <rseaman at arizona.edu> wrote:
Hi Paul,

Just to be clear, the UInt16 and UInt32 would remain integers, right?

Whether or not you review CFITSIO code, applications layered on your code versus on CFITSIO should interoperate. This is particularly true for tile compression (which your package should support).

Rob

On 3/11/24, 9:01 AM, "Barrett, Paul" wrote:

If the input array is a UInt8, then a Float64 output array is overkill. A Float32 array should work just fine. This provides sufficient precision while saving memory. So using this line of thought, I would think these mappings from integer input arrays to floating point output arrays would be reasonable.

UInt8 => Float32
Int16 => Float32
Int32 => Float64
Int64 => Float64

By explicitly specifying this in the documentation, you would save future developers some time. They would not have to decipher how best to implement the code as I have done. Note that Julia does type inference, like Python, unless the developer specifies otherwise. If BSCALE looks like an integer, because there is no decimal point, then Julia will interpret this as an integer, unless I coerce it to a float. I'm just confirming that this is what I should do using the above mappings.
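
The precision side of the mappings listed above can be checked with a short sketch using only the Python standard library (the helper names are hypothetical). An IEEE single has a 24-bit significand and a double has a 53-bit significand, so every UInt8 and Int16 value survives a round trip through Float32, and every Int32 value survives Float64. Int64 values above 2**53 in magnitude are not all exactly representable in Float64, so that last mapping can lose precision for very large values.

```python
import struct


def survives_float32(n: int) -> bool:
    """True if n is unchanged by a round trip through IEEE single precision."""
    as_single = struct.unpack("<f", struct.pack("<f", float(n)))[0]
    return as_single == n


def survives_float64(n: int) -> bool:
    """True if n is exactly representable in IEEE double precision."""
    return float(n) == n


# UInt8 and Int16 both fit within the 24-bit significand of a Float32.
assert all(survives_float32(n) for n in range(0, 2**8))
assert all(survives_float32(n) for n in range(-2**15, 2**15))

# Int32 does not always fit in a Float32, but always fits in a Float64.
assert not survives_float32(2**24 + 1)
assert survives_float64(-2**31) and survives_float64(2**31 - 1)

# Int64 does not always fit in a Float64.
assert not survives_float64(2**53 + 1)
```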

I'm trying to support whatever the standard specifies. Being explicit would make this easier for me. I would prefer not to have to wade into the CFITSIO code to do so, and I don't think other developers should have to either.

 -- Paul

On Sun, Mar 10, 2024 at 12:17 PM Seaman, Robert Lewis - (rseaman) <rseaman at arizona.edu> wrote:
Hi Paul,

I'm writing a FITS package for the Julia programming language. I have a question about the output type of the image when BZERO is an integer value. The documentation implies that the output image should be a floating point type because the BSCALE value is a float. Is this correct? If yes, then I recommend stating this explicitly in the FITS standard documentation. I also recommend suggesting the appropriate output type depending on the input type, e.g., UInt8 => Float32, Int16 => Float32, etc.

Yes, that is the ancient implication of BSCALE. This carries over to lossy tile compression, too, which is a very fancy BSCALE operation if you look at it sideways.

Undoubtedly there are other pending tweaks to the docs. This is hard to avoid with esoteric standards, especially perhaps if they survive through multiple generations of other contingent computer technologies.

I’m not sure I understand your last sentence. Could you provide a table of the mappings you think should apply?

To first order, other languages and libraries should start with the CFITSIO source for appropriate usage (or to suggest differently). One recommendation is that all FITS packages support tile compression.

Rob Seaman
Lunar and Planetary Laboratory
University of Arizona

_______________________________________________
fitsbits mailing list
fitsbits at listmgr.nrao.edu
https://listmgr.nrao.edu/mailman/listinfo/fitsbits