[fitsbits] Output array type when BZERO is an integer {External}

Mon Mar 11 13:11:27 EDT 2024

Let me add one more thing here: just as in the IVOA, there's no pool of effort out there that has a standing obligation to take action on improving the standard when an issue is raised, and still less is there a specific person on this list, as far as I know, whose job description includes their being a point person for such a thing.

So the process that is most likely to yield an improvement is for a proponent of an improvement (that's you, I'm sorry to say) to draft some language and circulate it here; I suspect the rest of the readership would be more likely to engage at that point.  Because I care about this specific issue myself (from a Firefly perspective) I would be happy to assist you with these first steps.  

Note also that the standard (again, much like the IVOA standards) does not concern itself with actual language bindings, so the wording of the clarification has to be fairly careful in order to avoid over-specifying behavior that is library-specific and thereby opening up a huge project.

Gregory

________________________________________
From: Dubois-Felsmann, Gregory P. <gpdf at ipac.caltech.edu>
Sent: Monday, March 11, 2024 10:01
To: Barrett, Paul
Cc: fitsbits at listmgr.nrao.edu
Subject: Re: [fitsbits] Output array type when BZERO is an integer {External}

Hi, Paul,

I agree entirely with you that these matters should be formally clarified.  I don't have a lot of experience turning the crank of the FITS-standard engine, however, so I'd be hard-pressed to give you an estimate on how long this might take.  I do note that the history on https://fits.gsfc.nasa.gov/fits_standard.html indicates that the last time a point release was created was 19 years ago, and there's no recent history of "errata" or similar clarifications.

I may be over-interpreting what either you or Rob said in this thread, but I didn't think Rob was suggesting in "other languages and libraries should start with the CFITSIO source for appropriate usage" that you wrap CFITSIO in Julia, but rather that you use it as a de-facto guide to the resolution of ambiguities.  I can't whole-heartedly endorse that, because I don't think the pressure should be taken off the standard to evolve to become more precise, but it is a realistic suggestion for you to use in order to keep working on your implementation: CFITSIO is very heavily used and I would judge it to be relatively unlikely that the community would choose a clarification to the standard that was inconsistent with its behavior in any mainstream situation.

Gregory

P.S.  I certainly agree with Rob in strongly encouraging you to implement tile compression.  Both of my projects (Rubin and SPHEREx) will generate public data products with compressed image extensions.

--
Gregory Dubois-Felsmann | Senior Staff Scientist | Caltech/IPAC
Science Platform Scientist, Vera C. Rubin Observatory
Pipeline System Designer, NASA SPHEREx mission
Mail Code MR 100-22 | Pasadena, CA 91125-2200 | gpdf at ipac.caltech.edu

________________________________________
From: Barrett, Paul <pebarrett at email.gwu.edu>
Sent: Monday, March 11, 2024 09:34
To: Dubois-Felsmann, Gregory P.
Cc: fitsbits at listmgr.nrao.edu
Subject: Re: [fitsbits] Output array type when BZERO is an integer {External}

Greg,

Thanks for clarifying the impact of this issue. You have clarified several points. It appears that my understanding of the document is different from what you have described for some keyword cases.

Explicitly specifying the type of the output array, or its minimum type, in the document would make it much easier to implement libraries for new languages. Specifying the behaviour when a keyword is present or absent would also make implementation easier.

Julia is a very concise, yet high-performance, programming language, so it doesn't require a lot of code to implement the FITS standard. Because of this, I spend more time trying to understand the FITS standard than I do writing the code. This should not be the case. There are two reasons for having a native Julia FITS package. First, Julia has an excellent package manager, which makes it easy to install and maintain libraries or packages. It can be cumbersome to have to manage and maintain packages that wrap libraries in other languages. Second, Julia is a concise, high-performance array language, so its performance is comparable to, and likely to be better than, C/C++ or FORTRAN with fewer lines of code. Because development of the package is in its early stages, I have not benchmarked it against CFITSIO, but I would not be surprised to see that it is faster in most cases.

 -- Paul

On Sun, Mar 10, 2024 at 5:41 PM Dubois-Felsmann, Gregory P. <gpdf at ipac.caltech.edu> wrote:
This is a bit of a logical/legal hole in the FITS standard, for a couple of reasons given below.  I agree with Paul that the best solution would be to issue a clarification.

I think it's an excellent moment for getting this right, when implementing a new client library from scratch instead of adiabatically as most of our others have been.

These questions also arise in the behavior of client applications -- for instance, IPAC Firefly, which I have some responsibility for -- so this isn't purely an issue for theoretical discussion.

1) Lack of clarity about the interpretation of missing headers and when the scaling should be applied at all

The FITS 4.00 standard specifies that BZERO and BSCALE both "shall" be floating point values, with defaults of "0.0" and "1.0", respectively -- and there's no discussion of the absence of BZERO and BSCALE being treated as a special case.  Thus, if you take this completely literally, it means that the inherently floating-point scaling operation is *always* performed (even when it's mathematically a no-op) and that the result should therefore *always* be a floating-point array.  That is obviously not the spirit of the standard!  It has to be possible to deliver an integer array as the output, which means we need a specification of how to trigger that.
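To make the literal reading concrete, here is a minimal Python sketch. The function name `scale_literal` is hypothetical, and a plain list stands in for an image array. It applies physical = BZERO + BSCALE * stored with the floating-point defaults, and the result is floating point even though the operation changes no values:

```python
def scale_literal(stored, bscale=1.0, bzero=0.0):
    """Apply the FITS scaling literally, with float-valued keywords."""
    return [bzero + bscale * v for v in stored]

stored = [-32768, 0, 32767]        # e.g. pixel values from a BITPIX = 16 array
physical = scale_literal(stored)   # defaults stand in for absent keywords

print(physical)
print({type(v).__name__ for v in physical})
```

The values are numerically unchanged, but every element is now a float rather than an integer.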

Even then we are left to ask the question: if the absence of BZERO and BSCALE is supposed to trigger returning the array as its specified integer format, what is *explicit* specification of BZERO = 0.0 and BSCALE = 1.0 supposed to do?  The most legalistic reading of the standard is that the absence of a keyword and the explicit presence of the default value for that keyword are supposed to be treated the same way, but then we have to define what it means for the explicitly specified value to be the same as the default.  The text says "1.0"; which of the following are equivalent: "1", "1.", "+1.0D0", "1.000000001" (equal to 1 in float32), "1.0000000000000000000000001", "0.99999999999999999999999999"?
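As a rough illustration of why this matters, the sketch below parses each of those strings and tests for equality with 1 at float64 and float32 precision. The helper names are hypothetical, and mapping a "D" exponent to "E" is an assumed convention for this sketch, not something taken from any particular library:

```python
import struct

def parse_fits_float(text):
    """Parse a FITS-style real; 'D' exponents are mapped to 'E' (assumption)."""
    return float(text.strip().upper().replace("D", "E"))

def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

candidates = ["1", "1.", "+1.0D0", "1.000000001",
              "1.0000000000000000000000001",
              "0.99999999999999999999999999"]

# For each string: (equals 1 as float64, equals 1 as float32)
results = {}
for text in candidates:
    as64 = parse_fits_float(text)
    results[text] = (as64 == 1.0, to_float32(as64) == 1.0)
    print(f"{text!r:32} float64: {results[text][0]}  float32: {results[text][1]}")
```

Only "1.000000001" differs from 1 at float64 precision, and none of the strings differ at float32 precision, so the answer to "is this the default?" depends on the precision the reader happens to use.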

If, as I do, you feel queasy about testing for floating-point equality in this situation, and you think "OK, the rule we should publish is:

'if either BZERO or BSCALE is present in the header, even if they are exactly 0.0 and 1.0 respectively, return the array in a floating-point format in which all possible values of the input are distinct in the output, if possible'"

(meaning that BITPIX 32 would yield a float64 output, and BITPIX 64 would have to be annotated as an exception, since we can't rely on a float128 being available), you have something that sounds defensible, but you still have another problem:

2) The unsigned-integer (and signed byte) special case

The standard also recommends the use of special values of BZERO to allow representation of unsigned 16/32/64-bit integers (and also signed 8-bit integers):

"... the BZERO keyword is also used when storing unsigned-integer values in the FITS array. In this special case the BSCALE keyword *shall* have the default value of 1.0, and the BZERO keyword *shall* have one of the integer values shown in Table 11."

Again, the spirit of this is obviously that if BSCALE is present with the value 1.0, and BITPIX is 32, say, and BZERO is 2147483648, the returned data should have type uint32, not some floating-point type.  What is a client library supposed to do if BZERO is 2147483648.0?  The same?  What if it's 2147483648.0000000000000000000000000001?  (In other words, is it OK if the client library reads the RHS of the BZERO header into an internal float64 first, before interpreting it, or is it supposed to handle the RHS of the BZERO header as a string and compare it only to exactly the value in Table 11?)

Note that in this case the standard actually appears to say point blank that a BSCALE of 1.0 *shall* be supplied; it certainly doesn't say, e.g., "the BSCALE keyword shall be omitted".
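The difference between the two readings can be sketched in Python. The function names are hypothetical; the BZERO values are those of Table 11, and the "D"-to-"E" exponent mapping is again an assumption of this sketch. One check converts the header value to float64 first, the other compares the decimal value exactly:

```python
from decimal import Decimal, InvalidOperation

# BZERO values from Table 11 of the FITS standard, keyed by BITPIX.
# BITPIX 8 yields signed bytes; 16/32/64 yield unsigned integers.
TABLE_11_BZERO = {8: -128, 16: 32768, 32: 2147483648, 64: 9223372036854775808}

def matches_via_float64(bitpix, text):
    """Compare after converting the header value to float64 (lossy)."""
    value = float(text.strip().upper().replace("D", "E"))
    return value == float(TABLE_11_BZERO[bitpix])

def matches_exactly(bitpix, text):
    """Compare the decimal value exactly, with no binary rounding."""
    try:
        value = Decimal(text.strip().upper().replace("D", "E"))
    except InvalidOperation:
        return False
    return value == TABLE_11_BZERO[bitpix]

for bitpix, text in [(32, "2147483648"),
                     (32, "2147483648.0"),
                     (32, "2147483648.0000000000000000000000000001"),
                     (64, "9223372036854775807")]:
    print(bitpix, text, matches_via_float64(bitpix, text),
          matches_exactly(bitpix, text))
```

The last two cases are accepted by the float64 comparison and rejected by the exact one. The BITPIX 64 case is off by a whole integer and still passes the float64 check, because float64 cannot distinguish 2^63 - 1 from 2^63.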

NB: I have not tried to search the fitsbits archive -- I would not be at all surprised if this had come up before.

Gregory

________________________________________
From: fitsbits <fitsbits-bounces at listmgr.nrao.edu> on behalf of Barrett, Paul via fitsbits <fitsbits at listmgr.nrao.edu>
Sent: Saturday, March 9, 2024 11:20
To: fitsbits at listmgr.nrao.edu
Subject: [fitsbits] Output array type when BZERO is an integer {External}

I'm writing a FITS package for the Julia programming language. I have a question about the output type of the image when BZERO is an integer value. The documentation implies that the output image should be a floating point type because the BSCALE value is a float. Is this correct? If yes, then I recommend stating this explicitly in the FITS standard documentation. I also recommend suggesting the appropriate output type depending on the input type, e.g., UInt8 => Float32, Int16 => Float32, etc.
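One possible way to derive such a mapping, sketched here in Python under the assumption that the goal is "the smallest float type in which every input integer stays distinct", is to compare the integer width with the IEEE 754 significand precision. The function names are hypothetical:

```python
import struct

# Significand precision in bits, including the implicit leading bit (IEEE 754).
FLOAT_PRECISION = {"float32": 24, "float64": 53}

def lossless_float_for(bitpix):
    """Smallest float type that keeps every BITPIX-bit integer distinct, or None.

    Sufficient condition: all integers of magnitude up to 2**bits are exact.
    """
    for name, bits in FLOAT_PRECISION.items():
        if bitpix <= bits:
            return name
    return None

def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

for bitpix in (8, 16, 32, 64):
    print(bitpix, lossless_float_for(bitpix))

# Distinct 32-bit integers can collapse in float32 ...
print(to_float32(float(2**24)) == to_float32(float(2**24 + 1)))
# ... and distinct 64-bit integers can collapse in float64.
print(float(2**53) == float(2**53 + 1))
```

Under that assumption, 8-bit and 16-bit inputs fit in Float32, 32-bit inputs need Float64, and 64-bit inputs have no lossless target among these two types.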

Thanks,
Paul