<div dir="ltr">I asked this question because I had a similar problem with radio astronomy's Measurement Set (MS) data format. I spent a lot of time trying to decipher the C++ code in order to understand the format, because there was no document that formalized the standard. The standard is the code. In several cases, I deciphered a hundred or so lines of C++ code only to find that I could provide the same functionality in a single line of code. I eventually gave up. So my point is that code should not be the standard: a clear and concise document should be. If you want people to use it, then it is incumbent on the proponents to produce such a document, just as it is incumbent on me as a proponent of Julia to provide the necessary software to do astronomy. I am sorry to say that I am not a proponent of FITS, so I don't believe that it is incumbent on me to produce such a document. I'm writing FITS.jl out of necessity so that I and others can do science. Personally, I would prefer a more modern data format.<div><br></div><div>FITS.jl will eventually support tile compression, assuming that it is documented in the standard and I can understand it.</div><div><br></div><div>Here is another suggestion for improving the document. Group keywords, both required and optional, as they might appear in a heterogeneous or composite data type (e.g., a C struct). All modern languages have such data types. Discussing the keywords as a group, and showing how their presence, absence, and values affect the behaviour of that group, would make the standard easier to understand and would provide hints to the developer. This is what I do when reading through the standard. I try to determine which keywords belong together in a composite type and how keywords modify the behaviour of that type. 
This will make it easy for any computer scientist to implement new code, because that is the way they are trained to think.</div><div><br></div><div> -- Paul</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 11, 2024 at 1:01 PM Dubois-Felsmann, Gregory P. <<a href="mailto:gpdf@ipac.caltech.edu">gpdf@ipac.caltech.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi, Paul,<br>
<br>
I agree entirely with you that these matters should be formally clarified. I don't have a lot of experience turning the crank of the FITS-standard engine, however, so I'd be hard-pressed to give you an estimate on how long this might take. I do note that the history on <a href="https://fits.gsfc.nasa.gov/fits_standard.html" rel="noreferrer" target="_blank">https://fits.gsfc.nasa.gov/fits_standard.html</a> indicates that the last time a point release was created was 19 years ago, and there's no recent history of "errata" or similar clarifications.<br>
<br>
I may be over-interpreting what either you or Rob said in this thread, but I didn't think Rob was suggesting in "other languages and libraries should start with the CFITSIO source for appropriate usage" that you wrap CFITSIO in Julia, but rather that you use it as a de-facto guide to the resolution of ambiguities. I can't whole-heartedly endorse that, because I don't think the pressure should be taken off the standard to evolve to become more precise, but it is a realistic suggestion for you to use in order to keep working on your implementation: CFITSIO is very heavily used and I would judge it to be relatively unlikely that the community would choose a clarification to the standard that was inconsistent with its behavior in any mainstream situation.<br>
<br>
Gregory<br>
<br>
P.S. I certainly agree with Rob in strongly encouraging you to implement tile compression. Both of my projects (Rubin and SPHEREx) will generate public data products with compressed image extensions.<br>
<br>
--<br>
Gregory Dubois-Felsmann | Senior Staff Scientist | Caltech/IPAC<br>
Science Platform Scientist, Vera C. Rubin Observatory<br>
Pipeline System Designer, NASA SPHEREx mission<br>
Mail Code MR 100-22 | Pasadena, CA 91125-2200 | <a href="mailto:gpdf@ipac.caltech.edu" target="_blank">gpdf@ipac.caltech.edu</a><br>
<br>
<br>
<br>
________________________________________<br>
From: Barrett, Paul <<a href="mailto:pebarrett@email.gwu.edu" target="_blank">pebarrett@email.gwu.edu</a>><br>
Sent: Monday, March 11, 2024 09:34<br>
To: Dubois-Felsmann, Gregory P.<br>
Cc: <a href="mailto:fitsbits@listmgr.nrao.edu" target="_blank">fitsbits@listmgr.nrao.edu</a><br>
Subject: Re: [fitsbits] Output array type when BZERO is an integer {External}<br>
<br>
Greg,<br>
<br>
Thanks for clarifying the impact of this issue. You have clarified several points. It appears that my understanding of the document is different from what you have described for some keyword cases.<br>
<br>
Explicitly specifying the type of the output array, or the minimum type of the output array, in the document would make it much easier to implement libraries for new languages. Specifying the behaviour when a keyword is present or absent would also make implementation easier.<br>
<br>
Julia is a very concise, yet high performance, programming language, so it doesn't require a lot of code to implement the FITS standard. Because of this, I spend more time trying to understand the FITS standard than I do writing the code. This should not be the case. There are two reasons for having a native Julia FITS package. First, Julia has an excellent package manager, which makes it easy to install and maintain libraries or packages. It can be cumbersome to have to manage and maintain packages that wrap libraries in other languages. Second, Julia is a concise high performance array language, so its performance is comparable to, and likely to be better than, that of C/C++ or FORTRAN, with fewer lines of code. Because development of the package is in its early stages, I have not benchmarked it against CFITSIO, but I would not be surprised to see that it is faster in most cases.<br>
<br>
-- Paul<br>
<br>
<br>
<br>
On Sun, Mar 10, 2024 at 5:41 PM Dubois-Felsmann, Gregory P. <<a href="mailto:gpdf@ipac.caltech.edu" target="_blank">gpdf@ipac.caltech.edu</a>> wrote:<br>
This is a bit of a logical/legal hole in the FITS standard, for a couple of reasons given below. I agree with Paul that the best solution would be to issue a clarification.<br>
<br>
I think it's an excellent moment for getting this right, when implementing a new client library from scratch instead of adiabatically as most of our others have been.<br>
<br>
These questions also arise in the behavior of client applications -- for instance, IPAC Firefly, which I have some responsibility for -- so this isn't purely an issue for theoretical discussion.<br>
<br>
1) Lack of clarity about the interpretation of missing headers and when the scaling should be applied at all<br>
<br>
The FITS 4.00 standard specifies that BZERO and BSCALE both "shall" be floating point values, with defaults of "0.0" and "1.0", respectively -- and there's no discussion of the absence of BZERO and BSCALE being treated as a special case. Thus, if you take this completely literally, it means that the inherently floating-point scaling operation is *always* performed (even when it's mathematically a no-op) and that the result should therefore *always* be a floating-point array. That is obviously not the spirit of the standard! It has to be possible to deliver an integer array as the output, which means we need a specification of how to trigger that.<br>
<br>
Even then we are left to ask the question: if the absence of BZERO and BSCALE is supposed to trigger returning the array as its specified integer format, what is *explicit* specification of BZERO = 0.0 and BSCALE = 1.0 supposed to do? The most legalistic reading of the standard is that the absence of a keyword and the explicit presence of the default value for that keyword are supposed to be treated the same way, but then we have to define what it means for the explicitly specified value to be the same as the default. The text says "1.0"; which of the following are equivalent: "1", "1.", "+1.0D0", "1.000000001" (equal to 1 in float32), "1.0000000000000000000000001", "0.99999999999999999999999999"?<br>
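<br>
To make the equivalence question concrete, the following Python sketch parses each candidate string into a float64 and then rounds it to float32. The helper names parse_fits_float and to_float32 are hypothetical, and mapping a FORTRAN-style "D" exponent to "E" is an assumption about how a reader might handle it; this is an illustration, not behaviour prescribed by the standard.<br>
<pre>
```python
import struct

# The candidate spellings of BSCALE listed above.
CANDIDATES = [
    "1",
    "1.",
    "+1.0D0",
    "1.000000001",
    "1.0000000000000000000000001",
    "0.99999999999999999999999999",
]


def parse_fits_float(text):
    """Parse a header value into a Python float (float64).

    Assumption: a 'D' exponent is treated the same as an 'E' exponent.
    """
    return float(text.strip().upper().replace("D", "E"))


def to_float32(value):
    """Round a float64 to the nearest float32 and return it as a float."""
    return struct.unpack("f", struct.pack("f", value))[0]


for text in CANDIDATES:
    as64 = parse_fits_float(text)
    # Columns: raw text, equal to 1.0 in float64, equal to 1.0 in float32.
    print(text, as64 == 1.0, to_float32(as64) == 1.0)
```
</pre>
Under these assumptions only "1.000000001" differs from 1.0 in float64, and none of the six differ in float32, so the answer depends on the precision a library happens to parse into.<br>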
<br>
If, as I do, you feel queasy about testing for floating-point equality in this situation, and you think "OK, the rule we should publish is:<br>
<br>
'if either BZERO or BSCALE is present in the header, even if they are exactly 0.0 and 1.0 respectively, return the array in a floating-point format in which all possible values of the input are distinct in the output, if possible'"<br>
<br>
(meaning that BITPIX 32 would yield a float64 output, and BITPIX 64 would have to be annotated as an exception, since we can't rely on a float128 being available), you have something that sounds defensible, but you still have another problem:<br>
<br>
2) The unsigned-integer (and signed byte) special case<br>
<br>
The standard also recommends the use of special values of BZERO to allow representation of unsigned 16/32/64-bit integers (and also signed 8-bit integers):<br>
<br>
"... the BZERO keyword is also used when storing unsigned-integer values in the FITS array. In this special case the BSCALE keyword *shall* have the default value of 1.0, and the BZERO keyword *shall* have one of the integer values shown in Table 11."<br>
<br>
Again, the spirit of this is obviously that if BSCALE is present with the value 1.0, and BITPIX is 32, say, and BZERO is 2147483648, the returned data should have type uint32, not some floating-point type. What is a client library supposed to do if BZERO is 2147483648.0? The same? What if it's 2147483648.0000000000000000000000000001? (In other words, is it OK if the client library reads the RHS of the BZERO header into an internal float64 first, before interpreting it, or is it supposed to handle the RHS of the BZERO header as a string and compare it only to exactly the value in Table 11?)<br>
<br>
Note that in this case the standard actually appears to say point blank that a BSCALE of 1.0 *shall* be supplied; it certainly doesn't say, e.g., "the BSCALE keyword shall be omitted".<br>
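<br>
One possible decision rule combining points 1) and 2) can be sketched in Python as follows. Everything here is an assumption made for illustration: the function name output_type, the type names, the choice of float type per BITPIX (float64 for BITPIX 64 is the lossy exception mentioned above), and the "exact" switch that contrasts decimal comparison of the header text with parsing through float64 first. Only the BZERO offsets are taken from Table 11, and only positive BITPIX values are handled.<br>
<pre>
```python
from decimal import Decimal

# BITPIX: (BZERO offset from Table 11, implied output type).
TABLE_11 = {
    8: (Decimal(-128), "int8"),
    16: (Decimal(32768), "uint16"),
    32: (Decimal(2147483648), "uint32"),
    64: (Decimal(9223372036854775808), "uint64"),
}
# Type of the stored array when no scaling keywords are present.
NATIVE = {8: "uint8", 16: "int16", 32: "int32", 64: "int64"}
# Hypothetical float choice; float64 cannot hold every int64 distinctly.
FLOAT_FOR = {8: "float32", 16: "float32", 32: "float64", 64: "float64"}


def parse_exact(text):
    """Parse header text as an exact decimal ('D' exponent read as 'E')."""
    return Decimal(text.strip().upper().replace("D", "E"))


def output_type(bitpix, bzero=None, bscale=None, exact=True):
    """Return an output type name for an integer image.

    bzero and bscale are raw header strings, or None when absent.
    exact=True compares the decimal text; exact=False goes via float64.
    """
    if bzero is None and bscale is None:
        return NATIVE[bitpix]

    def convert(text):
        value = parse_exact(text)
        return value if exact else Decimal(float(value))

    zero = convert(bzero) if bzero is not None else Decimal(0)
    scale = convert(bscale) if bscale is not None else Decimal(1)
    offset, special = TABLE_11[bitpix]
    if scale == 1 and zero == offset:
        return special
    return FLOAT_FOR[bitpix]


long_form = "2147483648.0000000000000000000000000001"
print(output_type(32))                                  # no keywords
print(output_type(32, "2147483648", "1.0"))             # Table 11 value
print(output_type(32, long_form, "1.0"))                # exact comparison
print(output_type(32, long_form, "1.0", exact=False))   # via float64
```
</pre>
With the exact comparison the long form of BZERO falls through to a floating-point result, whereas parsing through float64 first makes it indistinguishable from the Table 11 value; the two readings of the standard therefore produce different output types for the same header.<br>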
<br>
NB: I have not tried to search the fitsbits archive -- I would not be at all surprised if this had come up before.<br>
<br>
Gregory<br>
<br>
________________________________________<br>
From: fitsbits <<a href="mailto:fitsbits-bounces@listmgr.nrao.edu" target="_blank">fitsbits-bounces@listmgr.nrao.edu</a>> on behalf of Barrett, Paul via fitsbits <<a href="mailto:fitsbits@listmgr.nrao.edu" target="_blank">fitsbits@listmgr.nrao.edu</a>><br>
Sent: Saturday, March 9, 2024 11:20<br>
To: <a href="mailto:fitsbits@listmgr.nrao.edu" target="_blank">fitsbits@listmgr.nrao.edu</a><br>
Subject: [fitsbits] Output array type when BZERO is an integer {External}<br>
<br>
I'm writing a FITS package for the Julia programming language. I have a question about the output type of the image when BZERO is an integer value. The documentation implies that the output image should be a floating point type because the BSCALE value is a float. Is this correct? If yes, then I recommend stating this explicitly in the FITS standard documentation. I also recommend suggesting the appropriate output type depending on the input type, e.g., UInt8 => Float32, Int16 => Float32, etc.<br>
<br>
Thanks,<br>
Paul<br>
<br>
</blockquote></div>