[fitsbits] Output array type when BZERO is an integer {External}

Mark Taylor m.b.taylor at bristol.ac.uk
Tue Mar 12 06:02:05 EDT 2024


Looking at the other comments in this thread I may be in a minority, 
but I don't see much problem with the existing text around BZERO 
and BSCALE (or TZEROn and TSCALn, which I've spent more time 
implementing against).  I would say it's the job of the FITS standard 
to explain how values are represented in the serialization, and the 
job of implementors to make language-appropriate choices about how 
to decode that serialization into language-appropriate data structures.

If the standard says that under certain circumstances you should decode
an array as a 32-bit IEEE754 datatype, and your language doesn't have
such a datatype, then you can't strictly speaking write a FITS decoder;
I think that would be regrettable.

To me it's clear that if BZERO is (implicitly or explicitly) zero
and BSCALE is (implicitly or explicitly) one for a given positive 
BITPIX then I understand the range of values that can be represented 
in the data and I will make an appropriate choice about how to 
represent those values to the user.
And if an author really writes BSCALE=0.99999999999999999999999999
I don't much care if they end up with a suboptimal representation
of their data from my library.
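
As a rough illustration of that kind of implementor's choice, here is a
minimal Python sketch. It is not taken from any existing library: the
function name, the type-name strings and the floating-point fallback
policy are all assumptions made for the example. Only the special BZERO
values are from the standard (Table 11).

```python
# Hypothetical helper: the names and the fallback policy are illustrative
# assumptions, not behaviour required by the FITS standard.

# Native integer interpretation of each positive BITPIX.
NATIVE_TYPE = {8: "uint8", 16: "int16", 32: "int32", 64: "int64"}

# Special BZERO values from Table 11 of the FITS standard, keyed by BITPIX,
# with the integer type they are conventionally used to represent.
OFFSET_TYPE = {
    8: (-128, "int8"),
    16: (32768, "uint16"),
    32: (2147483648, "uint32"),
    64: (9223372036854775808, "uint64"),
}


def output_type(bitpix, bzero=0.0, bscale=1.0):
    """Pick a type name for decoded pixels of an integer-BITPIX array.

    A missing keyword and an explicit default are treated alike, because
    the caller passes the default value in both cases.
    """
    if bitpix not in NATIVE_TYPE:
        raise ValueError("only positive integer BITPIX values are handled")
    if bscale == 1.0:
        if bzero == 0.0:
            return NATIVE_TYPE[bitpix]
        offset, name = OFFSET_TYPE[bitpix]
        if bzero == offset:
            return name
    # Any other scaling: fall back to floating point (an arbitrary policy).
    return "float32" if bitpix <= 16 else "float64"
```

With this sketch, output_type(16) gives "int16", output_type(32,
bzero=2147483648.0) gives "uint32", and any non-trivial scaling falls
back to a floating-point type. A decoder for a language with different
numeric types would simply fill in the tables differently.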

If we really want clarifications along the lines suggested in this
thread I would much prefer to see them in some kind of ancillary 
document like an implementation note, where they can be used as an 
aid to implementors for as long as they make sense, rather than 
included in the standard where they may inhibit future development 
of the standard or confuse implementors who come with different 
language-based constraints or expectations in the future.  
FITS has been stable over a very long period, and I wouldn't like 
to see assumptions built into the text that make sense today but 
may inhibit that longevity if such assumptions become invalid in
the future.

(I also agree that existing code should not define the format;
I found implementing from scratch against the FITS 3 standard quite
straightforward, while like Paul I gave up attempting to implement 
a reader for the code-defined Measurement Set format).

Mark

On Mon, 11 Mar 2024, Barrett, Paul via fitsbits wrote:

> The reason that I asked this question is because I had a similar problem
> with radio astronomy's Measurement Set (MS) data format. I spent a lot
> of time trying to decipher the C++ code in order to understand the format,
> because there was no document that formalized the standard. The standard is
> the code. In several cases, I was able to decipher the hundred or so lines
> of C++ code to find out that I could provide the same functionality in a
> single line of code. I eventually gave up. So my point is that code should
> not be the standard: a clear and concise document should be. If you want
> people to use it, then it is incumbent on the proponents to produce such a
> document. Just as it is incumbent on me as a proponent of Julia to provide
> the necessary software to do astronomy. I am sorry to say that I am not a
> proponent of FITS, so I don't believe that it is incumbent on me to produce
> such a document. I'm writing FITS.jl out of necessity so that I and others
> can do science. Personally, I would prefer a more modern data format.
> 
> FITS.jl will eventually support tile compression, assuming that it is
> documented in the standard and I can understand it.
> 
> Here is another suggestion for improving the document. Group keywords, both
> required and optional, as they might appear in a heterogeneous or composite
> data type (i.e., a C struct). All modern languages have such data types. By
> discussing them as a group and showing how their presence or absence and
> values affect their behaviour, it will make it easier to understand the
> standard and provide hints to the developer. This is what I do when reading
> through the standard. I try to determine which keywords belong together in
> a composite type and how keywords modify the behaviour of that type. This
> will make it easy for any computer scientist to implement new code, because
> that is the way they are trained to think.
> 
>  -- Paul
> 
> On Mon, Mar 11, 2024 at 1:01 PM Dubois-Felsmann, Gregory P. <
> gpdf at ipac.caltech.edu> wrote:
> 
> > Hi, Paul,
> >
> > I agree entirely with you that these matters should be formally
> > clarified.  I don't have a lot of experience turning the crank of the
> > FITS-standard engine, however, so I'd be hard-pressed to give you an
> > estimate on how long this might take.  I do note that the history on
> > https://fits.gsfc.nasa.gov/fits_standard.html indicates that the last
> > time a point release was created was 19 years ago, and there's no recent
> > history of "errata" or similar clarifications.
> >
> > I may be over-interpreting what either you or Rob said in this thread, but
> > I didn't think Rob was suggesting in "other languages and libraries should
> > start with the CFITSIO source for appropriate usage" that you wrap CFITSIO
> > in Julia, but rather that you use it as a de-facto guide to the resolution
> > of ambiguities.  I can't whole-heartedly endorse that, because I don't
> > think the pressure should be taken off the standard to evolve to become
> > more precise, but it is a realistic suggestion for you to use in order to
> > keep working on your implementation: CFITSIO is very heavily used and I
> > would judge it to be relatively unlikely that the community would choose a
> > clarification to the standard that was inconsistent with its behavior in
> > any mainstream situation.
> >
> > Gregory
> >
> > P.S.  I certainly agree with Rob in strongly encouraging you to implement
> > tile compression.  Both of my projects (Rubin and SPHEREx) will generate
> > public data products with compressed image extensions.
> >
> > --
> > Gregory Dubois-Felsmann | Senior Staff Scientist | Caltech/IPAC
> > Science Platform Scientist, Vera C. Rubin Observatory
> > Pipeline System Designer, NASA SPHEREx mission
> > Mail Code MR 100-22 | Pasadena, CA 91125-2200 | gpdf at ipac.caltech.edu
> >
> >
> >
> > ________________________________________
> > From: Barrett, Paul <pebarrett at email.gwu.edu>
> > Sent: Monday, March 11, 2024 09:34
> > To: Dubois-Felsmann, Gregory P.
> > Cc: fitsbits at listmgr.nrao.edu
> > Subject: Re: [fitsbits] Output array type when BZERO is an integer
> > {External}
> >
> > Greg,
> >
> > Thanks for clarifying the impact of this issue. You have clarified several
> > points. It appears that my understanding of the document is different from
> > what you have described for some keyword cases.
> >
> > By explicitly specifying the type of the output array or the minimum type
> > of the output array in the document, it makes it much easier to implement
> > libraries for new languages. In addition, specifying the behaviour when
> > a keyword is present or absent would also make implementation easier.
> >
> > Julia is a very concise, yet high performance, programming language, so it
> > doesn't require a lot of code to implement the FITS standard. Because of
> > this, I spend more time trying to understand the FITS standard than I do
> > writing the code. This should not be the case. There are two reasons for
> > having a native Julia FITS package. First, Julia has an excellent package
> > manager, which makes it easy to install and maintain libraries or packages.
> > It can be cumbersome to have to manage and maintain packages that wrap
> > libraries in other languages. Second, Julia is a concise high performance
> > array language, so its performance is comparable to, and likely to be
> > better than, C/C++ or FORTRAN with fewer lines of code. Because development of the
> > package is in its early stages, I have not benchmarked it against CFITSIO,
> > but I would not be surprised to see that it is faster in most cases.
> >
> >  -- Paul
> >
> >
> >
> > On Sun, Mar 10, 2024 at 5:41 PM Dubois-Felsmann, Gregory P. <
> > gpdf at ipac.caltech.edu> wrote:
> > This is a bit of a logical/legal hole in the FITS standard, for a couple
> > of reasons given below.  I agree with Paul that the best solution would be
> > to issue a clarification.
> >
> > I think it's an excellent moment for getting this right, when implementing
> > a new client library from scratch instead of adiabatically as most of our
> > others have been.
> >
> > These questions also arise in the behavior of client applications -- for
> > instance, IPAC Firefly, which I have some responsibility for -- so this
> > isn't purely an issue for theoretical discussion.
> >
> > 1) Lack of clarity about the interpretation of missing headers and when
> > the scaling should be applied at all
> >
> > The FITS 4.00 standard specifies that BZERO and BSCALE both "shall" be
> > floating point values, with defaults of "0.0" and "1.0", respectively --
> > and there's no discussion of the absence of BZERO and BSCALE being treated
> > as a special case.  Thus, if you take this completely literally, it means
> > that the inherently floating-point scaling operation is *always* performed
> > (even when it's mathematically a no-op) and that the result should
> > therefore *always* be a floating-point array.  That is obviously not the
> > spirit of the standard!  It has to be possible to deliver an integer array
> > as the output, which means we need a specification of how to trigger that.
> >
> > Even then we are left to ask the question: if the absence of BZERO and
> > BSCALE is supposed to trigger returning the array as its specified integer
> > format, what is *explicit* specification of BZERO = 0.0 and BSCALE = 1.0
> > supposed to do?  The most legalistic reading of the standard is that the
> > absence of a keyword and the explicit presence of the default value for
> > that keyword are supposed to be treated the same way, but then we have to
> > define what it means for the explicitly specified value to be the same as
> > the default.  The text says "1.0"; which of the following are equivalent:
> > "1", "1.", "+1.0D0", "1.000000001" (equal to 1 in float32),
> > "1.0000000000000000000000001", "0.99999999999999999999999999"?
> >
> > If, as I do, you feel queasy about testing for floating-point equality in
> > this situation, and you think "OK, the rule we should publish is:
> >
> > 'if either BZERO or BSCALE are present in the header, even if they are
> > exactly 0.0 and 1.0 respectively, return the array in a floating-point
> > format in which all possible values of the input are distinct in the
> > output, if possible'"
> >
> > (meaning that BITPIX 32 would yield a float64 output, and BITPIX 64 would
> > have to be annotated as an exception, since we can't rely on a float128
> > being available), you have something that sounds defensible, but you still
> > have another problem:
> >
> > 2) The unsigned-integer (and signed byte) special case
> >
> > The standard also recommends the use of special values of BZERO to allow
> > representation of unsigned 16/32/64-bit integers (and also signed 8-bit
> > integers):
> >
> > "... the BZERO keyword is also used when storing unsigned-integer values
> > in the FITS array. In this special case the BSCALE keyword *shall* have the
> > default value of 1.0, and the BZERO keyword *shall* have one of the integer
> > values shown in Table 11."
> >
> > Again, the spirit of this is obviously that if BSCALE is present with the
> > value 1.0, and BITPIX is 32, say, and BZERO is 2147483648, the returned
> > data should have type uint32, not some floating-point type.  What is a
> > client library supposed to do if BZERO is 2147483648.0?  The same?  What if
> > it's 2147483648.0000000000000000000000000001?  (In other words, is it OK if
> > the client library reads in the RHS of the BZERO header into an internal
> > float64 first, before interpreting it, or is it supposed to handle the RHS
> > of the BZERO header as a string and compare it only to exactly the value in
> > Table 11?)
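
To make the two readings concrete, the following Python sketch compares
a header value string against a reference either exactly (as a decimal)
or after rounding to float64. The helper names are hypothetical and the
parsing is deliberately simplistic: it only maps the FITS "D" exponent
marker to "E" and does not validate the rest of the value syntax.

```python
from decimal import Decimal


def parse_fits_number(text):
    """Parse a header value string as an exact decimal number.

    FITS permits a 'D' exponent marker, which Decimal does not accept,
    so it is mapped to 'E' first (assumed to be sufficient here).
    """
    return Decimal(text.strip().upper().replace("D", "E"))


def equal_exactly(text, reference):
    """Compare the written value to an integer reference with no rounding."""
    return parse_fits_number(text) == Decimal(reference)


def equal_as_float64(text, reference):
    """Compare after rounding the written value to an IEEE 754 double."""
    return float(parse_fits_number(text)) == float(reference)


candidates = ["1", "1.", "+1.0D0", "1.000000001",
              "1.0000000000000000000000001", "0.99999999999999999999999999"]
exact = [equal_exactly(c, 1) for c in candidates]
rounded = [equal_as_float64(c, 1) for c in candidates]
```

Under these assumptions the first three candidates match 1 either way,
"1.000000001" matches neither way, and the last two match only after
rounding to float64. Likewise "2147483648.0000000000000000000000000001"
equals 2147483648 as a float64 but not as an exact decimal.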
> >
> > Note that in this case the standard actually appears to say point blank
> > that a BSCALE of 1.0 *shall* be supplied; it certainly doesn't say, e.g.,
> > "the BSCALE keyword shall be omitted".
> >
> > NB: I have not tried to search the fitsbits archive -- I would not be at
> > all surprised if this had come up before.
> >
> > Gregory
> >
> > ________________________________________
> > From: fitsbits <fitsbits-bounces at listmgr.nrao.edu> on behalf of
> > Barrett, Paul via fitsbits <fitsbits at listmgr.nrao.edu>
> > Sent: Saturday, March 9, 2024 11:20
> > To: fitsbits at listmgr.nrao.edu
> > Subject: [fitsbits] Output array type when BZERO is an integer {External}
> >
> > I'm writing a FITS package for the Julia programming language. I have a
> > question about the output type of the image when BZERO is an integer value.
> > The documentation implies that the output image should be a floating point
> > type because the BSCALE value is a float. Is this correct? If yes, then I
> > recommend stating this explicitly in the FITS standard documentation. I
> > also recommend suggesting the appropriate output type depending on the
> > input type, e.g., UInt8 => Float32, Int16 => Float32, etc.
> >
> > Thanks,
> > Paul
> >
> >
> 

--
Mark Taylor  Astronomical Programmer  Physics, Bristol University, UK
m.b.taylor at bristol.ac.uk          https://www.star.bristol.ac.uk/mbt/

