[fitsbits] Output array type when BZERO is an integer {External}
Jonas
jonas.repo at protonmail.com
Sun Mar 24 07:47:15 EDT 2024
Thank you for this info.
>
Hi Gregory and all,
As a matter of curiosity, do Rubin operations depend on 64-bit unsigned integers? What are example use cases for 64-bit integers (signed or unsigned) in the community? In the optical and infrared, I would hazard a guess that, by far, the most prevalent raw and pipeline-reduced astronomical pixel data types are unsigned shorts, signed 32-bit integers, and 32-bit floating point, but a greater diversity of data types must appear in binary tables.
Is there a more recent version of a FITS User's Guide than https://archive.stsci.edu/fits/users_guide/ ? Or are there examples of such documentation tailored for particular observatories, projects, purposes, or stakeholders?
Rob
On 3/12/24, 11:52 AM, "Dubois-Felsmann, Gregory P." wrote:
I think what we're hearing from the more experienced hands is that the standard isn't at all concerned with issues like mandating upconversion. In general it specifies the mathematical operation, not the computational one, and it never mandates a specific in-memory representation in a client.
But I think there is some guidance at the end of the BZERO section in 4.4.2.5: once the client software has made the (underspecified) determination that BZERO is from Table 11 and BSCALE has "the default value of 1.0", it says "the physical value is computed by adding the offset specified by the BZERO keyword to the native data type value that is stored in the FITS file". Note that BSCALE is explicitly left out of this; i.e., in this case the standard isn't just relying on the mathematical no-op of multiplying by one, but is saying explicitly that the client should ignore BSCALE in the calculation.
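The special case described here (physical value = native value + BZERO, with BSCALE ignored) can be sketched with NumPy. This is an illustration under assumptions, not a reference implementation: it assumes the stored pixels are already in a NumPy signed-integer array (big-endian, as FITS stores them), it handles only the unsigned offsets 2**15, 2**31 and 2**63 for BITPIX 16, 32 and 64, and `apply_integer_bzero` is a hypothetical helper name. Because adding 2**(bits-1) modulo 2**bits simply flips the sign bit, the result can stay an exact unsigned integer with no floating-point step at all.

```python
import numpy as np

def apply_integer_bzero(stored: np.ndarray, bzero: int) -> np.ndarray:
    """Hypothetical helper: physical = stored + BZERO, kept as exact integers.

    Only the unsigned-integer offsets (2**15, 2**31, 2**63 for BITPIX
    16, 32, 64) are handled; BSCALE is deliberately not used.
    """
    bits = stored.dtype.itemsize * 8
    if stored.dtype.kind != "i" or bzero != 2 ** (bits - 1):
        raise ValueError("not an unsigned-integer offset for this BITPIX")
    # Reinterpret the two's-complement bits as unsigned (same byte order),
    # then flip the top bit: that equals adding 2**(bits-1) modulo 2**bits.
    unsigned = stored.view(stored.dtype.str.replace("i", "u"))
    return unsigned ^ unsigned.dtype.type(1 << (bits - 1))

stored16 = np.array([-32768, 0, 32767], dtype=">i2")
print(apply_integer_bzero(stored16, 32768).tolist())  # [0, 32768, 65535]

stored64 = np.array([-2**63, 2**63 - 1], dtype=">i8")
print(apply_integer_bzero(stored64, 2**63).tolist())  # [0, 18446744073709551615]
```

Working on the bit pattern rather than on a widened type matters for BITPIX = 64, where there is no wider native integer to add into.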
We haven't really talked about it, but all the same issues arise with regard to table columns, because of the similar definition of the currently very rarely used TSCALn and TZEROn keywords as having "default" values, rather than distinguishing between the "provided explicitly" and "not provided" cases.
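The table-column version of the same question can be sketched as a lookup from the TFORMn type code to the TZEROn value that signals an unsigned column (or, for code B, a signed-byte column). This is a hypothetical reader rule, not library code: `integer_column_type` is an illustrative name, the TFORMn repeat count is assumed to have been stripped already, and an absent TSCALn/TZEROn is treated as the default 1.0/0.0, which is exactly the "default value" behavior being questioned above.

```python
# Native integer types for the binary-table codes (TZEROn absent or 0).
NATIVE_TYPES = {"B": "uint8", "I": "int16", "J": "int32", "K": "int64"}

# TZEROn offsets that turn the stored type into its opposite-signedness twin.
OFFSET_TYPES = {
    "B": (-128, "int8"),
    "I": (32768, "uint16"),
    "J": (2147483648, "uint32"),
    "K": (9223372036854775808, "uint64"),
}

def integer_column_type(code, tzero=None, tscal=None):
    """Hypothetical rule: integer dtype name for a column, or None if it
    should be treated as generic scaled (floating-point) data."""
    if code not in NATIVE_TYPES:
        return None                      # not an integer column at all
    if tscal is not None and tscal != 1:
        return None                      # real scaling requested
    if tzero is None or tzero == 0:
        return NATIVE_TYPES[code]        # plain signed (or uint8) column
    offset, name = OFFSET_TYPES[code]
    return name if tzero == offset else None
```

Note that `32768.0 == 32768` is true in Python, so this sketch cannot tell an integer-valued keyword from a real-valued one; it inherits the ambiguity described here.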
Again, as a data publisher, I know the difference between "I am publishing an integer column that I expect users to see as integral" versus "I am publishing a generic numeric column and I'm trying to save space by packing it into a 16-bit integer and scaling it". It would be nice to have a spec that says somewhat more precisely what I'm supposed to do to convey the former message unambiguously, particularly because if it's a 64-bit integer I very much indeed want to send client software a signal that I don't want it to be accidentally "promoted" to 64-bit float.
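The risk of an accidental "promotion" to 64-bit float is easy to demonstrate: an IEEE 754 double has a 53-bit significand, so not every integer above 2**53 can be represented exactly. A minimal NumPy illustration (the sample values are arbitrary):

```python
import numpy as np

# Three 64-bit integers; only the first is exactly representable as float64.
values = np.array([2**53, 2**53 + 1, 2**62 + 1], dtype=np.int64)

# Convert to float64 and back, as a client that "promotes" the column might.
as_float = values.astype(np.float64)
round_trip = as_float.astype(np.int64)

# Nonzero differences mark values whose information was destroyed.
print(values - round_trip)  # [0 1 1]
```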
Obviously as a data provider I'm not going to troll my users by fiddling with my melodramatic 0.999999999999999s. For signed integers I'm going to omit BZERO/TZEROn and BSCALE/TSCALn altogether and I'm confident that Paul's library will do what I want no matter what we say on this thread.
But if I'm trying to publish an unsigned integer, I am genuinely uncertain whether a client's behavior will depend on whether I write "9223372036854775808" or "9223372036854775808.", and if it does, the consequence is potentially information-destroying. I will err on the side of caution and write the former, of course, but I'd rather have the standard on my side here.
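One way a reader could make this distinction is to inspect the literal text of the keyword value before converting it. The sketch below is a hypothetical rule, not something the standard prescribes, and `parse_bzero` is an illustrative name. It also shows why the text matters: once a value has been parsed as a double, "9223372036854775807." and "9223372036854775808." are indistinguishable.

```python
import re

# Hypothetical reader rule: an integer literal (no decimal point, no exponent)
# is kept as an exact Python int; anything else is parsed as a float.
INTEGER_LITERAL = re.compile(r"[+-]?\d+")

def parse_bzero(text: str):
    """Parse the value field of a BZERO card (comment already removed)."""
    text = text.strip()
    if INTEGER_LITERAL.fullmatch(text):
        return int(text)
    # FITS real values may use a 'D' exponent; Python's float() expects 'E'.
    return float(text.replace("D", "E"))

exact = parse_bzero("9223372036854775808")       # int, exactly 2**63
real = parse_bzero("9223372036854775808.")       # float
near_miss = parse_bzero("9223372036854775807.")  # float, rounds to 2**63

print(type(exact).__name__, type(real).__name__)  # int float
print(near_miss == 2**63)                          # True
```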
As a side remark, I hope we can discuss such things without making the people who put so much work into the existing standard feel denigrated, and without passing value judgements on the quality of anyone's work.
Gregory
________________________________________
From: fitsbits <fitsbits-bounces at listmgr.nrao.edu> on behalf of Barrett, Paul via fitsbits <fitsbits at listmgr.nrao.edu>
Sent: Tuesday, March 12, 2024 07:57
To: Seaman, Robert Lewis - (rseaman)
Cc: fitsbits at listmgr.nrao.edu
Subject: Re: [fitsbits] Output array type when BZERO is an integer {External}
I'll ask this question one more time and then I'll let it go.
I understand that the default behaviour for BZERO and BSCALE creates a floating point array because of the typical upconversion rules. However, I'm not clear about the data type for the special case where BZERO is an integer. In this case, it appears that BZERO is added first to the integer array before converting it to a floating point array, because BSCALE = 1.0 implies upconversion. Is this correct?
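For the 16-bit case, the two readings of this question can be compared directly. The sketch below (NumPy assumed, sample values arbitrary) computes the generic linear scaling in float64 and the integer-offset special case separately. For BITPIX = 16 the values agree and only the result type differs; the values themselves can diverge only once the integers no longer fit in a double's 53-bit significand, as with BITPIX = 64.

```python
import numpy as np

stored = np.array([-32768, -1, 0, 32767], dtype=">i2")
BZERO, BSCALE = 32768, 1.0

# Generic linear-scaling path: physical = BZERO + BSCALE * stored, in float64.
generic = BZERO + BSCALE * stored.astype(np.float64)

# Special-case path: add the integer offset without using BSCALE,
# widening first so the sum cannot overflow, then narrowing to uint16.
special = (stored.astype(np.int32) + BZERO).astype(np.uint16)

print(generic.dtype, special.dtype)      # float64 uint16
print(np.array_equal(generic, special))  # True: same values, different type
```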
As for your comments:
* I disagree with your first comment. FITS is used because of peer pressure. It is mandated by NASA. That means a large sector of the community HAS to use it.
* Yes, dynamic languages are dynamic enough. In the case of Julia, it can do everything that C/C++, FORTRAN, and Python can do. Think of Julia as Python with Numba built-in.
-- Paul
On Tue, Mar 12, 2024 at 9:39 AM Seaman, Robert Lewis - (rseaman) via fitsbits <fitsbits at listmgr.nrao.edu> wrote:
Howdy,
It is always good to see a spirited FITS discussion! A few more peppy points:
* There is always an assertion that it would be preferable to use a “modern” format
* Yet projects often end up using FITS
* This choice does not result from peer pressure
* There is nothing magic about IEEE floating point or twos-complement integers
* Efficient (compressed) data representations may not even be binary (Rice is unary)
* Are dynamically typed languages dynamic enough?
* A tile-compressed image is a simple binary table
* My first encounter with FITS data (c. 1983) was writing a FITS image reader from scratch by consulting the original journal article(s) (possibly also my first encounter with C)
* I am confident young Rob could have written a reader for tile-compressed binary data with little more effort (or code) just from reading the current FITS standard
* FITS documentation is pretty good
* (Comments about other projects’ documentation omitted)
* Most FITS discussions/disagreements are about metadata
* Only a small minority of FITS metadata is strictly required to enforce the structure of each extension
* Science metadata (astronomical and computer science) would be legal (and trivial) to represent, using any schema you like, in a binary table structure, described in a convention or appendix or chapter of the standard
* Schemata could also include language-specific pragma, for data-typing purposes or otherwise
* It is perhaps peer pressure that pushes projects to use 80-char ASCII header keywords in 2880-byte records
* Consider, rather, what is the optimal tiled representation for your project, and separately
* How can your project’s (and community) metadata best be represented in a schema realized as a binary table?
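The parenthetical "Rice is unary" refers to the way Rice codes write part of each value in unary. Below is a minimal sketch of a generic Golomb-Rice encoder with parameter k, returning a bit string for readability; `rice_encode` is an illustrative name. It shows only the core code: the FITS RICE_1 tile-compression algorithm layers more machinery on top (for example, it encodes differences between successive pixels), and none of that is reproduced here.

```python
def rice_encode(value: int, k: int) -> str:
    """Golomb-Rice code of a non-negative integer, as a string of '0'/'1'.

    The quotient value >> k is written in unary (that many '1's followed
    by a terminating '0'); the remainder is written as k binary digits.
    """
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k
    remainder = value & ((1 << k) - 1)
    unary = "1" * quotient + "0"
    binary = format(remainder, f"0{k}b") if k else ""
    return unary + binary

print(rice_encode(9, 2))  # 11001  (quotient 2 -> "110", remainder 1 -> "01")
```

Small values get short codes when k is chosen to match the typical magnitude of the data, which is why the parameter choice matters for compression.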
Rob
_______________________________________________
fitsbits mailing list
fitsbits at listmgr.nrao.edu
https://listmgr.nrao.edu/mailman/listinfo/fitsbits