[fitsbits] new FITS Compression value? {External}

Sat May 10 10:30:07 EDT 2025

Hi Josh and all,

Long time no chat. It’s good to meet you, Rithwik. I’m not sure who else is copied on this thread, but it should reach various FITS folk.

As the original author of FPACK, my best decision was to work through the FITS standard, and especially through a reference implementation using CFITSIO. I thoroughly support your apparent interest in doing the same. Many people, both before and after FPACK, have contributed to FITS tile compression, and there are manifold advantages to creating conforming FITS files. If astropy is one reference implementation for your project, CFITSIO could be the other, which would automatically also generate standalone FPACK support. Such a utility program is useful for innumerable purposes.

By design, FITS tile compression supports multiple codecs. The main reason RICE is FPACK’s default is speed, but also record-by-record simplicity. Back in the day, I could review an octal dump of a tile-compressed file and interpret the records one by one from first principles. That said, it was always envisioned that other algorithms could be added for either special or general purposes. Depending on the trade-offs between speed and size, FPACK’s default might even be updated.

I have only quickly reviewed your document. I see you cite both FPACK and our Paper I (“Lossless Astronomical Image Compression and the Effects of Noise”, PASP, 121(878):414), but not Paper II (“Optimal Compression of Floating-Point Astronomical Images Without Significant Loss of Information”, PASP 122(895):1065). Note that deploying the codec through CFITSIO would allow comparing any built-in JPEG-XL lossy mode with native lossy FITS tile compression using the same algorithm. (That is, if I understand correctly what you mean by “lossy context”.)
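To make the "native lossy" path concrete, here is a minimal Python sketch of the linear quantization step that the tiled-image convention applies to floating-point tiles before a lossless integer codec runs. It rests on stated assumptions: the noise estimate is a plain standard deviation (real implementations use more robust estimators), the optional subtractive dithering is omitted, and the function names and the q = 16 value are illustrative choices, not CFITSIO or astropy code.

```python
import random
import statistics


def quantize_tile(tile, q=16.0):
    """Linearly quantize one tile of floats to integers.

    Returns (ints, zscale, zzero), where value ~= ints[i] * zscale + zzero.
    Assumption: noise is estimated with a plain population standard
    deviation, and the quantization step is sigma / q.
    """
    sigma = statistics.pstdev(tile)
    zscale = sigma / q if sigma > 0 else 1.0
    zzero = min(tile)
    ints = [round((v - zzero) / zscale) for v in tile]
    return ints, zscale, zzero


def dequantize_tile(ints, zscale, zzero):
    """Invert the quantization up to a rounding error of zscale / 2."""
    return [i * zscale + zzero for i in ints]


# Synthetic tile: flat background plus Gaussian noise (illustrative only).
rng = random.Random(0)
tile = [1000.0 + rng.gauss(0.0, 5.0) for _ in range(256)]

ints, zscale, zzero = quantize_tile(tile)
restored = dequantize_tile(ints, zscale, zzero)
max_err = max(abs(a - b) for a, b in zip(tile, restored))
```

The integers in `ints` are what the lossless codec would then compress, whichever codec that is, so a comparison between codecs at fixed q isolates the codec from the quantization.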

Regarding your benchmarking efforts, putting on my day job hat as the Catalina Sky Survey archivist, I’ll direct you to our many terabytes of imaging data (https://sbn.psi.edu/pds/resource/css.html). The raw images are in the *.fits.fz files, and calibrated images in *.arch.fz files. Please let me know if there’s any interest in extending your benchmarks.

I see your datasets don’t include DECam, which has been serving as a proxy for Rubin for many years. FPACK was positioned to support numerous ground-based projects during its early days, and there are several archives worth mining for benchmarks.

There are a couple of additional options to explore:

  1.  Astronomical data (including from cubesats) can be imaging data (including data cubes) as well as binary tables. Both may be packaged in multi-extension files. FPACK also supports the FITS tiled table compression convention, and I’d recommend you consider doing the same. See the link in the third paragraph of https://heasarc.gsfc.nasa.gov/fitsio/fpack/

  2.  You say (section 4.6): “we defer the nuanced practical considerations of space-compatible hardware and runtime optimization to future investigations, noting that all presented algorithms leverage unoptimized reference implementations”. I would suggest that the use cases here extend to ground-based facilities, as well. Hardware-accelerated compression (that is, “efficient data representation”) could very productively be moved closer to the detector controllers. This is similar to cameras with built-in GPS modules for occultation work, or hardware time capture (e.g., https://arxiv.org/pdf/1807.01370). Which is to say, I wouldn’t defer such practicalities for very long.

I’m available to chat most weeks, and we should see who else can be looped in. This should be a simple addition to the FITS standard, though exactly what process needs to be followed these days would need to be settled through the IAU (Comm B2) FITS WG.
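For concreteness, the addition proposed in Josh's message quoted below comes down to three header keywords in the compressed HDU. The following Python sketch formats them as fixed-format 80-character FITS cards. The helper names are hypothetical, the 'JPEGXL' value and EFFORT parameter are only the proposal (nothing here is in the FITS 4.0 standard), and a real writer would go through CFITSIO or astropy rather than hand-built strings.

```python
def fits_card(keyword, value, comment=""):
    """Format one 80-character FITS header card in fixed format."""
    if isinstance(value, str):
        # Strings: opening quote in column 11, padded to at least 8 chars.
        field = "'{:<8}'".format(value).ljust(20)
    else:
        # Integers: right-justified so the last digit lands in column 30.
        field = "{:>20d}".format(value)
    card = "{:<8}= {}".format(keyword, field)
    if comment:
        card += " / " + comment
    if len(card) > 80:
        raise ValueError("card exceeds 80 characters")
    return card.ljust(80)


def jpegxl_cards(effort=7):
    """Cards for the proposed JPEG-XL entry (hypothetical helper)."""
    if not isinstance(effort, int) or not 0 <= effort <= 9:
        raise ValueError("effort must be an integer from 0 to 9")
    return [
        fits_card("ZCMPTYPE", "JPEGXL", "proposed value, not in FITS 4.0"),
        fits_card("ZNAME1", "EFFORT", "proposed parameter name"),
        fits_card("ZVAL1", effort, "proposed default is 7"),
    ]


cards = jpegxl_cards()
for card in cards:
    print(card.rstrip())
```

This follows the same ZNAMEn/ZVALn pattern the existing algorithms already use for their parameters, which is part of why the change looks small on paper.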

Best wishes!

Rob
rseaman at arizona.edu

On 5/9/25, 8:31 AM, "William Pence" wrote:

________________________________

FYI

Begin forwarded message:
From: "Jaffe, Tess (GSFC-6601)" <tess.jaffe at nasa.gov>
Date: May 9, 2025 at 10:42:13 AM EDT
To: FTOOLS DIST <ftoolsdis at bigbang.gsfc.nasa.gov>
Subject: Fw: [EXTERNAL] Re: new FITS Compression value?

Anybody want to take on this discussion?  My name is on the site as responsible, but I'm not qualified to comment on the FITS standard, much less modify it.

________________________________
From: Perry Greenfield <perry at stsci.edu>
Sent: Friday, May 9, 2025 7:40 AM
To: Joshua S Bloom <joshbloom at berkeley.edu>
Cc: Jaffe, Tess (GSFC-6601) <tess.jaffe at nasa.gov>; Rithwik Sudharsan <rithwik at berkeley.edu>
Subject: [EXTERNAL] Re: new FITS Compression value?

Hi Joshua,

When you speak of lossy compression, are you retaining the current FITS scheme of tiling and quantizing, and then using JPEG-XL to compress the quantized tiles, or instead just applying the compression to the whole image?

Rick White was heavily involved in this, more than I was, so if you don’t mind I will pass this along to him.

Thanks, Perry

> On May 8, 2025, at 6:21 PM, Joshua S Bloom <joshbloom at berkeley.edu> wrote:
>
> Dear Tess (cc Perry),
>
> This is an inquiry to start a conversation about a potential addition to the FITS standard reserved value for compression.
>
> Background: My student Rithwik (cc’ed here) has been leading an effort to improve image compression (focused on space applications) and we’re launching a cubesat later this year using his results.
>
> After extensive testing we’ve found that the JPEG-XL codec offers significant improvements over RICE and HCOMPRESS for astronomical imaging both in the lossless and lossy contexts. (The RICE algorithm was mostly developed by Solomon Golomb in 1966, and later analyzed for astronomy purposes by William Pence in 2009. Two decades later, there have been few to no changes.)
>
> As part of our work, we recently published AstroCompress, a large benchmarking effort evaluating various compression codecs on 300 GB of diverse astronomy imagery: https://openreview.net/pdf?id=kQCHCkNk7s
>
> We have conducted extensive testing across four datasets (James Webb, Hubble, SDSS, Keck) and across four codecs (JPEG-XL, JPEG-LS, JPEG-2000, RICE) and find JPEG-XL to be a clear ideal choice for the next generation of astronomy codecs.
>
> We show that JPEG-XL can regularly achieve 10-15% better lossless compression ratios relative to RICE. More astoundingly, when quantizing data (under the FITS astropy default settings for compressing floating-point data), JPEG-XL code-lengths are 20% superior to RICE. Finally, by passing a maxed-out “effort setting” to the JPEG-XL algorithm, these numbers increase by another 10%.
>
> During the pull request process to astropy to extend its fitsio compression capability, Rithwik encountered some hesitance to do this extension unless the official FITS header reserved values were also extended.
>
> Proposal: we’d like to propose reserving the ‘JPEGXL’ keyword value as an option for ‘ZCMPTYPE’ as outlined in table 36 of the FITS 4.0 standard: https://fits.gsfc.nasa.gov/standard40/fits_standard40aa-le.pdf
> We would also add a parameter that denotes the ‘effort’ level of the JPEG-XL algorithm, which ranges from 0 to 9, and higher levels indicate improved compression performance.
>
>
> Keyword   Newly Permitted Value(s)         Default
> ZCMPTYPE  JPEGXL                           -
> ZNAME1    EFFORT                           -
> ZVAL1     {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}   7
>
> Would you be willing to have a chat about this, so we could present some further evidence and see if incorporating this into the FITS standard could be a possibility?
>
> Thanks in advance,
> Josh
>
> ________________________________________________________
>
> Joshua Bloom | Astronomy Dept | Professor | UC Berkeley | @profjsb