[fitsbits] Associating ancillary data with primary HDU

Rob Seaman seaman at noao.edu
Sat May 10 11:27:47 EDT 2014


On May 10, 2014, at 1:58 AM, Thierry.Forveille at ujf-grenoble.fr wrote:

> Quoting William Thompson <William.T.Thompson at nasa.gov>:
> 
>> To the general FITS community:
>> 
>> I've been asked if there are any specific conventions for associating ancillary data with primary data arrays.  The specific application is one where the exposure time differs from pixel to pixel...  The simplest and most obvious approach would be to store the actual data in the primary HDU, and then store the exposure times in an extension with the same dimensionality.

> ...one even simpler approach would be to store both in a datacube, with a NAXISi=2 describing the (data,time) doublets.  That could be NAXIS3=2 if your access patterns prefer one array then the other, or NAXIS1=2 if you'd rather keep the two quantities for one pixel close together.

Access will often be more efficient operating on vectors / arrays rather than on per-pixel data structures.  In particular, data compression benefits from gathering like-minded quantities together, pixels with pixels, times with times.  For instance, a large advantage is realized in database compression by rearranging tables into column-major order.  This is what FPACK does when compressing binary tables, rather than retaining the complexity (and hence, entropy) of a per-row ordering.
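
For concreteness, here is a minimal sketch of the two layouts under discussion, assuming Python with astropy.io.fits and made-up array shapes (not part of the original thread).  The separate-extension layout keeps like quantities contiguous, which is what a column-oriented compressor can exploit; the cube layout packs both quantities into a single HDU.

    import numpy as np
    from astropy.io import fits

    # Illustrative arrays only: science counts plus a per-pixel exposure map
    data = np.random.poisson(100.0, size=(512, 512)).astype(np.float32)
    exptime = np.full((512, 512), 30.0, dtype=np.float32)

    # Layout 1: data in the primary HDU, exposure times in an IMAGE extension
    fits.HDUList([
        fits.PrimaryHDU(data),
        fits.ImageHDU(exptime, name='EXPTIME'),
    ]).writeto('separate_ext.fits', overwrite=True)

    # Layout 2: one cube with NAXIS3=2 holding the (data, time) planes
    cube = np.stack([data, exptime])   # shape (2, 512, 512)
    fits.PrimaryHDU(cube).writeto('cube.fits', overwrite=True)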

> One obvious downside is that this forces the same BITPIX for both quantities, but I'd argue that data values are intrinsically just as much of a float as time :-) Both probably come out of your electronics as ints (from an ADC on one side and a counter on the other), and one probably wants comparable accuracies on the two quantities to compute the flux as their ratio.

There is nothing magic about IEEE floating point, and one could argue that the quantum nature of reality makes integer representations more "intrinsic" than mantissa+exponent :-)

A more basic issue than the logistics of ordering or datatype is the noise model.  The pixels will typically have a Gaussian noise floor + Poisson background and sparse signal (+ oddities like fixed-pattern noise), and for 16-bit data will range over 4-5 orders of magnitude (depending on the gain).  The nature of the exposure time map will vary, but one imagines most instruments will be constructed such that exposure is relatively flat, varying by just a factor of a few over the focal plane (or focal plane equivalent), perhaps with isolated areas of particularly low or high exposure (similar to FPN)?  In particular it's hard to see why there would be a shot noise (Poisson) component to the timing variations; generally the timing uncertainties should be more systematic.  How precisely are the exposure values known?  Would one want to keep a variance map for the timekeeping?  Which is to say that the familiar distinction between accuracy and precision might be entertainingly different between pixels and durations.
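
If one did keep both a data map and an exposure (or exposure-variance) map, the flux and its uncertainty follow from ordinary first-order propagation for a ratio of independent quantities.  A hedged sketch, again assuming Python and purely illustrative numbers (the unit gain and the 10 ms timing uncertainty are assumptions, not values from this thread):

    import numpy as np

    counts      = np.array([1200.0, 850.0, 40.0])   # ADU; Poisson-dominated signal
    var_counts  = counts.copy()                      # Poisson: var ~ counts, assuming unit gain
    exptime     = np.array([30.0, 29.5, 31.0])       # seconds
    var_exptime = np.full_like(exptime, 0.010**2)    # e.g. 10 ms systematic timing uncertainty

    flux = counts / exptime
    # var(f) = var(c)/t^2 + c^2 * var(t)/t^4   (independent c and t)
    var_flux = var_counts / exptime**2 + counts**2 * var_exptime / exptime**4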

> This is mostly for completeness and I'd personally go with the binary table solution, but your mileage may vary :-)

With tile compression the distinction between image arrays and tables blurs.  All FITS files become structured and indexed tables.  Data may be scaled, datatypes may be remapped, output may be reordered from input.
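
As a concrete illustration of that blurring, a tile-compressed image is stored on disk as a BINTABLE extension.  A sketch assuming astropy's CompImageHDU (fpack on the command line produces equivalent files); note that Rice-compressing the floating-point exposure plane implies lossy quantization, so the compression type chosen here is an assumption one would tune:

    import numpy as np
    from astropy.io import fits

    data = np.random.poisson(100.0, size=(512, 512)).astype(np.int32)
    exptime = np.full((512, 512), 30.0, dtype=np.float32)

    fits.HDUList([
        fits.PrimaryHDU(),   # tile-compressed images live in extensions, so the primary stays empty
        fits.CompImageHDU(data, name='SCI', compression_type='RICE_1'),
        fits.CompImageHDU(exptime, name='EXPTIME', compression_type='RICE_1'),
    ]).writeto('tile_compressed.fits', overwrite=True)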

Rob



