[fitsbits] Associating ancillary data with primary HDU

Wed May 21 03:56:01 EDT 2014

Dear all,

[Terje: please read through to catch any inconsistencies/cut-and-paste errors etc. ;-]

As the person who raised the original question, I'd like to elaborate a bit. Our situation is as follows:

We will need the capability to have multiple "data extensions" in a single file, allowing an arbitrary set of keywords to vary with any subset of the data dimensions. A typical example would be XPOSURE in observations where automatic exposure control is in effect.

For example, we could have a data extension with dimensions (x,y,time) = (120,120,500), with XPOSURE varying with time but constant within each (x,y) plane.

Our current suggestion is to store the values either in a binary table extension or in a regular image extension (we prefer image extensions). In this example the extension would contain an array with dimensions (1,1,500).

Keywords that are constant in time but vary in the (x,y) plane would be in extensions with dimensions (120,120,1), etc. This seems like an "obvious" solution to us.
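A minimal sketch of how a reader could pick out the value that applies to one data pixel under this scheme. It assumes (our assumption, not part of the proposal) that the keyword array is held as a flat list in FITS axis order, with the first axis (NAXIS1) varying fastest, and that an axis of length 1 means "constant along this axis". The function name `tabulated_value` is hypothetical.

```python
from typing import Sequence, Tuple


def tabulated_value(values: Sequence[float], shape: Tuple[int, ...],
                    index: Tuple[int, ...]) -> float:
    """Return the tabulated keyword value for data pixel `index`.

    Assumption: `values` is the keyword array flattened with the first axis
    varying fastest (FITS NAXIS1 order); `shape` is (NAXIS1, NAXIS2, ...).
    Axes of length 1 are broadcast, i.e. the keyword is constant along them.
    """
    if len(shape) != len(index):
        raise ValueError("keyword array and data must have the same number of axes")
    offset, stride = 0, 1
    for n, i in zip(shape, index):
        j = 0 if n == 1 else i  # singleton axis: same value for every pixel
        if not 0 <= j < n:
            raise IndexError(f"index {i} out of range for axis of length {n}")
        offset += j * stride
        stride *= n
    return values[offset]
```

With XPOSURE stored as (1,1,500), any (x,y) at time step 42 maps to element 42; with a (120,120,1) keyword, any time step at (x,y) maps to element x + 120*y.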

This must also work with non-mandatory keywords, though. So there needs to be a way to signal that "this keyword is not actually missing, but you can find it in a separate (bintable or regular) extension".

We do not want to force readers to gobble up the entire file just to search for such potential tabulated keywords. So we propose a keyword named TABULATE, TABULATD ("tabulated"), or TABULATK ("tabulated keywords") containing a comma-separated list of the keywords that are handled by this mechanism.

Using a mechanism where the keyword value is actually equal to the name of the relevant extension would be a bit messy - especially for string-valued keywords!

So our idea is to have another keyword with a related name, such as TAB_EXTN or TABULATN, containing a comma-separated list of the *corresponding* extension names. Thus unique extensions *may* be specified for each data extension for some keywords, whereas a single extension may be specified for keywords that are identical for all data extensions (i.e., reused by multiple data extensions).
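A sketch of how a reader might turn the two proposed keywords into a keyword-to-extension map, using TABULATE and TAB_EXTN (two of the candidate names above) and a plain dict standing in for a parsed header. The extension names and the DETTEMP keyword are made up for illustration. Note that both hypothetical data extensions point DETTEMP at the same TEMPS extension, which is the reuse described above.

```python
from typing import Dict, List


def _split_list(value: object) -> List[str]:
    """Split a comma-separated keyword value into trimmed, non-empty items."""
    return [item.strip() for item in str(value).split(",") if item.strip()]


def tabulated_keywords(header: Dict[str, object]) -> Dict[str, str]:
    """Map each keyword listed in TABULATE to the extension named in TAB_EXTN."""
    names = _split_list(header.get("TABULATE", ""))
    extns = _split_list(header.get("TAB_EXTN", ""))
    if len(names) != len(extns):
        raise ValueError("TABULATE and TAB_EXTN must have the same number of entries")
    return dict(zip(names, extns))


# Hypothetical headers: DETTEMP, TEMPS and XPOSURE_W* are made-up names.
# Each window has its own XPOSURE table but shares one TEMPS table.
window1 = {"TABULATE": "XPOSURE, DETTEMP", "TAB_EXTN": "XPOSURE_W1, TEMPS"}
window2 = {"TABULATE": "XPOSURE, DETTEMP", "TAB_EXTN": "XPOSURE_W2, TEMPS"}
print(tabulated_keywords(window1))  # {'XPOSURE': 'XPOSURE_W1', 'DETTEMP': 'TEMPS'}
```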

I assume that Paul's method uses 

      EXTNAME[keyword-ext]==KW_NAME    &&     EXTVER[keyword-ext]==EXTVER[data-ext]

to link the data extension keyword and the keyword extension.
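For comparison, the linkage assumed above for Paul's method can be sketched as a search over the HDU headers (plain dicts here, in file order). The sketch relies on the FITS rule that a missing EXTVER is taken to be 1; the function name and the WINDOW extension name are hypothetical.

```python
from typing import Dict, List, Optional


def find_keyword_extension(hdus: List[Dict[str, object]], keyword: str,
                           data_header: Dict[str, object]) -> Optional[int]:
    """Return the index of the HDU whose EXTNAME is `keyword` and whose EXTVER
    equals the data extension's EXTVER, or None if there is no match.

    This encodes the linkage *assumed* for Paul's method in the text above.
    """
    extver = data_header.get("EXTVER", 1)  # EXTVER defaults to 1 when absent
    for i, hdr in enumerate(hdus):
        if hdr.get("EXTNAME") == keyword and hdr.get("EXTVER", 1) == extver:
            return i
    return None


hdus = [
    {},                                   # primary HDU
    {"EXTNAME": "WINDOW", "EXTVER": 1},   # data extension 1 (hypothetical name)
    {"EXTNAME": "WINDOW", "EXTVER": 2},   # data extension 2
    {"EXTNAME": "XPOSURE", "EXTVER": 1},  # XPOSURE values for window 1
    {"EXTNAME": "XPOSURE", "EXTVER": 2},  # XPOSURE values for window 2
]
print(find_keyword_extension(hdus, "XPOSURE", hdus[2]))  # 4
```

Because the match is on EXTVER, windows 1 and 2 cannot point to one shared XPOSURE extension even when their values are identical.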

However, this would *require* a separate keyword extension for all tabulated keywords in *each* data extension, with no mechanism to save space by "reusing" a keyword extension - since the combination of EXTNAME and EXTVER is required [isn't it?] to be unique throughout the file. 

In, e.g., a FITS file containing *many* spectral-window extracts from a spectrometer, this could mean *many* repetitions of the same keyword extension for, e.g., XPOSURE, temperatures, etc.!

I assume Paul's method requires EXTVER to be unique for each data extension, which would *force* a separate table for each data extension.

Our convention could of course be modified to use a *single* keyword by introducing a "syntax" for the TABULATE keyword, such as "KEYWORD1[EXTNAME1],KEYWORD2[EXTNAME2],...". Or it could be modified in other ways, I suppose.
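If the single-keyword variant were adopted, a reader could parse it with a regular expression along these lines. This is only a sketch: it assumes keyword names follow the usual FITS rules (1 to 8 characters from A-Z, 0-9, '-' and '_') and that extension names contain neither commas nor ']'. The function name is hypothetical.

```python
import re
from typing import Dict

# One entry of the form KEYWORD[EXTNAME], with optional surrounding blanks.
# Assumes FITS-style keyword names and extension names without ',' or ']'.
_ENTRY = re.compile(r"\s*([A-Z0-9_-]{1,8})\s*\[\s*([^\],]+?)\s*\]\s*")


def parse_tabulate(value: str) -> Dict[str, str]:
    """Parse 'KEYWORD1[EXTNAME1],KEYWORD2[EXTNAME2],...' into a dict."""
    result: Dict[str, str] = {}
    for item in value.split(","):
        if not item.strip():
            continue  # tolerate empty entries, e.g. a trailing comma
        match = _ENTRY.fullmatch(item)
        if match is None:
            raise ValueError(f"malformed TABULATE entry: {item!r}")
        result[match.group(1)] = match.group(2)
    return result


print(parse_tabulate("XPOSURE[XPOSURE_W1], DETTEMP[TEMPS]"))
```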

Where linear interpolation of the keyword value is good enough, the convention could also be extended by, e.g., allowing an array with smaller dimensions than the data extension (though *always* with the same *number* of dimensions). E.g., (x,y,time) = (2,2,1) in the above example for a keyword varying in the spatial plane but constant in time.
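The interpolation variant could be sketched as plain multilinear interpolation. The proposal does not say where the coarse grid points sit relative to the data pixels; this sketch assumes (hypothetically) that the m coarse points on an axis are spread evenly from the first to the last pixel of that axis, so a (2,2,1) array gives the values at the four corners of each (x,y) plane. The function name is also hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def interpolate_keyword(values: Sequence[float], shape: Tuple[int, ...],
                        data_shape: Tuple[int, ...], index: Tuple[int, ...]) -> float:
    """Multilinearly interpolate a coarse keyword array at data pixel `index`.

    `values` is flattened with the first axis varying fastest; `shape` and
    `data_shape` must have the same number of axes, with shape[k] <= data_shape[k].
    Assumption: coarse points span each axis from its first to its last pixel.
    """
    if not len(shape) == len(data_shape) == len(index):
        raise ValueError("keyword array and data must have the same number of axes")
    # For each axis: the (coarse index, weight) pairs that bracket the pixel.
    brackets: List[Tuple[Tuple[int, float], ...]] = []
    for m, n, i in zip(shape, data_shape, index):
        if m > n:
            raise ValueError("keyword array may not be larger than the data")
        if m == 1:
            brackets.append(((0, 1.0),))  # constant along this axis
            continue
        pos = i * (m - 1) / (n - 1)  # pixel position in coarse-grid units
        lo = min(int(pos), m - 2)
        frac = pos - lo
        brackets.append(((lo, 1.0 - frac), (lo + 1, frac)))
    # Sum the weighted values at every corner of the enclosing cell.
    total = 0.0
    for corner in product(*brackets):
        offset, stride, weight = 0, 1, 1.0
        for (j, w), m in zip(corner, shape):
            offset += j * stride
            stride *= m
            weight *= w
        total += weight * values[offset]
    return total


# Corner values at (x0,y0), (x1,y0), (x0,y1), (x1,y1); made-up numbers.
corners = [0.0, 10.0, 20.0, 30.0]
print(interpolate_keyword(corners, (2, 2, 1), (3, 3, 1), (1, 1, 0)))  # 15.0
```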

All of this could, of course, be built into a standard routine to read keywords.

We believe that whatever we (a working group in an EU project called SOLARNET) adopt as a recommendation would "quite soon" be used by "quite a few" solar processing pipelines. Our recommendation is due in the fall, but it won't necessarily be carved in stone, since the pipelines will not yet have been implemented and used.

Your thoughts?

Sincerely,
Stein Haugan

On 2014/05/09, at 22:24, William Thompson <William.T.Thompson at nasa.gov> wrote:

> To the general FITS community:
> 
> I've been asked if there are any specific conventions for associating ancillary data with primary data arrays.  The specific application is one where the exposure time differs from pixel to pixel (something that can be done with Active Pixel Sensors), but which could easily apply to other parameters which vary between pixels.
> 
> The simplest and most obvious approach would be to store the actual data in the primary HDU, and then store the exposure times in an extension with the same dimensionality.  For example, if the primary HDU had
> 
> SIMPLE  =                    T
> BITPIX  =                   16
> NAXIS   =                    2
> NAXIS1  =                 1024
> NAXIS2  =                 1024
> EXTEND  =                    T
> 
> The extension would have
> 
> XTENSION= 'IMAGE   '
> BITPIX  =                  -32
> NAXIS   =                    2
> NAXIS1  =                 1024
> NAXIS2  =                 1024
> EXTNAME = 'XPOSURE '
> 
> In essence, this is similar to the Green Bank Convention, but applied to the individual pixels in a data array rather than to rows in a binary table.
> 
> Is this a commonly used method for associating ancillary data with primary images?  Are there any additional conventions that are appropriate to this situation?  I tried looking in
> 
> http://fits.gsfc.nasa.gov/fits_conventions.html
> 
> but couldn't find anything that seemed relevant.
> 
> One could also imagine binary tables where the primary data array is in one column, and the array of exposure times is in another column.  However, for the present application, the use of IMAGE extensions is far simpler, and more likely to be actually adopted.
> 
> Thank you,
> 
> Bill Thompson
> 
> 
> -- 
> William Thompson
> NASA Goddard Space Flight Center
> Code 671
> Greenbelt, MD  20771
> USA
> 
> 301-286-2040
> William.T.Thompson at nasa.gov