[fitsbits] Associating ancillary data with primary HDU

Fri May 23 10:31:02 EDT 2014

You say that you prefer regular extensions (which I assume means 'IMAGE' extensions), but binary tables offer a very natural way of associating variables: each variable is written in its own column of the table.  In your example, the main column would contain the (120, 120, 500) data cube as a vector, the 'XPOSURE' column would contain the vector of 500 exposure values, and other columns could contain (120, 120) vectors of values that vary in the X, Y plane.  This table would have only one row, but in principle you could store multiple observations in multiple rows (e.g., all the spectral window extracts from a single observation could be in one table).

With this arrangement it is easy for software to determine whether a quantity such as XPOSURE has a scalar value (in which case it is written as a header keyword) or is a vector (in which case it is written as a vector column).  Everything associated with that observation is stored in a single binary table, which eliminates the need to invent new, complicated conventions for associating different extensions with each other.

As an additional enhancement, if you find that the vectors in multiple rows of the table have identical values, you could write the vector once into a variable-length array column and have all the other instances point to that same vector, to save disk space.
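
To make the one-row layout concrete, here is a minimal sketch in plain Python of how the header keywords and column descriptors might be planned.  It only builds the TTYPEn/TFORMn/TDIMn values a writer would need; it does not write a FITS file.  The function name, the 'FLATFLD' column, and the 'TELESCOP' value are illustrative assumptions, not part of any convention.

```python
from math import prod


def plan_single_row_table(values):
    """Split values into header keywords (scalars) and vector columns.

    `values` maps a name either to a scalar, or to a (shape, code) pair
    where `code` is a binary-table data type letter such as 'E'
    (32-bit float).  The (shape, code) form is an assumption of this
    sketch, used to avoid allocating real arrays.
    """
    header, columns = {}, []
    for name, value in values.items():
        if isinstance(value, tuple):
            shape, code = value
            n = len(columns) + 1  # FITS column numbers start at 1
            columns.append({
                f"TTYPE{n}": name,
                # Repeat count = total number of elements in the cell.
                f"TFORM{n}": f"{prod(shape)}{code}",
                # TDIMn records the multidimensional shape of the cell.
                f"TDIM{n}": "(" + ",".join(str(s) for s in shape) + ")",
            })
        else:
            header[name] = value  # scalar: stays an ordinary keyword
    return header, columns


header, columns = plan_single_row_table({
    "DATA": ((120, 120, 500), "E"),
    "XPOSURE": ((500,), "E"),         # varies with time only
    "FLATFLD": ((120, 120), "E"),     # hypothetical (x, y) quantity
    "TELESCOP": "EXAMPLE",            # hypothetical scalar keyword
})

for column in columns:
    print(column)
print(header)
```

A reader can apply the same rule in reverse: a name found among the TTYPEn values is a vector, and a name found as an ordinary header keyword is a scalar.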

-Bill Pence

> On May 21, 2014, at 3:56 AM, Stein Vidar Hagfors Haugan <s.v.h.haugan at astro.uio.no> wrote:
> 
> Dear all,
> 
> [Terje: please read through to catch any inconsistencies/cut-and-paste errors etc ;-]
> 
> As the originator of the original question, I'd like to elaborate a bit. Our situation is as follows:
> 
> We will need the capability to have multiple "data extensions" in a single file, allowing an arbitrary set of keywords to vary with any subset of the data dimensions. A typical example would be XPOSURE in observations where automatic exposure control is in effect.
> 
> I.e. we could have a data extension w/dimensions (x,y,time) = (120,120,500), with XPOSURE varying with time, but constant for each (x,y) plane.
> 
> Our current suggestion is to store the values either in a bintable extension or in a regular extension (we prefer regular extensions). The extension in this case would contain an array w/dimensionality (1,1,500).
> 
> Keywords that are constant in time but vary in the (x,y) plane would be in extensions with dimensions (120,120,1), etc. This seems like an "obvious" solution to us.
> 
> This must also work with non-mandatory keywords, though. So there needs to be a way to signal that "this keyword is not actually missing, but you can find it in a separate (bintable or regular) extension".
> 
> We do not wish to make it necessary to gobble up the entire file in order to search for such potential tabulated keywords. So we propose a keyword named TABULATE, TABULATD ("tabulated"), or TABULATK ("tabulated keywords") containing a comma-separated list of keywords that are handled with this mechanism.
> 
> Using a mechanism where the keyword value is actually equal to the name of the relevant extension would be a bit messy - especially for string-valued keywords!
> 
> So our idea is to have another keyword with a related name, such as TAB_EXTN or TABULATN containing a comma-separated list of extension names containing the *corresponding* extension names. Thus unique extensions *may* be specified for each data extension for some keywords, whereas a single extension may be specified for keywords that are identical for all data extensions (i.e. reused by multiple data extensions).
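
A minimal sketch of the paired-list lookup described above, assuming the keyword names TABULATE and TAB_EXTN (two of the candidates mentioned) and representing each header as a plain Python dict.  The extension names 'XPO_WIN1', 'XPO_WIN2', 'TEMP_ALL' and the TEMP keyword are hypothetical.

```python
def tabulated_map(header):
    """Map each tabulated keyword to the extension name that holds it."""
    keywords = [k.strip() for k in header.get("TABULATE", "").split(",") if k.strip()]
    extnames = [e.strip() for e in header.get("TAB_EXTN", "").split(",") if e.strip()]
    if len(keywords) != len(extnames):
        raise ValueError("TABULATE and TAB_EXTN must list the same number of items")
    return dict(zip(keywords, extnames))


def locate(header, keyword):
    """Return ('header', value), ('extension', extname) or ('missing', None)."""
    if keyword in header:
        return "header", header[keyword]
    mapping = tabulated_map(header)
    if keyword in mapping:
        return "extension", mapping[keyword]
    return "missing", None


# Two data extensions: XPOSURE is tabulated separately for each one,
# while TEMP is shared through a single extension.
win1 = {"EXTNAME": "WIN1", "NAXIS": 3,
        "TABULATE": "XPOSURE,TEMP", "TAB_EXTN": "XPO_WIN1,TEMP_ALL"}
win2 = {"EXTNAME": "WIN2", "NAXIS": 3,
        "TABULATE": "XPOSURE,TEMP", "TAB_EXTN": "XPO_WIN2,TEMP_ALL"}

print(locate(win1, "XPOSURE"))
print(locate(win2, "TEMP"))
print(locate(win1, "NAXIS"))
```

Only the header of the data extension has to be read to learn where a tabulated keyword lives, so the rest of the file need not be scanned.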
> 
> I assume that Paul's method uses 
> 
>      EXTNAME[keyword-ext]==KW_NAME    &&     EXTVER[keyword-ext]==EXTVER[data-ext]
> 
> to link the data extension keyword and the keyword extension.
> 
> However, this would *require* a separate keyword extension for all tabulated keywords in *each* data extension, with no mechanism to save space by "reusing" a keyword extension - since the combination of EXTNAME and EXTVER is required [isn't it?] to be unique throughout the file. 
> 
> In, e.g., a FITS file containing *many* spectral window extracts from a spectrometer, this could potentially mean *many* repetitions of the same keyword extension for XPOSURE, temperatures, etc.!
> 
> I assume Paul's method requires EXTVER to be unique for each data extension, *requiring* a separate table for each data extension.
> 
> Our convention could of course be modified to use a *single* keyword by introducing a "syntax" for the TABULATE keyword, such as "KEYWORD1[EXTNAME1],KEYWORD2[EXTNAME2],...". Or it could be modified in other ways, I suppose.
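
The single-keyword variant could be parsed along these lines.  This is only a sketch: it assumes that extension names contain no commas or square brackets, and that keyword names follow the usual FITS rule of at most 8 characters drawn from upper-case letters, digits, hyphen and underscore.  The example names are hypothetical.

```python
import re

# One entry looks like KEYWORD[EXTNAME].
_ENTRY = re.compile(r"^([A-Z0-9_-]{1,8})\[([^\[\],]+)\]$")


def parse_tabulate(value):
    """Parse 'KEYWORD1[EXTNAME1],KEYWORD2[EXTNAME2],...' into a dict."""
    mapping = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        match = _ENTRY.match(item)
        if match is None:
            raise ValueError(f"malformed TABULATE entry: {item!r}")
        mapping[match.group(1)] = match.group(2).strip()
    return mapping


parsed = parse_tabulate("XPOSURE[XPO_WIN1], TEMP[TEMP_ALL]")
print(parsed)
```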
> 
> Where linear interpolation of the keyword value is good enough, the convention could also be augmented by e.g. allowing an array with smaller dimensions than the data extensions (though *always* the same *number* of extensions). E.g. (x,y,time) = (2,2,1) in the above example for a keyword varying in the spatial plane but constant for each exposure.
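
For the (2,2,1) case, the expansion to the full (120,120) plane might look like the sketch below.  It assumes that the four stored samples sit on the corner pixels of the data array, which the proposal above does not specify, and it uses plain bilinear interpolation.

```python
def bilinear_from_corners(corners, nx, ny, i, j):
    """Interpolate a 2x2 keyword array to pixel (i, j) of an nx-by-ny plane.

    corners[a][b] is the stored value at x-end a (0 = first pixel,
    1 = last pixel) and y-end b.  Pixel indices are 0-based.
    """
    fx = i / (nx - 1) if nx > 1 else 0.0  # fractional position along x
    fy = j / (ny - 1) if ny > 1 else 0.0  # fractional position along y
    (c00, c01), (c10, c11) = corners
    return (c00 * (1 - fx) * (1 - fy)
            + c10 * fx * (1 - fy)
            + c01 * (1 - fx) * fy
            + c11 * fx * fy)


corners = [[1.0, 3.0], [2.0, 4.0]]  # illustrative values only
print(bilinear_from_corners(corners, 120, 120, 0, 0))
print(bilinear_from_corners(corners, 120, 120, 119, 119))
```

A different placement of the samples (e.g. at pixel-block centres) would change the fractions fx and fy, so the convention would have to state which one is meant.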
> 
> All of this could, of course, be built into a standard routine to read keywords.
> 
> We believe that whatever we (a workgroup in an EU project called SOLARNET) adopt as a recommendation would "quite soon" be used by "quite a few" solar processing pipelines. Our recommendation is due in the fall, but it won't necessarily be carved in stone, since the pipelines will not yet have been implemented and used.
> 
> Your thoughts?
> 
> Sincerely,
> Stein Haugan
> 
>> On 2014/05/09, at 22:24, William Thompson <William.T.Thompson at nasa.gov> wrote:
>> 
>> To the general FITS community:
>> 
>> I've been asked if there are any specific conventions for associating ancillary data with primary data arrays.  The specific application is one where the exposure time differs from pixel to pixel (something that can be done with Active Pixel Sensors), but which could easily apply to other parameters which vary between pixels.
>> 
>> The simplest and most obvious approach would be to store the actual data in the primary HDU, and then store the exposure times in an extension with the same dimensionality.  For example, if the primary HDU had
>> 
>> SIMPLE  =                    T
>> BITPIX  =                   16
>> NAXIS   =                    2
>> NAXIS1  =                 1024
>> NAXIS2  =                 1024
>> EXTEND  =                    T
>> 
>> The extension would have
>> 
>> XTENSION=              'IMAGE'
>> BITPIX  =                  -32
>> NAXIS   =                    2
>> NAXIS1  =                 1024
>> NAXIS2  =                 1024
>> EXTNAME =            'XPOSURE'
>> 
>> In essence, this is similar to the Green Bank Convention, but applied to the individual pixels in a data array rather than to rows in a binary table.
>> 
>> Is this a commonly used method for associating ancillary data with primary images?  Are there any additional conventions that are appropriate to this situation?  I tried looking in
>> 
>> http://fits.gsfc.nasa.gov/fits_conventions.html
>> 
>> but couldn't find anything that seemed relevant.
>> 
>> One could also imagine binary tables where the primary data array is in one column, and the array of exposure times is in another column.  However, for the present application, the use of IMAGE extensions is far simpler, and more likely to be actually adopted.
>> 
>> Thank you,
>> 
>> Bill Thompson
>> 
>> 
>> -- 
>> William Thompson
>> NASA Goddard Space Flight Center
>> Code 671
>> Greenbelt, MD  20771
>> USA
>> 
>> 301-286-2040
>> William.T.Thompson at nasa.gov
> 
> 