[fitsbits] BINTABLE convention for >999 columns

Francois-Xavier PINEAU francois-xavier.pineau at astro.unistra.fr
Fri Jul 7 10:46:37 EDT 2017


Dear all,

First, thank you very much for this mail, Mark.

As you wrote, I 'favour a different mechanism for encoding the extended 
column metadata'.

Here is my suggestion:

I propose using a mechanism like HIERARCH: either the HIERARCH keyword
itself or an OVERLOAD keyword (or something similar).

Often, when software reads metadata, it puts the FITS cards into a map
keyed by the card keyword.
I propose something like:

TFIELDS = 999           / For HIERARCH unaware readers
HIERARCH  TFIELDS 1204  / For HIERARCH aware readers
...
TTYPE999= 'XT_MORECOLS' / For HIERARCH unaware readers
TFORM999= '813I    '    / For HIERARCH unaware readers
HIERARCH  TTYPE999 'var_min_u_2'   / For HIERARCH aware readers
HIERARCH  TFORM999 'D'             / For HIERARCH aware readers
HIERARCH  TTYPE1000 'var_prob_h_2' / For HIERARCH aware readers
HIERARCH  TFORM1000 'D'            / For HIERARCH aware readers
...


Then, the keyword of the card
HIERARCH  TFIELDS 1204
is TFIELDS and, if put in a map, the (object containing the) value 1204 
will overload the previous value (i.e. 999).
Similarly, the keyword of the card
HIERARCH  TTYPE999 'var_min_u_2'
is TTYPE999 and, if put in a map, the value 'var_min_u_2' will overload
the previous value of TTYPE999 (i.e. 'XT_MORECOLS').

*If a FITS reader uses maps to store FITS cards and is HIERARCH aware,
then it should be able to load tables with more than 999 columns without
changing or adding a single line of code!*
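
To make the overloading concrete, here is a minimal Python sketch of such a
map-based reader. It is only an illustration under stated assumptions:
parse_card and load_header are hypothetical names, not part of any existing
library, and the parsing only handles the simplified card layout used in the
example above (HIERARCH cards without '=', integer or quoted string values).
Real FITS card parsing is more involved.

```python
def parse_card(card):
    """Parse one simplified card into (keyword, value); None if not a card."""
    body = card.split(" / ", 1)[0].strip()      # drop the trailing comment
    if body.startswith("HIERARCH "):
        parts = body.split(None, 2)             # HIERARCH <keyword> <value>
        if len(parts) != 3:
            return None
        key, raw = parts[1], parts[2]
    elif "=" in body:
        key, raw = (s.strip() for s in body.split("=", 1))
    else:
        return None                             # e.g. the '...' placeholder
    value = raw.strip("'").rstrip() if raw.startswith("'") else int(raw)
    return key, value


def load_header(cards, hierarch_aware=True):
    """Store cards in a dict keyed by keyword; later cards overload earlier ones."""
    header = {}
    for card in cards:
        if not hierarch_aware and card.lstrip().startswith("HIERARCH"):
            continue                            # an unaware reader skips these
        parsed = parse_card(card)
        if parsed is not None:
            key, value = parsed
            header[key] = value                 # overloads any previous value
    return header


cards = [
    "TFIELDS = 999           / For HIERARCH unaware readers",
    "HIERARCH  TFIELDS 1204  / For HIERARCH aware readers",
    "...",
    "TTYPE999= 'XT_MORECOLS' / For HIERARCH unaware readers",
    "TFORM999= '813I    '    / For HIERARCH unaware readers",
    "HIERARCH  TTYPE999 'var_min_u_2'   / For HIERARCH aware readers",
    "HIERARCH  TFORM999 'D'             / For HIERARCH aware readers",
    "HIERARCH  TTYPE1000 'var_prob_h_2' / For HIERARCH aware readers",
    "HIERARCH  TFORM1000 'D'            / For HIERARCH aware readers",
]

header = load_header(cards)                           # HIERARCH-aware view
legacy = load_header(cards, hierarch_aware=False)     # HIERARCH-unaware view
print(header["TFIELDS"], legacy["TFIELDS"])
```

The same card list gives the aware reader TFIELDS = 1204 and the unaware
reader TFIELDS = 999; the only difference is whether the HIERARCH cards
reach the map.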



On 07/07/2017 at 13:09, Mark Taylor wrote:
> Dear fitsbits,
>
> I am considering a convention for storing table data in FITS files
> where the number of columns exceeds the 999 limit implicitly imposed
> by the standard BINTABLE extension type.  I have running code for
> this (available on request) and plan to incorporate it in future
> releases of STIL/STILTS/TOPCAT so that people can work with wide
> tables in FITS while using those tools.  People using software
> that is unaware of this convention would still see a legal BINTABLE
> but not the later columns.
>
> I'm posting the details here in case people want to comment,
> or point out some major problem with the idea that I might have
> overlooked, or tell me that there's already a convention for
> this out there that I should be using instead.  Otherwise, please
> feel free to ignore this post.  I'm not requesting that any
> other software implements this, though if anyone wants to I
> certainly don't object.
>
> Mark
>
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>
> Extended column convention for FITS BINTABLE
> --------------------------------------------
>
> The BINTABLE extension type as described in the FITS Standard
> (FITS Standard v3.0, sec 7.3) requires table column metadata
> to be described using 8-character keywords of the form XXXXXnnn,
> where XXXXX represents one of an open set of mandatory, reserved
> or user-defined root keywords up to five characters in length,
> for instance TFORM (mandatory), TUNIT (reserved), TUCD (user-defined).
> The nnn part is an integer between 1 and 999 indicating the
> index of the column to which the keyword in question refers.
> Since the header syntax confines this indexed part of the keyword
> to three digits, there is an upper limit of 999 columns in
> BINTABLE extensions.
>
> Note that the FITS/BINTABLE format does not entail any restriction on
> the storage of column *data* beyond the 999 column limit in the data
> part of the HDU, the problem is just that client software
> cannot be informed about the layout of this data using the
> header cards in the usual way.
>
> In some cases it is desirable to store FITS tables with a column
> count greater than 999.  Whether that's a good idea is not within
> the scope of this discussion.
>
> To achieve this, I propose the following convention.
>
> Definitions:
>
>   - 'BINTABLE columns' are those columns defined using the
>        FITS BINTABLE standard
>
>   - 'Data columns' are the columns to be encoded
>
>   - N_TOT is the total number of data columns to be stored
>
>   - Data columns with (1-based) indexes from 999 to N_TOT inclusive
>        are known as 'extended' columns.  Their data is stored
>        within the 'container' column.
>
>   - BINTABLE column 999 is known as the 'container' column
>        It contains the byte data for all the 'extended' columns.
>   
> Convention:
>
>   - All column data (for columns 1 to N_TOT) is laid out in the data part
>        of the HDU in exactly the same way as if there were no 999-column
>        limit.
>
>   - The TFIELDS header is declared with the value 999.
>
>   - The container column is declared in the header with some
>        TFORM999 value corresponding to the total field length required
>        by all the extended columns ('B' is the obvious data type, but
>        any legal TFORM value that gives the right width MAY be used).
>        The byte count implied by TFORM999 MUST be equal to the
>        total byte count implied by all extended columns.
>
>   - Other XXXXX999 headers MAY optionally be declared to describe
>        the container column in accordance with the usual rules,
>        e.g. TTYPE999 to give it a name.
>
>   - The NAXIS1 header is declared in the usual way to give the width
>        of a table row in bytes.  This is equal to the sum of
>        all the BINTABLE columns as usual.  It is also equal to
>        the sum of all the data columns, which has the same value.
>
>   - Headers for Data columns 1-998 are declared as usual,
>        corresponding to BINTABLE columns 1-998.
>
>   - Keyword XT_ICOL indicates the index of the container column.
>        It MUST be present with the integer value 999 to indicate
>        that this convention is in use.
>
>   - Keyword XT_NCOL indicates the total number of data columns encoded.
>        It MUST be present with an integer value equal to N_TOT.
>
>   - Metadata for each extended column is encoded with keywords
>        of the form XXXXXaaa, where XXXXX are the same keyword roots
>        as used for normal BINTABLE extensions, and aaa is a 3-digit
>        value in base 26 using the characters 'A' (0 in base 26) to
>        'Z' (25 in base 26), and giving the 1-based data column index
>        minus 999.  The sequence aaa MUST be exactly three characters
>        long (leading 'A's are required).  Thus the formats for data
>        columns 999, 1000, 1001, etc are declared with the keywords
>        TFORMAAA, TFORMAAB, TFORMAAC etc.
>
>   - This convention MUST NOT be used for N_TOT<=999.
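
For reference, the base-26 suffix rule quoted above can be sketched in a few
lines of Python. This is only an illustration of the rule as written in the
proposal; the function names suffix_for_column and column_for_suffix are
hypothetical and are not taken from STIL or any other software.

```python
import string

BASE = 26
FIRST_EXTENDED = 999   # data column 999 is encoded as 'AAA'
LETTERS = string.ascii_uppercase


def suffix_for_column(icol):
    """Return the 3-letter suffix aaa for a 1-based extended column index."""
    n = icol - FIRST_EXTENDED
    if not 0 <= n < BASE ** 3:
        raise ValueError(f"column {icol} cannot be encoded in three letters")
    digits = []
    for _ in range(3):                 # always three digits, so leading 'A's
        n, r = divmod(n, BASE)
        digits.append(LETTERS[r])
    return "".join(reversed(digits))


def column_for_suffix(aaa):
    """Return the 1-based data column index for a 3-letter suffix."""
    if len(aaa) != 3 or any(c not in LETTERS for c in aaa):
        raise ValueError(f"not a valid 3-letter suffix: {aaa!r}")
    n = 0
    for c in aaa:
        n = n * BASE + LETTERS.index(c)
    return n + FIRST_EXTENDED


print("TFORM" + suffix_for_column(1000))   # TFORMAAB
print(column_for_suffix("AHX"))            # 1204
```

The largest suffix, 'ZZZ', corresponds to data column 999 + 26^3 - 1 = 18574.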
>
> The resulting HDU is a completely legal FITS BINTABLE extension.
> Readers aware of this convention may use it to extract column
> data and metadata beyond the 999-column limit.
> Readers unaware of this convention will see 998 columns in their
> intended form, and an additional (possibly large) column 999
> which contains byte data but which cannot be easily interpreted.
>
> This convention can therefore allow encoding of tables with data
> column counts N_TOT up to 998+26^3 = 18574.
>
> An example header might look like this:
>
>     XTENSION= 'BINTABLE'           /  binary table extension
>     BITPIX  =                    8 /  8-bit bytes
>     NAXIS   =                    2 /  2-dimensional table
>     NAXIS1  =                 9229 /  width of table in bytes
>     NAXIS2  =                   26 /  number of rows in table
>     PCOUNT  =                    0 /  size of special data area
>     GCOUNT  =                    1 /  one data group
>     TFIELDS =                  999 /  number of columns
>     XT_ICOL =                  999 /  index of container column
>     XT_NCOL =                 1204 /  total columns including extended
>     TTYPE1  = 'posid_1 '           /  label for column 1
>     TFORM1  = 'J       '           /  format for column 1
>     TTYPE2  = 'instrument_1'       /  label for column 2
>     TFORM2  = '4A      '           /  format for column 2
>     TTYPE3  = 'edge_code_1'        /  label for column 3
>     TFORM3  = 'I       '           /  format for column 3
>     TUCD3   = 'meta.code.qual'
>      ...
>     TTYPE998= 'var_min_s_2'        /  label for column 998
>     TFORM998= 'D       '           /  format for column 998
>     TUNIT998= 'counts/s'           /  units for column 998
>     TTYPE999= 'XT_MORECOLS'        /  label for column 999
>     TFORM999= '813I    '           /  format for column 999
>     TTYPEAAA= 'var_min_u_2'        /  label for column 999
>     TFORMAAA= 'D       '           /  format for column 999
>     TUNITAAA= 'counts/s'           /  units for column 999
>     TTYPEAAB= 'var_prob_h_2'       /  label for column 1000
>     TFORMAAB= 'D       '           /  format for column 1000
>      ...
>     TTYPEAHW= 'var_prob_w_2'       /  label for column 1203
>     TFORMAHW= 'D       '           /  format for column 1203
>     TTYPEAHX= 'var_sigma_w_2'      /  label for column 1204
>     TFORMAHX= 'D       '           /  format for column 1204
>     TUNITAHX= 'counts/s'           /  units for column 1204
>     END
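
The width requirement on the container column (the byte count implied by
TFORM999 must equal the total byte count of the extended columns) can be
checked with a small helper. The following Python sketch is an assumption on
my side rather than part of the proposal: tform_width and
container_is_consistent are hypothetical names, and only the fixed-width type
codes are handled (bit arrays 'X' and the variable-length descriptors 'P' and
'Q' are rejected).

```python
import re

# Bytes per element for the fixed-width BINTABLE type codes.
BYTES_PER_ELEMENT = {"L": 1, "B": 1, "I": 2, "J": 4, "K": 8,
                     "A": 1, "E": 4, "D": 8, "C": 8, "M": 16}


def tform_width(tform):
    """Return the field width in bytes implied by a simple TFORM value."""
    m = re.match(r"\s*(\d*)([A-Z])", tform)
    if m is None or m.group(2) not in BYTES_PER_ELEMENT:
        raise ValueError(f"TFORM not handled by this sketch: {tform!r}")
    repeat = int(m.group(1)) if m.group(1) else 1   # repeat count defaults to 1
    return repeat * BYTES_PER_ELEMENT[m.group(2)]


def container_is_consistent(tform999, extended_tforms):
    """True if the container width equals the summed extended column widths."""
    return tform_width(tform999) == sum(tform_width(t) for t in extended_tforms)


print(tform_width("813I    "))                      # 813 * 2 = 1626 bytes
print(container_is_consistent("16B", ["D", "D"]))   # hypothetical tiny case
```

With the TFORM999 value from the example header, the container column is
813 * 2 = 1626 bytes wide.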
>
> This general approach was suggested by William Pence on the FITSBITS
> list in June 2012
> (https://listmgr.nrao.edu/pipermail/fitsbits/2012-June/002367.html),
> and by Francois-Xavier Pineau (CDS) in private conversation in 2016.
> The details have been filled in by Mark Taylor (Bristol).
> (F-X favours a different mechanism for encoding the extended
> column metadata).
>
> --
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
