[fitsbits] BINTABLE convention for >999 columns

Rob Seaman seaman at lpl.arizona.edu
Mon Jul 10 12:32:52 EDT 2017


I should comment that there's no reason this wouldn't compress as normal
using fpack, but the container column would not generally compress
efficiently because of the mixed data types. A future update to fpack
could become wide-table aware if deemed desirable.

Rob

--


On 7/10/17 9:25 AM, Rob Seaman wrote:
>
> Thanks for the info about usage context. Separating the tables into
> multiple files or extensions still seems a reasonable way to address
> these cases, but since Mark T's proposed convention (apparently
> originally from Bill) is legal or near-legal FITS usage, the main
> question is how best to discourage a diversity of keyword encodings, etc.
>
> Also agree with Mark C's encoding, though would suggest mono-case will
> be less of a confusing change than lower case. Mark C's option avoids
> confusing usage like TFORM0AA or whatever interrupting the sort order.
> A digit in character 6 would require digits in #7 and 8.
>
> Nobody has mentioned extremely wide table use cases (millions of
> columns), and 34695 columns is enough to cover all the wide table DB
> options listed in a previous email.
>
> Rob
> --
>
> On 7/10/17 8:34 AM, Arnold Rots wrote:
>> From all the suggestions offered so far, Mark's is by far the most
>> sensible in my opinion since it provides a significant expansion
>> while preserving full backward compatibility.
>>
>>   - Arnold
>>
>> --------------------------------------------------------------------------
>> Arnold H. Rots                          Chandra X-ray Science Center
>> Smithsonian Astrophysical Observatory   tel:  +1 617 496 7701
>> 60 Garden Street, MS 67                 fax:  +1 617 495 7356
>> Cambridge, MA 02138                     arots at cfa.harvard.edu
>> USA                                     http://hea-www.harvard.edu/~arots/
>> --------------------------------------------------------------------------
>>
>>
>> On Fri, Jul 7, 2017 at 8:51 PM, Mark Calabretta
>> <mark at calabretta.id.au> wrote:
>>
>>     Taking into consideration what others have said on this thread, I
>>     would like to point out that up to 34695 bintable columns may easily
>>     be accommodated, with full backward compatibility, via a simple
>>     extension to the FITS standard.  Namely,
>>
>>     1. When encoding bintable-related keywords such as ijPCna, allow
>>        lower-case letters to represent digits in a base-36 counting
>>        system.
>>
>>     2. Number bintable columns 1 to 999, followed by a00 to zzz, where an
>>        offset (-11960) is applied to make a00 column 1000.  The total
>>        number of columns is then 999 + 26*36*36 = 34695.  (Alternatively,
>>        the full range of three-digit base-36 counting, namely 46656,
>>        could be recovered with a more elaborate ordering.)
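
The numbering in point 2 can be made concrete with a small sketch. This is only an illustration of the arithmetic as described above, not code from any FITS library; the names column_suffix and column_index are hypothetical. It assumes columns 1 to 999 keep their plain decimal suffix and that a00 is read as a base-36 number (12960), so that subtracting 11960 gives column 1000.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # base-36 digits, lower case
OFFSET = 11960                 # 'a00' is 12960 in base 36; 12960 - 11960 = 1000
MAX_COL = 999 + 26 * 36 * 36   # 34695, the limit quoted in the proposal


def column_suffix(col):
    """Return the keyword suffix for a 1-based column number (hypothetical helper)."""
    if not 1 <= col <= MAX_COL:
        raise ValueError(f"column {col} outside 1..{MAX_COL}")
    if col <= 999:
        return str(col)            # ordinary decimal suffix, e.g. TFORM17
    value = col + OFFSET           # shift into the range a00..zzz
    d2, rem = divmod(value, 36 * 36)
    d1, d0 = divmod(rem, 36)
    return DIGITS[d2] + DIGITS[d1] + DIGITS[d0]


def column_index(suffix):
    """Inverse of column_suffix (hypothetical helper)."""
    if suffix.isdigit():
        return int(suffix)
    if len(suffix) != 3 or not suffix[0].isalpha():
        raise ValueError(f"not a valid extended suffix: {suffix!r}")
    return int(suffix, 36) - OFFSET


print(column_suffix(999), column_suffix(1000), column_suffix(MAX_COL))
```

Under these assumptions column 1000 maps to a00 and column 34695 to zzz, matching the totals given in the message.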
>>
>>     Regards,
>>     Mark Calabretta
>>
>>
>>     On Fri, 7 Jul 2017 12:09:15 +0100 (BST)
>>     Mark Taylor <M.B.Taylor at bristol.ac.uk> wrote:
>>
>>     Dear fitsbits,
>>
>>     I am considering a convention for storing table data in FITS files
>>     where the number of columns exceeds the 999 limit implicitly imposed
>>     by the standard BINTABLE extension type.  I have running code for
>>     this (available on request) and plan to incorporate it in future
>>     releases of STIL/STILTS/TOPCAT so that people can work with wide
>>     tables in FITS while using those tools.  People using software
>>     that is unaware of this convention would still see a legal BINTABLE
>>     but not the later columns.
>>
>>     I'm posting the details here in case people want to comment,
>>     or point out some major problem with the idea that I might have
>>     overlooked, or tell me that there's already a convention for
>>     this out there that I should be using instead.  Otherwise, please
>>     feel free to ignore this post.  I'm not requesting that any
>>     other software implements this, though if anyone wants to I
>>     certainly don't object.
>>
>>     Mark
>>
>>     . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>>
>>     Extended column convention for FITS BINTABLE
>>     --------------------------------------------
>>
>>     The BINTABLE extension type as described in the FITS Standard
>>     (FITS Standard v3.0, sec 7.3) requires table column metadata
>>     to be described using 8-character keywords of the form XXXXXnnn,
>>     where XXXXX represents one of an open set of mandatory, reserved
>>     or user-defined root keywords up to five characters in length,
>>     for instance TFORM (mandatory), TUNIT (reserved), TUCD
>>     (user-defined).
>>     The nnn part is an integer between 1 and 999 indicating the
>>     index of the column to which the keyword in question refers.
>>     Since the header syntax confines this indexed part of the keyword
>>     to three digits, there is an upper limit of 999 columns in
>>     BINTABLE extensions.
>>
>>     Note that the FITS/BINTABLE format does not entail any restriction on
>>     the storage of column *data* beyond the 999 column limit in the data
>>     part of the HDU; the problem is just that client software
>>     cannot be informed about the layout of this data using the
>>     header cards in the usual way.
>>
>>     In some cases it is desirable to store FITS tables with a column
>>     count greater than 999.  Whether that's a good idea is not within
>>     the scope of this discussion.
>>
>>     To achieve this, I propose the following convention.
>>
>>     Definitions:
>>
>>      - 'BINTABLE columns' are those columns defined using the
>>           FITS BINTABLE standard
>>
>>      - 'Data columns' are the columns to be encoded
>>
>>      - N_TOT is the total number of data columns to be stored
>>
>>      - Data columns with (1-based) indexes from 999 to N_TOT inclusive
>>           are known as 'extended' columns.  Their data is stored
>>           within the 'container' column.
>>
>>      - BINTABLE column 999 is known as the 'container' column.
>>           It contains the byte data for all the 'extended' columns.
>>
>>     Convention:
>>
>>      - All column data (for columns 1 to N_TOT) is laid out in the data
>>           part of the HDU in exactly the same way as if there were no
>>           999-column limit.
>>
>>      - The TFIELDS header is declared with the value 999.
>>
>>      - The container column is declared in the header with some
>>           TFORM999 value corresponding to the total field length required
>>           by all the extended columns ('B' is the obvious data type, but
>>           any legal TFORM value that gives the right width MAY be used).
>>           The byte count implied by TFORM999 MUST be equal to the
>>           total byte count implied by all extended columns.
>>
>>      - Other XXXXX999 headers MAY optionally be declared to describe
>>           the container column in accordance with the usual rules,
>>           e.g. TTYPE999 to give it a name.
>>
>>      - The NAXIS1 header is declared in the usual way to give the width
>>           of a table row in bytes.  This is equal to the sum of the
>>           byte widths of all the BINTABLE columns as usual.  It is
>>           also equal to the sum of the byte widths of all the data
>>           columns, which has the same value.
>>
>>      - Headers for Data columns 1-998 are declared as usual,
>>           corresponding to BINTABLE columns 1-998.
>>
>>      - Keyword XT_ICOL indicates the index of the container column.
>>           It MUST be present with the integer value 999 to indicate
>>           that this convention is in use.
>>
>>      - Keyword XT_NCOL indicates the total number of data columns
>>           encoded.  It MUST be present with an integer value equal
>>           to N_TOT.
>>
>>      - Metadata for each extended column is encoded with keywords
>>           of the form XXXXXaaa, where XXXXX are the same keyword roots
>>           as used for normal BINTABLE extensions, and aaa is a 3-digit
>>           value in base 26 using the characters 'A' (0 in base 26) to
>>           'Z' (25 in base 26), and giving the 1-based data column index
>>           minus 999.  The sequence aaa MUST be exactly three characters
>>           long (leading 'A's are required).  Thus the formats for data
>>           columns 999, 1000, 1001, etc are declared with the keywords
>>           TFORMAAA, TFORMAAB, TFORMAAC etc.
>>
>>      - This convention MUST NOT be used for N_TOT<=999.
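
The XXXXXaaa suffix rule in the convention above can be sketched as follows. This is a minimal illustration of the base-26 mapping as described, not the STIL implementation; the helper names extended_suffix and extended_index are hypothetical.

```python
import string

LETTERS = string.ascii_uppercase   # 'A' is 0 ... 'Z' is 25
CONTAINER = 999                    # first extended data column
MAX_TOTAL = 998 + 26 ** 3          # 18574, the largest N_TOT the suffix allows


def extended_suffix(col):
    """Three-letter suffix for an extended data column (hypothetical helper)."""
    n = col - CONTAINER            # the convention encodes index minus 999
    if not 0 <= n < 26 ** 3:
        raise ValueError(f"column {col} is not an extended column")
    d2, rem = divmod(n, 26 * 26)
    d1, d0 = divmod(rem, 26)
    return LETTERS[d2] + LETTERS[d1] + LETTERS[d0]


def extended_index(suffix):
    """Inverse mapping: suffix back to the 1-based data column index."""
    if len(suffix) != 3 or any(c not in LETTERS for c in suffix):
        raise ValueError(f"suffix must be exactly three letters A-Z: {suffix!r}")
    n = 0
    for c in suffix:
        n = n * 26 + LETTERS.index(c)
    return n + CONTAINER


for col in (999, 1000, 1001, MAX_TOTAL):
    print(col, "TFORM" + extended_suffix(col))
```

With this mapping, data columns 999, 1000 and 1001 get TFORMAAA, TFORMAAB and TFORMAAC, and ZZZ corresponds to column 18574.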
>>
>>     The resulting HDU is a completely legal FITS BINTABLE extension.
>>     Readers aware of this convention may use it to extract column
>>     data and metadata beyond the 999-column limit.
>>     Readers unaware of this convention will see 998 columns in their
>>     intended form, and an additional (possibly large) column 999
>>     which contains byte data but which cannot be easily interpreted.
>>
>>     This convention can therefore allow encoding of tables with data
>>     column counts N_TOT up to 998+26^3 = 18574.
>>
>>     An example header might look like this:
>>
>>        XTENSION= 'BINTABLE'           /  binary table extension
>>        BITPIX  =                    8 /  8-bit bytes
>>        NAXIS   =                    2 /  2-dimensional table
>>        NAXIS1  =                 9229 /  width of table in bytes
>>        NAXIS2  =                   26 /  number of rows in table
>>        PCOUNT  =                    0 /  size of special data area
>>        GCOUNT  =                    1 /  one data group
>>        TFIELDS =                  999 /  number of columns
>>        XT_ICOL =                  999 /  index of container column
>>        XT_NCOL =                 1204 /  total columns including extended
>>        TTYPE1  = 'posid_1 '           /  label for column 1
>>        TFORM1  = 'J       '           /  format for column 1
>>        TTYPE2  = 'instrument_1'       /  label for column 2
>>        TFORM2  = '4A      '           /  format for column 2
>>        TTYPE3  = 'edge_code_1'        /  label for column 3
>>        TFORM3  = 'I       '           /  format for column 3
>>        TUCD3   = 'meta.code.qual'
>>         ...
>>        TTYPE998= 'var_min_s_2'        /  label for column 998
>>        TFORM998= 'D       '           /  format for column 998
>>        TUNIT998= 'counts/s'           /  units for column 998
>>        TTYPE999= 'XT_MORECOLS'        /  label for column 999
>>        TFORM999= '813I    '           /  format for column 999
>>        TTYPEAAA= 'var_min_u_2'        /  label for column 999
>>        TFORMAAA= 'D       '           /  format for column 999
>>        TUNITAAA= 'counts/s'           /  units for column 999
>>        TTYPEAAB= 'var_prob_h_2'       /  label for column 1000
>>        TFORMAAB= 'D       '           /  format for column 1000
>>         ...
>>        TTYPEAHW= 'var_prob_w_2'       /  label for column 1203
>>        TFORMAHW= 'D       '           /  format for column 1203
>>        TTYPEAHX= 'var_sigma_w_2'      /  label for column 1204
>>        TFORMAHX= 'D       '           /  format for column 1204
>>        TUNITAHX= 'counts/s'           /  units for column 1204
>>        END
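
As a rough check of the byte-count rule for the container column, the width implied by a TFORM value can be computed from the per-type element sizes in the FITS Standard. The sketch below is illustrative only; tform_bytes and container_is_consistent are hypothetical names, and the example header elides most columns, so only the container width itself can be computed from it.

```python
import re

# Bytes per element for each BINTABLE TFORM type code (FITS Standard).
# 'X' (bit array) is handled separately; 'P' and 'Q' count only the
# fixed-size array descriptor stored in the table row, not heap data.
ELEMENT_BYTES = {
    "L": 1, "B": 1, "A": 1, "I": 2, "J": 4, "E": 4,
    "K": 8, "D": 8, "C": 8, "P": 8, "M": 16, "Q": 16,
}

TFORM_RE = re.compile(r"^(\d*)([LXBIJKAEDCMPQ])")


def tform_bytes(tform):
    """Byte width of one table cell described by a TFORMn value."""
    match = TFORM_RE.match(tform.strip())
    if match is None:
        raise ValueError(f"unrecognised TFORM value: {tform!r}")
    repeat = int(match.group(1)) if match.group(1) else 1
    code = match.group(2)
    if code == "X":
        return (repeat + 7) // 8   # bits are padded up to whole bytes
    return repeat * ELEMENT_BYTES[code]


def container_is_consistent(tform_container, extended_tforms):
    """True if the container width equals the summed extended-column widths."""
    return tform_bytes(tform_container) == sum(
        tform_bytes(t) for t in extended_tforms
    )


print(tform_bytes("813I    "))     # container width in the example header
```

For the example header, TFORM999 = '813I' implies a 1626-byte container; a reader aware of the convention would compare that figure with the summed widths of TFORMAAA through TFORMAHX.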
>>
>>     This general approach was suggested by William Pence on the FITSBITS
>>     list in June 2012
>>     (https://listmgr.nrao.edu/pipermail/fitsbits/2012-June/002367.html),
>>     and by Francois-Xavier Pineau (CDS) in private conversation in 2016.
>>     The details have been filled in by Mark Taylor (Bristol).
>>     (F-X favours a different mechanism for encoding the extended
>>     column metadata).
>>
>>     --
>>     Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>>     m.b.taylor at bris.ac.uk   +44-117-9288776
>>     http://www.star.bris.ac.uk/~mbt/
>>
>>     _______________________________________________
>>     fitsbits mailing list
>>     fitsbits at listmgr.nrao.edu
>>     https://listmgr.nrao.edu/mailman/listinfo/fitsbits
>>
>
