[fitsbits] BINTABLE convention for >999 columns

Phil Hodge hodge at stsci.edu
Fri Jul 7 15:55:08 EDT 2017


Some of those base-36 "numbers" could be the same as parts of words.  
You could have keyword TFORMAT, for example, as a keyword completely 
independent of the TFORMXXX sequence, or it could be TFORM for column AT.
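Phil's objection can be made concrete with a short Python sketch. It models a reader that treats whatever follows a known root as a base-36 column index, as in Dick's suggestion quoted below. The root list and function name are hypothetical. The point is only that such a reader cannot tell an independent keyword like TFORMAT apart from TFORM for column 'AT', which is 389 in base 36.

```python
import string

BASE36_DIGITS = string.digits + string.ascii_uppercase  # '0'-'9' then 'A'-'Z'
ROOTS = ("TFORM", "TTYPE", "TUNIT")  # assumed subset of column keyword roots


def parse_base36_keyword(keyword):
    """Read keyword as ROOT + base-36 column index (1-3 digits), else None."""
    for root in ROOTS:
        if not keyword.startswith(root):
            continue
        suffix = keyword[len(root):]
        # Only plain base-36 digits; this rejects signs, underscores, spaces.
        if 1 <= len(suffix) <= 3 and all(c in BASE36_DIGITS for c in suffix):
            column = int(suffix, 36)
            if column >= 1:
                return root, column
    return None


if __name__ == "__main__":
    print(parse_base36_keyword("TFORMAT"))   # ('TFORM', 389): read as a column
    print(parse_base36_keyword("TFORMZZZ"))  # ('TFORM', 46655)
```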

If you could wave a magic wand, why not just allow keyword names to be 
much longer, e.g. up to 50 characters?

Phil

On 07/07/2017 03:35 PM, Richard Shaw wrote:
> Hi Mark,
>
> I can't help but wonder why the 'nnn' part of 'XXXXXnnn' was not 
> allowed to be a non-negative base-36 number in the first place. Maybe 
> there was some reason, perhaps even voiced at the time this extension 
> type was being defined, but I can't think of what it would be. That 
> would have allowed for 46656 columns, which is obviously still a 
> limitation but less so than what we currently have. Were I to wave a
> magic wand to fold this idea into the current (4.0 Draft) Standard, I 
> think what would need to change is the definitions of the TFIELDS 
> keyword (non-negative base-36 integer), and the TFORMn keyword (allow 
> for 'n' to be non-sequential but still increasing within the file). Or 
> am I missing something?
>
> -Dick Shaw
>
> On Fri, Jul 7, 2017 at 9:45 AM, <fitsbits-request at listmgr.nrao.edu> wrote:
>
>     Send fitsbits mailing list submissions to
>     fitsbits at listmgr.nrao.edu
>
>     To subscribe or unsubscribe via the World Wide Web, visit
>     https://listmgr.nrao.edu/mailman/listinfo/fitsbits
>     or, via email, send a message with subject or body 'help' to
>     fitsbits-request at listmgr.nrao.edu
>
>     You can reach the person managing the list at
>     fitsbits-owner at listmgr.nrao.edu
>
>     When replying, please edit your Subject line so it is more specific
>     than "Re: Contents of fitsbits digest..."
>
>
>     Today's Topics:
>
>        1. BINTABLE convention for >999 columns (Mark Taylor)
>        2. Re: BINTABLE convention for >999 columns (Rob Seaman)
>        3. Re: BINTABLE convention for >999 columns (Demitri Muna)
>
>
>     ----------------------------------------------------------------------
>
>     Message: 1
>     Date: Fri, 7 Jul 2017 12:09:15 +0100 (BST)
>     From: Mark Taylor <M.B.Taylor at bristol.ac.uk>
>     To: fitsbits at listmgr.nrao.edu
>     Subject: [fitsbits] BINTABLE convention for >999 columns
>     Message-ID: <alpine.LRH.2.20.1707071059380.5525 at andromeda.star.bris.ac.uk>
>     Content-Type: text/plain; charset=US-ASCII
>
>     Dear fitsbits,
>
>     I am considering a convention for storing table data in FITS files
>     where the number of columns exceeds the 999 limit implicitly imposed
>     by the standard BINTABLE extension type.  I have running code for
>     this (available on request) and plan to incorporate it in future
>     releases of STIL/STILTS/TOPCAT so that people can work with wide
>     tables in FITS while using those tools.  People using software
>     that is unaware of this convention would still see a legal BINTABLE
>     but not the later columns.
>
>     I'm posting the details here in case people want to comment,
>     or point out some major problem with the idea that I might have
>     overlooked, or tell me that there's already a convention for
>     this out there that I should be using instead. Otherwise, please
>     feel free to ignore this post.  I'm not requesting that any
>     other software implements this, though if anyone wants to I
>     certainly don't object.
>
>     Mark
>
>     . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>
>     Extended column convention for FITS BINTABLE
>     --------------------------------------------
>
>     The BINTABLE extension type as described in the FITS Standard
>     (FITS Standard v3.0, sec 7.3) requires table column metadata
>     to be described using 8-character keywords of the form XXXXXnnn,
>     where XXXXX represents one of an open set of mandatory, reserved
>     or user-defined root keywords up to five characters in length,
>     for instance TFORM (mandatory), TUNIT (reserved), TUCD (user-defined).
>     The nnn part is an integer between 1 and 999 indicating the
>     index of the column to which the keyword in question refers.
>     Since the header syntax confines this indexed part of the keyword
>     to three digits, there is an upper limit of 999 columns in
>     BINTABLE extensions.
>
>     Note that the FITS/BINTABLE format does not entail any restriction on
>     the storage of column *data* beyond the 999-column limit in the data
>     part of the HDU; the problem is just that client software
>     cannot be informed about the layout of this data using the
>     header cards in the usual way.
>
>     In some cases it is desirable to store FITS tables with a column
>     count greater than 999.  Whether that's a good idea is not within
>     the scope of this discussion.
>
>     To achieve this, I propose the following convention.
>
>     Definitions:
>
>      - 'BINTABLE columns' are those columns defined using the
>           FITS BINTABLE standard
>
>      - 'Data columns' are the columns to be encoded
>
>      - N_TOT is the total number of data columns to be stored
>
>      - Data columns with (1-based) indexes from 999 to N_TOT inclusive
>           are known as 'extended' columns.  Their data is stored
>           within the 'container' column.
>
>      - BINTABLE column 999 is known as the 'container' column.
>           It contains the byte data for all the 'extended' columns.
>
>     Convention:
>
>      - All column data (for columns 1 to N_TOT) is laid out in the data
>           part of the HDU in exactly the same way as if there were no
>           999-column limit.
>
>      - The TFIELDS header is declared with the value 999.
>
>      - The container column is declared in the header with some
>           TFORM999 value corresponding to the total field length required
>           by all the extended columns ('B' is the obvious data type, but
>           any legal TFORM value that gives the right width MAY be used).
>           The byte count implied by TFORM999 MUST be equal to the
>           total byte count implied by all extended columns.
>
>      - Other XXXXX999 headers MAY optionally be declared to describe
>           the container column in accordance with the usual rules,
>           e.g. TTYPE999 to give it a name.
>
>      - The NAXIS1 header is declared in the usual way to give the width
>           of a table row in bytes.  As usual, this is equal to the sum
>           of the widths of all the BINTABLE columns.  It is also equal
>           to the sum of the widths of all the data columns, which is the
>           same value.
>
>      - Headers for Data columns 1-998 are declared as usual,
>           corresponding to BINTABLE columns 1-998.
>
>      - Keyword XT_ICOL indicates the index of the container column.
>           It MUST be present with the integer value 999 to indicate
>           that this convention is in use.
>
>      - Keyword XT_NCOL indicates the total number of data columns encoded.
>           It MUST be present with an integer value equal to N_TOT.
>
>      - Metadata for each extended column is encoded with keywords
>           of the form XXXXXaaa, where XXXXX are the same keyword roots
>           as used for normal BINTABLE extensions, and aaa is a 3-digit
>           value in base 26 using the characters 'A' (0 in base 26) to
>           'Z' (25 in base 26), and giving the 1-based data column index
>           minus 999.  The sequence aaa MUST be exactly three characters
>           long (leading 'A's are required).  Thus the formats for data
>           columns 999, 1000, 1001, etc are declared with the keywords
>           TFORMAAA, TFORMAAB, TFORMAAC etc.
>
>      - This convention MUST NOT be used for N_TOT<=999.
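The keyword-naming rule above can be sketched in Python as follows. This is an illustrative reading of the rules as written here, not Mark's running code, and the function names are hypothetical. Columns 1-998 keep their decimal suffix; from data column 999 on, the suffix is the three-letter base-26 encoding of (index - 999). The largest suffix, ZZZ, decodes to the 998+26^3 = 18574 limit stated below.

```python
import string

LETTERS = string.ascii_uppercase  # 'A' = 0, 'B' = 1, ..., 'Z' = 25
CONTAINER = 999                   # BINTABLE index of the container column


def index_suffix(data_col):
    """Keyword suffix for a 1-based data column index."""
    if 1 <= data_col < CONTAINER:
        return str(data_col)          # ordinary columns 1-998
    n = data_col - CONTAINER          # 0 for data column 999
    if not 0 <= n < 26 ** 3:
        raise ValueError("data column %d is outside this convention" % data_col)
    letters = []
    for _ in range(3):                # always three letters, leading 'A's kept
        n, digit = divmod(n, 26)
        letters.append(LETTERS[digit])
    return "".join(reversed(letters))


def extended_index(suffix):
    """Inverse of index_suffix for the three-letter (extended) form."""
    if len(suffix) != 3 or any(c not in LETTERS for c in suffix):
        raise ValueError("not an extended-column suffix: %r" % suffix)
    n = 0
    for c in suffix:
        n = n * 26 + LETTERS.index(c)
    return CONTAINER + n


def keyword(root, data_col):
    """Full keyword name, e.g. keyword('TFORM', 1000) -> 'TFORMAAB'."""
    if not 1 <= len(root) <= 5:
        raise ValueError("keyword roots are at most five characters")
    return root + index_suffix(data_col)


if __name__ == "__main__":
    print(keyword("TFORM", 1204))  # TFORMAHX
    print(extended_index("ZZZ"))   # 18574
```

Keeping exactly three letters, as the convention requires, means every extended keyword is eight characters long and no letter suffix can be confused with a decimal one.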
>
>     The resulting HDU is a completely legal FITS BINTABLE extension.
>     Readers aware of this convention may use it to extract column
>     data and metadata beyond the 999-column limit.
>     Readers unaware of this convention will see 998 columns in their
>     intended form, and an additional (possibly large) column 999
>     which contains byte data but which cannot be easily interpreted.
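A convention-aware reader has two jobs: recognise the keywords that announce the convention, and cut each row's container field back into the extended columns. The minimal Python sketch below does both on a plain dict header and raw bytes. It assumes the per-column byte widths have already been worked out from the TFORMaaa keywords. The function names are hypothetical, not taken from STIL or any FITS library.

```python
import struct


def uses_extended_convention(header):
    """True if a header (dict of keyword -> value) declares the convention.

    Checks the rules stated above: TFIELDS and XT_ICOL both 999, and XT_NCOL
    an integer greater than 999.
    """
    ncol = header.get("XT_NCOL")
    return (header.get("TFIELDS") == 999
            and header.get("XT_ICOL") == 999
            and isinstance(ncol, int)
            and ncol > 999)


def split_container(field, widths):
    """Cut one row's container-column bytes into per-column byte strings.

    widths[i] is the byte width of data column 999 + i, in column order.
    """
    if sum(widths) != len(field):
        raise ValueError("container width does not match the extended columns")
    pieces, offset = [], 0
    for width in widths:
        pieces.append(field[offset:offset + width])
        offset += width
    return pieces


if __name__ == "__main__":
    header = {"TFIELDS": 999, "XT_ICOL": 999, "XT_NCOL": 1001}
    # Hypothetical container holding data columns 999-1001 as D, D, J.
    field = struct.pack(">ddi", 1.5, 2.5, 7)  # FITS binary data is big-endian
    if uses_extended_convention(header):
        d999, d1000, j1001 = split_container(field, [8, 8, 4])
        print(struct.unpack(">d", d999)[0], struct.unpack(">i", j1001)[0])  # 1.5 7
```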
>
>     This convention can therefore allow encoding of tables with data
>     column counts N_TOT up to 998+26^3 = 18574.
>
>     An example header might look like this:
>
>        XTENSION= 'BINTABLE'           /  binary table extension
>        BITPIX  =                    8 /  8-bit bytes
>        NAXIS   =                    2 /  2-dimensional table
>        NAXIS1  =                 9229 /  width of table in bytes
>        NAXIS2  =                   26 /  number of rows in table
>        PCOUNT  =                    0 /  size of special data area
>        GCOUNT  =                    1 /  one data group
>        TFIELDS =                  999 /  number of columns
>        XT_ICOL =                  999 /  index of container column
>        XT_NCOL =                 1204 /  total columns including extended
>        TTYPE1  = 'posid_1 '           /  label for column 1
>        TFORM1  = 'J       '           /  format for column 1
>        TTYPE2  = 'instrument_1'       /  label for column 2
>        TFORM2  = '4A      '           /  format for column 2
>        TTYPE3  = 'edge_code_1'        /  label for column 3
>        TFORM3  = 'I       '           /  format for column 3
>        TUCD3   = 'meta.code.qual'
>         ...
>        TTYPE998= 'var_min_s_2'        /  label for column 998
>        TFORM998= 'D       '           /  format for column 998
>        TUNIT998= 'counts/s'           /  units for column 998
>        TTYPE999= 'XT_MORECOLS'        /  label for column 999
>        TFORM999= '813I    '           /  format for column 999
>        TTYPEAAA= 'var_min_u_2'        /  label for column 999
>        TFORMAAA= 'D       '           /  format for column 999
>        TUNITAAA= 'counts/s'           /  units for column 999
>        TTYPEAAB= 'var_prob_h_2'       /  label for column 1000
>        TFORMAAB= 'D       '           /  format for column 1000
>         ...
>        TTYPEAHW= 'var_prob_w_2'       /  label for column 1203
>        TFORMAHW= 'D       '           /  format for column 1203
>        TTYPEAHX= 'var_sigma_w_2'      /  label for column 1204
>        TFORMAHX= 'D       '           /  format for column 1204
>        TUNITAHX= 'counts/s'           /  units for column 1204
>        END
>
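In the example header, TFORM999 = '813I' declares 813 two-byte integers, so the container field is 1626 bytes wide. By the MUST rule above, the extended columns 999-1204, most of which are elided by '...', have to add up to exactly that. A minimal Python sketch of the width check follows. It handles the standard BINTABLE type codes and assumes TFORM values of the usual rT form (optional repeat count, then type letter). It ignores any trailing text, such as the '(100)' in '1PE(100)'. The function names are hypothetical.

```python
import math
import re

# Bytes per element for each BINTABLE data type code; 'X' (bits) is special-cased.
BYTES_PER_ELEMENT = {"L": 1, "B": 1, "I": 2, "J": 4, "K": 8, "A": 1,
                     "E": 4, "D": 8, "C": 8, "M": 16, "P": 8, "Q": 16}
TFORM_RE = re.compile(r"\s*(\d*)([LXBIJKAEDCMPQ])")


def tform_width(tform):
    """Byte width of one table field from its TFORM value, e.g. '813I' -> 1626."""
    match = TFORM_RE.match(tform)
    if match is None:
        raise ValueError("unrecognised TFORM value: %r" % tform)
    repeat = int(match.group(1)) if match.group(1) else 1
    code = match.group(2)
    if code == "X":
        return math.ceil(repeat / 8)  # bits are packed into whole bytes
    return repeat * BYTES_PER_ELEMENT[code]


def container_matches(tform999, extended_tforms):
    """The MUST rule: TFORM999 covers exactly the bytes of the extended columns."""
    return tform_width(tform999) == sum(tform_width(t) for t in extended_tforms)


def row_width(bintable_tforms):
    """Expected NAXIS1: the sum of the widths of all BINTABLE columns (1-999)."""
    return sum(tform_width(t) for t in bintable_tforms)


if __name__ == "__main__":
    print(tform_width("813I    "))                         # 1626
    print(container_matches("24B", ["D", "D", "J", "J"]))  # True
```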
>     This general approach was suggested by William Pence on the FITSBITS
>     list in June 2012
>     (https://listmgr.nrao.edu/pipermail/fitsbits/2012-June/002367.html),
>     and by Francois-Xavier Pineau (CDS) in private conversation in 2016.
>     The details have been filled in by Mark Taylor (Bristol).
>     (F-X favours a different mechanism for encoding the extended
>     column metadata).
>
>     --
>     Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>     m.b.taylor at bris.ac.uk    +44-117-9288776
>     http://www.star.bris.ac.uk/~mbt/
>
>
>
>     ------------------------------
>
>     Message: 2
>     Date: Fri, 7 Jul 2017 05:51:17 -0700
>     From: Rob Seaman <seaman at lpl.arizona.edu>
>     To: fitsbits at listmgr.nrao.edu
>     Subject: Re: [fitsbits] BINTABLE convention for >999 columns
>     Message-ID: <9cfd2764-bf73-4c2f-d111-a15e3a2ccaca at lpl.arizona.edu>
>     Content-Type: text/plain; charset=utf-8
>
>     Why not simply split such tables into two+ extensions and join as
>     needed?
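Rob's alternative needs no new keywords. The following is a minimal Python sketch, assuming the split extensions keep identical row order and row count. Column names are dealt out in chunks of at most 999, one chunk per extension, and rows read back from the extensions are rejoined by position. Tables here are plain lists of tuples, not any FITS library's objects, and the function names are hypothetical.

```python
from itertools import chain

MAX_COLUMNS = 999  # BINTABLE limit per extension


def plan_extensions(column_names, max_columns=MAX_COLUMNS):
    """Split an ordered list of column names into per-extension chunks."""
    return [column_names[i:i + max_columns]
            for i in range(0, len(column_names), max_columns)]


def join_rows(*tables):
    """Rejoin per-extension tables (lists of row tuples) into wide rows."""
    if len({len(table) for table in tables}) > 1:
        raise ValueError("extensions have different row counts")
    return [tuple(chain.from_iterable(rows)) for rows in zip(*tables)]


if __name__ == "__main__":
    names = ["col%d" % i for i in range(1, 1205)]            # 1204 columns
    print([len(chunk) for chunk in plan_extensions(names)])  # [999, 205]
    print(join_rows([(1, 2), (3, 4)], [(5,), (6,)]))         # [(1, 2, 5), (3, 4, 6)]
```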
>
>     Rob
>
>     --
>
>
>
>
>
>     ------------------------------
>
>     Message: 3
>     Date: Fri, 7 Jul 2017 09:45:21 -0400
>     From: Demitri Muna <demitri.muna at gmail.com>
>     To: Mark Taylor <m.b.taylor at bristol.ac.uk>
>     Cc: fitsbits at listmgr.nrao.edu
>     Subject: Re: [fitsbits] BINTABLE convention for >999 columns
>     Message-ID: <8A1AC488-6528-4C3A-A0D2-1226C6390F21 at gmail.com>
>     Content-Type: text/plain; charset="us-ascii"
>
>     Hi,
>
>     I agree with Rob here; the simplest solution is to spread the data
>     into two or more extensions. It's not a lot of work for the end
>     user to concatenate the columns into a single data structure if
>     that is preferable for some reason. Creating a new convention that
>     is not part of the FITS standard *does* create a lot of work for
>     many people. While you may be able to create a technically valid
>     FITS file, this proposal is not in the spirit of how FITS files
>     are to be read. This proposal literally redefines the meaning of
>     the mandatory "TFIELDS" header from the "number of columns in the
>     table" to "number of columns in the table, except if there are
>     other keywords, then in that case look elsewhere for this
>     information".
>
>     On Jul 7, 2017, at 7:09 AM, Mark Taylor <m.b.taylor at bristol.ac.uk> wrote:
>
>     > I'm posting the details here in case people want to comment,
>     > or point out some major problem with the idea that I might have
>     > overlooked, or tell me that there's already a convention for
>     > this out there that I should be using instead. Otherwise, please
>     > feel free to ignore this post.  I'm not requesting that any
>     > other software implements this, though if anyone wants to I
>     > certainly don't object.
>
>     I don't think it's as simple as that. It's one thing to implement
>     this in the software you support, but there are other FITS
>     viewers/readers (Astropy and cfitsio being the main ones, whatever
>     IDL routines there are, not to mention Nightlight). I think it
>     would be wrong for the other programs to implement this without it
>     being part of the standard, and I think it's a bad idea to fork
>     the standard with a custom implementation. Feelings about
>     standards aside, this provides for a bad user experience. It's a
>     legitimate question/frustration for a user to wonder why some
>     columns appear in one program and not in many others, especially when
>     the file claims to be a FITS file.
>
>     I agree that there are limitations in the FITS format, but I
>     strongly suggest that the only way forward for this idea is to
>     propose it (or something similar) as part of the official FITS
>     format or else use multiple extensions.
>
>     Cheers,
>     Demitri
>
>
>     http://nightlightapp.io
>
>
>     ------------------------------
>
>     End of fitsbits Digest, Vol 117, Issue 1
>     ****************************************
>
>
>
>


