[fitsbits] BINTABLE convention for >999 columns

Richard Shaw rashaw.astro at gmail.com
Fri Jul 7 15:35:15 EDT 2017


Hi Mark,

I can't help but wonder why the 'nnn' part of 'XXXXXnnn' was not allowed to
be a non-negative base-36 number in the first place. Maybe there was some
reason, perhaps even voiced at the time this extension type was being
defined, but I can't think of what it would be. That would have allowed for
46656 columns, which is obviously still a limitation but less so than what
we currently have. Were I to wave a magic wand to fold this idea into the
current (4.0 Draft) Standard, I think what would need to change are the
definitions of the TFIELDS keyword (non-negative base-36 integer), and the
TFORMn keyword (allow for 'n' to be non-sequential but still increasing
within the file). Or am I missing something?
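
To make the arithmetic concrete, here is a minimal Python sketch of what a
three-character base-36 index could look like. The helper names and the
fixed-width zero padding are assumptions made for illustration; nothing in
the Standard or in this thread specifies them.

```python
import string

# Digit alphabet assumed here: '0'-'9' followed by 'A'-'Z'.
DIGITS = string.digits + string.ascii_uppercase


def to_base36(value, width=3):
    """Format a non-negative integer as a fixed-width base-36 string."""
    if not 0 <= value < 36 ** width:
        raise ValueError(f"{value} does not fit in {width} base-36 digits")
    chars = []
    for _ in range(width):
        value, rem = divmod(value, 36)
        chars.append(DIGITS[rem])
    return "".join(reversed(chars))


def from_base36(text):
    """Parse a base-36 string back into an integer."""
    return int(text, 36)


# Three base-36 digits give 36**3 = 46656 distinct values (000 to ZZZ).
capacity = 36 ** 3
# Hypothetical keyword for index 1000 under this scheme: 'TFORM0RS'.
keyword = "TFORM" + to_base36(1000)
```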

-Dick Shaw

On Fri, Jul 7, 2017 at 9:45 AM, <fitsbits-request at listmgr.nrao.edu> wrote:

> Send fitsbits mailing list submissions to
>         fitsbits at listmgr.nrao.edu
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://listmgr.nrao.edu/mailman/listinfo/fitsbits
> or, via email, send a message with subject or body 'help' to
>         fitsbits-request at listmgr.nrao.edu
>
> You can reach the person managing the list at
>         fitsbits-owner at listmgr.nrao.edu
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of fitsbits digest..."
>
>
> Today's Topics:
>
>    1. BINTABLE convention for >999 columns (Mark Taylor)
>    2. Re: BINTABLE convention for >999 columns (Rob Seaman)
>    3. Re: BINTABLE convention for >999 columns (Demitri Muna)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 7 Jul 2017 12:09:15 +0100 (BST)
> From: Mark Taylor <M.B.Taylor at bristol.ac.uk>
> To: fitsbits at listmgr.nrao.edu
> Subject: [fitsbits] BINTABLE convention for >999 columns
> Message-ID:
>         <alpine.LRH.2.20.1707071059380.5525 at andromeda.star.bris.ac.uk>
> Content-Type: text/plain; charset=US-ASCII
>
> Dear fitsbits,
>
> I am considering a convention for storing table data in FITS files
> where the number of columns exceeds the 999 limit implicitly imposed
> by the standard BINTABLE extension type.  I have running code for
> this (available on request) and plan to incorporate it in future
> releases of STIL/STILTS/TOPCAT so that people can work with wide
> tables in FITS while using those tools.  People using software
> that is unaware of this convention would still see a legal BINTABLE
> but not the later columns.
>
> I'm posting the details here in case people want to comment,
> or point out some major problem with the idea that I might have
> overlooked, or tell me that there's already a convention for
> this out there that I should be using instead.  Otherwise, please
> feel free to ignore this post.  I'm not requesting that any
> other software implements this, though if anyone wants to I
> certainly don't object.
>
> Mark
>
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>
> Extended column convention for FITS BINTABLE
> --------------------------------------------
>
> The BINTABLE extension type as described in the FITS Standard
> (FITS Standard v3.0, sec 7.3) requires table column metadata
> to be described using 8-character keywords of the form XXXXXnnn,
> where XXXXX represents one of an open set of mandatory, reserved
> or user-defined root keywords up to five characters in length,
> for instance TFORM (mandatory), TUNIT (reserved), TUCD (user-defined).
> The nnn part is an integer between 1 and 999 indicating the
> index of the column to which the keyword in question refers.
> Since the header syntax confines this indexed part of the keyword
> to three digits, there is an upper limit of 999 columns in
> BINTABLE extensions.
>
> Note that the FITS/BINTABLE format does not entail any restriction on
> the storage of column *data* beyond the 999 column limit in the data
> part of the HDU; the problem is just that client software
> cannot be informed about the layout of this data using the
> header cards in the usual way.
>
> In some cases it is desirable to store FITS tables with a column
> count greater than 999.  Whether that's a good idea is not within
> the scope of this discussion.
>
> To achieve this, I propose the following convention.
>
> Definitions:
>
>  - 'BINTABLE columns' are those columns defined using the
>       FITS BINTABLE standard
>
>  - 'Data columns' are the columns to be encoded
>
>  - N_TOT is the total number of data columns to be stored
>
>  - Data columns with (1-based) indexes from 999 to N_TOT inclusive
>       are known as 'extended' columns.  Their data is stored
>       within the 'container' column.
>
>  - BINTABLE column 999 is known as the 'container' column.
>       It contains the byte data for all the 'extended' columns.
>
> Convention:
>
>  - All column data (for columns 1 to N_TOT) is laid out in the data part
>       of the HDU in exactly the same way as if there were no 999-column
>       limit.
>
>  - The TFIELDS header is declared with the value 999.
>
>  - The container column is declared in the header with some
>       TFORM999 value corresponding to the total field length required
>       by all the extended columns ('B' is the obvious data type, but
>       any legal TFORM value that gives the right width MAY be used).
>       The byte count implied by TFORM999 MUST be equal to the
>       total byte count implied by all extended columns.
>
>  - Other XXXXX999 headers MAY optionally be declared to describe
>       the container column in accordance with the usual rules,
>       e.g. TTYPE999 to give it a name.
>
>  - The NAXIS1 header is declared in the usual way to give the width
>       of a table row in bytes.  This is equal to the sum of the
>       widths of all the BINTABLE columns as usual.  It is also equal
>       to the sum of the widths of all the data columns, which has
>       the same value.
>
>  - Headers for Data columns 1-998 are declared as usual,
>       corresponding to BINTABLE columns 1-998.
>
>  - Keyword XT_ICOL indicates the index of the container column.
>       It MUST be present with the integer value 999 to indicate
>       that this convention is in use.
>
>  - Keyword XT_NCOL indicates the total number of data columns encoded.
>       It MUST be present with an integer value equal to N_TOT.
>
>  - Metadata for each extended column is encoded with keywords
>       of the form XXXXXaaa, where XXXXX are the same keyword roots
>       as used for normal BINTABLE extensions, and aaa is a 3-digit
>       value in base 26 using the characters 'A' (0 in base 26) to
>       'Z' (25 in base 26), and giving the 1-based data column index
>       minus 999.  The sequence aaa MUST be exactly three characters
>       long (leading 'A's are required).  Thus the formats for data
>       columns 999, 1000, 1001, etc are declared with the keywords
>       TFORMAAA, TFORMAAB, TFORMAAC etc.
>
>  - This convention MUST NOT be used for N_TOT<=999.
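
The keyword naming rule in the list above can be sketched in Python as
follows. This is only an illustration of the base-26 mapping as described;
the function names are hypothetical and are not taken from STIL or any
other implementation.

```python
CONTAINER_INDEX = 999  # the value the convention requires for XT_ICOL


def extended_suffix(data_col):
    """Return the 3-letter suffix 'aaa' for a 1-based data column >= 999."""
    offset = data_col - CONTAINER_INDEX
    if not 0 <= offset < 26 ** 3:
        raise ValueError(f"data column {data_col} is outside the extended range")
    letters = []
    for _ in range(3):
        offset, rem = divmod(offset, 26)
        letters.append(chr(ord("A") + rem))
    return "".join(reversed(letters))


def data_column_index(suffix):
    """Invert extended_suffix: map 'AAA' to 999, 'AAB' to 1000, and so on."""
    if len(suffix) != 3 or not all("A" <= c <= "Z" for c in suffix):
        raise ValueError(f"not a 3-letter base-26 suffix: {suffix!r}")
    value = 0
    for c in suffix:
        value = value * 26 + (ord(c) - ord("A"))
    return value + CONTAINER_INDEX


def column_keyword(root, data_col):
    """Keyword for a data column: numeric below 999, lettered from 999 on."""
    if data_col < CONTAINER_INDEX:
        return f"{root}{data_col}"
    return root + extended_suffix(data_col)
```

With these helpers, column_keyword("TFORM", 998) gives 'TFORM998', while
column_keyword("TFORM", 999) gives 'TFORMAAA'. The largest index that fits
is 999 + 26**3 - 1 = 18574, which maps to 'ZZZ'.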
>
> The resulting HDU is a completely legal FITS BINTABLE extension.
> Readers aware of this convention may use it to extract column
> data and metadata beyond the 999-column limit.
> Readers unaware of this convention will see 998 columns in their
> intended form, and an additional (possibly large) column 999
> which contains byte data but which cannot be easily interpreted.
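
As a rough sketch of the two reader behaviours just described, the Python
below lists the TFORM value of every column a reader would see. The header
is modelled as a plain dict of keyword to value, which is an assumption for
illustration and not the API of any FITS library; the function names are
hypothetical.

```python
def _suffix(offset):
    """Three-letter base-26 suffix ('AAA' for 0) for an extended column."""
    letters = []
    for _ in range(3):
        offset, rem = divmod(offset, 26)
        letters.append(chr(ord("A") + rem))
    return "".join(reversed(letters))


def visible_formats(header, aware=True):
    """Return the TFORM values of the columns a reader would present.

    An unaware reader, or any reader given a header without the
    convention's keywords, sees the TFIELDS BINTABLE columns only.
    An aware reader replaces container column 999 with the extended
    data columns 999..XT_NCOL.
    """
    in_use = header.get("XT_ICOL") == 999 and header.get("TFIELDS") == 999
    if not (aware and in_use):
        return [header[f"TFORM{i}"] for i in range(1, header["TFIELDS"] + 1)]
    n_tot = header["XT_NCOL"]
    formats = [header[f"TFORM{i}"] for i in range(1, 999)]
    formats += [header["TFORM" + _suffix(i - 999)] for i in range(999, n_tot + 1)]
    return formats


# Synthetic header: 998 ordinary 'J' columns plus three extended 'D' columns.
example = {"TFIELDS": 999, "XT_ICOL": 999, "XT_NCOL": 1001, "TFORM999": "24B"}
example.update({f"TFORM{i}": "J" for i in range(1, 999)})
example.update({"TFORMAAA": "D", "TFORMAAB": "D", "TFORMAAC": "D"})
```

Here visible_formats(example) has 1001 entries ending in three 'D' columns,
whereas visible_formats(example, aware=False) has 999 entries ending in the
opaque '24B' container.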
>
> This convention can therefore allow encoding of tables with data
> column counts N_TOT up to 998+26^3 = 18574.
>
> An example header might look like this:
>
>    XTENSION= 'BINTABLE'           /  binary table extension
>    BITPIX  =                    8 /  8-bit bytes
>    NAXIS   =                    2 /  2-dimensional table
>    NAXIS1  =                 9229 /  width of table in bytes
>    NAXIS2  =                   26 /  number of rows in table
>    PCOUNT  =                    0 /  size of special data area
>    GCOUNT  =                    1 /  one data group
>    TFIELDS =                  999 /  number of columns
>    XT_ICOL =                  999 /  index of container column
>    XT_NCOL =                 1204 /  total columns including extended
>    TTYPE1  = 'posid_1 '           /  label for column 1
>    TFORM1  = 'J       '           /  format for column 1
>    TTYPE2  = 'instrument_1'       /  label for column 2
>    TFORM2  = '4A      '           /  format for column 2
>    TTYPE3  = 'edge_code_1'        /  label for column 3
>    TFORM3  = 'I       '           /  format for column 3
>    TUCD3   = 'meta.code.qual'
>     ...
>    TTYPE998= 'var_min_s_2'        /  label for column 998
>    TFORM998= 'D       '           /  format for column 998
>    TUNIT998= 'counts/s'           /  units for column 998
>    TTYPE999= 'XT_MORECOLS'        /  label for column 999
>    TFORM999= '813I    '           /  format for column 999
>    TTYPEAAA= 'var_min_u_2'        /  label for column 999
>    TFORMAAA= 'D       '           /  format for column 999
>    TUNITAAA= 'counts/s'           /  units for column 999
>    TTYPEAAB= 'var_prob_h_2'       /  label for column 1000
>    TFORMAAB= 'D       '           /  format for column 1000
>     ...
>    TTYPEAHW= 'var_prob_w_2'       /  label for column 1203
>    TFORMAHW= 'D       '           /  format for column 1203
>    TTYPEAHX= 'var_sigma_w_2'      /  label for column 1204
>    TFORMAHX= 'D       '           /  format for column 1204
>    TUNITAHX= 'counts/s'           /  units for column 1204
>    END
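
The requirement that the container width match the extended columns can be
checked with a small Python sketch like the one below. It covers only the
fixed-width type codes and the bit type 'X'; variable-length descriptors
('P', 'Q') are left out. The function names are hypothetical, and because
the example header above is abbreviated, the check is shown on made-up
values and not on that header's full column list.

```python
import re

# Bytes per element for the fixed-width BINTABLE type codes handled here.
BYTES_PER_ELEMENT = {"L": 1, "B": 1, "I": 2, "J": 4, "K": 8,
                     "A": 1, "E": 4, "D": 8, "C": 8, "M": 16}


def tform_width(tform):
    """Byte width implied by a TFORM value such as 'D', '4A' or '813I'."""
    match = re.match(r"(\d*)([A-Z])", tform.strip())
    if match is None:
        raise ValueError(f"cannot parse TFORM value {tform!r}")
    repeat = int(match.group(1)) if match.group(1) else 1
    code = match.group(2)
    if code == "X":
        # Bit arrays are padded up to a whole number of bytes.
        return (repeat + 7) // 8
    if code not in BYTES_PER_ELEMENT:
        raise ValueError(f"type code {code!r} is not handled in this sketch")
    return repeat * BYTES_PER_ELEMENT[code]


def container_is_consistent(container_tform, extended_tforms):
    """True if TFORM999 implies the same byte count as the extended columns."""
    return tform_width(container_tform) == sum(
        tform_width(t) for t in extended_tforms)
```

For instance, tform_width('813I    ') is 813 * 2 = 1626 bytes, and
container_is_consistent('16B', ['D', 'D']) holds because both sides are
16 bytes.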
>
> This general approach was suggested by William Pence on the FITSBITS
> list in June 2012
> (https://listmgr.nrao.edu/pipermail/fitsbits/2012-June/002367.html),
> and by Francois-Xavier Pineau (CDS) in private conversation in 2016.
> The details have been filled in by Mark Taylor (Bristol).
> (F-X favours a different mechanism for encoding the extended
> column metadata).
>
> --
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 7 Jul 2017 05:51:17 -0700
> From: Rob Seaman <seaman at lpl.arizona.edu>
> To: fitsbits at listmgr.nrao.edu
> Subject: Re: [fitsbits] BINTABLE convention for >999 columns
> Message-ID: <9cfd2764-bf73-4c2f-d111-a15e3a2ccaca at lpl.arizona.edu>
> Content-Type: text/plain; charset=utf-8
>
> Why not simply split such tables into two+ extensions and join as needed?
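
A minimal Python sketch of this alternative, using plain lists and no FITS
library: the column list is cut into groups of at most 999, one group per
extension, and the per-extension pieces of a row are joined again on read.
The function names and the in-memory representation are assumptions for
illustration only.

```python
MAX_COLUMNS = 999  # per-extension limit implied by the 3-digit keyword index


def split_columns(names, limit=MAX_COLUMNS):
    """Cut a list of column names into consecutive groups of at most `limit`."""
    return [names[i:i + limit] for i in range(0, len(names), limit)]


def join_row(row_parts):
    """Concatenate the pieces of one row read from successive extensions."""
    joined = []
    for part in row_parts:
        joined.extend(part)
    return joined


# A 1204-column table would need two extensions: 999 columns, then 205.
names = [f"col{i}" for i in range(1, 1205)]
groups = split_columns(names)
```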
>
> Rob
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 7 Jul 2017 09:45:21 -0400
> From: Demitri Muna <demitri.muna at gmail.com>
> To: Mark Taylor <m.b.taylor at bristol.ac.uk>
> Cc: fitsbits at listmgr.nrao.edu
> Subject: Re: [fitsbits] BINTABLE convention for >999 columns
> Message-ID: <8A1AC488-6528-4C3A-A0D2-1226C6390F21 at gmail.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Hi,
>
> I agree with Rob here; the simplest solution is to spread the data into
> two or more extensions. It's not a lot of work for the end user to
> concatenate the columns into a single data structure if that is preferable
> for some reason. Creating a new convention that is not part of the FITS
> standard *does* create a lot of work for many people. While you may be able
> to create a technically valid FITS file, this proposal is not in the spirit
> of how FITS files are to be read. This proposal literally redefines the
> meaning of the mandatory "TFIELDS" header from the "number of columns in
> the table" to "number of columns in the table, except if there are other
> keywords, then in that case look elsewhere for this information".
>
> On Jul 7, 2017, at 7:09 AM, Mark Taylor <m.b.taylor at bristol.ac.uk> wrote:
>
> > I'm posting the details here in case people want to comment,
> > or point out some major problem with the idea that I might have
> > overlooked, or tell me that there's already a convention for
> > this out there that I should be using instead.  Otherwise, please
> > feel free to ignore this post.  I'm not requesting that any
> > other software implements this, though if anyone wants to I
> > certainly don't object.
>
> I don't think it's as simple as that. It's one thing to implement this in
> the software you support, but there are other FITS viewers/readers (Astropy
> and cfitsio being the main ones, whatever IDL routines there are, not to
> mention Nightlight). I think it would be wrong for the other programs to
> implement this without it being part of the standard, and I think it's a
> bad idea to fork the standard with a custom implementation. Feelings about
> standards aside, this provides for a bad user experience. It's a legitimate
> question/frustration for a user to wonder why some columns appear in one
> program and not many others, especially when the file claims to be a FITS
> file.
>
> I agree that there are limitations in the FITS format, but I strongly
> suggest that the only way forward for this idea is to propose it (or
> something similar) as part of the official FITS format or else use multiple
> extensions.
>
> Cheers,
> Demitri
>
>
> http://nightlightapp.io
>
> ------------------------------
>
> End of fitsbits Digest, Vol 117, Issue 1
> ****************************************
>