[fitsbits] BINTABLE/TABLE column count limitation

William Pence William.Pence at nasa.gov
Thu Jun 7 01:30:22 EDT 2012


Mark,

I can think of several serious compatibility issues that your proposed 
convention would raise for existing FITS reading software.  In effect, 
the convention adds "pseudo" columns that occupy space in each table row 
beyond the "standard" 999-column limit.  Note that I'm assuming here 
that the proposal would only apply to binary tables, not to ASCII tables.

1.  In order to preserve compatibility with existing FITS readers, and 
to conform to the FITS Standard, the value of the TFIELDS keyword must 
be equal to the number of "standard" columns in the table and must have 
a value between 0 and 999, inclusive.  Even if an existing FITS reader 
doesn't immediately abort when it sees a TFIELDS value greater than 999, 
the reader would almost certainly fail when it could not find the 
TFORM1000 (and greater) keywords.  CFITSIO, as an example, when it first 
opens a FITS binary table, must construct an internal structure that, 
among other things, gives the byte offset in the row to the start of 
every column in the table.  Since CFITSIO would be unable to determine 
the widths of the columns beyond column 999, it would be forced to exit 
with a fatal file format error.  To get around this problem, it would be 
necessary, I think, to continue to place a maximum limit of 999 on the 
value of the TFIELDS keyword, and then define a new non-standard keyword 
to specify the number of additional pseudo columns in the table (i.e., 
the number of columns beyond the 999 standard columns).
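
The failure mode described in point 1 can be illustrated with a minimal 
Python sketch of a reader that builds its column offset table from 
TFIELDS and the TFORMn keywords alone.  This is not CFITSIO's actual 
code; the names tform_width and column_offsets are hypothetical, the 
header is modelled as a plain dict, and anything after the type code in 
a TFORM value is ignored.

```python
import re

# Bytes per element for the binary-table TFORM type codes.
# 'P' and 'Q' are array descriptors (8 and 16 bytes); 'X' is handled below.
BYTES_PER_ELEMENT = {"L": 1, "B": 1, "I": 2, "J": 4, "K": 8, "A": 1,
                     "E": 4, "D": 8, "C": 8, "M": 16, "P": 8, "Q": 16}


def tform_width(tform):
    """Return the width in bytes of one binary-table field."""
    match = re.match(r"\s*(\d*)([LXBIJKAEDCMPQ])", tform)
    if match is None:
        raise ValueError(f"unrecognised TFORM value: {tform!r}")
    repeat = int(match.group(1)) if match.group(1) else 1
    code = match.group(2)
    if code == "X":                      # bits are packed into whole bytes
        return (repeat + 7) // 8
    return repeat * BYTES_PER_ELEMENT[code]


def column_offsets(header):
    """Return (offsets, row_width) using only TFIELDS and TFORMn."""
    offsets, position = [], 0
    for n in range(1, header["TFIELDS"] + 1):
        keyword = f"TFORM{n}"            # 'TFORM1000' would be 9 characters
        if keyword not in header:
            raise KeyError(f"required keyword {keyword} not found")
        offsets.append(position)
        position += tform_width(header[keyword])
    return offsets, position


# A conforming three-column table.
small = {"TFIELDS": 3, "TFORM1": "D", "TFORM2": "3J", "TFORM3": "10A"}
print(column_offsets(small))             # ([0, 8, 20], 30)

# A 1000-column table written with the proposed TFOR1000 keyword.
wide = {"TFIELDS": 1000, "TFOR1000": "D"}
wide.update({f"TFORM{n}": "D" for n in range(1, 1000)})
try:
    column_offsets(wide)
except KeyError as err:
    print("fatal:", err.args[0])         # fatal: required keyword TFORM1000 not found
```

A reader of this kind never looks for TFOR1000, so it cannot determine 
the width of column 1000 and has no choice but to give up.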

2.  The NAXIS1 keyword must give the physical width of the table in 
bytes, which would necessarily have to include the width of all the 
pseudo columns.  However, the Standard also requires that the value of 
the NAXIS1 keyword be equal to the sum of the widths of all the 
individual standard columns (not including the widths of any pseudo 
columns). I suspect that many existing FITS readers perform a sanity 
check to ensure that this requirement is met, and if it isn't, abort 
with a fatal file format error (my CFITSIO code certainly does).  The 
only way (or at least one way) I can see to reconcile these two 
requirements so that existing FITS readers can read the table is to 
reserve one of the standard columns (most likely column 999) as a 
fictitious placeholder column of type 'B' with a vector width that is 
equal to the sum of the width of all the pseudo columns.  In other 
words, this fake 999th standard column would reserve the total space 
needed by all the pseudo columns.  FITS readers that do not understand 
this new convention would just interpret the 999th column as a wide 'B' 
column (e.g., '8000B') whereas knowledgeable FITS readers would know 
that this space is actually filled with the values of all the pseudo 
columns, as defined by the TFORnnnn keywords.
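
The placeholder arithmetic in point 2 can be sketched in Python as 
follows.  This is only an illustration under stated assumptions: the 
function names are hypothetical, the header is a plain dict, the 
placeholder is simply made the last standard column (column 999 when 
there are 998 real ones), and the keywords that would describe the 
pseudo columns themselves are left out.

```python
import re

# Bytes per element for the binary-table TFORM type codes.
BYTES_PER_ELEMENT = {"L": 1, "B": 1, "I": 2, "J": 4, "K": 8, "A": 1,
                     "E": 4, "D": 8, "C": 8, "M": 16, "P": 8, "Q": 16}


def tform_width(tform):
    """Return the width in bytes of one binary-table field."""
    match = re.match(r"\s*(\d*)([LXBIJKAEDCMPQ])", tform)
    if match is None:
        raise ValueError(f"unrecognised TFORM value: {tform!r}")
    repeat = int(match.group(1)) if match.group(1) else 1
    if match.group(2) == "X":            # bits are packed into whole bytes
        return (repeat + 7) // 8
    return repeat * BYTES_PER_ELEMENT[match.group(2)]


def placeholder_tform(pseudo_tforms):
    """TFORM of a 'B' vector wide enough to hold every pseudo column."""
    return f"{sum(tform_width(t) for t in pseudo_tforms)}B"


def build_header(standard_tforms, pseudo_tforms):
    """Header whose last standard column reserves the pseudo-column bytes."""
    if len(standard_tforms) > 998:
        raise ValueError("at most 998 real standard columns fit")
    tforms = list(standard_tforms) + [placeholder_tform(pseudo_tforms)]
    header = {"TFIELDS": len(tforms)}
    for n, tform in enumerate(tforms, start=1):
        header[f"TFORM{n}"] = tform
    # NAXIS1 is the physical row width, pseudo columns included.
    header["NAXIS1"] = sum(tform_width(t) for t in tforms)
    return header


def naxis1_matches(header):
    """The sanity check an unaware reader might apply to the header."""
    widths = (tform_width(header[f"TFORM{n}"])
              for n in range(1, header["TFIELDS"] + 1))
    return sum(widths) == header["NAXIS1"]


# 998 real 'D' columns plus 1000 pseudo 'D' columns.
header = build_header(["D"] * 998, ["D"] * 1000)
print(header["TFIELDS"], header["TFORM999"], header["NAXIS1"])  # 999 8000B 15984
print(naxis1_matches(header))                                   # True
```

An unaware reader sees 999 ordinary columns whose widths sum to NAXIS1, 
so its consistency check passes; only a reader that knows the convention 
would reinterpret the bytes of the final '8000B' column.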

Granted, a convention such as this could be defined, but it is not 
nearly as simple as implied in your proposal.  It seems to me that this 
additional complexity would be a big drawback to winning widespread 
acceptance of the proposal.

3.  Finally, as already mentioned by Arnold Rots, there are many more 
per-column keywords currently in use than the 9 listed in your proposal.  
There are roughly 40 additional per-column WCS keywords defined in the 
FITS Standard.  In addition, there are an unknown number of other 
per-column keywords that have been defined in local conventions (the 
HEASARC's TLMINnnn and TLMAXnnn keywords are good examples).  It would 
be very difficult to come up with a complete list of all these keywords.

regards,
Bill

On 6/6/2012 6:57 AM, Mark Taylor wrote:
> Hi FITS,
>
> There is an acknowledged limitation of 999 on the maximum number of
> columns in a FITS table.
>
> The TABLE and BINTABLE extensions define the following per-column fields:
>
>     TBCOLn  (ASCII table only)
>     TDIMn   (Binary table only)
>     TDISPn
>     TFORMn
>     TNULLn
>     TSCALn
>     TTYPEn
>     TUNITn
>     TZEROn
>
> to describe per-column metadata for the encoded table.  Along with
> the 8-character limitation on header card keywords, this limits the
> number of columns that can be described to 999; TFORM999 is legal,
> but TFORM1000 is not.  The standard explicitly constrains the value of
> the TFIELDS keyword to <=999 in acknowledgement of this limitation.
>
> With column counts from the large surveys in the hundreds, a couple of
> table joins can have you hitting this restriction.  I don't know what
> the data from LSST etc will look like, but extrapolating survey
> column counts over time would suggest that single tables in the
> thousand-column range may be upon us soon.  A user of TOPCAT
> has recently reported encountering this in science use
> (https://sympa.bris.ac.uk/sympa/arc/topcat-user/2012-06/msg00003.html),
> so already it's not merely a theoretical problem.
>
> As a pragmatic solution, I suggest the following convention.  Columns
> before the 1000th in any table are described as per the existing
> standard, but for columns 1000-9999, the 5th alphabetic character
> of the Txxxx keyword (if present - no change required for TDIMn)
> is removed to make space for an additional digit.  The existing
> constraint that the TFIELDS value shall be <=999 is also of
> course relaxed to <=9999.  Thus:
>
>     XTENSION= 'BINTABLE'
>     ...
>     TFIELDS = 2112
>     ...
>     TFORM998= 'D'
>     TTYPE998= 'foo'
>     TFORM999= 'D'
>     TTYPE999= 'foo_err'
>     TFOR1000= 'D'
>     TTYP1000= 'bar'
>     TFOR1001= 'D'
>     TTYP1001= 'bar_err'
>     ...
>
> Under this rule, any table with fewer than 1000 columns looks exactly
> the same as it does now.  Columns >999 in wide tables will be unreadable
> by software which is not aware of the convention, but such software
> would be incapable of dealing with 1000+-column tables in any case.
> Depending on implementation, non-aware software may be able to make
> sense of the first 999 columns of wide tables.
>
> I am considering implementing this convention in the FITS I/O handlers
> used by STIL (the Java table library used by TOPCAT and STILTS as well
> as some other client- and server-side applications).  If nothing else
> this will enable STIL users to generate syntactically legal FITS files
> (though containing illegal BINTABLE extensions) representing 1000+
> column tables which they can use within STIL (e.g. to save/load tables
> in TOPCAT), even if such files are not legible by other FITS table
> applications, while I/O of tables with <1000 columns will be unaffected.
>
> However, if others choose to implement the same convention it could
> become a de facto standard for wide tables, and possibly a candidate
> for an update of the BINTABLE convention in a future version of the
> FITS standard.
>
> Does anybody foresee problems with this suggestion, or want to suggest
> a better alternative?  The only possible backward compatibility issue
> or unintended consequence I can think of is if there are already
> keywords along the lines of TFORxxxx (x=[0-9]) in use in existing
> table headers, but it seems rather unlikely.  The other question is
> whether 9999 is enough.  1e5-column tables are probably a little
> way off, and extending this scheme to 5-digit column indices would
> be problematic since TDIMn and TDISPn would both degenerate to TDInnnnn,
> so I'd suggest punting that issue to future generations.
>
> Mark
>
> --
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>
> _______________________________________________
> fitsbits mailing list
> fitsbits at listmgr.cv.nrao.edu
> http://listmgr.cv.nrao.edu/mailman/listinfo/fitsbits
-- 
____________________________________________________________________
Dr. William Pence                       William.Pence at nasa.gov
NASA/GSFC Code 662       HEASARC        +1-301-286-4599 (voice)
Greenbelt MD 20771                      +1-301-286-1684 (fax)



