[fitsbits] BINTABLE/TABLE column count limitation

Mark Taylor m.b.taylor at bristol.ac.uk
Wed Jun 6 06:57:03 EDT 2012


Hi FITS,

There is an acknowledged limitation of 999 on the maximum number of
columns in a FITS table.

The TABLE and BINTABLE extensions define the following per-column fields:

   TBCOLn  (ASCII table only)
   TDIMn   (Binary table only)
   TDISPn
   TFORMn
   TNULLn
   TSCALn
   TTYPEn
   TUNITn
   TZEROn

to describe per-column metadata for the encoded table.  Along with
the 8-character limitation on header card keywords, this limits the
number of columns that can be described to 999; TFORM999 is legal,
but TFORM1000 is not.  The standard explicitly constrains the value of
the TFIELDS keyword to <=999 in acknowledgement of this limitation.
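
As a rough check of that arithmetic, the following Python sketch computes the largest index that fits in an 8-character keyword for each of the roots listed above. The helper name max_index is purely illustrative and not part of any FITS library.

```python
# Per-column keyword roots from the TABLE/BINTABLE list above.
ROOTS = ["TBCOL", "TDIM", "TDISP", "TFORM", "TNULL",
         "TSCAL", "TTYPE", "TUNIT", "TZERO"]
KEYWORD_LEN = 8  # maximum length of a FITS header keyword


def max_index(root: str) -> int:
    """Largest decimal index n such that root + str(n) fits in 8 characters."""
    digits = KEYWORD_LEN - len(root)
    return 10 ** digits - 1


limits = {root: max_index(root) for root in ROOTS}
# The usable column count is bounded by the most restrictive keyword.
overall_limit = min(limits.values())
```

Only TDIMn, with its 4-letter root, has room for a fourth digit; every other root caps the index at 999.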

With column counts from the large surveys in the hundreds, a couple of
table joins can have you hitting this restriction.  I don't know what
the data from LSST etc will look like, but extrapolating survey 
column counts over time would suggest that single tables in the 
thousand-column range may be upon us soon.  A user of TOPCAT
has recently reported encountering this in science use
(https://sympa.bris.ac.uk/sympa/arc/topcat-user/2012-06/msg00003.html),
so already it's not merely a theoretical problem.

As a pragmatic solution, I suggest the following convention.  Columns
before the 1000th in any table are described as per the existing
standard, but for columns 1000-9999, the 5th alphabetic character
of the Txxxx keyword (if present - no change required for TDIMn)
is removed to make space for an additional digit.  The existing
constraint that the TFIELDS value shall be <=999 is also of
course relaxed to <=9999.  Thus:

   XTENSION= 'BINTABLE'
   ...
   TFIELDS = 2112
   ...
   TFORM998= 'D'
   TTYPE998= 'foo'
   TFORM999= 'D'
   TTYPE999= 'foo_err'
   TFOR1000= 'D'
   TTYP1000= 'bar'
   TFOR1001= 'D'
   TTYP1001= 'bar_err'
   ...
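
The naming rule illustrated above can be sketched in a few lines of Python. This is only an illustration of the proposed convention, not code from STIL or any other library; the function name column_keyword is hypothetical.

```python
def column_keyword(root: str, n: int) -> str:
    """Header keyword for column n (1-based) under the proposed convention.

    root is a standard per-column keyword root such as "TFORM" or "TDIM".
    """
    if not 1 <= n <= 9999:
        raise ValueError(f"column index {n} outside 1..9999")
    if n >= 1000 and len(root) == 5:
        # Drop the 5th letter to make room for the 4th digit.
        root = root[:4]
    keyword = f"{root}{n}"
    if len(keyword) > 8:
        raise ValueError(f"{keyword!r} exceeds 8 characters")
    return keyword


# Reproduce the keywords around the 999/1000 boundary from the example header.
boundary = [column_keyword("TFORM", n) for n in (998, 999, 1000, 1001)]
```

Columns below 1000 come out exactly as in the existing standard, which is the point of the scheme.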

Under this rule, any table with fewer than 1000 columns looks exactly
the same as it does now.  Columns beyond the 999th in wide tables will be
unreadable by software which is not aware of the convention, but such
software could not handle tables of 1000 or more columns in any case.
Depending on implementation, non-aware software may be able to make
sense of the first 999 columns of wide tables.
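
A reader that is aware of the convention has to invert the keyword mapping. The Python sketch below shows one way that might look; the function name parse_column_keyword and the lookup table are assumptions for illustration, not taken from any existing FITS reader.

```python
import re

FULL_ROOTS = ["TBCOL", "TDIM", "TDISP", "TFORM", "TNULL",
              "TSCAL", "TTYPE", "TUNIT", "TZERO"]
# 4-letter stems used for columns 1000-9999, mapped back to the full root.
SHORT_TO_FULL = {root[:4]: root for root in FULL_ROOTS}


def parse_column_keyword(keyword: str):
    """Return (root, index) for a per-column keyword, else None."""
    m = re.fullmatch(r"([A-Z]+)([1-9][0-9]*)", keyword)
    if m is None or len(keyword) > 8:
        return None
    stem, n = m.group(1), int(m.group(2))
    if stem in FULL_ROOTS and n <= 999:
        return stem, n                   # existing standard form
    if stem in SHORT_TO_FULL and 1000 <= n <= 9999:
        return SHORT_TO_FULL[stem], n    # proposed wide-table form
    return None
```

A reader without the second branch simply never sees the wide-table keywords, so whether it copes with the first 999 columns depends on how it reacts to a TFIELDS value above 999.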

I am considering implementing this convention in the FITS I/O handlers
used by STIL (the Java table library used by TOPCAT and STILTS as well
as some other client- and server-side applications).  If nothing else
this will enable STIL users to generate syntactically legal FITS files
(though containing illegal BINTABLE extensions) representing 1000+
column tables which they can use within STIL (e.g. to save/load tables
in TOPCAT), even if such files are not legible by other FITS table
applications, while I/O of tables with <1000 columns will be unaffected.

However, if others choose to implement the same convention it could
become a de facto standard for wide tables, and possibly a candidate
for an update of the BINTABLE convention in a future version of the
FITS standard.

Does anybody foresee problems with this suggestion, or want to suggest
a better alternative?  The only possible backward compatibility issue
or unintended consequence I can think of is if there are already
keywords along the lines of TFORxxxx (x=[0-9]) in use in existing
table headers, but it seems rather unlikely.  The other question is
whether 9999 is enough.  1e5-column tables are probably a little
way off, and extending this scheme to 5-digit column indices would
be problematic since TDIMn and TDISPn would both degenerate to TDInnnnn,
so I'd suggest punting that issue to future generations.
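
The 5-digit collision can be checked mechanically. The Python sketch below truncates each root to leave room for a given number of digits and reports any stems shared by more than one root; the function name truncated_stems is hypothetical.

```python
from collections import defaultdict

ROOTS = ["TBCOL", "TDIM", "TDISP", "TFORM", "TNULL",
         "TSCAL", "TTYPE", "TUNIT", "TZERO"]


def truncated_stems(width: int) -> dict:
    """Group roots by the stem left after reserving `width` digits of 8 chars."""
    groups = defaultdict(list)
    for root in ROOTS:
        groups[root[: 8 - width]].append(root)
    return dict(groups)


def collisions(width: int) -> dict:
    """Stems that would be shared by more than one keyword root."""
    return {s: r for s, r in truncated_stems(width).items() if len(r) > 1}


four_digit = collisions(4)   # the proposed scheme
five_digit = collisions(5)   # the hypothetical extension
```

With four digits no two roots share a stem; with five, TDIM and TDISP both reduce to TDI.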

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/



