[fitsbits] BINTABLE/TABLE column count limitation
Mark Taylor
m.b.taylor at bristol.ac.uk
Wed Jun 6 06:57:03 EDT 2012
There is an acknowledged limitation of 999 on the maximum number of
columns in a FITS table.
The TABLE and BINTABLE extensions define the following per-column fields:
TBCOLn (ASCII table only)
TDIMn (Binary table only)
to describe per-column metadata for the encoded table. Along with
the 8-character limitation on header card keywords, this limits the
number of columns that can be described to 999; TFORM999 is legal,
but TFORM1000 is not. The standard explicitly constrains the value of
the TFIELDS keyword to <=999 in acknowledgement of this limitation.
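
The arithmetic behind the limit can be checked in a few lines of Python. This is only an illustrative sketch: the helper names `standard_keyword` and `is_legal_length` are invented here and do not belong to any FITS library.

```python
MAX_KEYWORD_LEN = 8  # FITS header keywords occupy at most 8 characters


def standard_keyword(stem, index):
    """Build an indexed keyword as in the existing standard, e.g. TFORM + 999."""
    return f"{stem}{index}"


def is_legal_length(keyword):
    """True if the keyword fits in the 8-character keyword field."""
    return len(keyword) <= MAX_KEYWORD_LEN


for idx in (999, 1000):
    kw = standard_keyword("TFORM", idx)
    print(kw, len(kw), is_legal_length(kw))
```

A 5-letter stem such as TFORM leaves room for only three digits, whereas the 4-letter TDIM stem would already fit a 4-digit index.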
With column counts from the large surveys in the hundreds, a couple of
table joins can have you hitting this restriction. I don't know what
the data from LSST etc will look like, but extrapolating survey
column counts over time would suggest that single tables in the
thousand-column range may be upon us soon. A user of TOPCAT
has recently reported encountering this in science use
so already it's not merely a theoretical problem.
As a pragmatic solution, I suggest the following convention. Columns
before the 1000th in any table are described as per the existing
standard, but for columns 1000-9999, the 5th alphabetic character
of the Txxxx keyword (if present - no change required for TDIMn)
is removed to make space for an additional digit. The existing
constraint that the TFIELDS value shall be <=999 is also of
course relaxed to <=9999. Thus:
TFIELDS = 2112
TFORM998= 'D'
TTYPE998= 'foo'
TFORM999= 'D'
TTYPE999= 'foo_err'
TFOR1000= 'D'
TTYP1000= 'bar'
TFOR1001= 'D'
TTYP1001= 'bar_err'
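
The renaming rule illustrated by the header cards above could be sketched in Python roughly as follows. The function name `wide_keyword` is hypothetical, and the sketch assumes that every per-column stem is 4 or 5 letters long, as is the case for the keywords named in this message.

```python
def wide_keyword(stem, index):
    """Return the header keyword for a column under the proposed convention.

    Columns 1-999 use the existing standard form (stem + index).
    Columns 1000-9999 drop the 5th letter of the stem, if there is one,
    to make room for the fourth digit.
    """
    if not 1 <= index <= 9999:
        raise ValueError(f"column index out of range: {index}")
    if index <= 999:
        return f"{stem}{index}"
    return f"{stem[:4]}{index}"  # 4-letter stems such as TDIM are unchanged


for stem, index in [("TFORM", 999), ("TFORM", 1000),
                    ("TTYPE", 1001), ("TDIM", 1000)]:
    print(wide_keyword(stem, index))
```

Every keyword produced this way is at most 8 characters long, so it still fits the header card keyword field.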
Under this rule, any table with fewer than 1000 columns looks exactly
the same as it does now. Columns >999 in wide tables will be unreadable
by software which is not aware of the convention, but such software
would be incapable of dealing with 1000+-column tables in any case.
Depending on implementation, non-aware software may be able to make
sense of the first 999 columns of wide tables.
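
To make the difference between aware and non-aware readers concrete, here is a small Python sketch of keyword matching. It is an assumption about how a reader might be written, not a description of any existing FITS implementation; the pattern names and `column_index` are made up, and only the TTYPE and TFORM keywords are handled.

```python
import re

# Keywords as defined by the existing standard: 1 to 3 digits.
STANDARD = re.compile(r"^(TTYPE|TFORM)([1-9][0-9]{0,2})$")
# Keywords under the proposed convention: truncated stem, exactly 4 digits.
WIDE = re.compile(r"^(TTYP|TFOR)([1-9][0-9]{3})$")


def column_index(keyword, aware):
    """Return the column number encoded in a keyword, or None if unrecognised."""
    match = STANDARD.match(keyword)
    if match is None and aware:
        match = WIDE.match(keyword)
    return int(match.group(2)) if match else None


cards = ["TFORM998", "TFORM999", "TFOR1000", "TFOR1001"]
print([column_index(k, aware=False) for k in cards])
print([column_index(k, aware=True) for k in cards])
```

The non-aware reader recognises only the columns up to 999, while the aware reader recovers all four column indices.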
I am considering implementing this convention in the FITS I/O handlers
used by STIL (the Java table library used by TOPCAT and STILTS as well
as some other client- and server-side applications). If nothing else
this will enable STIL users to generate syntactically legal FITS files
(though containing illegal BINTABLE extensions) representing 1000+
column tables which they can use within STIL (e.g. to save/load tables
in TOPCAT), even if such files are not legible by other FITS table
applications, while I/O of tables with <1000 columns will be unaffected.
However, if others choose to implement the same convention it could
become a de facto standard for wide tables, and possibly a candidate
for an update of the BINTABLE convention in a future version of the
FITS standard.
Does anybody foresee problems with this suggestion, or want to suggest
a better alternative? The only possible backward compatibility issue
or unintended consequence I can think of is if there are already
keywords along the lines of TFORxxxx (x=[0-9]) in use in existing
table headers, but it seems rather unlikely. The other question is
whether 9999 is enough. 1e5-column tables are probably a little
way off, and extending this scheme to 5-digit column indices would
be problematic since TDIMn and TDISPn would both degenerate to TDInnnnn,
so I'd suggest punting that issue to future generations.
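
The ambiguity that blocks a 5-digit extension is easy to demonstrate. The following Python sketch assumes a purely hypothetical rule that keeps only the first three letters of the stem; `five_digit_keyword` is an invented name and no such rule is being proposed here.

```python
def five_digit_keyword(stem, index):
    """Hypothetical 5-digit scheme: keep 3 letters so that 5 digits fit."""
    return f"{stem[:3]}{index}"


dim = five_digit_keyword("TDIM", 10000)
disp = five_digit_keyword("TDISP", 10000)
print(dim, disp, dim == disp)
```

Both stems collapse to the same 8-character keyword, so a reader could not tell the two apart.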
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/