[fitsbits] BINTABLE/TABLE column count limitation
Mark Taylor
m.b.taylor at bristol.ac.uk
Wed Jun 6 06:57:03 EDT 2012
Hi FITS,
There is an acknowledged limitation of 999 on the maximum number of
columns in a FITS table.
The TABLE and BINTABLE extensions define the following per-column fields:
TBCOLn (ASCII table only)
TDIMn (Binary table only)
TDISPn
TFORMn
TNULLn
TSCALn
TTYPEn
TUNITn
TZEROn
to describe per-column metadata for the encoded table. Along with
the 8-character limitation on header card keywords, this limits the
number of columns that can be described to 999; TFORM999 is legal,
but TFORM1000 is not. The standard explicitly constrains the value of
the TFIELDS keyword to <=999 in acknowledgement of this limitation.
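To make the arithmetic concrete, here is a minimal Python sketch of the length check. The helper names `standard_keyword` and `is_legal_length` are made up for illustration and do not come from any FITS library; the only fact assumed is the 8-character keyword limit mentioned above.

```python
# FITS header keywords occupy at most 8 characters of each header card.
MAX_KEYWORD_LEN = 8


def standard_keyword(base: str, column: int) -> str:
    """Join a per-column base name (e.g. 'TFORM') with a 1-based column index."""
    return f"{base}{column}"


def is_legal_length(keyword: str) -> bool:
    """True if the keyword fits in the 8-character keyword field."""
    return len(keyword) <= MAX_KEYWORD_LEN


for column in (999, 1000):
    keyword = standard_keyword("TFORM", column)
    print(keyword, is_legal_length(keyword))
# TFORM999 True
# TFORM1000 False
```

Note that this only checks keyword length; TDIM1000 would pass it, but the TFIELDS <=999 constraint rules it out regardless.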
With column counts from the large surveys in the hundreds, a couple of
table joins can have you hitting this restriction. I don't know what
the data from LSST etc will look like, but extrapolating survey
column counts over time would suggest that single tables in the
thousand-column range may be upon us soon. A user of TOPCAT
has recently reported encountering this in science use
(https://sympa.bris.ac.uk/sympa/arc/topcat-user/2012-06/msg00003.html),
so already it's not merely a theoretical problem.
As a pragmatic solution, I suggest the following convention. Columns
before the 1000th in any table are described as per the existing
standard, but for columns 1000-9999, the 5th alphabetic character
of the Txxxx keyword (if present - no change required for TDIMn)
is removed to make space for an additional digit. The existing
constraint that the TFIELDS value shall be <=999 is also of
course relaxed to <=9999. Thus:
XTENSION= 'BINTABLE'
...
TFIELDS = 2112
...
TFORM998= 'D'
TTYPE998= 'foo'
TFORM999= 'D'
TTYPE999= 'foo_err'
TFOR1000= 'D'
TTYP1000= 'bar'
TFOR1001= 'D'
TTYP1001= 'bar_err'
...
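The naming rule illustrated by the header above could be sketched on the writing side as follows. This is only an illustration in Python, not code from STIL; the function name `wide_keyword` is hypothetical.

```python
def wide_keyword(base: str, column: int) -> str:
    """Return the header keyword for a per-column field under the proposed rule.

    Columns 1-999 use the existing standard form. For columns 1000-9999
    the 5th letter of the base name is dropped, if there is one, to make
    room for the extra digit.
    """
    if not 1 <= column <= 9999:
        raise ValueError(f"column index out of range: {column}")
    if column <= 999:
        return f"{base}{column}"
    # base[:4] leaves 4-letter names such as TDIM unchanged.
    return f"{base[:4]}{column}"


print(wide_keyword("TFORM", 999))   # TFORM999
print(wide_keyword("TFORM", 1000))  # TFOR1000
print(wide_keyword("TTYPE", 1001))  # TTYP1001
print(wide_keyword("TDIM", 1000))   # TDIM1000
```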
Under this rule, any table with fewer than 1000 columns looks exactly
the same as it does now. Columns >999 in wide tables will be unreadable
by software which is not aware of the convention, but such software
would be incapable of dealing with 1000+-column tables in any case.
Depending on implementation, non-aware software may be able to make
sense of the first 999 columns of wide tables.
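On the reading side, convention-aware software would need to map both keyword forms back to a field name and column index. One possible sketch in Python follows; `parse_column_keyword` is a hypothetical helper, and the list of base names is the set of per-column fields given at the top of this message.

```python
import re

# Per-column keyword base names defined by the TABLE/BINTABLE extensions.
BASES = ("TBCOL", "TDIM", "TDISP", "TFORM", "TNULL",
         "TSCAL", "TTYPE", "TUNIT", "TZERO")

# Four-letter stems used for columns 1000-9999, mapped back to the full name.
STEMS = {base[:4]: base for base in BASES}

KEYWORD_RE = re.compile(r"([A-Z]+)([1-9][0-9]*)")


def parse_column_keyword(keyword: str):
    """Return (base, column) for a per-column keyword, or None if not one."""
    match = KEYWORD_RE.fullmatch(keyword)
    if match is None:
        return None
    name, column = match.group(1), int(match.group(2))
    if name in BASES and column <= 999:
        return name, column          # existing standard form
    if name in STEMS and 1000 <= column <= 9999:
        return STEMS[name], column   # proposed wide-table form
    return None
```

A reader that is not aware of the convention would in effect implement only the first branch, which is why it may still recover the first 999 columns.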
I am considering implementing this convention in the FITS I/O handlers
used by STIL (the Java table library used by TOPCAT and STILTS as well
as some other client- and server-side applications). If nothing else
this will enable STIL users to generate syntactically legal FITS files
(though containing illegal BINTABLE extensions) representing 1000+
column tables which they can use within STIL (e.g. to save/load tables
in TOPCAT), even if such files are not legible by other FITS table
applications, while I/O of tables with <1000 columns will be unaffected.
However, if others choose to implement the same convention it could
become a de facto standard for wide tables, and possibly a candidate
for an update of the BINTABLE convention in a future version of the
FITS standard.
Does anybody foresee problems with this suggestion, or want to suggest
a better alternative? The only possible backward compatibility issue
or unintended consequence I can think of is if there are already
keywords along the lines of TFORxxxx (x=[0-9]) in use in existing
table headers, but it seems rather unlikely. The other question is
whether 9999 is enough. 1e5-column tables are probably a little
way off, and extending this scheme to 5-digit column indices would
be problematic since TDIMn and TDISPn would both degenerate to TDInnnnn,
so I'd suggest punting that issue to future generations.
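The 5-digit collision can be checked mechanically. The sketch below, in Python, assumes one particular generalisation of the scheme: truncate the base name to whatever fits in 8 characters alongside the column digits. That generalisation and the names `truncated_keyword` and `collisions` are my own illustration, not part of the proposal.

```python
from collections import defaultdict

# Per-column keyword base names defined by the TABLE/BINTABLE extensions.
BASES = ("TBCOL", "TDIM", "TDISP", "TFORM", "TNULL",
         "TSCAL", "TTYPE", "TUNIT", "TZERO")


def truncated_keyword(base: str, column: int) -> str:
    """Truncate the base name so that name plus digits fits in 8 characters."""
    digits = str(column)
    return base[: 8 - len(digits)] + digits


def collisions(column: int) -> dict:
    """Map each keyword claimed by more than one base name to those names."""
    groups = defaultdict(list)
    for base in BASES:
        groups[truncated_keyword(base, column)].append(base)
    return {kw: names for kw, names in groups.items() if len(names) > 1}


print(collisions(1000))   # {}
print(collisions(10000))  # {'TDI10000': ['TDIM', 'TDISP']}
```

With 4-digit indices every truncated name stays distinct; at 5 digits only TDIM and TDISP clash.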
Mark
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/