[fitsbits] BINTABLE/TABLE column count limitation
William Pence
William.Pence at nasa.gov
Thu Jun 7 01:30:22 EDT 2012
Mark,
Your proposed convention effectively tries to support additional "pseudo"
columns that occupy space in the table after the "standard" 999-column
limit, and I can think of several serious compatibility issues that it
raises for existing FITS reading software. Note that I'm assuming here
that the proposal would only apply to binary tables, not to ASCII tables.
1. In order to preserve compatibility with existing FITS readers, and
to conform to the FITS Standard, the value of the TFIELDS keyword must
be equal to the number of "standard" columns in the table and must have
a value between 0 and 999, inclusive. Even if an existing FITS reader
doesn't immediately abort when it sees a TFIELDS value greater than 999,
the reader would almost certainly fail when it could not find the
TFORM1000 (and greater) keywords. CFITSIO, as an example, when it first
opens a FITS binary table, must construct an internal structure that,
among other things, gives the byte offset in the row to the start of
every column in the table. Since CFITSIO would be unable to determine
the widths of the columns beyond column 999, it would be forced to exit
with a fatal file format error. To get around this problem, it would be
necessary, I think, to continue to place a maximum limit of 999 on the
value of the TFIELDS keyword, and then define a new non-standard keyword
to specify the number of additional pseudo columns in the table (i.e.,
the number of columns beyond the 999 standard columns).
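
To illustrate the offset calculation described in point 1, here is a minimal
Python sketch. It is not CFITSIO's actual code; the names ELEMENT_BYTES,
column_width and column_offsets are hypothetical, and the header is assumed
to be a plain dict of keyword/value pairs. It shows why a reader that walks
TFORM1..TFORMn has to give up as soon as one TFORMn keyword is missing.

```python
import re

# Bytes per element for each binary-table TFORM type code.
# 'X' (bit array) is handled separately: it packs 8 elements per byte.
ELEMENT_BYTES = {"L": 1, "B": 1, "I": 2, "J": 4, "K": 8, "A": 1,
                 "E": 4, "D": 8, "C": 8, "M": 16, "P": 8, "Q": 16}

# Optional repeat count followed by a single type code, e.g. 'D' or '10A'.
TFORM_RE = re.compile(r"^\s*(\d*)([LXBIJKAEDCMPQ])")


def column_width(tform):
    """Return the width in bytes of one column, given its TFORM value."""
    match = TFORM_RE.match(tform)
    if match is None:
        raise ValueError(f"unrecognised TFORM value: {tform!r}")
    repeat = int(match.group(1)) if match.group(1) else 1
    code = match.group(2)
    if code == "X":
        return (repeat + 7) // 8  # bits rounded up to whole bytes
    return repeat * ELEMENT_BYTES[code]


def column_offsets(header):
    """Return (offsets, row_width): the byte offset of each column in a row."""
    offsets = []
    position = 0
    for n in range(1, header["TFIELDS"] + 1):
        key = f"TFORM{n}"
        if key not in header:
            # Without the width of column n, no later offset can be computed.
            raise KeyError(f"missing required keyword {key}")
        offsets.append(position)
        position += column_width(header[key])
    return offsets, position
```

With TFIELDS = 1000 and only TFORM1 to TFORM999 present, this sketch raises
on TFORM1000, which is the kind of fatal error described above.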
2. The NAXIS1 keyword must give the physical width of the table in
bytes, which would necessarily have to include the width of all the
pseudo columns. However, the Standard also requires that the value of
the NAXIS1 keyword be equal to the sum of the widths of all the
individual standard columns (not including the widths of any pseudo
columns). I suspect that many existing FITS readers perform a sanity
check to ensure that this requirement is met, and if it isn't, abort
with a fatal file format error (my CFITSIO code certainly does). The
only way (or at least one way) I can see to reconcile these 2
requirements so that existing FITS readers can read the table is to
reserve one of the standard columns (most likely column 999) as a
fictitious placeholder column of type 'B' with a vector width that is
equal to the sum of the width of all the pseudo columns. In other
words, this fake 999th standard column would reserve the total space
needed by all the pseudo columns. FITS readers that do not understand
this new convention would just interpret the 999th column as a wide 'B'
column (e.g., '8000B') whereas knowledgeable FITS readers would know
that this space is actually filled with the values of all the pseudo
columns, as defined by the TFORnnnn keywords.
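
The placeholder idea in point 2 can be sketched as follows. This is only an
illustration under stated assumptions: the function names are hypothetical,
the TFORM parsing is simplified, and the column lists are made-up examples
rather than anything taken from a real file.

```python
import re

# Bytes per element for each binary-table TFORM type code ('X' handled below).
ELEMENT_BYTES = {"L": 1, "B": 1, "I": 2, "J": 4, "K": 8, "A": 1,
                 "E": 4, "D": 8, "C": 8, "M": 16, "P": 8, "Q": 16}
TFORM_RE = re.compile(r"^\s*(\d*)([LXBIJKAEDCMPQ])")


def column_width(tform):
    """Return the width in bytes of one column, given its TFORM value."""
    match = TFORM_RE.match(tform)
    if match is None:
        raise ValueError(f"unrecognised TFORM value: {tform!r}")
    repeat = int(match.group(1)) if match.group(1) else 1
    code = match.group(2)
    if code == "X":
        return (repeat + 7) // 8  # bit array: 8 elements per byte
    return repeat * ELEMENT_BYTES[code]


def placeholder_tform(pseudo_tforms):
    """TFORM999 value for a 'B' column spanning all the pseudo columns."""
    total = sum(column_width(t) for t in pseudo_tforms)
    return f"{total}B"


def naxis1_is_consistent(naxis1, standard_tforms):
    """The sanity check: NAXIS1 must equal the summed standard column widths."""
    return naxis1 == sum(column_width(t) for t in standard_tforms)


# Hypothetical table: 998 real 'D' columns plus 1000 pseudo 'D' columns.
real = ["D"] * 998
pseudo = ["D"] * 1000
tform999 = placeholder_tform(pseudo)  # '8000B'
naxis1 = sum(column_width(t) for t in real + pseudo)

# The check passes only when the placeholder column is counted.
print(naxis1_is_consistent(naxis1, real + [tform999]))  # True
print(naxis1_is_consistent(naxis1, real))                # False
```

A reader that does not know the convention sees 999 ordinary columns whose
widths add up to NAXIS1, so the sanity check passes and the pseudo columns
are read as one opaque byte vector.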
Granted, a convention such as this could be defined, but it is not
nearly as simple as implied in your proposal. It seems to me that this
additional complexity would be a big drawback to winning widespread
acceptance of the proposal.
3. Finally, as already mentioned by Arnold Rots, there are many more
per-column keywords currently in use than the 9 listed in your proposal.
There are roughly 40 additional per-column WCS keywords defined in the
FITS Standard. In addition, there are an unknown number of other
per-column keywords that have been defined in local conventions (the
HEASARC's TLMINnnn and TLMAXnnn keywords are good examples). It would
be very difficult to come up with a complete list of all these keywords.
regards,
Bill
On 6/6/2012 6:57 AM, Mark Taylor wrote:
> Hi FITS,
>
> There is an acknowledged limitation of 999 on the maximum number of
> columns in a FITS table.
>
> The TABLE and BINTABLE extensions define the following per-column fields:
>
> TBCOLn (ASCII table only)
> TDIMn (Binary table only)
> TDISPn
> TFORMn
> TNULLn
> TSCALn
> TTYPEn
> TUNITn
> TZEROn
>
> to describe per-column metadata for the encoded table. Along with
> the 8-character limitation on header card keywords, this limits the
> number of columns that can be described to 999; TFORM999 is legal,
> but TFORM1000 is not. The standard explicitly constrains the value of
> the TFIELDS keyword to <=999 in acknowledgement of this limitation.
>
> With column counts from the large surveys in the hundreds, a couple of
> table joins can have you hitting this restriction. I don't know what
> the data from LSST etc will look like, but extrapolating survey
> column counts over time would suggest that single tables in the
> thousand-column range may be upon us soon. A user of TOPCAT
> has recently reported encountering this in science use
> (https://sympa.bris.ac.uk/sympa/arc/topcat-user/2012-06/msg00003.html),
> so already it's not merely a theoretical problem.
>
> As a pragmatic solution, I suggest the following convention. Columns
> before the 1000th in any table are described as per the existing
> standard, but for columns 1000-9999, the 5th alphabetic character
> of the Txxxx keyword (if present - no change required for TDIMn)
> is removed to make space for an additional digit. The existing
> constraint that the TFIELDS value shall be <=999 is also of
> course relaxed to <=9999. Thus:
>
> XTENSION= 'BINTABLE'
> ...
> TFIELDS = 2112
> ...
> TFORM998= 'D'
> TTYPE998= 'foo'
> TFORM999= 'D'
> TTYPE999= 'foo_err'
> TFOR1000= 'D'
> TTYP1000= 'bar'
> TFOR1001= 'D'
> TTYP1001= 'bar_err'
> ...
>
> Under this rule, any table with fewer than 1000 columns looks exactly
> the same as it does now. Columns >999 in wide tables will be unreadable
> by software which is not aware of the convention, but such software
> would be incapable of dealing with 1000+-column tables in any case.
> Depending on implementation, non-aware software may be able to make
> sense of the first 999 columns of wide tables.
>
> I am considering implementing this convention in the FITS I/O handlers
> used by STIL (the Java table library used by TOPCAT and STILTS as well
> as some other client- and server-side applications). If nothing else
> this will enable STIL users to generate syntactically legal FITS files
> (though containing illegal BINTABLE extensions) representing 1000+
> column tables which they can use within STIL (e.g. to save/load tables
> in TOPCAT), even if such files are not legible by other FITS table
> applications, while I/O of tables with <1000 columns will be unaffected.
>
> However, if others choose to implement the same convention it could
> become a de facto standard for wide tables, and possibly a candidate
> for an update of the BINTABLE convention in a future version of the
> FITS standard.
>
> Does anybody foresee problems with this suggestion, or want to suggest
> a better alternative? The only possible backward compatibility issue
> or unintended consequence I can think of is if there are already
> keywords along the lines of TFORxxxx (x=[0-9]) in use in existing
> table headers, but it seems rather unlikely. The other question is
> whether 9999 is enough. 1e5-column tables are probably a little
> way off, and extending this scheme to 5-digit column indices would
> be problematic since TDIMn and TDISPn would both degenerate to TDInnnnn,
> so I'd suggest punting that issue to future generations.
>
> Mark
>
> --
> Mark Taylor Astronomical Programmer Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-928-8776 http://www.star.bris.ac.uk/~mbt/
>
> _______________________________________________
> fitsbits mailing list
> fitsbits at listmgr.cv.nrao.edu
> http://listmgr.cv.nrao.edu/mailman/listinfo/fitsbits
--
____________________________________________________________________
Dr. William Pence William.Pence at nasa.gov
NASA/GSFC Code 662 HEASARC +1-301-286-4599 (voice)
Greenbelt MD 20771 +1-301-286-1684 (fax)