[fitsbits] BINTABLE convention for >999 columns
Tom McGlynn (NASA/GSFC Code 660.1)
tom.mcglynn at nasa.gov
Mon Jul 10 16:26:20 EDT 2017
It's really interesting to hear the varied opinions that have been put forward here -- especially
opinions, from others whom I respect, that are completely at variance with my own.
My sense of solutions to the problem is:
1. A comprehensive and natural solution allowing essentially arbitrary numbers of columns (e.g., >
10^20) but incompatible with the current FITS standard: Allow more than 8 characters in keyword names.
-- Too big a change to be contemplated in the real world, alas, but it attacks the real limitation
here. It would have lots of other consequences and benefits, and would require very substantial
changes in lots of places.
2. A solution compatible with the current FITS standard that gives a couple of orders of magnitude
increase in range: Using base 36 indices in some format.
-- This is pretty backwards compatible, but there can be problems with any FITS file that happens
to use a keyword with the same first 5 characters as some column keyword. It would actually make
any existing FITS table that had, e.g., the keyword TFORMATS illegal (since TFORMnnn requires
nnn <= TFIELDS). It will also require rather a lot of changes to code, since parsing out numeric
indexes becomes quite a bit more complex than it is currently. However, what I find most
unattractive about it is its clearly kludgy nature. The arbitrary offsets for column numbers
and the use of lower-case values are clear signals of kludges to me. Why would we want to start
using base 36 numbers in FITS files after all of these years? To me this solution screams that
FITS is obsolete. Of course, if this usage is to be restricted to essentially internal storage
within a single program, a kludge is fine. However, if we're looking at a standard usage, then
something cleaner seems desirable.
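To make the parsing concrete, here is a minimal Python sketch of base-36 column indexing. The helper names are hypothetical, and no particular proposal's offsets or case rules are assumed -- this is an illustration of the technique, not the convention itself:

```python
# Hypothetical sketch of base-36 column indices packed into 8-character
# FITS keywords.  Illustrative only; the proposals discussed on the list
# differ in offsets and case handling.

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base36(n, width=3):
    """Encode a non-negative integer as a fixed-width base-36 string."""
    s = ""
    while n:
        n, r = divmod(n, 36)
        s = DIGITS[r] + s
    return s.rjust(width, "0")

def decode_base36(s):
    """Decode a base-36 string back to an integer (int() accepts base 36)."""
    return int(s, 36)

def column_keyword(stem, col):
    """Build e.g. TFORM plus a base-36 index, filling out 8 characters."""
    return stem + encode_base36(col, width=8 - len(stem))
```

With a 5-character stem such as TFORM, three base-36 digits cover 36^3 - 1 = 46655 columns -- the couple of orders of magnitude mentioned above -- but note how a pre-existing keyword like TFORMATS would now be read as TFORM with a base-36 index.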
3. A compatible solution that gives essentially an arbitrary limit: Using Hierarch keywords
-- I think this is fully backwards compatible, i.e., it doesn't invalidate any existing (or
speculatively existing) FITS file. Effectively it's the same as #1, but using a non-standard
approach to allow longer keyword names. For code that already implements HIERARCH keyword
support, relatively little work might be required, though I could imagine hardwired 999-column
limits needing to be handled.
For code that doesn't, it could be a fair bit of work to implement -- especially if the 8-character
limit is deeply embedded in the header-reading software. I am unclear how widespread support for
HIERARCH is. Both CFITSIO and the Java libraries support it -- and I presume it has some support
in Europe, whence it came.
There is one kludge: the specialness of column 999. That's shared with the previous solution
(assuming #2 is kept backwards compatible; if you didn't require backwards compatibility in #2,
column 999 needn't be special), but it's not elegant. It has the same flavor as the use of PCOUNT
to reserve space for variable-length records, though, so it's not unprecedented.
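As an illustration of what such support might involve, here is a small Python sketch, with hypothetical helper names, of falling back from standard indexed keywords to HIERARCH-style cards beyond column 999 (the details of any actual convention may differ):

```python
# Hypothetical sketch: standard indexed keywords up to 999 columns,
# ESO-style HIERARCH cards beyond that.  Names are illustrative.

def standard_card(keyword, value):
    """Format an ordinary 80-character card image (keyword <= 8 chars)."""
    if len(keyword) > 8:
        raise ValueError("standard keyword limited to 8 characters")
    return f"{keyword:<8}= {value}".ljust(80)

def hierarch_card(keyword, value):
    """Format a card using the (non-standard) HIERARCH convention,
    which moves the keyword name into the body of the card."""
    image = f"HIERARCH {keyword} = {value}"
    if len(image) > 80:
        raise ValueError("card image exceeds 80 characters")
    return image.ljust(80)

def column_card(stem, col, value, limit=999):
    """Hypothetical rule: standard cards through `limit`, HIERARCH after."""
    keyword = f"{stem}{col}"
    return standard_card(keyword, value) if col <= limit \
        else hierarch_card(keyword, value)
```

The branch on `limit` is exactly the "specialness of column 999" kludge noted above: reading code must know where the standard keywords stop and the HIERARCH ones begin.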
4. A compatible solution that supports an order of magnitude increase but destroys the table
structure: Splitting into multiple HDUs.
This isn't really a solution -- it's a convention for saying: here's how I transform the data when
I can't preserve the structure due to the FITS limit. The lack of support for streaming is fatal
to me. I think that's probably an issue even if you're looking at using it within a single
program, since the whole idea is to be able to process the data efficiently.
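For what it's worth, the column bookkeeping for such a split is trivial -- the objection is structural, not mechanical. A hypothetical sketch of partitioning a wide table's columns across HDUs:

```python
def split_columns(ncols, limit=999):
    """Partition columns 1..ncols into per-HDU (first, last) ranges of
    at most `limit` columns each.  Illustrative helper only; any real
    multi-HDU convention would also have to carry keywords tying the
    pieces back together."""
    return [(lo, min(lo + limit - 1, ncols))
            for lo in range(1, ncols + 1, limit)]
```

The hard part is everything this function doesn't do: every reader must reassemble the pieces, and rows can no longer be streamed one at a time.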
Finally, with regard to what an appropriate limit would be: we don't see many tables with more
than a few hundred columns, but it's hard to know whether this is a chicken-and-egg problem. If
users can't create them, then they don't think in terms of very wide tables. One can envisage
very large tables when doing things like data mining, e.g., generating a table giving the number
of occurrences of each word in a dictionary in a given document. A more astrophysical application
might involve matches to a large set of basis vectors. There might be interesting approaches to
analyzing photon lists where we transpose the usual approach of having individual photons as rows
and instead make the photons the columns -- similarly for source lists.
Just my 2 cents..
Tom