[fitsbits] BINTABLE convention for >999 columns

Tom McGlynn (NASA/GSFC Code 660.1) tom.mcglynn at nasa.gov
Mon Jul 10 16:26:20 EDT 2017


It's really interesting to hear the varied opinions that have been put forward here -- especially 
opinions completely at variance with my own, from others whom I respect.

My sense of the possible solutions to the problem is:

1.  A comprehensive and natural solution allowing essentially arbitrary numbers of columns (e.g., > 
10^20), but incompatible with the current FITS standard: allow more than 8 characters in keyword names.
  -- Too big a change to be contemplated in the real world, alas, but it attacks the real limitation 
here.  It would have lots of other consequences and benefits.  Very substantial changes would be 
required in lots of places.

2. A solution compatible with the current FITS standard that gives a couple of orders of magnitude 
increase in range: using base 36 indices in some format.
-- This is pretty backwards compatible, but there can be problems with any FITS file that happens to 
use a keyword with the same first 5 characters as one of the column keywords.  It would actually 
make any existing FITS table that had, e.g., the keyword TFORMATS illegal, since that keyword would 
now parse as TFORMnnn with nnn ("ATS" read as base 36) far exceeding TFIELDS, and TFORMnnn requires 
nnn <= TFIELDS.  This will require rather a lot of changes to code, since parsing out the numeric 
indexes is quite a bit more complex than it is currently.  However, what I find most unattractive 
about it is its clearly kludgy nature.  The arbitrary offsets for column numbers or the use of 
lower-case values are clear signals of kludges to me.  Why would we want to start using base 36 
numbers in FITS files after all of these years?  To me this solution screams that FITS is obsolete.  
Of course, if this usage is to be restricted to essentially internal storage within a single 
program, a kludge is fine.  However, if we're looking at a standard usage then something cleaner 
seems desirable.
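
To make the ambiguity concrete, here is a rough sketch in plain Python.  The helper names and the 
exact encoding are hypothetical -- the actual proposals add offsets or lower-case markers, which 
this ignores -- but it shows what a naive base-36 TFORM parser does to an innocent keyword like 
TFORMATS:

    import string

    DIGITS36 = string.digits + string.ascii_uppercase   # 0-9, A-Z

    def column_suffix(n):
        """Encode a 1-based column number as a base-36 string, e.g. 1000 -> 'RS'."""
        s = ""
        while n:
            n, r = divmod(n, 36)
            s = DIGITS36[r] + s
        return s

    def parse_tform(keyword):
        """Treat everything after 'TFORM' as a base-36 column index."""
        if not keyword.startswith("TFORM"):
            return None
        try:
            return int(keyword[5:], 36)
        except ValueError:
            return None

    print(column_suffix(1000))       # 'RS'
    print(parse_tform("TFORM101"))   # 1297, not 101 -- hence the offsets or case markers in the proposals
    print(parse_tform("TFORMATS"))   # 14032 -- an unrelated keyword now looks like a column keyword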

3. A compatible solution that allows an essentially arbitrary number of columns: using HIERARCH keywords.
-- I think this is fully backwards compatible, i.e., it doesn't invalidate any existing (or 
speculatively existing) FITS file.  Effectively it's the same as #1, but using a non-standard 
approach for allowing longer keyword names.  For code that already implements HIERARCH keyword 
support relatively little work might be required, though I could imagine hardwired 999-column limits 
needing to be handled.

For code that doesn't, it could be a fair bit of work to implement -- especially if the 8-character 
limit is deeply embedded in the header-reading software.  I am unclear how widespread support for 
HIERARCH is.  Both CFITSIO and the Java libraries support it -- and I presume it has some support 
in Europe, whence it came.

There is one kludge: the specialness of column 999.  That's shared with the previous solution 
(assuming #2 is kept backwards compatible; if you didn't require backwards compatibility in #2, 
column 999 needn't be special), and it's not elegant.  It has the same flavor as the use of PCOUNT 
to reserve space for variable-length records, though, so it's not unprecedented.
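
For what it's worth, here is a rough sketch, using astropy.io.fits (which implements the ESO 
HIERARCH convention), of what column keywords past 999 might look like.  The keyword names beyond 
TFORM999 are purely a hypothetical scheme for illustration, not anything that has been specified:

    from astropy.io import fits

    hdr = fits.Header()
    hdr["TFIELDS"] = 1500
    hdr["TFORM999"] = "1E"            # last column an 8-character keyword can name
    # Hypothetical extension: longer keyword names escape the limit via HIERARCH.
    hdr["HIERARCH TFORM1000"] = "1E"
    hdr["HIERARCH TTYPE1000"] = "FLUX1000"

    for card in hdr.cards:
        print(card.image)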

4. A compatible solution that supports an order of magnitude increase but destroys the table 
structure: splitting into multiple HDUs.
This isn't really a solution -- it's a convention for saying "here's how I transform the data when I 
can't preserve the structure due to the FITS limit."  The lack of support for streaming is fatal to 
me.  I think that's probably an issue even if you're looking at using it within a single program, 
since the whole idea is to be able to process the data efficiently.
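
Just to be concrete about what that transformation looks like, here is a minimal sketch with 
astropy.io.fits and numpy (the file name, column names, and chunk size are all illustrative only): 
one logical table of 2500 columns written as three BINTABLE extensions that a reader would have to 
know to stitch back together:

    import numpy as np
    from astropy.io import fits

    ncols, nrows, chunk = 2500, 10, 999
    cols = [fits.Column(name="C%d" % (i + 1), format="E", array=np.zeros(nrows))
            for i in range(ncols)]

    # One logical table becomes several physical BINTABLEs of <= 999 columns each.
    hdus = [fits.PrimaryHDU()]
    for start in range(0, ncols, chunk):
        hdus.append(fits.BinTableHDU.from_columns(cols[start:start + chunk]))

    fits.HDUList(hdus).writeto("wide_table_split.fits", overwrite=True)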


Finally, with regard to what an appropriate limit is: we don't see many tables with more than a few 
hundred columns, but it's hard to know if this is a chicken-and-egg problem.  If users can't create 
them, then they don't think in terms of very wide tables.  One can envisage very large tables when 
doing things like data mining, e.g., generating a table giving the number of occurrences of each 
word in a dictionary in a given document.  A more astrophysical application might involve matches 
to a large set of basis vectors.  There might be interesting approaches to analyzing photon lists 
where we transpose the usual approach of having individual photons as rows and instead make the 
photons the columns -- similarly for source lists.

Just my 2 cents.

     Tom


