[fitsbits] BINTABLE convention for >999 columns

Rob Seaman seaman at lpl.arizona.edu
Tue Jul 11 10:04:45 EDT 2017


Rather than indicating that FITS is obsolete, Mark's proposal indicates
that FITS remains flexible and extensible.

My previous comments were not a critique of the proposed solution (legal
FITS is legal FITS), but raised the question of whether this was a
problem demanding a solution at all. I'm still not convinced myself, but
it seems others are. The proposal is conforming FITS depending on how
one reads section 7.3.1: "The TFORMn keywords must be present for all
values n = 1, ..., TFIELDS and for no other values of n." Is that a
constraint on three numeric digits or on three alphanumerics? In the
latter case Tom's example of TFORMATS is also non-conforming.
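
As a rough illustration of the two readings, the sketch below encodes each
as a regular expression and tests a few keyword names. Both patterns and
the helper name is_indexed are assumptions made for this example; they are
one interpretation of the sentence quoted above, not text from the standard.

```python
import re

# Reading A (assumed): the index n is a decimal integer of up to three digits.
NUMERIC = re.compile(r"TFORM([1-9][0-9]{0,2})")

# Reading B (assumed): the index is any one to three uppercase alphanumerics.
ALPHANUMERIC = re.compile(r"TFORM([0-9A-Z]{1,3})")


def is_indexed(keyword, pattern):
    """Return True if the whole keyword matches the given TFORMn pattern."""
    return pattern.fullmatch(keyword) is not None


for kw in ("TFORM12", "TFORMATS", "TFORM1A"):
    print(kw, is_indexed(kw, NUMERIC), is_indexed(kw, ALPHANUMERIC))
```

Under reading A, TFORMATS is an ordinary keyword that does not collide with
the indexed family; under reading B it is captured by it, which is the
conflict Tom describes.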

Assuming the consensus is that this problem requires a solution, Tom
lays out most of the options. My varied opinions on these:

1. E.g. define a new WIDE_TABLE extension type. Over recent
years/decades the community generally responds to such suggestions by
saying "just use a BINTABLE, we already have tools to handle BINTABLEs".
The opaque container column here is a good example of what is sacrificed
by repurposing existing extension types. The community should not be
afraid of new extension types, however...

2. Mark's solution is not a kludge. He (or Bill, if that's where the
suggestion originated) uses the standard to achieve a broader purpose
than it was designed for. The byte layout of the wide table is, very
elegantly, the same as that of a narrow table. The keyword metadata is a
relatively natural extension of current usage. There is nothing magic
about base-10.

3. Hierarch keywords are not universally supported and remain
non-standard. Tying wide-table functionality to these keywords would
slow adoption, unnecessarily complicate code, and create additional
opportunities for bugs. They are also unnecessary for #2.

4. Using multiple tables is also completely legal FITS. A keyword
convention for asserting a schema that bridges multiple tables would be
useful for other purposes, not just for wide tables, e.g., for dumping
an entire SQL DB of several tables into a FITS file and later reading it
back.
Enabling multi-table joins on input and tabular splits on output would
be useful features for many applications independently of wide-table
functionality. And users would benefit from software that provides an
option of choosing either multiple tables or Mark's container column
strategy.
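
To make the "nothing magic about base-10" point in #2 concrete, here is a
minimal sketch of a three-character base-36 column index. The digit
alphabet, the fixed width, and the function names are assumptions chosen
for illustration; this is not the encoding in Mark's proposal, which may
use offsets or a different character set.

```python
# Assumed digit alphabet: 0-9 followed by A-Z (36 symbols).
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(n, width=3):
    """Encode a non-negative integer as a fixed-width base-36 string."""
    if not 0 <= n < 36 ** width:
        raise ValueError("index does not fit in %d base-36 digits" % width)
    chars = []
    for _ in range(width):
        n, remainder = divmod(n, 36)
        chars.append(DIGITS[remainder])
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(chars))


def from_base36(text):
    """Decode a base-36 string back to an integer."""
    return int(text, 36)


print("TFORM" + to_base36(1000))   # a hypothetical keyword for column 1000
print(36 ** 3 - 1)                 # largest index that fits in 3 characters
```

Three base-36 characters cover 46656 distinct values, against 999 for three
decimal digits, while the keyword still fits within 8 characters.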

Given the column limitations of commercial DB software, supporting
ultra-wide tables in FITS seems the least of people's problems. Suggest
the astronomical community address this issue, with or without FITS,
when a pertinent science use case is discovered. This is an egg the FITS
chicken need not lay.

Rob
--

On 7/10/17 1:26 PM, Tom McGlynn (NASA/GSFC Code 660.1) wrote:
> It's really interesting to hear the varied opinions that have been put
> forward here -- especially opinions completely at variance with my own
> from others whom I respect.
>
> My sense of solutions to the problem is:
>
> 1.  A comprehensive and natural solution allowing essentially
> arbitrary numbers of columns (e.g., > 10^20) but incompatible with the
> current FITS standard: Allow more than 8 characters in keyword names.
>  -- Too big a change to be contemplated in the real world alas, but it
> attacks the real limitation here.  It would have lots of other
> consequences and benefits.  Very substantial changes required in lots
> of places.
>
> 2. A  solution compatible with the current FITS standard that gives a
> couple of orders of magnitude increase in range:  Using base 36
> indices in some format.
> -- This is pretty backwards compatible, but there can be problems with
> any FITS file that might have used keywords that happen to have the
> same first 5 characters as some column keyword.  It would actually
> make any existing FITS table that had, e.g., the keyword TFORMATS
> illegal (since TFORMnnn must have nnn <= TFIELDS).  This will
> require rather a lot of changes to code since parsing out numeric
> indexes is quite a bit more complex than currently.    However what I
> find most unattractive about it is the clearly kludgy nature of it. 
> The arbitrary offsets for column numbers or the use of lower case
> values are clear signals of kludges to me. Why would we want to start
> using base 36 numbers in FITS files after all of these years?   To me
> this solution screams that FITS is obsolete.  Of course if this usage
> is to be restricted to essentially internal storage within some one
> program, a kludge is fine.  However if we're looking at a standard
> usage then something cleaner seems desirable.
>
> 3. A compatible solution that gives essentially an arbitrary limit:
> Using Hierarch keywords
> -- I think this is fully backwards compatible, i.e., it doesn't
> invalidate any existing (or speculatively existing) FITS file.
> Effectively it's the same as #1, but using a non-standard approach for
> allowing longer keyword names.  For code that already implements
> HIERARCH keyword support relatively little work might be required,
> though I could imagine hardwired 999 column limits needing to be handled.
>
> For code that doesn't it could be a fair bit of work to implement --
> especially if the 8 character limit is deeply embedded in the header
> reading software.   I am unclear how widespread support for HIERARCH
> is.  Both CFITSIO and the Java libraries support it -- and I presume
> it has some support from Europe whence it came.
>
> There is one kludge, the specialness of column 999.  That's shared
> with the previous solution (assuming #2 is backwards compatible though
> if you didn't require backwards compatibility in 2, column 999 needn't
> be special), but it's not elegant.  It has the same flavor as the use
> of PCOUNT to reserve space for variable length records though so it's
> not unprecedented.
>
> 4. A compatible solution that supports an order of magnitude increase
> but destroys the table structure: Splitting into multiple HDUs.
> This isn't really a solution -- it's a convention for saying here's how
> I transform the data when I can't preserve the structure due to the
> FITS limit.  The lack of support for streaming is fatal to me.  I
> think that's probably an issue even if you're looking at using it
> within a single program since the whole idea is to be able to process
> the data efficiently.
>
>
> Finally with regard to what is an appropriate limit?  We don't see
> many tables with more than a few hundred columns, but it's hard to
> know if this is a chicken and egg problem.  If users can't create
> them then they don't think in terms of very wide tables.  One can
> envisage very large tables when doing things like data mining, e.g.,
> generating a table which is the number of occurrences of each word in
> a dictionary in a given document.  A more astrophysical application
> might involve matches to a large set of basis vectors.   There
> might be interesting approaches to analyzing photon lists where we
> transpose the usual approach of having individual photons as
> rows and make the photons the columns  -- similarly for source lists.
>
> Just my 2 cents..
>
>     Tom


