[fitsbits] BINTABLE convention for >999 columns

Mark Taylor m.b.taylor at bristol.ac.uk
Mon Jul 10 08:22:05 EDT 2017


Bill and others,

On Fri, 7 Jul 2017, William Pence wrote:

> Where do these wide FITS tables (> 999 columns) that you are proposing to support come from in the first place?  Are you just trying to support conversion of other tabular formats that can support more than 999 columns into FITS format?  If so, I don't see the point since no other existing software will be able to read them properly. 

I wasn't planning to get into much discussion of the justification
for this, since given that I wasn't asking anybody else to do anything
about it, I didn't think it was necessary.  However, as the
question has arisen, let me explain the reason I am considering it.

Over the last few years, I have had a handful of complaints from
users of TOPCAT or STILTS that they wanted to save wide tables
as FITS but have been unable to because of the 999 column limit.
The reason for this is nearly always that they have performed
a join of two or more wide tables.  The example header that I posted
was the result of an internal match to find pairs of objects in
the obi_source catalogue from the Chandra Source Catalog TAP service.
That table has 602 columns, so an internal match gives 1204 columns.
That was just an example I generated for the purposes of this
discussion rather than a scientifically motivated operation,
but several people have hit this limit in the course of their
science work.

In most cases what they want is, I believe, some way to save the
table for subsequent work within TOPCAT/STILTS, rather than for
exchange with the wider community of FITS-compliant software tools.
If there were some kind of publicly understood convention which
allowed you to write >999 columns in FITS, well, that would be nice,
but the main problem I want to solve here is to allow read/write
cycles of such tables within TOPCAT/STILTS that let me take
advantage of some of the benefits of the FITS BINTABLE format.

So as it stands, my options in responding to users who want to save
such tables are:

   1. tell them they have to remove some columns before saving
   2. tell them to save in some other supported format, e.g. CSV or VOTable
   3. invent an entirely new binary format for wide tables
   4. lobby the FITS community to support >999 columns in the standard
   5. work out a way to shoehorn >999 columns into FITS BINTABLEs

Regarding Option 1, one might argue that users would be better
off restricting the number of columns of input tables before doing
such matches, since it's unlikely that they really need to retain
all that information.  But really, I don't see it as my job
as an application author to enforce that (and if I did, should
I then refuse to save tables with >999 columns in CSV and
VOTable as well?  And why stop at the arbitrary figure of 999,
should I decree that e.g. 500 is the widest any well-behaved
astronomer should want to work with and set the limit there?)

Option 2 is currently what I have to do.  But no other format I
currently support is nearly as good as FITS BINTABLE for efficient
data access; the reason I like BINTABLE is that the data is laid
out predictably on disk so that I can map the HDU and perform
random access on the relevant memory buffer(s) directly
(note this also precludes the use of compression).
I can also stream it from an HTTP URL.  So forcing users to use
e.g. VOTable makes their data interactions much slower; this
manifests itself as much slower load times in TOPCAT for tables
that are long as well as wide, and additional usage of temporary
file space or heap memory.
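
To illustrate why a predictable on-disk layout matters, here is a
minimal Python sketch of random access into a memory-mapped,
fixed-width, row-major data area.  It is not TOPCAT/STILTS code and it
does not parse a real FITS header: the 2880-byte padding block, the
two-column row layout and the cell_offset helper are all assumptions
made up for the example.  In a real BINTABLE the row length would come
from NAXIS1 and the column offsets from the TFORMn values.

```python
import mmap
import struct
import tempfile


def cell_offset(data_start, row_len, col_offset, row):
    """Byte offset of one cell in a fixed-width, row-major table."""
    return data_start + row * row_len + col_offset


# Hypothetical layout: one 2880-byte block standing in for the header,
# then rows of (int32, float64) stored big-endian, as FITS data are.
ROW_FMT = ">id"
ROW_LEN = struct.calcsize(ROW_FMT)   # 4 + 8 bytes, no padding
DATA_START = 2880
FLOAT_COL_OFFSET = 4                 # the float64 follows the int32

with tempfile.TemporaryFile() as f:
    f.write(b" " * DATA_START)
    for i in range(1000):
        f.write(struct.pack(ROW_FMT, i, i * 0.5))
    f.flush()

    # Map the file and read a single cell without touching other rows.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        off = cell_offset(DATA_START, ROW_LEN, FLOAT_COL_OFFSET, 742)
        (value,) = struct.unpack_from(">d", buf, off)

print(value)
```

Because every cell address is a simple function of row and column,
no parsing or decompression pass is needed before the first read,
which is the property that text formats such as CSV or VOTable lack.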

Option 3 would really be a lot of work and I don't see what benefit
it would bring.

I do not expect Option 4 to deliver a solution in the short,
or even medium term.  I raised this question on FITSBITS in 2012
(with an alternative suggestion for how to tackle it, which got
an even less enthusiastic response than this one), and there has
been no progress on it since then.
I'm not even persuaded that the FITS community *should* come up
with a standards-based solution specifically targeted at this.
It is, at least for now, quite a niche problem, and I'm not
arguing that it warrants a change to the FITS standard
(which, for some good reasons, is a heavyweight process).
I tried to make clear in my initial posting that I was
not requesting or expecting any change in public standards or
external software in response to this suggestion.

Option 5 is what I'm suggesting here.  Honestly it looks like the
best of the bunch to me for the problem that I'm trying to address.
The idea of splitting a single table over multiple HDUs I find
unattractive for reasons outlined in this thread by myself and
by Tom McGlynn.

> Also, will TOPCAT have the ability to insert or delete columns within these wide FITS tables?  That is a rather complicated process. 

Yes, certainly.  I admit there are some GUI issues involved in
navigating a 1000+ column list, but otherwise I'm not sure what
complications you have in mind.

> The main issue I see with your convention is that it only provides a modest increase in the maximum number of columns from 999 to about 18000.  I'd prefer a convention that places no limit on the number of columns.   One of the previous posters suggested using the HIERARCH convention for encoding keywords like 'TFORM12345', which seems to me to be a more robust and easier to understand convention than using base 26 encoded strings. 

The motivating use case I present above suggests that
no more than a few thousand columns are likely to be required
in the foreseeable future; the largest tables I see in VizieR or
GloTS are 400-600 columns, and joins of more than about ten tables
seem unlikely.  In practice I haven't come up against astronomy
use cases requiring more than 2000 columns.  So 18k looks like it
is probably enough, though admittedly not by an obviously
future-proof margin.  It's always possible though that there may
be other use cases that would benefit from much wider tables.

My XXXXXaaa suggestion came from reluctance to require other
non-standard conventions (HIERARCH) in order to use this one.
However, since the consensus among those here who have given
this suggestion a sympathetic hearing seems to be against
the base-26 approach, I will give some more thought to something
along those lines.
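
For reference, the "about 18000" figure is consistent with a
three-letter base-26 suffix filling the last three characters of an
8-character keyword.  The Python sketch below shows the arithmetic
under assumptions that are not spelled out in this message: columns
1-999 keep their standard decimal index, column 1000 maps to AAA, and
upper case letters are used because FITS keyword names do not allow
lower case.  The column_keyword function is a hypothetical helper,
not part of any existing library.

```python
import string

ALPHABET = string.ascii_uppercase        # 26 letters, upper case only
N_STANDARD = 999                         # columns with a decimal index
N_SUFFIX = len(ALPHABET) ** 3            # three-letter suffixes: 17576
MAX_COLUMNS = N_STANDARD + N_SUFFIX      # total addressable columns


def column_keyword(stem, icol):
    """Keyword name for 1-based column icol (hypothetical mapping)."""
    if not 1 <= icol <= MAX_COLUMNS:
        raise ValueError(f"column index out of range: {icol}")
    if icol <= N_STANDARD:
        return f"{stem}{icol}"
    # Columns beyond 999: encode the zero-based excess in base 26.
    n = icol - N_STANDARD - 1
    n, c = divmod(n, 26)
    a, b = divmod(n, 26)
    return stem + ALPHABET[a] + ALPHABET[b] + ALPHABET[c]


print(MAX_COLUMNS)
print(column_keyword("TFORM", 999), column_keyword("TFORM", 1000))
```

With a 5-character stem such as TFORM or TTYPE the result still fits
in the 8 characters available for a standard keyword name, which is
what avoids any dependence on HIERARCH.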

Mark

--
Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/


