[fitsbits] BINTABLE/TABLE column count limitation
Rob Seaman
seaman at noao.edu
Thu Jun 7 11:05:01 EDT 2012
On Jun 7, 2012, at 3:02 AM, Mark Taylor wrote:
> I agree there is a good chance that existing FITS software would fail to make any sense of a wide (i.e. >999 column) table; my suggestion that in some cases it might be able to do it was more in the nature of an added bonus than a serious selling point.
I'm concerned that this is rapidly random walking away from the implied requirements.
On Jun 6, 2012, at 3:57 AM, Mark Taylor wrote:
>> With column counts from the large surveys in the hundreds, a couple of table joins can have you hitting this restriction.
Perhaps the needed change is a convention for logically joining tables that remain physically distinct? It's hard to see how this trend can be sustainable otherwise.
>> A user of TOPCAT has recently reported encountering this in science use
>> (https://sympa.bris.ac.uk/sympa/arc/topcat-user/2012-06/msg00003.html),
But this single user appears to be the only one ever to run into the issue:
>>> From: Antonio Cava <acava at fis.ucm.es>
>>> To: Mark Taylor <m.b.taylor at bristol.ac.uk>
>>> Cc: topcat-user at sympa.bristol.ac.uk
>>> Subject: Re: Maximum number of columns
>>>
>>>> how interesting, as far as I know nobody has come up against this limit before.
>>>
>>> well, this could suggest that the problem is not so common after all and maybe I'm asking for something that should be approached in a different way (?)
and there's the question of what the table will be used for:
>>> The number of rows in the current case is not so large (>~10000-15000) but the problem is that after creating the Tables I use them in other scripts/programs (e.g. within IDL) and the FITS format is the most flexible format to work with.
Backwards compatibility sounds like a serious selling point to me. What's the point of joining them in TOPCAT if they won't be readable in IDL?
> Of the other suggestions raised in this thread, using multiple HDUs is a possibility, but it has significant disadvantages, for instance unsuitability for streaming I/O and the possibility of confusion when multiple tables are or may be stored in the same MEF.
Others have commented on ways to avoid confusion. What are the streaming requirements precisely? Streaming any complex data structure may require buffering of intermediate chunks.
A convention for denoting logically joined tables within a single file would require only a few per-HDU keywords. It would be scalable to any number of columns. The subtables would remain perfectly legal tables, even to software unaware of the convention. "Other scripts/programs" would retain the ability to read them and perform explicit joins downstream since they would match up row-by-row.
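To make the idea concrete, here is a minimal sketch of how such a convention might behave. It is an illustration only: the keyword names JOINGRP, JOINPART and JOINTOT are hypothetical (no such registered convention is implied), and headers and tables are modelled as plain Python dicts rather than real FITS HDUs.

```python
MAX_COLS = 999  # TFIELDS limit for a single TABLE/BINTABLE HDU


def split_wide_table(columns, group="WIDE1"):
    """Split {name: [values...]} into subtables of at most MAX_COLS columns.

    Returns a list of (header, columns) pairs. Each header carries the
    hypothetical keywords JOINGRP (group name), JOINPART (1-based part
    number) and JOINTOT (number of parts in the group).
    """
    names = list(columns)
    chunks = [names[i:i + MAX_COLS] for i in range(0, len(names), MAX_COLS)]
    hdus = []
    for part, chunk in enumerate(chunks, start=1):
        header = {
            "TFIELDS": len(chunk),
            "JOINGRP": group,
            "JOINPART": part,
            "JOINTOT": len(chunks),
        }
        hdus.append((header, {name: columns[name] for name in chunk}))
    return hdus


def join_subtables(hdus, group):
    """Reassemble the subtables of one group into a single wide table."""
    parts = sorted(
        (hdu for hdu in hdus if hdu[0].get("JOINGRP") == group),
        key=lambda hdu: hdu[0]["JOINPART"],
    )
    if not parts or len(parts) != parts[0][0]["JOINTOT"]:
        raise ValueError(f"join group {group!r} is missing or incomplete")
    joined = {}
    nrows = None
    for _header, cols in parts:
        for name, values in cols.items():
            if nrows is None:
                nrows = len(values)
            elif len(values) != nrows:
                # Subtables only match up row-by-row if row counts agree.
                raise ValueError(f"column {name!r} has a different row count")
            if name in joined:
                raise ValueError(f"duplicate column name {name!r}")
            joined[name] = values
    return joined
```

Software unaware of the keywords would simply see several ordinary tables of at most 999 columns each; software aware of them could call something like `join_subtables` to present one logical table.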
Also, the impact on advanced features like tile-compression for tables (http://fits.gsfc.nasa.gov/tiletable.pdf) should be considered. In fact, placing the table in column major format (with array typed output columns) could be another way to address the issue.
> The one I'm inclining towards is using non-decimal digits for >999, e.g. AAA=1000, AAB=1001, ABA=1026
Not sure there is an advantage to limiting this to alphas, as long as the first character is non-decimal.
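For reference, the quoted mapping (AAA=1000, AAB=1001, ABA=1026) is consistent with a three-letter base-26 counter offset by 1000. The sketch below assumes exactly that reading and the alphas-only variant; the function names are hypothetical, and the wider alphabet suggested in the reply above is not covered.

```python
import string

LETTERS = string.ascii_uppercase  # 26 "digits": A=0 ... Z=25
FIRST_EXTENDED = 1000             # AAA stands for column 1000
MAX_EXTENDED = FIRST_EXTENDED + 26 ** 3 - 1  # ZZZ stands for column 18575


def column_suffix(n):
    """Return the keyword suffix for 1-based column number n."""
    if n < 1 or n > MAX_EXTENDED:
        raise ValueError(f"column number {n} out of range")
    if n < FIRST_EXTENDED:
        return str(n)  # ordinary decimal suffix, e.g. TTYPE999
    hi, rest = divmod(n - FIRST_EXTENDED, 26 * 26)
    mid, lo = divmod(rest, 26)
    return LETTERS[hi] + LETTERS[mid] + LETTERS[lo]


def column_index(suffix):
    """Inverse of column_suffix: map a keyword suffix back to a number."""
    if suffix.isascii() and suffix.isdigit():
        n = int(suffix)
        if not 1 <= n < FIRST_EXTENDED:
            raise ValueError(f"bad decimal suffix {suffix!r}")
        return n
    if len(suffix) != 3 or any(c not in LETTERS for c in suffix):
        raise ValueError(f"bad extended suffix {suffix!r}")
    hi, mid, lo = (LETTERS.index(c) for c in suffix)
    return FIRST_EXTENDED + (hi * 26 + mid) * 26 + lo
```

Under this reading the suffix still fits the three characters left after a five-letter root such as TTYPE in an eight-character keyword, and the scheme tops out at 18575 columns.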
> I'm starting to think that the complications may not make any of these schemes suitable for eventual incorporation into the FITS standard (though I'd be happy to be persuaded otherwise).
Whatever the "scheme", the process should be thoughtful and unhurried enough to be confident that the requirements are known and the solution is responsive to them.
> I may however implement one of them in STIL if I conclude that I need a FITS-like format to store wide tables; this would effectively be an internal data format with a close resemblance to FITS. In that case other software that needed to do something similar (perhaps there isn't any) would be free to implement the same unofficial extension.
I'm concerned that this issue came up two days ago and we're already talking about rejecting compliance not only with the FITS standard, but also with registered FITS conventions. The entire point of the FITS WG (similar to the IVOA TCG) is to sort through the issues before adopting such changes, rather than hoping that some narrowly conceived local format will eventually become standardized. Historically the latter has not often occurred.
And more fundamentally yet, it is not clear whether any change at all is strongly required. In particular, should we be encouraging users to blindly join catalogs into tables with thousands of columns with no thought to ever winnow them down to include just those columns with utility for their own science? Isn't the obvious workaround for this particular user precisely to eliminate unnecessary columns before the join? With hundreds of columns to choose from, presumably many dozens are orthogonal to their purposes.
Rob