[fitsbits] BINTABLE convention for >999 columns

Francois-Xavier PINEAU francois-xavier.pineau at astro.unistra.fr
Mon Jul 31 10:19:52 EDT 2017


Dear Mark, dear fitsbits,

The latest version of the HIERARCH keyword convention
(following the pointer Mark provided) seems to be:
https://fits.gsfc.nasa.gov/registry/hierarch/hierarch.pdf

I do not know why section 2 ("Generalized Conventions to Support Long
Keyword Names") of the previous version has been dropped:
https://fits.gsfc.nasa.gov/registry/hierarch/hierarch_20Aug2007.pdf

But I like the idea (in the 2007 version) of using HIERARCH as a possible
convention "to support keyword names that are longer than the 8-character
limit for a standard FITS keyword" (pending a possible updated version of
the standard).

In section 2, there is no need for a 'name space' token.
Thus, one could consider 'XT' as unnecessary in the case of wide BINTABLEs.

When reading
HIERARCH XT TFORM999 = 'toto'
I expect a HIERARCH-aware FITS parser to provide a keyword named
XT.TFORM999 (this is at least the behaviour of the FITS parser I have
started to write).
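
To make that expectation concrete, here is a minimal Python sketch of how a
HIERARCH-aware parser might turn such a card into a dotted keyword name. The
function name parse_hierarch_card and the simplified comment handling are
assumptions made for illustration; this is not the API of any existing FITS
library.

```python
def parse_hierarch_card(card: str) -> tuple:
    """Split a HIERARCH card into (dotted keyword name, raw value text)."""
    prefix = "HIERARCH "
    if not card.startswith(prefix):
        raise ValueError("not a HIERARCH card")
    # Everything between the HIERARCH prefix and the first '=' is the
    # space-separated token list that forms the hierarchical name.
    name_part, _, rest = card[len(prefix):].partition("=")
    name = ".".join(name_part.split())
    # Simplifying assumption: the comment starts at the first " /", so a
    # quoted string value that itself contains " /" is not handled here.
    value = rest.split(" /", 1)[0].strip()
    return name, value


name, value = parse_hierarch_card("HIERARCH XT TFORM999 = 'toto'")
print(name, value)
```

With this reading, the 'XT' token becomes part of the keyword name itself,
which is why the reader has to know about it.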

It means that software has to be aware of the meaning of 'XT' to be able
to decode the last column of a wide BINTABLE.

Getting rid of 'XT', the keyword name would simply be TFORM999
(and TFORM1000, ...). After the metadata parsing phase, the FITS reader
would interpret the metadata of wide BINTABLEs in exactly the same way as
it interprets the metadata of regular BINTABLEs.
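
As a rough illustration of that point, the Python sketch below reads column
formats from an already parsed header, represented here as a plain dict (an
assumption for the example, as is the function name column_formats). Once
the names carry no 'XT' token, the same loop serves column 7 and column 1000.

```python
def column_formats(header: dict) -> list:
    """Return the TFORMn values for columns 1..TFIELDS, in column order."""
    n_cols = int(header["TFIELDS"])
    # Nothing in this loop depends on the keyword fitting in 8 characters:
    # "TFORM1000" is looked up exactly like "TFORM7".
    return [header["TFORM%d" % i] for i in range(1, n_cols + 1)]


# Hypothetical parsed header of a 1001-column table, all columns of type D.
header = {"TFIELDS": 1001}
header.update({"TFORM%d" % i: "D" for i in range(1, 1002)})

formats = column_formats(header)
print(len(formats), formats[-1])
```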

I support an "overloading" behaviour (as in CSS, for example): the same
keywords (TFIELDS, TXXXX999) in a more specialized context (HIERARCH)
should be able to overload the "regular" keywords that ensure the legality
of the FITS file (again, pending an updated standard).
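
A small Python sketch of what I mean by overloading, assuming the parser
keeps the regular keywords and the HIERARCH keywords in two dicts. The
function name effective_header is hypothetical, and the overloaded TFIELDS
value of 1204 is an assumption taken from the last column of Mark's example
header.

```python
def effective_header(regular: dict, hierarch: dict) -> dict:
    """Merge the two keyword sets; HIERARCH entries win on name clashes."""
    merged = dict(regular)   # copy, so the legal "regular" view is untouched
    merged.update(hierarch)  # the more specialized context overrides
    return merged


# What a HIERARCH-unaware reader sees: a legal 999-column table whose last
# column is an opaque container.
regular = {"TFIELDS": 999, "TTYPE999": "XT_MORECOLS", "TFORM999": "813I"}

# What the HIERARCH cards declare (no 'XT' token in the names).
hierarch = {
    "TFIELDS": 1204,
    "TTYPE999": "var_min_u_2",
    "TFORM999": "D",
    "TFORM1000": "D",
}

merged = effective_header(regular, hierarch)
print(merged["TFIELDS"], merged["TFORM999"])
```

A reader that ignores HIERARCH still sees a legal 999-column table, while an
aware reader sees the overloaded values.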


"The perception that it's too difficult to change the FITS standard" may
come from the fact that "there is no standard means for a FITS file to
communicate the formatting version it conforms to" (2015A&C....12..133T).
It may be the first point to address before possibly relaxing the
8-character and/or the upper case constraints on keyword names.



François-Xavier Pineau


On 30/07/2017 at 23:21, Mark Taylor wrote:
> Thanks all for your feedback and Bill for your summary.
> If some future version of FITS relaxes the 8-character limit I will
> certainly be happy to encode wide tables in that way.
> In the meantime, I will go ahead with the HIERARCH variant
> (which seems to be clearly more popular here than the base-26 variant)
> of the solution that I've described, with the expectation of its
> use only within TOPCAT/STIL, rather than as any kind of generally
> accepted FITS convention.
>
> Mark
>
> On Sun, 30 Jul 2017, William Pence wrote:
>
>> Mark,
>>
>> This seems to me to be a good solution to the particular use case you
>> outlined, namely to allow TOPCAT users to temporarily store the results from a
>> cross-correlation of 2 FITS tables for later analysis using TOPCAT.  This is
>> not intended to be a general solution for supporting very wide tables in FITS.
>> If the FITS community decided that this was a serious issue that should be
>> addressed, then I think a much better solution would be to just relax the
>> 8-character limit on the length of keyword names so that the column number
>> suffix on the keyword name can be longer than 3 digits.
>>
>> As an aside, I think this 8-character limit on keyword names is probably the
>> most serious current limitation in the FITS format.  Fixing this by allowing
>> free-format 80-character header records where the equals sign is no longer
>> required to be in byte 9 would not be difficult to implement and support.
>>
>> -Bill
>>
>> On 7/28/2017 10:05 AM, Mark Taylor wrote:
>>> Coming back to this after a bit of a breather:
>>>
>>> To summarise the discussion, enthusiasm for my proposed
>>> convention for wide (>999 column) BINTABLES has not been
>>> universal, but I am still planning to implement something
>>> along these lines for my purposes (STIL/STILTS/TOPCAT).
>>> The possibility exists of other software deciding to recognise
>>> such a convention at some point in the future, but I'm not
>>> relying on that or even necessarily recommending it.
>>>
>>> In terms of the details, there was one main difference of opinion,
>>> namely how to store the column metadata for the 'extended'
>>> columns in the FITS header.  The suggestion I put forward was
>>> to use a base-26 number giving headers TFORMAAA - TFORMZZZ,
>>> which leads to a limit of 18574 columns.  Francois-Xavier
>>> Pineau suggested instead using the HIERARCH convention,
>>> which would allow a more or less unlimited column count.
>>>
>>> For concreteness, this HIERARCH-based variant differs from
>>> my original proposal
>>> (https://listmgr.nrao.edu/pipermail/fitsbits/2017-July/002967.html)
>>> in the following way:
>>>
>>>      - Metadata for each extended column is encoded with keywords
>>>        of the form HIERARCH XT XXXXXnnnnn, where XXXXX
>>>        are the same keyword roots as used for normal BINTABLE extensions,
>>>        and nnnnn is a decimal number written as usual (no leading zeros,
>>>        as many digits as required).  Thus the formats for data
>>>        columns 999, 1000, 1001 etc are declared with the keywords
>>>        HIERARCH XT TFORM999, HIERARCH XT TFORM1000, HIERARCH XT TFORM1001
>>>        etc.  Note this uses the ESO HIERARCH convention described at
>>>        https://fits.gsfc.nasa.gov/registry/hierarch_keyword.html.
>>>        The "name space" token has been chosen as "XT" (extended table).
>>>
>>> and the example header looks identical to my original example up
>>> to TFORM999, but the remaining entries differ:
>>>
>>>     TTYPE998= 'var_min_s_2'        /  label for column 998
>>>     TFORM998= 'D       '           /  format for column 998
>>>     TUNIT998= 'counts/s'           /  units for column 998
>>>     TTYPE999= 'XT_MORECOLS'        /  label for column 999
>>>     TFORM999= '813I    '           /  format for column 999
>>>     HIERARCH XT TTYPE999         = 'var_min_u_2' / label for column 999
>>>     HIERARCH XT TFORM999         = 'D' / format for column 999
>>>     HIERARCH XT TUNIT999         = 'counts/s' / units for column 999
>>>     HIERARCH XT TTYPE1000        = 'var_prob_h_2' / label for column 1000
>>>     HIERARCH XT TFORM1000        = 'D' / format for column 1000
>>>      ...
>>>     HIERARCH XT TTYPE1203        = 'var_prob_w_2' / label for column 1203
>>>     HIERARCH XT TFORM1203        = 'D' / format for column 1203
>>>     HIERARCH XT TTYPE1204        = 'var_sigma_w_2' / label for column 1204
>>>     HIERARCH XT TFORM1204        = 'D' / format for column 1204
>>>     HIERARCH XT TUNIT1204        = 'counts/s' / units for column 1204
>>>     END
>>>
>>> I have implemented and tested both variants, and they both work.
>>> The HIERARCH solution is a bit messier to do because it relies
>>> on a non-standard convention.
>>>
>>> Summarising the pros and cons of these two variants:
>>>
>>>     Base-26:
>>>      - limited to 18,000 columns ...
>>>        ... but nobody has come up with a plausible case to need more
>>>      - looks kludgy
>>>      - not very human readable
>>>
>>>     HIERARCH:
>>>      - requires non-FITS convention (HIERARCH)
>>>      - effectively no column count limit
>>>      - 13 or so fewer characters available for column keyword values
>>>      - easily human readable
>>>
>>> The balance of opinion in this thread of those who have expressed
>>> a preference between the two seems to have been in favour of the
>>> HIERARCH option (Francois-Xavier Pineau, Bill Pence, Tom McGlynn)
>>> as opposed to the Base-26 option (me, Rob Seaman, Arnold Rots?).
>>> In view of that, and the nagging worry that somebody might come
>>> up with some reason to store 20k+ columns, I think I'm just
>>> about coming down on the HIERARCH side, though it does look
>>> less FITSy to me.
>>>
>>> This message is to give a last chance for anybody to weigh in
>>> on one side or the other of the Base-26/HIERARCH question,
>>> in particular anybody who thinks they might end up one day
>>> wanting to implement support for this (which may be nobody!).
>>> If there is no more input on that question (which is fine by me),
>>> I'll decide one way or the other, implement and release it in
>>> STIL/STILTS/TOPCAT, and report back here.
>>>
>>> Thanks for reading and for the community input on this.
>>>
>>> Mark
>>>
>>> --
>>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>>> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
>>>
>>> _______________________________________________
>>> fitsbits mailing list
>>> fitsbits at listmgr.nrao.edu
>>> https://listmgr.nrao.edu/mailman/listinfo/fitsbits
>>>
>>>
>>
> --
> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
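
The 18574-column ceiling quoted above for the base-26 variant can be checked
with a little arithmetic: 998 ordinary columns plus 26^3 three-letter
suffixes. The Python sketch below assumes that column 999 maps to AAA and
that suffixes then increase as AAA, AAB, ...; that ordering and the function
name base26_suffix are assumptions, since the details live in the original
proposal rather than in this thread excerpt.

```python
import string

FIRST_EXTENDED = 999   # first column that needs a letter suffix (assumed)
N_SUFFIXES = 26 ** 3   # AAA .. ZZZ


def base26_suffix(col: int) -> str:
    """Three-letter suffix for an extended column number (assumed ordering)."""
    offset = col - FIRST_EXTENDED
    if not 0 <= offset < N_SUFFIXES:
        raise ValueError("column %d is outside the base-26 range" % col)
    letters = string.ascii_uppercase
    hi, rest = divmod(offset, 26 * 26)
    mid, lo = divmod(rest, 26)
    return letters[hi] + letters[mid] + letters[lo]


# 998 ordinary columns + 17576 suffixes
max_col = FIRST_EXTENDED - 1 + N_SUFFIXES
print(max_col, base26_suffix(999), base26_suffix(max_col))
```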


