[fitsbits] BINTABLE convention for >999 columns

William Pence William.Pence at nasa.gov
Mon Jul 31 15:14:04 EDT 2017


A bit of history about the HIERARCH convention and CFITSIO:

When support for the HIERARCH keyword convention was added to the 
CFITSIO library back in 1999, the convention was generalized beyond the 
original ESO usage to also support keywords longer than 8 characters, as in:

HIERARCH  MYLONGKEYWORD = 17

At the time I actively promoted this convention as a useful way to get 
around the 8-character keyword limitation in FITS, but it never really 
caught on and, as far as I know, it was not used very much, if at all. 
Since one of the requirements for documenting a convention in the FITS 
Registry is that it must be in actual use in the FITS community, the 
description of this more general use of the HIERARCH keyword was removed 
from the document in 2009.

Nonetheless, CFITSIO continues to support this more general 
convention.  If a user attempts to write a keyword longer than 8 
characters, the CFITSIO routines will silently encode it in the FITS 
header using the HIERARCH convention, and similarly will look for the 
appropriate HIERARCH keyword when reading a long keyword.
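
As a rough illustration of the behaviour described above, here is a minimal
Python sketch. It is not CFITSIO's actual code: the helper name format_card
is hypothetical, only integer values are handled, and the exact spacing
written after HIERARCH is an assumption.

```python
CARD_LEN = 80  # every FITS header record is 80 characters long


def format_card(keyword: str, value: int) -> str:
    """Return an 80-character header card for an integer value.

    Hypothetical helper: keywords of up to 8 characters use the standard
    fixed layout ("= " in columns 9-10); longer ones use HIERARCH.
    """
    keyword = keyword.upper()
    if len(keyword) <= 8:
        # Standard card: keyword padded to 8 characters, integer value
        # right-justified so that it ends in column 30.
        card = f"{keyword:<8}= {value:>20}"
    else:
        # Long keyword: HIERARCH prefix, then free-format "name = value".
        card = f"HIERARCH {keyword} = {value}"
    if len(card) > CARD_LEN:
        raise ValueError("card does not fit in 80 characters")
    return card.ljust(CARD_LEN)


print(format_card("NAXIS", 2).rstrip())
print(format_card("MYLONGKEYWORD", 17).rstrip())
```

The caller passes the long name directly and never has to spell out the
HIERARCH prefix, which is the "silent" part of the behaviour.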

Getting back to Mark's BINTABLE convention, I agree with François-Xavier 
that eliminating the 'XT' domain name would simplify the BINTABLE 
convention a little bit.  Using CFITSIO, one could then simply read or 
write the "TFORM1234" keyword, instead of "XT TFORM1234".
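
To make the difference concrete, the following Python sketch shows how a
HIERARCH-aware reader might derive a keyword name from a card.  It is an
illustration only, not CFITSIO or STIL code: the helper hierarch_name and
the choice of joining tokens with '.' are assumptions.

```python
def hierarch_name(card: str) -> str:
    """Return the keyword name of a HIERARCH card, tokens joined by '.'.

    Hypothetical helper; real parsers may represent the name differently.
    """
    if not card.startswith("HIERARCH "):
        raise ValueError("not a HIERARCH card")
    # Everything between the HIERARCH prefix and the first '=' is the name.
    name_part, sep, _value = card[len("HIERARCH "):].partition("=")
    if not sep:
        raise ValueError("HIERARCH card has no '=' value indicator")
    return ".".join(name_part.split())


print(hierarch_name("HIERARCH XT TFORM1234 = 'D'"))  # XT.TFORM1234
print(hierarch_name("HIERARCH TFORM1234 = 'D'"))     # TFORM1234
```

With the 'XT' token the reader gets a name it has to special-case; without
it the name is already the ordinary column keyword.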

-Bill


On 7/31/2017 10:19 AM, Francois-Xavier PINEAU wrote:
> Dear Mark, dear fitsbits,
>
> The last version of the HIERARCH keyword conventions
> (following the pointer Mark provided) seems to be:
> https://fits.gsfc.nasa.gov/registry/hierarch/hierarch.pdf
>
> I do not know why section 2 ("Generalized Conventions to Support Long
> Keyword Names") of the previous version has been dropped:
> https://fits.gsfc.nasa.gov/registry/hierarch/hierarch_20Aug2007.pdf
>
> But I like the idea (in the 2007 version) of using HIERARCH as a possible
> convention "to support keyword names that are longer than the 8-character
> limit for a standard FITS keyword" (waiting for a possible updated
> version of the standard).
>
> In section 2, there is no need for a 'name space' token.
> Thus, one could consider 'XT' as unnecessary in the case of wide BINTABLEs.
>
> When reading
> HIERARCH XT TFORM999 = 'toto'
> I expect a HIERARCH aware FITS parser to provide a keyword having the name
> XT.TFORM999 (it is at least the behaviour of the FITS parser I have started
> to write).
>
> It means that software has to be aware of the meaning of 'XT' to be able
> to decode the last column of a wide BINTABLE.
>
> Getting rid of 'XT', the keyword name would simply be TFORM999
> (and TFORM1000, ...). After the metadata parsing phase, the FITS reader
> will interpret the metadata of wide BINTABLEs exactly the same way as it
> interprets the metadata of regular BINTABLEs.
>
> I support an "overloading" behaviour (like e.g. in CSS): same keywords
> (TFIELDS, TXXXX999) in a more specialized context (HIERARCH) should be
> able to overload "regular" keywords that ensure the legality of the FITS
> file (again, waiting for an updated standard).
>
>
> "The perception that it's too difficult to change the FITS standard" may
> come from the fact that "there is no standard means for a FITS file to
> communicate the formatting version it conforms to" (2015A&C....12..133T).
> It may be the first point to address before possibly relaxing the
> 8-character and/or the upper case constraints on keyword names.
>
>
>
> François-Xavier Pineau
>
>
> Le 30/07/2017 à 23:21, Mark Taylor a écrit :
>> Thanks all for your feedback and Bill for your summary.
>> If some future version of FITS relaxes the 8-character limit I will
>> certainly be happy to encode wide tables in that way.
>> In the meantime, I will go ahead with the HIERARCH variant
>> (which seems to be clearly more popular here than the base-26 variant)
>> of the solution that I've described, with the expectation of its
>> use only within TOPCAT/STIL, rather than as any kind of generally
>> accepted FITS convention.
>>
>> Mark
>>
>> On Sun, 30 Jul 2017, William Pence wrote:
>>
>>> Mark,
>>>
>>> This seems to me to be a good solution to the particular use case you
>>> outlined, namely to allow TOPCAT users to temporarily store the results
>>> from a cross-correlation of 2 FITS tables for later analysis using TOPCAT.
>>> This is not intended to be a general solution for supporting very wide
>>> tables in FITS.  If the FITS community decided that this was a serious
>>> issue that should be addressed, then I think a much better solution would
>>> be to just relax the 8-character limit on the length of keyword names so
>>> that the column number suffix on the keyword name can be longer than 3
>>> digits.
>>>
>>> As an aside, I think this 8-character limit on keyword names is probably
>>> the most serious current limitation in the FITS format.  Fixing this by
>>> allowing free-format 80-character header records where the equals sign is
>>> no longer required to be in byte 9 would not be difficult to implement and
>>> support.
>>>
>>> -Bill
>>>
>>> On 7/28/2017 10:05 AM, Mark Taylor wrote:
>>>> Coming back to this after a bit of a breather:
>>>>
>>>> To summarise the discussion, enthusiasm for my proposed
>>>> convention for wide (>999 column) BINTABLEs has not been
>>>> universal, but I am still planning to implement something
>>>> along these lines for my purposes (STIL/STILTS/TOPCAT).
>>>> The possibility exists of other software deciding to recognise
>>>> such a convention at some point in the future, but I'm not
>>>> relying on that or even necessarily recommending it.
>>>>
>>>> In terms of the details, there was one main difference of opinion,
>>>> namely how to store the column metadata for the 'extended'
>>>> columns in the FITS header.  The suggestion I put forward was
>>>> to use a base-26 number giving headers TFORMAAA - TFORMZZZ,
>>>> which leads to a limit of 18574 columns.  Francois-Xavier
>>>> Pineau suggested instead using the HIERARCH convention,
>>>> which would allow a more or less unlimited column count.
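
The 18574 figure quoted above can be checked with a short Python sketch.
The assumptions here are that columns 1-998 are ordinary columns, that
column 999 is the first extended column (as in the example header further
down, where the standard column 999 acts as a container), and that suffixes
run AAA, AAB, ... ZZZ in order.  The helper base26_suffix and that ordering
are hypothetical, not taken from the original proposal.

```python
import string

FIRST_EXTENDED = 999   # assumption: column 999 is the first extended column
N_SUFFIXES = 26 ** 3   # number of three-letter suffixes AAA .. ZZZ


def base26_suffix(column: int) -> str:
    """Map an extended column number to a three-letter suffix (assumed order)."""
    index = column - FIRST_EXTENDED
    if not 0 <= index < N_SUFFIXES:
        raise ValueError("column outside the base-26 range")
    letters = string.ascii_uppercase
    hi, rest = divmod(index, 26 * 26)
    mid, lo = divmod(rest, 26)
    return letters[hi] + letters[mid] + letters[lo]


# 998 ordinary columns (1-998) plus one extended column per suffix.
max_columns = (FIRST_EXTENDED - 1) + N_SUFFIXES
print(max_columns)                 # 18574
print(base26_suffix(999))          # AAA
print(base26_suffix(max_columns))  # ZZZ
```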
>>>>
>>>> For concreteness, this HIERARCH-based variant differs from
>>>> my original proposal
>>>> (https://listmgr.nrao.edu/pipermail/fitsbits/2017-July/002967.html)
>>>> in the following way:
>>>>
>>>>      - Metadata for each extended column is encoded with keywords
>>>>        of the form HIERARCH XT XXXXXnnnnn, where XXXXX
>>>>        are the same keyword roots as used for normal BINTABLE
>>>>        extensions, and nnnnn is a decimal number written as usual
>>>>        (no leading zeros, as many digits as required).  Thus the
>>>>        formats for data columns 999, 1000, 1001 etc are declared
>>>>        with the keywords HIERARCH XT TFORM999,
>>>>        HIERARCH XT TFORM1000, HIERARCH XT TFORM1001 etc.
>>>>        Note this uses the ESO HIERARCH convention described at
>>>>        https://fits.gsfc.nasa.gov/registry/hierarch_keyword.html.
>>>>        The "name space" token has been chosen as "XT" (extended table).
>>>>
>>>> and the example header looks identical to my original example up
>>>> to TFORM999, but the remaining entries differ:
>>>>
>>>>     TTYPE998= 'var_min_s_2'        /  label for column 998
>>>>     TFORM998= 'D       '           /  format for column 998
>>>>     TUNIT998= 'counts/s'           /  units for column 998
>>>>     TTYPE999= 'XT_MORECOLS'        /  label for column 999
>>>>     TFORM999= '813I    '           /  format for column 999
>>>>     HIERARCH XT TTYPE999         = 'var_min_u_2' / label for column 999
>>>>     HIERARCH XT TFORM999         = 'D' / format for column 999
>>>>     HIERARCH XT TUNIT999         = 'counts/s' / units for column 999
>>>>     HIERARCH XT TTYPE1000        = 'var_prob_h_2' / label for column 1000
>>>>     HIERARCH XT TFORM1000        = 'D' / format for column 1000
>>>>      ...
>>>>     HIERARCH XT TTYPE1203        = 'var_prob_w_2' / label for column 1203
>>>>     HIERARCH XT TFORM1203        = 'D' / format for column 1203
>>>>     HIERARCH XT TTYPE1204        = 'var_sigma_w_2' / label for column 1204
>>>>     HIERARCH XT TFORM1204        = 'D' / format for column 1204
>>>>     HIERARCH XT TUNIT1204        = 'counts/s' / units for column 1204
>>>>     END
>>>>
>>>> I have implemented and tested both variants, and they both work.
>>>> The HIERARCH solution is a bit messier to do because it relies
>>>> on a non-standard convention.
>>>>
>>>> Summarising the pros and cons of these two variants:
>>>>
>>>>     Base-26:
>>>>      - limited to 18,000 columns ...
>>>>        ... but nobody has come up with a plausible case to need more
>>>>      - looks kludgy
>>>>      - not very human readable
>>>>
>>>>     HIERARCH:
>>>>      - requires non-FITS convention (HIERARCH)
>>>>      - effectively no column count limit
>>>>      - 13 or so fewer characters available for column keyword values
>>>>      - easily human readable
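
The "13 or so fewer characters" point in the list above can be seen by
counting how much of the 80-character card is taken up before the value
starts.  This Python sketch assumes minimal spacing around the '=' in the
HIERARCH form (the example header above pads the name further, which leaves
even less room); the helper column_card_prefix is hypothetical.

```python
CARD_LEN = 80  # every FITS header record is 80 characters long


def column_card_prefix(root: str, column: int) -> str:
    """Return the text that precedes the value for a column keyword.

    Assumes the HIERARCH XT variant: columns below 999 use the standard
    8-character keyword followed by "= ", columns 999 and up use
    "HIERARCH XT <name> = ".
    """
    name = f"{root}{column}"
    if column < 999:
        return f"{name:<8}= "
    return f"HIERARCH XT {name} = "


for col in (998, 999, 1000, 12345):
    prefix = column_card_prefix("TFORM", col)
    print(f"{prefix!r}: {CARD_LEN - len(prefix)} characters left")
```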
>>>>
>>>> The balance of opinion in this thread of those who have expressed
>>>> a preference between the two seems to have been in favour of the
>>>> HIERARCH option (Francois-Xavier Pineau, Bill Pence, Tom McGlynn)
>>>> as opposed to the Base-26 option (me, Rob Seaman, Arnold Rots?).
>>>> In view of that, and the nagging worry that somebody might come
>>>> up with some reason to store 20k+ columns, I think I'm just
>>>> about coming down on the HIERARCH side, though it does look
>>>> less FITSy to me.
>>>>
>>>> This message is to give a last chance for anybody to weigh in
>>>> on one side or the other of the Base-26/HIERARCH question,
>>>> in particular anybody who thinks they might end up one day
>>>> wanting to implement support for this (which may be nobody!).
>>>> If there is no more input on that question (which is fine by me),
>>>> I'll decide one way or the other, implement and release it in
>>>> STIL/STILTS/TOPCAT, and report back here.
>>>>
>>>> Thanks for reading and for the community input on this.
>>>>
>>>> Mark
>>>>
>>>> --
>>>> Mark Taylor   Astronomical Programmer   Physics, Bristol University, UK
>>>> m.b.taylor at bris.ac.uk +44-117-9288776  http://www.star.bris.ac.uk/~mbt/
>>>>
>>>> _______________________________________________
>>>> fitsbits mailing list
>>>> fitsbits at listmgr.nrao.edu
>>>> https://listmgr.nrao.edu/mailman/listinfo/fitsbits
>>>>
>>>>
>>>
>


-- 
____________________________________________________________________
Dr. William Pence    Astrophysicist     William.Pence at nasa.gov
NASA/GSFC Code 662     [Emeritus]       +1-301-286-4599 (voice)
Greenbelt MD 20771                      +1-301-286-1684 (fax)



