[fitsbits] Proposed Changes to the FITS Standard

Fri Aug 17 15:32:44 EDT 2007

>>   1. Keywords that have a value shall not be repeated in a header.
>
> I have many examples (hundreds of thousands?) of files in which
> keywords are repeated.  Rather than the wording in the current
> proposal, I would replace the attempt at a requirement with a strong
> recommendation and a clarification that the final copy of any such
> repeated keyword should take precedence.
>
>>   2. PCOUNT and GCOUNT must immediately follow the last NAXISn
>>      keyword in all conforming extensions (as is already required
>>      in IMAGE, TABLE, and BINTABLE extensions).
>
> I guess I'd like to know if there are any such extensions.  If not,
> this is relatively safe.  If so, make it a strong recommendation for
> an explicit list of grandfathered extension types and an absolute
> requirement for any newly defined extensions.

It got me thinking, so I looked at the FITS parser in iSTB (the  
current version of save-the-bits deployed on three mountaintops and  
handling several terabytes of raw data annually).  And no, I don't  
currently require PCOUNT and GCOUNT to immediately follow NAXISn.  I  
do, however, throw an error if these particular keywords are  
duplicated :-)
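
For reference, the ordering rule in proposal 2 amounts to a simple positional check.  A minimal Python sketch, assuming the header has already been reduced to a list of keyword names in header order (the function name and input shape are hypothetical, not taken from iSTB):

```python
def pcount_gcount_follow_naxis(keywords, naxis):
    """Return True if PCOUNT and GCOUNT immediately follow the last NAXISn.

    keywords: keyword names in the order they appear in the header
    naxis:    integer value of the NAXIS keyword
    """
    # With NAXIS = 0 there are no NAXISn keywords, so NAXIS itself is "last".
    last_axis = "NAXIS" if naxis == 0 else f"NAXIS{naxis}"
    try:
        i = keywords.index(last_axis)
    except ValueError:
        return False  # the expected axis keyword is missing altogether
    return keywords[i + 1:i + 3] == ["PCOUNT", "GCOUNT"]
```
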

Speaking of which, it is the duplicate keyword requirement that seems  
most onerous.  To implement this efficiently for all keywords, one  
would have to build a hash table or some such for each header.  Then  
one is left with the question of what to do upon detecting a  
duplicate.  The sense of a requirement is to simply throw an error  
and exit.  How helpful is that?  STB will toss a FITS file if any of  
the structural keywords (NAXISn, BITPIX, PCOUNT, GCOUNT, etc.) are  
questionable - precisely because this calls into question the  
possibility of handling the data appropriately.  The daemon needs to  
know the size of the file because it is reading it on the standard  
input, perhaps concatenated with other files.  The size of each  
extension must be known to find subsequent extensions.  Etc.
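
The size bookkeeping above follows the standard's data-size formula: |BITPIX|/8 x GCOUNT x (PCOUNT + NAXIS1 x ... x NAXISn) bytes, padded up to a whole number of 2880-byte blocks.  A rough Python sketch of that arithmetic follows; it is not the iSTB code, the function names are illustrative, and the random groups convention is deliberately ignored:

```python
import math

BLOCK = 2880  # FITS block size in bytes


def data_bytes(bitpix, axes, pcount=0, gcount=1):
    """Unpadded size of the data unit in bytes.

    axes is the list [NAXIS1, ..., NAXISn]; an empty list means NAXIS = 0.
    The defaults pcount=0, gcount=1 correspond to a plain primary array.
    """
    # math.prod([]) is 1, so treat NAXIS = 0 explicitly as "no pixels".
    npix = math.prod(axes) if axes else 0
    return abs(bitpix) // 8 * gcount * (pcount + npix)


def padded_data_bytes(bitpix, axes, pcount=0, gcount=1):
    """Size on disk (or on the stream): rounded up to whole 2880-byte blocks."""
    n = data_bytes(bitpix, axes, pcount, gcount)
    return -(-n // BLOCK) * BLOCK  # ceiling division, then back to bytes
```

If any of these keywords is duplicated with conflicting values, the offset of the next extension is ambiguous, which is why they are treated as fatal.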

But am I to discard brand new data simply because some camera  
temperature keyword appears twice?  I spend a lot of time every week  
trying to convince a dozen different instrument teams to provide the  
archive with a reliable DATE-OBS, EXPTIME, FILTER, OBSTYPE, and so
forth.  They'll rebel if I start tossing their data due to
foibles with minor engineering keywords.
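
The distinction drawn here, fatal for structural keywords and tolerant for everything else, can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the header is given as a list of card strings, the STRUCTURAL pattern covers just the keywords named above, and scan_duplicates is a hypothetical name.  The set plays the role of the per-header hash table.

```python
import re

# Assumption: only the structural keywords named in the text; a real
# parser would likely extend this pattern.
STRUCTURAL = re.compile(r"^(BITPIX|NAXIS\d*|PCOUNT|GCOUNT)$")


def scan_duplicates(cards):
    """Return (fatal, warnings): duplicated keyword names split by severity."""
    seen = set()
    fatal, warnings = [], []
    for card in cards:
        key = card[:8].rstrip()
        if key == "END":
            break
        # Only cards with a value indicator ("= " in columns 9-10) count;
        # COMMENT, HISTORY and blank keywords may repeat freely.
        if card[8:10] != "= ":
            continue
        if key in seen:
            (fatal if STRUCTURAL.match(key) else warnings).append(key)
        seen.add(key)
    return fatal, warnings
```

A caller would reject the file only when fatal is non-empty and merely log the entries in warnings.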

I really think enforcing #1 will prove impossible in practice.  I'm  
not going to build a hash table to search for duplicates for every  
keyword just so I can throw an error that will anger my stakeholders  
over trivial details.  And on the other hand, for pipeline-reduced
science data sets, no requirement is needed since there is already
sufficient impetus for data providers to carefully tailor their data  
products, eliminating duplicate keywords as a matter of course.

Making it a strong recommendation is my own strong recommendation.

Rob