[fitsbits] reopening of Public Comment Period on the CHECKSUM convention

Rob Seaman seaman at noao.edu
Fri Dec 18 12:38:03 EST 2015


If there were a separate FITS users guide, it might be appropriate to move my comments there. In fact, they would be appropriate for a general astronomical data users guide, separate from FITS. The haggling over the checksum language is going to be pretty tame compared to what we can expect from some of the other conventions. That being the case, I suggest this group consider how best to handle pointers regarding usage in the general case. Perhaps footnotes or endnotes? In this particular case, the gist of Bill’s comments should also be captured for the edification of future implementors.

My formal statement is that I make it a practice to never disagree with Bill.

Rob
—

> On Dec 18, 2015, at 10:18 AM, William Pence <William.Pence at nasa.gov> wrote:
> 
> While I support most of Rob's suggestions, I disagree with two of them:
> 
> (minor quibble):  I think the sentence that gives an example of why one might not add the Datetime stamp to the keywords ("The Datetime may be omitted from the comment for some purposes...") is not necessary for the FITS Standard.   Adding the Datetime is only a recommendation, so FITS writers don't have to have a valid justification if they choose not to do so.
> 
> More importantly, I object to adding this statement "If the checksum handling described here will not be performed, the two keywords should either be deleted or be given null values as described in section 4.4.2.7."
> It is perfectly valid for the checksum keywords in a file to have an invalid value and there is no obligation for data processing software to update or delete the keywords.   Since the checksum keywords are often added when a FITS file is published on a public data archive site, the user can check the validity of the checksum keywords to determine whether his or her copy of that data file has been modified.  If the checksums are correct, then the user can have some confidence that this copy of the file is the same as the original published file.  If the checksum in a particular HDU is not correct, then this tells the user that that HDU has been modified (although I recognize that these keywords are not intended to guarantee 100% data integrity since they can be easily spoofed).
> 
> The checksum keywords are analogous to those annoying holographic stickers that merchants use to seal a box.  If you see that the sticker is not intact, then you should be concerned that the contents of the box might have been tampered with.  Similarly, if a checksum keyword has an invalid value, then this provides important information to the user that the particular HDU has probably been modified at some point in time after the checksum keywords were created.   So removing the sticker, or deleting the checksum keywords, defeats the purpose of having them there in the first place.
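
The sticker behaviour described above can be sketched in a few lines of Python. This is an illustration only and not the FITS encoding: the helper `seal` is hypothetical and appends the complement as a raw trailing 32-bit word, whereas the convention stores it as ASCII in the CHECKSUM card. A sealed block sums to all ones (negative zero) in 32-bit 1's complement arithmetic, and any later change to a byte breaks that.

```python
import struct

ALL_ONES = 0xFFFFFFFF  # "negative zero" in 32-bit 1's complement arithmetic


def ones_complement_sum32(data: bytes) -> int:
    """Sum big-endian 32-bit words with end-around carry."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    total = sum(word for (word,) in struct.iter_unpack(">I", data))
    while total >> 32:  # fold any carries back into the low 32 bits
        total = (total & ALL_ONES) + (total >> 32)
    return total


def seal(block: bytes) -> bytes:
    """Hypothetical helper: append the word that makes the total all ones."""
    return block + struct.pack(">I", ALL_ONES - ones_complement_sum32(block))


def is_intact(sealed: bytes) -> bool:
    """True if the sealed block still sums to negative zero."""
    return ones_complement_sum32(sealed) == ALL_ONES


# An 80-character card stands in for a whole HDU here (an assumption of the sketch).
original = seal(b"SIMPLE  =                    T".ljust(80))
modified = original.replace(b"T", b"F", 1)  # tamper with one byte

print(is_intact(original), is_intact(modified))
```

Note that a failed check only says the bytes changed after sealing; as pointed out above, it is not a guarantee against deliberate spoofing.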
> 
> I would agree, however, that if a modified file is put back up on a public distribution site, then at that point the checksum keywords should be updated to contain valid values (as is already stated in the last sentence of the proposed section 4.4.2.7).
> 
> -Bill
> 
> On 12/17/2015 3:00 PM, Rob Seaman wrote:
>> Just a few comments:
>> 
>> 1) section 4.4.2.7, para. 5, omit “for analogy”
>> 
>> 2) same section, para. 7, “on that case” should be “in that case”
>> 
>> 3) same paragraph, immediately before this, change “also be written to every other HDU in the file” to “also be written to every other HDU in the file with values appropriate to each HDU in turn” (the point being that the values differ per-HDU, of course).
>> 
>> 4) same section, para. 3, change “Datetime when the value of this keyword record is created or updated is recommended.” to something like: “Datetime when the value of the DATASUM keyword record is created or updated is recommended. The Datetime may be omitted from the comment for some purposes, e.g., if repeating an identical workflow across multiple copies of an archive with the intent of generating identical output files. Note that if DATASUM is updated, so must the corresponding CHECKSUM keyword.”
>> 
>> 5) same section, para. 5, change “Datetime when the value of this keyword record is created or updated is recommended.” to something like: “Datetime when the value of the CHECKSUM keyword record is created or updated is recommended. The Datetime may be omitted from the comment for some purposes, e.g., if repeating an identical workflow across multiple copies of an archive with the intent of generating identical output files. Note that any timestamp on the CHECKSUM keyword should always be no earlier than the timestamp on the corresponding DATASUM keyword. (They may differ if the HDU’s header keywords are later updated, for instance.)”
>> 
>> 6) Suggest that the fact that the DATASUM (and any other keyword editing) must be done before the CHECKSUM be explicitly stated. Perhaps add a paragraph “0” (or renumber):
>> 
>> 	"0. The DATASUM keyword must be updated before the CHECKSUM keyword. In general updating the two checksum keywords should be the final step of any update to either data or header records in a FITS HDU. If the checksum handling described here will not be performed, the two keywords should either be deleted or be given null values as described in section 4.4.2.7."
>> 
>> 7) section J.1, para. currently tagged “1”, change “It is recommended that the current date and time be recorded in the comment field to document when the checksum was computed.” to “It is recommended that the current date and time be recorded in ISO-8601 format in the comment field to document when the checksum was computed. The same timestamp may be used for both DATASUM and CHECKSUM keywords if they are updated at the same time, or the CHECKSUM keyword may be later. Timestamps between these keywords in different HDUs may differ. The timestamps may be omitted from the comments for some purposes, e.g., if repeating an identical workflow across multiple copies of an archive with the intent of generating identical output files.”
>> 
>> 8) Just a reminder to indeed include the references cited in section J.4
>> 
>> 9) I have not reviewed the example C code. Note that this can be checked by zeroing the checksum of ASCII files of any sort, not just FITS header keywords. Just repeat the trick of replacing a string of 16 zeroes, e.g., cut-and-paste, taking care to align the string at a 4-byte boundary.
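
The trick in point 9 can be sketched as follows. This is a reconstruction under stated assumptions, not the reference C code, and it has not been compared against it: `encode_byte` and `encode_aligned` are hypothetical names, and the one-byte rotation that the CHECKSUM keyword needs because of its column position is left out, since here the 16-character string is assumed to start on a 4-byte boundary. Each byte of the complement is split into four alphanumeric characters whose codes sum to that byte plus four times the code of "0", so overwriting sixteen "0" characters raises the file's sum by exactly the complement.

```python
import struct

ALL_ONES = 0xFFFFFFFF
# ASCII punctuation lying between the digits and the two letter ranges
EXCLUDED = set(range(0x3A, 0x41)) | set(range(0x5B, 0x61))
PLACEHOLDER = b"0" * 16


def ones_complement_sum32(data: bytes) -> int:
    """Sum big-endian 32-bit words with end-around carry."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    total = sum(word for (word,) in struct.iter_unpack(">I", data))
    while total >> 32:
        total = (total & ALL_ONES) + (total >> 32)
    return total


def encode_byte(byte: int) -> list:
    """Split one byte into four alphanumeric codes summing to byte + 4 * 0x30."""
    quotient, remainder = divmod(byte, 4)
    chars = [quotient + 0x30 + remainder] + [quotient + 0x30] * 3
    adjusted = True
    while adjusted:
        adjusted = False
        for j in (0, 2):
            if chars[j] in EXCLUDED or chars[j + 1] in EXCLUDED:
                chars[j] += 1      # moving a pair in opposite directions
                chars[j + 1] -= 1  # leaves the pair's sum unchanged
                adjusted = True
    return chars


def encode_aligned(value: int) -> bytes:
    """Encode a 32-bit value as 16 characters; byte i lands at position i of each word."""
    out = bytearray(16)
    for i, byte in enumerate(value.to_bytes(4, "big")):
        for j, code in enumerate(encode_byte(byte)):
            out[4 * j + i] = code
    return bytes(out)


def zero_checksum(text: bytes) -> bytes:
    """Replace the first run of sixteen "0" characters so the file sums to all ones."""
    offset = text.index(PLACEHOLDER)
    if offset % 4 or len(text) % 4:
        raise ValueError("placeholder and file length must be 4-byte aligned")
    complement = ALL_ONES - ones_complement_sum32(text)
    return text[:offset] + encode_aligned(complement) + text[offset + 16:]


text = b"# sum = " + PLACEHOLDER + b"\nhello, world\n"
text = text.ljust(-(-len(text) // 4) * 4)  # pad with spaces to a multiple of 4 bytes
stamped = zero_checksum(text)
```

Running `ones_complement_sum32(stamped)` afterwards should give all ones, which is the same test a reader would apply to a FITS HDU.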
>> 
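
The ordering requirement in point 6 follows from the fact that the sum over the whole HDU includes the bytes of the DATASUM card itself. A minimal sketch, assuming a simplified hypothetical `card` helper and tiny data arrays instead of full 2880-byte records:

```python
import struct


def ones_complement_sum32(data: bytes) -> int:
    """Sum big-endian 32-bit words with end-around carry."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    total = sum(word for (word,) in struct.iter_unpack(">I", data))
    while total >> 32:
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


def card(keyword: str, value: str) -> bytes:
    """Simplified 80-character card (hypothetical helper, no comment field)."""
    return f"{keyword:<8}= '{value}'".ljust(80).encode("ascii")


def whole_hdu_sum(cards, data: bytes) -> int:
    """Sum over header cards plus data, as the CHECKSUM computation would see them."""
    return ones_complement_sum32(b"".join(cards) + data)


old_data = bytes(range(8))
new_data = bytes(range(8, 16))

# DATASUM is stored as the decimal string of the data sum.
stale_card = card("DATASUM", str(ones_complement_sum32(old_data)))
fresh_card = card("DATASUM", str(ones_complement_sum32(new_data)))

# Taking the whole-HDU sum before DATASUM is refreshed gives a different
# result from taking it as the final step.
sum_taken_too_early = whole_hdu_sum([stale_card], new_data)
sum_taken_last = whole_hdu_sum([fresh_card], new_data)
```

A CHECKSUM derived from `sum_taken_too_early` would no longer validate once DATASUM is corrected, which is why the whole-HDU step belongs last.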
>> 
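
For the timestamps in points 5 and 7, a short sketch of writing an ISO-8601 stamp and checking that the CHECKSUM stamp is no earlier than the DATASUM stamp. The function names are hypothetical, and the stamps are assumed to be in the form YYYY-MM-DDThh:mm:ss with no time zone suffix.

```python
from datetime import datetime


def iso_stamp(moment: datetime) -> str:
    """Format a datetime as an ISO-8601 string to the second."""
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


def stamps_consistent(datasum_stamp: str, checksum_stamp: str) -> bool:
    """True if the CHECKSUM stamp is the same as or later than the DATASUM stamp."""
    return datetime.fromisoformat(checksum_stamp) >= datetime.fromisoformat(datasum_stamp)


data_updated = iso_stamp(datetime(2015, 12, 17, 15, 0, 0))
header_updated = iso_stamp(datetime(2015, 12, 18, 10, 18, 0))

print(stamps_consistent(data_updated, header_updated))
```

Equal stamps pass the check, matching the case where both keywords are updated together.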
>>> Please accept my best wishes for the coming holidays and the new year.
>> Happy holidays!
>> 
>> Rob
>> 
>> 
>> _______________________________________________
>> fitsbits mailing list
>> fitsbits at listmgr.nrao.edu
>> https://listmgr.nrao.edu/mailman/listinfo/fitsbits



