[fitsbits] reopening of Public Comment Period on the CHECKSUM convention

William Pence William.Pence at nasa.gov
Fri Dec 18 12:18:38 EST 2015


While I support most of Rob's suggestions, I disagree with two of them:

(minor quibble):  I think the sentence that gives an example of why one 
might not add the Datetime stamp to the keywords ("The Datetime may be 
omitted from the comment for some purposes...") is not necessary for the 
FITS Standard.   Adding the Datetime is only a recommendation, so FITS 
writers don't have to have a valid justification if they choose not to 
do so.

More importantly, I object to adding this statement: "If the checksum 
handling described here will not be performed, the two keywords should 
either be deleted or be given null values as described in section 4.4.2.7."
It is perfectly valid for the checksum keywords in a file to have an 
invalid value and there is no obligation for data processing software to 
update or delete the keywords.   Since the checksum keywords are often 
added when a FITS file is published on a public data archive site, the 
user can check the validity of the checksum keywords to determine 
whether his or her copy of that data file has been modified.  If the 
checksums are correct, then the user can have some confidence that this 
copy of the file is the same as the original published file.  If the 
checksum in a particular HDU is not correct, then this tells the user 
that that HDU has been modified (although I recognize that these 
keywords are not intended to guarantee 100% data integrity since they 
can be easily spoofed).
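
For what it is worth, the check described above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions: the
convention sums the HDU as big-endian 32-bit words using ones' complement
arithmetic, and a valid HDU sums to all ones (negative zero). The function
names are made up for this example, and the "file" is a synthetic byte
string sealed with a raw complement word rather than a real FITS HDU with an
ASCII-encoded CHECKSUM keyword.

```python
def ones_complement_sum32(data: bytes) -> int:
    """32-bit ones' complement sum of big-endian 4-byte words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    total = 0
    for i in range(0, len(data), 4):
        total += int.from_bytes(data[i:i + 4], "big")
        total = (total & 0xFFFFFFFF) + (total >> 32)  # end-around carry
    return total


def looks_unmodified(hdu: bytes) -> bool:
    """True if the bytes sum to all ones, i.e. the seal is intact."""
    return ones_complement_sum32(hdu) == 0xFFFFFFFF


# Synthetic stand-in for a published HDU (an assumption, not real FITS):
# a payload followed by the complement of its own sum.
payload = bytes(range(256)) * 4
seal = (~ones_complement_sum32(payload) & 0xFFFFFFFF).to_bytes(4, "big")
published = payload + seal

# A user's copy in which a single bit has been flipped.
tampered = bytearray(published)
tampered[10] ^= 0x01

print(looks_unmodified(published))        # intact copy
print(looks_unmodified(bytes(tampered)))  # modified copy
```

The point of the sketch is that the stale value is what carries the
information: the second call only reports a problem because the seal was
left in place after the modification.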

The checksum keywords are analogous to those annoying holographic 
stickers that merchants use to seal a box.  If you see that the sticker 
is not intact, then you should be concerned that the contents of the box 
might have been tampered with.  Similarly, if a checksum keyword 
has an invalid value, then this provides important information to the 
user that the particular HDU has probably been modified at some point in 
time after the checksum keywords were created.   So removing the 
sticker, or deleting the checksum keywords, defeats the purpose of 
having them there in the first place.

I would agree however, that if a modified file is put back up on a 
public distribution site, then at that point the checksum keywords 
should be updated to contain valid values (as is already stated in the 
last sentence of the proposed section 4.4.2.7).

-Bill

On 12/17/2015 3:00 PM, Rob Seaman wrote:
> Just a few comments:
>
> 1) section 4.4.2.7, para. 5, omit “for analogy”
>
> 2) same section, para. 7, “on that case” should be “in that case”
>
> 3) same paragraph, immediately before this, change “also be written to every other HDU in the file” to “also be written to every other HDU in the file with values appropriate to each HDU in turn” (the point being that the values differ per-HDU, of course).
>
> 4) same section, para. 3, change “Datetime when the value of this keyword record is created or updated is recommended.” to something like: “Datetime when the value of the DATASUM keyword record is created or updated is recommended. The Datetime may be omitted from the comment for some purposes, e.g., if repeating an identical workflow across multiple copies of an archive with the intent of generating identical output files. Note that if DATASUM is updated, so must the corresponding CHECKSUM keyword.”
>
> 5) same section, para. 5, change “Datetime when the value of this keyword record is created or updated is recommended.” to something like: “Datetime when the value of the CHECKSUM keyword record is created or updated is recommended. The Datetime may be omitted from the comment for some purposes, e.g., if repeating an identical workflow across multiple copies of an archive with the intent of generating identical output files. Note that any timestamp on the CHECKSUM keyword should always be no earlier than the timestamp on the corresponding DATASUM keyword. (They may differ if the HDU’s header keywords are later updated, for instance.)”
>
> 6) Suggest that the fact that the DATASUM (and any other keyword editing) must be done before the CHECKSUM be explicitly stated. Perhaps add a paragraph “0” (or renumber):
>
> 	"0. The DATASUM keyword must be updated before the CHECKSUM keyword. In general updating the two checksum keywords should be the final step of any update to either data or header records in a FITS HDU. If the checksum handling described here will not be performed, the two keywords should either be deleted or be given null values as described in section 4.4.2.7."
>
> 7) section J.1, para. currently tagged “1”, change “It is recommended that the current date and time be recorded in the comment field to document when the checksum was computed.” to “It is recommended that the current date and time be recorded in ISO-8601 format in the comment field to document when the checksum was computed. The same timestamp may be used for both DATASUM and CHECKSUM keywords if they are updated at the same time, or the CHECKSUM keyword may be later. Timestamps between these keywords in different HDUs may differ. The timestamps may be omitted from the comments for some purposes, e.g., if repeating an identical workflow across multiple copies of an archive with the intent of generating identical output files.”
>
> 8) Just a reminder to indeed include the references cited in section J.4
>
> 9) I have not reviewed the example C code. Note that this can be checked by zeroing the checksum of ASCII files of any sort, not just FITS header keywords. Just repeat the trick of replacing a string of 16 zeroes, e.g., cut-and-paste, taking care to align the string at a 4 byte boundary.
>
>
>> Please accept my best wishes for the coming holidays and the new year.
> Happy holidays!
>
> Rob
>
>
> _______________________________________________
> fitsbits mailing list
> fitsbits at listmgr.nrao.edu
> https://listmgr.nrao.edu/mailman/listinfo/fitsbits
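
The trick in Rob's point 9 (zeroing the checksum of an arbitrary ASCII file)
can be tried with a short Python sketch. It rests on assumptions: the
encoding below is a simplified rendering of the convention's ASCII encoding
as I understand it, with no byte rotation because the 16-character
placeholder is required to sit on a 4-byte boundary. It has not been checked
against the example C code in the proposal, and the names `encode_aligned`
and `zero_checksum` are hypothetical.

```python
# Punctuation between "0-9", "A-Z" and "a-z", which the encoding avoids.
EXCLUDED = set(range(0x3A, 0x41)) | set(range(0x5B, 0x61))


def ones_complement_sum32(data: bytes) -> int:
    """32-bit ones' complement sum of big-endian 4-byte words."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    total = 0
    for i in range(0, len(data), 4):
        total += int.from_bytes(data[i:i + 4], "big")
        total = (total & 0xFFFFFFFF) + (total >> 32)  # end-around carry
    return total


def encode_aligned(value: int) -> bytes:
    """Spread each byte of `value` over four alphanumeric characters."""
    out = bytearray(16)
    for i, byte in enumerate(value.to_bytes(4, "big")):
        quotient, remainder = divmod(byte, 4)
        ch = [quotient + 0x30] * 4
        ch[0] += remainder
        for j in (0, 2):
            # Shift a pair apart, preserving its sum, until both characters
            # are letters or digits.
            while ch[j] in EXCLUDED or ch[j + 1] in EXCLUDED:
                ch[j] += 1
                ch[j + 1] -= 1
        for j in range(4):
            out[4 * j + i] = ch[j]  # same byte position in 4 successive words
    return bytes(out)


def zero_checksum(text: bytes) -> bytes:
    """Replace an aligned run of 16 ASCII zeroes so the total sums to all ones."""
    pos = text.find(b"0" * 16)
    if pos < 0 or pos % 4 or len(text) % 4:
        raise ValueError("need 16 zeroes on a 4-byte boundary, length % 4 == 0")
    complement = ~ones_complement_sum32(text) & 0xFFFFFFFF
    return text[:pos] + encode_aligned(complement) + text[pos + 16:]


sample = b"Any ASCII text: " + b"0" * 16 + b" and some trailing words."
sample += b" " * (-len(sample) % 4)  # pad to a multiple of 4 bytes

zeroed = zero_checksum(sample)
print(zeroed.decode("ascii"))
print(hex(ones_complement_sum32(zeroed)))
```

The replacement works because the four encoded words add up to the
complement plus the same 4 x 0x30 per byte position that the sixteen "0"
characters contributed, so swapping them in moves the total to all ones.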


