[fitsbits] start of Public Comment Period on the CHECKSUM convention

William Pence William.Pence at nasa.gov
Wed Jul 1 19:51:57 EDT 2015


On 6/26/2015 4:16 PM, Tom McGlynn (NASA/GSFC Code 660.1) wrote:
> I'm generally pretty supportive of this proposal.  My only real
> substantive comment is that I think the algorithm should not be
> recommended but mandatory.  If someone else wants to use some other
> checksum, they can simply pick their own keyword[s].

On 6/26/2015 5:32 PM, Rob Seaman wrote:
> On review I think we’re talking apples and oranges.  The proposed text says:
>
> 	"it is recommended that the particular 16-character string generated by the algorithm described in Appendix J be used"
>
> This is referring to the ASCII encoding, not to the underlying 1’s-complement checksum, which is a uniquely specified algorithm.  The ASCII encoding is used only with the CHECKSUM keyword, whereas DATASUM is a string expressing the integer checksum itself.  The effect of encoding CHECKSUM is to force the 1’s-complement checksum for that HDU (and ultimately, for the entire file) to be (negative) zero.  To decode the value, simply accumulate the 1’s-complement checksum on those bytes.  The underlying checksum is 32 bits, or four bytes, but the string value of CHECKSUM is 16 bytes.  Restricting the string to printable ASCII still leaves many possible strings to represent each of the 2^32 checksum values.  It is immaterial which one is selected; the recommended heuristic simply divides each byte of the 32-bit value into four approximately equal parts, one in each 4-character group of the string.
>

I agree.  It is required that the 1's-complement checksum of the HDU be 
equal to -0.  The thing that is only a recommendation is the clever 
algorithm for generating a 16-character ASCII string for the value of 
the CHECKSUM keyword that fulfills that requirement.  It really doesn't 
matter if someone prefers to use one of the very large (but finite) number 
of other strings (or some other technique) that would also cause the HDU 
checksum to equal -0.
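
For concreteness, a minimal Python sketch of the two pieces, under stated assumptions: the HDU is header-only (no data unit, so DATASUM = '0'), words are 32-bit big-endian, and `ones_complement_sum`, `encode_checksum`, and `make_header` are hypothetical helper names, not a library API. `encode_checksum` follows the byte-splitting heuristic described above: each byte of the complemented sum is split into four near-equal characters offset from ASCII '0', pairs are nudged off punctuation while keeping their sum fixed, and the result is rotated one byte to match the word alignment of the value field. Any other string that brings the sum to -0 would be equally valid.

```python
import struct

# ASCII punctuation between the digits and the letters, which the encoding avoids.
EXCLUDE = set(range(0x3A, 0x41)) | set(range(0x5B, 0x61))


def ones_complement_sum(data: bytes, start: int = 0) -> int:
    """32-bit 1's-complement sum of big-endian words (end-around carry)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    total = start
    for (word,) in struct.iter_unpack(">I", data):
        total += word
        total = (total & 0xFFFFFFFF) + (total >> 32)  # fold the carry back in
    return total


def encode_checksum(total: int) -> str:
    """Encode the complement of `total` as 16 printable ASCII characters."""
    value = 0xFFFFFFFF - total
    asc = [0] * 16
    for ii in range(4):
        byte = (value >> (24 - 8 * ii)) & 0xFF
        quotient, remainder = divmod(byte, 4)
        ch = [0x30 + quotient] * 4          # offset from ASCII '0'
        ch[0] += remainder                  # the four chars sum to byte + 4*0x30
        changed = True
        while changed:                      # shift pairs off punctuation,
            changed = False                 # keeping each pair's sum fixed
            for jj in (0, 2):
                if ch[jj] in EXCLUDE or ch[jj + 1] in EXCLUDE:
                    ch[jj] += 1
                    ch[jj + 1] -= 1
                    changed = True
        for jj in range(4):
            asc[4 * jj + ii] = ch[jj]
    # The value starts at byte offset 11 (column 12) of its 80-byte card, and
    # 11 % 4 == 3, so rotate right by one byte to line each char up with its lane.
    return "".join(chr(asc[(k + 15) % 16]) for k in range(16))


def make_header(checksum: str) -> bytes:
    """A toy header-only primary HDU (hypothetical, for illustration)."""
    cards = [
        "SIMPLE  =                    T",
        "BITPIX  =                    8",
        "NAXIS   =                    0",
        f"CHECKSUM= '{checksum}'",
        "DATASUM = '0'",
        "END",
    ]
    block = "".join(c.ljust(80) for c in cards).ljust(2880)
    return block.encode("ascii")


# Write a placeholder of sixteen '0's, sum the HDU, then substitute the encoding.
placeholder = make_header("0" * 16)
encoded = encode_checksum(ones_complement_sum(placeholder))
sealed = make_header(encoded)
print(encoded, hex(ones_complement_sum(sealed)))  # sum should be 0xffffffff (-0)
```

Writing '0000000000000000' first and then substituting the encoded string works because, lane by lane, the new characters add exactly the complemented sum to the header's word total, so the 1's-complement sum lands on all ones.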

> A second issue arises with:
>
> 	"It is recommended that the current date and time be written into the comment field of both keywords to document when the checksum was computed (or more precisely, the time that the checksum computation process was started)."
>
> The problem here is that one might want to reproduce a verbatim file at a later date, and the timestamp makes this impossible, since the CHECKSUM value will differ precisely because of the timestamp.  For instance, one might (one has, in fact) generate a large number of files to ingest into one copy of an archive in one location, and regenerate the same files to ingest into a second copy.  Due to the large amount of data, it is less expensive to duplicate the processing than to copy the data remotely.  The timestamp should be optional.
>

Placing the date and time in the comment of the CHECKSUM keyword does 
provide additional information to the user.  If the checksum is still 
valid, then the user can conclude that the FITS file has not been 
modified since that date; conversely, if the checksum is no longer 
valid, the file must have been modified at some point after that date.  
Including the date and time is only a recommendation, however, so if an 
observatory like NOAO finds this inconvenient for operational reasons, 
it can omit the date and time.
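
A small Python sketch of that trade-off, using a hypothetical `checksum_card` helper that writes the timestamp only when one is passed in. Because the comment is part of the header bytes being summed, a different timestamp changes the header and therefore the CHECKSUM string (the HDU sum is still -0), which is the reproducibility concern raised above; omitting it makes regenerated cards byte-identical. The comment wording "HDU checksum updated" is an assumption for illustration, not text mandated by the convention.

```python
from datetime import datetime
from typing import Optional


def checksum_card(value: str, timestamp: Optional[datetime] = None) -> str:
    """Format an 80-character CHECKSUM card, with an optional timestamp comment.

    The comment wording is illustrative, not mandated by the convention.
    """
    text = f"CHECKSUM= '{value}'"
    if timestamp is not None:
        # The caller passes the time the checksum computation started.
        text += "   / HDU checksum updated " + timestamp.strftime("%Y-%m-%dT%H:%M:%S")
    if len(text) > 80:
        raise ValueError("card exceeds 80 characters")
    return text.ljust(80)


start = datetime(2015, 7, 1, 19, 51, 57)
print(checksum_card("0" * 16, start))
print(checksum_card("0" * 16))  # no timestamp: identical on every run
```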

-Bill


