[fitsbits] start of Public Comment Period on the CHECKSUM convention
Rob Seaman
seaman at noao.edu
Fri Jun 26 17:32:40 EDT 2015
On review I think we’re talking apples and oranges. The proposed text says:
“it is recommended that the particular 16-character string generated by the algorithm described in Appendix J be used”
This refers to the ASCII-encoding, not to the underlying 1’s-complement checksum, which is a uniquely specified algorithm. The ASCII-encoding is used only with the CHECKSUM keyword, whereas DATASUM is a string expressing the integer checksum itself. The effect of encoding CHECKSUM is to force the 1’s-complement checksum for that HDU (and ultimately, for the entire file) to be (negative) zero. To decode the value, simply accumulate the 1’s-complement checksum on those bytes. The underlying checksum is 32 bits, or four bytes, but the string value of CHECKSUM is 16 bytes. Restricting the string to printable ASCII still leaves many possible strings to represent each of the 2^32 checksum values. It is immaterial which version is selected; the recommended heuristic simply divides each of the four bytes of the 32-bit value into four approximately equal parts.
Note that the ASCII-encoding might conceivably be used with a different underlying algorithm such as a CRC. The same notion of decoding the string using the hash algorithm itself would apply. (Though the issue of the generating polynomial would also prove entertaining for CRCs.)
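For concreteness, here is a minimal Python sketch of the two pieces discussed above: the 32-bit 1’s-complement accumulation, and an ASCII-encoding that follows my reading of the Appendix J procedure (split each byte into four roughly equal parts, offset by ASCII '0', nudge pairs away from punctuation, rotate by one byte to match the keyword's alignment in the card). The function names are hypothetical, the input is assumed to be a whole number of 4-byte words with the CHECKSUM card already holding sixteen '0' characters, and this is not the reference implementation.

```python
# Punctuation between the ASCII digits and letters, which the encoding avoids.
EXCLUDE = set(range(0x3A, 0x41)) | set(range(0x5B, 0x61))


def ones_complement_sum32(data: bytes) -> int:
    """Accumulate big-endian 32-bit words with end-around carry."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    total = sum(int.from_bytes(data[i:i + 4], "big")
                for i in range(0, len(data), 4))
    while total >> 32:  # fold the carries back into the low 32 bits
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


def encode_checksum(value: int) -> str:
    """Spread a 32-bit value over 16 alphanumeric characters."""
    asc = [0] * 16
    for i in range(4):
        byte = (value >> (24 - 8 * i)) & 0xFF
        quotient, remainder = divmod(byte, 4)
        ch = [quotient + 0x30] * 4  # four roughly equal parts, offset by '0'
        ch[0] += remainder
        changed = True
        while changed:  # shift pairs in opposite directions; the sum is kept
            changed = False
            for j in (0, 2):
                if ch[j] in EXCLUDE or ch[j + 1] in EXCLUDE:
                    ch[j] += 1
                    ch[j + 1] -= 1
                    changed = True
        for j in range(4):
            asc[4 * j + i] = ch[j]  # part j of byte i lands in word j
    # The string starts at byte 11 of the card (11 % 4 == 3), so rotate right.
    return bytes(asc[-1:] + asc[:-1]).decode("ascii")


def stamp_checksum(hdu: bytes) -> bytes:
    """Replace the all-zero CHECKSUM value with the encoded complement."""
    pos = hdu.find(b"CHECKSUM= '" + b"0" * 16 + b"'")
    if pos < 0 or pos % 80:
        raise ValueError("no card-aligned zeroed CHECKSUM keyword found")
    complement = ~ones_complement_sum32(hdu) & 0xFFFFFFFF
    encoded = encode_checksum(complement).encode("ascii")
    return hdu[:pos + 11] + encoded + hdu[pos + 27:]
```

Running ones_complement_sum32 over the output of stamp_checksum should give 0xFFFFFFFF, i.e. negative zero, which is the sense in which the hash algorithm itself decodes the string. Any other 16-character string whose four words sum to the same value would verify equally well.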
A second issue arises with:
“It is recommended that the current date and time be written into the comment field of both keywords to document when the checksum was computed (or more precisely, the time that the checksum computation process was started).”
The problem here is that one might want to reproduce a file verbatim at a later date, and the timestamp makes this impossible: the checksum will differ precisely because of the timestamp. For instance, one might (one has, in fact) generate a large number of files to ingest into one copy of an archive in one location, and regenerate the same files to ingest into a second copy. Given the large amount of data, it is less expensive to duplicate the processing than to copy the data remotely. The timestamp should be optional.
Rob
—
> On Jun 26, 2015, at 1:56 PM, Peter Weilbacher <pweilbacher at aip.de> wrote:
>
> On Tue, 23 Jun 2015, Lucio Chiappetti wrote:
>
>> This convention defines the CHECKSUM and DATASUM keywords that may be used
>> to verify the integrity of an HDU
>
> I have dealt with files containing DATASUM and CHECKSUM for so long that
> I had actually thought that they were already part of the standard.
> fitsverify knows to handle them, too...
>
> But I agree with Tom that the implementation should be made mandatory,
> for best possible interoperability.
>
> Peter.
> --
> Dr Peter Weilbacher http://www.aip.de/People/PWeilbacher
> Phone +49 331 74 99-667 encryption key ID 7D6B4AA0
> ------------------------------------------------------------------------
> Leibniz-Institut für Astrophysik Potsdam (AIP)
> An der Sternwarte 16, D-14482 Potsdam
>
> Vorstand: Prof. Dr. Matthias Steinmetz
> Stiftung bürgerlichen Rechts, Stiftungsverz. Brandenburg: 26 742-00/7026
>
> _______________________________________________
> fitsbits mailing list
> fitsbits at listmgr.nrao.edu
> https://listmgr.nrao.edu/mailman/listinfo/fitsbits