[wfc] Re: Checksum proposal (Greisen)

Arnold Rots arots at head-cfa.harvard.edu
Tue Apr 30 18:01:04 EDT 2002


Following are some comments on the objections that Eric voiced last
week.

  - Arnold

Eric Greisen wrote:
> 
> I was misled by the header From line in the original post: send
> messages simply to wfc at nrao.edu.
> 
> -------------------------------------------------------
> 
> I have read the Checksum proposal 2002-02-27 version and, on first
> reading, am inclined to vote NO.
> 
> 1. Unless the rules have changed, "standards" are to be published in
> A&A.  The present manuscript is not in a form for - or even properly
> written for - that journal.

I am somewhat puzzled by your mentioning this rule.  I am not aware of
its existence and, as far as I know, the Y2K standard, for instance,
was never published that way.  On the other hand, it is of course not
hard to convert the document, but that will not really change its
contents.

> 
> 2. Exactly what my software is supposed to do to implement this is
> unclear - not the encoding, which is described in detail, but the
> principles.  This goes in part back to point 1 - an HDU is what?  Is

The HDU is defined in the glossary of the NOST document.

> each attached extension a separate HDU with separate DATASUM and
> CHECKSUM keywords?  The manuscript does not say - and I ought to be
> one of the more knowledgeable readers of the MS.

The answer is yes, as can be deduced from the MS, in combination with
the definition of HDU.
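
For illustration: every HDU, primary or extension, carries its own
pair of cards.  The values shown here are just the conventional
ASCII-zero placeholders used before the sums are computed, not real
checksums:

  DATASUM = '0'                  / 32-bit 1's complement sum of the data unit
  CHECKSUM= '0000000000000000'   / encoded checksum over the entire HDU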

> 
> 3. Now, how do I implement this?  It appears that I read and translate
> my data to FITS form three times, actually writing it out on the 3rd
> pass
>     a. To get DATASUM
>     b. To get CHECKSUM
>     c. To write the output file.
> Computers are now faster, but this seems still rather a serious
> overhead.  Again - point 1 - I should not have to ask this question.

Given the algorithm, it should not be difficult for a software
professional to implement this standard.

Note that the checksum is not required.  All this standard says is:
if you prefer, you may ignore all checksum issues.  However, if you
want to embed a standard checksum in your FITS files, you may do so by
ensuring that the 32-bit 1's complement sum over each HDU is -0.  If
your header indicates that this checksum is included in the file, the
file's integrity can easily be verified.
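
To illustrate the arithmetic, here is a minimal C sketch of the
accumulation (the function name and the buffer-at-a-time interface are
mine, not part of the proposal):

    #include <stdint.h>
    #include <stddef.h>

    /* Accumulate a 32-bit 1's complement sum over FITS data,
       interpreting the bytes as big-endian 32-bit words.  The
       end-around carry fold is what makes this 1's complement
       addition rather than ordinary modular addition. */
    uint32_t ones_complement_sum(const unsigned char *buf,
                                 size_t nbytes, uint32_t sum)
    {
        for (size_t i = 0; i + 4 <= nbytes; i += 4) {
            uint32_t word = ((uint32_t)buf[i]     << 24) |
                            ((uint32_t)buf[i + 1] << 16) |
                            ((uint32_t)buf[i + 2] <<  8) |
                             (uint32_t)buf[i + 3];
            uint64_t t = (uint64_t)sum + word;
            sum = (uint32_t)(t & 0xFFFFFFFFu) + (uint32_t)(t >> 32);
        }
        return sum;
    }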

There are many ways to implement this.  If one is able to update the
header after the fact, or can keep the entire HDU in memory, the data
are written in a single pass; beyond that there are only one write and
one read of the header, plus two keyword updates:
  a. Write the header
  b. Write the data, while accumulating the DATASUM
  c. Update DATASUM
  d. Read header and develop CHECKSUM
  e. Update CHECKSUM
That's pretty straightforward and not very different from updating
NAXIS2; a sketch of the in-place card update follows below.  For
in-memory HDUs it's even simpler.
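
As an illustration of steps c and e, here is a minimal C sketch of
such an in-place card update (the helper name and the fixed-format
layout are my own shorthand for what any FITS writer already does):

    #include <stdio.h>
    #include <string.h>

    /* Overwrite one 80-character header card at a known byte offset.
       Because cards are fixed-width, the update never changes the
       length of the header - exactly as with NAXIS2. */
    int update_card(FILE *fp, long card_offset,
                    const char *keyword, const char *value)
    {
        char card[80];
        int n;

        memset(card, ' ', sizeof card);
        /* keyword in columns 1-8, "= " in 9-10, value (including
           any quotes) starting in column 11 */
        n = snprintf(card, sizeof card, "%-8.8s= %s", keyword, value);
        if (n > 0 && n < 80)
            card[n] = ' ';    /* blank out snprintf's terminator */

        if (fseek(fp, card_offset, SEEK_SET) != 0)
            return -1;
        return fwrite(card, 1, 80, fp) == 80 ? 0 : -1;
    }

A writer would call this twice per HDU: once for DATASUM after step b,
and once for CHECKSUM after step d.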

The overhead is minimal, the gain (a mechanism for detecting corrupted
data) considerable.

I'm not sure what your reference to point 1 means.

> 
> It appears to me that this is an internal convention for those systems
> that store their data internally in FITS format and are able to update
> single header keywords as they go.

Almost any system needs to be able to do that (or keep an entire HDU
in memory), since the NAXIS2 keyword may need to be written upon
completion, at least for tables.

> 
> The FITS community also needs to look at another issue - whether all
> of the "conventions", even those that are widespread, ought to be
> "standards".  If something is a standard, then most widespread
> software systems ought to support it.  If it is a convention, we have
> a greater leeway.
> 
> I might change my mind if the paper were properly written.  I need to
> be convinced that I need this thing,

Your users may need it; it provides an internal integrity checking
mechanism that is currently, and deplorably, lacking.

I have seen to it that all FITS files in the RXTE and Chandra data
archives are fully checksummed (my rough guess: 20 million files), and
it has been well worth it, allowing us to do a decent job of integrity
checking with minimal effort.

Rob Seaman adds to this:
NOAO has about 3 million images (multi-extension HDUs, that is) stored
with checksums.  When these are extracted into individual classic FITS
files, the checksum is updated - a trivial and inexpensive operation.
If an astronomer receives a FITS tape from NOAO Save-the-bits, the
individual FITS files have checksums that can be independently
verified at the other end.  Far from being an internal convention,
this is being used in precisely the same way as everybody else's
checksums.  It also happens to be used to verify the integrity of the
individual STB tapes.

> that my two's complement computer
> cares about negative zero,

The arithmetic used by the host computer is irrelevant.  The
robustness of the checksum algorithm _is_ relevant, and this one uses
1's complement precisely because that yields the more robust
algorithm.  Implementing it on any kind of machine is fairly simple,
as the sketch below shows.
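
For instance, here is a hedged sketch of the verification pass on a
two's complement host, reusing the ones_complement_sum() sketch from
above.  The "negative zero" your machine is supposed to care about is
simply the all-ones bit pattern; no 1's complement hardware is
involved.

    #include <stdint.h>
    #include <stdio.h>

    /* Read an HDU of nblocks 2880-byte FITS blocks and confirm that
       the embedded CHECKSUM has forced the total 1's complement sum
       to -0, which on a 2's complement host is just 0xFFFFFFFF. */
    int verify_hdu(FILE *fp, long nblocks)
    {
        unsigned char block[2880];
        uint32_t sum = 0;

        for (long i = 0; i < nblocks; i++) {
            if (fread(block, 1, sizeof block, fp) != sizeof block)
                return -1;                  /* short read: truncated */
            sum = ones_complement_sum(block, sizeof block, sum);
        }
        return sum == 0xFFFFFFFFu ? 0 : 1;  /* 0: intact, 1: corrupt */
    }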

> that I might reasonably be able to
> implement it, etc.

If I can do it, you can do it!

> 
> Eric Greisen
> 

--------------------------------------------------------------------------
Arnold H. Rots                                Chandra X-ray Science Center
Smithsonian Astrophysical Observatory                tel:  +1 617 496 7701
60 Garden Street, MS 81                              fax:  +1 617 495 7356
Cambridge, MA 02138                             arots at head-cfa.harvard.edu
USA                                     http://hea-www.harvard.edu/~arots/
--------------------------------------------------------------------------


