[bananas] Redhat 6.2 and Aips
James Robnett
jrobnett at aoc.nrao.edu
Wed Jun 7 12:28:26 EDT 2000
[Moderator's note: this is relevant to AIPS users with Linux systems,
especially if you purchased one from Dell with Linux pre-installed.
- Pat Murphy]
We have discovered a problem with stock Redhat 6.2 systems.
According to Redhat:
http://www.redhat.com/support/errata/RHBA-2000013-01.html
The default kernel that ships with Redhat 6.2 (2.2.14-5), as well as
kernels 2.2.13-0.13 and 2.2.14-6.1, suffers from a context switch bug.
"Red Hat, Intel and Dell have uncovered a problem with the Red Hat Linux
6.2 for the x86 (Intel) processor. This problem has been duplicated and
confirmed in our lab, though we have had no reports from customers at
large. This problem affects all OEM system manufacturers shipping Red Hat
Linux 6.2 preinstalled on x86 processor-based systems.
Under extremely heavy load, data corruption can occur if a page fault
occurs during a task switch between kernel space and user space on x86
platforms. This problem may affect customer systems, particularly systems
that are swapping to disk very heavily (thrashing), or are otherwise very
heavily loaded. Lightly loaded servers are unlikely to be affected. This
issue affects all x86 compatible systems running the kernels listed below."
[Mod: be careful to check the README if you get the patches]
We have seen AIPS tasks, in particular IMAGR, as well as the DDT and Y2K
benchmarks, exhibit this bug on P-III 600 MHz systems with 128 MB of
memory. The bug was not apparent on 466 MHz Celeron systems with 128 MB
of memory.
[Mod: the "Y2K" benchmark is a new, experimental version of the classic
DDT with an even larger (but not huge) data set; it's not yet ready for
release. It takes about an hour on a reasonable PC.]
If you haven't already (I think CV has, on at least artemis), you should
upgrade your kernels.
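
For anyone unsure whether a machine is running one of the kernels named
above, here is a rough sketch in Python. It compares a release string
against the three versions from the Red Hat notice. The names
AFFECTED_KERNELS, KNOWN_SUFFIXES, normalize and is_affected are made up
for this example, and the suffix list is an assumption. The string that
`uname -r` reports on a given system may not match these forms exactly,
so treat a "not listed" answer with caution and check the errata page.

```python
import platform

# Kernel releases named in the Red Hat notice quoted above.
AFFECTED_KERNELS = {"2.2.13-0.13", "2.2.14-5", "2.2.14-6.1"}

# Assumption: build-variant suffixes that may be appended to the release
# string (for example "2.2.14-5smp"). This list is illustrative only.
KNOWN_SUFFIXES = ("smp", "enterprise", "BOOT")


def normalize(release):
    """Strip surrounding whitespace and one known build suffix, if present."""
    release = release.strip()
    for suffix in KNOWN_SUFFIXES:
        if release.endswith(suffix):
            return release[:-len(suffix)]
    return release


def is_affected(release):
    """Return True if the release string matches a listed kernel exactly."""
    return normalize(release) in AFFECTED_KERNELS


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r` on Linux
    if is_affected(running):
        print("Kernel %s is on the affected list; upgrade." % running)
    else:
        print("Kernel %s is not on the list quoted above." % running)
```

The match is by exact string on purpose: guessing at "close enough"
versions could report a patched kernel as broken, or the reverse.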
AIPS and AIPS++ maintainers should probably be on the lookout for problem
reports from non-NRAO clients. Note that this is a kernel issue: any
software package can exhibit this behavior, not just AIPS or AIPS++.
I'll leave it up to Eric Greisen to explain in more detail the symptoms
seen under AIPS since I'd probably just make a fool of myself if I tried.
The symptoms will be application specific. Basically, be on the lookout
for results that vary widely from one run to the next.
James Robnett
[Mod: Eric tells me the symptoms were, for the most part, NaNs (the IEEE
floating point format's "not a number" magic value) where they did not
belong. Running the old DDT repeatedly would sometimes reveal a whole
slew of UVSRT differences, and each run might be slightly different.]
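
The two symptoms described above, stray NaNs and results that change
between repeated runs on the same input, can be checked for in any
numeric output. The following is a minimal Python sketch, not the
comparison the DDT itself performs. It assumes the output of each run
has already been read into a list of floats of equal length; the
function names and the tolerance are illustrative choices.

```python
import math


def count_nans(values):
    """Count the NaN entries in a sequence of floats."""
    return sum(1 for v in values if math.isnan(v))


def differing_indices(runs, rel_tol=1e-6):
    """Return the indices where any later run disagrees with the first run.

    `runs` is a list of equal-length lists of floats, one per repeated run.
    A NaN on either side counts as a disagreement.
    """
    reference = runs[0]
    bad = []
    for i, ref in enumerate(reference):
        for run in runs[1:]:
            value = run[i]
            if (math.isnan(ref) or math.isnan(value)
                    or not math.isclose(ref, value, rel_tol=rel_tol)):
                bad.append(i)
                break  # one disagreement is enough for this index
    return bad


if __name__ == "__main__":
    run_a = [1.0, 2.0, 3.0]
    run_b = [1.0, float("nan"), 3.5]
    print("NaNs in second run:", count_nans(run_b))
    print("Indices that differ:", differing_indices([run_a, run_b]))
```

On a healthy system, repeated runs of a deterministic task on the same
data should give an empty list. A list that changes from one pair of
runs to the next is the kind of behavior to report.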