[daip] [Mnj] bad Linux kernel

Eric Greisen egreisen at nrao.edu
Thu May 24 12:28:06 EDT 2007


Michael S. Sipior writes:
 > -----BEGIN PGP SIGNED MESSAGE-----
 > Hash: SHA1
 > 
 > Dr Greisen,
 > 
 > Apologies for adding to your undoubtedly extensive mail queue. I'm a
 > postdoc working at JIVE, and my supervisor passed along your email of
 > 23rd May regarding a potential race condition in the Red Hat Linux
 > kernel 2.6.9-55, which is apparently a backport of the vanilla 2.6.20
 > tree. This caused me a great deal of consternation, as I have at least
 > one machine running a variant of this kernel. I've a few questions to
 > help me better understand the nature of this issue and my potential
 > vulnerability to it:
 > 
 > 1) Have you an idea about which system calls are involved in the race
 > condition? Simply creat() and open()?
 > 
 > 2) What filesystem types were in use on systems that exhibited the race
 > condition?
 > 
 > 3) Could you provide pointers to AIPS code that triggers the race? I'm
 > looking at your changes to TASKWT now, but wondered if there were other
 > examples.
 > 

The short answer is: if you do not have a problem, do not go looking
for one.  I was attempting to help people, since every local NRAO
system that was converted to this Red Hat kernel immediately began
exhibiting bad behavior.  These bad behaviors were not limited to bad
error returns from open, although that was the most common condition
and occurred primarily on files that were being heavily and recently
accessed by the process.  The users also reported mysterious "pauses"
followed by abort signals without accompanying AIPS messages.  If your
AIPS works well, don't worry.  I was unable to get through one Y2K
LARGE test until I made this patch to TASKWT.  But the TD file was not
the only location for the trouble.

I guess it would be better for me not to warn people, since I am not a
systems expert and do not know the infinite jargon surrounding this
mess.  One expert claimed that 2.6.20 is current and 2.6.9 obsolete,
while another (more expert) claimed that 2.22 or so is current.  Maybe
I should just let things fail and deal with user cases one at a time.
Of course, some big sites upgrade their OS, and the poor astronomer has
no chance to protest or control after the fact what is done to his
machine.

Maybe everyone should get Macs and control their own destiny.

Eric Greisen




