[evla-sw-discuss] problems with executor and weather station

Bryan Butler bbutler at nrao.edu
Wed Apr 24 11:52:52 EDT 2013


all,

we've had this problem a couple of times in the past week, so i thought 
i'd write down what's going on so that everybody is in the loop.

	-bryan


last friday, the operators got the following message:

      Executor is getting short of file descriptors.
      It has 929 open files out of a permitted 1024.
      Please restart it at your convenience.

after which michael was called.  he restarted the executor, but then had 
the following executor log message:

SEVERE: Unable to initialize RefpointingAgent  Address already in use

and had to kill the executor again.  he restarted again, and it worked 
properly after that (note - he waited a few minutes before starting 
another script, but i think that is unimportant [but i may be wrong!]).
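
for context: "Address already in use" is the standard text for the EADDRINUSE socket error, which a bind gets when something else still holds the port. the python sketch below only illustrates that error and a retry-with-delay workaround. it is not executor code, it assumes a plain TCP listening socket, and the name bind_with_retry, the attempt count, and the delay are made up for the example. whether the RefpointingAgent failure actually has this cause is not established here.

```python
import errno
import socket
import time


def bind_with_retry(port, attempts=5, delay=0.1, host="127.0.0.1"):
    """Try to bind a TCP listening socket, retrying while the port is busy.

    Returns the bound socket, or None if every attempt hit EADDRINUSE.
    All parameters are illustrative assumptions, not executor settings.
    """
    for _ in range(attempts):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
            sock.listen(1)
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno != errno.EADDRINUSE:
                raise  # some other failure; do not hide it
            time.sleep(delay)  # port still held; wait and try again
    return None


if __name__ == "__main__":
    # hold a port, show that a second bind fails, then release and retry
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    holder.listen(1)
    port = holder.getsockname()[1]
    print("while held:", bind_with_retry(port, attempts=2, delay=0.01))
    holder.close()
    print("after release:", bind_with_retry(port, attempts=2, delay=0.01))
```

if the port were still held by the previous executor process, a second restart succeeding would be consistent with this picture, but that is a guess.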

at this point, the weather station MIB stopped communicating with the 
outside world.  we have seen this problem before, and hichem worked on 
it quite a bit, but we never figured out what was going on, and it 
stopped happening, so we just never closed it off.  in any event, it's 
happening again.  this persisted through the weekend, until monday 
morning when somebody went out and power cycled the MIB.

nearly the same sequence of events occurred early yesterday evening.  
the operator called me around 6:30pm to tell me that the executor had 
died.  (i don't know if it was the same file descriptor problem - i had 
a look in the log and couldn't pinpoint the time when it died.)  he had 
restarted it, but it was having the same 
RefpointingAgent error as above.  i told him to restart it again, and 
that fixed the executor, but killed the weather station again.
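
since it is unclear whether the second failure was the file descriptor problem again, here is a minimal sketch of how the open-descriptor count of a running process could be checked from outside. it assumes a linux host with /proc mounted and permission to read the target process's entries. the function names and the 0.9 threshold are made up for the example (929 of 1024 is about 91%, but the executor's real warning threshold is not stated above).

```python
import os


def fd_usage(pid):
    """Return (open_count, soft_limit) for a process, read from /proc.

    soft_limit is None if the "Max open files" row cannot be parsed.
    Linux-specific; needs read access to /proc/<pid>.
    """
    open_count = len(os.listdir(f"/proc/{pid}/fd"))
    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                fields = line.split()
                # fields: Max, open, files, <soft>, <hard>, <units>
                if fields[3].isdigit():
                    soft_limit = int(fields[3])
                break
    return open_count, soft_limit


def is_running_short(open_count, limit, threshold=0.9):
    """True if open_count is at or above threshold * limit (assumed rule)."""
    return limit is not None and open_count >= threshold * limit


if __name__ == "__main__":
    # demo on this script's own process; substitute the executor's pid
    count, limit = fd_usage(os.getpid())
    print(f"{count} open files out of a permitted {limit}")
    print("short of descriptors:", is_running_short(count, limit))
```

logging this periodically for the executor's pid would show whether the count climbs steadily before a failure, which the log alone did not reveal.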

bb


