[evla-sw-discuss] problems with executor and weather station
Bryan Butler
bbutler at nrao.edu
Wed Apr 24 11:52:52 EDT 2013
all,
we've had this problem a couple of times in the past week, so i thought
i'd write down what's going on so that everybody is in the loop.
-bryan
last friday, the operators got the following message:
    Executor is getting short of file descriptors.
    It has 929 open files out of a permitted 1024.
    Please restart it at your convenience.
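
for reference, the kind of check behind a warning like this can be sketched in a few lines. this is only an illustration in python, not the executor's actual code: the function names and the 90% threshold are assumptions (929 of 1024 is about 91%, so a 90% cutoff would trip on the numbers above), and it relies on /proc/self/fd or /dev/fd being available.

```python
import os
import resource


def fd_usage():
    """Return (open_count, soft_limit) for the current process.

    Hypothetical helper, not taken from the executor. The count
    includes the descriptor briefly opened to list the directory.
    """
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        try:
            open_count = len(os.listdir(fd_dir))
            break
        except OSError:
            continue
    else:
        raise RuntimeError("no per-process fd directory available")
    return open_count, soft_limit


def is_running_short(open_count, limit, threshold=0.9):
    """True when open_count is at or above threshold * limit.

    The 0.9 threshold is an assumed value for illustration only.
    """
    return open_count >= threshold * limit


if __name__ == "__main__":
    count, limit = fd_usage()
    print(f"{count} open files out of a permitted {limit}")
    if is_running_short(count, limit):
        print("getting short of file descriptors")
```

a steadily climbing count between restarts would point at a descriptor leak rather than a limit that is simply too low, but that is a guess until somebody watches the number over time.
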
after which michael was called. he restarted the executor, but then got
the following message in the executor log:
    SEVERE: Unable to initialize RefpointingAgent Address already in use
and had to kill the executor again. he restarted again, and it worked
properly after that (note - he waited a few minutes before starting
another script, but i think that is unimportant [but i may be wrong!]).
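
as background on the "Address already in use" text: that is the standard operating system error (EADDRINUSE) returned when a socket tries to bind to an address and port that another socket still holds. the sketch below reproduces the error in python on the loopback interface. it is only an illustration of the error itself; whether the RefpointingAgent failure comes from a leftover process, a lingering socket, or something else is not known, and the function name is made up.

```python
import errno
import socket


def try_double_bind():
    """Bind two TCP sockets to the same local port.

    Returns the errno of the second bind failure, or None if the
    second bind unexpectedly succeeds. Hypothetical demo only.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # port 0 lets the OS pick a free port for the first socket
        first.bind(("127.0.0.1", 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # the first socket is still listening, so this should fail
            second.bind(("127.0.0.1", port))
            return None
        except OSError as exc:
            return exc.errno
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    err = try_double_bind()
    if err == errno.EADDRINUSE:
        print("second bind failed: Address already in use")
```

if the old executor (or one of its sockets) had not fully gone away when the new one started, the second restart succeeding would be consistent with this, but that remains speculation.
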
at this point, the weather station MIB stopped communicating with the
outside world. we have seen this problem before, and hichem worked on
it quite a bit, but we never figured out what was going on, and it
stopped happening, so we just never closed it off. in any event, it's
happening again. this persisted through the weekend, until monday
morning when somebody went out and power cycled the MIB.
almost exactly the same sequence of events occurred early yesterday
evening. the operator called me around 6:30pm to tell me that the
executor had died. (i don't know if it was the same file descriptor
problem - i had a look in the log and couldn't pinpoint the time where
it died.) he had restarted it, but it was having the same
RefpointingAgent error as above. i told him to restart it again, and
that fixed the executor, but killed the weather station again.
bb