[evlatests] 'Flux Densities' System Failures

Rick Perley rperley at nrao.edu
Mon May 6 12:14:19 EDT 2013


    The data from the Flux Densities run are split into eight 
consecutive files, due to seven failures of one kind or another.  I've 
trolled through all the data, and have made a log of these events.  
Given below are the times of the last valid record for each of the seven 
failures, along with the band at which the observation was taken.  Also 
included is the next scheduled band.  Below each is the note given in 
the operator's log. 

-------------------------------------------------------------------------------------------------------------------
Event 1:  Last record:  03:36:35.  Band = Ka, Next Band = P.  Operator's note below:

03May 03:37:28   03May 04:04:52   CORRELATOR        Other         26.00  712.4
Antenna(s) All (Data: Lost):
Multiple correlator configuration failures.  Scans 240 - 274 affected.  Contacted
M. Rupen. Michael advises to abort the script, clear the current configuration,
and restart the script.
-------------------------------------------------------------------------------------------------------------------
Event 2:  Last record:  05:52:29.  Band = X, Next band = C.  Operator's note:

03May 05:25:00   03May 05:59:30   CORRELATOR        Other         26.00  897.0
Antenna(s) All (Data: Lost):
Multiple scans missing binary data files.  No associated correlator configuration
failures.  After 10 minutes, the problem has gotten worse, i.e. rate of missing
BDFs is increasing.  Contacting M. Rupen.  Aborted script, ran clearcorrelator,
and restarted.
----------------------------------------------------------------------------------------------------------------
Event 3:  Last record:  16:05:06.  Band = X.  Next band = C.  Operator's note:

03May 15:37:29   03May 16:08:00   CORRELATOR        Other         25.23  769.9
Antenna(s) All (Data: Lost):
CM reports correlator configuration error.  Executor log indicates a socket timeout,
"protocol or I/O exception to Widar" precedes this.  Michael is out.  Called Ken,
he investigated and consulted with Barry.  Correlator is rejecting new configurations
and remains on the last good one.  BDFs are missing from scan #749 onwards.
Aborted script, Ken cleared correlator configuration, then I restarted the script.
Ken believes that the failed communication between Executor and CM was for
a 'delete configuration', and subsequent reconfigurations and deletes were
rejected because one was still active.

NOTE:  The system spent 30 minutes on a single observation before the 
script was aborted.  This failure is different from all the others. 
-------------------------------------------------------------------------------------------------------------------
Event 4:  Last record:  20:53:00  Band = P.  Next band = L.  Operator's note:

03May 20:48:30   03May 20:55:33   CORRELATOR        Other         25.23  177.9
Antenna(s) All (Data: Lost):
Executor socket timeout again, CM correlator configuration error and missing BDFs.
I aborted the script, ran clearCorrelator and restarted the script per Ken's
instructions.  After restart there was at first a CBE configuration failure, but
subsequent reconfigurations proceeded normally.  Ken checked, no issues found.
-----------------------------------------------------------------------------------------------------------------------
Event 5:  Last record:  03:58:44  Band = Q  Next band = Ka.  Operator's note:

04May 03:50:55   04May 03:58:08   CORRELATOR        Other         26.00  187.6
Antenna(s) All (Data: Lost):
MCAF reported a missing BDF for scans 513, 514, 515, 518, 519, 522, 524, 524, 526
Script aborted, running clearcorrelator and script restarted.
-------------------------------------------------------------------------------------------------------------------------
Event 6:  Last record:  04:05:49  Band = Q, Next band = Ka.  Operator's notes:

04May 03:58:08   04May 04:09:00   CORRELATOR        Other         26.00  282.5
Antenna(s) All (Data: Lost):
MCAF reported a missing BDF for scans 3, 5, 6, 9, 10, 12. No CM correlator
configuration errors. Script aborted. Called Michael Rupen who is investigating.

04May 04:09:00  04May 04:30:07
Michael was able to reset the CBE. Script was restarted, everything seems to
be running normally so far.

NOTE:  This failure, and the next, are likely connected. 
-------------------------------------------------------------------------------------------------------
Event 7:  Last record:  04:23:08  Band = L, next band = X 

NOTE:  This event, and the last, are probably a single failure.  There 
were only three successful scans taken between the two listed times 
(04:05:49 to 04:23:08). 
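
The start and end times in the operator's log headers give the length of each 
logged outage directly.  Below is a minimal Python sketch that tabulates them.  
It also multiplies each outage length by the fifth column of the header (26.00 
or 25.23) and compares the product with the last column.  The products agree 
to the printed precision, which suggests, but does not confirm, that the last 
column is the outage in minutes scaled by the fifth column.  That reading, the 
function names, and the table layout are assumptions of this sketch, not 
anything stated in the log.  The Event 6 start time is taken to be 
04May 03:58:08, the end time of the Event 5 entry.

```python
# Header rows from the operator's log entries above:
# (event, start, end, fifth column, last column).
# Assumption: the Event 6 start time is read as "04May 03:58:08".
LOG_ROWS = [
    (1, "03May 03:37:28", "03May 04:04:52", 26.00, 712.4),
    (2, "03May 05:25:00", "03May 05:59:30", 26.00, 897.0),
    (3, "03May 15:37:29", "03May 16:08:00", 25.23, 769.9),
    (4, "03May 20:48:30", "03May 20:55:33", 25.23, 177.9),
    (5, "04May 03:50:55", "04May 03:58:08", 26.00, 187.6),
    (6, "04May 03:58:08", "04May 04:09:00", 26.00, 282.5),
]


def to_seconds(stamp):
    """Convert a 'DDMon HH:MM:SS' stamp to seconds since the start of the month.

    Assumption: all stamps fall in the same month, as they do in this log,
    so the month name is ignored.
    """
    date, clock = stamp.split()
    day = int(date[:2])
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((day * 24 + hours) * 60 + minutes) * 60 + seconds


def outage_minutes(start, end):
    """Length of one logged outage in minutes."""
    return (to_seconds(end) - to_seconds(start)) / 60.0


def summarize(rows):
    """Return (event, minutes, minutes * fifth column, logged last column)."""
    summary = []
    for event, start, end, weight, logged in rows:
        minutes = outage_minutes(start, end)
        summary.append((event, minutes, minutes * weight, logged))
    return summary


if __name__ == "__main__":
    table = summarize(LOG_ROWS)
    for event, minutes, product, logged in table:
        print(f"Event {event}: {minutes:6.2f} min, "
              f"product {product:6.1f}, logged {logged:6.1f}")
    total = sum(minutes for _, minutes, _, _ in table)
    print(f"Total logged outage: {total:.2f} min")
```

Event 7 has no operator's log entry, so it does not appear in the table.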








