[evlatests] 'Flux Densities' System Failures
Rick Perley
rperley at nrao.edu
Mon May 6 12:14:19 EDT 2013
The data from the Flux Densities run are split into eight
consecutive files, due to seven failures of one kind or another. I've
trolled through all the data, and have made a log of these events.
Given below are the times of the last valid record for each of the seven
failures, along with the band at which the observation was taken. Also
included is the next scheduled band. Below each is the note given in
the operator's log.
-------------------------------------------------------------------------------------------------------------------
Event 1: Last record: 03:36:35. Band = Ka, Next band = P. Operator's note:
03May 03:37:28 03May 04:04:52 CORRELATOR Other 26.00 712.4
Antenna(s) All (Data: Lost):
Multiple correlator configuration failures. Scans 240 - 274 affected.
Contacted M. Rupen. Michael advises to abort the script, clear the current
configuration, and restart the script.
-------------------------------------------------------------------------------------------------------------------
Event 2: Last record: 05:52:29. Band = X, Next band = C. Operator's note:
03May 05:25:00 03May 05:59:30 CORRELATOR Other 26.00 897.0
Antenna(s) All (Data: Lost):
Multiple scans missing binary data files. No associated correlator
configuration failures. After 10 minutes, the problem has gotten worse,
i.e. rate of missing BDFs is increasing. Contacting M. Rupen. Aborted
script, ran clearcorrelator, and restarted.
----------------------------------------------------------------------------------------------------------------
Event 3: Last record: 16:05:06. Band = X, Next band = C. Operator's note:
03May 15:37:29 03May 16:08:00 CORRELATOR Other 25.23 769.9
Antenna(s) All (Data: Lost):
CM reports correlator configuration error. Executor log indicates a
socket timeout, "protocol or I/O exception to Widar" precedes this.
Michael is out. Called Ken, he investigated and consulted with Barry.
Correlator is rejecting new configurations and remains on the last good
one. BDFs are missing from scan #749 onwards. Aborted script, Ken cleared
correlator configuration, then I restarted the script.
Ken believes that the failed communication between Executor and CM was for
a 'delete configuration', and subsequent reconfigurations and deletes were
rejected because one was still active.
NOTE: The system spent 30 minutes on a single observation before the
script was aborted. This failure is different than all the others.
-------------------------------------------------------------------------------------------------------------------
Event 4: Last record: 20:53:00. Band = P, Next band = L. Operator's note:
03May 20:48:30 03May 20:55:33 CORRELATOR Other 25.23 177.9
Antenna(s) All (Data: Lost):
Executor socket timeout again, CM correlator configuration error and
missing BDFs. I aborted the script, ran clearCorrelator and restarted the
script per Ken's instructions. After restart there was at first a CBE
configuration failure, but subsequent reconfigurations proceeded normally.
Ken checked, no issues found.
-----------------------------------------------------------------------------------------------------------------------
Event 5: Last record: 03:58:44. Band = Q, Next band = Ka. Operator's note:
04May 03:50:55 04May 03:58:08 CORRELATOR Other 26.00 187.6
Antenna(s) All (Data: Lost):
MCAF reported a missing BDF for scans 513, 514, 515, 518, 519, 522, 524,
524, 526
Script aborted, running clearcorrelator and script restarted.
-------------------------------------------------------------------------------------------------------------------------
Event 6: Last record: 04:05:49. Band = Q, Next band = Ka. Operator's notes:
04May 03:58:08 04May 04:09:00 CORRELATOR Other 26.00 282.5
Antenna(s) All (Data: Lost):
MCAF reported a missing BDF for scans 3, 5, 6, 9, 10, 12. No CM correlator
configuration errors. Script aborted. Called Michael Rupen who is
investigating.
04May 04:09:00 04May 04:30:07
Michael was able to reset the CBE. Script was restarted, everything seems
to be running normally so far.
NOTE: This failure, and the next, are likely connected.
-------------------------------------------------------------------------------------------------------
Event 7: Last record: 04:23:08. Band = L, Next band = X.
NOTE: This event, and the last, are probably a single failure. There
were only three successful scans taken between the two listed times
(04:05:49 to 04:23:08).
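
For anyone who wants to tally the lost time, here is a minimal Python sketch that parses the header line of each operator's entry (Events 1 to 6) and sums the outage durations. The log does not explain the two trailing numbers. They look like a weight (perhaps the number of antennas affected) followed by that weight multiplied by the outage length in minutes, but that reading is an assumption and the script only checks whether the figures are consistent with it. The function names, the year 2013 (taken from the message date), and the restored leading "04" on the Event 6 line are likewise assumptions of this sketch. The follow-up line of Event 6 (04:09:00 to 04:30:07) has no trailing numbers and is left out.

```python
from datetime import datetime

# Only month that appears in these entries.
MONTHS = {"May": 5}

# Header lines of the operator's entries for Events 1-6, joined onto one line.
# The leading "04" of the Event 6 line is restored here (assumption).
ENTRIES = [
    "03May 03:37:28 03May 04:04:52 CORRELATOR Other 26.00 712.4",
    "03May 05:25:00 03May 05:59:30 CORRELATOR Other 26.00 897.0",
    "03May 15:37:29 03May 16:08:00 CORRELATOR Other 25.23 769.9",
    "03May 20:48:30 03May 20:55:33 CORRELATOR Other 25.23 177.9",
    "04May 03:50:55 04May 03:58:08 CORRELATOR Other 26.00 187.6",
    "04May 03:58:08 04May 04:09:00 CORRELATOR Other 26.00 282.5",
]


def parse_stamp(day_month, clock, year=2013):
    """Turn tokens such as '03May' and '03:37:28' into a datetime."""
    hour, minute, second = (int(part) for part in clock.split(":"))
    day = int(day_month[:2])
    month = MONTHS[day_month[2:]]
    return datetime(year, month, day, hour, minute, second)


def parse_entry(line):
    """Return (outage_seconds, weight, logged_total) for one header line."""
    (start_day, start_clock, end_day, end_clock,
     _subsystem, _category, weight, total) = line.split()
    start = parse_stamp(start_day, start_clock)
    end = parse_stamp(end_day, end_clock)
    seconds = int((end - start).total_seconds())
    return seconds, float(weight), float(total)


def summarize(entries):
    """Parse every entry; return the parsed rows and the summed outage in seconds."""
    rows = [parse_entry(line) for line in entries]
    return rows, sum(row[0] for row in rows)


if __name__ == "__main__":
    rows, total_seconds = summarize(ENTRIES)
    for number, (seconds, weight, logged) in enumerate(rows, start=1):
        minutes = seconds / 60.0
        # Assumed relation: logged total = weight x outage length in minutes.
        product = minutes * weight
        print(f"Event {number}: {minutes:6.2f} min x {weight:5.2f} "
              f"= {product:6.1f} (logged {logged})")
    print(f"Total logged outage: {total_seconds / 60.0:.2f} min")
```

Under that assumed reading, each logged total agrees with the computed product to within the rounding of the log, and the six entries add up to just under two hours of logged outage.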