[evlatests] An extraordinary event with 'Set and Remember'
Rick Perley
rperley at nrao.edu
Fri Feb 3 17:32:18 EST 2012
One of the purposes for the 'flux densities' run is to uncover odd
failure modes. One of the oddest ever found is described below.
I apologize for the length of this email, but the issue is rather
subtle.
Symptoms:
While editing the C-band data, it was found that nearly all
antennas/IFs showed a jump in power between 20:41 and 20:46 IAT, on
January 19. The jump was very large in many cases -- for some antennas,
it exceeded a factor of 50! There is no correlation in the jump
amplitude between the IFs -- all acted independently. The new power
level lasted until the end of the run, at 06:20 the following day. The
new power level was not always higher -- for some antenna/IFs, it was
lower, by up to a factor of 8.
Diagnosis:
To understand what happened, it is necessary to understand the 'set
and remember' mechanism. The first time any particular tuning is
made, the executor is supposed to 'remember' the T304 attenuator values
which were determined to give the proper power levels to the sampler.
Although it takes only a few seconds to find these levels, it takes
about 40 seconds to 'remember' these levels. If the initial scan at any
particular frequency is not long enough, the relevant values are not
remembered, and the system will try to remember the needed levels the next
time the tuning is encountered. If no scan is long enough, then each scan
sets its own level. This is not a bad thing -- indeed it can be argued
(particularly at high frequencies) that this is a good thing (i.e., we
shouldn't be remembering anything). Gain changes associated with
different attenuator settings
should be (and are !!!) corrected by the switched power monitoring
system. The only valid reason for the 'set and remember' mechanism is
to stabilize the bandpass shape, since this is known to vary with
attenuator settings. But I digress ...
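As a rough illustration of the mechanism just described (not the executor's actual code), the intended behavior can be sketched in Python. The class name, method names, string tuning keys, and the hard 40-second threshold are all assumptions made for this sketch; the text only says that remembering takes "about 40 seconds".

```python
REMEMBER_SECONDS = 40.0  # assumed threshold; the text says "about 40 seconds"


class AttenuatorMemory:
    """Toy model of 'set and remember' (all names here are hypothetical)."""

    def __init__(self):
        self.remembered = {}  # tuning -> remembered attenuator setting

    def run_scan(self, tuning, duration_s, level_found):
        """Return the attenuator setting used for this scan."""
        if tuning in self.remembered:
            # A remembered setting is reused for every later scan.
            return self.remembered[tuning]
        # Nothing remembered yet: the level is found afresh for this scan,
        # and is remembered only if the scan is long enough.
        if duration_s >= REMEMBER_SECONDS:
            self.remembered[tuning] = level_found
        return level_found

    def restart(self):
        """A system restart forgets every remembered setting."""
        self.remembered.clear()


memory = AttenuatorMemory()
memory.run_scan("C", 20, 11)  # too short: uses 11, remembers nothing
memory.run_scan("C", 40, 12)  # long enough: uses 12 and remembers it
memory.run_scan("C", 20, 15)  # reuses the remembered 12
```

In this model, a restart followed only by short scans leaves every scan at its own level.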
In my file, no observation at any frequency lasted more than 20
seconds (!), except at L-band. To enable 'set and remember', the first
observation was extended to 40 seconds for each band. It was thought
that this should be sufficient. But I hadn't counted on system aborts ...
There were two restarts, both of which 'forgot' the attenuator
settings established at the beginning. This meant that all observations
were 'on their own', since there was never enough time to remember a
setting. This caused no problem at any band except C-band, where
an extraordinary event occurred. For it seems that about four hours
after the second restart, the executor somehow established, and
remembered, the attenuator settings for X-band, then applied them to
C-band (and not to X-band). Once 'remembered', the system used these
erroneous values for the rest of the run.
The evidence to support this contention is circumstantial, but
pretty strong.
1) The largest power offsets found in the C-band data are from
those antennas which do not have new, wideband, X-band receivers. These
are well known to be noticeably underpowered, so the T304 attenuator
settings are turned down to raise the output power.
2) The X-band and C-band observations immediately prior to the first
C-band observation with the higher power levels are both missing from
the archive. The last valid scan before the 'event' is at L-band, after
which there is a four-minute gap (where the X-band referenced pointing,
X-band observation, and C-band observation should be), following which
is a valid S-band observation.
3) The referenced pointing solution which should have been determined
within that gap does not exist.
There is nothing in any log which calls attention to any event
during the period.
It appears that some confusion in the system took place during this
four-minute gap, during which the referenced pointing scan (lasting more
than 40 seconds) was somehow mistaken by the system for a C-band
observation, which caused it to 'remember' the settings established
for X-band and apply them to all subsequent C-band observations. We were
certainly in a 'non-remembering' mode prior to this event (1 dB power
steps are seen in the PSum and PDif data), and we were certainly in a
'remembering' mode following it -- no 1 dB steps are seen in any antenna.
But we (Ken, Keith, and I) have not discovered what actually caused
this to happen.
Lessons:
1) I would use this 'event' to emphasize the importance of getting
the outputs from all the receivers to be about the same (at the input to
the T304 modules), to minimize the effect of a problem like this in the
future. We should not count on software to get us out of these changes
in power.
2) There is one silver lining in all of this: The large power jump
allows me to estimate how well the PDif monitoring system can correct
for power changes seen by the sampler/requantizer. The good range is
rather less than you might expect:
Applying PDif should -- if the system is perfectly linear --
perfectly correct for power changes at the input. I found that all
those antenna-IFs whose power went down were corrected to within the
accuracy of the system (about 1%). And all antenna-IFs whose power went
up were corrected equally well, but only if the power jump was less than
a factor of two (3 dB). For antenna/IFs with larger jumps, applying
the PDif monitoring undercorrected the visibilities -- the apparent flux
remained too high. When the power quadrupled, the apparent flux was too
high by 20%; when the power jumped by an order of magnitude, the
apparent flux was too high by a factor of two; and when the power jumped
by a factor of 50, the apparent flux of the sources was too high by
about a factor of 7.
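Since the power jumps are quoted both as factors and in dB (a factor of two being 3 dB), the figures above can be put on a common dB scale with a few lines of Python. This is only a convenience sketch: the function name and the list layout are mine, and the numbers are simply those quoted above (20% too high taken as a factor of 1.2).

```python
import math


def to_db(power_ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(power_ratio)


# (power jump factor, factor by which the apparent flux was too high),
# both taken from the figures quoted in the text above.
reported = [(4, 1.2), (10, 2.0), (50, 7.0)]

for jump, excess in reported:
    print(f"jump x{jump:<3} = {to_db(jump):5.1f} dB   "
          f"flux excess x{excess:<4} = {to_db(excess):4.1f} dB")
```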
The mechanism for the undercorrection is easy to identify. The
requantizer is overflowing -- voltages are being supplied which lie
outside the +/- 8 levels on each side of zero. These 'overflow'
voltages are not ignored, but are counted as the minimum or maximum
value -- always less (in the absolute value sense) than their true
values. So the 'digital' power, determined from the state counts, is
always being underrepresented, and this underestimate gets larger and
larger as the input power increases.
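The saturation described above can be illustrated with a small analytic model in Python. This is a sketch under stated assumptions, not a model of the real hardware: it assumes zero-mean Gaussian input, output levels at +/-0.5, +/-1.5, ..., +/-7.5 (8 on each side of zero) with thresholds at the integers, and a starting rms of 2 level spacings (the `sigma0` default, chosen arbitrarily). It is not meant to reproduce the flux errors quoted earlier, which depend on the actual operating point and on how PDif is applied.

```python
import math


def gaussian_cdf(x):
    """Cumulative distribution of a unit-variance, zero-mean Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def digital_power(sigma, levels_per_side=8):
    """Mean square of the requantizer output for Gaussian input of rms sigma.

    Assumed layout: levels at k + 0.5 for k = 0..levels_per_side-1 on each
    side of zero. Anything beyond the top threshold is counted at the top
    level, which is the overflow behavior described in the text.
    """
    power = 0.0
    for k in range(levels_per_side):
        p_lo = gaussian_cdf(k / sigma)
        if k == levels_per_side - 1:
            p_hi = 1.0  # top bin absorbs all overflowing samples
        else:
            p_hi = gaussian_cdf((k + 1) / sigma)
        # Factor of 2 accounts for the mirror-image negative level.
        power += 2.0 * (p_hi - p_lo) * (k + 0.5) ** 2
    return power


def recovered_fraction(power_ratio, sigma0=2.0):
    """Fraction of a true input power jump registered by the state counts."""
    before = digital_power(sigma0)
    after = digital_power(sigma0 * math.sqrt(power_ratio))
    return (after / before) / power_ratio


for ratio in (2, 4, 10, 50):
    print(f"input power x{ratio:<3} -> digital power registers "
          f"{100 * recovered_fraction(ratio):.0f}% of the jump")
```

In this model the digital power can never exceed 7.5 squared, however large the input, so the registered fraction of the jump keeps falling as the input power rises.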
This is the same mechanism which is the primary (but probably not
the only) cause of the 'PDif Compression' that we see in the 3-bit system.