[evlatests] An extraordinary event with 'Set and Remember'

Fri Feb 3 17:32:18 EST 2012

    One of the purposes of the 'flux densities' run is to uncover odd 
failure modes.  One of the oddest ever found is described below.

    I apologize for the length of this email, but the issue is rather 
subtle. 

    Symptoms:

    While editing the C-band data, it was found that nearly all 
antennas/IFs showed a jump in power between 20:41 and 20:46 IAT, on 
January 19.  The jump was very large in many cases -- for some antennas, 
it exceeded a factor of 50!  There is no correlation in the jump 
amplitude between the IFs -- all acted independently.  The new power 
level lasted until the end of the run, at 06:20 the following day.  The 
new power level was not always higher -- for some antenna/IFs, it was 
lower, by up to a factor of 8. 
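
    For reference, these power factors can be put on a dB scale.  The 
snippet below is only a convenience sketch; the function name is 
illustrative and not part of any observatory software.

```python
import math

def power_ratio_to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(ratio)

# The two extremes quoted above: a rise by a factor of 50 and a drop by a factor of 8.
for label, ratio in [("largest increase", 50.0), ("largest decrease", 1.0 / 8.0)]:
    print(f"{label}: factor {ratio:g} = {power_ratio_to_db(ratio):+.1f} dB")
```

    So the extremes correspond to about +17 dB and -9 dB.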

    Diagnosis: 

    To understand what happened, it is necessary to understand the 'set 
and remember' mechanism.  The first time any particular tuning is 
made, the executor is supposed to 'remember' the T304 attenuator values 
which were determined to give the proper power levels to the sampler.  
Although it takes only a few seconds to find these levels, it takes 
about 40 seconds to 'remember' these levels.  If the initial scan at any 
particular frequency is not long enough, the relevant values are not 
remembered, and the system will try to remember the needed levels the 
next time the tuning is encountered.  If no scan is long enough, then 
each scan is set at its own level.  This is not a bad thing -- indeed it 
can be argued (particularly at high frequencies) that this is a good 
thing (i.e., we shouldn't be remembering anything).  Gain changes 
associated with different attenuator settings 
should be (and are !!!) corrected by the switched power monitoring 
system.  The only valid reason for the 'set and remember' mechanism is 
to stabilize the bandpass shape, since this is known to vary with 
attenuator settings.  But I digress ...

    In my file, no observation at any frequency lasted more than 20 
seconds (!), except at L-band.  To enable 'set and remember', the first 
observation was extended to 40 seconds for each band.  It was thought 
that this should be sufficient.  But I hadn't counted on system aborts ...

    There were two restarts, both of which 'forgot' the attenuator 
settings established at the beginning.  This meant that all observations 
were 'on their own', since there was never enough time to remember a 
setting.  This caused no problem at any band except C-band, where 
an extraordinary event occurred.  It seems that about four hours 
after the second restart, the executor somehow established, and 
remembered, the attenuator settings for X-band, then applied them to 
C-band (and not to X-band).   Once 'remembered', the system used these 
erroneous values for the rest of the run. 

    The evidence to support this contention is circumstantial, but 
pretty strong. 

    1)  The largest power offsets found in the C-band data are from 
those antennas which do not have new, wideband, X-band receivers.  These 
are well known to be noticeably underpowered, so the T304 attenuator 
settings are turned down to raise the output power. 
    2) The X-band and C-band observations prior to the first C-band 
observation with the higher power levels are both missing from the 
archive.  The last valid scan before the 'event' is at L-band, after 
which there is a four-minute gap (where the X-band referenced pointing, 
X-band observation, and C-band observation should be), followed by 
a valid S-band observation. 
    3) The referenced pointing solution which should have been determined 
within that gap does not exist. 
   
    There is nothing in any log which calls attention to any event 
during the period. 

    It appears that some confusion in the system took place during this 
four-minute gap: the referenced pointing scan (longer than 40 seconds) 
was somehow mistaken by the system for a C-band observation, which 
caused it to 'remember' the settings established for X-band and apply 
them to all subsequent C-band observations.  We were 
certainly in a 'non-remembering' mode prior to this event (1 dB power 
steps are seen in the PSum and PDif data), and we were certainly in a 
'remembering' mode following it -- no 1 dB steps are seen in any antenna. 

    But none of us (Ken, Keith, or I) has discovered what actually 
caused this to happen.

    Lessons:

    1)  I would use this 'event' to emphasize the importance of getting 
the outputs from all the receivers to be about the same (at the input to 
the T304 modules), to minimize the effect of a problem like this in the 
future.   We should not count on software to get us out of these changes 
in power. 

    2) There is one silver lining in all of this:  The large power jump 
allows me to estimate how well the PDif monitoring system can correct 
for power changes seen by the sampler/requantizer.  The good range is 
rather less than you might expect:
         Applying PDif should -- if the system is perfectly linear -- 
perfectly correct for power changes at the input.  I found that all 
those antenna-IFs whose power went down were corrected to within the 
accuracy of the system (about 1%).  And all antenna-IFs whose power went 
up were corrected equally well, but only if the power jump was less than 
a factor of two (3 dB).  For antenna/IFs with larger jumps, applying 
the PDif correction undercorrected the visibilities -- the apparent 
flux remained too high.  When the power quadrupled, the apparent flux 
was too high by 20%; when the power jumped by an order of magnitude, 
the apparent flux was too high by a factor of two; and when the power 
jumped by a factor of 50, the apparent flux of the sources was too high 
by about a factor of 7. 
   
      The mechanism for the undercorrection is easy to identify.  The 
requantizer is overflowing -- voltages are being supplied which lie 
outside the +/- 8 levels on each side of zero.  These 'overflow' 
voltages are not ignored, but are counted as the minimum or maximum 
value -- always less (in the absolute value sense) than their true 
values.  So the 'digital' power, determined from the state counts, is 
always being underrepresented, and this underestimate gets larger and 
larger as the input power increases. 
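
      The clipping effect can be illustrated with a toy calculation.  
This is a sketch under stated assumptions, not a model of the actual 
requantizer: it assumes zero-mean Gaussian input, 8 equally spaced 
levels on each side of zero at half-integer values, and a nominal input 
rms of 2 level spacings (sigma0).  The function names and sigma0 are 
hypothetical, and the sketch is not expected to reproduce the flux 
errors quoted above, only the trend.

```python
import math

def gaussian_cdf(x, sigma):
    """CDF of a zero-mean Gaussian with rms sigma."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def quantized_power(sigma, n_side=8):
    """Mean square output of a clipping quantizer for Gaussian input.

    Levels sit at +/-0.5, +/-1.5, ... +/-(n_side - 0.5), in units of the
    level spacing.  Inputs beyond the outermost level are counted as the
    outermost level (the 'overflow' case).
    """
    power = 0.0
    for k in range(n_side):
        level = k + 0.5
        if k == n_side - 1:
            # Outermost bin absorbs everything above it (clipping).
            p = 1.0 - gaussian_cdf(float(k), sigma)
        else:
            p = gaussian_cdf(k + 1.0, sigma) - gaussian_cdf(float(k), sigma)
        power += 2.0 * p * level ** 2  # factor 2: positive and negative sides
    return power

def power_ratio_table(sigma0=2.0, factors=(1, 2, 4, 10, 50)):
    """For each true power factor, return (factor, measured factor, fraction seen)."""
    ref = quantized_power(sigma0)
    rows = []
    for f in factors:
        measured = quantized_power(sigma0 * math.sqrt(f)) / ref
        rows.append((f, measured, measured / f))
    return rows

for factor, measured, fraction in power_ratio_table():
    print(f"true x{factor:<3d} measured x{measured:6.2f} fraction {fraction:5.2f}")
```

      In this model the measured 'digital' power can never exceed the 
square of the outermost level, so the fraction of the true power jump 
that is seen falls steadily as the input power rises.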
   
      This is the same mechanism which is the primary (but probably not 
the only) cause of the 'PDif Compression' that we see in the 3-bit system.