[evlatests] New and bizarre DTS behavior

Mike Revnell mrevnell at nrao.edu
Thu May 3 11:35:49 EDT 2007


The following is a rather lengthy description to justify the request I 
make in the last two paragraphs.

We are observing a new and bizarre behavior in the DTS modules.

This just started happening about 3 weeks ago. Some of the modules it has 
happened to have been working in antennas for a couple of years. We have 
made no changes to the affected parts of the modules for a long time, so 
I have difficulty believing it is bad hardware.

At seemingly random times seemingly random groups of DTS modules go 
off-line. The state we find them in is consistent with them having 
received a "PSCReset" command or an explicit command to turn off the 
formatter. The formatter board is powered down but the digitizer is 
still under power. Power to these two boards is controlled 
independently. If module power is interrupted, all module components are 
turned off. The logic that controls digitizer power runs off the same 
regulator as logic that controls formatter power.

The PSCReset command causes the FPGA that controls formatter power to be 
reconfigured. Thus in this event the formatter gets powered down but the 
digitizer, if it is on, does not.

Last night 6 of them went down in a 10 minute period. Here is the timing 
of events to 1 minute resolution. These times come from timestamps on 
email messages.

10:09 PM 16 A
10:11 PM 21 A
10:12 PM 26 A
10:14 PM 16 C
10:16 PM 16 D
10:19 PM 23 A

As far as we are able to tell, when more than one DTS goes off-line they 
do so in groups, spaced a minute or more apart as in the list above. This 
timing is consistent with some person doing something.
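
As a rough check on that spacing, the gaps between the events listed above 
can be computed with a short script. This is only a sketch: the function 
name is made up for illustration, and the times are the email timestamps 
from the list, so each gap is good to about a minute at best.

```python
from datetime import datetime

# Times and labels copied from the list above (email timestamps,
# 1-minute resolution). Labels are kept exactly as written there.
events = [
    ("10:09 PM", "16 A"),
    ("10:11 PM", "21 A"),
    ("10:12 PM", "26 A"),
    ("10:14 PM", "16 C"),
    ("10:16 PM", "16 D"),
    ("10:19 PM", "23 A"),
]


def gaps_in_minutes(events):
    """Return the whole-minute gaps between consecutive events.

    Hypothetical helper for illustration; assumes all events fall on
    the same evening and are already in time order.
    """
    times = [datetime.strptime(t, "%I:%M %p") for t, _ in events]
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]


gaps = gaps_in_minutes(events)
print("gaps (min):", gaps)
print("total span (min):", sum(gaps))
```

Every gap is at least a minute, with no two events sharing a timestamp, 
which is the pattern the paragraph above describes.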

Given the state we find the modules in, I have been able to think of only 
a couple of scenarios to explain the behavior.

1. Something or someone has started sending PSCReset commands to the 
modules. I can think of no reason to do this because the circumstances 
which normally require it happen only when a module loses then regains 
its time code input. If something were automatically doing it I would 
expect it to happen more frequently and see more links go off-line. I 
would also expect all links to go off-line within the same minute. Since 
doing this would have required writing some new piece of software, at 
the expense of writing something more useful, I don't think this is the 
case.

2. Some change in a seemingly unrelated system or procedure has started 
exploiting a hitherto unused bug in the DTS MIB code or FPGA code. This 
could be, for example, a command to the DTS module that for some reason 
through a sneak logic connection causes the flip-flop that controls the 
formatter power to be reset. Another possibility is that some command to 
the MIB is causing it to assert its chip-select line that causes the 
FPGA to reconfigure.

Again, this is behavior observed for the first time about 3 weeks ago, 
in modules that have had no changes. Some of them have been running 
without problem in antennas for over 2 years.

All that to justify this.

Please review any changes to software and/or procedures which have been 
instituted in the last few weeks. It seems to me that we may be 
exercising a hitherto unused bug somewhere through a sneak path from an 
unrelated change. In this case, I believe coincidence does matter.

If anyone might have an idea of what they or someone else was doing with 
the system around 10:09 last night it could provide a helpful clue.

Thanks.

Mike Revnell



