[evlatests] New and bizarre DTS behavior
Mike Revnell
mrevnell at nrao.edu
Thu May 3 11:35:49 EDT 2007
The following is a rather lengthy description to justify the request I
make in the last two paragraphs.
We are observing a new and bizarre behavior in the DTS modules.
This just started happening about 3 weeks ago. Some of the modules it has
happened to have been working in antennas for a couple of years. We have
made no changes to the affected parts of the modules for a long time so
I have difficulty believing it is bad hardware.
At seemingly random times, seemingly random groups of DTS modules go
off-line. The state we find them in is consistent with them having
received a "PSCReset" command or an explicit command to turn off the
formatter. The formatter board is powered down but the digitizer is
still under power. Power to these two boards is controlled
independently. If module power is interrupted, all module components are
turned off. The logic that controls digitizer power runs off the same
regulator as logic that controls formatter power.
The PSCReset command causes the FPGA that controls formatter power to be
reconfigured. Thus in this event the formatter gets powered down but the
digitizer, if it is on, does not.
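To make the distinction concrete, here is a minimal Python sketch of the two cases described above. It is only an illustration of the reasoning, not the MIB or FPGA logic; the names ModuleState, after_psc_reset and after_power_interruption are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModuleState:
    # Hypothetical model: only the two independently powered boards.
    formatter_on: bool
    digitizer_on: bool


def after_psc_reset(state: ModuleState) -> ModuleState:
    # Per the description: the formatter is powered down,
    # the digitizer keeps whatever state it had.
    return ModuleState(formatter_on=False, digitizer_on=state.digitizer_on)


def after_power_interruption(state: ModuleState) -> ModuleState:
    # Per the description: all module components are turned off.
    return ModuleState(formatter_on=False, digitizer_on=False)


running = ModuleState(formatter_on=True, digitizer_on=True)
observed = ModuleState(formatter_on=False, digitizer_on=True)

# The state we find the modules in matches a reset, not a power loss.
print(after_psc_reset(running) == observed)
print(after_power_interruption(running) == observed)
```

With both boards on beforehand, only the reset case reproduces the state we find, which is why a simple interruption of module power does not fit.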
Last night 6 of them went down in a 10 minute period. Here is the timing
of events to 1 minute resolution. These times come from timestamps on
email messages.
10:09 PM 16 A
10:11 PM 21 A
10:12 PM 26 A
10:14 PM 16 C
10:16 PM 16 D
10:19 PM 23 A
As far as we are able to tell, when more than one DTS goes off-line they
do so in groups with time resolutions similar to the above. This timing
is consistent with some person doing something.
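For illustration, the spacing between the events listed above can be tabulated with a short Python sketch. The times and labels are copied from the list; the variable and function names are made up for the example, and the 1 minute resolution of the email timestamps limits how much can be read into the result.

```python
from datetime import datetime

# (time, label) pairs copied from the list above.
events = [
    ("10:09 PM", "16 A"),
    ("10:11 PM", "21 A"),
    ("10:12 PM", "26 A"),
    ("10:14 PM", "16 C"),
    ("10:16 PM", "16 D"),
    ("10:19 PM", "23 A"),
]


def gaps_in_minutes(evts):
    # Parse each 12-hour clock time and difference consecutive entries.
    times = [datetime.strptime(t, "%I:%M %p") for t, _ in evts]
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]


gaps = gaps_in_minutes(events)
span = sum(gaps)
print(gaps)  # [2, 1, 2, 2, 3]
print(span)  # 10
```

Gaps of 1 to 3 minutes, rather than everything within the same minute, are what I mean by timing that looks like a person working through the modules one at a time.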
Given the state we find the modules in I have been able to think of only
a couple scenarios to explain the behavior.
1. Something or someone has started sending PSCReset commands to the
modules. I can think of no reason to do this because the circumstances
which normally require it happen only when a module loses then regains
its time code input. If something were automatically doing it I would
expect it to happen more frequently and see more links go off-line. I
would also expect all links to go off-line within the same minute. Since
doing this would have required writing some new piece of software, at
the expense of writing something more useful, I don't think this is the
case.
2. Some change in a seemingly unrelated system or procedure has started
exploiting a hitherto unused bug in the DTS MIB code or FPGA code. This
could be, for example, a command to the DTS module that for some reason
through a sneak logic connection causes the flip-flop that controls the
formatter power to be reset. Another possibility is that some command to
the MIB is causing it to assert its chip-select line that causes the
FPGA to reconfigure.
Again, this is behavior observed for the first time about 3 weeks ago,
in modules that have had no changes. Some of them have been running,
without problem, in antennas for over 2 years.
All that to justify this.
Please review any changes to software and/or procedures which have been
instituted in the last few weeks. It seems to me that we may be
exercising a hitherto unused bug somewhere through a sneak path from an
unrelated change. In this case, I believe coincidence does matter.
If anyone might have an idea of what they or someone else was doing with
the system around 10:09 last night it could provide a helpful clue.
Thanks.
Mike Revnell