[evla-sw-discuss] VLA Network outage next Tuesday 9am to 1pm

James Robnett jrobnett at nrao.edu
Thu Jan 31 20:39:22 EST 2013


This is going to a fairly wide audience, but I wanted to describe
the process for next Tuesday, where we'd like help, and what to expect.

We are replacing the central network switch at the VLA.  Each
antenna, all infrastructure and M&C servers, some MIBs, and the
secondary switches for connections to other areas are directly
connected to this switch (100'ish connections).

During the outage all networking into and out of the VLA will
be disrupted.  This includes traffic to Pie Town, so Pie Town will
need to go to dial backup, but it should be unaffected where observing
is concerned.  It also means antenna phones, email, etc. are down.

We intend to start at 9am sharp.  Initially we will shut down the
switch and start disconnecting cables and fibers.  We will then
install the new switch and reverse the process.

The order of re-connections, and therefore of the tests we want
to do, is:

1) AOC/VLA link, then PT link.  I'll test these.
2) Primary infrastructure servers (I'll test) and antennas.
    After the first antenna is connected I will want somebody
    to ensure that MIBS can be communicated with and rebooted.
3) M&C servers.  I'll test these, but that will wait until I know the
    infrastructure servers are stable.  When I think they look
    correct I'll let the software group know via email.
4) Secondary switches, beginning with the correlator room and then
    the DRack room for GPS time servers and other MIBS.  After
    basic server tests (with possible reboots) we need to test
    that CMIBS still work.  We may need to reboot the correlator;
    it's not possible for me to predict.  We do not need to power
    it down.
5) Connect everything else.
6) Check timecode in MIBS and CMIBS.
7) Test observing.
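
As a rough illustration of the staged order above, here is a minimal
Python sketch that checks each group of hosts in turn and stops at the
first group with a failure, so later stages are not tested on top of a
broken earlier one.  It is not the tooling used at the VLA.  The host
names, the choice of a TCP connect to port 22 as the "is it up" test,
and the names STAGES, tcp_reachable and run_stages are all assumptions
made up for the example.

```python
import socket

# Hypothetical stage list: the stage labels follow steps 1-4 above,
# but every host name here is a placeholder, not a real VLA host.
STAGES = [
    ("1 AOC/VLA and PT links", ["aoc-gw.example", "pt-gw.example"]),
    ("2 infrastructure servers and antennas",
     ["infra1.example", "ant01-mib.example"]),
    ("3 M&C servers", ["mc1.example"]),
    ("4 secondary switches", ["corr-sw.example", "drack-sw.example"]),
]


def tcp_reachable(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds in time.

    Port 22 is an assumption; any service port known to be open works.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


def run_stages(stages, check=tcp_reachable):
    """Check stages in order and stop at the first one with a failure.

    Returns (names of stages that fully passed, hosts that failed in
    the stage where checking stopped).
    """
    passed = []
    for name, hosts in stages:
        failed = [h for h in hosts if not check(h)]
        if failed:
            return passed, failed
        passed.append(name)
    return passed, []
```

Calling run_stages(STAGES) after each batch of cables is reconnected
would report how far down the list things look healthy; passing a
different check function swaps in another kind of test.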

It's possible we'll need to reboot some servers, but I'm fairly
optimistic most will be fine; it depends on dependencies and
order.  The most likely candidates for reboot are the Oracle
servers and their dependents (mcmonitor etc).
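
To make the "dependencies and order" point concrete, here is a small
Python sketch that derives a safe reboot order from a dependency map
using the standard library's graphlib (Python 3.9 or later).  Only the
fact that mcmonitor depends on the Oracle servers comes from the text
above; the "dns" entry and the names DEPENDS_ON and reboot_order are
hypothetical, added purely for illustration.

```python
from graphlib import TopologicalSorter

# Maps each service to the set of services it depends on.
# "mcmonitor" -> "oracle" is from the mail; "dns" is a made-up placeholder.
DEPENDS_ON = {
    "oracle": {"dns"},
    "mcmonitor": {"oracle"},
}


def reboot_order(depends_on):
    """Return the services ordered so each comes after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())
```

Rebooting in the order returned by reboot_order(DEPENDS_ON) brings a
service up only after everything it relies on is already back.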

My intent is that by 1pm we are not only done with the work but also
have a pretty good handle on whether things are working, having actually
tested through step 5 in parallel.  Steps 6 and 7 will probably be
after 1pm unless we're better at parallelism than I'd guess.  I'd very
much like to know by 3pm whether there are any general problems with
observing.

To that end I need names and numbers of folks I can contact to
help.  At a guess I'd assume Hichem for MIBS, Kerry for initial
correlator health, specific software folks for M&C and WIDAR servers
(I know who you are), and Ken for general system health and time code,
but anybody's fine for all of those as long as I know who it is.

James
PS: I'll send an email tomorrow and Monday to so/vlaemploy that
makes the effects clearer (antenna phones) while dropping the process bit.


