[evla-sw-discuss] Widar board boot problem
James Robnett
jrobnett at nrao.edu
Mon Oct 15 17:27:46 EDT 2012
As some of you know the disk in widar-boot-2, which supports
WIDAR racks 5 through 8, died around noon today. We've temporarily
redirected boards to widar-boot-1 but we're in a rather hybrid
state.
Boards that were already up still point at widar-boot-2. As long
as they don't actually page in any new files they're fine. Normally
they don't, but eventually they will.
A few boards that tried to reboot after the disk failure have since
been rebooted and are running off widar-boot-1. So we have a mix
of boards in racks 5 to 8: those booting off widar-boot-1, and
those blissfully unaware that any I/O they try to widar-boot-2
will fail, though they probably will never need to try.
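To make the split concrete, here is a minimal sketch of how one might sort
boards by the boot server they currently point at. The board names and the
input mapping are hypothetical; nothing here reflects an actual WIDAR tool,
and how you collect the mapping is left open.

```python
from collections import defaultdict

# The server whose disk died; boards still pointing here are at risk.
FAILED_SERVER = "widar-boot-2"


def group_by_boot_server(boards):
    """Group board names by the boot server each one points at.

    `boards` is a hypothetical mapping of board name -> boot server name.
    """
    groups = defaultdict(list)
    for name, server in sorted(boards.items()):
        groups[server].append(name)
    return dict(groups)


def at_risk(boards):
    """Return the sorted names of boards still pointing at the failed server."""
    return sorted(name for name, server in boards.items()
                  if server == FAILED_SERVER)
```

Given such a mapping, at_risk() lists the boards that will fail on their
next I/O to widar-boot-2.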
This is probably ok for a day or so but not a good plan long term.
Here's what we'd like to do.
1) Tomorrow or Wednesday (your pick) we shut down widar-boot-2, swap
in a new disk and sync it to widar-boot-1. That should take about 2 hours.
The boards that are currently on widar-boot-2 should be fine while
we're doing the swap, but they might hang. So we should find a window
when we're either not using them or can afford to lose them.
2) Once that's done we test a board and, if it boots, we reboot all the
boards in racks 5 to 8. The NFS filehandles will have changed, so every
board in those racks needs a reboot, including the ones that look fine.
Otherwise we have weird classes of boards: those booting off
widar-boot-1, those that have been rebooted and are running off
widar-boot-2, and those that haven't been rebooted and are a train
wreck waiting to happen.
3) In addition we'd like to replace the gbit SFP to mchammer. It
caused some issues last week. That probably takes about 5 seconds
and shouldn't even be noticeable except for a few dropped packets.
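
On the filehandle point in step 2: a client holding an old handle normally
gets ESTALE ("Stale file handle") when it next touches the file. A minimal
sketch of a check one could run on a board, assuming Python is available
there and that the path passed in sits on the NFS mount (both are
assumptions, not something we have set up):

```python
import errno
import os


def nfs_handle_is_stale(path):
    """Return True if stat() on `path` fails with ESTALE, False if it succeeds.

    Any other OSError (missing file, permission denied, ...) is re-raised so
    it is not mistaken for a stale handle.
    """
    try:
        os.stat(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return True
        raise
    return False
```

A True result means the board needs the reboot described in step 2. A False
result only says that one path is fine, not that every cached handle is.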
james