[evla-sw-discuss] EVLA Lustre IP addressing and telcal
James Robnett
jrobnett at nrao.edu
Thu Sep 9 17:18:08 EDT 2010
On a maintenance day, either this one or the next, we need to shut down
Lustre and the CBE for an hour or so and re-address the Lustre file
system nodes.
A few weeks ago we ran into a problem where multiple telcal instances
(three in this case) saturated the 1 Gbit link between the old and new
correlator rooms. The telcals run in the old room and the Lustre
installation is in the new one.
The obvious solution, and what we planned, was to add dedicated
links from mchammer and mctest to the new correlator room and the Lustre
file system. These links will be needed fairly soon anyway for bandwidth
reasons if each telcal instance is to keep up with observing data rates.
Unfortunately, since Lustre and mchammer/mctest are already on the same
10.80.100.xxx M&C network, there is no way to force traffic over the
dedicated link; it still goes out over their normal interface. The
only solution is to re-address Lustre so it is on a network separate from
the 10.80.100.xxx M&C network. The two machines' secondary dedicated
links can then be put on that new network.
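A simplified sketch of why the dedicated link goes unused: plain destination-based routing chooses the outgoing interface by longest-prefix match on the destination address, so as long as the Lustre servers sit on 10.80.100.xxx, their traffic follows the M&C route. The /24 netmask, the 10.80.200.0/24 "new network", and the interface names eth0/eth1 are assumptions for illustration only, and the model ignores route metrics and policy routing.

```python
import ipaddress

# Hypothetical routing table on mchammer/mctest: (destination network, interface).
# 10.80.100.0/24 is the M&C network (the /24 mask is assumed); 10.80.200.0/24,
# eth0 and eth1 are illustrative placeholders, not the real configuration.
ROUTES = [
    (ipaddress.ip_network("10.80.100.0/24"), "eth0"),  # normal M&C interface
    (ipaddress.ip_network("10.80.200.0/24"), "eth1"),  # hypothetical dedicated link
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),       # default route
]


def egress_interface(dest: str) -> str:
    """Return the interface chosen by longest-prefix match on the destination."""
    addr = ipaddress.ip_address(dest)
    # The default route always matches, so this list is never empty.
    matches = [(net, ifname) for net, ifname in ROUTES if addr in net]
    # max() keeps the first of equally long prefixes, i.e. table order breaks ties.
    best_net, best_if = max(matches, key=lambda m: m[0].prefixlen)
    return best_if
```

With Lustre still on the M&C network, `egress_interface("10.80.100.50")` returns `"eth0"`; once Lustre is moved to the (hypothetical) separate network, `egress_interface("10.80.200.50")` returns `"eth1"`, i.e. the dedicated link.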
At the same time I'd like to upgrade Lustre from 1.8.3 to 1.8.4.
We've been running 1.8.4 here for a month or so without issues.
The point of this email is to figure out when this should be done.
james
PS: It doesn't have to be a maintenance day, just a window of an hour
or more during normal business hours when we're not observing.