[evla-sw-discuss] EVLA Lustre problem
James Robnett
jrobnett at nrao.edu
Thu Dec 13 13:44:33 EST 2012
Quick update. There's some evidence that all machines except the
NGAS nodes are fine.
The NGAS nodes have both InfiniBand (to connect to the AOC Lustre)
and gbit ethernet interfaces. Despite everything looking correct
(explicit mount command, routing etc.) they seem to want to mount
the EVLA Lustre via their IB interface rather than their gbit
interface.
That can't work, so the mount fails.
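For anyone who wants to sanity-check a dual-homed node, here is a rough
sketch of the check I have in mind: given the node's LNet NIDs (as printed
by "lctl list_nids") and the NID of the server it is trying to mount from,
which local NIDs sit on the same LNet network? The addresses and network
names below are made up for illustration and are not our real config, and
the helper names are mine. It only checks network names; it says nothing
about why a node picks the wrong interface.

```python
def parse_nid(nid):
    """Split an LNet NID such as '10.0.0.5@o2ib0' into (address, network)."""
    address, sep, network = nid.partition("@")
    if not sep or not address or not network:
        raise ValueError(f"not a NID: {nid!r}")
    # A bare network type such as 'tcp' means network number 0 ('tcp0').
    if not network[-1].isdigit():
        network += "0"
    return address, network


def usable_local_nids(local_nids, server_nid):
    """Return the local NIDs that are on the same LNet network as the server."""
    _, server_net = parse_nid(server_nid)
    return [nid for nid in local_nids if parse_nid(nid)[1] == server_net]


if __name__ == "__main__":
    # Hypothetical dual-homed node: one IB NID, one ethernet NID.
    local = ["10.10.1.20@o2ib", "192.168.5.20@tcp"]
    # Hypothetical server NID on the ethernet (tcp) network.
    print(usable_local_nids(local, "192.168.5.1@tcp0"))
```

If the list comes back empty, or only contains the IB NID for a server
that is ethernet-only, the mount has no path that can work.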
I'd still prefer that nothing get rebooted, but I have good reason to
believe the risk isn't as severe as I originally stated.
James
On 12/13/2012 11:08 AM, James Robnett wrote:
>
> I think there may be a problem with the EVLA lustre filesystem
> that will prevent machines that reboot from re-mounting Lustre.
>
> All the existing clients (CBE nodes, mchammer, mctest, most NGAS
> nodes) have been up for a while and don't have an issue.
>
> Please do not reboot any machine that mounts the EVLA Lustre
> filesystem for now. It's rare that they actually need rebooting,
> typically people reboot out of expedience not necessity. If
> there's a problem with one of these machines let me know so
> I can fix it without rebooting.
>
> James
> ps: This is probably a side effect from the Infiniband work
> last week.