[evla-sw-discuss] EVLA Lustre problem
James Robnett
jrobnett at nrao.edu
Wed Dec 19 14:04:33 EST 2012
As I mentioned in the meeting today, I'm now quite certain this
only affected the 4 active NGAS nodes, which have Infiniband
connections for the AOC Lustre and were trying to use that
interface to communicate with the EVLA Lustre filesystem.
Those 4 nodes are fixed and should be ok now.
Regardless, it was agreed we should probably avoid rebooting
any of the servers, if possible, until after Christmas or New
Year's, since they've been up for quite a while. We want to
avoid triggering any latent startup issues.
james
On 12/13/2012 11:08 AM, James Robnett wrote:
>
> I think there may be a problem with the EVLA Lustre filesystem
> that will prevent machines that reboot from re-mounting Lustre.
>
> All the existing clients (CBE nodes, mchammer, mctest, most NGAS
> nodes) have been up for a while and don't have an issue.
>
> Please do not reboot any machine that mounts the EVLA Lustre
> filesystem for now. It's rare that they actually need rebooting;
> typically people reboot out of expedience, not necessity. If
> there's a problem with one of these machines let me know so
> I can fix it without rebooting.
>
> James
> PS: This is probably a side effect of the Infiniband work
> last week.
> _______________________________________________
> evla-sw-discuss mailing list
> evla-sw-discuss at listmgr.cv.nrao.edu
> http://listmgr.cv.nrao.edu/mailman/listinfo/evla-sw-discuss