[Difx-users] Re: Infiniband device problems with Slurm/mpifxcorr

Reynolds, Cormac (S&A, Kensington WA) Cormac.Reynolds at csiro.au
Mon Aug 16 23:59:07 EDT 2021


hi Joe,

I don't think our experience with IB is likely to be useful - we had some
issues with MPI assuming IB was available even though we had only ethernet,
and so included instructions to MPI to explicitly disable IB. startdifx used
to do something similar by default, but I don't believe it does any more. This
is kind of the opposite of the problem you are having.
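
The usual Open MPI way of doing that kind of exclusion is something along
these lines (a sketch from memory, not necessarily the exact settings we
used):

mpirun --mca btl ^openib -np 7 mpifxcorr ${EXPER}.input

or, equivalently, setting the environment variable

export OMPI_MCA_btl=^openib

before launching, which tells Open MPI to skip the openib transport
entirely.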

There is some sketchy documentation of the old behaviour here:

https://www.atnf.csiro.au/vlbi/dokuwiki/doku.php/difx/difxmpi

but it is old and probably not that relevant. The discussion might give some
clues to things you could try though.

cheers,
Cormac.





On Tue, 2021-08-17 at 08:33 +1000, Adam Deller via Difx-users wrote:
> Just to add to Walter's excellent description: while I've thankfully not
> had to deal much with MPI's infiniband pickiness myself, I think that
> Cormac Reynolds and/or Helge Rottman have overcome similar problems on
> their compute clusters in the past, so they might be able to chime in if
> Walter's suggestions don't work.
> 
> Googling 'infiniband slurm mpi' yields a lot of hits, so I doubt you are
> the only person that's come across similar issues!
> 
> Cheers,
> Adam
> 
> On Tue, 17 Aug 2021 at 08:21, Walter Brisken via Difx-users <
> difx-users at listmgr.nrao.edu> wrote:
> 
> > 
> > Hi Joe,
> > 
> > I've not used slurm or any process management layer above MPI, so I
> > can't speak directly to your problem.
> > 
> > I can give you one bit of advice on working out MPI problems though.
> > 
> > mpifxcorr is a pretty complicated program, and that complexity can get
> > in the way when sorting out MPI issues.  There is a program packaged
> > with mpifxcorr called "mpispeed".  This program requires an even number
> > of processes to be started at the same time.  All it does is tell the
> > odd numbered processes to stream data as fast as possible to the even
> > numbered processes (1 goes to 2, 3 goes to 4, ...).  It would exercise
> > all of the machinery required to start the MPI processes without the
> > dependencies on difx filesets, ...
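> > 
> > For example, something along these lines should exercise just the MPI
> > layer (a sketch only; check mpispeed's own usage message for any
> > optional arguments, and substitute your real node names):
> > 
> > salloc -N 2 mpirun -np 2 mpispeed
> > 
> > or, taking slurm out of the picture,
> > 
> > mpirun -np 2 -H node01,node02 mpispeed
> > 
> > If that fails in the same way mpifxcorr does, the problem is in the
> > MPI/slurm/network layer rather than in DiFX itself.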
> > 
> > On the MPI issue itself, a few things to try:
> > 
> > 1. If some of your machines have multiple ethernet ports that are on
> > different networks, then the routing tables need to be configured properly
> > so you stay on the initiating network.
> > 
> > 2. If you are using ssh between nodes, there can be subtle authentication
> > issues that creep in.  Make sure you can ssh from the head node to each of
> > the other nodes without entry of a password or passphrase.
> > 
> > 3. If you have infiniband, MPI will probably only try to use
> > TCP/ethernet for the process management, not for the actual IPC.  You
> > might try removing the "--mca btl_tcp_if_include eth0" parameters.  You
> > could even try forcefully excluding TCP with "--mca btl self,openib".
> > If you were explicitly using "btl_tcp_if_include" to work around network
> > routing issues, see suggestion #1 above.  (Some example command lines
> > follow.)
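> > 
> > As concrete examples of the above (illustrative only; adjust host,
> > interface and file names for your cluster):
> > 
> > ip route get 192.168.5.76    # which interface reaches that peer? (#1)
> > ssh nod50 hostname           # must work without a password prompt (#2)
> > salloc -N 7 mpirun -np 7 mpifxcorr ${EXPER}.input
> > salloc -N 7 mpirun -np 7 --mca btl self,openib mpifxcorr ${EXPER}.input
> > 
> > The first mpirun line simply drops the btl_tcp_if_include parameter; the
> > second restricts Open MPI to the self and openib BTLs (no TCP at all),
> > so any remaining infiniband problem shows up immediately instead of
> > being silently worked around over ethernet.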
> > 
> > 
> > Hopefully something above helps a bit...
> > 
> > -Walter
> > 
> > 
> > -------------------------
> > Walter Brisken
> > NRAO
> > Deputy Assistant Director for VLBA Development
> > (505)-234-5912 (cell)
> > (575)-835-7133 (office; not useful during COVID times)
> > 
> > On Mon, 16 Aug 2021, Joe Skeens via Difx-users wrote:
> > 
> > > Hi all,
> > > I'm what you might call an MPI newbie, and I've been trying to run
> > > mpifxcorr on a cluster with the Slurm scheduler and running into some
> > > problems. In the cluster setup, there's an InfiniBand device that
> > > handles communication between nodes, but the setup doesn't seem to
> > > recognize/utilize it properly.
> > > 
> > > For the command line prompt,
> > > salloc -N 7 mpirun -np 7 mpifxcorr ${EXPER}.input
> > > 
> > > I get:
> > > WARNING: There is at least non-excluded one OpenFabrics device found,
> > > but there are no active ports detected (or Open MPI was unable to use
> > > them). This is most certainly not what you wanted. Check your cables,
> > > subnet manager configuration, etc. The openib BTL will be ignored for
> > > this job. Local host: nod50
> > > 
> > > This leads to a fatal failure to connect between nodes (I think):
> > > 
> > > WARNING: Open MPI failed to TCP connect to a peer MPI process. This
> > > should not happen. Your Open MPI job may now fail. Local host: nod77
> > > PID: 4410 Message: connect() to 192.168.5.76:1024 failed Error:
> > > Operation now in progress (115)
> > > 
> > > Notably, if I force connection through an ethernet device with the
> > > command line prompt,
> > > 
> > > salloc -N 7 mpirun -np 7 --mca btl_tcp_if_include eth0 mpifxcorr ${EXPER}.input
> > > 
> > > mpifxcorr runs with no problem, although presumably at a large loss
> > > in efficiency.
> > > 
> > > This may be impossible to diagnose without knowing more about the
> > > server/cluster architecture, but I figured I'd see if anyone else has
> > > run into similar issues and found a solution. It's also entirely
> > > possible I'm missing something obvious.
> > > 
> > > 
> > > Thanks,
> > > 
> > > Joe Skeens
> 
> 
> _______________________________________________
> Difx-users mailing list
> Difx-users at listmgr.nrao.edu
> https://listmgr.nrao.edu/mailman/listinfo/difx-users


