[Difx-users] Intel oneAPI and MPI

Jan Florian Wagner jwagner105 at googlemail.com
Tue Sep 28 07:55:25 EDT 2021


Hi all,

has anyone tried out the Intel oneAPI 2021.3 packages? How has your
experience been? In particular, did you get Intel MPI working?

I've installed oneAPI here, and the DiFX components compile fine with the
required Intel compilers, icc (C), icpc (C++), and the Intel MPI mpicxx
wrapper, plus the Intel IPP 2021.3 library.

However, I cannot get MPI to work across a mix of compute nodes and Mark6
units. For example:

$ which mpirun
/opt/intel/oneapi/mpi/2021.3.1/bin/mpirun

($ export I_MPI_PLATFORM=auto)
$ mpirun -prepend-rank -n 6 -perhost 1 -machinefile intel.hostfile -bind-to
none -iface ib0 mpifxcorr
[1] About to run MPIInit on node mark6-02
[0] About to run MPIInit on node mark6-01
[2] About to run MPIInit on node mark6-03
[5] About to run MPIInit on node node12.service
[3] About to run MPIInit on node node10.service
[4] About to run MPIInit on node node11.service
[1] Abort(1615503) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init:
Other MPI error, error stack:
[1] MPIR_Init_thread(138)........:
[1] MPID_Init(1169)..............:
[1] MPIDI_OFI_mpi_init_hook(1842):
[1] MPIDU_bc_table_create(336)...: Missing hostname or invalid host/port
description in business card
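To get more detail than the stack trace above, Intel MPI and its libfabric
(OFI) layer can be made to log their startup and address-exchange steps. The
variable names below are real Intel MPI / libfabric knobs; the specific values
are only a diagnostic sketch for this setup, not a verified fix:

```shell
# Verbose Intel MPI startup diagnostics; levels >= 5 print provider
# selection and address-exchange details during MPI_Init.
export I_MPI_DEBUG=6
# Verbose libfabric logging, which backs the MPIDI_OFI_* frames in the trace.
export FI_LOG_LEVEL=debug
# Pin the fabric provider instead of letting OFI autodetect;
# "verbs" targets the Mellanox InfiniBand interface (ib0).
export FI_PROVIDER=verbs
# Make the Hydra launcher use ib0 for its own bootstrap traffic as well.
export I_MPI_HYDRA_IFACE=ib0
```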

The error is quite cryptic, and I have not found much help online. Maybe
someone here has come across it?

Oddly, mpirun, or rather the MPI_Init() in mpifxcorr, works just fine when
the machinefile contains only Mark6 units, or when it contains only compute
nodes.

Mixing compute nodes and Mark6 units leads to the above error. All hosts run
the same CentOS 7.7.1908 and use Mellanox InfiniBand (mlx4_0) as ib0...
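One thing the mixed case changes is the set of hostnames being exchanged: the
Mark6 units report short names (mark6-02) while the compute nodes report
names with a domain suffix (node12.service). Since the error complains about
a missing hostname in the business card, it may be worth confirming that
every node resolves its own name and every peer's name consistently. A quick
check, run on each node (the host list is just the names from this
machinefile):

```shell
# What name does this node report for itself, short and fully qualified?
hostname
hostname -f
# Can this node resolve every peer listed in the machinefile?
for h in mark6-01 mark6-02 mark6-03 node10.service node11.service node12.service; do
    getent hosts "$h" || echo "cannot resolve $h"
done
```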

many thanks,
regards,
Jan