[Difx-users] DiFX mpirun problem

Stuart Weston nzobservers at gmail.com
Tue Aug 30 21:45:59 EDT 2016


Do IP addresses get built in when the code is compiled?

oper@ww-flexbuf-01 DiFX-2.4.3 v534a> mpirun -machinefile v534a_9.machines -np 12 mpifxcorr v534a_9.input
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node wark167
[ww-flexbuf-01][[12885,1],6][../../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 156.62.231.167 failed: No route to host (113)
[ww-flexbuf-01][[12885,1],5][../../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 156.62.231.167 failed: No route to host (113)
[ww-flexbuf-01][[12885,1],3][../../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 156.62.231.167 failed: No route to host (113)
[ww-flexbuf-01][[12885,1],11][../../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 156.62.231.167 failed: No route to host (113)

The correct IP address is 163.7.128.11, not 156.62.231.167.
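One way to reproduce the failing connect() outside MPI is a plain TCP connection attempt from ww-flexbuf-01 to each address. The following is a minimal Python sketch, not part of DiFX or Open MPI; the helper name tcp_probe and the port passed to it are assumptions for illustration. On Linux, error 113 is EHOSTUNREACH ("No route to host").

```python
import errno
import socket


def tcp_probe(host, port, timeout=3.0):
    """Attempt one TCP connection to (host, port) and report the outcome.

    Hypothetical helper. Returns "OK" if the connection succeeds, "TIMEOUT" if
    nothing answers within `timeout` seconds, and otherwise the symbolic errno
    name from connect(), e.g. "EHOSTUNREACH" (113 on Linux) or "ECONNREFUSED".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
        except socket.timeout:
            return "TIMEOUT"
        except OSError as exc:
            # Map the numeric errno to its symbolic name; fall back to the message text.
            return errno.errorcode.get(exc.errno, str(exc))
    return "OK"
```

For example, compare tcp_probe("156.62.231.167", 22) with tcp_probe("163.7.128.11", 22) on ww-flexbuf-01. Port 22 is only an illustrative choice of a port with a listening service (such as sshd); an unreachable route fails with EHOSTUNREACH whatever the port, while a reachable host with nothing listening gives ECONNREFUSED.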

I have checked /etc/hosts on both servers, and also stopped and restarted
rpcbind just in case. I have also tried putting the IP addresses, rather than
the host names, in the machines file. I still get the error.
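To double-check what the resolver actually returns for each name, something like the following Python sketch can be run on both servers. socket.getaddrinfo() calls the C library resolver, so on a glibc system the answer reflects /etc/hosts and DNS as configured in /etc/nsswitch.conf. The helper name ipv4_addresses is hypothetical.

```python
import socket


def ipv4_addresses(host):
    """Return the sorted, de-duplicated IPv4 addresses the system resolver gives for `host`.

    Hypothetical helper. Raises socket.gaierror if the name cannot be resolved.
    """
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (address, port).
    return sorted({sockaddr[0] for _, _, _, _, sockaddr in infos})
```

Calling ipv4_addresses("wark167") and ipv4_addresses("ww-flexbuf-01") on each machine shows whether 156.62.231.167 turns up anywhere in name resolution; a numeric argument such as "163.7.128.11" simply comes back as itself.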

A very simple mpirun works fine, i.e.:

oper@ww-flexbuf-01 DiFX-2.4.3 v534a> cat hosts
163.7.128.194
163.7.128.11

oper@ww-flexbuf-01 DiFX-2.4.3 v534a> mpirun -np 2 -hostfile hosts hostname
ww-flexbuf-01
wark167

Any ideas as to why it insists on picking up the wrong IP address?

oper@ww-flexbuf-01 DiFX-2.4.3 v534a> cat v534a_9.machines
163.7.128.194
163.7.128.194
163.7.128.194
163.7.128.194
163.7.128.194
163.7.128.194
163.7.128.194
163.7.128.11
oper@ww-flexbuf-01 DiFX-2.4.3 v534a> mpirun -machinefile v534a_9.machines -np 12 mpifxcorr v534a_9.input
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node ww-flexbuf-01
About to run MPIInit on node wark167
[ww-flexbuf-01][[3498,1],6][../../../../../../ompi/mca/btl/tcp/btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 156.62.231.167 failed: No route to host (113)
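The v534a_9.machines file has 8 entries, while -np 12 asks for 12 processes. A minimal Python sketch that tallies entries per host, assuming each repeated line counts as one slot (the usual convention for Open MPI hostfiles) and that the file uses no slots=N annotations; slots_per_host is a hypothetical helper, not part of DiFX:

```python
from collections import Counter


def slots_per_host(machinefile_text):
    """Tally how many lines of a machinefile name each host.

    Hypothetical helper. Assumes one host name or address per line, with each
    repeated line counting as one slot. Blank lines and '#' comments are
    skipped; only the first whitespace-separated token is kept, so
    'slots=N' annotations are ignored rather than honoured.
    """
    hosts = []
    for line in machinefile_text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            hosts.append(entry.split()[0])
    return Counter(hosts)


if __name__ == "__main__":
    # Contents of v534a_9.machines as listed in the session above.
    machines = "163.7.128.194\n" * 7 + "163.7.128.11\n"
    tally = slots_per_host(machines)
    print(dict(tally), "total:", sum(tally.values()))
```

For v534a_9.machines this gives 7 slots on 163.7.128.194 and 1 on 163.7.128.11, 8 in total.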