[Difx-users] mpicorrdifx cannot be loaded correctly on more than a single node

Arash Roshanineshat arashroshani92 at gmail.com
Wed Jun 28 17:34:53 EDT 2017


Hi,

I was able to install DiFX, but it only runs successfully on a single node.

The *.machines and *.threads files are attached to this email.

OpenMPI is installed on all nodes, and the difx and data folders are 
shared among the nodes over NFS. DiFX works perfectly and produces 
correct output on a single machine.

Executing "startdifx -v -f e17d05-Sm-Sr_1000.input" returns the 
following error:

DIFX_MACHINES -> /home/arash/Shared_Examples/Example2/C.txt
Found modules:
Executing:  mpirun -np 6 --hostfile 
/home/arash/Shared_Examples/Example2/e17d05-Sm-Sr_1000.machines --mca 
mpi_yield_when_idle 1 --mca rmaps seq  runmpifxcorr.DiFX-2.5 
/home/arash/Shared_Examples/Example2/e17d05-Sm-Sr_1000.input
--------------------------------------------------------------------------
While computing bindings, we found no available cpus on
the following node:

   Node:  fringes-difx0

Please check your allocation.
--------------------------------------------------------------------------
Elapsed time (s) = 0.50417590141

On the other hand, executing

$ mpirun -np 6 \
    --hostfile /home/arash/Shared_Examples/Example2/e17d05-Sm-Sr_1000.machines \
    /home/arash/difx/bin/mpifxcorr \
    /home/arash/Shared_Examples/Example2/e17d05-Sm-Sr_1000.input

appears to run, but when I monitor CPU usage I see only 6 CPUs in use 
(5 on fringes-difx0 and 1 on fringes-difx1). I expected the number of 
busy CPUs to match the thread counts given in the *.threads file. 
How can I solve this issue?

The cluster nodes have 2 sockets, 10 cores per socket, and 2 threads 
per core.
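To make the expectation concrete, here is a hypothetical Python sketch (not part of DiFX) that predicts which node each of the 6 MPI ranks lands on and roughly how many busy threads each node should show. It assumes the usual DiFX file conventions (first .machines line is the manager, the following lines are datastreams, the last lines are core processes, one per thread count listed in the .threads file) and Open MPI's sequential mapper (rank i goes to hostfile line i, as requested by --mca rmaps seq). Manager and datastream processes are counted as one busy thread each, which is a simplification.

```python
"""Hypothetical helper: predict DiFX rank placement and per-node thread load.

Assumptions (not taken from DiFX itself):
  * .machines: line 1 = manager, then datastreams, then core processes
  * .threads:  "NUMBER OF CORES: N" followed by N per-core thread counts
  * Open MPI sequential mapping: rank i runs on the i-th hostfile line
  * manager and datastream processes count as one busy thread each
"""

MACHINES = """\
fringes-difx0
fringes-difx0
fringes-difx0
fringes-difx0
fringes-difx1
fringes-difx0
"""

THREADS = """\
NUMBER OF CORES:    2
20
19
"""

HW_THREADS_PER_NODE = 2 * 10 * 2  # sockets * cores/socket * threads/core


def parse_threads(text):
    """Return the per-core thread counts from a .threads file body."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    ncores = int(lines[0].split(":")[1])
    counts = [int(x) for x in lines[1:1 + ncores]]
    if len(counts) != ncores:
        raise ValueError("threads file lists fewer cores than declared")
    return counts


def plan(machines_text, threads_text):
    """Return (rank, host, role, threads) for every MPI rank."""
    hosts = [ln.strip() for ln in machines_text.splitlines() if ln.strip()]
    core_threads = parse_threads(threads_text)
    n_datastreams = len(hosts) - 1 - len(core_threads)
    if n_datastreams < 0:
        raise ValueError("machines file is too short for the threads file")
    ranks = []
    for rank, host in enumerate(hosts):  # sequential mapping: rank i -> line i
        if rank == 0:
            role, threads = "manager", 1
        elif rank <= n_datastreams:
            role, threads = "datastream", 1
        else:
            role, threads = "core", core_threads[rank - 1 - n_datastreams]
        ranks.append((rank, host, role, threads))
    return ranks


def per_node_load(ranks):
    """Sum the expected busy threads per host."""
    load = {}
    for _, host, _, threads in ranks:
        load[host] = load.get(host, 0) + threads
    return load


if __name__ == "__main__":
    ranks = plan(MACHINES, THREADS)
    for rank, host, role, threads in ranks:
        print(f"rank {rank}: {host:14s} {role:10s} threads={threads}")
    for host, n in sorted(per_node_load(ranks).items()):
        print(f"{host}: ~{n} busy threads of {HW_THREADS_PER_NODE}")
```

With the attached files this gives about 23 busy threads on fringes-difx0 and 20 on fringes-difx1, both within the 40 hardware threads per node and far more than the 6 busy CPUs observed. This assumes sequential mapping; the plain mpirun command above did not pass --mca rmaps seq, so Open MPI may have placed the ranks differently.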

Best Regards

Arash Roshanineshat




-------------- next part (*.machines file) --------------
fringes-difx0
fringes-difx0
fringes-difx0
fringes-difx0
fringes-difx1
fringes-difx0
-------------- next part (*.threads file) --------------
NUMBER OF CORES:    2
20
19

