[Difx-users] Multiple heads in the same subnet - one waits for the other to finish?

Stuart Weston stuart.weston at aut.ac.nz
Wed Apr 19 22:37:53 EDT 2017


I have two head nodes, each head node has 6 workers.

I split the job into two groups of files, the idea being that Head-1 processes scans/files 1-6 and Head-2 processes scans/files 7-11.

I created two separate input files with different file lists, etc., and also two separate thread and machine files appropriate to the two different groups of IP addresses:

head-1, worker-1-1, worker-1-2 ... worker-1-6
head-2, worker-2-1, worker-2-2 ... worker-2-6

So I set the two jobs running in parallel:

Head-1 > mpirun -machinefile machines-1 -np 5 mpifxcorr hw03_1.input
Head-2 > mpirun -machinefile machines-2 -np 5 mpifxcorr hw03_2.input

All machines are in the same subnet. I am guessing some communication is going on between the two jobs, because Head-2 seems to wait while Head-1 processes files 1-6. Once Head-1 has finished, Head-2 gets busy with files 7-11.

Is there any way to have Head-1 and Head-2 running at the same time, i.e. so that Head-2 doesn't wait for Head-1 to finish?

Stuart Weston Bsc (Hons), MPhil (Hons), MInstP
Mobile: 021 713062
Skype: stuart.d.weston
Email: stuart.weston at aut.ac.nz
http://www.atnf.csiro.au/people/Stuart.Weston/index.html

Software Engineer
Institute for Radio Astronomy & Space Research (IRASR)
School of Computing & Mathematical Sciences
Faculty of Creative Technologies
Auckland University of Technology, New Zealand.
http://www.irasr.aut.ac.nz/




