[Difx-users] Multiple Heads in same subnet - one waits for the other to finish ?

Stuart Weston stuart.weston at aut.ac.nz
Wed Apr 19 22:52:56 EDT 2017


Hi Adam,

Job2 – yes, it starts and then waits for job1 to finish. Yes, it writes to the difxlog.

File-based correlation – yes, mk5 files

Do we need a higher level of debugging to see why it pauses? Should I use “--bind-to none”?

Stuart

From: adeller at gmail.com [mailto:adeller at gmail.com] On Behalf Of Adam Deller
Sent: Thursday, 20 April 2017 2:46 p.m.
To: Stuart Weston <stuart.weston at aut.ac.nz>
Cc: Difx-users at listmgr.nrao.edu
Subject: Re: [Difx-users] Multiple Heads in same subnet - one waits for the other to finish ?

Hi Stuart,

So I'm assuming that job2 does actually start and something gets written to the difxlog, then it pauses until job1 finishes, and then it fires up and runs to completion?  If that is the case, can you post the job2 difxlog as it stands during the "pause" phase?  That might give a clue as to what it is waiting for.

Also is this file-based correlation?

Cheers,
Adam

On 20 April 2017 at 12:37, Stuart Weston <stuart.weston at aut.ac.nz> wrote:
I have two head nodes, each head node has 6 workers.

I split the job up into two groups of files, the idea being Head-1 does scans/files 1-6 and Head-2 does scans/files 7-11.

I created two separate input files with different file lists, etc., and also two separate thread and machine files appropriate to the two groups of IP addresses.

head-1, worker-1-1, worker-1-2 … worker-1-6
head-2, worker-2-1 … worker-2-6

So I set two jobs running in parallel:

Head-1 > mpirun -machinefile machines-1 -np 5 mpifxcorr hw03_1.input
Head-2 > mpirun -machinefile machines-2 -np 5 mpifxcorr hw03_2.input
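For illustration, the two machine files might be generated as below. This is a minimal sketch, assuming Open MPI's plain hostfile format (one hostname per line) and reusing the placeholder host names from the list above; the real files may need a different host order or slot counts for your setup. The helper name make_machinefile is hypothetical.

```shell
#!/bin/sh
# Minimal sketch: write one Open MPI machine file per head node.
# Assumes the plain one-hostname-per-line hostfile format and the
# placeholder names head-N / worker-N-M; adjust to the real hosts.

# Hypothetical helper: make_machinefile <output-file> <group-number>
make_machinefile() {
  out="$1"    # output file, e.g. machines-1
  group="$2"  # head-node group number, e.g. 1
  echo "head-${group}" > "$out"            # head node listed first
  for i in 1 2 3 4 5 6; do                 # then its six workers
    echo "worker-${group}-${i}" >> "$out"
  done
}

make_machinefile machines-1 1
make_machinefile machines-2 2

# Each job is then started on its own head node, as in the commands above:
#   Head-1 > mpirun -machinefile machines-1 -np 5 mpifxcorr hw03_1.input
#   Head-2 > mpirun -machinefile machines-2 -np 5 mpifxcorr hw03_2.input
```

Keeping each group in its own file makes it easy to check that no host appears in both jobs.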

All machines are in the same subnet. I am guessing some communication is going on, as Head-2 seems to wait while Head-1 processes files 1-6; once Head-1 has finished, Head-2 gets busy doing files 7-11.

Is there any way to have Head-1 and Head-2 running at the same time, i.e. so that Head-2 doesn't wait for Head-1 to finish?
Stuart Weston Bsc (Hons), MPhil (Hons), MInstP
Mobile: 021 713062
Skype: stuart.d.weston
Email: stuart.weston at aut.ac.nz
http://www.atnf.csiro.au/people/Stuart.Weston/index.html

Software Engineer
Institute for Radio Astronomy & Space Research (IRASR)
School of Computing & Mathematical Sciences
Faculty of Creative Technologies
Auckland University of Technology, New Zealand.
http://www.irasr.aut.ac.nz/




_______________________________________________
Difx-users mailing list
Difx-users at listmgr.nrao.edu
https://listmgr.nrao.edu/mailman/listinfo/difx-users



--
!=============================================================!
Dr. Adam Deller
ARC Future Fellow, Senior Lecturer
Centre for Astrophysics & Supercomputing
Swinburne University of Technology
John St, Hawthorn VIC 3122 Australia
phone: +61 3 9214 5307
fax: +61 3 9214 8797

office days (usually): Mon-Thu
!=============================================================!