[Difx-users] Problem in performing "startdifx"

Walter Brisken wbrisken at lbo.us
Mon Apr 10 09:25:18 EDT 2017


There is a program called mpispeed which I find useful when diagnosing 
mpi-related problems.  It comes with the mpifxcorr package.  To run it, 
select an even number of hosts (I usually just use 2) and run:

mpirun <options> mpispeed

where <options> are whatever options (including either a .machines file or 
explicit list of hosts) you want to run.  It's often easier to diagnose the 
issues without the extra complexity of the correlator.  If successful, it 
will do 256 sends from the first host to the second.
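As a minimal sketch of the mpispeed test described above, the helper below builds the mpirun argument list. It assumes one process per host and passes the hosts with Open MPI's `--host` option instead of a .machines file; the function name `mpispeed_command` and the host names are hypothetical.

```python
import shlex


def mpispeed_command(hosts, extra_options=()):
    """Build an mpirun argv for mpispeed (hypothetical helper).

    Assumes one mpispeed process per host, so -np equals the host count.
    mpispeed wants an even number of hosts; 2 is the usual choice.
    """
    if len(hosts) < 2 or len(hosts) % 2 != 0:
        raise ValueError("mpispeed needs an even number of hosts (e.g. 2)")
    return (["mpirun", "-np", str(len(hosts)), "--host", ",".join(hosts)]
            + list(extra_options)
            + ["mpispeed"])


if __name__ == "__main__":
    # Hypothetical host names: replace with two real cluster nodes.
    print(shlex.join(mpispeed_command(["node1", "node2"])))
```

Any extra options you pass (for example the same `--mca` settings startdifx uses) are placed before `mpispeed`. That lets you reproduce a failing option set without involving the correlator.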

When there are mpi problems, the issue may be beyond the scope of this 
mailing list, and you might do well using Google or the documentation at 
openmpi.org to find the fix.
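The error quoted below says the mapping can fail when "the corresponding mapper was not built". One quick check, sketched here under assumptions, is to list the rmaps (mapper) components reported by Open MPI's `ompi_info` tool and see whether `seq` (the mapper selected by `--mca rmaps seq`) is among them. The parsing assumes `ompi_info` prints lines such as `MCA rmaps: seq (MCA v2.0, ...)`. The exact format may vary between Open MPI versions, and `rmaps_components` is a hypothetical helper.

```python
import re
import subprocess


def rmaps_components(ompi_info_text):
    """Return rmaps component names from ompi_info output (hypothetical helper).

    Assumes lines of the form '   MCA rmaps: <name> (MCA vX, API vY, ...)'.
    """
    names = []
    for line in ompi_info_text.splitlines():
        match = re.match(r"\s*MCA rmaps:\s*(\S+)", line)
        if match:
            names.append(match.group(1))
    return names


def main():
    try:
        out = subprocess.run(["ompi_info"], capture_output=True,
                             text=True, check=False).stdout
    except FileNotFoundError:
        print("ompi_info not found; is Open MPI on your PATH?")
        return
    comps = rmaps_components(out)
    print("rmaps components:", comps)
    print("seq mapper available:", "seq" in comps)


if __name__ == "__main__":
    main()
```

If `seq` is missing, `--mca rmaps seq` cannot work with that Open MPI build. If it is present, the cause lies elsewhere, for example in the hostfile or in the mpirun that startdifx actually invokes.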

-Walter

On Mon, 10 Apr 2017, Lupin Lin wrote:

> Dear Adam,
>
> Thanks for your response.
> I checked the verbose output, and obtained the following messages:
>
> Executing: difxlog g3_7038 /data/lupin/g3-7038/g3_7038.difxlog 4 44843 &
> Executing:  mpirun -np 5 --hostfile /data/lupin/g3-7038/g3_7038.machines --mca mpi_yield_when_idle 1 --mca rmaps seq  runmpifxcorr.trunk /data/lupin/g3-7038/g3_7038.input
> --------------------------------------------------------------------------
> Your job failed to map. Either no mapper was available, or none
> of the available mappers was able to perform the requested
> mapping operation. This can happen if you request a map type
> (e.g., loadbalance) and the corresponding mapper was not built.
> --------------------------------------------------------------------------
> Elapsed time (s) = 0.351403951645
>
> Then I compared it with the mpirun command I run manually, and found that the following command runs successfully:
> ------------------
> mpirun -np 5 --hostfile /data/lupin/g3-7038/g3_7038.machines --mca mpi_yield_when_idle 1 mpifxcorr /data/lupin/g3-7038/g3_7038.input
> ------------------
>
> However, the problem appears when I add the parameters "--mca rmaps seq" and "runmpifxcorr.trunk".
> Should I avoid using the mapping operation?
>
> Sincerely,
> --
> Lupin Chun-Che Lin
> Supporting Scientist of GLT (GreenLand Telescope) project
> in Institute of Astronomy and Astrophysics, Academia Sinica,
> 14F of Astronomy-Mathematics Building (Rm: 1405),
> National Taiwan University.
> No.1, Sec. 4, Roosevelt Rd, Taipei 10617, Taiwan.
> Tel: +886-2-2366-5464
> Fax: +886-2-2367-7849
>
>
>
>> On 10 April 2017 at 6:45 PM, Adam Deller <adeller at astro.swin.edu.au> wrote:
>>
>> Hi Lupin,
>>
>> Try startdifx -v -v to get verbose output, and compare the mpirun command to the one you're using manually.
>>
>> Cheers,
>> Adam
>>
>> On 10 April 2017 at 19:00, Lupin Lin <lupin at asiaa.sinica.edu.tw> wrote:
>> To the experienced user of DiFX,
>>
>> I have reinstalled DiFX on "Scientific Linux release 6.8 (Carbon)".
>> However, when I ran "startdifx", I obtained the following error messages:
>>>>>
>> --------------------------------------------------------------------------
>> Your job failed to map. Either no mapper was available, or none
>> of the available mappers was able to perform the requested
>> mapping operation. This can happen if you request a map type
>> (e.g., loadbalance) and the corresponding mapper was not built.
>> --------------------------------------------------------------------------
>>
>> But when I ran mpifxcorr directly, I got no error messages and obtained the result successfully.
>> For example, I can use the following command instead of startdifx:
>> mpirun --machinefile g3_7038.machines -np 11 mpifxcorr g3_7038.input
>>
>> So it seems the problem is not with the MPI connection itself.
>>
>> Does anyone know what might cause this problem? Any hints on how to solve it would be appreciated.
>>
>> Thanks,
>> --
>> Lupin Chun-Che Lin
>> Supporting Scientist of GLT (GreenLand Telescope) project
>> in Institute of Astronomy and Astrophysics, Academia Sinica,
>> 14F of Astronomy-Mathematics Building (Rm: 1405),
>> National Taiwan University.
>> No.1, Sec. 4, Roosevelt Rd, Taipei 10617, Taiwan.
>> Tel: +886-2-2366-5464
>> Fax: +886-2-2367-7849
>>
>>
>>
>>
>> _______________________________________________
>> Difx-users mailing list
>> Difx-users at listmgr.nrao.edu
>> https://listmgr.nrao.edu/mailman/listinfo/difx-users
>>
>>
>>
>>
>> --
>> !=============================================================!
>> Dr. Adam Deller
>> ARC Future Fellow, Senior Lecturer
>> Centre for Astrophysics & Supercomputing
>> Swinburne University of Technology
>> John St, Hawthorn VIC 3122 Australia
>> phone: +61 3 9214 5307
>> fax: +61 3 9214 8797
>>
>> office days (usually): Mon-Thu
>> !=============================================================!
>
>

-- 
-------------------------
Walter Brisken
Director
Long Baseline Observatory
(505)-234-5912

