[Difx-users] Problem in performing "startdifx"

Adam Deller adeller at astro.swin.edu.au
Mon Apr 10 17:50:11 EDT 2017


Echoing Walter: this is an openmpi issue, not a DiFX issue per se.  It may
well be related to your specific openmpi version, see
http://users.open-mpi.narkive.com/XhLEHyPF/ompi-users-openmpi-1-8-rmaps-seq-doesn-t-work.
Try upgrading (or downgrading) openmpi.
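
One way to check whether the installed openmpi lists the sequential mapper
that startdifx requests via "--mca rmaps seq" is sketched below in Python.
This is only a rough sketch: it assumes that ompi_info is on the PATH and
that it reports components on lines of the form "MCA rmaps: <name> (...)".
The function names are made up for illustration and are not part of DiFX
or openmpi.

```python
import re
import subprocess


def rmaps_components(ompi_info_text):
    """Return the set of rmaps component names found in ompi_info output."""
    # Assumed line shape: "   MCA rmaps: seq (MCA v2.0, API v2.0, ...)"
    pattern = re.compile(r"^\s*MCA rmaps:\s+(\S+)", re.MULTILINE)
    return set(pattern.findall(ompi_info_text))


def has_seq_mapper(ompi_info_text):
    """True if a component named 'seq' is listed among the rmaps components."""
    return "seq" in rmaps_components(ompi_info_text)


def query_ompi_info():
    """Run ompi_info and return its stdout, or None if it cannot be run."""
    try:
        result = subprocess.run(
            ["ompi_info"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout


if __name__ == "__main__":
    text = query_ompi_info()
    if text is None:
        print("could not run ompi_info; is openmpi on the PATH?")
    else:
        names = sorted(rmaps_components(text))
        print("rmaps components:", ", ".join(names) if names else "(none found)")
        print("seq mapper listed:", has_seq_mapper(text))
```

If "seq" is missing from the list, the "mapper was not built" part of the
error message is a likely explanation. If it is listed and the job still
fails to map, a bug in that particular openmpi version (as discussed in the
thread linked above) is the more likely cause.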

On 10 April 2017 at 23:25, Walter Brisken <wbrisken at lbo.us> wrote:

>
> There is a program called mpispeed which I find useful when diagnosing
> mpi-related problems.  It comes with the mpifxcorr package.  To run it,
> select an even number of hosts (I usually just use 2) and run:
>
> mpirun <options> mpispeed
>
> where <options> are whatever options (including either a .machines file or
> explicit list of hosts) you want to run.  It's often easier to diagnose the
> issues without the extra complexity of the correlator.  If successful, it
> will do 256 sends from the first host to the second.
>
> When there are mpi problems, the issue may be beyond the scope of this
> mailing list and you might do well using google or documentation at
> openmpi.org to find the fix.
>
> -Walter
>
>
> On Mon, 10 Apr 2017, Lupin Lin wrote:
>
>> Dear Adam,
>>
>> Thanks for your response.
>> I checked the verbose output, and obtained the following messages:
>>
>> Executing: difxlog g3_7038 /data/lupin/g3-7038/g3_7038.difxlog 4 44843 &
>> Executing:  mpirun -np 5 --hostfile /data/lupin/g3-7038/g3_7038.machines
>> --mca mpi_yield_when_idle 1 --mca rmaps seq  runmpifxcorr.trunk
>> /data/lupin/g3-7038/g3_7038.input
>> --------------------------------------------------------------------------
>> Your job failed to map. Either no mapper was available, or none
>> of the available mappers was able to perform the requested
>> mapping operation. This can happen if you request a map type
>> (e.g., loadbalance) and the corresponding mapper was not built.
>> --------------------------------------------------------------------------
>> Elapsed time (s) = 0.351403951645
>>
>> Then I compared this with the mpirun command I run manually, and found
>> that the following command runs successfully:
>> ------------------
>> mpirun -np 5 --hostfile /data/lupin/g3-7038/g3_7038.machines --mca
>> mpi_yield_when_idle 1 mpifxcorr /data/lupin/g3-7038/g3_7038.input
>> ------------------
>>
>> However, the problem appears as soon as I add the option "--mca rmaps seq"
>> and use "runmpifxcorr.trunk" as the executable.
>> Should I avoid using the mapping operation?
>>
>> Sincerely,
>> --
>> Lupin Chun-Che Lin
>> Supporting Scientist of GLT (GreenLand Telescope) project
>> in Institute of Astronomy and Astrophysics, Academia Sinica,
>> 14F of Astronomy-Mathematics Building (Rm: 1405),
>> National Taiwan University.
>> No.1, Sec. 4, Roosevelt Rd, Taipei 10617, Taiwan.
>> Tel: +886-2-2366-5464
>> Fax: +886-2-2367-7849
>>
>>
>>
>> On 10 April 2017 at 6:45 PM, Adam Deller <adeller at astro.swin.edu.au> wrote:
>>>
>>> Hi Lupin,
>>>
>>> Try startdifx -v -v to get verbose output, and compare the mpirun
>>> command to the one you're using manually.
>>>
>>> Cheers,
>>> Adam
>>>
>>> On 10 April 2017 at 19:00, Lupin Lin <lupin at asiaa.sinica.edu.tw> wrote:
>>> To the experienced users of DiFX,
>>>
>>> I have re-installed DiFX on "Scientific Linux release 6.8 (Carbon)".
>>> However, when I ran "startdifx", I obtained the following error
>>> message.
>>>
>>> --------------------------------------------------------------------------
>>> Your job failed to map. Either no mapper was available, or none
>>> of the available mappers was able to perform the requested
>>> mapping operation. This can happen if you request a map type
>>> (e.g., loadbalance) and the corresponding mapper was not built.
>>> --------------------------------------------------------------------------
>>>
>>> But when I ran mpifxcorr directly, I did not get any error message and
>>> obtained the result successfully.
>>> For example, I can use the following command in place of startdifx:
>>> mpirun --machinefile g3_7038.machines -np 11 mpifxcorr g3_7038.input
>>>
>>> So it seems the problem is not due to the MPI connection itself.
>>>
>>> Does anyone know the possible cause of this problem, and how to
>>> solve it?
>>>
>>> Thanks,
>>> --
>>> Lupin Chun-Che Lin
>>> Supporting Scientist of GLT (GreenLand Telescope) project
>>> in Institute of Astronomy and Astrophysics, Academia Sinica,
>>> 14F of Astronomy-Mathematics Building (Rm: 1405),
>>> National Taiwan University.
>>> No.1, Sec. 4, Roosevelt Rd, Taipei 10617, Taiwan.
>>> Tel: +886-2-2366-5464
>>> Fax: +886-2-2367-7849
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Difx-users mailing list
>>> Difx-users at listmgr.nrao.edu
>>> https://listmgr.nrao.edu/mailman/listinfo/difx-users
>>>
>>>
>>>
>>>
>>> --
>>> !=============================================================!
>>> Dr. Adam Deller
>>> ARC Future Fellow, Senior Lecturer
>>> Centre for Astrophysics & Supercomputing
>>> Swinburne University of Technology
>>> John St, Hawthorn VIC 3122 Australia
>>> phone: +61 3 9214 5307
>>> fax: +61 3 9214 8797
>>>
>>> office days (usually): Mon-Thu
>>> !=============================================================!
>>>
>>
>>
>>
> --
> -------------------------
> Walter Brisken
> Director
> Long Baseline Observatory
> (505)-234-5912




-- 
!=============================================================!
Dr. Adam Deller
ARC Future Fellow, Senior Lecturer
Centre for Astrophysics & Supercomputing
Swinburne University of Technology
John St, Hawthorn VIC 3122 Australia
phone: +61 3 9214 5307
fax: +61 3 9214 8797

office days (usually): Mon-Thu
!=============================================================!