[Difx-users] Error in running the startdifx command with DiFX software

Adam Deller adeller at astro.swin.edu.au
Wed May 24 20:42:27 EDT 2023


Hi De Wu,

If I run

mpirun -H localhost,localhost mpispeed 1000 10s 1

it runs correctly as follows:

adeller at ar313-adeller trunk Downloads> mpirun -H localhost,localhost mpispeed 1000 10s 1 | head
Processor = <my host name>
Rank = 0/2
[0] Starting
Processor = <my host name>
Rank = 1/2
[1] Starting

It seems like in your case, MPI is looking at the two identical host names
you've given and is deciding to only start one process, rather than two.
What if you run

mpirun -n 2 -H wude,wude mpispeed 1000 10s 1

?

I think the issue is with your MPI installation or the parameters being
passed to mpirun. Unfortunately, as I've mentioned previously, the behaviour
of MPI with default parameters seems to change from implementation to
implementation and from version to version, so you just need to track down
what is needed to make sure it actually runs the number of processes you
want on the nodes you want!
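
As a rough sketch of that check (this is not part of DiFX; the function
names are hypothetical, and the "Rank = r/N" line format is assumed from
the mpispeed output quoted in this thread), a small Python 3 helper could
pass the process count explicitly with -n and then confirm that every
expected rank actually reported in:

```python
import re
import subprocess

# Matches lines such as "Rank = 0/2", as seen in the mpispeed output in this thread.
RANK_LINE = re.compile(r"^Rank = (\d+)/(\d+)$")


def summarize_ranks(output):
    """Return (sorted list of ranks seen, set of reported world sizes)."""
    ranks, sizes = set(), set()
    for line in output.splitlines():
        match = RANK_LINE.match(line.strip())
        if match:
            ranks.add(int(match.group(1)))
            sizes.add(int(match.group(2)))
    return sorted(ranks), sizes


def launched_as_requested(output, wanted):
    """True only if ranks 0..wanted-1 all appeared and each reported size == wanted."""
    ranks, sizes = summarize_ranks(output)
    return ranks == list(range(wanted)) and sizes == {wanted}


def run_mpispeed(hosts, extra_args=("1000", "10s", "1")):
    """Run mpispeed with an explicit process count (one per listed host).

    Assumes mpirun and mpispeed are on PATH; returns the captured stdout.
    """
    cmd = ["mpirun", "-n", str(len(hosts)), "-H", ",".join(hosts),
           "mpispeed", *extra_args]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout
```

For example, launched_as_requested(run_mpispeed(["wude", "wude"]), 2)
should come back True if two processes really started. Output containing
only "Rank = 0/1" would make it return False, which points at the
launcher or the MPI installation rather than at DiFX itself.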

Cheers,
Adam


On Wed, 24 May 2023 at 18:30, 深空探测 via Difx-users <
difx-users at listmgr.nrao.edu> wrote:

> Hi All,
>
> I am writing to seek assistance regarding an issue I encountered while
> working with MPI on a CentOS 7 virtual machine.
>
> I have successfully installed openmpi-1.6.5 on the CentOS 7 virtual
> machine. However, when I attempted to execute the command "startdifx -f -n
> -v aov070.joblist", I received the following error message:
>
> "Environment variable DIFX_CALC_PROGRAM was set, so
> Using specified calc program: difxcalc
>
> No errors with input file /vlbi/corr/aov070/aov070_1.input
>
> Executing: mpirun -np 4 --hostfile /vlbi/corr/aov070/aov070_1.machines
> --mca mpi_yield_when_idle 1 --mca rmaps seq runmpifxcorr.DiFX-2.6.2
> /vlbi/corr/aov070/aov070_1.input
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------"
>
> To further investigate the MPI functionality, I wrote a Python program
> “mpi_hello_world.py” as follows:
>
> from mpi4py import MPI
>
> comm = MPI.COMM_WORLD
> rank = comm.Get_rank()
> size = comm.Get_size()
>
> print("Hello from rank", rank, "of", size)
>
> When I executed the command "mpiexec -n 4 python mpi_hello_world.py", the
> output was as follows:
>
> ('Hello from rank', 0, 'of', 1)
> ('Hello from rank', 0, 'of', 1)
> ('Hello from rank', 0, 'of', 1)
> ('Hello from rank', 0, 'of', 1)
>
> Additionally, I attempted to test the MPI functionality using the
> "mpispeed" command with the following execution command: "mpirun -H
> wude,wude mpispeed 1000 10s 1".  “wude” is my hostname. However, I
> encountered the following error message:
>
> "Processor = wude
> Rank = 0/1
> Sorry, must run with an even number of processes
> This program should be invoked in a manner similar to:
> mpirun -H host1,host2,...,hostN mpispeed [<numSends>|<timeSend>s]
> [<sendSizeMByte>]
>     where
>         numSends: number of blocks to send (e.g., 256), or
>         timeSend: duration in seconds to send (e.g., 100s)
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------"
>
> I am uncertain about the source of these issues and would greatly
> appreciate your guidance in resolving them. If you have any insights or
> suggestions regarding the aforementioned errors and how I can rectify them,
> please let me know.
>
> Thank you for your time and assistance.
>
> Best regards,
>
> De Wu
> _______________________________________________
> Difx-users mailing list
> Difx-users at listmgr.nrao.edu
> https://listmgr.nrao.edu/mailman/listinfo/difx-users
>


-- 
!=============================================================!
Prof. Adam Deller
Centre for Astrophysics & Supercomputing
Swinburne University of Technology
John St, Hawthorn VIC 3122 Australia
phone: +61 3 9214 5307
fax: +61 3 9214 8797
!=============================================================!