[Difx-users] Error in running the startdifx command with DiFX software

Deep Space Exploration (深空探测) wude7826580 at gmail.com
Wed May 24 04:29:18 EDT 2023


Hi all,

I am writing to seek assistance regarding an issue I encountered while
working with MPI on a CentOS 7 virtual machine.

I have successfully installed openmpi-1.6.5 on the CentOS 7 virtual
machine. However, when I ran "startdifx -f -n -v aov070.joblist", I
received the following error message:

"Environment variable DIFX_CALC_PROGRAM was set, so
Using specified calc program: difxcalc

No errors with input file /vlbi/corr/aov070/aov070_1.input

Executing: mpirun -np 4 --hostfile /vlbi/corr/aov070/aov070_1.machines --mca mpi_yield_when_idle 1 --mca rmaps seq runmpifxcorr.DiFX-2.6.2 /vlbi/corr/aov070/aov070_1.input
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process that
caused that situation.
--------------------------------------------------------------------------"
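With "--mca rmaps seq", Open MPI's sequential mapper places one process per line of the hostfile, so a machines file with fewer entries than "-np 4" is one plausible (unconfirmed) cause of an abort like this. The Python 3 sketch below counts the entries in a machines file and compares them with the -np value. The script name check_machines.py and both function names are hypothetical, and it assumes a plain Open MPI hostfile with one host per line and '#' comments.

```python
import sys


def count_hostfile_entries(text):
    """Count host entries in an Open MPI-style hostfile.

    Blank lines and '#' comments are ignored; each remaining line is
    assumed to place one process under the sequential mapper.
    """
    count = 0
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()  # drop comment, then whitespace
        if entry:
            count += 1
    return count


def check_machines_file(text, np_count):
    """Compare the number of hostfile entries with the mpirun -np value."""
    entries = count_hostfile_entries(text)
    if entries < np_count:
        return "too few entries: %d line(s) for -np %d" % (entries, np_count)
    return "ok: %d line(s) for -np %d" % (entries, np_count)


# Only runs when given exactly two arguments: <machines file> <np>.
if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python3 check_machines.py /vlbi/corr/aov070/aov070_1.machines 4
    with open(sys.argv[1]) as f:
        print(check_machines_file(f.read(), int(sys.argv[2])))
```

If the count matches, the machines file is probably not the culprit and the MPI installation itself is the next thing to examine.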

To investigate the MPI problem further, I wrote a short Python test
program, "mpi_hello_world.py":

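from mpi4py import MPI

comm = MPI.COMM_WORLD   # communicator containing all processes in the job
rank = comm.Get_rank()  # index of this process, 0 .. size-1
size = comm.Get_size()  # total number of processes in the job

print("Hello from rank", rank, "of", size)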

When I ran "mpiexec -n 4 python mpi_hello_world.py", the output was as
follows:

('Hello from rank', 0, 'of', 1)
('Hello from rank', 0, 'of', 1)
('Hello from rank', 0, 'of', 1)
('Hello from rank', 0, 'of', 1)
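Four identical "rank 0 of 1" lines mean that each of the four processes initialized as an independent one-process job. A common cause of this symptom is that mpi4py was built against a different MPI implementation (for example MPICH) than the one providing the mpiexec that launched it. Open MPI's mpirun/mpiexec exports OMPI_COMM_WORLD_SIZE to every process it starts, and MPICH's Hydra launcher exports PMI_SIZE, so comparing those variables with Get_size() can reveal such a mismatch. A minimal Python 3 sketch follows; the function names are hypothetical, and it only runs its check when one of those variables is set.

```python
import os
from typing import Mapping, Optional

# Variables that launchers export to each process they start:
# Open MPI's mpirun/mpiexec sets OMPI_COMM_WORLD_SIZE, MPICH's Hydra sets PMI_SIZE.
LAUNCHER_SIZE_VARS = ("OMPI_COMM_WORLD_SIZE", "PMI_SIZE")


def launcher_world_size(env: Mapping[str, str]) -> Optional[int]:
    """Return the job size announced by the launcher, or None if absent."""
    for var in LAUNCHER_SIZE_VARS:
        if var in env:
            return int(env[var])
    return None


def diagnose(env: Mapping[str, str], mpi_size: int) -> str:
    """Compare the launcher's job size with the size MPI itself reports."""
    expected = launcher_world_size(env)
    if expected is None:
        return "no launcher variables found; not started by mpirun/mpiexec?"
    if expected != mpi_size:
        return ("launcher started %d processes but MPI reports size %d; "
                "mpi4py may be linked against a different MPI library"
                % (expected, mpi_size))
    return "ok: size %d" % mpi_size


# Only run the real check when a launcher started this process.
if __name__ == "__main__" and launcher_world_size(os.environ) is not None:
    from mpi4py import MPI  # importing MPI initializes it by default

    comm = MPI.COMM_WORLD
    print("rank %d: %s" % (comm.Get_rank(), diagnose(os.environ, comm.Get_size())))
```

Run it the same way as the hello-world test, e.g. "mpiexec -n 4 python3 check_launcher.py" (hypothetical file name). If it reports a mismatch, rebuilding mpi4py against the Open MPI 1.6.5 installation (for instance by pointing the MPICC environment variable at that installation's mpicc when installing mpi4py) is the usual remedy, although it has not been verified on this particular setup.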

I also tried to test MPI with the "mpispeed" program, using the command
"mpirun -H wude,wude mpispeed 1000 10s 1" (where "wude" is my hostname).
However, I received the following error message:

"Processor = wude
Rank = 0/1
Sorry, must run with an even number of processes
This program should be invoked in a manner similar to:
mpirun -H host1,host2,...,hostN mpispeed [<numSends>|<timeSend>s] [<sendSizeMByte>]
    where
        numSends: number of blocks to send (e.g., 256), or
        timeSend: duration in seconds to send (e.g., 100s)
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process that
caused that situation.
--------------------------------------------------------------------------"
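The mpispeed output shows the same symptom: "Rank = 0/1" means each of the two processes believed it was alone, so the even-process check failed even though two processes were launched. One way to look for a mismatch between the mpirun found on PATH and the MPI library a binary actually loads is to inspect the binary with ldd. The Python 3 sketch below is a hypothetical helper that assumes a Linux system with ldd available and MPI libraries named libmpi*; it prints the mpirun on PATH and the libmpi* libraries each named program resolves to.

```python
import re
import shutil
import subprocess
import sys

# Matches ldd lines such as
#   libmpi.so.1 => /usr/lib64/openmpi/lib/libmpi.so.1 (0x...)
#   libmpi.so.12 => not found
LDD_MPI_LINE = re.compile(r"\s*(libmpi\S*)\s*=>\s*(\S+)")


def mpi_libraries(ldd_output):
    """Return (soname, resolved path or None) for each libmpi* line of ldd output."""
    libs = []
    for line in ldd_output.splitlines():
        m = LDD_MPI_LINE.match(line)
        if m:
            path = m.group(2)
            libs.append((m.group(1), None if path == "not" else path))
    return libs


def report(program):
    """Print which libmpi* libraries `program` (looked up on PATH) would load."""
    binary = shutil.which(program)
    if binary is None:
        print("%s: not found on PATH" % program)
        return
    result = subprocess.run(["ldd", binary], stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    libs = mpi_libraries(result.stdout)
    if not libs:
        print("%s: no libmpi* dependency found by ldd" % binary)
    for soname, path in libs:
        print("%s: %s -> %s" % (binary, soname, path or "NOT FOUND"))


if __name__ == "__main__" and len(sys.argv) > 1:
    print("mpirun on PATH: %s" % shutil.which("mpirun"))
    for name in sys.argv[1:]:
        report(name)
```

For example, "python3 check_mpi_libs.py mpispeed" (hypothetical file name). If mpispeed resolves libmpi from one MPI installation while "mpirun" comes from another, that mismatch would be consistent with the "0/1" ranks above.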

I am not sure what is causing these problems and would greatly
appreciate any guidance. If you have insights into the errors above or
suggestions on how to fix them, please let me know.

Thank you for your time and assistance.

Best regards,

De Wu

