[Difx-users] Error in running the startdifx command with DiFX software

深空探测 wude7826580 at gmail.com
Mon Jul 10 02:39:49 EDT 2023


Hi Adam,

I believe I now have a better understanding. As long as these remaining
differences fall within an acceptable range, this confirms that my
processing is working correctly.

I genuinely appreciate your support and guidance throughout this matter.
Thank you once again for your assistance.

Best regards,

De Wu

On Mon, 10 Jul 2023 at 10:36, Adam Deller <adeller at astro.swin.edu.au> wrote:

> Hi De Wu,
>
> That's still a little bit odd to me, since the header should not be
> differing at all if the same .im file is used.  (Any difference in the
> header would normally be due to different uvw entries resulting from
> different CALC results). Maybe that is spurious though.  I think the
> remaining differences seen in the visibility contents are probably due to
> an amplitude scaling effect that (once the visibilities are normalised by
> the autocorrelations) would disappear. The reference correlation is so old
> now, I think it has a couple of now-corrected errors in it that affected
> the amplitude scaling at the sub-percent level.
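> (Schematically, normalising by the autocorrelations means dividing each
> cross-correlation spectrum by the geometric mean of the two autocorrelation
> spectra, C_norm = C_12 / sqrt(A_1 * A_2), so that a common amplitude scale
> factor cancels.)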
>
> Cheers,
> Adam
>
>
>
> On Mon, 10 Jul 2023 at 10:44, 深空探测 via Difx-users <
> difx-users at listmgr.nrao.edu> wrote:
>
>> Hi Adam,
>>
>> I wanted to follow up on the change I made in the DiFX processing, namely
>> replacing the "example_1.im" file with the "reference_1.im" file.
>>
>> After performing the file replacement and rerunning the "diffDiFX.py"
>> program, I observed results consistent with your previous advice. The
>> program displayed the following outcome:
>>
>> - At the end, 1032 records showed disagreement in the header.
>> - After processing 1848 records, the mean percentage absolute difference
>> was 0.05823689, and the mean difference was -0.01528561 + 0.00029173i.
>>
>> These results indicate that the differences between the two files have
>> indeed become very small. I sincerely appreciate your assistance and
>> guidance in this matter.
>>
>> Best regards,
>>
>> De Wu
>>
>> On Sat, 8 Jul 2023 at 15:40, Adam Deller <adeller at astro.swin.edu.au> wrote:
>>
>>> Hi De Wu,
>>>
>>> For the differences to be so gross with the rdv70 dataset, I would guess
>>> that the discrepancy would be caused by a difference in the model being
>>> used.  difxcalc will generate a slightly different model compared to the
>>> older calcif2.  If you want to compare apples to apples, you can copy the
>>> reference .im file to have the name wude_1.im and then run the
>>> correlation again (ensuring that the .im file is not being regenerated -
>>> you can add "--dont-calc" to the startdifx invocation to be sure), and you
>>> should hopefully see the differences disappear.
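>>> A hedged sketch of that apples-to-apples rerun (the file names follow this
>>> thread; adjust them and the startdifx options to your setup):
>>>
>>> ```
>>> # use the reference delay model instead of a freshly generated one
>>> cp reference_1.im wude_1.im
>>> # rerun the correlation without regenerating the .im file
>>> startdifx -v -f --dont-calc wude_1.input
>>> # compare the new output against the reference correlation
>>> diffDiFX.py reference_1.difx/DIFX_54656_074996.s0000.b0000 \
>>>     wude_1.difx/DIFX_54656_074996.s0000.b0000 -i wude_1.input
>>> ```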
>>>
>>> Because the S/N on the individual visibility points is very low, even a
>>> small change in the model leads to a large difference in the result.
>>>
>>> Cheers,
>>> Adam
>>>
>>> On Fri, 7 Jul 2023 at 12:52, 深空探测 <wude7826580 at gmail.com> wrote:
>>>
>>>> Hi Adam,
>>>>
>>>> I wanted to provide you with an update regarding my previous confusion
>>>> about the "--mca" option. I have finally gained a clear understanding,
>>>> and I can confirm that the "startdifx" command executed successfully
>>>> without using the "--mca" option.  I sincerely apologize for not thoroughly
>>>> comprehending the implications of the "--mca" option, which caused the
>>>> recurring issues with mpirun.
>>>>
>>>> For testing purposes, I utilized the rdv70 dataset and followed the
>>>> instructions outlined in the README file. Specifically, I used the command
>>>> "diffDiFX.py reference_1.difx/DIFX_54656_074996.s0000.b0000
>>>> example_1.difx/DIFX_54656_074996.s0000.b0000 -i example_1.input" to compare
>>>> my own computed results (example data) with the reference data. The final
>>>> two lines of the displayed results were as follows:
>>>>
>>>> "At the end, 1320 records disagreed on the header.
>>>> After 1848 records, the mean percentage absolute difference is
>>>> 67.67053685, and the mean difference is 2.27732242 + 5.89398558 i."
>>>>
>>>> Although I did not encounter any issues during the data processing, it
>>>> appears that there are substantial differences in the comparison results. I
>>>> am uncertain about the specific step that might have caused this problem.
>>>>
>>>> Furthermore, when I generated the directory for the 1234 files using
>>>> the "difx2mark4 -e 1234 example_1.difx" command, I encountered an issue
>>>> when executing the command "fourfit -pt -c ../1234 191-2050" within the
>>>> 1234 directory. The resulting error messages were as follows:
>>>>
>>>> "fourfit: Invalid $block statement '$STATION A B BR-VLBA AXEL 2.0000
>>>> 90.0 ......
>>>> fourfit: Failure in locate_blocks()
>>>> fourfit: Low-level parse of
>>>> '/home/wude/difx/test_data/rdv70/1234/191-2050//4C39_25.2SN1CT' failed
>>>> fourfit: The above errors occurred while processing
>>>> fourfit: 191-2050//4C39_25.2SN1CT
>>>> fourfit: the top-level resolution is as follows: Error reading root for
>>>> file 191-2050/, skipping."
>>>>
>>>> However, when I conducted a test using the tc016a.pulsar dataset and
>>>> ran the command "fourfit -pt -c ../1234 No0040," I successfully obtained
>>>> the interference fringe image.
>>>>
>>>> Thank you for your time and support.
>>>>
>>>> Best regards,
>>>>
>>>> De Wu
>>>>
>>>> On Thu, 6 Jul 2023 at 12:19, Adam Deller <adeller at astro.swin.edu.au> wrote:
>>>>
>>>>> Sorry, I just saw that you had done this (and reported in your second
>>>>> email):
>>>>>
>>>>> *Subsequently, I proceeded to run the command "mpirun -np 8
>>>>> -machinefile wude_1.machines mpifxcorr wude_1.input," and I was able to
>>>>> obtain the ".difx" files successfully.*
>>>>>
>>>>> So if you edit the startdifx file and find where mpirun is being
>>>>> invoked, and remove those --mca options, you should be fine.
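>>>>> One way to find that spot (illustrative; the exact line differs between
>>>>> DiFX versions):
>>>>>
>>>>> ```
>>>>> # locate the installed startdifx script and the lines that assemble the mpirun command
>>>>> grep -n "mca" $(which startdifx)
>>>>> # then open that file in an editor and delete the two "--mca ..." option pairs
>>>>> ```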
>>>>>
>>>>> Cheers,
>>>>> Adam
>>>>>
>>>>> On Thu, 6 Jul 2023 at 14:16, Adam Deller <adeller at astro.swin.edu.au>
>>>>> wrote:
>>>>>
>>>>>> Hi Wu,
>>>>>>
>>>>>> calcif2 is the delay-generating program that requires the calcserver
>>>>>> to be running (which wasn't the case for you). Setting
>>>>>> DIFX_CALC_PROGRAM=difxcalc determines which program will be called by
>>>>>> startdifx.  But you were trying to run calcif2 itself from the command
>>>>>> line, so naturally this won't work.  If you run difxcalc wude_1.calc, it
>>>>>> should work.  And as you saw, if you run startdifx after setting
>>>>>> DIFX_CALC_PROGRAM=difxcalc , that also works fine.
>>>>>>
>>>>>> Once you have run difxcalc (or calcif2) the .im file will be
>>>>>> generated. If you try to run difxcalc/calcif2 again once the .im file has
>>>>>> been generated, it won't run unless you force it (since it sees that the
>>>>>> .im file has been generated, so no need to re-generate it).
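>>>>>> A minimal sketch of that sequence (file names follow this thread):
>>>>>>
>>>>>> ```
>>>>>> export DIFX_CALC_PROGRAM=difxcalc   # tells startdifx to call difxcalc rather than calcif2
>>>>>> difxcalc wude_1.calc                # generates wude_1.im from the .calc file
>>>>>> # a second run is skipped while wude_1.im exists; remove it first if you
>>>>>> # really need to regenerate the model
>>>>>> rm -f wude_1.im
>>>>>> difxcalc wude_1.calc
>>>>>> ```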
>>>>>>
>>>>>> So your remaining problem now is that MPI seems to think that you
>>>>>> don't have any available CPUs on your host.  Once again (I think this is
>>>>>> the third time I'm making this suggestion): please try running the mpirun
>>>>>> command *without* the --mca options.  I.e.,
>>>>>>
>>>>>> mpirun -np 4 --hostfile wude_1.machines runmpifxcorr.DiFX-2.6.2
>>>>>> wude_1.input
>>>>>>
>>>>>> You may also have success by adding --oversubscribe to the mpirun
>>>>>> command (although that is more of a band-aid to work around the fact that
>>>>>> openmpi doesn't seem to be seeing how many CPUs are available).
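>>>>>> A hedged example combining those two suggestions (treat --oversubscribe
>>>>>> as a workaround only):
>>>>>>
>>>>>> ```
>>>>>> mpirun -np 4 --oversubscribe --hostfile wude_1.machines \
>>>>>>     runmpifxcorr.DiFX-2.6.2 wude_1.input
>>>>>> ```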
>>>>>>
>>>>>> If you can figure out which mpirun option is causing the problem, you
>>>>>> will then be able to modify startdifx so that it always removes the
>>>>>> offending option for you.
>>>>>>
>>>>>> Cheers,
>>>>>> Adam
>>>>>>
>>>>>> On Tue, 4 Jul 2023 at 17:30, 深空探测 <wude7826580 at gmail.com> wrote:
>>>>>>
>>>>>>> Subject: Issue with DiFX Testing - RPC Errors and CPU Allocation
>>>>>>>
>>>>>>> Hi Adam,
>>>>>>>
>>>>>>> I apologize for the delay in getting back to you. I've been
>>>>>>> conducting tests with DiFX lately, and I encountered a few issues that I
>>>>>>> would appreciate your insight on.
>>>>>>>
>>>>>>> Initially, I faced problems running the `mpirun` command, but I
>>>>>>> managed to resolve them by reinstalling DiFX on a new CentOS7 system.
>>>>>>> Previously, I had installed `openmpi-1.6.5` in the `/usr/local` directory,
>>>>>>> but this time, I used the command `sudo yum install openmpi-devel` to
>>>>>>> install `openmpi`, and then I installed DiFX in the
>>>>>>> `/home/wude/difx/DIFXROOT` directory. Following this setup, the `mpirun`
>>>>>>> command started working correctly. I suspect that the previous installation
>>>>>>> in the system directory might have been causing the issues with `mpirun`.
>>>>>>>
>>>>>>> However, I encountered a new problem when running the command
>>>>>>> `calcif2 wude_1.calc`. The output displayed the following error:
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------------------
>>>>>>> calcif2 processing file 1/1 = wude_1
>>>>>>> localhost: RPC: Program not registered
>>>>>>> Error: calcif2: RPC clnt_create fails for host: localhost
>>>>>>> Error: Cannot initialize CalcParams
>>>>>>>
>>>>>>> ----------------------------------------------------------------------------------------
>>>>>>>
>>>>>>> Previously, I resolved a similar error by running the command:
>>>>>>> `export DIFX_CALC_PROGRAM=difxcalc`. However, when I tried the same
>>>>>>> solution this time, it didn't resolve the issue.
>>>>>>>
>>>>>>> Additionally, when running the command: `mpirun -np 4 --hostfile
>>>>>>> wude_1.machines --mca mpi_yield_when_idle 1 --mca rmaps seq
>>>>>>> runmpifxcorr.DiFX-2.6.2 wude_1.input`, the output displayed the following
>>>>>>> message:
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------------------------------------------------
>>>>>>> While computing bindings, we found no available CPUs on the
>>>>>>> following node:
>>>>>>>     Node: wude
>>>>>>> Please check your allocation.
>>>>>>>
>>>>>>> ---------------------------------------------------------------------------------------------------------------
>>>>>>>
>>>>>>> My hostname is "wude", and it seems like there are no available
>>>>>>> CPUs, but I can't determine the cause of this issue. Hence, I am reaching
>>>>>>> out to seek your guidance on this matter.
>>>>>>>
>>>>>>> Thank you for your time and support.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> De Wu
>>>>>>>
>>>>>>> On Mon, 26 Jun 2023 at 07:36, Adam Deller <adeller at astro.swin.edu.au> wrote:
>>>>>>>
>>>>>>>> Have you tried removing the --mca options from the command? E.g.,
>>>>>>>>
>>>>>>>> mpirun -np 4 --hostfile /vlbi/aov070/aov070_1.machines
>>>>>>>> runmpifxcorr.DiFX-2.6.2 /vlbi/aov070/aov070_1.input
>>>>>>>>
>>>>>>>> I have a suspicion that either the seq or rmaps option is not
>>>>>>>> playing nice, but it is easiest to just remove all the options and see if
>>>>>>>> that makes any difference.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Adam
>>>>>>>>
>>>>>>>> On Mon, 26 Jun 2023 at 01:58, 深空探测 <wude7826580 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Adam,
>>>>>>>>>
>>>>>>>>> As you suggested, I removed the "| head" from the command, and I
>>>>>>>>> was able to run it successfully.
>>>>>>>>>
>>>>>>>>> However, when I executed the following command: "mpirun -np 4
>>>>>>>>> --hostfile /vlbi/aov070/aov070_1.machines --mca mpi_yield_when_idle 1 --mca
>>>>>>>>> rmaps seq runmpifxcorr.DiFX-2.6.2 /vlbi/aov070/aov070_1.input", the output
>>>>>>>>> displayed the following message:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun noticed that the job aborted, but has no info as to the
>>>>>>>>> process
>>>>>>>>> that caused that situation.
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Additionally, when I ran the command "mpirun -np 4 -H
>>>>>>>>> localhost,localhost,localhost,localhost --mca mpi_yield_when_idle 1 --mca
>>>>>>>>> rmaps seq runmpifxcorr.DiFX-2.6.2 /vlbi/aov070/aov070_1.input," it
>>>>>>>>> resulted in the following error message:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> There are no nodes allocated to this job.
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> It is quite puzzling that even when specifying only one localhost
>>>>>>>>> in the command, I still receive this output. I have been considering the
>>>>>>>>> possibility that this issue might be due to limitations in system
>>>>>>>>> resources, node access permissions, or node configuration within the
>>>>>>>>> CentOS7 virtual machine environment.
>>>>>>>>>
>>>>>>>>> Thank you for your attention to this matter.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> De Wu
>>>>>>>>>
>>>>>>>>> On Thu, 22 Jun 2023 at 15:53, Adam Deller <adeller at astro.swin.edu.au> wrote:
>>>>>>>>>
>>>>>>>>>> Hi De Wu,
>>>>>>>>>>
>>>>>>>>>> The "SIGPIPE detected on fd 13 - aborting" errors when running
>>>>>>>>>> mpispeed are related to piping the output to head.  Remove the "| head" and
>>>>>>>>>> you should see it run normally.
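>>>>>>>>>> I.e., the same command without the pipe:
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> mpirun -H localhost,localhost mpispeed 1000 10s 1
>>>>>>>>>> ```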
>>>>>>>>>>
>>>>>>>>>> For running mpifxcorr, the obvious difference between your
>>>>>>>>>> invocation of mpispeed and mpifxcorr is the use of the various mca
>>>>>>>>>> options.  What happens if you add " --mca mpi_yield_when_idle 1 --mca rmaps
>>>>>>>>>> seq" to your mpispeed launch (before or after the -H localhost,localhost)?
>>>>>>>>>> If it doesn't work, then probably one or the other of those options is the
>>>>>>>>>> problem, and you need to change startdifx to get rid of the offending
>>>>>>>>>> option when running mpirun.
>>>>>>>>>>
>>>>>>>>>> If running mpispeed still works with those options, what
>>>>>>>>>> about the following:
>>>>>>>>>> 1. manually run mpirun -np 4 --hostfile
>>>>>>>>>> /vlbi/aov070/aov070_1.machines --mca mpi_yield_when_idle 1 --mca rmaps seq
>>>>>>>>>>  runmpifxcorr.DiFX-2.6.2 /vlbi/aov070/aov070_1.input, see what output comes
>>>>>>>>>> out
>>>>>>>>>> 2. manually run mpirun -np 4 -H
>>>>>>>>>> localhost,localhost,localhost,localhost --mca mpi_yield_when_idle 1 --mca
>>>>>>>>>> rmaps seq  runmpifxcorr.DiFX-2.6.2 /vlbi/aov070/aov070_1.input, see what
>>>>>>>>>> output comes out
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>> Adam
>>>>>>>>>>
>>>>>>>>>> On Mon, 19 Jun 2023 at 18:02, 深空探测 via Difx-users <
>>>>>>>>>> difx-users at listmgr.nrao.edu> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I recently reinstalled OpenMPI-1.6.5 and successfully ran the
>>>>>>>>>>> example program provided within the OpenMPI package. By executing the
>>>>>>>>>>> command "mpiexec -n 6 ./hello_c," I obtained the following output:
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>> wude at wude DiFX-2.6.2 examples> mpiexec -n 6 ./hello_c
>>>>>>>>>>> Hello, world, I am 4 of 6
>>>>>>>>>>> Hello, world, I am 2 of 6
>>>>>>>>>>> Hello, world, I am 0 of 6
>>>>>>>>>>> Hello, world, I am 1 of 6
>>>>>>>>>>> Hello, world, I am 3 of 6
>>>>>>>>>>> Hello, world, I am 5 of 6
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>> The program executed without any issues, displaying the expected
>>>>>>>>>>> output. Each line represents a separate process, showing the process number
>>>>>>>>>>> and the total number of processes involved.
>>>>>>>>>>>
>>>>>>>>>>> However, I encountered some difficulties when running the
>>>>>>>>>>> command "mpirun -H localhost,localhost mpispeed 1000 10s 1 | head."
>>>>>>>>>>> Although both nodes seem to run properly, there appear to be some errors in
>>>>>>>>>>> the output. Below is the output I received, with "wude" being my username:
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>> wude at wude DiFX-2.6.2 ~> mpirun -H localhost,localhost mpispeed
>>>>>>>>>>> 1000 10s 1 | head
>>>>>>>>>>> Processor = wude
>>>>>>>>>>> Rank = 0/2
>>>>>>>>>>> [0] Starting
>>>>>>>>>>> Processor = wude
>>>>>>>>>>> Rank = 1/2
>>>>>>>>>>> [1] Starting
>>>>>>>>>>> [1] Recvd 0 -> 0 : 2740.66 Mbps curr : 2740.66 Mbps mean
>>>>>>>>>>> [1] Recvd 1 -> 0 : 60830.52 Mbps curr : 5245.02 Mbps mean
>>>>>>>>>>> [1] Recvd 2 -> 0 : 69260.57 Mbps curr : 7580.50 Mbps mean
>>>>>>>>>>> [1] Recvd 3 -> 0 : 68545.44 Mbps curr : 9747.65 Mbps mean
>>>>>>>>>>> [wude:05649] mpirun: SIGPIPE detected on fd 13 - aborting
>>>>>>>>>>> mpirun: killing job...
>>>>>>>>>>>
>>>>>>>>>>> [wude:05649] mpirun: SIGPIPE detected on fd 13 - aborting
>>>>>>>>>>> mpirun: killing job...
>>>>>>>>>>> ```
>>>>>>>>>>>
>>>>>>>>>>> I'm unsure whether you experience the same "mpirun: SIGPIPE
>>>>>>>>>>> detected on fd 13 - aborting mpirun: killing job..." message when running
>>>>>>>>>>> this command on your computer.
>>>>>>>>>>>
>>>>>>>>>>> Furthermore, when I ran the command "startdifx -v -f -n
>>>>>>>>>>> aov070.joblist," the .difx file was not generated. Could you please provide
>>>>>>>>>>> some guidance or suggestions to help me troubleshoot this issue?
>>>>>>>>>>>
>>>>>>>>>>> Here is the output I received when running the command:
>>>>>>>>>>>
>>>>>>>>>>> ```
>>>>>>>>>>> wude at wude DiFX-2.6.2 aov070> startdifx -v -f -n aov070.joblist
>>>>>>>>>>> No errors with input file /vlbi/aov070/aov070_1.input
>>>>>>>>>>>
>>>>>>>>>>> Executing:  mpirun -np 4 --hostfile
>>>>>>>>>>> /vlbi/aov070/aov070_1.machines --mca mpi_yield_when_idle 1 --mca rmaps seq
>>>>>>>>>>>  runmpifxcorr.DiFX-2.6.2 /vlbi/aov070/aov070_1.input
>>>>>>>>>>>
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> mpirun noticed that the job aborted, but has no info as to the
>>>>>>>>>>> process
>>>>>>>>>>> that caused that situation.
>>>>>>>>>>>
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> Elapsed time (s) = 82.2610619068
>>>>>>>>>>> ```
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>> De Wu
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 25 May 2023 at 08:42, Adam Deller <adeller at astro.swin.edu.au> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi De Wu,
>>>>>>>>>>>>
>>>>>>>>>>>> If I run
>>>>>>>>>>>>
>>>>>>>>>>>> mpirun -H localhost,localhost mpispeed 1000 10s 1
>>>>>>>>>>>>
>>>>>>>>>>>> it runs correctly as follows:
>>>>>>>>>>>>
>>>>>>>>>>>> adeller at ar313-adeller trunk Downloads> mpirun -H
>>>>>>>>>>>> localhost,localhost mpispeed 1000 10s 1 | head
>>>>>>>>>>>> Processor = <my host name>
>>>>>>>>>>>> Rank = 0/2
>>>>>>>>>>>> [0] Starting
>>>>>>>>>>>> Processor =<my host name>
>>>>>>>>>>>> Rank = 1/2
>>>>>>>>>>>> [1] Starting
>>>>>>>>>>>>
>>>>>>>>>>>> It seems like in your case, MPI is looking at the two identical
>>>>>>>>>>>> host names you've given and is deciding to only start one process, rather
>>>>>>>>>>>> than two. What if you run
>>>>>>>>>>>>
>>>>>>>>>>>> mpirun -n 2 -H wude,wude mpispeed 1000 10s 1
>>>>>>>>>>>>
>>>>>>>>>>>> ?
>>>>>>>>>>>>
>>>>>>>>>>>> I think the issue is with your MPI installation / the
>>>>>>>>>>>> parameters being passed to mpirun. Unfortunately as I've mentioned
>>>>>>>>>>>> previously the behaviour of MPI with default parameters seems to change
>>>>>>>>>>>> from implementation to implementation and version to version - you just
>>>>>>>>>>>> need to track down what is needed to make sure it actually runs the number
>>>>>>>>>>>> of processes you want on the nodes you want!
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Adam
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, 24 May 2023 at 18:30, 深空探测 via Difx-users <
>>>>>>>>>>>> difx-users at listmgr.nrao.edu> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi  All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am writing to seek assistance regarding an issue I
>>>>>>>>>>>>> encountered while working with MPI on a CentOS 7 virtual machine.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have successfully installed openmpi-1.6.5 on the CentOS 7
>>>>>>>>>>>>> virtual machine. However, when I attempted to execute the command
>>>>>>>>>>>>> "startdifx -f -n -v aov070.joblist," I received the following error message:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "Environment variable DIFX_CALC_PROGRAM was set, so
>>>>>>>>>>>>> Using specified calc program: difxcalc
>>>>>>>>>>>>>
>>>>>>>>>>>>> No errors with input file /vlbi/corr/aov070/aov070_1.input
>>>>>>>>>>>>>
>>>>>>>>>>>>> Executing: mpirun -np 4 --hostfile
>>>>>>>>>>>>> /vlbi/corr/aov070/aov070_1.machines --mca mpi_yield_when_idle 1 --mca rmaps
>>>>>>>>>>>>> seq runmpifxcorr.DiFX-2.6.2 /vlbi/corr/aov070/aov070_1.input
>>>>>>>>>>>>>
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> mpirun noticed that the job aborted, but has no info as to the
>>>>>>>>>>>>> process that caused that situation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --------------------------------------------------------------------------"
>>>>>>>>>>>>>
>>>>>>>>>>>>> To further investigate the MPI functionality, I wrote a Python
>>>>>>>>>>>>> program “mpi_hello_world.py” as follows:
>>>>>>>>>>>>>
>>>>>>>>>>>>> from mpi4py import MPI
>>>>>>>>>>>>>
>>>>>>>>>>>>> comm = MPI.COMM_WORLD
>>>>>>>>>>>>> rank = comm.Get_rank()
>>>>>>>>>>>>> size = comm.Get_size()
>>>>>>>>>>>>>
>>>>>>>>>>>>> print("Hello from rank", rank, "of", size)
>>>>>>>>>>>>>
>>>>>>>>>>>>> When I executed the command "mpiexec -n 4 python
>>>>>>>>>>>>> mpi_hello_world.py," the output was as follows:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ('Hello from rank', 0, 'of', 1)
>>>>>>>>>>>>> ('Hello from rank', 0, 'of', 1)
>>>>>>>>>>>>> ('Hello from rank', 0, 'of', 1)
>>>>>>>>>>>>> ('Hello from rank', 0, 'of', 1)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Additionally, I attempted to test the MPI functionality using
>>>>>>>>>>>>> the "mpispeed" command with the following execution command: "mpirun -H
>>>>>>>>>>>>> wude,wude mpispeed 1000 10s 1".  “wude” is my hostname. However, I
>>>>>>>>>>>>> encountered the following error message:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "Processor = wude
>>>>>>>>>>>>> Rank = 0/1
>>>>>>>>>>>>> Sorry, must run with an even number of processes
>>>>>>>>>>>>> This program should be invoked in a manner similar to:
>>>>>>>>>>>>> mpirun -H host1,host2,...,hostN mpispeed
>>>>>>>>>>>>> [<numSends>|<timeSend>s] [<sendSizeMByte>]
>>>>>>>>>>>>>     where
>>>>>>>>>>>>>         numSends: number of blocks to send (e.g., 256), or
>>>>>>>>>>>>>         timeSend: duration in seconds to send (e.g., 100s)
>>>>>>>>>>>>>
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>> mpirun noticed that the job aborted, but has no info as to the
>>>>>>>>>>>>> process that caused that situation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --------------------------------------------------------------------------"
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am uncertain about the source of these issues and would
>>>>>>>>>>>>> greatly appreciate your guidance in resolving them. If you have any
>>>>>>>>>>>>> insights or suggestions regarding the aforementioned errors and how I can
>>>>>>>>>>>>> rectify them, please let me know.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you for your time and assistance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> De Wu
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> Difx-users mailing list
>>>>>>>>>>>>> Difx-users at listmgr.nrao.edu
>>>>>>>>>>>>> https://listmgr.nrao.edu/mailman/listinfo/difx-users
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> !=============================================================!
>>>>>>>>>>>> Prof. Adam Deller
>>>>>>>>>>>> Centre for Astrophysics & Supercomputing
>>>>>>>>>>>> Swinburne University of Technology
>>>>>>>>>>>> John St, Hawthorn VIC 3122 Australia
>>>>>>>>>>>> phone: +61 3 9214 5307
>>>>>>>>>>>> fax: +61 3 9214 8797
>>>>>>>>>>>> !=============================================================!
>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Difx-users mailing list
>>>>>>>>>>> Difx-users at listmgr.nrao.edu
>>>>>>>>>>> https://listmgr.nrao.edu/mailman/listinfo/difx-users
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> !=============================================================!
>>>>>>>>>> Prof. Adam Deller
>>>>>>>>>> Centre for Astrophysics & Supercomputing
>>>>>>>>>> Swinburne University of Technology
>>>>>>>>>> John St, Hawthorn VIC 3122 Australia
>>>>>>>>>> phone: +61 3 9214 5307
>>>>>>>>>> fax: +61 3 9214 8797
>>>>>>>>>> !=============================================================!
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> !=============================================================!
>>>>>>>> Prof. Adam Deller
>>>>>>>> Centre for Astrophysics & Supercomputing
>>>>>>>> Swinburne University of Technology
>>>>>>>> John St, Hawthorn VIC 3122 Australia
>>>>>>>> phone: +61 3 9214 5307
>>>>>>>> fax: +61 3 9214 8797
>>>>>>>> !=============================================================!
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> !=============================================================!
>>>>>> Prof. Adam Deller
>>>>>> Centre for Astrophysics & Supercomputing
>>>>>> Swinburne University of Technology
>>>>>> John St, Hawthorn VIC 3122 Australia
>>>>>> phone: +61 3 9214 5307
>>>>>> fax: +61 3 9214 8797
>>>>>> !=============================================================!
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> !=============================================================!
>>>>> Prof. Adam Deller
>>>>> Centre for Astrophysics & Supercomputing
>>>>> Swinburne University of Technology
>>>>> John St, Hawthorn VIC 3122 Australia
>>>>> phone: +61 3 9214 5307
>>>>> fax: +61 3 9214 8797
>>>>> !=============================================================!
>>>>>
>>>>
>>>
>>> --
>>> !=============================================================!
>>> Prof. Adam Deller
>>> Centre for Astrophysics & Supercomputing
>>> Swinburne University of Technology
>>> John St, Hawthorn VIC 3122 Australia
>>> phone: +61 3 9214 5307
>>> fax: +61 3 9214 8797
>>> !=============================================================!
>>>
>> _______________________________________________
>> Difx-users mailing list
>> Difx-users at listmgr.nrao.edu
>> https://listmgr.nrao.edu/mailman/listinfo/difx-users
>>
>
>
> --
> !=============================================================!
> Prof. Adam Deller
> Centre for Astrophysics & Supercomputing
> Swinburne University of Technology
> John St, Hawthorn VIC 3122 Australia
> phone: +61 3 9214 5307
> fax: +61 3 9214 8797
> !=============================================================!
>