<div dir="ltr">Since this failure happens inside "mpirun" itself (the backtrace ends in opal_finalize, i.e. while mpirun is shutting down), it is unlikely to be a bug in DiFX: "opal" is part of OpenMPI. I'd recommend updating your OpenMPI version; the bug may already be fixed.<div><br></div><div>I've seen this kind of dependence on environment variables before. The exact size of the environment text shifts where everything else ends up in memory (on Linux the environment strings are copied onto the top of the process stack at startup), so a latent memory bug can appear or disappear depending on what is set. "ssh -X" and "ssh -Y" sessions have different environment variables.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 9, 2022 at 5:23 AM Eskil Varenius via Difx-users <<a href="mailto:difx-users@listmgr.nrao.edu">difx-users@listmgr.nrao.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear DiFX users,<br>
I wanted to share an intriguing segmentation fault that has kept me puzzled <br>
for some time, in case someone else runs into the same thing or knows <br>
the reason. Strictly speaking it is (very likely) not a DiFX issue, <br>
but it is somehow related to the way I run DiFX.<br>
<br>
Problem: I am trying to correlate some r1 data using DiFX 2.5.4 or 2.6.3 (same <br>
behaviour with both; I did not test older versions). I connect from my <br>
laptop (OS X 12.0.1) to my server (Linux Mint 19.3) using <br>
"ssh -Y user@server" and then run "startdifx -n -f -v r11026_01.input". <br>
Everything runs fine, except that the last lines of output are:<br>
<br>
[...]<br>
start frame = 0<br>
end second = 61220<br>
end frame = 5048<br>
first frame offset = 0 bytes<br>
[gyller:30568] *** Process received signal ***<br>
[gyller:30568] Signal: Segmentation fault (11)<br>
[gyller:30568] Signal code: Address not mapped (1)<br>
[gyller:30568] Failing at address: 0xceec27309<br>
[gyller:30568] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3f040)[0x7fd2b0f23040]<br>
[gyller:30568] [ 1] /usr/local/openmpi_4.1.1_gcc/lib/libopen-pal.so.40(opal_hwloc201_hwloc_bitmap_free+0x9)[0x7fd2b136e2c9]<br>
[gyller:30568] [ 2] /usr/local/openmpi_4.1.1_gcc/lib/libopen-pal.so.40(+0x8f70b)[0x7fd2b136470b]<br>
[gyller:30568] [ 3] /usr/local/openmpi_4.1.1_gcc/lib/libopen-pal.so.40(opal_hwloc_base_free_topology+0x79)[0x7fd2b13671b9]<br>
[gyller:30568] [ 4] /usr/local/openmpi_4.1.1_gcc/lib/libopen-pal.so.40(+0x8f5a0)[0x7fd2b13645a0]<br>
[gyller:30568] [ 5] /usr/local/openmpi_4.1.1_gcc/lib/libopen-pal.so.40(mca_base_framework_close+0x67)[0x7fd2b1339567]<br>
[gyller:30568] [ 6] /usr/local/openmpi_4.1.1_gcc/lib/libopen-pal.so.40(opal_finalize+0x83)[0x7fd2b130c113]<br>
[gyller:30568] [ 7] mpirun(+0xfbd)[0x561bed3d0fbd]<br>
[gyller:30568] [ 8] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fd2b0f05bf7]<br>
[gyller:30568] [ 9] mpirun(+0xd8a)[0x561bed3d0d8a]<br>
[gyller:30568] *** End of error message ***<br>
Segmentation fault (core dumped)<br>
Elapsed time (s) = 12.6550719738<br>
<br>
The segfault made me nervous. Investigating the environment settings, Simon <br>
Casey and I found that the environment variable LC_CTYPE was not set. <br>
Setting it with export LC_CTYPE="UTF-8" before running "startdifx" makes <br>
the problem go away.<br>
<br>
Another way to make the problem go away is to connect to my server with <br>
"ssh" or "ssh -X" instead of "ssh -Y". With these there are no segfault <br>
errors, even without setting LC_CTYPE. However, I need the -Y flag to get <br>
X forwarding working with my current OS X setup. Strictly speaking I don't <br>
need X forwarding to run DiFX (which makes it all the more puzzling that <br>
the flag has an impact), but I do need it later for e.g. fourfit and <br>
similar tools. So the problem is easy to work around.<br>
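To see exactly how the two kinds of login differ, one option is to redirect the output of "env" to a file from an "ssh -X" session and from an "ssh -Y" session, then compare the two files. The Python sketch below is not part of DiFX or OpenMPI; its function and file names are hypothetical, and it assumes plain "env" output with one NAME=value per line (values containing newlines, such as exported shell functions, are not handled). It lists variables present in only one session or with different values, plus the difference in total environment size, counting each entry as NAME=value followed by a terminating NUL byte.<br>

```python
import sys


def parse_env_dump(text):
    """Parse `env` output (one NAME=value per line) into a dict.

    Assumption: no value contains a newline. Lines without '=' are skipped.
    """
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            env[name] = value
    return env


def env_bytes(env):
    """Size of the environment as C strings: NAME + '=' + value + NUL."""
    return sum(len(k.encode()) + len(v.encode()) + 2 for k, v in env.items())


def compare_envs(a, b):
    """Return (only_in_a, only_in_b, changed, size_b_minus_a)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return only_a, only_b, changed, env_bytes(b) - env_bytes(a)


# Command-line use: python3 compare_env.py FILE_A FILE_B (hypothetical names)
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        env_a = parse_env_dump(fa.read())
        env_b = parse_env_dump(fb.read())
    only_a, only_b, changed, delta = compare_envs(env_a, env_b)
    print("only in", sys.argv[1] + ":", " ".join(only_a))
    print("only in", sys.argv[2] + ":", " ".join(only_b))
    print("different values:", " ".join(changed))
    print("size difference (bytes):", delta)
```

Example, with hypothetical file names: python3 compare_env.py env_X.txt env_Y.txt. Even a small size difference can move other data around in memory, which would be consistent with a latent bug only showing up in one kind of session.<br>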
<br>
I'm not sure what to make of this, but the error (when using ssh -Y without <br>
setting LC_CTYPE) appears benign as far as the geodetic results go. <br>
Hopefully this saves someone else the same investigation if the segfault <br>
makes them nervous :).<br>
<br>
Kind regards<br>
Eskil and Simon in Onsala<br>
<br>
_______________________________________________<br>
Difx-users mailing list<br>
<a href="mailto:Difx-users@listmgr.nrao.edu" target="_blank">Difx-users@listmgr.nrao.edu</a><br>
<a href="https://listmgr.nrao.edu/mailman/listinfo/difx-users" rel="noreferrer" target="_blank">https://listmgr.nrao.edu/mailman/listinfo/difx-users</a><br>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr">Greg Lindahl<br>Software Architect, Event Horizon Telescope<br>Smithsonian Astrophysical Observatory<br>60 Garden Street | MS 66 | Cambridge, MA 02138<br></div></div>