[daip] NaN issues with Arch Linux 4.12.12-1-ARCH (kernel 4.12.12) and Debian 9
Eric Greisen
egreisen at nrao.edu
Thu Oct 12 10:41:17 EDT 2017
On 10/11/2017 02:47 AM, Jay Blanchard wrote:
> Hi Eric, all,
> We've observed similar issues to your front page warning about Fedora 25,
> 26 and Ubuntu 17.04 with task 'IMAGR' running in Arch Linux (debian
> based) and Debian 9.
>
> Three computers have shown this issue with exactly the same symptoms,
> all running kernels 4.12.12 or 4.12.13
>
> Along with the NaN we also get the following feedback in the aips window
> (not msgserv):
>
> ZMSFI2: REQUEST FOR BYTES 1 THRU = 0
> ZMSFI2: BEYOND EOF = 0
> ZMSGDK: OPER = READ LUN = 12 NREC = 1
> ZMSGER: SYSTEM I/O ERROR CODE = -1
> MSGWRT ERROR 3 AT OPEN
> IMAGR3: ZFI2: DELAY 1
> ZMSFI2: REQUEST FOR BYTES 1 THRU = 0
> ZMSFI2: BEYOND EOF = 0
> ZMSGDK: OPER = READ LUN = 12 NREC = 1
> ZMSGER: SYSTEM I/O ERROR CODE = -1
> MSGWRT ERROR 3 AT OPEN
> IMAGR3: ZABORS: signal 11 received
> ZMSFI2: REQUEST FOR BYTES 1 THRU = 0
> ZMSFI2: BEYOND EOF = 0
> ZMSGDK: OPER = READ LUN = 12 NREC = 1
> ZMSGER: SYSTEM I/O ERROR CODE = -1
> MSGWRT ERROR 3 AT OPEN
> IMAGR3: ABORT!
> AIPS 3: TASK QUIT WITHOUT RESUMING ME
>
> In MSGSERV the feedback is:
> LOCALH> IMAGR3: GRDFLT: convolution function sampled every 1/100 of a cell
> LOCALH> IMAGR3: GRDMEM: Ave 8 Channels; 4.934740E+09 to 5.046240E+09 Hz
> LOCALH> IMAGR3: Field 1 Sum of gridding weights = 7.48882E+03
> LOCALH> IMAGR3: Field 1 Beam min = NaN Jy, max = NaN Jy
> LOCALH> IMAGR3: FITBM: SOLUTION FOR RESTORING BEAM FAILED
> LOCALH> IMAGR3: Fit Gaussian FWHM = 2.719 x 2.719 Milliarcsec, PA= 45.0
> LOCALH> IMAGR3: CLBHIS: minimum component 0.500 of current peak
> LOCALH> IMAGR3: Field 1 min = -448.7 MilliJy,max = 1.5 Jy
> LOCALH> IMAGR3: Loading field 1 from -4.49E-01 to 1.52E+00 interp by 4
>
> I compiled AIPS from source using gcc and gfortran and no longer see
> these issues.
> However I now see the following message occasionally:
>
> "Note: The following floating-point exceptions are signalling:
> IEEE_UNDERFLOW_FLAG IEEE_DENORMAL"
>
> This may or may not be related to the above problem. It could also be me
> missing a compiler flag/option.
> It doesn't seem to be affecting anything noticeably so far...
>
> I apologise if this is the wrong place for this kind of feedback but
> wasn't sure where else to put it.
> Regards!
> Jay
No - you have sent this message where it should go.
I am hoping that a more modern Intel compiler will solve the NaN issue
once I can get the IT folks to install one for me (they are dragging
their feet). I have no idea what the cause could be: running the same job
twice produces exactly the same NaNs, but each job with different data
produces different NaNs. If the more modern compiler does not work, then
I too will switch to gcc/gfortran 6.3 or so for the binary installations.
The IEEE messages are normal. I don't really understand the DENORMAL
one, but underflow just means that some computation came out at
10**(-60) or so and was changed to 0.0, which is of course what one wants.
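For readers curious about the two flags, here is a minimal sketch of the underlying IEEE 754 behaviour. It uses Python double-precision floats rather than anything from AIPS or gfortran, so the thresholds shown are those of doubles only, and the helper name is_subnormal is made up for this illustration. A result too small to represent is flushed to 0.0 (underflow), while a result just below the smallest normal number is kept as a reduced-precision "denormal" (subnormal) value.

```python
import math
import sys


def is_subnormal(x: float) -> bool:
    """True if x is a nonzero double smaller in magnitude than the smallest normal double."""
    return x != 0.0 and math.isfinite(x) and abs(x) < sys.float_info.min


# Smallest normal double, about 2.2e-308.
smallest_normal = sys.float_info.min

# Halving it gives a subnormal (denormal) value: still nonzero, but with fewer significant bits.
half = smallest_normal / 2.0

# The exact product, 1e-400, is far below the smallest subnormal, so the result underflows to 0.0.
product = 1e-200 * 1e-200

print(is_subnormal(half))  # True
print(product == 0.0)      # True
```

Neither case raises an error here; as far as I know, gfortran simply reports at program exit that the corresponding flags were set at some point during the run.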
The MSGWRT errors indicate that the message file for the user has been
damaged and is 0 bytes long. It is in the first data area with name
MSDuuu000.uuu\; where uuu is the user number in extended hex (base 36).
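As a rough sketch of how one might work out the file name for a given user number, the Python below converts the number to three base-36 digits and fills in the MSDuuu000.uuu; pattern. It rests on several assumptions not stated above: that the digits are 0-9 followed by upper-case A-Z, that the value is zero-padded to three characters, and that the backslash in the name is only a shell escape for the trailing semicolon. The function names are invented for this example.

```python
import os
import string

# Assumed digit alphabet for "extended hex": 0-9 then A-Z (36 symbols).
DIGITS = string.digits + string.ascii_uppercase


def to_base36(number: int, width: int = 3) -> str:
    """Convert a non-negative integer to base 36, zero-padded to `width` characters."""
    if number < 0:
        raise ValueError("user number must be non-negative")
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 36)
        digits = DIGITS[remainder] + digits
    return digits.rjust(width, "0")


def message_file_name(user_number: int) -> str:
    """Build the assumed message file name MSDuuu000.uuu; for a user number."""
    uuu = to_base36(user_number)
    return f"MSD{uuu}000.{uuu};"


def is_empty_file(path: str) -> bool:
    """True if `path` is an existing regular file that is 0 bytes long."""
    return os.path.isfile(path) and os.path.getsize(path) == 0
```

Under these assumptions, user number 100 gives MSD02S000.02S; and joining that name onto the path of the first data area (site-specific, so not shown) and passing it to is_empty_file would confirm whether the file really is 0 bytes long.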
Eric Greisen