[daip] NaN issues with Arch Linux 4.12.12-1-ARCH (kernel 4.12.12) and Debian 9

Eric Greisen egreisen at nrao.edu
Thu Oct 12 10:41:17 EDT 2017


On 10/11/2017 02:47 AM, Jay Blanchard wrote:
> Hi Eric, all,
> We've observed similar issues to your front page warning about Fedora 25,
> 26 and Ubuntu 17.04 with task 'IMAGR' running in Arch Linux (debian 
> based) and Debian 9.
> 
> Three computers have shown this issue with exactly the same symptoms, 
> all running kernels 4.12.12 or 4.12.13
> 
> Along with the NaN we also get the following feedback in the aips window 
> (not msgserv):
> 
> ZMSFI2: REQUEST FOR BYTES 1 THRU = 0
> ZMSFI2: BEYOND EOF = 0
>   ZMSGDK: OPER = READ LUN = 12 NREC =        1
>   ZMSGER: SYSTEM I/O ERROR CODE =         -1
> MSGWRT ERROR      3 AT OPEN
> IMAGR3: ZFI2: DELAY 1
> ZMSFI2: REQUEST FOR BYTES 1 THRU = 0
> ZMSFI2: BEYOND EOF = 0
>   ZMSGDK: OPER = READ LUN = 12 NREC =        1
>   ZMSGER: SYSTEM I/O ERROR CODE =         -1
> MSGWRT ERROR      3 AT OPEN
> IMAGR3: ZABORS: signal 11 received
> ZMSFI2: REQUEST FOR BYTES 1 THRU = 0
> ZMSFI2: BEYOND EOF = 0
>   ZMSGDK: OPER = READ LUN = 12 NREC =        1
>   ZMSGER: SYSTEM I/O ERROR CODE =         -1
> MSGWRT ERROR      3 AT OPEN
> IMAGR3: ABORT!
> AIPS 3: TASK QUIT WITHOUT RESUMING ME
> 
> In MSGSERV the feedback is:
> LOCALH> IMAGR3: GRDFLT: convolution function sampled every 1/100 of a cell
> LOCALH> IMAGR3: GRDMEM: Ave    8 Channels; 4.934740E+09 to 5.046240E+09 Hz
> LOCALH> IMAGR3: Field    1 Sum of gridding weights =  7.48882E+03
> LOCALH> IMAGR3: Field    1 Beam min =    NaN      Jy, max =    NaN      Jy
> LOCALH> IMAGR3: FITBM: SOLUTION FOR RESTORING BEAM FAILED
> LOCALH> IMAGR3: Fit Gaussian FWHM =    2.719 x    2.719 Milliarcsec, PA=   45.0
> LOCALH> IMAGR3: CLBHIS: minimum component 0.500 of current peak
> LOCALH> IMAGR3: Field    1 min = -448.7 MilliJy,max =    1.5      Jy
> LOCALH> IMAGR3: Loading field    1 from -4.49E-01 to  1.52E+00 interp by 4
> 
> I compiled AIPS from source using gcc and gfortran and no longer see 
> these issues.
> However I now see the following message occasionally:
> 
> "Note: The following floating-point exceptions are signalling: 
> IEEE_UNDERFLOW_FLAG IEEE_DENORMAL"
> 
> This may or may not be related to the above problem. It could also be me 
> missing a compiler flag/option.
> It doesn't seem to be affecting anything noticeably so far...
> 
> I apologise if this is the wrong place for this kind of feedback but 
> wasn't sure where else to put it.
> Regards!
> Jay

No - you have sent this message where it should go.

I am hoping that a more modern Intel compiler will solve the NaN issue 
when I can get the IT folks to install one for me (they are dragging 
their feet).  I have no idea what the cause could be: running the same 
job twice produces exactly the same NaNs, but each job with different 
data produces different NaNs.  If the more modern compiler does not work 
then I too will switch to gcc/gfortran 6.3 or so for the binary 
installations.

The IEEE messages are normal.  I don't really understand the DENORMAL 
one, but underflow just means that some computation came out as 
10**(-60) or so and was changed to 0.0, which is of course what one wants.
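As a rough illustration of the two flags, here is a minimal Python sketch.  It is not AIPS code: Python floats are IEEE 754 doubles, so the thresholds are those of double precision and will differ for single-precision Fortran variables, and the helper name is_denormal is made up for this example.  A denormal (subnormal) value is a nonzero number smaller in magnitude than the smallest normal number, kept with reduced precision; an underflow to zero happens when the result is too small even for that.

```python
import sys

# Smallest positive *normal* double, about 2.2e-308 (assumption: double
# precision; single precision has a much larger threshold).
SMALLEST_NORMAL = sys.float_info.min


def is_denormal(x):
    """Return True if x is nonzero but below the smallest normal double."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL


# Dividing the smallest normal number gives a denormal: still nonzero,
# but stored with fewer significant bits.
tiny = SMALLEST_NORMAL / 4.0

# The exact product would be 1e-400, far below anything representable,
# so the result underflows and is replaced by 0.0.
flushed = 1e-200 * 1e-200

print("tiny is denormal:", is_denormal(tiny))
print("underflowed product:", flushed)
```

Neither case raises an error; the computation simply carries on with the very small value or with zero.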

The MSGWRT errors indicate that the message file for the user has been 
damaged and is 0 bytes long.  It is in the first data area with name 
MSDuuu000.uuu\; where uuu is the user number in extended hex (base 36).
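To work out which file that is for a given user number, something like the following Python sketch may help.  It assumes that extended hex uses the digits 0-9 followed by upper-case A-Z, that the value is zero-padded to three characters, and that the trailing semicolon is part of the file name (the backslash above being read as shell escaping).  The function names are invented for this example and are not part of AIPS.

```python
# Digit set assumed for "extended hex" (base 36): 0-9 then A-Z.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_extended_hex(number, width=3):
    """Convert a non-negative integer to zero-padded base 36."""
    if number < 0 or number >= 36 ** width:
        raise ValueError("number does not fit in the requested width")
    text = ""
    for _ in range(width):
        number, remainder = divmod(number, 36)
        text = DIGITS[remainder] + text
    return text


def message_file_name(user_number):
    """Build the assumed message file name for an AIPS user number."""
    uuu = to_extended_hex(user_number)
    return "MSD" + uuu + "000." + uuu + ";"


print(message_file_name(100))
```

When listing or removing the file from a shell, the semicolon needs quoting or a backslash in front of it.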

Eric Greisen


