[daip] MNJ problem with SUL on NFS

Shami Chatterjee schatter at aoc.nrao.edu
Wed Oct 29 15:43:24 EST 2003


Hi,
This is most likely a problem on the Cornell end - is there an easy 
diagnosis for why NFS copying would keep going into hibernation?
(I'm pretty sure it has nothing to do with LIBR.DAT mentioned below...)

Specifically, our MNJ runs on Mingus (Linux) by CVS; this is fine. Then 
Ella (SUL) runs an MNJ off Mingus by NFS; this suddenly stopped working 
with the error message listed below.

Thanks,
Shami

---------- Forwarded message ----------
Date: Wed, 29 Oct 2003 11:07:06 -0500 (EST)
From: Kristine Spekkens <spekkens at astro.cornell.edu>

We are having some problems with AIPS on ella; in particular, the mnj is 
no longer running on ella at all. I checked the crontab, and the mnj 
command is still there:
AIPS% crontab -l
# Restore for 31Dec03 only
15 5 * * 0,1,3,5  /home/space1/aips/do_daily.ella


 The last mnj run on ella was on Oct. 20; everything seemed okay. We then 
got an email from Eric:

Date: Tue, 21 Oct 2003 08:59:17 -0600
From: Eric Greisen <egreisen at nrao.edu>
To: mnj at nrao.edu
Subject: [Mnj] Warning from last night

The MNJ last night suggests that you have extra work.  In fact, unless
you have a Mac, there is nothing to do.  Even in the Mac case, you
probably do not need to worry - the LIBR.DAT in $SYSLOCAL uses termcap
while that now in $SYSMAC uses ncurses.

 While we hadn't gotten an mnj warning the previous night, we did get 
something the next morning:

Date: Wed, 22 Oct 2003 03:15:18 -0400
From: AIPS Account <aips at astro.cornell.edu>
To: aips at mingus.astro.cornell.edu
Subject: 31DEC03 LINUX needs extra work

The UPDATE job has detected a LIBR.DAT change for
version 31DEC03 on mingus at Wed Oct 22 07:15:18 UTC 2003 (UT).
It MAY be that it applies to architecture LINUX;
(Sorry, this dumb script can't tell for sure)
if so, you  should copy /home/space1/aips//31DEC03/LINUX/SYSTEM/LIBR.DAT
to your /home/space1/aips//31DEC03/LINUX/SYSTEM/CORNELL area.


 I ignored it because of Eric's email the day before. It turns out that
after this, the mnj for mingus kept going, but one has not been done on
ella since. I tried to run one this morning on ella, but here is what I
got:
AIPS% ./do_daily.ella
AIPSUPD - RunTime is 20031029.150142
AIPSUPD - LogFile is UPD20031029.150142.LOG
AIPSUPD - ErrorFile is UPD20031029.150142.ERR
AIPSUPD - cd /home/space1/aips//31DEC03/SUL/UPDATE
AIPSUPD - NOTE: if NFS copying fails, will NOT use SECURE SHELL.
AIPSUPD - calling UPDCONTROL
UPDCONTROL - trying to get LASTGOOD.DAT
UPDCONTROL - sleeping 30 minutes...
UPDCONTROL - trying to get LASTGOOD.DAT
UPDCONTROL - sleeping 30 minutes...
^CAIPS%

... and it just sits there ad infinitum until I kill it.  I'm not sure what 
to do:

 - Could my failure to move LIBR.DAT have resulted in ella being cut off 
from the midnight job (which relies on mingus, right)?
 - Should I move LIBR.DAT to
/home/space1/aips//31DEC03/LINUX/SYSTEM/CORNELL now, as suggested a week
ago? Is there anything else that I have to do to the AIPS installation on
mingus before trying to get the mnj on ella going again?

 Any suggestions/advice would be much appreciated!

Kristine














