[daip] ZSTRTA.EXE

Eric Greisen egreisen at cv3.cv.nrao.edu
Wed Mar 6 18:44:08 EST 2002


Carlos De Breuck writes:

 > 8 months ago, I installed AIPS on our local cluster of DEC ALPHA machines.
 > It all worked fine until recently.
 > It starts up the TEK and MSGservers, and the TPMON daemons, and then
 > hangs on executing $AIPSROOT/31DEC00/ALPHA/LOAD/ZSTRTA.EXE
 > 
 > This problem also occurs when I start up AIPS with the notv option.
 > It happens before I can even enter in a user number.
 > There are no error messages, so I have no indication of where the problem
 > occurs. It just doesn't get past the ZSTRTA.EXE
 > 
 > The strange thing is that I haven't changed anything on the installation
 > here (but I'm only the AIPS manager, not the system manager). Could you
 > give me some hints on where to start looking to solve this problem?
 > 

We have had similar reports occasionally.  The most recent
correspondent stopped e-mailing me before telling me what, if
anything, they found.

In their case, it appeared that some user had decided to "clean up"
and stopped a variety of processes that appeared dead.  Perhaps he
mistyped the process number or they were not really dead, but the
computer came to a jarring stop.  They got it to go back to a login
screen eventually by killing more processes from a remote machine.
But when they logged in, they had what you describe.

My theory is that one or more of the files in $NET0/<host-name>
are thought by the operating system to be locked by another process,
quite possibly over NFS (network file system).
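One way to look for such a holder on the local machine is sketched
below.  This is only an illustration: list_open_files is a made-up
helper name, it assumes the fuser utility is installed, and fuser only
sees processes on the host where it runs, so a lock held by an NFS
client on another computer would not show up.

```shell
# Hypothetical helper: report files in a directory that a local process has open.
# Assumes fuser is available; it cannot see holders on other NFS clients.
list_open_files() {
    dir=$1
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (including an unmatched glob)
        [ -f "$f" ] || continue
        # fuser writes the PIDs to stdout and the file name to stderr
        pids=$(fuser "$f" 2>/dev/null)
        if [ -n "$pids" ]; then
            echo "$f: $pids"
        fi
    done
}
```

It would be called as list_open_files "$NET0/<host-name>" with the
real host name filled in.  No output means that no local process was
found holding any of the files.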

$AIPGUNIX/ZSTRTA.FOR starts by calling ZDCHIN, which opens a disk file
in that area called SPD000000; (the trailing semicolon is part of the
name).

That is the most likely file to have this phantom lock.

I suggested something like
       mv SPD000000\; SPD.old
       cp SPD.old SPD000000\;
which should move the lock to SPD.old and leave SPD000000; unlocked,
but they said that did not work.
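The same move-then-copy idea can be wrapped in a small shell function
so that the semicolon in the file name is always quoted.  This is a
sketch only: refresh_file is a made-up name, the backup gets a .old
suffix on the full name rather than being called SPD.old, and the
expectation that a stale lock stays with the old file is the theory
above, not something confirmed.

```shell
# Hypothetical helper: replace a file with a fresh copy of itself.
refresh_file() {
    f=$1
    # Rename the original; in theory any stale lock stays with this old file
    mv "$f" "$f.old" || return 1
    # Copy it back under the original name (new file, same contents and mode)
    cp -p "$f.old" "$f" || return 1
    # Succeed only if the copy is byte-for-byte identical to the original
    cmp -s "$f.old" "$f"
}
```

For example, run in the $NET0/<host-name> directory:

       refresh_file "SPD000000;"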

They also moved the entire directory aside, created a new one with
FILAIP, and reported that this did not work either.

Of course, they rebooted the computer before all this. I asked about
other computers that might have been accessing these files.

Of late I have not heard from them, and I was never entirely certain
that they had followed the instructions literally.

You can put print statements into 

     $AIPGUNIX/ZSTRTA.FOR

(not AIPS message calls, which need ZDCHIN to have worked) ahead of and
after the call to ZDCHIN, and rebuild ZSTRTA with

     COMLNK $AIPGUNIX/ZSTRTA

You could also try changing the call to ZDCHIN to

      CALL ZDCHIN (.FALSE.)

(the 2nd argument is obsolete), which will skip the disk read.  You
probably do not want to do this long term, but it may help to debug
what is going on.

Let me know what happens,

Eric Greisen



