[daip] ZSTRTA.EXE

Carlos De Breuck debreuck at iap.fr
Thu Mar 7 11:30:37 EST 2002


Hi Eric,

Thanks for your suggestions. You seem to have narrowed down the problem,
but I haven't been able to solve it yet.
I tried unlocking the SPD000000; file as you suggested, without success.
Next, I tried to rebuild the whole directory for one machine (called
giscours, but the problem is the same for all of them), using SYSETUP.
This gets me a bit further, but gives the following errors (using notv):

TVDEVS.SH: Starting TPMON daemons on GISCOURS asynchronously...
Starting up 31DEC00 AIPS with normal priority
ZDCHI1: ZERROR: ON FILE DA00:SPD000000;
ZDCHI1: ZERROR: IN ZDAOPN ERRNO = 13 (Permission denied)
 ZDCHIN: COULD NOT READ PARAMETER FILE
 ZDCHIN: (USING MINIMUM SYSTEM CONFIGURATION)
Begin the one true AIPS number 1 (release of 31DEC00) at priority =   0 
DADEVS.PL: This program is untested under Perl version 5.006
AIPS 1: ZERROR: ON FILE DA00:SPD000000;
AIPS 1: ZERROR: IN ZDAOPN ERRNO = 13 (Permission denied)
 ZDCHIN: COULD NOT READ PARAMETER FILE
 ZDCHIN: (USING MINIMUM SYSTEM CONFIGURATION)
AIPS 1: You are NOT assigned a TV device or server
AIPS 1: You are NOT assigned a graphics device or server
AIPS 1: Enter user ID number
?134

(and then it hangs on process AIPS1)

This looks like a simple permission problem, so I made all the files in
$NET0/GISCOURS writable by all users:

-rw-rw-rw-   1 aips     aips      102400 Mar  7 15:45 ACD000000;
lrwxrwxrwx   1 aips     aips          44 Mar  7 15:45 GRD000000; -> /home/soft/osf/aips/DA00/TARIQUET/GRD000000;
-rw-rw-rw-   1 aips     aips      561152 Mar  7 15:45 MED000001;
-rw-rw-rw-   1 aips     aips      561152 Mar  7 15:45 MED000002;
-rw-rw-rw-   1 aips     aips      561152 Mar  7 15:45 MED000003;
-rw-rw-rw-   1 aips     aips      196608 Mar  7 15:45 MED000004;
-rw-rw-rw-   1 aips     aips      561152 Mar  7 15:45 MED000005;
-rw-rw-rw-   1 aips     aips      561152 Mar  7 15:45 MED000006;
-rw-rw-rw-   1 aips     aips      561152 Mar  7 15:45 MED000007;
-rw-rw-rw-   1 aips     aips      561152 Mar  7 15:45 MED000008;
lrwxrwxrwx   1 aips     aips          44 Mar  7 15:45 PWD000000; -> /home/soft/osf/aips/DA00/TARIQUET/PWD000000;
-rw-rw-rw-   1 aips     aips        1024 Mar  7 15:45 SPD000000;
-rw-rw-rw-   1 aips     aips      165888 Mar  7 15:45 TCD000001;
-rw-rw-rw-   1 aips     aips       62464 Mar  7 15:45 TDD000004;
-rw-rw-rw-   1 aips     aips        1024 Mar  7 15:45 TPD001001;
-rw-rw-rw-   1 aips     aips        1024 Mar  7 15:45 TPD001002;
-rw-rw-rw-   1 aips     aips        1024 Mar  7 15:45 TPD001003;
-rw-rw-rw-   1 aips     aips        1024 Mar  7 15:45 TPD001004;
-rw-rw-rw-   1 aips     aips        1024 Mar  7 15:45 TPD001005;
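
ERRNO 13 can also come from the target of a symbolic link, or from the
directory itself, rather than from a file's own mode bits, so it may be
worth checking every entry through its link. A minimal sketch, assuming a
POSIX sh; the function name check_da_files is made up for illustration and
is not part of AIPS:

```shell
# check_da_files DIR: hypothetical helper, not part of AIPS.
# Prints one line for every entry in DIR that is a dangling symbolic link
# or that the current user cannot both read and write. Ordinary links are
# followed, so the test applies to the link target.
check_da_files() {
    dir=$1
    for f in "$dir"/*; do
        # Skip the literal pattern left behind when the glob matches nothing.
        [ -e "$f" ] || [ -L "$f" ] || continue
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            echo "DANGLING $f"
        elif [ ! -r "$f" ] || [ ! -w "$f" ]; then
            echo "NOACCESS $f"
        fi
    done
}

# Example: check_da_files "$NET0/GISCOURS"
```

The names are quoted throughout because they end in a semicolon. The sketch
prints nothing when every entry is usable by the account that runs it, so it
is most informative when run from the account that actually starts AIPS.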

After this change, the errors disappear, but it hangs on ZSTRTA.EXE even
before it asks for the user number.

So, I moved on to your next option and edited $AIPGUNIX/ZSTRTA.FOR
to change
CALL ZDCHIN (.TRUE., IBUF)
to 
CALL ZDCHIN (.FALSE., IBUF)
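
An edit like this can be scripted so that the original source is kept for
restoring later. A rough sketch, assuming a POSIX sh and sed; the function
name patch_zdchin and the .orig backup suffix are illustrative choices, and
the COMLNK rebuild from the quoted message appears only as a comment because
it needs an AIPS environment:

```shell
# patch_zdchin FILE: hypothetical helper, not part of AIPS.
# Keeps a copy of FILE as FILE.orig, then rewrites the first argument of
# the ZDCHIN call from .TRUE. to .FALSE. so the disk read is skipped.
patch_zdchin() {
    src=$1
    cp -p "$src" "$src.orig" || return 1
    sed 's/CALL ZDCHIN (\.TRUE\./CALL ZDCHIN (.FALSE./' "$src.orig" > "$src"
}

# Example (inside an AIPS login, where $AIPGUNIX is defined):
#   patch_zdchin "$AIPGUNIX/ZSTRTA.FOR"
#   COMLNK $AIPGUNIX/ZSTRTA
```

Copying ZSTRTA.FOR.orig back over ZSTRTA.FOR and rebuilding undoes the
change once the debugging is done.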

This gets me a bit further. It brings up the Copyright 2002 message (which
is strange given that I'm using the 31DEC00 version). It gets to this:

...
TVDEVS.SH: Starting TPMON daemons on GISCOURS asynchronously...
Starting up 31DEC00 AIPS with normal priority
 ----------------------------------------------------------------------  
    Copyright (C) 2002                                                   
   Associated Universities, Inc. Washington DC, USA.                     
...
                          Charlottesville, VA 22903-2475 USA             
 ----------------------------------------------------------------------  
Begin the one true AIPS number 1 (release of 31DEC00) at priority =   0 
DADEVS.PL: This program is untested under Perl version 5.006
STARTPMON: [GISCOURS] Starting TPMON1 with output SUPPRESSED
STARTPMON: [GISCOURS] Starting TPMON2 with output SUPPRESSED

And then it hangs on AIPS1.

I then tried to see how far it got in ZSTRTA.EXE by adding some WRITE
statements. It gets almost to the end:
C                                       Activate AIPSx (x = MYPOPS).
      AIPSN = 'AIPS  '
      WRITE(*,*)'It gets to here...'
      CALL ZACTV8 (AIPSN, MYPOPS, VERSIN, PID, IERR)
      WRITE(*,*)'but not here...'
C                                       Never expect to get here, but
C                                       just in case.
      IF (IERR.EQ.0) GO TO 999
         WRITE (MSGTXT,1060) IERR
 990     CALL MSGWRT (8)
C

So, it seems to hang on the CALL ZACTV8.

I haven't been able to get further than that, not even when I run AIPS
from the aips account.

Can you give me some more suggestions?

Thanks,
Carlos


> We have had similar reports occasionally.  The most recent one seems
> to have stopped e-mailing me before telling me what if anything they
> found.
> 
> In their case, it appeared that some user had decided to "clean up" and
> stopped a variety of processes that appeared dead.  Perhaps he
> mistyped the process number or they were not really dead, but the
> computer came to a jarring stop.  They got it to go back to a login
> screen eventually by killing more processes from a remote machine.
> But when they logged in, they had what you describe.
> 
> My theory is that some one or more of the files in $NET0/<host-name>
> are thought by the operating system to be locked by another process,
> quite possibly over nfs (network file system).
> 
> $AIPGUNIX/ZSTRTA.FOR starts by calling ZDCHIN which opens a disk file
> in that area called  SPD000000; 
> 
> That is the most likely file to have this phantom lock.
> 
> But I made suggestions like
>        mv SPD000000\; SPD.old
>        cp SPD.old SPD000000\; 
> which should move the lock to SPD.old and leave SPD000000; unlocked
> and they said that did not work.
> 
> They mv'd the entire directory and recreated a new one with FILAIP and
> claimed that that did not work.
> 
> Of course, they rebooted the computer before all this. I asked about
> other computers that might have been accessing these files.
> 
> Of late I have not heard from them and I was not entirely certain that
> they had followed the instructions literally.
> 
> You can put print statements into 
> 
>      $AIPGUNIX/ZSTRTA.FOR
> 
> (not aips message calls which need ZDCHIN to have worked) ahead of and
> after the call to ZDCHIN and rebuild ZSTRTA with
> 
>      COMLNK $AIPGUNIX/ZSTRTA
> 
> You could also try changing the call to ZDCHIN to
> 
>       CALL ZDCHIN (.FALSE.)
> 
> (the 2nd argument is obsolete) which will skip the disk read.  You
> probably do not want to do this long term but to debug what is going
> on...
> 
> Let me know what happens,
> 
> Eric Greisen
> 
> 
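
For reference, the rename-and-copy step from the quoted message can be
wrapped in a small function. This is only a sketch, assuming a POSIX sh; the
name recopy_file is hypothetical, and it uses FILE.old as the backup name
instead of the SPD.old used in the quoted commands:

```shell
# recopy_file FILE: hypothetical helper, not part of AIPS.
# Renames FILE to FILE.old and copies it back, so that FILE becomes a
# newly created file while the original one lives on as FILE.old.
recopy_file() {
    f=$1
    mv "$f" "$f.old" || return 1
    cp -p "$f.old" "$f"
}

# Example: (cd "$NET0/GISCOURS" && recopy_file 'SPD000000;')
```

By the reasoning in the quoted message, a stale lock should stay with the
renamed file; whether that holds over NFS is not something this sketch can
confirm.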





