[daip] aips hangs

Eric Greisen egreisen at nrao.edu
Wed Aug 18 15:34:23 EDT 2010


Ulrich Hiller wrote:
>   Hello,
> I have a problem with a hanging aips. The messages are:
> :~> /aips/START_AIPS tv=local
> START_AIPS: Will use or start first available Unix Socket based TV
> 
> You have a choice of 3 printers.  These are:
> 
>      No. [ type  ] Description
> -------------------------------------------------------------
>       1. [     PS] laps w
>       2. [     PS] laps i
>       3. [     PS] laps e
> -------------------------------------------------------------
> 
> START_AIPS: Enter your choice, or the word QUIT [default is 1]:
> START_AIPS: Your initial AIPS printer is the laps w
> START_AIPS:  - system name laps_w, AIPS type PS
> 
> START_AIPS: User data area assignments:
> DADEVS.PL: This program is untested under Perl version 5.010
>    (Using global default file /rw/aips/DA00/DADEVS.LIST for DADEVS.PL)
>     Disk 1 (1) is /home/aips/DATA/AIDA47_1
>     Disk 2 (2) is /disk1/aips/DATA/AIDA47_2
>     Disk 3 (3) is /disk2/aips/DATA/AIDA47_3
>     Disk 4 (4) is /disk3/aips/DATA/AIDA47_4
>     Disk 5 (5) is /disk4/aips/DATA/AIDA47_5
>     Disk 6 (6) is /disk5/aips/DATA/AIDA47_6
>     Disk 7 (7) is /disk6/aips/DATA/AIDA47_7
> 
> Tape assignments:
>     Tape 1 is REMOTE
>     Tape 2 is REMOTE
> 
> START_AIPS: Starting TV servers on aida47 asynchronously
> START_AIPS:  - WITH Unix Sockets as requested...
> START_AIPS: Starting TPMON daemons on AIDA47 asynchronously...
> Starting up 31DEC05 AIPS with normal priority
> Begin the one true AIPS number 1 (release of 31DEC05) at priority =   0
> AIPS 1: You are not on a local TV device, welcome stranger
> AIPS 1: You are assigned TV device/server  25
> AIPS 1: You are assigned graphics device/server  25
> AIPS 1: Enter user ID number
> ?DADEVS.PL: This program is untested under Perl version 5.010
> UNIXSERVERS: TVSRV1 is already running on host aida47, user linadm
> UNIXSERVERS: Start XAS1 on aida47, DISPLAY localhost:10.0
> XAS: ** TrueColor FOUND!!!
> XAS: Cannot use shared memory on remote XAS link
> XAS: !!! Shared memory not selected !!!
> XAS: Using screen width height 1270 924, max grey level 255
> UNIXSERVERS: Start graphics server TKSRV1 on aida47, display localhost:10.0
> UNIXSERVERS: Start message server MSSRV1 on aida47, display localhost:10.0
> STARTPMON: [AIDA47] Starting TPMON1 with output SUPPRESSED
> 
> AIPS 1: Enter user ID number
> ?1000
> 
> Then AIPS hangs. The X-Aips-tv-screen, the aips-teksrv-window and the
> aips-msgsrv-window came up. There are no error messages.
> The computer is freshly booted (it did not work before the reboot either),
> and /tmp has been cleaned.
> The system is openSUSE 11.2 x86_64.
> 
> This also happens with newer AIPS versions.
> 
> How can I debug this to find out what the problem is?
> I do not know whether it is a clue, but the AIPS disks are on an
> nfs-mounted disk.

I am trying to figure out what you are really doing.  If I read this
correctly, you are sitting in front of computer XXX with a window open
on aida47.  In that window, you are running an ancient version of
AIPS.  Having read the user number (and I am assuming it hangs on all
user numbers??), it then needs to create a message file on AIDA47_1
and user catalogs on AIDA47_n for all n.  It seems to be hanging there,
because the copyright messages would come out next.  This implies to me
that the daemons needed for file locking over nfs are probably not
working, if the disks are not on AIDA47.  Note that we find that setup
to be a very bad idea: nfs reads, and especially writes, are very slow
compared to doing things on a local disk.
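One way to test the file-locking hypothesis directly, outside AIPS, is to try taking a lock in each data area yourself. The script below is a hypothetical sketch, not part of AIPS; it assumes POSIX fcntl-style record locks, which over NFSv3 on Linux are typically served by the kernel lockd plus rpc.statd. AIPS's own locking calls may differ. The attempt runs in a daemon thread with a timeout, because with a missing lock daemon the call can block instead of failing.

```python
import fcntl
import os
import sys
import tempfile
import threading


def probe_lock(directory: str, timeout: float = 10.0) -> str:
    """Create a scratch file in `directory`, take and release an exclusive
    fcntl lock on it, then delete it.

    Returns "ok", "lock refused: <reason>", "cannot create file: <reason>",
    or "timeout" if the lock call did not return within `timeout` seconds.
    """
    result = {}

    def worker() -> None:
        try:
            fd, path = tempfile.mkstemp(prefix="lockprobe.", dir=directory)
        except OSError as exc:
            result["status"] = f"cannot create file: {exc.strerror}"
            return
        try:
            # Non-blocking exclusive lock; mkstemp opens read/write, as required.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            result["status"] = "ok"
        except OSError as exc:
            result["status"] = f"lock refused: {exc.strerror}"
        finally:
            os.close(fd)
            os.unlink(path)

    # Daemon thread: if lockf hangs on a broken NFS lock service,
    # the script can still report and exit.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return "timeout"
    return result.get("status", "error")


if __name__ == "__main__":
    for d in sys.argv[1:]:
        print(f"{d}: {probe_lock(d)}")
```

Run it with the data-area directories from the DADEVS listing, e.g. /home/aips/DATA/AIDA47_1 /disk1/aips/DATA/AIDA47_2 and so on. A "timeout" or "lock refused" on the nfs-mounted areas would be consistent with the hang described, though it does not prove that AIPS is blocked at the same call.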

In our setup in Socorro, we may have the central data areas for our
many machines on a central file server.  But the
$AIPS_ROOT/DA00/<hostname> directories are actually symbolic links to a
directory on <hostname>.  Similarly, the data areas for <hostname> are
on <hostname> even if they are reached via a symbolic link.  This whole
business of file locking is very important in a multi-process
environment, but it requires some mysterious daemon processes to be
running.  We have found here that I could read files with locking on
most machines but not on some.  When those few were rebooted, I could
then read the files on them (PRTAC has the option to read all accounting
files in the LAN).
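To check whether a given DA00 or data-area path, after following its symbolic links, ends up on a local disk or on nfs, you can look up the mount that contains it. The following is a hypothetical, Linux-only sketch: it assumes /proc/mounts is available and simply reports the filesystem type of the longest matching mount point. It does not read any AIPS configuration.

```python
from __future__ import annotations

import os
import sys


def parse_mounts(text: str) -> list[tuple[str, str]]:
    """Parse /proc/mounts-style text into (mount_point, fstype) pairs."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # /proc/mounts escapes spaces in mount points as \040.
            mounts.append((fields[1].replace("\\040", " "), fields[2]))
    return mounts


def fstype_for(path: str, mounts: list[tuple[str, str]]) -> str:
    """Return the fstype of the longest mount point that contains `path`."""
    best, best_type = "", "unknown"
    for point, fstype in mounts:
        prefix = point.rstrip("/") + "/"
        if (path == point or path.startswith(prefix)) and len(point) > len(best):
            best, best_type = point, fstype
    return best_type


if __name__ == "__main__" and len(sys.argv) > 1:
    with open("/proc/mounts") as f:
        mounts = parse_mounts(f.read())
    for arg in sys.argv[1:]:
        real = os.path.realpath(arg)  # follow symbolic links first
        kind = fstype_for(real, mounts)
        note = "  <- network filesystem" if kind.startswith("nfs") else ""
        print(f"{arg} -> {real} [{kind}]{note}")
```

Passing $AIPS_ROOT/DA00/<hostname> and each AIDA47_n data area shows where the files really live; in a layout like the Socorro one described above, the paths for a host should resolve to that host's local disks.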

I seem to remember very similar questions from someone else with 
machines named AIDAnn - perhaps you should ask around locally.

Furthermore, 31DEC05 is rather old.  We are proud of what we have 
accomplished since...

Eric Greisen