[Difx-users] WARNING Could not open monitoring socket

Chris.Phillips at csiro.au Chris.Phillips at csiro.au
Thu Apr 21 21:25:35 EDT 2016


Hi Stuart

I assume the message was actually:

"Could not open command monitoring socket! Aborting message receive thread."

You really need to send the full output for us to have any chance of diagnosing this. When you say “nothing more” in errormon2, does ANYTHING appear there?

The message receive thread is not important in most circumstances. However, if DiFX messages are not working, you will not get any logging messages.

Which processes give this message and on which machines are they running?

Unfortunately, the difxmessage library does not report the error; it just returns if there was one.

I would suggest making a temporary change to difxmessage/multicast.c and recompiling it and mpifxcorr.

Add some calls to perror before all the error returns in the routine openMultiCastSocket. E.g.


        /* Make UDP socket */
        sock = socket(AF_INET, SOCK_DGRAM, 0);
        if(sock < 0)
        {
                /* perror appends ": " and the errno text itself */
                perror("Trying to create socket");
                return -1;
        }

        /* Allow reuse of port */
        v = setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));
        if(v < 0)
        {
                perror("Setsockopt");
                return -2;
        }

        /* bind to receive address */
        v = bind(sock, (struct sockaddr *)&addr, sizeof(struct sockaddr_in));
        if(v < 0)
        {
                perror("Binding to socket");
                return -3;
        }

        v = inet_aton(group, &mreq.imr_multiaddr);
        if(!v)
        {
                /* inet_aton does not set errno, so print the address instead of using perror */
                fprintf(stderr, "inet_aton: invalid address '%s'\n", group);
                return -4;
        }

I am pretty sure the problem is not the choice of multicast address: if the code cannot connect to the multicast group, it should give a major warning.

Just to double check - do you see the following message:

 Unicast (XXXXX) difxMessage in use. Some functionallity may be reduced

If you do, that's the problem.

Cheers
Chris

On 22 Apr 2016, at 11:02 AM, Stuart Weston <nzobservers at gmail.com> wrote:

I have two servers, they both have 2 x CPU ( 6 cores, hyperthreaded). So potentially I have 24 cores and 48 threads.

mpirun starts mpifxcorr on both servers, but we get the “WARNING Could not open monitoring socket ! Aborting message receive thread” on the master ? The processes seem to sit there and do nothing, nothing more in errmon2.

If I change the machines file I can run the same correlation on each server individually to completion, so DiFX has to be good.

ww-flexbuf-01:/raid0/etransfer/hw04# cat machines
ww-flexbuf-01
wark167
ww-flexbuf-01:/raid0/etransfer/hw04# cat threads
NUMBER OF CORES:    6
2
2
2
2
2
2

Note that on our network we have been asked to use a different multicast address, so in DIFXHOME/setup.bash I have set:

DIFX_MESSAGE_GROUP=239.253.253.90
DIFX_BINARY_GROUP=239.253.253.90



Any ideas ?
