[Difx-users] FxManager: Error in launching writethread!!

Richard Dodson richard.dodson at uwa.edu.au
Thu May 25 07:34:55 EDT 2017


Does

% ulimit -n 4096

 help with this?
  R
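
A note on which limit is which, as a rough sketch rather than a tested recipe: `ulimit -n` sets the open-file-descriptor limit, while the per-user process limit discussed in the quoted messages below is `ulimit -u` (on Linux, threads count towards it too). The function names here are made up for illustration and are not part of DiFX. The sketch assumes bash on Linux with procps (`ps`, `pgrep`) and that the logging processes are named exactly `difxlog`.

```shell
#!/usr/bin/env bash
# Sketch only: compare this user's process/thread usage with the shell's limits.
# Assumes bash on Linux; the function names are illustrative, not part of DiFX.

# Soft per-user limit on processes; on Linux threads count towards it too.
# Prints a number or "unlimited".
nproc_limit() { ulimit -S -u; }

# Soft limit on open file descriptors, the value that `ulimit -n` changes.
nofile_limit() { ulimit -S -n; }

# Threads currently owned by this user (ps -L prints one line per thread).
user_threads() { ps -L -u "$(id -un)" -o lwp= | wc -l; }

# Running processes named exactly "difxlog" owned by this user.
# pgrep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
difxlog_count() { pgrep -c -x -u "$(id -un)" difxlog || true; }

# Print everything on one line for a quick look while a correlation is running.
report() {
    echo "nproc limit: $(nproc_limit)  threads in use: $(user_threads)  difxlog: $(difxlog_count)  nofile limit: $(nofile_limit)"
}
```

Calling `report` before and during a large batch should show whether the thread count is approaching the nproc limit. If it is, `ulimit -u` (within the hard limit) is probably the setting to raise, rather than `ulimit -n`.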

On Thu, May 25, 2017 at 6:56 PM, 江悟 <jiangwu at shao.ac.cn> wrote:

>
> Hi all,
>
> I think Bill Chen found the problem. Many difxlog threads are started and
> kept alive while the correlation runs, so if the number of jobs is large
> it is easy to reach the maximum number of threads allowed by the operating
> system.
> Is there any 'key' or setting to shut down the difxlog threads promptly?
>
> Cheers,
> Wu
>
> -----Original Message-----
> *From:* "Chen Bill" <billchen001 at gmail.com>
> *Sent:* Wednesday, 24 May 2017
> *To:* "江悟" <jiangwu at shao.ac.cn>
> *Cc:* "Adam Deller" <adeller at astro.swin.edu.au>, difxusers <
> difx-users at listmgr.nrao.edu>
> *Subject:* Re: [Difx-users] FxManager: Error in launching writethread!!
>
>
> Hi All,
>
> I checked this issue. I think it is caused by too many difxlog
> processes: in this case there are around 800 scans to be processed, and
> there will be 800 difxlog processes running until all the work is done. A
> default Linux setup supports only 1024 processes per user.
>
> I just wonder whether it is possible to close the difxlog process when a
> scan finishes.
> For this issue we can raise the per-user process limit ("nproc") to a large
> number, but I think the better way is to enhance the code to reduce the
> number of difxlog processes.
>
> Jiangwu, please correct me if I am mistaken.
>
>
>
> Thanks,
> Bill Chen
> www.simplehpc.com
>
> On Tue, May 23, 2017 at 1:27 PM, 江悟 <jiangwu at shao.ac.cn> wrote:
>
>>
>> Hi Chris and Adam,
>>
>> Attached are the .v2d, .input, .vex files I used. I was using errormon2
>> when the error turned up; please check the last line of errormon2.log.
>> Unfortunately, when I used errormon this morning to re-correlate the
>> same scans, no error was reported.
>>
>> Regards,
>> Wu
>>
>> -----Original Message-----
>> *From:* "Adam Deller" <adeller at astro.swin.edu.au>
>> *Sent:* Tuesday, 23 May 2017
>> *To:* "江悟" <jiangwu at shao.ac.cn>
>> *Cc:* difxusers <difx-users at listmgr.nrao.edu>
>> *Subject:* Re: FxManager: Error in launching writethread!!
>>
>> I don't recall seeing this before.  Might it be an error with pthreads
>> generally?  Does it still occur if you're running a much smaller
>> correlation?
>>
>> Cheers,
>> Adam
>>
>> On 23 May 2017 at 12:03, 江悟 <jiangwu at shao.ac.cn> wrote:
>>
>>>
>>> Hi all,
>>>
>>> Recently I ran difx (version 2.4.1) and came across the following
>>> error:
>>> FxManager: Error in launching writethread!!
>>>
>>> I was using 100 cores with 4 threads each, plus 1 separate header node
>>> as set in the v2d file, and visbufferlength was set to 80. There were 3
>>> stations; the raw data was on a RAID with a parallel file system,
>>> while the visibility output was collected and recorded on the local disk
>>> of the header node. The other correlation parameters were:
>>>   tInt = 0.131072
>>>   subintNS = 8192000
>>>   nChan = 512
>>>
>>> I also checked the memory of the header node; the maximum memory in
>>> use is less than 7%, so I don't know the reason for this error. Have you
>>> seen this error before, and could you please help to identify it?
>>>
>>> Thanks a lot.
>>>
>>> Best regards,
>>> Wu Jiang
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> !=============================================================!
>> Dr. Adam Deller
>> ARC Future Fellow, Senior Lecturer
>> Centre for Astrophysics & Supercomputing
>> Swinburne University of Technology
>> John St, Hawthorn VIC 3122 Australia
>> phone: +61 3 9214 5307
>> fax: +61 3 9214 8797
>>
>> office days (usually): Mon-Thu
>> !=============================================================!
>>
>>
>>
>>
>>
>>
>> _______________________________________________
>> Difx-users mailing list
>> Difx-users at listmgr.nrao.edu
>> https://listmgr.nrao.edu/mailman/listinfo/difx-users
>>
>>
>
>
>
>
>


-- 
-------------------------
Dr Richard Dodson,
International Centre for Radio Astronomy Research
University of Western Australia
P: +8 6488 7842 E: richard.dodson at icrar.org