[Difx-users] FxManager: Error in launching writethread!!

Chen Bill billchen001 at gmail.com
Wed May 24 06:57:02 EDT 2017


Hi All,

I looked into this issue, and I think it is caused by too many difxlog
processes. In this case there are around 800 scans to be processed, and
there will be 800 difxlog processes running until all the work is done. A
default Linux setup typically allows only 1024 processes per user.

I wonder whether each difxlog process could be closed as soon as its scan
has finished.
As a workaround, we can raise the per-user process limit (the "nproc"
limit, shown by "ulimit -u") to a large number, but I think the better fix
is to change the code so that fewer difxlog processes are left running.
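
For reference, here is a minimal Python sketch for checking this on the
header node before a large run. It assumes Linux with /proc mounted and that
the logging processes show up under the command name "difxlog"; the function
names are made up for illustration and are not part of DiFX. Note that on
Linux this limit counts threads as well as processes, while the sketch only
counts processes, so the real headroom is smaller than it suggests.

```python
import os
import resource


def nproc_limit():
    """Return (soft, hard) RLIMIT_NPROC for this user; None means unlimited."""
    def norm(value):
        return None if value == resource.RLIM_INFINITY else value

    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return norm(soft), norm(hard)


def count_own_processes(name=None):
    """Count processes owned by the current user by scanning /proc.

    If `name` is given, only processes whose command name equals it are
    counted. Threads are not counted, only processes.
    """
    uid = os.getuid()
    count = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            if os.stat(f"/proc/{entry}").st_uid != uid:
                continue
            if name is not None:
                with open(f"/proc/{entry}/comm") as f:
                    if f.read().strip() != name:
                        continue
        except OSError:
            continue  # the process exited while we were looking at it
        count += 1
    return count


if __name__ == "__main__":
    soft, hard = nproc_limit()
    print("nproc limit (soft, hard):", soft, hard)
    print("my processes:", count_own_processes())
    # "difxlog" is the assumed command name of the logging processes.
    print("my difxlog processes:", count_own_processes("difxlog"))
```

If the difxlog count approaches the soft limit during a run, that would be
consistent with a thread launch failing, although I have not confirmed that
this is what happens inside FxManager.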

Jiangwu, please correct me if I am mistaken.



Thanks,
Bill Chen
www.simplehpc.com

On Tue, May 23, 2017 at 1:27 PM, 江悟 <jiangwu at shao.ac.cn> wrote:

>
> Hi Chris and Adam,
>
> Attached are the .v2d, .input, and .vex files I used. I was using errormon2
> when the error occurred; please check the last line of errormon2.log.
> Unfortunately, when I used errormon this morning to re-correlate the
> same scans, no error was reported.
>
> Regards,
> Wu
>
> -----Original Message-----
> *From:* "Adam Deller" <adeller at astro.swin.edu.au>
> *Sent:* Tuesday, May 23, 2017
> *To:* "江悟" <jiangwu at shao.ac.cn>
> *Cc:* difxusers <difx-users at listmgr.nrao.edu>
> *Subject:* Re: FxManager: Error in launching writethread!!
>
> I don't recall seeing this before.  Might it be an error with pthreads
> generally?  Does it still occur if you're running a much smaller
> correlation?
>
> Cheers,
> Adam
>
> On 23 May 2017 at 12:03, 江悟 <jiangwu at shao.ac.cn> wrote:
>
>>
>> Hi all,
>>
>> Recently I ran DiFX (version 2.4.1) and came across the following error:
>> FxManager: Error in launching writethread!!
>>
>> I was using 100 cores with 4 threads each, and 1 separate header node as
>> set in the v2d file. The visbufferlength was set to 80. The number of
>> stations was 3. The raw data was on a RAID with a parallel file system,
>> while the visibility output was collected and recorded on the local disk of
>> the header node. The other correlation parameters were:
>>   tInt =0.131072
>>   subintNS = 8192000
>>   nChan = 512
>>
>> I also checked the memory of the header node; the maximum memory in use
>> was less than 7%. So I don't know the reason for this error. Have you
>> seen this error before, and could you please help to identify it?
>>
>> Thanks a lot.
>>
>> Best regards,
>> Wu Jiang
>>
>>
>>
>>
>>
>
>
> --
> !=============================================================!
> Dr. Adam Deller
> ARC Future Fellow, Senior Lecturer
> Centre for Astrophysics & Supercomputing
> Swinburne University of Technology
> John St, Hawthorn VIC 3122 Australia
> phone: +61 3 9214 5307
> fax: +61 3 9214 8797
>
> office days (usually): Mon-Thu
> !=============================================================!
>
>
>
>
>
>
> _______________________________________________
> Difx-users mailing list
> Difx-users at listmgr.nrao.edu
> https://listmgr.nrao.edu/mailman/listinfo/difx-users
>
>