[evla-sw-discuss] CBE-NODE-20 back online
Karyn Roberts
kroberts at nrao.edu
Tue Feb 28 12:23:36 EST 2017
Bob Huber will be onsite - if it indeed does require service, please
have Bob bring this server back to the AOC for service as well. Can
you confirm?
Regards,
Karyn Roberts
On 02/28/2017 10:22 AM, Martin Pokorny wrote:
> Since we're on the subject of bad CBE nodes, we've kept node 29 out of
> use for some time, as well. The problems with it are different from
> those on node 20, as I recall, but I don't remember the details. It's an
> obvious failure, I believe, in that the normal CBE software clearly
> fails to run when that machine is used. Paul would know for certain,
> but the problem doesn't seem to affect the yuppi software. We should
> remind ourselves what's wrong with that node, and fix it so it's
> possible to bring it fully back into service.
>
> On 02/27/2017 03:38 PM, Paul Demorest wrote:
>> It actually died within minutes once we started running processing on
>> it. In case it's helpful for debugging, this mode had 128 MB/s coming
>> in via two of the 1gig correlator interfaces (p2p1/p2p2), and 0.5 MB/s
>> being written out to lustre.
>>
>> -Paul
>>
>> On 2017-02-27 15:20, Karyn Roberts wrote:
>>> Will do - this is as we expected, though I must admit I'm surprised it
>>> only lasted a day. I'll send the team out tomorrow to retrieve the
>>> system for repair. Thanks for letting me know!
>>>
>>> Regards,
>>> Karyn
>>>
>>>
>>> On 02/27/2017 03:16 PM, Vivek Dhawan wrote:
>>>> Node 20 was up, Paul tried some pulsar processing on it, and it
>>>> promptly died. It now seems to have rebooted itself. But we cannot
>>>> depend on it during actual observing, so I vote for a real fix.
>>>>
>>>> We will avoid using it until then. Priority is moderately high -
>>>> a few days without it is OK.
>>>>
>>>> On Mon, February 27, 2017 11:20, Karyn Roberts wrote:
>>>> | The server CBE-NODE-20 is back online - after speaking with K.Scott
>>>> | we're not sure for how long - it has crashed twice before and we expect
>>>> | it to crash again under similar circumstances. Upon the next crash we
>>>> | will remove the server to AOC for service.
>>>> |
>>>> | Regards,
>>>> |
>>>> | Karyn
>>>> |
>>>> | _______________________________________________
>>>> | evla-sw-discuss mailing list
>>>> | evla-sw-discuss at listmgr.nrao.edu
>>>> | https://listmgr.nrao.edu/mailman/listinfo/evla-sw-discuss
>>>> |
>>>>
>>
>
>