[evla-sw-discuss] CBE-NODE-20 back online

Karyn Roberts kroberts at nrao.edu
Tue Feb 28 12:30:38 EST 2017


Will do - next visit I'll have Bob / Jeff address it.

Karyn


On 02/28/2017 10:27 AM, Martin Pokorny wrote:
> On 02/28/2017 10:23 AM, Karyn Roberts wrote:
>> Bob Huber will be onsite - if it indeed does require service, please
>> have Bob bring this server back to the AOC for service as well. Can
>> you confirm?
>
> I suspect it's premature to do that. We should try to diagnose the 
> issue first; see if we can make any progress without moving the 
> machine. It's been a lingering issue for some time, and I want to make 
> sure that it's not forgotten.
>
>>
>> Regards,
>>
>> Karyn Roberts
>>
>>
>> On 02/28/2017 10:22 AM, Martin Pokorny wrote:
>>> Since we're on the subject of bad CBE nodes, we've kept node 29 out of
>>> use for some time, as well. The problems with it are different than
>>> node 20, as I recall, but I don't remember the details. It's an
>>> obvious failure, I believe, in that the normal CBE software clearly
>>> fails to run when that machine is used. Paul would know for certain,
>>> but the problem doesn't seem to affect the yuppi software. We should
>>> remind ourselves what's wrong with that node, and fix it so it's
>>> possible to bring it fully back into service.
>>>
>>> On 02/27/2017 03:38 PM, Paul Demorest wrote:
>>>> It actually died within minutes once we started running processing on
>>>> it.  In case it's helpful for debugging, this mode had 128 MB/s coming
>>>> in via two of the 1gig correlator interfaces (p2p1/p2p2), and 0.5 MB/s
>>>> being written out to lustre.
>>>>
>>>> -Paul
>>>>
>>>> On 2017-02-27 15:20, Karyn Roberts wrote:
>>>>> Will do - this is as we expected, though I must admit I'm surprised it
>>>>> only lasted a day.  I'll send the team out tomorrow to retrieve the
>>>>> system for repair.  Thanks for letting me know!
>>>>>
>>>>> Regards,
>>>>> Karyn
>>>>>
>>>>>
>>>>> On 02/27/2017 03:16 PM, Vivek Dhawan wrote:
>>>>>> Node 20 was up, Paul tried some pulsar processing on it, and it
>>>>>> promptly died. It now seems to have rebooted itself. But we cannot
>>>>>> depend on it during actual observing, so I vote for a real fix.
>>>>>>
>>>>>> We will avoid using it until then.  Priority is moderately high -
>>>>>> a few days without it is OK.
>>>>>>
>>>>>> On Mon, February 27, 2017 11:20, Karyn Roberts wrote:
>>>>>> | The server CBE-NODE-20 is back online - after speaking with K.Scott
>>>>>> | we're not sure for how long - it has crashed twice before and we
>>>>>> | expect it to crash again under similar circumstances.  Upon the next
>>>>>> | crash we will remove the server to the AOC for service.
>>>>>> |
>>>>>> | Regards,
>>>>>> |
>>>>>> | Karyn
>>>>>> |
>>>>>> | _______________________________________________
>>>>>> | evla-sw-discuss mailing list
>>>>>> | evla-sw-discuss at listmgr.nrao.edu
>>>>>> | https://listmgr.nrao.edu/mailman/listinfo/evla-sw-discuss
>>>>>> |
>>>>>>
>>>>
>>>
>>>
>
>


