[evla-sw-discuss] Alert server

Sonja Vrcic sonja.vrcic at nrc.gc.ca
Thu Jul 12 17:25:35 EDT 2007



Kevin Ryan wrote:
> This type of problem, I believe, is caused by trying to use what is  
> essentially an open-loop logging system as the monitor portion of a  
> control system.  I believe that to maintain reliable and accurate  
> system state its components must be periodically polled.
>   
The other alternative is that each component periodically reports its
state.
Instead of polling the subordinate components, the central system,
which maintains the overall status of the (sub)system, periodically
"wakes up" and checks the timestamp of the latest status report from
each component.
If the status report from one (or more) of the components is not
recent enough, it polls that component and/or raises an alarm.
This works well for a hierarchical system.

Advantages (compared to polling):

    *  it reduces network traffic, and
    *  it uses less processing time on the central system (checking
       timestamps takes less time than generating poll messages).
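The timestamp-checking scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not existing EVLA code. The class name `StatusBoard`, the `max_age` threshold and the component names are hypothetical. The caller is assumed to poll, or raise an alarm for, whatever `stale()` returns.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class StatusBoard:
    """Hypothetical central status table for the timestamp-checking scheme."""

    expected: set[str]   # components that are supposed to report
    max_age: float       # seconds a report counts as "recent enough" (assumed)
    last_report: dict[str, float] = field(default_factory=dict)

    def report(self, component: str, now: float | None = None) -> None:
        # Called when a component pushes its periodic status report.
        self.last_report[component] = time.monotonic() if now is None else now

    def stale(self, now: float | None = None) -> list[str]:
        # The periodic "wake up": compare timestamps only, no messages sent.
        # A component that has never reported at all is treated as stale.
        now = time.monotonic() if now is None else now
        return sorted(
            c for c in self.expected
            if now - self.last_report.get(c, float("-inf")) > self.max_age
        )


board = StatusBoard(expected={"MIB-1", "MIB-2", "MIB-3"}, max_age=10.0)
board.report("MIB-1", now=100.0)
board.report("MIB-2", now=85.0)
for name in board.stale(now=100.0):
    print(f"{name}: report too old, poll it and/or raise an alarm")
```

Only components whose reports are too old, or which never reported, cost any messages. Components that report on time are handled by a timestamp comparison alone.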

Sonja
> The EVLA's method of having the 'patient report its own illness'  
> seems simpler compared to a system where every component is polled -  
> but that may not actually be the case.  For one, a server is required  
> in the former to maintain an 'image' of system state.  Keeping that  
> image accurate requires more complexity as illustrated by this  
> problem.  Band-aid solutions will grow a fragile system that does  
> polling anyway (see Barry's suggestions below).
>
> I've wondered, if one were to right now walk out to an antenna and  
> cut power to a MIB, will we know it or do we find out only when we  
> notice that the antenna stopped responding to the associated commands?
>
> A way to fix the problem below without polling would be to have  
> components periodically announce their 'wellness' as well as alerts.   
> But this will get even more complicated, and we still end up
> with just an image of state maintained by a single central thing.
>
> Rich argues that polling will require a process that knows about  
> every component in the system.  This is where hierarchy comes in.   
> VLA and EVLA Antenna objects are each experts on their own specific  
> components and are able to provide an 'executive summary' of state to  
> whomever, in turn, polls them.  Executive decisions such as flagging  
> and antenna scheduling can be made without having to know what an  
> 'L8' even is.  The state of the system is maintained 'live' - in the  
> system itself - no image or associated maintainer required.
>
> Bruce once argued against polling: 'Would you want the fire  
> department to call you every five minutes to see if your house is on  
> fire?'.  The answer is yes! - albeit not manually by phone.  A house  
> that is on fire (or the person that may or may not be in it) cannot  
> be relied upon to contact the fire department.  By actively 'pinging'  
> the house, the fire department will be guaranteed to know either 1)  
> the state of the house or 2) that it cannot communicate with the  
> house.  Either is valuable information to the monitor half of a  
> control system.
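The "ping the house" argument in the paragraph above maps onto a poll with a timeout that yields one of three answers: healthy, faulty or unreachable. The Python sketch below assumes a hypothetical `probe` callable for each component. It treats a timeout or a communication error (`OSError`) as "cannot communicate". It is an illustration, not existing EVLA code.

```python
import concurrent.futures as cf
from enum import Enum
from typing import Callable


class Reply(Enum):
    OK = "ok"                    # the house answered: not on fire
    FAULT = "fault"              # the house answered: on fire
    UNREACHABLE = "unreachable"  # no answer, which is also worth knowing


def ping(probe: Callable[[], bool], timeout_s: float, pool: cf.Executor) -> Reply:
    """Run one hypothetical probe with a deadline and classify the result."""
    future = pool.submit(probe)
    try:
        healthy = future.result(timeout=timeout_s)
    except (cf.TimeoutError, OSError):
        # No reply in time, or the link itself failed (e.g. power cut to a MIB).
        return Reply.UNREACHABLE
    return Reply.OK if healthy else Reply.FAULT


with cf.ThreadPoolExecutor(max_workers=4) as pool:
    print(ping(lambda: True, 1.0, pool))
```

A probe that times out keeps occupying its worker thread until it returns. A real poller would therefore also need a way to abandon hung connections.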
>
> What does anyone else think about this?  Periodic polling is a  
> difficult thing for people to want to embrace but I believe it gives  
> the most reliable and accurate system state and may possibly be the  
> least complex in the long run.
>
> Kevin
>
>
> On Jul 11, 2007, at 3:08 PM, Barry Clark wrote:
>
>   
>> We've always been unclear on the concept of stale alerts (note the
>> included message at the end of this).
>>
>> Now that we have an alert server, it should take care of things in a
>> better way.  But, as illustrated by the incident below, it has merely
>> gone from alerts being erroneously overlooked to alerts being
>> erroneously preserved.
>>
>> The alert server is perfectly capable of going out to the monitor point
>> and asking if the alert is still in force.  It should do so.  Question
>> is when.  Could be done periodically, on a slow period.  Or, the
>> Executor, whenever a new script starts, could send a REST message to
>> the alert server, saying "Here is my ID.  Please check and see if any
>> alerts you have for me are still valid."
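The revalidation proposed in the paragraph above can be sketched in Python as follows. This is a hypothetical illustration with the REST transport left out. `AlertServer`, `read_point` (a stand-in for actually querying the monitor point), the source ID `exec-42` and the "LO unlocked" point are all assumptions. `revalidate()` is what the Executor's "please check my alerts" request would trigger, and `sweep()` is the slow periodic option.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class AlertServer:
    # Hypothetical: read_point(point) asks the monitor point whether the
    # alert condition is still in force right now.
    read_point: Callable[[str], bool]
    alerts: Dict[str, Set[str]] = field(default_factory=dict)  # source ID -> points

    def raise_alert(self, source: str, point: str) -> None:
        self.alerts.setdefault(source, set()).add(point)

    def revalidate(self, source: str) -> Set[str]:
        """Re-check every alert held for `source`; drop and return stale ones."""
        held = self.alerts.get(source, set())
        cleared = {p for p in held if not self.read_point(p)}
        held -= cleared
        if not held:
            self.alerts.pop(source, None)
        return cleared

    def sweep(self) -> None:
        # Slow periodic option: revalidate the alerts of every source.
        for source in list(self.alerts):
            self.revalidate(source)


state = {"L8 out of sync": False, "LO unlocked": True}  # hypothetical live values
server = AlertServer(read_point=lambda p: state[p])
server.raise_alert("exec-42", "L8 out of sync")
server.raise_alert("exec-42", "LO unlocked")
print(server.revalidate("exec-42"))
```

Either trigger ends up asking the monitor point again. A stale "L8 out of sync" alert is therefore cleared as soon as the condition is no longer reported, without anyone needing to know how to remove it by hand.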
>>
>>     
>>> From evlatests-bounces at donar.cv.nrao.edu  Wed Jul 11 14:14:22 2007
>>> Date: Wed, 11 Jul 2007 14:14:04 -0600 (MDT)
>>> From: Ken Sowinski <ksowinsk at nrao.edu>
>>> To: evlatests at nrao.edu
>>>
>>> There was much confusion at the VLA today with regard to timing
>>> between the arrays which ended with the CMP in a strange state
>>> and having to be rebooted more than once.  This resulted in
>>> stale "L8 out of sync" messages in the alert server causing
>>> all data from VLA antennas to be flagged as bad.
>>>
>>> As a temporary measure Walter has kludged idcaf so that L8
>>> alerts are not turned into flags.  However the alerts are
>>> still there and no one I have talked to knows how to make
>>> them go away.  We need either a little more distributed
>>> knowledge about these parts of the system, or a system
>>> with less remembered state.  I wonder if some (certainly not
>>> all) of our problems with bad flagging may be related to this
>>> kind of behavior.
>>>
>>> Notice of removal of these alerts would be appreciated so that
>>> idcaf can be restored to its usual functionality.
>>>
>>> _______________________________________________
>>> evlatests mailing list
>>> evlatests at listmgr.cv.nrao.edu
>>> http://listmgr.cv.nrao.edu/mailman/listinfo/evlatests
>>>
>>>       
>> _______________________________________________
>> evla-sw-discuss mailing list
>> evla-sw-discuss at listmgr.cv.nrao.edu
>> http://listmgr.cv.nrao.edu/mailman/listinfo/evla-sw-discuss
>>     
>
>   

-- 
Sonja Vrcic
Software Engineer
National Research Council
Herzberg Institute of Astrophysics
Dominion Radio Astrophysical Observatory,
Penticton, BC, Canada
Tel:(250)490-4309/(250)493-2277ext.309
Sonja.Vrcic at nrc-cnrc.gc.ca
http://www.drao-ofr.hia-iha.nrc-cnrc.gc.ca/



