[evla-sw-discuss] Alert server

Kevin Ryan kryan at nrao.edu
Thu Jul 12 18:08:06 EDT 2007


On Jul 12, 2007, at 3:25 PM, Sonja Vrcic wrote:

>
>
> Kevin Ryan wrote:
>> This type of problem, I believe, is caused by trying to use what is
>> essentially an open-loop logging system as the monitor portion of a
>> control system.  I believe that to maintain reliable and accurate
>> system state its components must be periodically polled.
>>
> The other alternative is that each component periodically reports its
> state.
> The central system, which maintains the overall status of the
> (sub)system, instead of polling the subordinate components,
> periodically "wakes up" and checks the status report timestamps for
> each component.
> If the status report for one (or more) of the components is not recent
> enough, it polls the component and/or raises an alarm.
> This works well for a hierarchical system.
>
> Advantages (compared to polling):
>
>     *  reduces the traffic in the network and
>     *  uses less processing time on the central system (it takes less
>       to check the time stamps than to generate messages).
>
> Sonja

See my third paragraph down from here (set off by the vvv/^^^ lines).
It describes basically this approach.
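
For concreteness, the scheme Sonja describes could be sketched roughly as
below.  This is only an illustration, not anything that exists in the EVLA
software: StatusBoard, max_age_s and the component names are all made up.
Components push reports, and the central side only compares timestamps when
it wakes up.  Note that it still needs a list of the components it expects
to hear from, or a component that never reports goes unnoticed.

```python
import time


class StatusBoard:
    """Central record of the last status report from each component.

    Hypothetical sketch: all names and the staleness limit are assumptions.
    """

    def __init__(self, expected, max_age_s, clock=time.monotonic):
        self.max_age_s = max_age_s
        self.clock = clock
        # component name -> (time of last report, reported state);
        # None means the component has never reported.
        self.last_report = {name: None for name in expected}

    def report(self, name, state):
        """Called whenever a component announces its own state."""
        self.last_report[name] = (self.clock(), state)

    def stale(self):
        """Names whose latest report is missing or older than max_age_s."""
        now = self.clock()
        return sorted(
            name
            for name, entry in self.last_report.items()
            if entry is None or now - entry[0] > self.max_age_s
        )
```

The periodic "wake up" would call stale() and then poll, or raise an alarm
for, whatever names come back.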

>> The EVLA's method of having the 'patient report its own illness'
>> seems simpler compared to a system where every component is polled -
>> but that may not actually be the case.  For one, a server is required
>> in the former to maintain an 'image' of system state.  Keeping that
>> image accurate requires more complexity as illustrated by this
>> problem.  Band-aid solutions will grow a fragile system that does
>> polling anyway (see Barry's suggestions below).
>>
>> I've wondered: if one were to walk out to an antenna right now and
>> cut power to a MIB, would we know it, or would we find out only when
>> we noticed that the antenna had stopped responding to the associated
>> commands?
>>
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
>> A way to fix the problem below without polling would be to have
>> components periodically announce their 'wellness' as well as alerts.
>> But this will get even more complicated and we still end up
>> with just an image of state maintained by a single central thing.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>> Rich argues that polling will require a process that knows about
>> every component in the system.  This is where hierarchy comes in.
>> VLA and EVLA Antenna objects are each experts on their own specific
>> components and are able to provide an 'executive summary' of state to
>> whomever, in turn, polls them.  Executive decisions such as flagging
>> and antenna scheduling can be made without having to know what an
>> 'L8' even is.  The state of the system is maintained 'live' - in the
>> system itself - no image or associated maintainer required.
>>
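
A minimal sketch of the hierarchy idea, with made-up class and device names
(Device, Antenna and usable_antennas are assumptions for illustration, not
the real VLA/EVLA Antenna objects).  Each antenna polls its own devices and
hands back an executive summary, so the caller never needs to know what an
L8 is.

```python
class Device:
    """Stand-in for one monitored module inside an antenna (hypothetical)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def poll(self):
        return "ok" if self.healthy else "fault"


class Antenna:
    """Knows its own devices and summarizes them for whoever polls it."""

    def __init__(self, name, devices):
        self.name = name
        self.devices = devices

    def poll(self):
        # State is read live from the devices each time; nothing is cached.
        faults = [d.name for d in self.devices if d.poll() != "ok"]
        return {"antenna": self.name, "usable": not faults, "faults": faults}


def usable_antennas(antennas):
    """Top-level decision that relies only on each antenna's summary."""
    return [a.name for a in antennas if a.poll()["usable"]]
```

The top level only looks at the "usable" field; the fault list is there for
anyone who wants to drill down.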
>> Bruce once argued against polling: 'Would you want the fire
>> department to call you every five minutes to see if your house is on
>> fire?'.  The answer is yes! - albeit not manually by phone.  A house
>> that is on fire (or the person that may or may not be in it) cannot
>> be relied upon to contact the fire department.  By actively 'pinging'
>> the house, the fire department will be guaranteed to know either 1)
>> the state of the house or 2) that it cannot communicate with the
>> house.  Either is valuable information to the monitor half of a
>> control system.
>>
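
The 'pinging' argument boils down to a poll that always returns one of two
answers.  A rough sketch, where poll_once and read_state are hypothetical
names and read_state stands for whatever call actually asks the component
for its state:

```python
def poll_once(read_state):
    """Ask a component for its state and report one of two outcomes.

    read_state is any zero-argument callable that returns the component's
    state, or raises TimeoutError / ConnectionError when the component
    cannot be reached (an assumption made for this sketch).
    """
    try:
        return ("state", read_state())
    except (TimeoutError, ConnectionError) as exc:
        # Failing to communicate is itself a result worth recording.
        return ("unreachable", type(exc).__name__)
```

Either tuple tells the monitor something; silence is never mistaken for
'all is well'.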
>> What does anyone else think about this?  Periodic polling is a
>> difficult thing for people to want to embrace but I believe it gives
>> the most reliable and accurate system state and may possibly be the
>> least complex in the long run.
>>
>> Kevin
>>
>>
>> On Jul 11, 2007, at 3:08 PM, Barry Clark wrote:
>>
>>
>>> We've always been unclear on the concept of stale alerts (note the
>>> included message at the end of this).
>>>
>>> Now that we have an alert server, it should take care of things in a
>>> better way.  But, as illustrated by the incident below, it has merely
>>> gone from alerts being erroneously overlooked to alerts being
>>> erroneously preserved.
>>>
>>> The alert server is perfectly capable of going out to the monitor
>>> point and asking if the alert is still in force.  It should do so.
>>> The question is when.  It could be done periodically, on a slow
>>> period.  Or the Executor, whenever a new script starts, could send a
>>> REST message to the alert server, saying "Here is my ID.  Please
>>> check and see if any alerts you have for me are still valid."
>>>
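
Barry's revalidation step might look something like the sketch below.  It
is not the real alert server: AlertServer, raise_alert, revalidate and the
query callable are all assumptions, and the REST transport is left out
entirely.  The same revalidate() call could be driven by a slow timer or
by the Executor at script start.

```python
class AlertServer:
    """Toy alert store that can re-check its alerts against the source."""

    def __init__(self, still_in_force):
        # still_in_force(point) -> bool is a hypothetical callable that
        # goes out to the monitor point and asks whether the alert holds.
        self.still_in_force = still_in_force
        self.alerts = {}  # (source_id, monitor point) -> message

    def raise_alert(self, source_id, point, message):
        self.alerts[(source_id, point)] = message

    def revalidate(self, source_id):
        """Drop alerts for source_id that the monitor point no longer asserts.

        Returns the list of monitor points whose alerts were cleared.
        """
        cleared = []
        for key in [k for k in self.alerts if k[0] == source_id]:
            if not self.still_in_force(key[1]):
                del self.alerts[key]
                cleared.append(key[1])
        return cleared
```

An unreachable monitor point is not handled here; in practice that case
would probably need to keep the alert and flag it as unverified.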
>>>
>>>> From evlatests-bounces at donar.cv.nrao.edu  Wed Jul 11 14:14:22 2007
>>>> Date: Wed, 11 Jul 2007 14:14:04 -0600 (MDT)
>>>> From: Ken Sowinski <ksowinsk at nrao.edu>
>>>> To: evlatests at nrao.edu
>>>>
>>>> There was much confusion at the VLA today with regard to timing
>>>> between the arrays, which ended with the CMP in a strange state
>>>> and having to be rebooted more than once.  This resulted in
>>>> stale "L8 out of sync" messages in the alert server causing
>>>> all data from VLA antennas to be flagged as bad.
>>>>
>>>> As a temporary measure Walter has kludged idcaf so that L8
>>>> alerts are not turned into flags.  However the alerts are
>>>> still there and no one I have talked to knows how to make
>>>> them go away.  We need either a little more distributed
>>>> knowledge about these parts of the system, or a system
>>>> with less remembered state.  I wonder if some (certainly not
>>>> all) of our problems with bad flagging may be related to this
>>>> kind of behavior.
>>>>
>>>> Notice of removal of these alerts would be appreciated so that
>>>> idcaf can be restored to its usual functionality.
>>>>
>>>> _______________________________________________
>>>> evlatests mailing list
>>>> evlatests at listmgr.cv.nrao.edu
>>>> http://listmgr.cv.nrao.edu/mailman/listinfo/evlatests
>>>>
>>>>
>>> _______________________________________________
>>> evla-sw-discuss mailing list
>>> evla-sw-discuss at listmgr.cv.nrao.edu
>>> http://listmgr.cv.nrao.edu/mailman/listinfo/evla-sw-discuss
>>>
>>
>
> -- 
> Sonja Vrcic
> Software Engineer
> National Research Council
> Herzberg Institute of Astrophysics
> Dominion Radio Astrophysical Observatory,
> Penticton, BC, Canada
> Tel:(250)490-4309/(250)493-2277ext.309
> Sonja.Vrcic at nrc-cnrc.gc.ca
> http://www.drao-ofr.hia-iha.nrc-cnrc.gc.ca/
>



