[evla-sw-discuss] Alert server
Kevin Ryan
kryan at nrao.edu
Thu Jul 12 18:08:06 EDT 2007
On Jul 12, 2007, at 3:25 PM, Sonja Vrcic wrote:
>
>
> Kevin Ryan wrote:
>> This type of problem, I believe, is caused by trying to use what is
>> essentially an open-loop logging system as the monitor portion of a
>> control system. I believe that to maintain reliable and accurate
>> system state its components must be periodically polled.
>>
> The other alternative is that each component periodically reports its
> state.
> The central system, which maintains the overall status of the
> (sub)system, instead of polling the subordinate components,
> periodically "wakes up" and checks the status report timestamps for
> each component.
> If the status report for one (or more) of the components is not recent
> enough, it polls the component and/or raises an alarm.
> This works well for a hierarchical system.
>
> Advantages (compared to polling):
>
> * reduces the traffic in the network and
> * uses less processing time on the central system (it takes less to
> check the time stamps than to generate messages).
>
> Sonja
See my third paragraph down from here. This is basically what it
refers to.
>> The EVLA's method of having the 'patient report its own illness'
>> seems simpler compared to a system where every component is polled -
>> but that may not actually be the case. For one, a server is required
>> in the former to maintain an 'image' of system state. Keeping that
>> image accurate requires more complexity as illustrated by this
>> problem. Band-aid solutions will grow a fragile system that does
>> polling anyway (see Barry's suggestions below).
>>
>> I've wondered, if one were to right now walk out to an antenna and
>> cut power to a MIB, will we know it or do we find out only when we
>> notice that the antenna stopped responding to the associated
>> commands?
>>
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
>> A way to fix the problem below without polling would be to have
>> components periodically announce their 'wellness' as well as alerts.
>> But this will get even more complicated and we will still end up
>> with just an image of state maintained by a single central thing.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
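
The 'wellness' scheme marked above (the same one Sonja describes) can be
sketched roughly as follows. This is only an illustration, not anything in
the EVLA code: the StatusBoard class, the component names and the 30 second
limit are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class StatusBoard:
    """Central image of component state, built from unsolicited reports."""
    max_age: float  # seconds a report may age before it counts as stale
    last_seen: dict = field(default_factory=dict)

    def report(self, component: str, now: float) -> None:
        # Called whenever a component announces its own state.
        self.last_seen[component] = now

    def stale(self, expected: list, now: float) -> list:
        # Periodic wake-up: anything silent for too long (or never heard
        # from at all) has to be polled directly or raised as an alarm.
        return [c for c in expected
                if now - self.last_seen.get(c, float("-inf")) > self.max_age]


# Hypothetical component names and times, for illustration only.
board = StatusBoard(max_age=30.0)
board.report("ant01-L8", now=100.0)
board.report("ant02-L8", now=125.0)
overdue = board.stale(["ant01-L8", "ant02-L8", "ant03-L8"], now=140.0)
print(overdue)
```

Note that the central thing still needs the list of expected components, and
it still falls back to polling for anything that has gone quiet.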
>> Rich argues that polling will require a process that knows about
>> every component in the system. This is where hierarchy comes in.
>> VLA and EVLA Antenna objects are each experts on their own specific
>> components and are able to provide an 'executive summary' of state to
>> whoever, in turn, polls them. Executive decisions such as flagging
>> and antenna scheduling can be made without having to know what an
>> 'L8' even is. The state of the system is maintained 'live' - in the
>> system itself - no image or associated maintainer required.
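
A minimal sketch of the hierarchy described above, assuming each component
can be polled through a simple function that returns healthy/faulted or
raises OSError when nothing answers. The Antenna class, the antenna names
and the poll functions are hypothetical stand-ins, not the real Antenna
objects.

```python
from typing import Callable, Dict

# A poll function returns True (healthy) or False (faulted); it raises
# OSError when the component cannot be reached at all.
Poll = Callable[[], bool]


class Antenna:
    """Hypothetical antenna object that is the expert on its own parts."""

    def __init__(self, name: str, components: Dict[str, Poll]):
        self.name = name
        self.components = components

    def summary(self) -> str:
        # Poll every component now; nothing is cached between calls.
        worst = "ok"
        for poll in self.components.values():
            try:
                healthy = poll()
            except OSError:
                return "unreachable"  # silence is itself information
            if not healthy:
                worst = "fault"
        return worst


def dead_mib() -> bool:
    # Stand-in for a MIB that has lost power.
    raise OSError("no response")


array = [
    Antenna("ea01", {"L8": lambda: True, "MIB": lambda: True}),
    Antenna("ea02", {"L8": lambda: False, "MIB": lambda: True}),
    Antenna("ea03", {"L8": lambda: True, "MIB": dead_mib}),
]
# The executive level sees only summaries, never component names.
states = {a.name: a.summary() for a in array}
usable = [name for name, state in states.items() if state == "ok"]
print(states, usable)
```

The level that decides which antennas to use never mentions an 'L8', and a
dead MIB shows up as "unreachable" on the very next poll.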
>>
>> Bruce once argued against polling: 'Would you want the fire
>> department to call you every five minutes to see if your house is on
>> fire?'. The answer is yes! - albeit not manually by phone. A house
>> that is on fire (or the person that may or may not be in it) cannot
>> be relied upon to contact the fire department. By actively 'pinging'
>> the house, the fire department will be guaranteed to know either 1)
>> the state of the house or 2) that it cannot communicate with the
>> house. Either is valuable information to the monitor half of a
>> control system.
>>
>> What does anyone else think about this? Periodic polling is a
>> difficult thing for people to want to embrace but I believe it gives
>> the most reliable and accurate system state and may possibly be the
>> least complex in the long run.
>>
>> Kevin
>>
>>
>> On Jul 11, 2007, at 3:08 PM, Barry Clark wrote:
>>
>>
>>> We've always been unclear on the concept of stale alerts (note the
>>> included message at the end of this).
>>>
>>> Now that we have an alert server, it should take care of things in a
>>> better way. But, as illustrated by the incident below, it has
>>> merely gone from alerts being erroneously overlooked to alerts
>>> being erroneously preserved.
>>>
>>> The alert server is perfectly capable of going out to the monitor
>>> point and asking if the alert is still in force. It should do so.
>>> The question is when. It could be done periodically, on a slow
>>> period. Or the Executor, whenever a new script starts, could send a
>>> REST message to the alert server, saying "Here is my ID. Please
>>> check and see if any alerts you have for me are still valid."
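
The re-check Barry suggests might look something like this. It is a sketch
under assumptions, not the alert server's actual interface: the revalidate
function, the owner IDs, the monitor point names and the still_in_force
callback are all invented for the example, and the REST transport is left
out entirely.

```python
from typing import Callable, Dict, Tuple

# (owner id, monitor point) -> alert text
Alerts = Dict[Tuple[str, str], str]


def revalidate(alerts: Alerts, owner: str,
               still_in_force: Callable[[str, str], bool]) -> Alerts:
    """Drop alerts held for `owner` that its monitor points no longer assert."""
    return {key: text for key, text in alerts.items()
            if key[0] != owner or still_in_force(*key)}


def live(owner: str, point: str) -> bool:
    # Stand-in for asking the live monitor point; here only the
    # (hypothetical) supply alert is still being asserted.
    return point == "power"


held = {
    ("vla-ant05", "L8.sync"): "L8 out of sync",
    ("vla-ant05", "power"): "supply low",
    ("vla-ant07", "L8.sync"): "L8 out of sync",
}
held = revalidate(held, "vla-ant05", live)
print(sorted(held))
```

Only the alerts belonging to the requesting ID are re-checked; alerts held
for anyone else are left untouched until their own owner asks.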
>>>
>>>
>>>> From evlatests-bounces at donar.cv.nrao.edu Wed Jul 11 14:14:22 2007
>>>> Date: Wed, 11 Jul 2007 14:14:04 -0600 (MDT)
>>>> From: Ken Sowinski <ksowinsk at nrao.edu>
>>>> To: evlatests at nrao.edu
>>>>
>>>> There was much confusion at the VLA today with regard to timing
>>>> between the arrays which ended with the CMP in a strange state
>>>> and having to be rebooted more than once. This resulted in
>>>> stale "L8 out of sync" messages in the alert server causing
>>>> all data from VLA antennas to be flagged as bad.
>>>>
>>>> As a temporary measure Walter has kludged idcaf so that L8
>>>> alerts are not turned into flags. However, the alerts are
>>>> still there and no one I have talked to knows how to make
>>>> them go away. We need either a little more distributed
>>>> knowledge about these parts of the system, or a system
>>>> with less remembered state. I wonder if some (certainly not
>>>> all) of our problems with bad flagging may be related to this
>>>> kind of behavior.
>>>>
>>>> Notice of removal of these alerts would be appreciated so that
>>>> idcaf can be restored to its usual functionality.
>>>>
>>>> _______________________________________________
>>>> evlatests mailing list
>>>> evlatests at listmgr.cv.nrao.edu
>>>> http://listmgr.cv.nrao.edu/mailman/listinfo/evlatests
>>>>
>>>>
>>> _______________________________________________
>>> evla-sw-discuss mailing list
>>> evla-sw-discuss at listmgr.cv.nrao.edu
>>> http://listmgr.cv.nrao.edu/mailman/listinfo/evla-sw-discuss
>>>
>>
>
> --
> Sonja Vrcic
> Software Engineer
> National Research Council
> Herzberg Institute of Astrophysics
> Dominion Radio Astrophysical Observatory,
> Penticton, BC, Canada
> Tel:(250)490-4309/(250)493-2277ext.309
> Sonja.Vrcic at nrc-cnrc.gc.ca
> http://www.drao-ofr.hia-iha.nrc-cnrc.gc.ca/
>