[evla-sw-discuss] severity of alerts

Bruce Rowen browen at aoc.nrao.edu
Thu Jul 21 11:46:28 EDT 2005


Bryan Butler wrote:

>
> all,
>
> we've gotten to the point where we need to define a severity level for 
> alerts.  the operators need this to judge the importance of alerts 
> as they arrive on the checker screen.
>
> i propose that we define an integer alert level from 0 to 5, with 0 
> being the highest importance (issues of safety) and 5 being 
> informational only.  if somebody can make a case for more granularity 
> (do we need 10 levels?), that's fine.

The correlator CMIBs will use the Linux/Unix syslog system for general 
internal error reporting and quite possibly extend its use for all 
error/log messaging. I think it would be prudent, at a minimum, to adopt 
the numbering system used by syslog so as not to preclude its many 
attributes from being used in the future for the EVLA.  Look at 
/usr/include/sys/syslog.h and the manual page (man syslogd) for more 
details.

The main concern is to keep at least the ordering (0 being most severe) 
and the granularity (0-7) in common with syslog.
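
As a rough illustration of the numbering in question, the eight syslog 
severities can be written out as below. The values match the LOG_* 
constants in sys/syslog.h; the Python enum and the is_at_least helper 
are assumptions made only for this sketch, not anything in EVLA code.

```python
from enum import IntEnum


class Severity(IntEnum):
    """syslog severity levels; a lower number means more severe."""
    EMERG = 0    # system is unusable
    ALERT = 1    # action must be taken immediately
    CRIT = 2     # critical conditions
    ERR = 3      # error conditions
    WARNING = 4  # warning conditions
    NOTICE = 5   # normal but significant condition
    INFO = 6     # informational
    DEBUG = 7    # debug-level messages


def is_at_least(level: Severity, threshold: Severity) -> bool:
    """True if `level` is as severe as `threshold`, or more severe."""
    # Hypothetical helper: "more severe" means numerically smaller.
    return level <= threshold
```

Because 0 is the most severe level, "at least as severe as" is a 
numerical less-than-or-equal test, which is easy to get backwards.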

>
> the engineers will be going over each MIB and its monitor points and 
> assigning this severity code to its alerts.  pat van buskirk is going 
> to do the leg work of pestering the engineers on this.
>
> in addition to a severity level, an "action" for the operator has to 
> be defined for each of these alerts.  this would be similar to the 
> page at: http://www.vla.nrao.edu/operators/alarms/ for the VLA.


It is my opinion that these two items should be applied to the alert by 
a higher-level system (above the MIB) to avoid too much thinking and 
policy at the MIB level.

>
> once they have them defined, then we need to support them.  there are 
> two ways that i see to do this:
>  1 - each MIB has coded into it these severity levels, just as it
>      has coded into it the levels at which alerts are triggered, and
>      when the alert is sent out, the severity code goes out with it;
>  2 - there is a lookup table which checker uses, given the MIB and
>      the monitor point/alert, to assign severity, and any program
>      that receives the alerts can use that lookup table to retrieve
>      the severity level.
>
> the advantage to 1 is that it keeps the information closest to the 
> MIB.  it also saves the "management" software upstream.  the 
> disadvantage is that if you decide to change anything you have to 
> modify all of those MIB images.  the advantage to 2 is that you avoid 
> that MIB image modification, and can centralize everything (in a 
> database or similar).  another advantage is that you can also include 
> the "action" in this database, as well as flagging information.  since 
> you are going to need these other things there, you might as well add 
> a column for severity.
>
> i prefer the lookup table/database, but would like to hear other 
> opinions.
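
A minimal sketch of option 2, assuming a table keyed by (MIB name, 
monitor point) that returns both the severity and the operator action. 
The MIB names, monitor point names, severities, and action strings are 
all made up for illustration, and the fallback entry is an assumption of 
this sketch; a real table would presumably live in a database.

```python
from typing import NamedTuple


class AlertPolicy(NamedTuple):
    severity: int  # 0 = most severe, as proposed in this thread
    action: str    # what the operator should do


# Hypothetical entries; none of these names come from a real MIB.
POLICY_TABLE = {
    ("example-mib-1", "supply_voltage"): AlertPolicy(2, "notify the on-duty engineer"),
    ("example-mib-1", "fan_speed"): AlertPolicy(4, "check at next maintenance"),
}

# Fallback for alerts nobody has classified yet (an assumption of this sketch).
DEFAULT_POLICY = AlertPolicy(5, "no action defined")


def lookup_policy(mib: str, monitor_point: str) -> AlertPolicy:
    """Return the severity/action pair for an alert, or the default."""
    return POLICY_TABLE.get((mib, monitor_point), DEFAULT_POLICY)
```

Changing a severity or an action then means editing one table row, 
with no MIB image touched.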


One beauty of syslog is its ability to sort and dispatch log/error 
messages to a highly tunable set of target machines and/or users. It is 
all built in, all programmed and tested, and already in use by the 
sysadmin people for their needs. There is no need to create our own 
proprietary logging system.
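
To illustrate the kind of dispatch meant here, the sketch below mimics 
the threshold rule of a traditional syslog.conf selector, where a target 
listed at a given level receives messages at that level and anything 
more severe. The target names are hypothetical, and this is a toy model 
of the rule, not how syslogd itself is implemented.

```python
# Severity threshold per target, on the syslog 0-7 scale (0 = most severe).
# Target names are hypothetical.
ROUTES = {
    "operator-console": 3,  # err and more severe
    "engineer-mail": 4,     # warning and more severe
    "archive-log": 7,       # everything, including debug
}


def targets_for(severity: int) -> list[str]:
    """Return the targets whose threshold admits a message of this severity."""
    if not 0 <= severity <= 7:
        raise ValueError("syslog severities run from 0 to 7")
    return sorted(name for name, threshold in ROUTES.items() if severity <= threshold)
```

With these example routes, a severity 2 message reaches all three 
targets, while a severity 6 message goes only to the archive.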

-Bruce

>
>     -bryan




