[evla-sw-discuss] severity of alerts

Bryan Butler bbutler at nrao.edu
Thu Jul 21 12:24:07 EDT 2005



On 7/21/05 09:46, Bruce Rowen wrote:
> Bryan Butler wrote:
> 
>>
>> all,
>>
>> we've gotten to the point where we need to define a severity level for 
>> alerts.  the operators need this in order to tell the importance level 
>> of them as they arrive on the checker screen.
>>
>> i propose that we define an integer alert level from 0 to 5, with 0 
>> being the highest importance (issues of safety) and 5 being 
>> informational only.  if somebody can make a case for more granularity 
>> (do we need 10 levels?), that's fine.
> 
> 
> The correlator CMIBs will use the Linux/Unix syslog system for general 
> internal error reporting and quite possibly extend its use for all 
> error/log messaging. I think it would be prudent, at a minimum, to adopt 
> the numbering system used by syslog so as not to preclude its many 
> attributes from being used in the future for the EVLA.  Look at 
> /usr/include/sys/syslog.h and the manual page (man syslogd) for more 
> details.
> 
> Of concern here is to at least keep the order (0 being most severe) and 
> granularity (0-7) common with syslog.

this seems like a fine suggestion to me.
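
for reference, the syslog numbering could be mirrored in code roughly like this.  it is only a minimal python sketch: the 0-7 values and their names are the ones syslog defines, but the AlertSeverity class and the is_at_least helper are hypothetical names made up for illustration.

```python
from enum import IntEnum


class AlertSeverity(IntEnum):
    """Severity codes mirroring syslog; 0 is most severe (hypothetical class name)."""
    EMERG = 0
    ALERT = 1
    CRIT = 2
    ERR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6
    DEBUG = 7


def is_at_least(severity, threshold):
    """True if `severity` is as severe as, or more severe than, `threshold`.

    Lower numbers are more severe, so the comparison is <=.
    """
    return AlertSeverity(severity) <= AlertSeverity(threshold)
```

something like is_at_least(alert_level, AlertSeverity.ERR) would then let a display pick out everything at "error" or worse.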

>> the engineers will be going over each MIB and its monitor points and 
>> assigning this severity code to its alerts.  pat van buskirk is going 
>> to do the leg work of pestering the engineers on this.
>>
>> in addition to a severity level, an "action" for the operator has to 
>> be defined for each of these alerts.  this would be similar to the 
>> page at: http://www.vla.nrao.edu/operators/alarms/ for the VLA.
> 
> It is my opinion that these two items should be applied to the alert by 
> a higher level system (above the MIB) to avoid too much thinking and 
> policy at the MIB level.
> 

OK - we're in agreement here then.
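
as a rough sketch of what "above the MIB" could look like: a table keyed by MIB and monitor point that supplies the severity and the operator action when an alert is received.  everything named here (the MIB and monitor point names, the action text, the lookup_alert function, the default entry) is made up for illustration, and a real table would presumably live in a database rather than in code.

```python
# Hypothetical lookup table: (MIB name, monitor point) -> (severity, operator action).
# Severity follows the syslog ordering: 0 = most severe, 7 = least severe.
ALERT_TABLE = {
    ("example-mib-1", "temperature"): (2, "example action: call the on-duty engineer"),
    ("example-mib-1", "fan_speed"): (4, "example action: note in the log"),
}

# Assumed fallback for alerts that have no entry in the table yet.
DEFAULT_ENTRY = (5, "no action defined")


def lookup_alert(mib, monitor_point):
    """Return (severity, action) for an alert, or the default if unlisted."""
    return ALERT_TABLE.get((mib, monitor_point), DEFAULT_ENTRY)
```

with this arrangement, changing a severity or an action means editing one table row, with no change to any MIB image.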

>> once they have them defined, then we need to support them.  there are 
>> two ways that i see to do this:
>>  1 - each MIB has coded into it these severity levels, just as it
>>      has coded into it the levels at which alerts are triggered, and
>>      when the alert is sent out, the severity code goes out with it;
>>  2 - there is a lookup table which checker uses, given the MIB and
>>      the monitor point/alert, to assign severity, and any program
>>      that receives the alerts can use that lookup table to retrieve
>>      the severity level.
>>
>> the advantage to 1 is that it keeps the information closest to the 
>> MIB.  it also saves the "management" software upstream.  the 
>> disadvantage is that if you decide to change anything you have to 
>> modify all of those MIB images.  the advantage to 2 is that you avoid 
>> that MIB image modification, and can centralize everything (in a 
>> database or similar).  another advantage is that you can also include 
>> the "action" in this database, as well as flagging information.  since 
>> you are going to need these other things there, you might as well add 
>> a column for severity.
>>
>> i prefer the lookup table/database, but would like to hear other 
>> opinions.
>
> One beauty of syslog is its ability to sort and dispatch log/error 
> messages to a highly tunable set of target machines and/or users. All 
> built in, all programmed and tested, and already in use by the sysadmin 
> people for their needs. No need to create our own proprietary logging 
> system.

yes, but we already have the distribution mechanism in place - the 
multicasting of the alerts.  so we don't need the dispatch feature of 
syslog.  or maybe i'm not understanding how you mean to use syslog...


	-bryan


