[evla-sw-discuss] severity of alerts
Bryan Butler
bbutler at nrao.edu
Thu Jul 21 12:24:07 EDT 2005
On 7/21/05 09:46, Bruce Rowen wrote:
> Bryan Butler wrote:
>
>>
>> all,
>>
>> we've gotten to the point where we need to define a severity level for
>> alerts. the operators need this in order to tell the importance level
>> of them as they arrive on the checker screen.
>>
>> i propose that we define an integer alert level from 0 to 5, with 0
>> being the highest importance (issues of safety) and 5 being
>> informational only. if somebody can make a case for more granularity
>> (do we need 10 levels?), that's fine.
>
>
> The correlator CMIBs will use the Linux/Unix syslog system for general
> internal error reporting and quite possibly extend its use for all
> error/log messaging. I think it would be prudent, at a minimum, to adopt
> the numbering system used by syslog so as not to preclude its many
> attributes from being used in the future for the EVLA. Look at
> /usr/include/sys/syslog.h and the manual page (man syslogd) for more
> details.
>
> Of concern here is to at least keep the order (0 being most severe) and
> granularity (0-7) common with syslog.
this seems like a fine suggestion to me.
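
for reference, a minimal sketch of what adopting the syslog numbering could look like. the eight names and values are the standard syslog severities from syslog.h; the Severity class and the two helper functions are illustrative names only, not anything that exists in checker today:

```python
from enum import IntEnum


class Severity(IntEnum):
    """syslog severity levels: 0 is the most severe, 7 the least."""
    EMERG = 0
    ALERT = 1
    CRIT = 2
    ERR = 3
    WARNING = 4
    NOTICE = 5
    INFO = 6
    DEBUG = 7


def most_severe(levels):
    """Return the most severe level, i.e. the numerically smallest one."""
    return min(levels)


def sort_for_display(alerts):
    """Order (severity, text) pairs so the most severe come first.

    sorted() is stable, so alerts of equal severity keep arrival order.
    """
    return sorted(alerts, key=lambda alert: alert[0])
```

since lower numbers mean higher importance, a plain ascending sort puts the alerts the operators care about most at the top of the screen.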
>> the engineers will be going over each MIB and its monitor points and
>> assigning this severity code to its alerts. pat van buskirk is going
>> to do the leg work of pestering the engineers on this.
>>
>> in addition to a severity level, an "action" for the operator has to
>> be defined for each of these alerts. this would be similar to the
>> page at: http://www.vla.nrao.edu/operators/alarms/ for the VLA.
>
> It is my opinion that these two items should be applied to the alert by
> a higher level system (above the MIB) to avoid too much thinking and
> policy at the MIB level.
>
OK - we're in agreement here then.
>> once they have them defined, then we need to support them. there are
>> two ways that i see to do this:
>> 1 - each MIB has coded into it these severity levels, just as it
>> has coded into it the levels at which alerts are triggered, and
>> when the alert is sent out, the severity code goes out with it;
>> 2 - there is a lookup table which checker uses, given the MIB and
>> the monitor point/alert, to assign severity, and any program
>> that receives the alerts can use that lookup table to retrieve
>> the severity level.
>>
>> the advantage to 1 is that it keeps the information closest to the
>> MIB. it also saves the "management" software upstream. the
>> disadvantage is that if you decide to change anything you have to
>> modify all of those MIB images. the advantage to 2 is that you avoid
>> that MIB image modification, and can centralize everything (in a
>> database or similar). another advantage is that you can also include
>> the "action" in this database, as well as flagging information. since
>> you are going to need these other things there, you might as well add
>> a column for severity.
>>
>> i prefer the lookup table/database, but would like to hear other
>> opinions.
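
a rough sketch of option 2, assuming a simple in-memory table keyed by (MIB, monitor point). the MIB names, monitor points, severity values and actions below are all made up for illustration; in practice the table would be loaded from the database rather than hard-coded:

```python
from typing import NamedTuple, Optional


class AlertInfo(NamedTuple):
    severity: int   # lower number = more important
    action: str     # what the operator should do


# hypothetical entries; real ones would be assigned by the engineers
ALERT_TABLE = {
    ("mib-a", "temperature"): AlertInfo(1, "call the engineer on duty"),
    ("mib-a", "fan_speed"): AlertInfo(4, "note it in the log"),
}


def lookup(mib: str, point: str) -> Optional[AlertInfo]:
    """Return severity and action for an alert, or None if undefined."""
    return ALERT_TABLE.get((mib, point))
```

any program that receives the multicast alerts could call the same lookup, and changing a severity or an action only means editing the table, not rebuilding MIB images.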
>
> One beauty of syslog is its ability to sort and dispatch log/error
> messages to a highly tunable set of target machines and/or users. All
> built in, all programmed and tested, and already in use by the sysadmin
> people for their needs. No need to create our own proprietary logging
> system.
yes, but we already have the distribution mechanism in place - the
multicasting of the alerts. so we don't need the dispatch feature of
syslog. or maybe i'm not understanding how you mean to use syslog...
-bryan