[evla-sw-discuss] alerts - reliable delivery

Wed Mar 9 18:02:47 EST 2005

The current scheme for alerts coming from EVLA and VLA antennas
is to multicast an alert-on message once when a monitor point
goes into an alert state, and to multicast an alert-off message
once when a monitor point exits an alert state.  As I have
mentioned on several occasions I do not consider this scheme
robust w.r.t. dropped packets and other network glitches.  We
already have examples of alert-off messages not being seen for
a corresponding alert-on message even though direct query of
the mib shows the monitor point to have exited the alert state.

In the MIB Issues meeting of 3/8 we decided that the alert status
of a monitor point would be included in all packets containing
the value of the monitor point, be those packets multicasts destined
for the archive, for software processes managing an observation, for
screens, or unicast UDP datagrams returned in response to a "get"
command received over the service port.  W.r.t. catching alerts,
this scheme should work well for clients interested in monitor
point values, but it seems to be extremely inefficient for clients
that are interested in alerts, but not in monitor point values
(such as checker & flagger).
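
To make the inefficiency concrete, here is a minimal sketch of a monitor
point packet that carries the alert status alongside the value.  The
layout (name length, name, value, alert flag) and the class name
MonitorSample are assumptions made for illustration only; this is not the
actual MIB data format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical packet layout, NOT the real MIB wire format:
// [1 byte name length][name, UTF-8][8 byte double value][1 byte alert flag]
final class MonitorSample {
    final String name;
    final double value;
    final boolean inAlert;

    MonitorSample(String name, double value, boolean inAlert) {
        this.name = name;
        this.value = value;
        this.inAlert = inAlert;
    }

    // Serialize the sample; the alert flag rides along with every value.
    byte[] encode() {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        if (n.length > 255) {
            throw new IllegalArgumentException("name too long for 1-byte length");
        }
        ByteBuffer buf = ByteBuffer.allocate(1 + n.length + 8 + 1);
        buf.put((byte) n.length);
        buf.put(n);
        buf.putDouble(value);
        buf.put((byte) (inAlert ? 1 : 0));
        return buf.array();
    }

    // Parse a packet produced by encode().
    static MonitorSample decode(byte[] packet) {
        ByteBuffer buf = ByteBuffer.wrap(packet);
        int len = buf.get() & 0xFF;
        byte[] n = new byte[len];
        buf.get(n);
        double value = buf.getDouble();
        boolean inAlert = buf.get() != 0;
        return new MonitorSample(new String(n, StandardCharsets.UTF_8), value, inAlert);
    }
}
```

With a layout like this, a client that only cares about alerts still has
to receive and decode every value packet just to look at one flag, which
is the inefficiency described above.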

Ken suggested a "box" or a layer whose job is to catch all monitor
point values and alerts and then make this information globally
available.  I narrowed/modified his suggestion to a proposal that
we have a layer that catches only alerts, makes the alerts globally
available, and uses reliable communications (tcp/ip rather than udp)
between the mib and the destination layer.

One of the nice things about multicast is that the sender need have
no concept of destination address.  The sender simply puts the message
onto the wire, using a pre-determined multicast IP address.  Parties
interested in receiving the multicast send out an IGMP "join group"
message that is handled by the network routers in the system, not by
applications.  The forwarding tables are maintained in the routers.
If multicast (or broadcast) is not used for delivery of alerts, then
pretty much all of the alternatives would require that the MIBs have a
notion of a destination address for the delivery of alerts.

I think I would be willing to tolerate requiring the mibs to know one
address, and I think I find that requirement preferable to some of
the alternatives such as periodic retransmission of alert-on and
alert-off messages.  The latter gets very messy once you begin to dig
into it.

OK.  So maybe the alert-on and alert-off messages are still sent only
once, but now they use reliable (tcp/ip) rather than unreliable (udp
or multicast) delivery, and the mibs do have to know the destination
for the alerts.
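
A minimal sketch of what "sent once, over tcp/ip, to a known destination"
could look like in Java.  The class name AlertTcpSketch, the one-line
text message format, and the use of the loopback interface are all
assumptions for illustration; the real MIB side is not written in Java,
and the sender here only stands in for it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class AlertTcpSketch {

    // Sender (standing in for the MIB): connect, send one alert line, close.
    // The destination address is the one thing the sender must know.
    static void sendAlert(InetAddress host, int port, String message) {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(message + "\n");
            out.flush();
        } catch (IOException e) {
            // Unlike a dropped multicast, a failed connection is visible here.
            throw new UncheckedIOException(e);
        }
    }

    // Receiver (the destination layer): accept one connection, read one line.
    static String receiveOne(ServerSocket server) {
        try (Socket socket = server.accept();
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     socket.getInputStream(), StandardCharsets.UTF_8))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demonstration over loopback on an ephemeral port.
    static String roundTrip(String message) {
        try (ServerSocket server =
                     new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            sendAlert(server.getInetAddress(), server.getLocalPort(), message);
            return receiveOne(server);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each alert opens and closes its own connection.  A failure to connect
surfaces as an exception at the sender, which gives it the chance to
retry.  A stricter version would probably wait for an application-level
acknowledgement before closing, since a successful write alone does not
prove that the receiving process handled the message.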

So, who or what is the destination for the alerts and does that mean we
would have to poll that destination to determine alert status ?

My opinion is that the proper destination for the alert-on and alert-off
messages is the antenna server layer, the one that is to be spun off from
the Executor, where I have suggested that we maintain antenna state.  The
picture in my head is now as follows:

- the antenna server layer is made up of the network addressable antenna
   objects that we spoke of in the early days of the DO Comm team and that
   Kevin has frequently advocated

- antenna state is maintained in these antenna objects.  They tap into
   the ostream multicasts of monitor data, and are the destination for
   alerts issued by EVLA and VLA antenna subsystems.

- Alert-on and alert-off messages produced by VLA and EVLA are still sent
   only once, but are delivered to the antenna objects via a reliable
   protocol, presumably tcp/ip.  The tcp/ip connections would be make &
   break.  Not persistent.

- The antenna objects now serve as a distribution point for processes
   interested in antenna state and alerts.  But, please, no polling.  It's
   a real drag on real-time systems.  Wastes CPU cycles, is not timely,
   and interferes with timing.  The antenna objects are deployed in
   well-resourced boxes.  They can support a publish-subscribe mechanism
   for interested parties such as checker & flagger.
   Of course, the antenna objects are free to develop their own alerts,
   based on alerts and monitor data from the antenna subsystems.
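
The distribution role described in the list above can be sketched as a
small in-process publish-subscribe object.  The names AntennaObject,
Alert, subscribe and onAlert are hypothetical, as is the antenna
identifier used in the usage note, and the sketch leaves out the network
transport that remote subscribers such as checker & flagger would need.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical antenna object: holds alert state and pushes changes
// to subscribers, so that nobody has to poll for them.
final class AntennaObject {

    // Immutable description of one alert-on or alert-off event.
    static final class Alert {
        final String antenna;
        final String monitorPoint;
        final boolean on;

        Alert(String antenna, String monitorPoint, boolean on) {
            this.antenna = antenna;
            this.monitorPoint = monitorPoint;
            this.on = on;
        }
    }

    private final String antennaId;
    private final Map<String, Boolean> alertState = new ConcurrentHashMap<>();
    private final List<Consumer<Alert>> subscribers = new CopyOnWriteArrayList<>();

    AntennaObject(String antennaId) {
        this.antennaId = antennaId;
    }

    // Interested parties register a callback once.
    void subscribe(Consumer<Alert> subscriber) {
        subscribers.add(subscriber);
    }

    // Called when an alert-on or alert-off arrives from an antenna subsystem:
    // record the new state, then notify every subscriber.
    void onAlert(String monitorPoint, boolean on) {
        alertState.put(monitorPoint, on);
        Alert alert = new Alert(antennaId, monitorPoint, on);
        for (Consumer<Alert> s : subscribers) {
            s.accept(alert);
        }
    }

    // Current state, for a late joiner that needs a starting point.
    boolean isInAlert(String monitorPoint) {
        return alertState.getOrDefault(monitorPoint, false);
    }
}
```

A subscriber would register with something like
`antenna.subscribe(a -> handle(a))` and then simply wait to be called,
where `handle` is whatever the subscriber wants to do with the alert.
The retained state lets a process that starts late ask once for the
current alert status and rely on notifications from then on.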

Of course, I will hold back from posting this message until I have a
diagram to attach.

Very little, if any, of what I have written is new, and it supports many
of the ideas that have been floated, such as a layered hierarchy for alert
generation and distribution.  It also echoes some of Doug Tody's recent
comments.  The chief difference between our earlier conversations and now
is that we are no longer talking about concepts, we are talking about what
we will actually implement, over the next few months.

Anyone know of a nice, thin layer or set of libraries that could be used
as the publish-subscribe mechanism between the antenna objects and the
interested parties, such as flagger ?  Maybe something like ACE toolkit/
framework, only for Java rather than C & C++ ?

-------------- next part --------------
A non-text attachment was scrubbed...
Name: alerts_distribution.pdf
Type: application/pdf
Size: 19501 bytes
Desc: not available
URL: <http://listmgr.nrao.edu/pipermail/evla-sw-discuss/attachments/20050309/ee17cbcb/attachment.pdf>