Diffstat (limited to 'torrus/doc/devdoc/wd.uptime-mon.pod')
-rw-r--r-- | torrus/doc/devdoc/wd.uptime-mon.pod | 162 |
1 files changed, 162 insertions, 0 deletions
diff --git a/torrus/doc/devdoc/wd.uptime-mon.pod b/torrus/doc/devdoc/wd.uptime-mon.pod
new file mode 100644
index 000000000..8bc1c423e
--- /dev/null
+++ b/torrus/doc/devdoc/wd.uptime-mon.pod
@@ -0,0 +1,162 @@
+# Copyright (C) 2002 Stanislav Sinyagin
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
+
+# $Id: wd.uptime-mon.pod,v 1.1 2010-12-27 00:04:36 ivan Exp $
+# Stanislav Sinyagin <ssinyagin@yahoo.com>
+#
+#
+
+=head1 RRFW Working Draft: Service uptime monitoring and reporting
+
+Status: in pre-design phase.
+Date: Sep 26 2003; Last revised:
+
+=head2 Definitions
+
+It is often required to monitor the service level in networks.
+Service level is normally covered by a Service Level Agreement (SLA),
+which defines the following parameters:
+
+=over 4
+
+=item * Service definition
+
+Describes the particular service in terms of functionality and means of
+monitoring. Examples are: IP VPN connectivity, WAN uplink, SQL database engine.
+
+=item * Maintenance window
+
+Describes the periodic time intervals when a service outage is possible
+due to some maintenance work. It may be unconditional (outage is always
+possible within the window), or conditional (customer confirmation required
+for outage within the window). A notification period is normally defined
+for maintenance outages.
+Example: every 1st Tuesday of the month between 6 AM and 8 AM, with 96 hours
+notification time.
+
+=item * Outage types
+
+Outages may be caused by: 1) system failure; 2) service provider's
+infrastructure failure; 3) customer activity.
+
+=item * Service level objectives
+
+These are the guarantees that the service provider gives to the customer.
+Violation of these guarantees is compensated by the defined penalties.
+
+These may include: Maximum maintenance downtime per specified period;
+Maximum downtime period due to failures on the service provider side;
+Minimum service availability per specified period.
+
+=back
+
+
+=head2 Event datasource type
+
+In order to store the service level information, we need a new datasource
+type in RRFW: I<event>. It represents atomic information
+about a single event in time, i.e. it cannot be divided into more specific
+elements or sub-events. Its attributes are as follows:
+
+=over 4
+
+=item * Event group name
+
+Each event belongs to one and only one group; a group may contain several
+events. An event group is a unique entity that describes the service.
+
+=item * Event name
+
+Unique name within the event group. Describes the type of the event, such as
+C<maintenance>, C<downtime>. Events with the same name cannot overlap in
+time.
+
+=item * Start time
+
+Timestamp of the event start.
+
+=item * Duration
+
+Non-negative integer that specifies the length of the event in seconds.
+Zero duration means that the event has not yet finished.
+
+=item * Parameters
+
+Event-specific I<(name, value)> pairs.
+
+=back
+
+Events are uniquely identified by the I<(Event group, Event name, Start time)>
+triple.
+
+
+=head2 Event summary reports
+
+The renderer should be able to display the events at different summary levels
+and in different combinations. Event reports should be specified by
+expressions, as follows:
+
+=over 4
+
+=item * Boolean operators
+
+C<downtime AND NOT maintenance>.
+
+=item * Time period
+
+C<(downtime AND NOT maintenance)[-2DAYS,NOW]>
+
+C<(downtime[-2DAYS,NOW] AND NOT maintenance AND
+NOT downtime[200309151200,200309151300])>
+
+=item * Arithmetic operations
+
+Sum of durations, subtraction of durations...
+
+=back
+
+=head2 Events generation
+
+Events may be generated by the following sources:
+
+=over 4
+
+=item * Collector
+
+The SNMP collector may create events on fault conditions, such as an
+unreachable host, or on SNMP variable changes, such as interface status.
+It is also possible to create an ICMP Echo collector type,
+which would generate events based on pinging the hosts.
+
+=item * Monitor
+
+A new monitor action will be to create events.
+
+=item * Human operator
+
+First from the command-line interface, and later from the Web interface,
+human operators may create scheduled events, such as maintenance
+outages. Security policy should protect certain types of events
+from human intervention.
+
+=back
+
+
+
+
+=head1 Author
+
+Copyright (c) 2003 Stanislav Sinyagin E<lt>ssinyagin@yahoo.comE<gt>
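Reviewer's sketch, not part of the patch: the event attributes and the C<downtime AND NOT maintenance> report described in the draft could be modelled roughly as below. This is a minimal illustration in Python under stated assumptions. The names C<Event>, C<merge>, C<subtract>, and C<downtime_outside_maintenance> are hypothetical and do not exist in RRFW or Torrus. Timestamps are assumed to be epoch seconds, and an open event (zero duration) is assumed to extend to the current time.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), epoch seconds (assumption)


@dataclass
class Event:
    """Hypothetical record mirroring the attributes listed in the draft."""
    group: str
    name: str
    start: int
    duration: int  # seconds; 0 means the event has not yet finished
    params: Dict[str, str] = field(default_factory=dict)

    def key(self) -> Tuple[str, str, int]:
        # The draft identifies an event by (group, name, start time).
        return (self.group, self.name, self.start)

    def interval(self, now: int) -> Interval:
        # Assumption: an unfinished event is treated as lasting until `now`.
        end = now if self.duration == 0 else self.start + self.duration
        return (self.start, end)


def merge(intervals: List[Interval]) -> List[Interval]:
    """Return a sorted list of disjoint intervals covering the same time."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        elif end > start:
            merged.append((start, end))
    return merged


def subtract(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Parts of `a` not covered by `b`; both must be sorted and disjoint."""
    result: List[Interval] = []
    for start, end in a:
        cursor = start
        for b_start, b_end in b:
            if b_end <= cursor or b_start >= end:
                continue  # no overlap with the remaining part of this interval
            if b_start > cursor:
                result.append((cursor, b_start))
            cursor = max(cursor, b_end)
        if cursor < end:
            result.append((cursor, end))
    return result


def downtime_outside_maintenance(events: List[Event], group: str, now: int) -> int:
    """Total seconds of 'downtime AND NOT maintenance' for one event group."""
    def spans(name: str) -> List[Interval]:
        return merge([e.interval(now) for e in events
                      if e.group == group and e.name == name])
    remaining = subtract(spans("downtime"), spans("maintenance"))
    return sum(end - start for start, end in remaining)


events = [
    Event("vpn-customer-a", "downtime", start=1000, duration=600),
    Event("vpn-customer-a", "maintenance", start=1200, duration=200),
    Event("vpn-customer-a", "downtime", start=5000, duration=0),  # still open
]
total = downtime_outside_maintenance(events, "vpn-customer-a", now=5300)
```

Representing each event as a half-open interval keeps the Boolean operators as plain set operations on sorted interval lists, and the arithmetic operations reduce to summing interval lengths. A time-period restriction such as C<[-2DAYS,NOW]> would be one more clipping step over the same lists.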