#  Copyright (C) 2002  Stanislav Sinyagin
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

# $Id: wd.uptime-mon.pod,v 1.1 2010-12-27 00:04:36 ivan Exp $
# Stanislav Sinyagin <ssinyagin@yahoo.com>
#
#

=head1 RRFW Working Draft: Service uptime monitoring and reporting

Status: in pre-design phase.
Date: Sep 26 2003; Last revised:

=head2 Definitions

It is often required to monitor the service level in networks.
The service level is normally covered by a Service Level Agreement (SLA),
which defines the following parameters:

=over 4

=item * Service definition

Describes the particular service in terms of functionality and means of
monitoring. Examples are: IP VPN connectivity, WAN uplink, SQL database engine.

=item * Maintenance window

Describes the periodic time intervals when a service outage is possible
due to maintenance work. The window may be unconditional (an outage is always
possible within the window) or conditional (customer confirmation is required
for an outage within the window). A notification period is normally defined
for maintenance outages.
Example: every first Tuesday of the month between 6 AM and 8 AM, with 96 hours
notification time.

=item * Outage types

Outages may be caused by: 1) system failure; 2) failure of the service
provider's infrastructure; 3) customer activity.

=item * Service level objectives

These are the guarantees that the service provider gives to the customer.
Violation of these guarantees is compensated by the penalties defined in
the agreement.

These may include: maximum maintenance downtime per specified period;
maximum downtime period due to failures on the service provider side;
minimum service availability per specified period.

=back
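
The maintenance window example above (every first Tuesday of the month,
6 AM to 8 AM, 96 hours notification) can be worked through as a small
calculation. The following is only an illustrative sketch: the function
name C<first_tuesday_window> and its parameters are hypothetical and are
not part of RRFW.

```python
from datetime import datetime, timedelta


def first_tuesday_window(year, month, start_hour=6, end_hour=8,
                         notice_hours=96):
    """Return (notify_by, window_start, window_end) for one month.

    Hypothetical helper: models the example window only, in local time.
    """
    first_day = datetime(year, month, 1)
    # datetime.weekday(): Monday is 0, Tuesday is 1.
    days_until_tuesday = (1 - first_day.weekday()) % 7
    first_tuesday = first_day + timedelta(days=days_until_tuesday)
    window_start = first_tuesday.replace(hour=start_hour)
    window_end = first_tuesday.replace(hour=end_hour)
    # The customer must be notified at least notice_hours in advance.
    notify_by = window_start - timedelta(hours=notice_hours)
    return notify_by, window_start, window_end
```

For October 2003 the window falls on October 7, so the notification
would have to be sent no later than 6 AM on October 3.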


=head2 Event datasource type

In order to store the service level information, we need a new datasource
type in RRFW: I<event>. It represents atomic information
about a single event in time, i.e. it cannot be divided into more specific
elements or sub-events. Its attributes are as follows:

=over 4

=item * Event group name

Each event belongs to one and only one group, and a group may contain
several events. The event group is a unique entity that describes the service.

=item * Event name

Unique name within the event group. Describes the type of the event, such as
C<maintenance> or C<downtime>. Events with the same name cannot overlap in
time.

=item * Start time

Timestamp of the event start.

=item * Duration

Non-negative integer that specifies the length of the event in seconds.
Zero duration means that the event has not yet finished.

=item * Parameters

Event-specific I<(name, value)> pairs.

=back

Events are uniquely identified by the I<(Event group, Event name, Start time)>
triple.
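
The attributes listed above could be modelled roughly as follows. This is
a minimal sketch and not an RRFW interface: the class C<Event>, its field
names, the helper C<overlaps>, and the use of Unix timestamps are all
assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    group: str          # event group name (describes the service)
    name: str           # event name, unique within the group
    start: int          # start time, assumed to be a Unix timestamp
    duration: int = 0   # seconds; zero means "not yet finished"
    params: dict = field(default_factory=dict)  # (name, value) pairs

    def key(self):
        """The identifying (group, name, start) triple."""
        return (self.group, self.name, self.start)

    def end(self, now):
        """End time; an unfinished event is treated as lasting until now."""
        return now if self.duration == 0 else self.start + self.duration


def overlaps(a, b, now):
    """True if two events with the same group and name overlap in time."""
    if (a.group, a.name) != (b.group, b.name):
        return False
    return a.start < b.end(now) and b.start < a.end(now)
```

A store built on this model would reject a new event when C<overlaps>
is true for any existing event, which enforces the rule that events with
the same name cannot overlap.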


=head2 Event summary reports

The renderer should be able to display the events at different summary levels
and in different combinations. Event reports should be specified by
expressions, as follows:

=over 4

=item * Boolean operators

C<downtime AND NOT maintenance>.

=item * Time period

C<(downtime AND NOT maintenance)[-2DAYS,NOW]>

C<(downtime[-2DAYS,NOW] AND NOT maintenance AND
NOT downtime[200309151200,200309151300])>

=item * Arithmetic operations

Sum of durations, subtraction of durations...

=back
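
One possible way to evaluate such expressions is to treat every event name
as a sorted list of I<(start, end)> intervals. The sketch below shows how
C<(downtime AND NOT maintenance)[period]> and a sum of durations might be
computed under that assumption. The function names C<subtract>, C<clip>,
C<total_duration> and C<availability> are hypothetical, and the expression
syntax itself is not parsed here.

```python
def subtract(a, b):
    """Parts of the intervals in a that no interval in b covers (a AND NOT b).

    Both arguments are lists of (start, end) pairs, sorted by start and
    non-overlapping within each list.
    """
    result = []
    for start, end in a:
        cursor = start
        for b_start, b_end in b:
            if b_end <= cursor:
                continue            # b interval lies entirely before cursor
            if b_start >= end:
                break               # b interval lies entirely after a
            if b_start > cursor:
                result.append((cursor, b_start))
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            result.append((cursor, end))
    return result


def clip(intervals, period_start, period_end):
    """Restrict intervals to the reporting period [period_start, period_end]."""
    clipped = []
    for start, end in intervals:
        start, end = max(start, period_start), min(end, period_end)
        if start < end:
            clipped.append((start, end))
    return clipped


def total_duration(intervals):
    """Sum of durations, in the same units as the timestamps."""
    return sum(end - start for start, end in intervals)


def availability(outage_seconds, period_seconds):
    """Fraction of the period during which the service was available."""
    return 1.0 - outage_seconds / period_seconds
```

With downtime at C<[(100, 400), (1000, 1300)]> and maintenance at
C<[(300, 1100)]>, the downtime outside maintenance is
C<[(100, 300), (1100, 1300)]>, 400 seconds in total.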

=head2 Events generation

Events may be generated by the following sources:

=over 4

=item * Collector

The SNMP collector may create events on fault conditions, such as an
unreachable host, or on changes in SNMP variables, such as interface status.
It is also possible to create an ICMP Echo collector type,
which would generate events based on pinging the hosts.

=item * Monitor

A new monitor action will create events.

=item * Human operator

First from the command-line interface, and later from the Web interface,
human operators may create scheduled events, such as maintenance
outages. A security policy should protect certain types of events
from human intervention.

=back
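
As an illustration of collector-generated events, the sketch below turns a
series of reachability samples (for example, ICMP Echo results) into
downtime events. It is only an assumption about how such a collector might
work: the function name C<downtime_events> and the sample format are
hypothetical, and the event start is approximated by the time of the first
failed sample.

```python
def downtime_events(samples):
    """Convert (timestamp, is_up) samples into (start, duration) events.

    samples must be sorted by timestamp. A duration of zero marks an
    outage that has not yet finished.
    """
    events = []
    down_since = None
    for timestamp, is_up in samples:
        if not is_up and down_since is None:
            down_since = timestamp          # outage begins
        elif is_up and down_since is not None:
            events.append((down_since, timestamp - down_since))
            down_since = None               # outage ends
    if down_since is not None:
        events.append((down_since, 0))      # still down at the last sample
    return events
```

With samples taken every 300 seconds, a host that is down at 300 and 600,
up again at 900, and down at 1200 yields one finished event
C<(300, 600)> and one open event C<(1200, 0)>.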




=head1 Author

Copyright (c) 2003 Stanislav Sinyagin E<lt>ssinyagin@yahoo.comE<gt>