# Copyright (C) 2004 Stanislav Sinyagin
# Copyright (C) 2004 Christian Schnidrig

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

# $Id: scalability.pod.in,v 1.1 2010-12-27 00:04:32 ivan Exp $
# Stanislav Sinyagin <ssinyagin@yahoo.com>

=head1 Torrus Scalability Guide

Installing Torrus in large enterprise or carrier networks requires special
planning and design measures to ensure reliable and efficient operation.

=head2 Hardware Platform Recommendations

Hardware planning is very important for large Torrus installations.
It is vital to understand the potential bottlenecks and performance limits
before purchasing the hardware.

First, estimate the number of devices that you are going to monitor,
leaving some room for future growth. It is good practice to model
the situation on a test server first, and then project the results onto
a larger number of network devices. The utilities C<torrus configinfo> and
C<torrus schedulerinfo> help in assessing the requirements.

The resources to plan for are the server CPU, RAM, and disks.
While CPU and RAM are of great importance, it is the disk subsystem that
often becomes the bottleneck.

For large installations, CPU power is one of the critical resources.

One of the CPU-intensive processes is the XML configuration compiler.
A configuration for a few hundred nodes may take tens of minutes to compile.
With some complicated configurations, recompiling the whole datasource tree
may take a few hours. Here CPU power translates directly into your time
spent testing configuration changes or troubleshooting a problem.

The SNMP collector is quite moderate in its CPU usage. Still, when the number
of SNMP variables reaches tens of thousands, CPU power becomes
an important resource to pay attention to. In addition, collector
process initialization can be quite CPU-intensive. This happens every
time the collector process starts, and whenever the configuration changes.

According to an empirical estimate made by Christian Schnidrig, collecting
one SNMP counter every 5 minutes occupies approximately 1.0e-5 of the time
of an Intel Xeon 2.8GHz CPU, including the OS overhead. For example,
Torrus collectors running on 60'000 counters would keep the server
busy at 60% on average.

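
The estimate above can be turned into a small sizing calculation. The
following Python sketch is not part of Torrus. It assumes that the load scales
linearly with the number of counters, as the 60'000-counter example implies,
and the function names are made up for illustration.

```python
# Rule of thumb quoted in the text: one counter polled every 5 minutes
# costs about 1.0e-5 of one 2.8 GHz Xeon, OS overhead included.
CPU_FRACTION_PER_COUNTER = 1.0e-5


def estimated_cpu_load(counters, per_counter=CPU_FRACTION_PER_COUNTER):
    """Return the estimated average CPU utilisation as a fraction.

    Assumption: the load grows linearly with the number of counters.
    """
    return counters * per_counter


def max_counters(target_load, per_counter=CPU_FRACTION_PER_COUNTER):
    """Return how many counters fit under a target average utilisation."""
    return int(round(target_load / per_counter))


if __name__ == "__main__":
    # The example from the text: 60'000 counters.
    print(f"{estimated_cpu_load(60_000):.0%}")
    # How many counters keep the server at or below 75% on average.
    print(max_counters(0.75))
```

The per-counter figure was measured on one specific CPU, so on other hardware
it is best treated as a starting point and checked on a test server.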
The collector needs RAM to store the information on all the counters,
and of course swapping is undesirable. In addition, the more RAM is
available for disk cache, the faster the collector can update the data files.

Each update of an RRD file consists of several operations: open the file,
read the header, seek to the needed offset, and then write. With enough disk
cache, the read operations can be served solely from RAM,
which significantly speeds up the collector running cycle.

According to Christian Schnidrig's empirical estimates, 30 KB of RAM per
counter should be enough to hold all the necessary data, including the disk
cache. For example, 60'000 counters give 1'757 MB, which points to about
2 GB of server RAM.

In addition, Apache with mod_perl occupies 20-30 MB of RAM per process, so
a few hundred extra megabytes of RAM would be good to have.

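
The memory figures above combine into a simple estimate. This is a Python
sketch under stated assumptions, not a Torrus utility: it takes the 30 KB per
counter rule and the upper end of the 20-30 MB range per Apache process, and
the number of Apache processes in the example is hypothetical.

```python
KB_PER_COUNTER = 30         # collector data plus disk cache, per the estimate
MB_PER_APACHE_PROCESS = 30  # upper end of the 20-30 MB range in the text


def estimated_ram_mb(counters, apache_processes=0):
    """Return the estimated RAM requirement in megabytes."""
    collector_mb = counters * KB_PER_COUNTER / 1024
    web_mb = apache_processes * MB_PER_APACHE_PROCESS
    return collector_mb + web_mb


if __name__ == "__main__":
    # Collector only, 60'000 counters: the 1'757 MB example from the text.
    print(int(estimated_ram_mb(60_000)))
    # The same collector load plus a hypothetical 10 Apache processes.
    print(int(estimated_ram_mb(60_000, apache_processes=10)))
```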
It is not recommended to use IDE disks. They are not designed for
continuous and intensive use. In Christian Schnidrig's experience,
IDE disks do not live long under such load.

It is recommended to reduce the number of RRD files by grouping
the datasources. This dramatically reduces the number of read and write
operations during the update process.

As noted by Rodrigo Cunha, reducing the filesystem read-ahead size
may significantly optimise disk cache usage. The RRD update process
reads only a short header at the beginning of the RRD file, and the rest of
the read-ahead data is never reused. On Linux, C<hdparm -a> takes the
read-ahead size in 512-byte sectors, so the following command
sets it to 4 KB, which equals the i386 page size:

  /sbin/hdparm -a 8 /dev/sda

For servers with tens of thousands of RRD files, it is recommended to use
hashed data directories. The data files are then spread over a structure of
256 directories, using a hash function based on hostnames. See the
I<Torrus SNMP Discovery User Guide> for more details.

Spreading the data files over several physical disks also helps.

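
To illustrate the idea of hashed data directories, the Python sketch below
maps a hostname to one of 256 subdirectories. The hash used here (the first
byte of an MD5 digest), the directory naming, and the example path are
assumptions made for illustration only. They are not the function or layout
that Torrus itself uses.

```python
import hashlib
import os.path


def hashed_subdir(hostname):
    """Map a hostname to one of 256 bucket names, '00' through 'ff'.

    Illustrative hash only: the first byte of the MD5 digest of the name.
    """
    digest = hashlib.md5(hostname.encode("utf-8")).digest()
    return f"{digest[0]:02x}"


def hashed_data_dir(base_dir, hostname):
    """Return the hypothetical data directory for a host under base_dir."""
    return os.path.join(base_dir, hashed_subdir(hostname))


if __name__ == "__main__":
    # Hypothetical base directory and hostnames.
    for host in ("rtr01.example.net", "rtr02.example.net"):
        print(host, hashed_data_dir("/srv/torrus/collector_rrd", host))
```

Because the bucket depends only on the hostname, all files of one device land
in the same directory, and no single directory has to hold every file.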
=head2 Operating System Tuning

Depending on the number of trees and processes that run on a single server,
you may need to increase the maximum number of filehandles that
can be open at the same time, both system-wide and per process.
See the manuals for your operating system for more details.

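
Before changing anything, it helps to know the current limits. The Python
sketch below reads them using the standard C<resource> module. It is an
illustration rather than a Torrus tool, and the F</proc/sys/fs/file-max> path
is Linux-specific, so other systems need a different source for the
system-wide value.

```python
import resource
from pathlib import Path


def per_process_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def system_wide_limit(path="/proc/sys/fs/file-max"):
    """Return the Linux system-wide filehandle limit, or None if unavailable."""
    p = Path(path)
    if not p.exists():
        return None
    return int(p.read_text().split()[0])


if __name__ == "__main__":
    soft, hard = per_process_limits()
    print("per-process soft limit:", soft)
    print("per-process hard limit:", hard)
    print("system-wide limit:", system_wide_limit())
```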
=head2 Torrus Configuration Recommendations

=head3 BerkeleyDB configuration tuning

When using many collectors and/or many HTTP processes, it is
important to increase the size of the BerkeleyDB lock region. The command

  db_stat -h @dbhome@ -c

shows the current number of locks and lockers, and the maximum
numbers reached during the database history.
The maximum numbers of lock objects and lockers can be tuned by creating the
file F<DB_CONFIG> in the database home directory, F<@dbhome@>.
The following settings work fine with about 20 collector processes
and 5 HTTP daemon processes:

  set_lk_max_lockers 6000
  set_lk_max_locks 3000

It is also recommended to increase the cache size from the default 256KB to
a larger amount, especially if the database has to hold large Torrus trees
(hundreds or thousands of monitored devices). The following line in
F<DB_CONFIG> sets the cache size to 16MB:

  set_cachesize 0 16777216 1

After updating F<DB_CONFIG>, stop all Torrus processes,
including the HTTP server, then run

  db_recover -h @dbhome@

Then start the processes again. Further information is available at:

=over 4

=item * General access method configuration (BDB Reference)

http://tinyurl.com/ybymk7t

=item * DB_CONFIG configuration file (BDB Reference)

http://tinyurl.com/y9qjodv

=item * Configuring locking: sizing the system (BDB Reference)

http://tinyurl.com/ya6dtww

=item * C API reference

http://tinyurl.com/yczgnab

=back

=head3 XML compilation time

For large datasource trees, XML compilation may take tens of minutes,
if not hours. Other processes are not suspended during the compilation;
they keep using the previous configuration version.

For debugging and testing, it is recommended to create a new tree,
separate from the large production trees. This saves a lot of time and
lets you see the result of changes quickly.

=head3 Collector schedule tuning

The Torrus collector has a very flexible scheduling mechanism. Each data source
has its own pair of scheduler parameters, I<period> and I<timeoffset>.
The period is usually left at the default of 300 seconds.
Time is divided into even intervals. For the default 5-minute period,
the intervals within each hour start at minutes 00, 05, 10, 15, and so on.
The timeoffset determines the moment within each interval when the data source
is collected. The default value for timeoffset is 10 seconds. This
means that the collector process tries to collect the values at
00:00:10, 00:05:10, ..., 23:55:10 every day.

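
The relation between period, timeoffset, and the collection moments can be
sketched as follows. This Python code illustrates the arithmetic only and is
not the Torrus scheduler. It assumes that intervals are aligned to multiples
of the period counted from midnight (or from the Unix epoch), that the
timeoffset is smaller than the period, and that the period divides a day
evenly. The function names are invented for the example.

```python
def next_collection_time(now, period=300, timeoffset=10):
    """Return the first collection moment strictly after `now` (in seconds)."""
    interval_start = now - (now % period)
    candidate = interval_start + timeoffset
    if candidate <= now:
        candidate += period
    return candidate


def daily_schedule(period=300, timeoffset=10):
    """Return all collection moments of one day as HH:MM:SS strings."""
    times = []
    for start in range(0, 86400, period):
        t = start + timeoffset
        times.append(f"{t // 3600:02d}:{t % 3600 // 60:02d}:{t % 60:02d}")
    return times


if __name__ == "__main__":
    schedule = daily_schedule()
    print(len(schedule), "collections per day")
    print("first:", schedule[0])
    print("last:", schedule[-1])
```

With the defaults this gives 288 collections per day, from 00:00:10 to
23:55:10, matching the times listed above.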
Data sources with the same period and timeoffset values are grouped together.
The SNMP collector works asynchronously and tries to send as many SNMP
packets at the same time as possible. Thanks to the asynchronous architecture,
the collector is able to perform thousands of queries at the same time
with a very small delay. Within the same collector process, a large number of
datasources configured with the same schedule is usually not a problem.

If you configure several datasource trees with the same period and
timeoffset values, all collector processes start flooding the network
with SNMP packets at the same time. This may lead to packet loss and
collector timeouts. In addition, all collector processes try to update
the RRD files concurrently, which degrades overall performance.
Therefore, it is better to assign different timeoffset values
to different trees. This can be achieved by manually specifying the
C<collector-timeoffset> parameter in the discovery configuration files.

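
One simple way to choose distinct values is to space the trees evenly across
the period. The Python sketch below is only a planning aid under that
assumption. The tree names are hypothetical, and the resulting numbers still
have to be entered by hand as C<collector-timeoffset> values.

```python
def staggered_offsets(tree_names, period=300):
    """Assign each tree a timeoffset, spaced evenly across the period."""
    if not tree_names:
        return {}
    step = period // len(tree_names)
    return {name: index * step for index, name in enumerate(tree_names)}


if __name__ == "__main__":
    # Hypothetical tree names.
    for tree, offset in staggered_offsets(["core", "access", "customers"]).items():
        print(f"{tree}: timeoffset {offset}")
```

Even spacing is a starting point only. Trees of very different sizes may need
uneven gaps so that a long collector cycle in one tree does not overlap the
start of the next.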
In large installations, the collector schedules need thorough planning and
tuning to ensure maximum performance and to minimize the load on the network
devices' CPUs. The C<torrus schedulerinfo> utility is designed to help you
with this.
It produces two types of reports. The configuration report shows
how many datasources are queried at which moments in time. The runtime
report gives realtime statistics of collector schedules, including the
average and maximum running cycle, and statistics on missed or delayed cycles.

One feature eases the load in large installations. With
dispersed timeoffsets enabled, each datasource is assigned one of the
allowed timeoffset values, spread evenly and chosen based on the name of the
host and the name of the interface. By default, these values are:
0, 30, 60, ..., 270.
With thousands of datasources, this feature smooths the CPU and disk load
on the Torrus server, and avoids CPU usage peaks on network devices with
a large number of SNMP variables per device. It is recommended to analyse
the current scheduler statistics before using this feature. If you run several
large datasource trees, don't forget to plan and analyse the schedules for
the whole system, not just for one tree.

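
The principle of dispersed timeoffsets can be sketched in a few lines of
Python. The hash below (MD5 over a "host:interface" string) is an assumption
chosen for illustration, and Torrus may derive the value differently. Only the
list of allowed values, 0 to 270 in steps of 30, comes from the text above.

```python
import hashlib

# Default allowed values from the text: 0, 30, 60, ..., 270.
ALLOWED_OFFSETS = tuple(range(0, 300, 30))


def dispersed_timeoffset(host, interface, allowed=ALLOWED_OFFSETS):
    """Pick a timeoffset deterministically from the host and interface names.

    Illustrative hash only, not the one used by Torrus.
    """
    key = f"{host}:{interface}".encode("utf-8")
    digest = hashlib.md5(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(allowed)
    return allowed[index]


if __name__ == "__main__":
    # Hypothetical host and interface names.
    for ifname in ("GigabitEthernet0/1", "GigabitEthernet0/2", "Serial1/0"):
        print(ifname, dispersed_timeoffset("rtr01.example.net", ifname))
```

Since the value depends only on the names, a datasource keeps the same
timeoffset across restarts, while the interfaces of one device are spread
over the period instead of being polled all at once.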
=head2 Distributed setup

=head3 NFS-based setup

The following setup allows you to distribute the load among several
servers.

Several Torrus backend servers run the collectors
and store RRD files on local storage, which is shared via NFS.
The frontend server runs the Web interface, and possibly some monitor
processes, accessing the data files via NFS.

It is possible to organize the directory structure so that each data file
is seen at the same path on every server. Then you can keep identical
Torrus configurations on all servers, and launch the collector process on
only one of them. XML configuration files may be shared via NFS too.

Be aware that the BerkeleyDB database home directory cannot be NFS-mounted.
See the following link for more details:
http://www.sleepycat.com/docs/ref/env/remote.html

Backend servers may run near the limits of their system capacities.
70-80% CPU usage should not be a problem. For the frontend machine,
it is preferable that at least 50% of CPU time is idle on average.

Copyright (c) 2004-2005 Stanislav Sinyagin E<lt>ssinyagin@yahoo.comE<gt>

Copyright (c) 2004 Christian Schnidrig E<lt>christian.schnidrig@bluewin.chE<gt>