torrus/doc/scalability.pod.in

#  Copyright (C) 2004  Stanislav Sinyagin
#  Copyright (C) 2004  Christian Schnidrig
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

# $Id: scalability.pod.in,v 1.1 2010-12-27 00:04:32 ivan Exp $
# Stanislav Sinyagin <ssinyagin@yahoo.com>
#
#

=head1 Torrus Scalability Guide

=head2 Introduction

Installing Torrus in large enterprise or carrier networks requires special
planning and design measures to ensure that it operates reliably and
efficiently.


=head2 Hardware Platform Recommendations

Hardware planning is very important for large Torrus installations.
It is vital to understand the potential bottlenecks and performance limits
before purchasing the hardware.

First of all, estimate the number of devices that you are going to
monitor, leaving some room for future growth. It is good practice to
model the situation on a test server first, and then project the results
onto a larger number of network devices. The C<torrus configinfo> and
C<torrus schedulerinfo> utilities help in assessing the requirements.

The resources to plan for are the server CPU, RAM, and disks.
While CPU and RAM are of great importance, it is the disk subsystem that
often becomes the bottleneck.

=head3 CPU

For large installations, CPU power is one of the critical resources.

One of the CPU-intensive processes is the XML configuration compiler. A
configuration for a few hundred nodes may take a few dozen minutes to
compile. With some complicated configurations, it may take a few hours to
recompile the whole datasource tree. Here CPU power translates directly
into the time you spend testing configuration changes or troubleshooting
a problem.

The SNMP collector is fairly moderate in its CPU usage. Still, when the
number of SNMP variables reaches tens of thousands, CPU power becomes
an important resource to pay attention to. In addition, collector
process initialization can be quite CPU-intensive. It happens every
time the collector process starts, and whenever the configuration has been
recompiled.

The empirical estimate made by Christian Schnidrig is that one SNMP counter
collected every 5 minutes occupies approximately 1.0e-5 of the CPU time
of an Intel Xeon 2.8GHz, including the OS overhead. For example,
Torrus collectors running on 60'000 counters would keep the server
busy at an average of 60%.

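The estimate above scales linearly with the number of counters, so it can
be projected onto a planned installation. The following Python sketch only
does that arithmetic. The function name C<estimate_cpu_load> is hypothetical,
and the 1.0e-5 factor is assumed to hold only for hardware comparable to the
Xeon mentioned above and for a 5-minute collection period.

```python
# Empirical CPU share of one SNMP counter collected every 5 minutes
# (figure quoted in the text for an Intel Xeon 2.8GHz, OS overhead included).
CPU_SHARE_PER_COUNTER = 1.0e-5


def estimate_cpu_load(counters, share_per_counter=CPU_SHARE_PER_COUNTER):
    """Return the projected average CPU load as a fraction (1.0 means 100%)."""
    return counters * share_per_counter


if __name__ == "__main__":
    load = estimate_cpu_load(60000)
    print("Projected average CPU load: %.0f%%" % (load * 100))
```

On faster or slower hardware, measure the load on a test server first and
pass the measured share per counter as the second argument.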

=head3 Memory

The collector needs RAM to store the information for all counters,
and of course swapping is undesirable. In addition, the more RAM you have
available for disk cache, the faster the collector can update the data files.

Each update of an RRD file consists of a number of operations: open the file,
read the header, seek to the needed offset, and then write. With enough disk
cache, the read operations may be served solely from RAM,
which significantly speeds up the collector running cycle.

According to Christian Schnidrig's empirical estimates, 30 KB of RAM per
counter should be enough to hold all the necessary data, including the disk
cache. For example, 60'000 counters give 1'757 MB, so 2 GB of server RAM
should be enough.

In addition, Apache with mod_perl occupies 20-30 MB of RAM per process, so
a few hundred extra megabytes of RAM would be good to have.

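The memory figures above can be combined into a rough sizing calculation.
The Python sketch below is an illustration only: the function name
C<estimate_ram_mb> and the count of 10 Apache processes are hypothetical,
and the upper bound of 30 MB per Apache process is used as a cautious default.

```python
KB_PER_COUNTER = 30  # empirical figure from the text, disk cache included


def estimate_ram_mb(counters, apache_processes=0, mb_per_apache_process=30):
    """Return the projected RAM requirement in MB (1 MB = 1024 KB)."""
    collector_mb = counters * KB_PER_COUNTER / 1024.0
    return collector_mb + apache_processes * mb_per_apache_process


if __name__ == "__main__":
    print("Collector only: %.0f MB" % estimate_ram_mb(60000))
    print("With 10 Apache processes: %.0f MB" % estimate_ram_mb(60000, 10))
```

The result is a lower bound for planning, not a measurement. Leave extra
room for the operating system and for future growth.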

=head3 Disk storage

IDE disks are not recommended. They are not designed for
continuous and intensive use. In Christian Schnidrig's experience,
IDE disks do not live long under such load.

It is recommended to reduce the number of RRD files by grouping
the datasources. This dramatically reduces the number of read and write
operations during the update process.

As noted by Rodrigo Cunha, reducing the filesystem read-ahead size
may significantly improve disk cache usage. The RRD update process
reads only a short header at the beginning of the RRD file, and the rest of
the read-ahead data is never reused. On Linux, C<hdparm -a> takes the
read-ahead size as a number of 512-byte sectors, so the following command
sets it to 4 KB, which equals the i386 page size:

 /sbin/hdparm -a 8 /dev/sda

For servers with tens of thousands of RRD files, it is recommended to use
hashed data directories. The data directories then form a structure of
256 directories, with a hash function based on hostnames. See I<Torrus SNMP
Discovery User Guide> for more details.

Spreading the data files over several physical disks is also a plus.

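The hashed directory layout described above can be pictured with a small
Python sketch. It is an illustration under assumptions: this guide does not
specify the hash function that Torrus uses, so CRC32 of the hostname stands
in for it, and both the function name C<hashed_data_dir> and the base
directory in the example are hypothetical.

```python
import zlib


def hashed_data_dir(base_dir, hostname):
    """Map a hostname to one of 256 subdirectories (00..ff) under base_dir."""
    # Assumption: CRC32 is only a stand-in for the real Torrus hash function.
    bucket = zlib.crc32(hostname.encode("utf-8")) & 0xFF
    return "%s/%02x" % (base_dir, bucket)


if __name__ == "__main__":
    # Hypothetical base directory and hostnames, for illustration only.
    for host in ("router1.example.com", "switch7.example.com"):
        print(host, "->", hashed_data_dir("/srv/torrus/collector_rrd", host))
```

All files of one host land in the same subdirectory, while different hosts
are spread over the 256 buckets, which keeps each directory small.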


=head2 Operating System Tuning

Depending on the number of trees and processes running on a single server,
you may need to increase the maximum number of file handles that
may be open at the same time, both system-wide and per process.
See the manuals for your operating system for more details.

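Before raising any limits, it helps to know the current ones. As a minimal
sketch, the per-process limit can be read on Unix-like systems with the
standard Python C<resource> module. The function name C<open_file_limits>
is hypothetical, and the system-wide limit is not covered here because the
way to query it differs between operating systems.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) per-process limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("per-process open files: soft=%s hard=%s" % (soft, hard))
```

Run it as the user that owns the Torrus processes, because the limits may
differ between user accounts.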

=head2 Torrus Configuration Recommendations

=head3 BerkeleyDB configuration tuning

When using lots of collectors and/or lots of HTTP processes, it is
important to increase the size of the BerkeleyDB lock region.
The command

  db_stat -h @dbhome@ -c

shows the current number of locks and lockers, and the maximum
numbers reached during the database history.
The maximum numbers of lock objects and lockers can be tuned by creating the
file F<DB_CONFIG> in the database home directory, F<@dbhome@>.
The following settings work fine with about 20 collector processes
and 5 HTTP daemon processes:

   set_lk_max_lockers   6000
   set_lk_max_locks     3000

It is also recommended to increase the cache size from the default 256KB to
a larger amount, especially if the database has to hold large Torrus trees
(hundreds or thousands of monitored devices). The following line in
F<DB_CONFIG> sets the cache size to 16MB:

   set_cachesize        0 16777216 1

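The three numbers in the C<set_cachesize> line are the gigabytes, the
additional bytes, and the number of cache regions. As a sketch, the line for
another cache size can be generated as follows. The function name
C<cachesize_line> is hypothetical, and a single cache region is assumed by
default, as in the example above.

```python
GIGABYTE = 1024 ** 3


def cachesize_line(total_bytes, ncache=1):
    """Format a DB_CONFIG set_cachesize line: gigabytes, extra bytes, regions."""
    gbytes, extra_bytes = divmod(total_bytes, GIGABYTE)
    return "set_cachesize %d %d %d" % (gbytes, extra_bytes, ncache)


if __name__ == "__main__":
    print(cachesize_line(16 * 1024 * 1024))   # the 16MB setting from the text
    print(cachesize_line(64 * 1024 * 1024))   # a larger, hypothetical setting
```

Whether a larger cache helps depends on the size of the trees, so compare
the C<db_stat> output before and after the change.
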
After updating F<DB_CONFIG>, stop all Torrus processes,
including the HTTP server, then run

  db_recover -h @dbhome@

Then start the processes again. Further information is available at:

=over 4

=item * General access method configuration (BDB Reference)

http://tinyurl.com/ybymk7t

=item * DB_CONFIG configuration file (BDB Reference)

http://tinyurl.com/y9qjodv

=item * Configuring locking: sizing the system (BDB Reference)

http://tinyurl.com/ya6dtww

=item * C API reference

http://tinyurl.com/yczgnab

=back


=head3 XML compilation time

For large datasource trees, XML compilation may take tens of minutes,
if not hours. Other processes are not suspended during the compilation;
they keep using the previous configuration version.

For debugging and testing, it is recommended to create a new tree,
separate from the large production trees. This saves a lot of time and
lets you see the result of changes quickly.



=head3 Collector schedule tuning

The Torrus collector has a very flexible scheduling mechanism. Each data source
has its own pair of scheduler parameters: I<period>
and I<timeoffset>. The period is usually set to the default of 300 seconds.
Time is divided into even intervals. For the default 5-minute period,
the intervals within each hour start at 00, 05, 10, 15, etc. minutes.
The timeoffset determines the moment within each interval when the data source
should be collected. The default value for timeoffset is 10 seconds. This
means that the collector process tries to collect the values at
00:00:10, 00:05:10, ..., 23:55:10 every day.

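The relation between I<period>, I<timeoffset>, and the collection times can
be written down as a short calculation. The Python sketch below is not taken
from the Torrus source. The function names are hypothetical, times are plain
seconds on one clock, and timeoffset is assumed to be smaller than period.

```python
def next_collection_time(now, period=300, timeoffset=10):
    """Return the first scheduled collection time strictly after `now`."""
    interval_start = now - (now % period)
    candidate = interval_start + timeoffset
    if candidate <= now:
        # The slot in the current interval has passed; use the next interval.
        candidate += period
    return candidate


def format_time_of_day(seconds):
    """Format seconds since midnight as HH:MM:SS."""
    return "%02d:%02d:%02d" % (seconds // 3600, seconds % 3600 // 60, seconds % 60)


if __name__ == "__main__":
    t = 0
    for _ in range(3):
        t = next_collection_time(t)
        print(format_time_of_day(t))
```

Starting from midnight, the loop walks through the first three collection
moments of the day for the default period and timeoffset.
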
Data sources with the same period and timeoffset values are grouped together.
The SNMP collector works asynchronously, and it tries to send as many SNMP
packets at the same time as possible. Thanks to the asynchronous architecture,
the collector is able to perform thousands of queries at the same time
with very little delay. Within the same collector process, a large number of
datasources configured with the same schedule is usually not a problem.

If you configure several datasource trees with the same period and
timeoffset values, all collector processes start flooding the network
with SNMP packets at the same time. This may lead to packet loss and
collector timeouts. In addition, all collector processes try to update
the RRD files concurrently, which degrades overall performance.
Therefore, it is better to assign different timeoffset values
to different trees. This can be achieved by manually specifying the
C<collector-timeoffset> parameter in the discovery configuration files.

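One simple way to choose distinct values is to spread the trees evenly over
the period. The following Python sketch shows the idea. The function name
C<stagger_timeoffsets> and the tree names are hypothetical, and the even
spacing is just one possible policy, not something Torrus does by itself.

```python
def stagger_timeoffsets(tree_names, period=300, first_offset=10):
    """Assign each tree its own timeoffset, spread evenly over the period."""
    step = period // max(len(tree_names), 1)
    return {
        name: (first_offset + index * step) % period
        for index, name in enumerate(tree_names)
    }


if __name__ == "__main__":
    # Hypothetical tree names, for illustration only.
    offsets = stagger_timeoffsets(["core", "access", "customers"])
    for tree, offset in offsets.items():
        print("%-10s collector-timeoffset = %d" % (tree, offset))
```

The resulting values would then be entered as C<collector-timeoffset> in the
discovery configuration of each tree.
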
In large installations, the collector schedules need thorough planning and
tuning to ensure maximum performance and minimize the load on the network
devices' CPUs. The C<torrus schedulerinfo> utility is designed to help you with
this planning.
It shows two types of reports. The configuration report gives you an idea
of how many datasources are queried at which moments in time. The runtime
report gives you real-time statistics of collector schedules, including
the average and maximum running cycle, and statistics on missed or delayed
cycles.

There is a feature that eases the load in large installations. With
dispersed timeoffsets enabled, the timeoffset for each datasource is
evenly assigned to one of the allowed values, based on the name of the host
and the name of the interface. By default, these values are: 0, 30, 60, ..., 270.
With thousands of datasources, this feature smooths the CPU and disk load
on the Torrus server, and avoids CPU usage peaks on network devices with
a large number of SNMP variables per device. It is recommended to analyse
the current scheduler statistics before using this feature. If you run several
large datasource trees, don't forget to plan and analyse the schedules for the
whole system, not just for one tree.

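The idea behind dispersed timeoffsets can be sketched in a few lines of
Python. This is an illustration only: the guide does not describe the actual
hash function, so CRC32 of the host and interface names is an assumption,
and the function name C<dispersed_timeoffset> is hypothetical.

```python
import zlib

# Default allowed values from the text: 0, 30, 60, ..., 270.
ALLOWED_OFFSETS = tuple(range(0, 300, 30))


def dispersed_timeoffset(hostname, interface, allowed=ALLOWED_OFFSETS):
    """Pick a stable timeoffset for a datasource from the allowed values."""
    # Assumption: CRC32 is only a stand-in for the real Torrus hash function.
    key = ("%s:%s" % (hostname, interface)).encode("utf-8")
    return allowed[zlib.crc32(key) % len(allowed)]


if __name__ == "__main__":
    # Hypothetical host and interface names, for illustration only.
    for ifname in ("GigabitEthernet0/1", "GigabitEthernet0/2", "Serial1/0"):
        print(ifname, dispersed_timeoffset("router1.example.com", ifname))
```

Because the value depends only on the names, each datasource keeps the same
timeoffset across recompilations, while the datasources as a whole are
spread over the ten slots.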

=head2 Distributed setup

=head3 NFS-based setup

The following setup allows you to distribute the load among several
physical servers.

Several Torrus backend servers run collectors
and store RRD files in their local storage, which is shared via NFS.
The frontend server runs the Web interface, and possibly some monitor
processes, accessing the data files via NFS.

It is possible to organize the directory structure so that each data file
is seen at the same path on every server. Then you can keep identical
Torrus configurations on all servers, and launch the collector process on
only one of them. XML configuration files may be shared via NFS too.

Be aware that the BerkeleyDB database home directory cannot be NFS-mounted.
See the following link for more details:
http://www.sleepycat.com/docs/ref/env/remote.html

Backend servers may run near the limits of their system capacity.
70-80% CPU usage should not be a problem. For the frontend machine,
it is preferable that at least 50% of the CPU time is idle on average.


=head1 Authors

Copyright (c) 2004-2005 Stanislav Sinyagin E<lt>ssinyagin@yahoo.comE<gt>

Copyright (c) 2004 Christian Schnidrig E<lt>christian.schnidrig@bluewin.chE<gt>