# Copyright (C) 2004 Stanislav Sinyagin
# Copyright (C) 2004 Christian Schnidrig
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

# $Id: scalability.pod.in,v 1.1 2010-12-27 00:04:32 ivan Exp $
# Stanislav Sinyagin
#
#

=head1 Torrus Scalability Guide

=head2 Introduction

Installing Torrus in large enterprise or carrier networks requires special
planning and design measures in order to ensure reliable and efficient
operation.


=head2 Hardware Platform Recommendations

Hardware planning for large Torrus installations is of great importance.
It is vital to understand the potential bottlenecks and performance limits
before purchasing the hardware.

First of all, you need to estimate the number of devices that you are
going to monitor, with some room for future growth. It is good practice
to model the situation on a test server first, and then project the
results to a larger number of network devices. The utilities that
would help you in assessing the requirements are C and
C.

The resources to plan for are the server CPU, RAM, and disks.
While CPU and RAM are of great importance, it is the disk subsystem that
most often becomes the bottleneck.

=head3 CPU

For large installations, CPU power is one of the critical resources.

One of the CPU-intensive processes is the XML configuration compiler.
A configuration for a few hundred nodes may take tens of minutes to compile.
With some complicated configurations, it may take a few hours to recompile
the whole datasource tree. Here CPU power literally translates into your own
time when testing configuration changes or troubleshooting a problem.

The SNMP collector is quite moderate in its CPU usage; still, when the number
of SNMP variables reaches tens of thousands, CPU power becomes an important
resource to pay attention to. In addition, collector process initialization
can be quite CPU-intensive. This happens every time the collector process
starts, or whenever the configuration has been recompiled.

The empirical estimate made by Christian Schnidrig is that collecting one
SNMP counter every 5 minutes occupies approximately 1.0e-5 of an
Intel Xeon 2.8GHz CPU, including the OS overhead. For example,
Torrus collectors running on 60'000 counters would keep such a server
busy at an average of 60%.
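This figure is easy to scale to your own counter estimate. As a quick
back-of-the-envelope check (the per-counter cost is an empirical value for
that particular CPU generation, so treat the result only as a rough guide):

    # ~1.0e-5 of one CPU per counter per 5-minute cycle, for 60'000 counters
    perl -e 'printf "average CPU load: %.0f%%\n", 60_000 * 1.0e-5 * 100'
    # prints: average CPU load: 60%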
=head3 Memory

The collector needs RAM to store all the counter information, and of course
it is undesirable to swap. In addition, the more RAM you have available for
disk cache, the faster your collector may update the data files.

Each update of an RRD file consists of a number of operations: open the file,
read the header, seek to the needed offset, and then write. With enough disk
cache, it is possible that the read operations are served entirely from RAM,
and that significantly speeds up the collector running cycle.

According to Christian Schnidrig's empirical estimates, 30 KB of RAM per
counter should be enough to hold all the necessary data, including the disk
cache. For example, for 60'000 counters this gives 1'757 MB, thus 2 GB of
server RAM should be enough.

In addition, Apache with mod_perl occupies 20-30 MB of RAM per process, so
a few hundred extra megabytes of RAM would be good to have.


=head3 Disk storage

It is not recommended to use IDE disks. They are not designed for
continuous and intensive use. As experienced by Christian Schnidrig,
IDE disks don't live long under such load.

It is recommended to reduce the number of RRD files by grouping
the datasources. This dramatically reduces the number of read and write
operations during the update process.

As noted by Rodrigo Cunha, reducing the size of the filesystem read-ahead
may significantly improve disk cache usage. The RRD update process
reads only a short header at the beginning of the RRD file, and the rest of
the read-ahead data is never reused. On Linux, the following command would
set the read-ahead size to 4 KB, which equals the i386 page size:

    /sbin/hdparm -a 4 /dev/sda

For servers with tens of thousands of RRD files, it is recommended to use
hashed data directories. The data directories then form a structure of
256 directories, with a hash function based on host names. See I for more
details.

Spreading the data files over several physical disks is also a good idea.


=head2 Operating System Tuning

Depending on the number of trees and processes that run on a single server,
you might need to increase the maximum number of file handles that
may be open at the same time, both system-wide and per process.
See the manuals for your operating system for more details.


=head2 Torrus Configuration Recommendations

=head3 BerkeleyDB configuration tuning

When using many collectors and/or many HTTP processes, it is
important to increase the size of the BerkeleyDB lock region.
The command

    db_stat -h @dbhome@ -c

shows the current number of locks and lockers, and their maximum
values during the database history.
The maximum numbers of lock objects and lockers can be tuned by creating the
file F<DB_CONFIG> in the database home directory, F<@dbhome@>.
The following settings work fine with about 20 collector processes
and 5 HTTP daemon processes:

    set_lk_max_lockers 6000
    set_lk_max_locks 3000

It is also recommended to increase the cache size from the default 256 KB to
a larger amount, especially if the database has to hold large Torrus trees
(hundreds or thousands of monitored devices). The following line in
F<DB_CONFIG> sets the cache size to 16 MB:

    set_cachesize 0 16777216 1

After updating F<DB_CONFIG>, stop all Torrus processes,
including the HTTP server, then run

    db_recover -h @dbhome@

Then start the processes again.
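Putting the above together, a minimal sketch of the whole procedure could look
like this (the lock and cache values are the examples from above; adjust them
to your installation, and merge the lines into an existing F<DB_CONFIG> if you
already have one):

    # create DB_CONFIG in the database home directory
    printf 'set_lk_max_lockers 6000\nset_lk_max_locks 3000\nset_cachesize 0 16777216 1\n' \
        > @dbhome@/DB_CONFIG

    # with all Torrus processes and the HTTP server stopped:
    db_recover -h @dbhome@

    # after starting the processes again, verify the lock usage:
    db_stat -h @dbhome@ -c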
Further information is available at:

=over 4

=item * General access method configuration (BDB Reference)

http://tinyurl.com/ybymk7t

=item * DB_CONFIG configuration file (BDB Reference)

http://tinyurl.com/y9qjodv

=item * Configuring locking: sizing the system (BDB Reference)

http://tinyurl.com/ya6dtww

=item * C API reference

http://tinyurl.com/yczgnab

=back


=head3 XML compilation time

For large datasource trees, XML compilation may take tens of minutes,
if not hours. Other processes are not suspended during the compilation;
they keep using the previous configuration version.

For debugging and testing, it is recommended to create a new tree,
separate from the large production trees. That saves you a lot of time and
allows you to see the result of your changes quickly.


=head3 Collector schedule tuning

The Torrus collector has a very flexible scheduling mechanism. Each data
source has its own pair of scheduler parameters: I<period> and I<timeoffset>.
The period is usually set to the default of 300 seconds.
Time is divided into even intervals; for the default 5-minute period,
each hour's intervals start at 00, 05, 10, 15, etc. minutes.
The timeoffset determines the moment within each interval when the data source
should be collected. The default value for timeoffset is 10 seconds. This
means that the collector process tries to collect the values at
00:00:10, 00:05:10, ..., 23:55:10 every day.

Data sources with the same period and timeoffset values are grouped together.
The SNMP collector works asynchronously, and it tries to send as many SNMP
packets at the same time as possible. Due to this asynchronous architecture,
the collector is able to perform thousands of queries at the same time
with very little delay. Within the same collector process, a large number of
datasources configured with the same schedule is usually not a problem.

If you configure several datasource trees, all with the same period and
timeoffset values, each collector process will start flooding the network
with SNMP packets at the same time. This may lead to packet loss and
collector timeouts. In addition, all collector processes would try to update
the RRD files concurrently, which would degrade overall performance.
Therefore, it is better to assign different timeoffset values to different
trees. This may be achieved by manually specifying the C parameter in
discovery configuration files.

In large installations, the collector schedules need thorough planning and
tuning to ensure maximum performance and minimize the load on the network
devices' CPUs. The C utility is designed to help you in this planning.
It shows two types of reports: the configuration report gives you an idea
of how many datasources are queried at which moments in time, and the
runtime report gives you real-time statistics of the collector schedules,
including the average and maximum running cycle, and statistics on missed
or delayed cycles.

There is a feature that eases the load in large installations: dispersed
timeoffsets. When enabled, the timeoffset for each datasource is evenly
assigned to one of the allowed values, based on the name of the host and
the name of the interface. By default, these values are 0, 30, 60, ..., 270.
With thousands of datasources, this feature smooths the CPU and disk load
on the Torrus server, and avoids CPU usage peaks on network devices with a
large number of SNMP variables per device. It is recommended to analyse the
current scheduler statistics before using this feature. If you run several
large datasource trees, don't forget to plan and analyse the schedules for
the whole system, not just for one tree.
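To illustrate how a period and timeoffset pair translates into concrete
collection moments, here is a small standalone Perl sketch (it is not part of
Torrus itself; the values 300 and 10 are just the defaults mentioned above):

    perl -e '
        my ($period, $offset) = (300, 10);   # seconds
        # a collection cycle starts at every multiple of $period, plus $offset
        my $next = int(time() / $period) * $period + $offset;
        $next += $period if $next <= time();
        for (1 .. 3) {
            print scalar(localtime($next)), "\n";
            $next += $period;
        }
    '

Running the same calculation with a different offset for each tree shows how
their collection cycles stop coinciding.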
=head2 Distributed setup

=head3 NFS-based setup

The following setup allows you to distribute the load among several
physical servers.

Several Torrus (backend) servers run the collectors and store the RRD files
in local storage, which is shared via NFS. The frontend server runs the
web interface, and possibly some monitor processes, accessing the data
files over NFS.

It is possible to organize the directory structure so that each data file
is seen at the same path on every server. Then you can keep identical
Torrus configurations on all servers, and launch the collector process on
only one of them. XML configuration files may be shared via NFS too.

Be aware that the BerkeleyDB database home directory cannot be NFS-mounted.
See the following link for more details:
http://www.sleepycat.com/docs/ref/env/remote.html

Backend servers may run near the limits of their system capacity;
70-80% CPU usage should not be a problem. For the frontend machine,
it is preferable that at least 50% of the average CPU time is idle.


=head1 Authors

Copyright (c) 2004-2005 Stanislav Sinyagin E<lt>ssinyagin@yahoo.comE<gt>

Copyright (c) 2004 Christian Schnidrig E<lt>christian.schnidrig@bluewin.chE<gt>