#  userguide.pod - Torrus user guide
#  Copyright (C) 2003  Stanislav Sinyagin
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

# $Id: userguide.pod.in,v 1.1 2010-12-27 00:04:32 ivan Exp $
# Stanislav Sinyagin
#
#

=head1 Torrus User Guide

=head2 Quick start guide

The steps below explain how to get Torrus running.

B. Follow the I document; all prerequisites and necessary steps are
described there.

B. The executables reside in F<@pkgbindir@/>. You normally don't need to
access this directory, because the command-line wrapper, C, is installed in
a usual execution path (F<@bindir@>).

All site-specific behaviour is controlled by configuration files in
F<@siteconfdir@/>. Usually you need to change F only. In this file, you
must list your XML configuration sources.

The configuration of the datasource trees is read from XML files. They are
searched for in several directories, normally F<@distxmldir@/> and
F<@sitexmldir@/>. The first one contains the files that come with the Torrus
distribution, and the second one is for your local site-specific XML files.
Global site-specific XML configuration parameters may be defined in F.

The XML configuration is compiled into an internal database representation
by the C command. The database itself resides in F<@dbhome@/>, and must be
writable by your Apache server (normally the installer takes care of it).
It is safe to re-compile the configuration while the Torrus daemons are running. B. Torrus configuration consists of a number of I. Each tree is independent from the others. A tree may run one Collector and one Monitor process. Also the web interface access control lists differentiate the user rights by datasource trees. B. A tree defines the hierarchy of Torrus datasources. The structure of the tree is solely defined by XML configuration files. The tree consists of I, each being either a I or a I. Subtrees contain child subtrees and/or leaves. The leaf represents a datasource: normally this is a numerical value that changes over time. The leaf is the entity that may be presented as a graph. There are leaves of special type: I. They are not numerical values, and are designed for drawing several values in one graph. Each node has I, a string that consists of slashes and node names, and uniquely identifies this node. The path of a subtree always ends with slash, and the root of the tree has the path consisting of a single slash. B. The trees are defined in F. See I for a basic example of tree configuration. B. Currently only one type of data storage is supported: Round-robin database (RRD) files. See I manuals for more details. Each leaf represents a datasource stored in an RRD file. Of course, several leaves may refer to different datasources within the same RRD file. Even more, more than one leaf may refer to the same datasource within an RRD file. RRD files are created and updated either by C, or by some other external programs. B. If you only want to collect SNMP counters from some network devices' interfaces, there's a couple of tools called C and C. 
The first one creates a basic discovery instructions file, and the second
one uses the discovery instructions to explore the SNMP device capabilities
and information: interface names, input/output counters, CPU and memory
usage, temperature sensors (for Cisco devices), and many other
vendor-specific statistics sources.

Torrus is much more than just an SNMP collector. So, when you decide to use
it in a more advanced way, you will need to read the whole of this guide,
as well as I and probably some other documents.

B. By default, C will put all your devices into one hierarchy:
C<E<lt>hostnameE<gt>/...>. The subtree name, C, may be changed with a
command line option of C. This program may also read the device names (or
IP addresses if you don't use DNS) from space-delimited text files.

  torrus genddx \
    --hostfile=myrouters.txt \
    --domain=example.net \
    --community=MySecretSNMPCommunity \
    --out=myrouters.ddx \
    --discout=myrouters.xml \
    --subtree=/My_Routers \
    --datadir=/data1/torrus/collector_rrd

  torrus genddx \
    --hostfile=myswitches.txt \
    --domain=example.net \
    --community=MySecretSNMPCommunity \
    --out=myswitches.ddx \
    --discout=myswitches.xml \
    --subtree=/My_Switches \
    --datadir=/data1/torrus/collector_rrd

  torrus devdiscover --in=myrouters.ddx
  torrus devdiscover --in=myswitches.ddx

In the example above, the routers' and switches' names are read from F and
F in the user's current directory. They form a hierarchy with two subtrees:
C and C. C writes the discovery instruction XML files to F and F
respectively. By default, you would find them in F<@sitedir@/discovery/>.
The result of C is the Torrus configuration files: F and F, placed into
F<@sitexmldir@/>. The C will place the RRD files into F. Make sure that this
directory exists, has enough free space, and is writable by C user.

B the C utility is designed as a one-time helper, so that you can create
your basic discovery instructions files from scratch. Further on, the
discovery files should be maintained separately.
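
If you have more than two device groups, the repeated C<torrus genddx>
invocations can be produced by a small wrapper. The following is only a
sketch, not a part of Torrus: the function name C<genddx_command> and the
three variables are hypothetical, it assumes that each group keeps its host
list in F<GROUP.txt>, and it uses only the options shown in the example
above. It prints the commands instead of running them, so that they can be
reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torrus): print one "torrus genddx"
# command per device group. Only options that appear in the example
# above are used; all values are assumptions to be replaced.

DOMAIN="example.net"                    # assumed DNS domain
COMMUNITY="MySecretSNMPCommunity"       # assumed SNMP community
DATADIR="/data1/torrus/collector_rrd"   # assumed RRD directory

# genddx_command GROUP SUBTREE
# GROUP is the base name of the host list (GROUP.txt) and of the
# generated files; SUBTREE is the subtree path, e.g. /My_Routers.
genddx_command() {
    group=$1
    subtree=$2
    printf 'torrus genddx --hostfile=%s.txt --domain=%s --community=%s --out=%s.ddx --discout=%s.xml --subtree=%s --datadir=%s\n' \
        "$group" "$DOMAIN" "$COMMUNITY" "$group" "$group" "$subtree" "$DATADIR"
}

# Print the two commands from the example above
genddx_command myrouters /My_Routers
genddx_command myswitches /My_Switches
```

Once the printed commands look right, the output can be piped to C<sh> for
execution.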
Another useful utility is called C. It can be used to generate a DDX file
from a template and a list of SNMP hosts. It is very useful if you want to
monitor many devices of similar type or function.

You can also define a I file in your DDX file. C will create it after all
devices have been discovered, and it will contain E<lt>includeE<gt>
statements for all XML files. This makes it practical to use one XML file
per SNMP host, and use the bundle file for inclusion in the tree
configuration.

B. For each tree, F<@siteconfdir@/torrus-siteconfig.pl> lists the XML files
that have to be compiled for it. In the example above, you would add F and
F to the C array in the tree configuration.

See I for more details on how C and C interact and how you can customize
the discovery process.

B: in most cases, your hierarchy division will be different. It might be
arranged by geographical locations, or by customer names. There is a
configuration statement that allows you to include other XML files into the
configuration, thus giving you great flexibility in building the data
hierarchies.

B. After the XML configuration is prepared, you need to execute the
compiler:

  torrus compile --tree=treename --verbose

For most of the processes that you run within Torrus, you need to specify
the tree name with the C<--tree> option. Some programs accept the C<--all>
option, which causes them to process all existing trees. With the
C<--verbose> option, the compiler tells you about the files being processed,
and about some other actions that may take quite a long time. It will also
tell you if there is any error in your configuration.

B. The search database is updated by executing the following command:

  torrus bs --global --verbose

For users that are allowed to display all the trees, you can enable the
global search across all trees:

  torrus acledit --addgroup=staff --permit=GlobalSearch --for='*'

B.
Assuming that compilation went smoothly, you may now launch the data
collector:

  torrus collector --tree=treename

Without additional options, the collector will fork as a daemon process, and
write only error messages to its log file,
F<@logdir@/collector.treename.log>.

There is a file that is created by C<./configure>, called F. You may place
it into a directory where your system looks for startup scripts (F on
Solaris and some Linuxes, F on FreeBSD). You will probably need to rename
and edit the script before using it. Note that it also executes another
daemon, C.

The C daemon is used for monitoring the thresholds in the data files. For
more details, see the I, in the section about monitor definitions.

B. By default, user authentication is enabled in the web interface. You can
change this by setting C<$Torrus::CGI::authorizeUsers = 0> in your F. To
make use of user authentication, you need to create I and I accounts. Each
user belongs to one or more groups, and each group has access to a set of
datasource trees. See I for a basic example.

B. Provided that you followed the installation guide to the end, and your
HTTP server is running, your Torrus hierarchy should be visible in your
favorite web browser.

=head2 Configuration guidelines

The XML configuration is described in complete detail in I. The guidelines
below will help you to read that document.

B. The tree structure is defined by the structure of
C<E<lt>subtreeE<gt>> and C<E<lt>leafE<gt>> XML elements. The rule is simple:
child XML elements of a C<E<lt>subtreeE<gt>> element define the child nodes
in the configuration tree.

B. Each node has a number of parameters. They are defined by the
C<E<lt>paramE<gt>> XML element. Parameters are inherited: the child node has
all its parent's parameters, some of which may be overridden.

B. The whole XML configuration is additive. This means that you may define
your subtree several times across your XML configuration, and the new
parameters and child nodes will be added to the previously defined ones.

B.
Some pieces of configuration may be written as templates, and then re-used
in multiple places.

The C utility generates one large XML file back from the compiled
configuration. Its main purpose is backup of the configuration, but it can
also be used for studying the relationships between templates and input
files.

=head2 Handling SNMP errors

During the SNMP discovery process, some SNMP devices may not be reachable.
By default, C reports the error, and does not write the output XML file
containing that device. It also skips writing the bundle files that contain
the affected output file.

When C is executed with the C<--forcebundle> option, the bundle files are
written, and the output files related to the unreachable devices are omitted
from the bundles. This ensures that we always get a configuration that
compiles and lets the collector run.

Another option, C<--fallback=DAYS>, if given together with
C<--forcebundle>, tells the discovery engine to reuse old XML files if the
related SNMP devices are not reachable and the files are not older than
DAYS.

If an SNMP device is unreachable at the time of the collector
initialization, the collector reports the error and waits for a period of
time specified in C<$Torrus::Collector::SNMP::unreachableRetryDelay>, which
is 10 minutes by default. It then keeps trying to reach the device at this
retry interval for a period of time defined in
C<$Torrus::Collector::SNMP::unreachableTimeout>, by default 6 hours. If the
device does not become available within the specified timeout, it is
excluded from collection. It is tried again on collector initialization only
(at the collector process start or after recompiling the configuration).

If a device is not reachable during the normal collector running cycle, it
is retried in every collector cycle (usually every 5 minutes), during the
period defined in C<$Torrus::Collector::SNMP::unreachableTimeout>. It is
then excluded from configuration after the timeout.
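
As a rough illustration of these timers: assuming that the attempts are
evenly spaced, the number of attempts before a device is dropped is about
the timeout divided by the retry interval. The sketch below is plain
arithmetic on the values quoted above, not the collector's actual logic, and
the function name C<retry_attempts> is hypothetical.

```shell
#!/bin/sh
# Illustration only: estimate how many reachability attempts fit into
# the timeout window, assuming evenly spaced attempts. This is not
# taken from the collector's code.

# retry_attempts RETRY_INTERVAL_SECONDS TIMEOUT_SECONDS
retry_attempts() {
    interval=$1
    timeout=$2
    if [ "$interval" -le 0 ]; then
        echo "retry interval must be positive" >&2
        return 1
    fi
    # Integer division: attempts that fit completely into the window
    echo $(( timeout / interval ))
}

# Defaults quoted above: 10 minutes between retries, 6 hours in total
retry_attempts 600 21600
```

With the defaults this gives about 36 attempts after collector
initialization; with a 5-minute collector cycle, the same 6-hour timeout
corresponds to about 72 cycles.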
If a device's hardware configuration changes after C has been executed, the
collector may not find some values in SNMP tables, such as interface names
in ifTable. It then excludes such datasources from collection immediately.

=head2 Tips and tricks

=head3 Comments, descriptions, and legends

C will extract some useful information from your SNMP devices, and place it
in the XML configuration:

=over 4

=item * Interface descriptions

The value of the SNMP variable C (C<1.3.6.1.2.1.31.1.1.1.18>) will be used
as the interface comment. In Cisco IOS, this is controlled by the C
interface configuration command.

=item * Location and contact

Two other SNMP values, C (C<1.3.6.1.2.1.1.6.0>) and C
(C<1.3.6.1.2.1.1.4.0>), will be used in the legend text for each device. In
Cisco IOS, their values are controlled by the C and C global configuration
commands.

=back

=head3 Grouping the datasources alternatively

In most cases, you would want to have several different groupings of your
datasources. For instance, the default C gives only one level of freedom:
the subtree name above the host level. It's reasonable to use this name for
grouping by geographical location. Thus, the hierarchy would be
characterised as C.

Let's say you would like to have an alternative grouping, such as:

=over 4

=item * by customer connection:

Each customer is identified by name, and you'd like to see statistics for
all interfaces connected to a given customer;

=item * by service:

Your network is designed to provide various services, and you'd like to
group your devices or interfaces by service;

=item * by customer and location:

For each customer, group the connections by geographical location.

=back

Torrus provides three different ways of organising your datasources:

=over 4

=item * Aliases

With the C<E<lt>aliasE<gt>> statement, you can add symbolic names to your
nodes. If the new alias is defined as a reference to a non-existing subtree,
the new subtrees are created.
An alias is only a symbolic link: when you click on the alias name in your
browser, Torrus redirects it to the real datasource in its normal subtree.
See the example in I.

=item * ds-type=rrd-file

You can create a leaf in some arbitrary place of your hierarchy that points
to an existing RRD file. This RRD file may be updated by another datasource
in your hierarchy. The advantage of this approach is that this leaf may have
its own I and I parameters, alternative view parameters, etc.

  Switch name: rtr01; Interface: Fa0/1;

In the example above, this leaf is defined somewhere in the hierarchy. It
refers to the RRD file updated by the Torrus SNMP collector. For more
examples, see the template I in F.

=item * Tokensets

A tokenset is an arbitrary collection of datasource leaves. It is
characterised by its name and description. There are two ways to add a leaf
to a tokenset: by the parameter I, or by defining a monitor action. A
tokenset is normally displayed in compact form: by default, 6-hour graphs
are placed two in a row.

=back

=head3 Amending autogenerated XML files with XUpdate

Sometimes there is a need to modify the configuration generated by C.
Modifying the generated XML files by hand would not be a good option: it
would need some manual work every time you update your hardware setup.

A better approach would be to have tools that automate such configuration
updates.

One of the possibilities for such automation would be XSLT
E<lt>http://www.w3.org/TR/xsltE<gt>. But using XSLT for slight changes in
XML files is a rather complicated task.

A good approach was proposed by the XUpdate Working Group
E<lt>http://www.xmldb.org/xupdate/E<gt>. Their Working Draft document
describes a language for XML editing commands. It allows small updates to an
existing XML document, such as inserting, updating, or deleting elements.
The only drawback is that the specification hasn't been updated since
September 2000, and it contains some unclear statements, which make it
difficult to implement compatible applications. In addition, there has not
been enough effort to adopt XUpdate as a W3C standard. However, this is the
only kind-of-a-standard language for such tasks as XML editing commands.

Thanks to Petr Pajas, there is an XUpdate implementation in Perl. The C
module is available at CPAN, and it installs a small command-line utility,
C.

In addition, Petr has created a set of utilities integrated into a single
shell wrapper: E<lt>http://xsh.sourceforge.netE<gt>. It is very useful for
many different things, such as testing XPath expressions.

A typical XUpdate instructions file would look as follows:

  This file was modified with XUpdate script setmonitor.xupdate.xml
  ifErrors

This example is part of the Torrus distribution, and the file is named F.
Your commands to apply these XUpdate instructions would look like this:

  torrus devdiscover --in=routers.ddx --out=routers.xml
  cd @sitexmldir@
  xupdate -j @exmpdir@/setmonitor.xupdate.xml \
    routers.xml > routers1.xml

More XUpdate examples will be included in the future.

=head3 Extracting the configuration skeleton

Another approach to amending the autogenerated configuration is as follows.

The Torrus distribution has a special-purpose XSLT template, F, designed to
strip all parameters and template applications from a given XML
configuration, and leave the tree structure only.

Given that F is some autogenerated configuration, you may run

  xsltproc @scriptsdir@/xml/extract-skeleton.xsl routers.xml | \
    xmllint --format --output routers-skeleton.xml -

You can add your changes to the new file, F, and add it to your Torrus
configuration. These changes may be performed manually or by means of the
XUpdate technique described above.

=head3 Automating XML generation

It is quite common that you want Torrus to monitor a certain set of devices
which C does not (yet) support.
Of course, it's quite a pain to maintain a manually written XML file,
especially if there is more than one device of the same type. In such a
case you may benefit from the approach suggested by Christian Schnidrig:

Imagine you have 50 I which are able to speak SNMP and which you would like
to put into some Torrus tree structure. A good designer's approach would be
to keep the data and the presentation separate. In addition, changing the
presentation once would produce 50 changes accordingly.

To do that, let's create two files: F and F. The first one would contain
data about our devices:

  [% gizmos = [
      {
        name => 'atwork'
        color => 'blue',
        location => 'Javastrasse 2, 8604 Hegnau'
        description => 'My gizmo @ Sun'
        community => 'blabla',
        hands => [
              {name => 'Left'}
              {name => 'Right'}
        ],
      }
      {
        name => 'athome'
        color => 'gray',
        location => 'Riedstrasse 120, 8604 Hegnau'
        description => 'My gizmo @ Home'
        community => 'blabla',
        hands => [
              {name => 'Upper'}
              {name => 'Lower'}
        ],
      }
    ]
  %]

Then F would contain the XML template that would produce the Torrus
configuration file:

  [% PROCESS $data %]
  [% FOREACH g = gizmos %]
  /ByName/[% g.name %]/
  Description: [% g.description %]
  Location: [% g.location %]
  [% FOREACH h=$g.hands %]
  [% END %]

See F and F for a more useful example of the described approach.

At the end, you will generate the Torrus configuration with the C utility,
which is a standard part of the Template-Toolkit package:

  tpage --define data=gizmos.data gizmos.tmpl > gizmos.xml

=head3 Several Torrus instances on one server

Sometimes it is necessary to have a separate instance of Torrus for testing
purposes on the same server as the production installation.

In the example below, a completely autonomous installation of Torrus is
installed in the F directory on a FreeBSD system.

=over 4

=item * Directory structure

All files are located in subdirectories of F. No other directories are
affected. This ensures that deinstallation would be easy and safe.
The following subdirectories are created:

=over 8

=item * F

This directory contains Apache HTTP daemon configuration and logs. Create 3
subdirectories here: F, F, and F.

=item * F

This is the installation directory of Torrus.

=item * F

Directory for configuration files.

=item * F

Directory for logs, database and PID files.

=item * F

Collector will store RRD files here.

=item * F

Distribution files will be stored and unpacked here.

=back

=item * Installation procedure

  cd /usr/testtorrus/src
  gzip -dc torrus-1.0.0.tar.gz | tar xvf -
  cd torrus-1.0.0
  ./configure pkghome=/usr/testtorrus/home \
    sitedir=/usr/testtorrus/etc \
    logdir=/usr/testtorrus/var/log \
    piddir=/usr/testtorrus/var/run \
    varprefix=/usr/testtorrus/var \
    wrapperdir=/usr/testtorrus
  make install

=item * Devdiscover configuration

Use devdiscover as usual. Place your discovery instruction files in F, and
make sure that C is set to F.

=item * Apache configuration

We reuse the same binaries and libraries as the main installation of Apache,
but the daemon is launched with our special configuration. We assume that
Apache is pre-configured for mod_perl. SSL support is not included in this
example, but it's quite straightforward to implement if you need it.

Create a copy of F and place it in F. With a text editor, replace the
configuration options with the values given below:

  # Leave server root as it was in the original config. Apache uses
  # it for modules loading
  ServerRoot "/usr/local"

  # make sure that everything that apache writes
  # goes into our directories
  PidFile /usr/testtorrus/apache/var/httpd.pid
  ScoreBoardFile /usr/testtorrus/apache/var/httpd.scoreboard

  # Optional: limit the memory and CPU impact
  MinSpareServers 2
  MaxSpareServers 5
  StartServers 3
  MaxClients 10

  # We open our HTTP service on TCP port 8123.
  # Choose another port if this one is occupied
  Port 8123

  # Not really necessary, but you might want to use it someday
  DocumentRoot "/usr/testtorrus/apache/htdocs"

  # Find the Directory options for the old htdocs, and
  # replace the path if you changed DocumentRoot above
  ... some default stuff here ...

  # Make sure the logs are written where we expect them to.
  ErrorLog /usr/testtorrus/apache/var/httpd-error.log
  CustomLog /usr/testtorrus/apache/var/httpd-access.log combined

  # TCP port number as above
  NameVirtualHost *:8123

  # Quite standard virtual server configuration. Replace fake
  # domain names with your real ones.
  ServerAdmin root@myserver.com
  DocumentRoot /usr/testtorrus/home/web
  ServerName torrus.myserver.com
  CustomLog /usr/testtorrus/apache/var/torrus.myserver.com.log "combined"
  PerlModule Apache::PerlRun
  PerlRequire "/usr/testtorrus/home/conf_defaults/webmux.pl"
  Alias /plain/ "/usr/testtorrus/home/sup/webplain"
  SetHandler perl-script
  PerlHandler Torrus::ApacheHandler
  SetHandler default-handler
  Options None

=item * Apache startup script

Save the following script as F:

  #!/bin/sh

  case "$1" in
  start)
      /usr/local/sbin/httpd -f /usr/testtorrus/apache/etc/httpd.conf && \
          echo 'apache started'
      ;;
  stop)
      [ -r /usr/testtorrus/apache/var/httpd.pid ] && \
          kill `cat /usr/testtorrus/apache/var/httpd.pid` && \
          echo 'apache stopped'
      ;;
  *)
      echo "Usage: `basename $0` {start|stop}" >&2
      ;;
  esac

  exit 0

=back

=head3 Changing the default short graph

The default small graph in overviews and tokenset listings shows the last 6
hours of data. It might be more convenient for you to graph the last 24
hours, or even longer. To do so, you only need to change one parameter, C.
You may change it at the top of the datasource tree, or only for some parts
of the tree.

In F, there's a view definition called C. It is exactly the same size as the
6-hour C view, but it shows a 24-hour graph.
Somewhere in the Torrus configuration, you may have:

  last24h-small,last24h,lastweek,lastmonth,lastyear

The best place for this would be F.

=head3 Watching the collector failures

There is a script in the Torrus distribution, in F, which provides a simple
way of telling whether the collector is running properly: it checks the
modification time of RRD files, and if any file is older than a given
threshold, it sends an e-mail warning.

Copy the script file to some place in your system and edit it so that it
fits your requirements: you might want to change the maximum age parameter
(default is 1 hour), the notification e-mail address, and the directory
paths where to look for RRD files. Then I it so that it's executable, and
add it to I. Depending on your operational requirements, it might run every
hour, a few times a day, or even during business hours only.

The script writes the number of aged files in the e-mail subject, and lists
the file names in the body. In a relatively large installation, you might
want to amend the script in order to avoid overly large e-mail messages.

=head3 Viewing external RRD files

Some external program may create its own RRD files, and you may want to
display and monitor them in Torrus. Also, some collector-generated RRDs may
become outdated: for example, after a module is removed from a router, the
interface counters are no longer updated.

The easiest way to use such files is the C command. It generates the XML
configuration file that represents all RRD files found in a given directory.
It can also scan the directory recursively.

See also a few examples in the Torrus distribution. There are some templates
for use with Smokeping, OpenNMS, and Flowscan.

=head2 Torrus usage scenarios

=head3 Scenario 1. Netflow Traffic Analyser

Cisco routers are capable of exporting the traffic statistics data in I UDP
packets. A I or I daemon collects Netflow packets into flow files.
I software analyses the flow files and stores the statistics into numerous
RRD files. Torrus is used to monitor the thresholds and display the graphs
in a convenient form.

=head3 Scenario 2. Backbone Traffic Statistics

I or I software is used to provide the list of all devices in the network.
Torrus's C builds the XML configuration to monitor the router interfaces,
CPU and memory usage, and temperature sensors. Data importing scripts
generate configuration for alternative grouping of the datasources: by
location, by customer connection, by device type, by service type, etc.

=head2 Troubleshooting guidelines

=head3 SNMP Error: Received tooBig(1)

For some devices, the collector may issue the following error messages:

  [27-May-2004 10:15:17*] SNMP Error for XX.XX.XX.XX:161:public:
  Received tooBig(1) error-status at error-index 0

For better performance, the SNMP collector sends several SNMP requests in
one UDP datagram. The SNMP agent then tries to send the reply to all
requests in a single datagram, and this error indicates that it failed to do
so. In most cases, this is caused by limitations or bugs in the agent
software. The number of requests per datagram is controlled by the parameter
C, and it may be set in the discovery input XML or in the Torrus
configuration XML. The default value is 40, and setting it to 10 generally
works.

=head3 Database lock troubleshooting

It may sometimes happen that a process accessing the Torrus database
terminates incorrectly, and the database becomes blocked. A typical symptom
of this is that the command C does not print anything and stays running
forever, occupying zero percent of CPU.

The nice and preferred way to solve the problem is to use the C utility from
the BerkeleyDB package. The brutal way is just to remove the databases and
re-compile all the configuration.

I The ACL database is not automatically backed up, and you need to take care
of its backup before deleting the contents of the database.
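
One possible way to keep such a copy is to archive the whole database
directory before its contents are removed. The following is only a sketch,
not a Torrus tool: the function name C<backup_dir> and the paths in the
usage comment are hypothetical, and it assumes that a plain file-level copy
taken while the Torrus daemons and the web server are stopped is an
acceptable backup for your installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torrus): archive a directory with
# tar before its contents are deleted.

# backup_dir SOURCE_DIR ARCHIVE_FILE
backup_dir() {
    src=$1
    archive=$2
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    # -C makes the paths inside the archive relative to SOURCE_DIR
    tar -cf "$archive" -C "$src" . || return 1
    # Listing the archive checks that it can be read back
    tar -tf "$archive" >/dev/null
}

# Example call, with assumed paths:
#   backup_dir /path/to/dbhome /var/backups/torrus-db.tar
```

The archive file should be placed outside the directory that is being
archived, so that it is neither included in itself nor removed together with
the databases.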
  ## The nice way uses BerkeleyDB db_recover
  ## (might be located in /usr/local/BerkeleyDB.4.1/bin/)
  /etc/init.d/apache stop
  /etc/init.d/torrus stop
  db_recover -h @dbhome@
  torrus compilexml --verbose --all
  /etc/init.d/torrus start
  /etc/init.d/apache start

  ## The brutal way
  /etc/init.d/apache stop
  /etc/init.d/torrus stop
  cd @dbhome@
  rm -r *
  torrus compilexml --verbose
  /etc/init.d/torrus start
  /etc/init.d/apache start

=head1 Author

Copyright (c) 2002-2007 Stanislav Sinyagin E<lt>ssinyagin@yahoo.comE<gt>