#  Copyright (C) 2002  Stanislav Sinyagin
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#  along with this program; if not, write to the Free Software
#  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.

# $Id: wd.distributed.pod,v 1.1 2010-12-27 00:04:36 ivan Exp $
# Stanislav Sinyagin <ssinyagin@yahoo.com>
#
#

=head1 RRFW Working Draft: Distributed collector architecture

Status: pending implementation.
Date: May 26, 2004. Last revised: June 14, 2004

=head2 Introduction

In large installations, a single server often lacks the capacity
to collect data from all the data sources. In other cases,
because of network bandwidth or security restrictions, it is
preferable to collect (SNMP) data locally at the site and transfer
the updates to the central location less frequently.

=head2 Terminology

We call I<Hub> servers those that run the user web interfaces and,
optionally, threshold monitors. These are normally placed in the central
location or the NOC datacenter.

I<Spoke> servers are those running SNMP or other data collectors.
They periodically transfer the data to Hub servers. One Spoke
server may send copies of data to several Hub servers, and one
Hub server may receive data from many Spoke servers.

In general, the property of being a Hub or a Spoke is local to a pair
of servers and their datasource trees, and it only describes the functions
of data collection and transfer. In complex installations, the same
instance of RRFW may simultaneously function as a Hub for some remote
Spokes and as a Spoke for some other Hubs.

We call an I<Association> the set of attributes that describe a single
connection between Hub and Spoke servers. These attributes are:

=over 4

=item * Association ID

A symbolic name that is unique across all interconnected servers.

=item * Hub server ID, Spoke server ID

Names of the servers, usually hostnames.

=item * Transport type

One of SSH, RSH, HTTP, etc.

=item * Transport mode

PUSH or PULL.

=item * Transport parameters

Parameters needed for this transport connection, such as login name,
password, URL, etc.

=item * Compression type and level

Optional: gzip, bzip2, or another method, with a compression level
from 1 to 9.

=item * Tree name on Hub server

The target datasource tree that will receive data from Spokes.

=item * Subtree path on Hub server

Data updates from this Association will be placed in a subtree
under the specified path.

=item * Tree name on Spoke server

The tree in which a collector runs and stores data for this Association.

=item * Path translation rules

Datasource paths from the Spoke server may be rewritten so that they
look different in the Hub server's tree.

=back
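
The attribute list above maps naturally onto a record type. The following
Python sketch is illustrative only: the names C<Association>, C<TransportMode>,
C<hub_path> and C<path_rules> are hypothetical, and path translation is
assumed to be a simple prefix rewrite followed by placement under the Hub
subtree path. The draft does not specify the rule syntax.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class TransportMode(Enum):
    PUSH = "PUSH"  # the Spoke initiates the transfer
    PULL = "PULL"  # the Hub initiates the transfer


@dataclass
class Association:
    """One Hub/Spoke connection (hypothetical field names)."""
    association_id: str            # unique across all interconnected servers
    hub_server_id: str             # usually a hostname
    spoke_server_id: str
    transport_type: str            # e.g. "SSH", "RSH", "HTTP"
    transport_mode: TransportMode
    hub_tree: str                  # target tree on the Hub
    hub_subtree_path: str          # where this Association's data is placed
    spoke_tree: str                # tree whose collector feeds this Association
    transport_params: Dict[str, str] = field(default_factory=dict)
    compression: Optional[str] = None          # e.g. "gzip", "bzip2"
    compression_level: Optional[int] = None    # 1..9
    path_rules: Dict[str, str] = field(default_factory=dict)  # Spoke prefix -> Hub prefix

    def __post_init__(self) -> None:
        if self.compression_level is not None and not 1 <= self.compression_level <= 9:
            raise ValueError("compression level must be between 1 and 9")

    def hub_path(self, spoke_path: str) -> str:
        """Translate a Spoke datasource path into its location in the Hub tree."""
        # Assumed rule: the first matching prefix is rewritten, then the
        # result is placed under the Association's Hub subtree path.
        for spoke_prefix, hub_prefix in self.path_rules.items():
            if spoke_path.startswith(spoke_prefix):
                spoke_path = hub_prefix + spoke_path[len(spoke_prefix):]
                break
        return self.hub_subtree_path.rstrip("/") + spoke_path
```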

=head2 Transport

The modular architecture should allow different types of data
transfer. The default transport is Secure Shell version 2 (SSH). Other
possible transports include RSH, HTTP/HTTPS, and rsync.

Two transport modes should be implemented: PUSH and PULL.
In PUSH mode, Spoke servers initiate the data transfer and push the data to
Hub servers. In PULL mode, Hub servers initiate the data
transfer and ask Spokes for data updates. It should be possible
to mix transport modes for different Associations on the same
server, but each Association must use exactly one, fixed mode.
The choice of transport mode should be based on local security
policies and on server and network performance.
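
Because the mode is fixed per Association but may differ between
Associations, each server has to decide, Association by Association, whether
it opens the connection itself. A minimal Python sketch, assuming roles are
given as the strings C<"hub"> and C<"spoke">; the function names are
hypothetical:

```python
from enum import Enum
from typing import Iterable, List, Tuple


class Mode(Enum):
    PUSH = "PUSH"
    PULL = "PULL"


def initiates_transfer(local_role: str, mode: Mode) -> bool:
    """True if the server playing local_role ("hub" or "spoke") opens the connection."""
    if local_role not in ("hub", "spoke"):
        raise ValueError(f"unknown role: {local_role!r}")
    # PUSH: the Spoke pushes its messages to the Hub.
    # PULL: the Hub asks the Spoke for updates.
    return (mode is Mode.PUSH) == (local_role == "spoke")


def transfers_to_start(associations: Iterable[Tuple[str, str, Mode]]) -> List[str]:
    """From (association_id, local_role, mode) tuples describing one server,
    return the IDs of the Associations this server must actively start."""
    return [aid for aid, role, mode in associations if initiates_transfer(role, mode)]
```

A server that is a Hub for one Association and a Spoke for another simply
appears in the list twice, with different roles.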

Optionally, the compression method and level can be configured. Although
the SSH protocol supports its own compression, more aggressive compression
methods may be used to reduce bandwidth usage.
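
A sketch of the optional compression step using Python's standard C<gzip>
and C<bz2> modules, both of which accept a compression level from 1 to 9.
The function names, and the use of C<None> to mean "leave compression to
the transport", are assumptions made for this example.

```python
import bz2
import gzip
from typing import Optional


def compress_batch(payload: bytes, method: Optional[str], level: int = 6) -> bytes:
    """Compress a batch of messages before handing it to the transport."""
    if method is None:
        return payload  # rely on the transport (e.g. SSH) or send uncompressed
    if not 1 <= level <= 9:
        raise ValueError("compression level must be between 1 and 9")
    if method == "gzip":
        return gzip.compress(payload, compresslevel=level)
    if method == "bzip2":
        return bz2.compress(payload, compresslevel=level)
    raise ValueError(f"unsupported compression method: {method!r}")


def decompress_batch(blob: bytes, method: Optional[str]) -> bytes:
    """Inverse of compress_batch, run on the receiving side."""
    if method is None:
        return blob
    if method == "gzip":
        return gzip.decompress(blob)
    if method == "bzip2":
        return bz2.decompress(blob)
    raise ValueError(f"unsupported compression method: {method!r}")
```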

Transport agents should notify the operator of delivery failures.

=head2 Operation

For Spoke servers, distributed data transfer will be implemented as
an additional storage type. For Hub servers, it will be a new collector
type.

Each data transfer is a concatenation of I<messages>. A message
is of one of two types: I<CONFIG> or I<DATA>. The Spoke server generates
the messages and stores them until they are transferred. Messages are
delivered to Hub servers with some delay, but they are guaranteed to
arrive in sequential order. For each pair of servers, messages are
numbered consecutively. These numbers are used for failure detection.
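
On the Spoke side, the per-pair numbering could be kept with one counter
per Hub server. In the sketch below, the class C<SpokeOutbox> and its methods
are hypothetical, and messages are simply held in memory until the transport
takes them; a real storage backend would presumably persist them.

```python
from collections import defaultdict, deque
from itertools import count
from typing import List, Tuple


class SpokeOutbox:
    """Stores messages for transfer, numbering them consecutively per Hub server."""

    def __init__(self, first_id: int = 1) -> None:
        # Each Hub gets its own independent sequence of message numbers.
        self._counters = defaultdict(lambda: count(first_id))
        self._queues = defaultdict(deque)

    def enqueue(self, hub_id: str, msg_type: str, body: str) -> int:
        """Store a message for hub_id and return the number assigned to it."""
        msg_id = next(self._counters[hub_id])
        self._queues[hub_id].append((msg_id, msg_type, body))
        return msg_id

    def take_all(self, hub_id: str) -> List[Tuple[int, str, str]]:
        """Hand every stored message for hub_id to the transport, oldest first."""
        queue = self._queues[hub_id]
        batch = list(queue)
        queue.clear()
        return batch
```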

A Spoke server keeps track of its configuration, and after each
configuration change it sends a CONFIG message. This message contains
the mapping between Spoke server tokens and datasource paths,
and a limited set of parameters for displaying and monitoring the data.

After each collector cycle, the Spoke server sends DATA messages.
These messages contain the timestamp of the update and the token and
value of each updated datasource. The message format should be designed
to consume minimal bandwidth.
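
A sketch of how a Spoke might serialize the two message types, assuming the
header/body layout shown in the Message format section. Sending short tokens
such as C<T0005> in DATA messages, and full datasource paths only in CONFIG
messages, is presumably what keeps DATA messages small. The function names
are hypothetical.

```python
from typing import List, Mapping


def _frame(msg_id: int, msg_type: str, timestamp: int, body: List[str]) -> str:
    """Header lines, an empty line, the body, and a terminating single dot."""
    header = [f"MsgID:{msg_id}", f"Type:{msg_type}", f"Timestamp:{timestamp}"]
    return "\n".join(header + [""] + body + ["."]) + "\n"


def format_config_message(msg_id: int, timestamp: int,
                          leaves: List[Mapping[str, str]]) -> str:
    """One block of key:value lines per datasource leaf, separated by ';' lines."""
    body: List[str] = []
    for i, leaf in enumerate(leaves):
        if i:
            body.append(";")
        body.extend(f"{key}:{value}" for key, value in leaf.items())
    return _frame(msg_id, "CONFIG", timestamp, body)


def format_data_message(msg_id: int, timestamp: int, values: Mapping[str, int]) -> str:
    """A single body block of token:value lines; tokens stand in for full paths."""
    return _frame(msg_id, "DATA", timestamp, [f"{t}:{v}" for t, v in values.items()])
```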

The Hub server picks up the messages delivered by the transport agents.
Upon receiving a CONFIG message, it waits for a preconfigured delay in order
to collect as many CONFIG messages as possible. Then the data transfer agent
generates a new XML configuration based on the messages and starts
compiling the configuration. DATA messages are queued for the
collector, which picks them up and stores the values. All DATA messages
queued for the old configuration must be processed before
the compilation starts.
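
The ordering requirement on the Hub can be sketched as follows. This is a
hypothetical design, not the RRFW implementation: the settle delay is assumed
to start with the first CONFIG message, DATA that arrives after a pending
CONFIG is held back for the new configuration, and C<compile_config> and
C<store> are placeholder callbacks standing in for XML generation and
compilation, and for the collector.

```python
import time
from collections import deque
from typing import Callable, List


class HubReceiver:
    """Hypothetical Hub-side agent for one Association."""

    def __init__(self, settle_delay: float, compile_config: Callable[[List[dict]], None],
                 store: Callable[[dict], None], clock: Callable[[], float] = time.monotonic):
        self.settle_delay = settle_delay      # the "preconfigured delay", in seconds
        self.compile_config = compile_config  # placeholder: build XML and compile it
        self.store = store                    # placeholder: collector stores one DATA message
        self.clock = clock
        self.pending_configs: List[dict] = []
        self.config_deadline = None
        self.data_queue = deque()  # DATA for the configuration currently in effect
        self.held = deque()        # DATA that arrived after a pending CONFIG

    def receive(self, msg: dict) -> None:
        if msg["Type"] == "CONFIG":
            self.pending_configs.append(msg)
            if self.config_deadline is None:
                # Start the settle delay on the first CONFIG of a batch.
                self.config_deadline = self.clock() + self.settle_delay
        elif msg["Type"] == "DATA":
            # Messages arrive in order, so DATA after a CONFIG belongs to the new config.
            target = self.held if self.pending_configs else self.data_queue
            target.append(msg)
        else:
            raise ValueError(f"unknown message type: {msg['Type']!r}")

    def tick(self) -> None:
        """Called periodically: store queued DATA, then compile once the delay expires."""
        self._drain()  # everything queued for the old configuration goes first
        if self.config_deadline is not None and self.clock() >= self.config_deadline:
            self.compile_config(self.pending_configs)
            self.pending_configs = []
            self.config_deadline = None
            self.data_queue.extend(self.held)
            self.held.clear()
            self._drain()

    def _drain(self) -> None:
        while self.data_queue:
            self.store(self.data_queue.popleft())
```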

In case of a fatal failure and loss of data, the Hub server ignores all DATA
messages until it receives a new CONFIG message. For this reason, a periodic
configuration update schedule should be defined: if no configuration changes
occur within a certain period of time, the Spoke server re-sends the CONFIG
message with the same timestamp.
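
The recovery rule can be expressed as a small per-Association state machine.
The sketch assumes that a fatal failure is detected as a gap in the
consecutive message numbers, and that a Hub also ignores DATA before it has
seen its first CONFIG; both are interpretations rather than statements from
this draft. The class name C<AssociationState> is hypothetical.

```python
from typing import Optional


class AssociationState:
    """Hypothetical Hub-side state for one Association."""

    def __init__(self) -> None:
        self.expected_id: Optional[int] = None  # next MsgID; None before the first message
        self.synchronized = False               # True while the token mapping is trusted

    def accept(self, msg_id: int, msg_type: str) -> bool:
        """Return True if the message should be processed, False if it is ignored."""
        if self.expected_id is not None and msg_id != self.expected_id:
            # A gap in the consecutive numbering means messages were lost.
            self.synchronized = False
        self.expected_id = msg_id + 1
        if msg_type == "CONFIG":
            # A fresh (or periodically re-sent) CONFIG restores synchronization.
            self.synchronized = True
            return True
        return self.synchronized
```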

=head2 Message format

A message is text in an email-like format: it starts with a header, followed
by an empty line and the body. A line containing a single dot (.) marks
the end of the message. Blocks within a CONFIG message are separated by a
line containing a semicolon (;), each block representing a single
datasource leaf.

Example:

 MsgID:100001
 Type:CONFIG
 Timestamp:1085528682

 level2-token:T0005
 level2-path:/Routers/RTR1/Interface_Counters/Ethernet0/InOctets
 vertical-label:bps
 ....
 ;
 level2-token:T0006
 level2-path:/Routers/RTR1/Interface_Counters/Ethernet0/OutOctets
 vertical-label:bps
 .
 MsgID:100002
 Type:DATA
 Timestamp:1085528690

 T0005:12345678
 T0006:987654321
 .
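
A minimal parser for the format above, written in Python as an illustration.
It assumes that header and body lines are C<key:value> pairs split at the
first colon, and it returns each message as a dictionary with a C<header>
and a list of C<blocks> (a DATA message has a single block). The function
name and return shape are hypothetical.

```python
from typing import Dict, List, Tuple

Block = List[Tuple[str, str]]


def parse_messages(stream: str) -> List[Dict[str, object]]:
    """Split a transfer into messages of the form {"header": {...}, "blocks": [...]}."""
    messages: List[Dict[str, object]] = []
    header: Dict[str, str] = {}
    blocks: List[Block] = []
    current: Block = []
    in_body = False
    for raw in stream.splitlines():
        line = raw.strip()
        if line == ".":                    # end of message
            if current:
                blocks.append(current)
            messages.append({"header": header, "blocks": blocks})
            header, blocks, current, in_body = {}, [], [], False
        elif not in_body:
            if line == "":                 # empty line separates header and body
                in_body = True
            else:
                key, _, value = line.partition(":")
                header[key] = value
        elif line == ";":                  # block separator in CONFIG messages
            blocks.append(current)
            current = []
        elif line:
            key, _, value = line.partition(":")
            current.append((key, value))
    return messages
```

The C<....> line in the example appears to stand for omitted attributes
rather than being part of the format; this parser would treat it as an
ordinary body line with an empty value.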

=head1 Author

Copyright (c) 2004 Stanislav Sinyagin E<lt>ssinyagin@yahoo.comE<gt>