15 Distributed Monitoring

Passive service and host checks can be used to create a scenario in which several noncentral Nagios instances send their results to a central server. In general they transfer their results using the Nagios Service Check Acceptor (see Chapter 14); the central Nagios instance receives them through the External Command File interface and continues processing them as passive checks (see Chapter 13).

What is still missing is the mechanism that prepares each test result of a noncentral Nagios instance for sending with NSCA. For this purpose Nagios provides the “obsessive” commands OCSP (“Obsessive Compulsive Service Processor”) and OCHP (“Obsessive Compulsive Host Processor”), two commands designed specifically for distributed monitoring. In contrast to event handlers (see Appendix B from page 409), which react to changes in status and therefore pass on check results only when the status has changed, these two commands obsessively pass on every test result (Figure 15.1).


Figure 15.1: Distributed monitoring with Nagios

15.1 Switching On the OCSP/OCHP Mechanism

In order to use OCSP/OCHP, several steps are necessary. The mechanism is initially switched on (only) on the noncentral Nagios servers in the global configuration file /etc/nagios/nagios.cfg, where a global command for hosts (OCHP) and services (OCSP) is defined. This causes the noncentral Nagios instance to send every result to the central server.

In the service and host definitions you can additionally set whether the corresponding service or host should use the mechanism or not. Finally, for the central Nagios server to be able to use the transferred results, each of these services and hosts must be defined once again on the central server.

You should switch on the two parameters obsess_over_services and obsess_over_hosts in nagios.cfg only if you really do want distributed monitoring:

# /etc/nagios/nagios.cfg

...

obsess_over_services=1
ocsp_command=submit_service_check
ocsp_timeout=5
obsess_over_hosts=1
ochp_command=submit_host_check
ochp_timeout=5
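To see at a glance which of these settings are active on a given instance, a small helper along the following lines may be useful. This is only a sketch: the function name show_obsess_settings is hypothetical, and the default path is the one used in this chapter.

```shell
#!/bin/bash
# Sketch: list the OCSP/OCHP-related settings that are active in a main
# configuration file. show_obsess_settings is a hypothetical helper name.

show_obsess_settings() {
    local cfg="${1:-/etc/nagios/nagios.cfg}"
    # Match only uncommented lines for the six parameters shown above.
    grep -E '^[[:space:]]*(obsess_over_(services|hosts)|oc[sh]p_(command|timeout))=' "$cfg"
}
```

Called without an argument, the function reads /etc/nagios/nagios.cfg; a different file can be passed as the first argument. Lines that are commented out are not listed.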


Every time a new test result arrives on the noncentral Nagios server, Nagios calls the command object defined with ocsp_command or ochp_command. This places an additional load on resources.

The two timeouts prevent Nagios from spending too much time on one command. Without them, if processing did not terminate (because the command itself has no timeout of its own and the central Nagios server does not respond), the process table of the noncentral Nagios instance would fill up very quickly and might overflow.
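In addition to the Nagios-side timeouts, the forwarding command can be given a time limit of its own. The following sketch uses the timeout utility from GNU coreutils; the helper name run_bounded is hypothetical, and choosing a limit slightly below ocsp_timeout is an assumption made here so that the command normally ends by itself before Nagios has to intervene.

```shell
#!/bin/bash
# Sketch: bound the run time of a forwarding command with coreutils timeout.
# run_bounded is a hypothetical helper name.

run_bounded() {
    local limit="$1"
    shift
    # timeout exits with status 124 if it had to stop the command.
    timeout "$limit" "$@"
}
```

Inside a submit script, the call could then look like this: printf ... | run_bounded 4 /usr/local/bin/send_nsca -H nagios -c /etc/nagios/send_nsca.cfg. Standard input is passed through unchanged to the wrapped command.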

If you want to selectively exclude test results for specific services and hosts from transmission to the central Nagios server, the following parameters are used:

define host{

...

obsess_over_host 0

...

}

define service{

...

obsess_over_service 0

...

}

With a value of 1, the local Nagios instance sends the results of the host or service check to the central server; with a value of 0, it does not. 1 is the default for both obsess_over_host and obsess_over_service, so the parameters need to be specified explicitly only where results are not to be transferred. Doing so is recommended whenever the central location is responsible only for particular things and the remaining administration is carried out on site.

15.2 Defining OCSP/OCHP Commands

Defining the two commands with which the noncentral instances send their results to the Nagios main server in most cases involves scripts that are based on send_nsca (see also the example on page 254). For services, such a script would look like the following one, in this case called submit_service_check:

#!/bin/bash

# Script submit_service_check

PRINTF="/usr/bin/printf"

CMD="/usr/local/bin/send_nsca"

CFG="/etc/nagios/send_nsca.cfg"

HOST=$1

SRV=$2


RESULT=$3

OUTPUT=$4

$PRINTF "%b" "$HOST\t$SRV\t$RESULT\t$OUTPUT\n" | "$CMD" -H nagios -c "$CFG"

When run, the script expects four parameters on the command line in the correct order: the host monitored, the service name, the return value of the plugin executed (0 for OK, 1 for WARNING, etc.), and the one-line info text issued by the plugin. To format the data we use the printf command (man printf). The newly formatted string is finally passed on to send_nsca.
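The record that reaches send_nsca is simply the four values separated by tabs. The following sketch builds the same record with a slightly different technique: it uses the %s conversion instead of %b, so that backslashes in the plugin output are passed through unchanged rather than interpreted as escape sequences. The function name format_service_result is hypothetical.

```shell
#!/bin/bash
# Sketch: build the tab-separated record that is piped into send_nsca.
# format_service_result is a hypothetical helper name.

format_service_result() {
    local host="$1" service="$2" state="$3" output="$4"
    # %s inserts each value verbatim; only the \t and \n in the format
    # string itself are interpreted by printf.
    printf '%s\t%s\t%s\t%s\n' "$host" "$service" "$state" "$output"
}
```

A call such as format_service_result bonn01 HTTP 0 "HTTP OK - 200" prints one line with four tab-separated fields, which can be piped into send_nsca in the same way as in the script above.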

The equivalent script for OCHP (stored here in the file submit_host_check) looks something like this:

#!/bin/bash

# Script submit_host_check

PRINTF="/usr/bin/printf"

CMD="/usr/local/bin/send_nsca"

CFG="/etc/nagios/send_nsca.cfg"

HOST=$1

RESULT=$2

OUTPUT=$3

$PRINTF "%b" "$HOST\t$RESULT\t$OUTPUT\n" | "$CMD" -H nagios -c "$CFG"

The only difference from the service script is that the service description is omitted.

It is best to store the two scripts, in conformity with the Nagios documentation, in a subdirectory eventhandlers (which normally needs to be created) in the plugin directory (usually /usr/local/nagios/libexec, but for some distributions this will be /usr/lib/nagios/plugins). In the definition of the matching command object you can refer to the plugin directory using the macro $USER1$. The command objects are best defined in the misccommands.cfg file:

define command{
   command_name submit_service_check
   command_line $USER1$/eventhandlers/submit_service_check $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATEID$ '$SERVICEOUTPUT$'
}

define command{
   command_name submit_host_check
   command_line $USER1$/eventhandlers/submit_host_check $HOSTNAME$ $HOSTSTATEID$ '$HOSTOUTPUT$'
}

If you use a separate file for this, you must make sure that Nagios loads it by adding an entry to /etc/nagios/nagios.cfg. The single quotes surrounding the $SERVICEDESC$ macro and the two output macros in the command_line line are important. Their values sometimes contain spaces, which the shell would interpret as argument delimiters if the quotes were missing.
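The effect of the quotes can be reproduced in any shell. The following sketch uses a hypothetical helper, count_args, and made-up values standing in for the expanded macros; it prints how many arguments a script would receive with and without the quotes.

```shell
#!/bin/bash
# Sketch: show how quoting changes the number of arguments a script receives.
# count_args is a hypothetical helper; the values stand in for expanded macros.

count_args() {
    echo "$#"
}

# Quoted, as in the command_line above: four arguments.
count_args bonn01 'HTTP check' 0 'HTTP OK - 200'

# Unquoted: every space starts a new argument, giving eight.
count_args bonn01 HTTP check 0 HTTP OK - 200
```

With the unquoted form, the submit script would read "check" as the return value and "0" as the info text, so the record sent to the central server would be wrong.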


15.3 Practical Scenarios

One application for distributed monitoring is the monitoring of branches or external offices in which a noncentral Nagios installation is limited to running service and host checks and sending the results to the central instance. The noncentral instances do not need further Nagios functions, such as the notification system or the Web interface.

On the other hand, if administrators look after the networks at the distributed locations, while the central IT department only looks after special services, then the noncentral Nagios server is set up as a normal, full-fledged installation and selectively forwards only those check results over the OCSP/OCHP mechanism to the central office for which the specialists there are responsible.

Whatever the case, you must ensure that the host and service definitions are available both noncentrally and centrally. This can be done quite simply using templates (Section 2.11 on page 54) and the cfg_dir directive (Section 2.1, page 38): you set up the definitions so that the configuration files can be copied 1:1.

15.3.1 Avoiding redundancy in configuration files

In the following example we assume that the noncentral servers only perform host and service checks and send the results to the central server, and do not provide any other Nagios functions. The following directories are set up on the central host:

/etc/nagios/global

/etc/nagios/local

/etc/nagios/sites

/etc/nagios/sites/bonn

/etc/nagios/sites/frankfurt

/etc/nagios/sites/berlin

...

The configuration used for each location goes into the directory /etc/nagios/sites/location. The directory global holds all the definitions that can be used identically at all locations (e.g., the command definitions in checkcommands.cfg). The directory local takes the definitions that are specific to the central server. These include the templates for services and hosts, where a distinction must be made between central and noncentral.

A local directory is also created separately on each noncentral server; only the folders global and sites/location are copied from the central instance to the branch offices.
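The selection of directories for one branch office could be sketched as follows. The function stage_site_config and its three arguments (source tree, location name, target directory) are hypothetical; the sketch only assembles a local copy and leaves the actual transfer to the branch office open.

```shell
#!/bin/bash
# Sketch: assemble the configuration tree for one branch office.
# stage_site_config and its arguments are hypothetical.

stage_site_config() {
    local src="$1" site="$2" dest="$3"
    # Stop if the location has no directory on the central host.
    [ -d "$src/sites/$site" ] || { echo "unknown site: $site" >&2; return 1; }
    mkdir -p "$dest/sites" || return 1
    # Copy only what is shared: the global definitions and this location's
    # objects. The directory local is deliberately left out.
    cp -R "$src/global" "$dest/" && cp -R "$src/sites/$site" "$dest/sites/"
}
```

For example, stage_site_config /etc/nagios bonn /tmp/nagios-bonn would produce a tree containing global and sites/bonn, which can then be transferred to the Bonn server by whatever means is in use. Files that were deleted centrally are not removed from an existing target directory by this sketch.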


The three directories are read in with the cfg_dir directive in /etc/nagios/nagios.cfg:

# -- /etc/nagios/nagios.cfg

...

cfg_dir=/etc/nagios/global
cfg_dir=/etc/nagios/local
cfg_dir=/etc/nagios/sites

...

Only settings that are identical for the noncentral and the central side are used in the service definition:

# -- /etc/nagios/sites/bonn/services.cfg
define service{
   host_name            bonn01
   service_description  HTTP
   use                  bonn-svc-template
   ...
   check_command        check_http
   ...
}

The location-dependent parameters are dealt with by the templates.

15.3.2 Defining templates

So that the service definitions are identical on both the central and noncentral servers, the local templates must have the same names as the central ones. In addition you should ensure that all the obligatory parameters (see Chapter 2 from page 37) are entered, even if they are not required at one of the locations, because together the template and service definitions must cover all obligatory parameters.

The following example shows a service template for one of the noncentral locations:

# -- On-Site configuration for the Bonn location
define service{
   name                    bonn-svc-template
   register                0
   max_check_attempts      3
   normal_check_interval   5
   retry_check_interval    1
   active_checks_enabled   1
   passive_checks_enabled  1
   check_period            24x7
   obsess_over_service     1
   notification_interval   0
   notification_period     none
   notification_options    n
   notifications_enabled   0
   contact_groups          dummy
}

The parameters that are important for the noncentral side are those that refer to the test itself, together with obsess_over_service, which must not be left out. This ensures that the check results are sent to the central server.

notifications_enabled switches off notification in this case, since the local admins do not need to worry about error messages from services that are centrally monitored. Alternatively this can be done globally in the noncentral /etc/nagios/nagios.cfg.

register 0 ensures that the template is used exclusively as a template, so that Nagios does not interpret it as a separate service definition.

The counterpart with the same name on the central server looks something like this:

# -- Service template for the central Nagios server
define service{
   name                    bonn-svc-template
   register                0
   max_check_attempts      3
   normal_check_interval   5
   retry_check_interval    1
   active_checks_enabled   0
   passive_checks_enabled  1
   check_period            none
   check_freshness         0
   obsess_over_service     0
   notification_interval   480
   notification_period     24x7
   notification_options    u,c,r
   notifications_enabled   1
   contact_groups          admins
}

The parameter passive_checks_enabled is important here, as is the configuration of the notification system. On the central side, the parameters involving the test itself come into play only if freshness checking is used (see Section 13.4 from page 243). This works only if the central Nagios server is itself in a position to actively test all services if there is any doubt. Since the check_command in this simple template solution is given in the location-dependent service definition, which is identical on the noncentral and central servers, this will work only if the same command object can be used both centrally and noncentrally, that is, if the object definitions in global/checkcommands.cfg match on both sides.

In the example, however, we completely switch off active tests of services at the Bonn location, with check_period none and check_freshness set to 0. The system described so far can also be applied to host checks, of course.
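Because the scheme depends on the templates having the same names on both sides, it can help to compare the names mechanically. The following sketch assumes that templates are identified by the name directive, as in the examples above; the function template_names is hypothetical.

```shell
#!/bin/bash
# Sketch: list the template names defined in one or more configuration
# files, so that the central and noncentral local directories can be
# compared. template_names is a hypothetical helper.

template_names() {
    # Print the value of every "name" directive, sorted and de-duplicated.
    awk '$1 == "name" { print $2 }' "$@" | sort -u
}
```

Running template_names over the files in the local directory of each side and comparing the two lists, for example with diff, shows templates that exist on only one side. Whether the combined definitions cover all obligatory parameters can then be checked on each server with nagios -v /etc/nagios/nagios.cfg.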
