
Chapter 3 Startup

Once Nagios and the plugins are installed, Apache is set up for the Web interface, and the minimal configuration described so far is in place, the system can go into operation. If you have not already done so, first spend a little time testing the check_icmp plugin, as described in Section 1.2 (page 30), to check the initial configuration.

3.1 Checking the Configuration

The nagios program, which normally runs as a daemon and continually collects data, can also be used to test the configuration:

nagios@linux:~$ /usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg

[...]
Checking services...
Checked 1 services.
Checking hosts...
Warning: Host 'linux02' has no services associated with it!
Checked 2 hosts.
Checking host groups...
Checked 1 host groups.
Checking service groups...
Checked 0 service groups.
Checking contacts...
Warning: Contact 'wob' is not a member of any contact groups!
Checked 2 contacts.
Checking contact groups...
Checked 1 contact groups.
Checking service escalations...
Checked 0 service escalations.
Checking service dependencies...
Checked 0 service dependencies.
Checking host escalations...
Checked 0 host escalations.
Checking host dependencies...
Checked 0 host dependencies.
Checking commands...
Checked 22 commands.
Checking time periods...
Checked 4 time periods.
Checking extended host info definitions...
Checked 0 extended host info definitions.
Checking extended service info definitions...
Checked 0 extended service info definitions.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 2
Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check

Although the warnings displayed here can in principle be ignored, doing so is not always wise: you may have made a mistake in the configuration, so that Nagios ignores a specific object that you would actually like to use.
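Because warnings are easy to overlook in a long listing, it can help to condense the check into its two totals. The following is only a sketch: the function name summarize_preflight is hypothetical, and it assumes that the output contains the "Total Warnings:" and "Total Errors:" lines shown above. If those lines are missing, the sketch treats the check as failed.

```shell
#!/bin/sh
# Summarize the output of "nagios -v", read from standard input.
# Prints "warnings=N errors=M". Returns 1 if errors were reported and
# 2 if the totals are missing (assumed here to mean the check aborted).
summarize_preflight() {
    awk '
        /^Total Warnings:/ { warnings = $3; seen++ }
        /^Total Errors:/   { errors = $3; seen++ }
        END {
            if (seen < 2) { print "no totals found"; exit 2 }
            printf "warnings=%d errors=%d\n", warnings, errors
            exit (errors > 0 ? 1 : 0)
        }
    '
}
```

It could be used as follows, with the paths taken from the example above:

/usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg | summarize_preflight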

The first warning in the example refers to a host called linux02, which has not been allocated any services. Since Nagios works primarily with service checks, and uses host checks only if it needs them, a computer should basically always be allocated at least one service. Nagios issues a warning, as here, if no service at all has been defined for a particular host.


It is also recommended that you define a “PING” service for every host, although this is not absolutely essential. Even though this service uses the same plugin as the host check, check_icmp, the two are not the same thing: the host check is satisfied with a single response packet, since it only wants to find out whether the host “is alive”. As a service check, check_icmp records packet round-trip times and loss rates, from which conclusions can be drawn, if necessary, about existing problems with a network card.
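A PING service for a host such as linux02 might be generated as in the following sketch. Everything specific in it is an assumption made for illustration: the helper name ping_service_definition, the template generic-service (which is assumed to supply all other required service parameters), and the command check_ping with its warning and critical thresholds, which must exist as a command definition in your own configuration.

```shell
#!/bin/sh
# Print a Nagios service definition for a PING service.
# $1: name of the host the service belongs to.
# ASSUMPTIONS: a service template "generic-service" and a command
# "check_ping" (taking warning and critical thresholds as arguments)
# are defined elsewhere in the configuration.
ping_service_definition() {
    cat <<EOF
define service {
    use                  generic-service
    host_name            $1
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%
}
EOF
}
```

The output could be appended to a service file, for example with ping_service_definition linux02 >> /etc/nagios/mysite/services.cfg, after which the configuration check should be run again.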

The second warning refers to a contact named wob, who, although defined, is not used, because he does not belong to any contact group.

In contrast to warnings, genuine errors must be eliminated, because Nagios will usually not start if the parser finds an error, as in the following example:

Error: Could not find any host matching 'linux03'
Error: Could not expand hostgroups and/or hosts specified in service (config file '/etc/nagios/mysite/services.cfg', starting on line 0)
***> One or more problems was encountered while processing the config files...

Here the configuration mistakenly contains a host called linux03, for which there is no definition. If you read through the error message carefully, you will quickly realize that the error can be found in the file /etc/nagios/mysite/services.cfg.
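If the error message does not name a usable line number, as in this case, a recursive search of the configuration directory can locate the offending reference. This is a minimal sketch: the function name find_host_refs is hypothetical, and it assumes a grep that supports the -r option (as GNU grep does).

```shell
#!/bin/sh
# List every line below a configuration directory that mentions a host name.
# $1: host name to look for (matched as a whole word, not as a pattern)
# $2: directory to search recursively
# Returns non-zero if the name is not found anywhere.
find_host_refs() {
    grep -rnwF -- "$1" "$2"
}
```

For the example above, find_host_refs linux03 /etc/nagios prints each matching line together with its file name and line number.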

When defining dependencies (host and service dependencies, see Section 12.6, page 234), there is a fundamental risk that circular dependencies could be specified by mistake. Because Nagios cannot automatically resolve such dependencies, this is also checked before the start, and if necessary, an error is displayed.

When using the parents parameter, it is also possible that two hosts may inadvertently serve as each other's “parents”; Nagios tests for this as well.

3.2 Getting Monitoring Started

3.2.1 Manual start

During the Nagios installation, the command

linux:src/nagios # make install-init

saves a startup script in the /etc/init.d directory. If the configuration test ran without error, Nagios is first started manually with this script:

linux:~ # /etc/init.d/nagios start


3.2.2 Automatic start

If all runs smoothly here (which can be checked by calling up the Web interface, see Section 3.3), you only need to ensure that the script is also started when the system boots. Symbolic links in the directories /etc/init.d/rc[235].d are used for this purpose:

linux:~ # ln -s /etc/init.d/nagios /etc/init.d/rc2.d/S99nagios
linux:~ # ln -s /etc/init.d/nagios /etc/init.d/rc2.d/K99nagios

Corresponding links are also set in rc3.d and rc5.d, the subdirectories responsible for runlevels 3 and 5.
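The links for all three runlevels can be created in one loop. The following sketch uses the same link names as the commands above; the function name link_runlevels and its two parameters are hypothetical and exist only so that the directory can be varied.

```shell
#!/bin/sh
# Create start (S99) and kill (K99) links for runlevels 2, 3 and 5.
# $1: init directory containing the rcN.d subdirectories
# $2: name of the startup script inside that directory
link_runlevels() {
    initdir=$1
    script=$2
    for level in 2 3 5; do
        ln -s "$initdir/$script" "$initdir/rc$level.d/S99$script" || return 1
        ln -s "$initdir/$script" "$initdir/rc$level.d/K99$script" || return 1
    done
}
```

With the layout described in the text, the call would be link_runlevels /etc/init.d nagios, run as root. The function stops at the first link that cannot be created, for example because it already exists.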

3.2.3 Making configuration changes come into effect

If configuration changes are made, it is not required, and not even recommended, that you restart Nagios each time. Instead, you just perform a reload:

linux:~ # /etc/init.d/nagios reload

This causes Nagios to reread the configuration, end tests for hosts and services that no longer exist, and integrate new computers and services into the test. However, with each reload there is a renewed scheduling of checks, meaning that Nagios plans to carry out all tests afresh.

To prevent all tests from being started simultaneously at bootup, Nagios performs what is known as spreading: the server spreads the start times of the tests over a configurable period (see footnote 1). With a large number of services, it can therefore take a while before Nagios resumes the test for a specific service. For this reason you should never run reloads at short intervals: in the worst case, Nagios will not manage to perform some checks in the intervening period and will perform them only some time after the most recent reload.
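A rough feeling for the effect of spreading can be gained from a simple calculation. The sketch below assumes, as a simplification, that Nagios distributes the start times evenly over the spread period; the actual scheduling may differ. The function name spread_gap and the figures in the usage note are chosen purely for illustration.

```shell
#!/bin/sh
# Print the approximate gap, in whole seconds (rounded down), between
# the start times of consecutive checks.
# ASSUMPTION: the checks are spread evenly over the spread period.
# $1: number of checks to schedule
# $2: spread period in minutes
spread_gap() {
    checks=$1
    minutes=$2
    [ "$checks" -gt 0 ] || return 1
    echo $(( minutes * 60 / checks ))
}
```

For 600 services and a spread period of 30 minutes, spread_gap 600 30 prints 3: under this assumption a new check starts every three seconds, and the last service is not tested until almost half an hour after the reload.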

Before each reload, the configuration should be tested as shown in Section 3.1, so that any errors can be eliminated first.
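The test and the reload can be combined so that a faulty configuration is never loaded. The following is a sketch under stated assumptions: the function name safe_reload and the three variables are hypothetical, the default paths are those used in this chapter, and the sketch assumes that nagios -v returns a non-zero exit status when it finds errors.

```shell
#!/bin/sh
# Paths as used in this chapter; each can be overridden from the environment.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/etc/nagios/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios}

# Reload Nagios only if the configuration check succeeds.
# ASSUMPTION: "nagios -v" exits with a non-zero status on errors.
safe_reload() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        "$NAGIOS_INIT" reload
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}
```

Since the output of the check is discarded here, run the check by hand as in Section 3.1 to see the actual error messages when safe_reload refuses to reload.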

3.3 Overview of the Web Interface

If you call the URL http://nagios-server/nagios in the browser when the Nagios daemon is running, you will be taken to the welcome screen shown in Figure 3.1.

1 The relevant configuration parameters are called max_host_check_spread and max_service_check_spread; see Appendix D.1, page 435.


Figure 3.1: The start screen

The so-called “tactical overview” (Tactical Overview), which can be reached via the first monitoring link in the left menu bar, is shown in Figure 3.2. It summarizes the status of all tested systems.

Figure 3.2: “Tactical” overview of all systems and services to be monitored


Considerably more interesting in practice, however, is the menu item Service Problems (Figure 3.3). It lists the services that are currently causing problems, that is, those not in the OK state. This is exactly what Nagios was conceived for: to inform the administrator precisely of any problems.

Figure 3.3: Nagios: summary of all service problems

The first column names the host involved. If it has a gray background, Nagios can reach the computer in principle. If the host is “down”, this is shown by a red background. For services, red stands for CRITICAL and yellow for WARNING.

The second column provides the service name, and the third column gives the status again, in plain text. Column four specifies the time of the last check. Column five is interesting: it shows how long the current status has lasted.

The sixth column, with the heading Attempt, reveals how often Nagios has already performed the test (unsuccessfully): 3/3 means that the error status has been confirmed for the third time in succession, and that a failing test is repeated at most three times (parameter max_check_attempts, see Section 2.3).

Figure 3.4: A summary of all hosts (extract)


Finally, the last column passes on the output of the plugin, which describes the current status to the administrator in more detail. The top line in Figure 3.3, for example, warns that only five percent of storage space is available in the /usr file system of the host eli11.

The Host Detail (Figure 3.4) and Service Detail views provide an overview of all hosts and services. In practice you will usually be looking for something more specific: information on a single host, a host group, or a service group. The name in question is entered in the Show Host search field. Figure 3.5 shows this using the example of the eli11 host.

Figure 3.5: All services for the host eli11

Figure 3.6: The host group eli-linux in the grid view


Alternatively, you can search for the names of host and service groups. An interesting variation here is the status grid, reached via the link Hostgroup Grid, which displays an overview of all hosts and their corresponding services together with their status (Figure 3.6). The color of each service (green/yellow/red) shows at a glance whether there are problems in the service group or host group that you are viewing.
