Building a server monitoring system with Nagios


yeli4017 2009-07-08 11:01:32 ©Copyright

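©Copyright belongs to the author: this is an original work by 51CTO blog author yeli4017. Please contact the author for permission before reproducing it; unauthorized reproduction will be pursued legally.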

First, install the required RPM packages:

yum -y install gcc gcc-c++ autoconf libjpeg libjpeg-devel libpng libpng-devel freetype freetype-devel libxml2 libxml2-devel zlib zlib-devel glibc glibc-devel glib2 glib2-devel bzip2 bzip2-devel ncurses ncurses-devel curl curl-devel e2fsprogs e2fsprogs-devel krb5 krb5-devel libidn libidn-devel openssl openssl-devel openldap openldap-devel nss_ldap openldap-clients openldap-servers perl gd gd-devel jpeg jpeg-devel libpng libpng-devel net-snmp zlib freetype libart_lgpl cairo-devel pango-devel lrzsz*
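useradd -m nagios                            # create the nagios user
mkdir -p /usr/local/nagios                   # create the installation directory
mkdir /usr/local/rrdtool                     # likewise, for RRDtool
grep ^User /etc/httpd/conf/httpd.conf        # find the user Apache runs as
In my case it is apache; next, add this user to the nagios group:
usermod -a -G nagios apache                  # add apache to the nagios group (-a appends, so apache keeps its other supplementary groups)
chown -R nagios:nagios /usr/local/nagios     # change the directory's owner and group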
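Because usermod -G without -a replaces a user's supplementary groups instead of adding to them, it is worth confirming the result. Below is a minimal sketch; the function name user_in_group is hypothetical, and it relies only on id -nG, which prints every group a user belongs to. Keep in mind that a running Apache only picks up the new group after it is restarted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if user $1 belongs to group $2
# (primary or supplementary), based on the group names printed by `id -nG`.
user_in_group() {
  local user=$1 group=$2
  # id fails (empty output) for an unknown user, so grep then fails too
  id -nG "$user" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$group"
}

# Example: confirm that apache was added to the nagios group.
# user_in_group apache nagios && echo "apache is in the nagios group"
```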

4. Software used
nagios-2.9.tar.gz
nagios-plugins-1.4.9.tar.gz
nrpe-2.8.1.tar.gz
NSClient++-0.2.7.zip

###############################
rpm -qa|grep php    # list the installed PHP packages
##############################

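Net-snmp (used by check_snmp; if it is not installed, the plugin build will not produce the check_snmp plugin)
./configure --prefix=/usr
make && make install
CGI.pm module (used by nagiosgraph; if it is not installed, RRDtool cannot draw the graphs)
perl Makefile.PL
make && make install
RRDtool (the round-robin database and its tools)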
./configure --prefix=/usr/local/rrdtool
make && make install

1. Install the main Nagios program
Extract:
tar -zxvf nagios-2.9.tar.gz
cd nagios-2.9
./configure --prefix=/usr/local/nagios
make all
make install
make install-init
make install-commandmode && make install-config

2. Install the plugins
Extract:
tar -zxvf nagios-plugins-1.4.9.tar.gz
cd nagios-plugins-1.4.9
./configure --prefix=/usr/local/nagios
make && make install

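3. Modify the Apache configuration
Edit the Apache configuration file to add the Nagios directories and require authentication for access to them.
vi /usr/local/apache2/conf/httpd.conf and append the following at the end: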
#setting for nagios 20090707
ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
<Directory "/usr/local/nagios/sbin">
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
</Directory>

Alias /nagios /usr/local/nagios/share
<Directory "/usr/local/nagios/share">
    Options None
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
</Directory>

Configure Apache for PHP
Edit the Apache configuration file httpd.conf:
# vi /usr/local/apache/conf/httpd.conf
Below the line AddType application/x-gzip .gz .tgz, add the following entries:
AddType application/x-httpd-php .php
AddType application/x-httpd-php-source .phps

Also add index.php to the DirectoryIndex directive.
###########################################
tar -zxvf imagepak-base.tar.gz
cp -r base /usr/local/nagios/share/images/logos/    # logo image support

###########################################

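/usr/bin/htpasswd -c /usr/local/nagios/etc/htpasswd yelj    # create a web user
View the contents of the authentication file:
less /usr/local/nagios/etc/htpasswd
yelj:OmWGEsBnoGpIc    The part before the colon is the username (yelj); the rest is the encrypted password.
At this point the Nagios installation is basically complete, and you can access it over the web:
http://192.168.0.111/nagios    A dialog box will ask for a username and password.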

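3) Modify the configuration files
Edit the main Nagios configuration file, nagios.cfg:
vi nagios.cfg
Comment out the localhost.cfg line so that it reads #cfg_file=/usr/local/nagios/etc/localhost.cfg, then uncomment the following lines: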
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
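The last line is the path to the time-period configuration file.
Commenting a line out means the file is not used; uncommenting it means the file is used.
Change check_external_commands=0 to check_external_commands=1. This allows operations such as restarting Nagios or disabling host/service checks from the web interface.
Change command_check_interval from its default to command_check_interval=10s (the "s" suffix means seconds). Choose this external-command check interval to suit your environment; it should be neither too long nor too short.
That is basically all that needs to change in the main configuration file. After these changes you will notice that /usr/local/nagios/etc does not yet contain hosts.cfg and the other files listed above. We will create them by hand shortly.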
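The nagios.cfg edits above can also be scripted. The sketch below assumes GNU sed (for -E and in-place -i) and the paths used in this article; the function name apply_nagios_cfg_changes is hypothetical. It keeps a .bak copy of the file before editing it in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the nagios.cfg edits described above with GNU sed.
# A backup copy is written to <file>.bak before the file is edited in place.
apply_nagios_cfg_changes() {
  local cfg=$1
  local objdir='/usr/local/nagios/etc/objects'
  cp -p "$cfg" "$cfg.bak" || return 1
  sed -E -i \
    -e "s@^#(cfg_file=${objdir}/(contacts|contactgroups|hostgroups|hosts|services|timeperiods)\.cfg)\$@\1@" \
    -e 's@^(cfg_file=/usr/local/nagios/etc/localhost\.cfg)$@#\1@' \
    -e 's@^check_external_commands=0$@check_external_commands=1@' \
    -e 's@^command_check_interval=.*$@command_check_interval=10s@' \
    "$cfg"
}

# Example:
# apply_nagios_cfg_changes /usr/local/nagios/etc/nagios.cfg
```

Running the nagios -v pre-flight check afterwards is still the real test of whether the edited file is valid.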
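Edit the CGI control file cgi.cfg
vi cgi.cfg
The second configuration file to modify is cgi.cfg, which controls the CGI scripts. First make sure use_authentication=1. Many articles recommend setting use_authentication to "0" to disable authentication; this is a very bad idea.
Next, set default_user_name=test. The remaining changes are listed below: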
authorized_for_system_information=nagiosadmin,test
authorized_for_configuration_information=nagiosadmin,test
# separate multiple users with commas
authorized_for_system_commands=test
authorized_for_all_services=nagiosadmin,test
authorized_for_all_hosts=nagiosadmin,test
authorized_for_all_service_commands=nagiosadmin,test
authorized_for_all_host_commands=nagiosadmin,test
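Where do these usernames come from? They are created with the command that adds an authentication user:
/usr/bin/htpasswd -c /usr/local/nagios/etc/htpasswd yelj
Note that every user listed here must already exist in the htpasswd file: do not add authentication users that do not exist, and for security do not create more of them than you need. Also note that -c creates (and overwrites) the file, so omit it when adding a second user to an existing htpasswd file.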
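Since every user named in cgi.cfg must exist in the htpasswd file, a quick cross-check helps catch typos. A minimal sketch, assuming bash and the file formats shown in this article (user:hash lines in htpasswd, comma-separated names on authorized_for_* lines); check_cgi_users is a hypothetical name, and it skips the * wildcard that cgi.cfg accepts for "any authenticated user".

```shell
#!/usr/bin/env bash
# Hypothetical helper: report users named on authorized_for_* lines in
# cgi.cfg that do not exist in the htpasswd file. Returns 1 if any are missing.
check_cgi_users() {
  local cgi_cfg=$1 htpasswd_file=$2 missing=0 user
  while IFS= read -r user; do
    user=${user// /}                       # tolerate "a, b" style lists
    [ -z "$user" ] && continue
    [ "$user" = "*" ] && continue          # wildcard, not a real account
    if ! cut -d: -f1 "$htpasswd_file" | grep -qxF -- "$user"; then
      printf 'missing from htpasswd: %s\n' "$user"
      missing=1
    fi
  done < <(sed -n 's/^authorized_for_[a-z_]*=//p' "$cgi_cfg" | tr ',' '\n' | sort -u)
  return "$missing"
}

# Example:
# check_cgi_users /usr/local/nagios/etc/cgi.cfg /usr/local/nagios/etc/htpasswd
```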

Define the monitoring time period by creating the configuration file timeperiods.cfg
[root@localhost etc]# vi timeperiods.cfg
define timeperiod{
        timeperiod_name         24x7   ; name of the time period; do not use spaces here
        alias                   24 Hours A Day, 7 Days A Week
        sunday                  00:00-24:00
        monday                  00:00-24:00
        tuesday                 00:00-24:00
        wednesday               00:00-24:00
        thursday                00:00-24:00
        friday                  00:00-24:00
        saturday                00:00-24:00
        }
This defines a time period named 24x7 that covers all 24 hours of every day.

Define a contact by creating the configuration file contacts.cfg
[root@localhost etc]# vi contacts.cfg
define contact{
        contact_name                    test   ; name of the contact; do not use spaces here
        alias                           sys admin
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           yeli4017@163.com
        }
Multiple contacts can then be combined into a contact group; create the file contactgroups.cfg
[root@localhost etc]# vi contactgroups.cfg
define contactgroup{
        contactgroup_name               sagroup
        alias                   System Administrators
        members                 test
}
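Define the monitored host by creating the file hosts.cfg
[root@localhost etc]# vi hosts.cfg
define host{
        host_name                       nagios-server
        # name of the monitored host; preferably without spaces
        alias                           nagios server
        # alias
        address                         192.168.0.111
        # IP address of the monitored host; for now I use this machine's own IP
        check_command                   check-host-alive
        # the check command check-host-alive, defined in commands.cfg, checks whether the host is alive
        max_check_attempts              5
        # number of retries after a check fails
        check_period                    24x7
        # check period 24x7, defined earlier in timeperiods.cfg
        contact_groups                  sagroup
        # contact group sagroup, defined above in contactgroups.cfg
        notification_interval           10
        # notification interval: re-notify every 10 time units (10 minutes with the default interval_length of 60 seconds)
        notification_period             24x7
        # notification period 24x7, also defined earlier in timeperiods.cfg
        notification_options            d,u,r
        # when to send notifications; see the explanation in the contacts.cfg section
        }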

[root@localhost etc]# vi hostgroups.cfg
define hostgroup{
        hostgroup_name          sa-servers
        alias                   sa Servers
        members                 nagios-server
        }

Before running Nagios, test the configuration first:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If you see the following, everything is fine:
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
If there are problems, troubleshoot them using the output.
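The pre-flight check lends itself to a guard that only starts Nagios when the configuration is clean. This sketch simply looks for the "Total Errors: 0" line shown above (allowing extra spaces before the number); the function name preflight_ok is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the output of `nagios -v` on stdin and
# succeed only if the "Total Errors:" line reports zero.
preflight_ok() {
  grep -Eq '^Total Errors:[[:space:]]*0[[:space:]]*$'
}

# Example: only start Nagios when the pre-flight check is clean.
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_ok &&
#   /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
```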

Start Nagios in the background as a daemon:
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

1) Monitor FTP on nagios-server
Edit services.cfg and add the following. It is basically a copy of the host-alive definition from the previous section, with small changes.
define service{
        host_name               nagios-server
        service_description     check ftp
        check_command           check_ftp
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
        }

2) Monitor SSH on dbpi
define service{
        host_name               dbpi
        service_description     check-ssh
        check_command           check_tcp!22
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
        }

3) Monitor IIS on yahoon
define service{
        host_name               yahoon
        service_description     check-http
        check_command           check_http
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
        }

4) Monitor root-partition usage on nagios-server
define service{
        host_name               nagios-server
        service_description     check disk
        check_command           check_local_disk!10%!5%!/
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
        }
After changing the configuration files you need to restart Nagios. The simple way is to kill the nagios process and start it again:
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
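3. Install NRPE
Extract:
tar -zxvf nrpe-2.8.1.tar.gz
cd nrpe-2.8.1
Compile:
./configure
Install the check_nrpe plugin:
make install-plugin
As mentioned earlier, only the monitoring host needs the check_nrpe plugin; the monitored host does not. We install it here for testing purposes.
Install the daemon: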

On the monitored host
1. Add the user
useradd -m -s /sbin/nologin nagios

2. Install the Nagios plugins
Extract:
tar -zxvf nagios-plugins-1.4.9.tar.gz
cd nagios-plugins-1.4.9
Compile and install:
./configure
make
make install
When this step finishes, two directories, libexec and share, are created under /usr/local/nagios/
[root@dbpi local]# ls /usr/local/nagios/
libexec share
Change the directory ownership:
chown nagios.nagios /usr/local/nagios
chown -R nagios.nagios /usr/local/nagios/libexec

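3. Install NRPE
Extract:
tar -zxvf nrpe-2.8.1.tar.gz
cd nrpe-2.8.1
Compile:
./configure
make all
make install-daemon
make install-daemon-config

Now the nagios directory contains four directories:
[root@dbpi nrpe-2.8.1]# ls /usr/local/nagios/
bin      etc      libexec share
According to the installation documentation, the NRPE daemon runs as a service under xinetd, so xinetd must be installed first; most systems already have it installed by default.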
4. Install the xinetd script
[root@dbpi nrpe-2.8.1]# make install-xinetd
The output is:
/usr/bin/install -c -m 644 sample-config/nrpe.xinetd /etc/xinetd.d/nrpe
This shows that the file /etc/xinetd.d/nrpe was created.
Edit this script:
vi /etc/xinetd.d/nrpe
# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
        flags           = REUSE
        socket_type     = stream
        port            = 5666
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nrpe
        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
        log_on_failure += USERID
        disable         = no
        only_from       = 127.0.0.1
}
Append the monitoring host's address (192.168.0.111) to only_from, separated by a space. After the change:
        only_from       = 127.0.0.1 192.168.0.111

Edit /etc/services and add the NRPE service:
vi /etc/services
Add the following:
# Local services
nrpe            5666/tcp                        # nrpe
Restart the xinetd service:
[root@dbpi nrpe-2.8.1]# service xinetd restart
Stopping xinetd: [ OK ]
Starting xinetd: [ OK ]

Check whether NRPE has started:
[root@dbpi nrpe-2.8.1]# netstat -at|grep nrpe
tcp        0      0 *:nrpe                  *:*                     LISTEN
[root@dbpi nrpe-2.8.1]# netstat -an|grep 5666
tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN
Port 5666 is now listening.
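The netstat check can be wrapped in a small test that works on the netstat -an output format shown above (local address in column 4, state in the last column). is_listening is a hypothetical helper; other tools such as ss print a different layout and would need a different parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `netstat -an` output on stdin and succeed
# if some TCP socket is in the LISTEN state on port $1.
is_listening() {
  awk -v port="$1" '
    # column 4 is "address:port"; anchor the port so 15666 does not match 5666
    $1 ~ /^tcp/ && $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }'
}

# Example:
# netstat -an | is_listening 5666 && echo "nrpe is listening on 5666"
```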

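5. Test whether NRPE works
Earlier we installed the check_nrpe plugin for testing; now we use it. Run:
/usr/local/nagios/libexec/check_nrpe -H localhost
It returns the current NRPE version:
[root@dbpi nrpe-2.8.1]# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.8.1
This means that connecting locally to the NRPE daemon with check_nrpe works.
Note: for the following steps to work, open port 5666 in the local firewall so that the external monitoring host can reach it.

Run /usr/local/nagios/libexec/check_nrpe -h to see how to use this command.
The usage is: check_nrpe -H <monitored host> -c <check command to run>
Note: the command after -c must be defined in nrpe.cfg; the NRPE daemon only runs commands defined in nrpe.cfg.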

View the NRPE check commands
cd /usr/local/nagios/etc
vi nrpe.cfg
Find the following section:
# The following examples use hardcoded command arguments...
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
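The part in square brackets is the command name, i.e. what can be passed to check_nrpe's -c option; after the equals sign is the plugin that actually runs (much like a command definition in commands.cfg, just written on one line). In other words, check_users is shorthand for /usr/local/nagios/libexec/check_users -w 5 -c 10.
It is easy to see that these five commands check, respectively, the number of logged-in users, the CPU load, the usage of hda1, zombie processes, and the total number of processes. For the exact meaning of each command, see the plugin's usage (run "<plugin> -h").
Because -c only accepts commands defined in nrpe.cfg, right now we can use only these five commands. Try them on this machine: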
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_load
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_hda1
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_zombie_procs
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_total_procs
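Since check_nrpe -c only accepts names defined in nrpe.cfg, it can help to list them straight from the file. The sketch below assumes the command[name]=... syntax shown above and ignores commented-out lines; nrpe_commands is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command names defined in an nrpe.cfg file,
# i.e. the values that check_nrpe accepts after -c.
nrpe_commands() {
  # only lines that start with "command[" count; "#command[...]" is skipped
  sed -n 's/^command\[\([^]]*\)\]=.*/\1/p' "$1"
}
```

For example, for c in $(nrpe_commands /usr/local/nagios/etc/nrpe.cfg); do /usr/local/nagios/libexec/check_nrpe -H localhost -c "$c"; done runs every defined command once against the local daemon.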

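On the monitoring host running Nagios
Nagios is already running. What remains is to:
- install the check_nrpe plugin
- create a command definition for check_nrpe in commands.cfg, because only commands defined in commands.cfg can be used in services.cfg
- create the checks for the monitored host
Install the check_nrpe plugin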
[root@server1 yahoon]# tar -zxvf nrpe-2.8.1.tar.gz
[root@server1 yahoon]# cd nrpe-2.8.1
[root@server1 nrpe-2.8.1]# ./configure
[root@server1 nrpe-2.8.1]# make all
[root@server1 nrpe-2.8.1]# make install-plugin
Only this install step is needed (not the daemon installation), because we only need the check_nrpe plugin.

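We have just installed NRPE on dbpi; now test communication between check_nrpe on the monitoring host and the NRPE daemon running on the monitored host.
[root@server1 nrpe-2.8.1]# /usr/local/nagios/libexec/check_nrpe -H 192.168.0.100
NRPE v2.8.1
The NRPE version information is returned correctly, so everything is working.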

Add a definition for check_nrpe to commands.cfg
vi /usr/local/nagios/etc/commands.cfg
Append the following at the end:
########################################################################
#
# 2007.9.5 add by yahoon
# NRPE COMMAND
#
########################################################################
# 'check_nrpe ' command definition
define command{
        command_name check_nrpe
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
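Explanation:
command_name check_nrpe
Defines the command name check_nrpe; this is the name used in services.cfg.
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
Defines the plugin that actually runs. This command line must follow the usage of check_nrpe exactly; if you are not sure of it, run check_nrpe -h.
The $ARG1$ argument after -c is the check command passed to the NRPE daemon; as mentioned, it must be one of the five commands defined in nrpe.cfg. When you use check_nrpe in services.cfg, append this argument after an exclamation mark (!).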
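To make the macro substitution concrete, the sketch below mimics what Nagios does when it builds the command line for check_nrpe!check_load: the part before the first ! selects the command, and each later !-separated field becomes $ARG1$, $ARG2$, and so on. This is only an illustration: Nagios performs this expansion itself, expand_command is a hypothetical name, and only $USER1$, $HOSTADDRESS$ and $ARGn$ are handled. $USER1$ is normally set in resource.cfg to the plugin directory (/usr/local/nagios/libexec in this setup).

```shell
#!/usr/bin/env bash
# Hypothetical illustration of Nagios macro expansion for a check_command.
#   $1 command_line template   $2 value of $USER1$
#   $3 value of $HOSTADDRESS$  $4 check_command, e.g. check_nrpe!check_load
expand_command() {
  local template=$1 user1=$2 hostaddr=$3 check_command=$4
  local -a parts
  local result i
  # Split on "!": parts[0] is the command name, parts[1..] fill $ARG1$..$ARGn$
  IFS='!' read -r -a parts <<< "$check_command"
  result=$template
  result=${result//'$USER1$'/"$user1"}
  result=${result//'$HOSTADDRESS$'/"$hostaddr"}
  for ((i = 1; i < ${#parts[@]}; i++)); do
    result=${result//"\$ARG$i\$"/"${parts[i]}"}
  done
  printf '%s\n' "$result"
}

# Example (single quotes keep the shell from expanding $ and !):
# expand_command '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' \
#   /usr/local/nagios/libexec 192.168.0.100 'check_nrpe!check_load'
```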
Now define a CPU-load check for the dbpi host in services.cfg:
define service{
        host_name               dbpi
        # the monitored host; it must run Linux with NRPE, and it must be defined in hosts.cfg
        service_description     check-load
        # name of the check
        check_command           check_nrpe!check_load
        # the check command is check_nrpe, defined in commands.cfg; its argument check_load is defined in nrpe.cfg
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
        }
Add the other four checks in the same way.

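As mentioned earlier, one more task for today is monitoring swap usage on dbpi. Unfortunately, nrpe.cfg does not define a command for this by default. The solution is to add one to nrpe.cfg by hand, i.e. to define a custom NRPE command.
We want to monitor the swap partition so that the state becomes WARNING when free space drops below 20% and CRITICAL when it drops below 10%. The plugin for this is check_swap, and the full command line is: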
/usr/local/nagios/libexec/check_swap -w 20% -c 10%
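The threshold rule described above (WARNING below 20% free, CRITICAL below 10% free) can be sketched as follows, reading SwapTotal and SwapFree from /proc/meminfo. This only illustrates the logic; it is not how check_swap is implemented, its exact boundary handling may differ, and swap_state is a hypothetical name. It follows the usual plugin convention of exit codes 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read /proc/meminfo-style text on stdin and classify
# free swap against a warning ($1) and critical ($2) percentage.
swap_state() {
  local warn=$1 crit=$2
  awk -v w="$warn" -v c="$crit" '
    /^SwapTotal:/ { total = $2 }
    /^SwapFree:/  { free = $2 }
    END {
      if (total == 0) { print "UNKNOWN - no swap configured"; exit 3 }
      pct = free * 100 / total
      if (pct < c + 0)      { printf "CRITICAL - %.0f%% free\n", pct; exit 2 }
      else if (pct < w + 0) { printf "WARNING - %.0f%% free\n", pct; exit 1 }
      printf "OK - %.0f%% free\n", pct; exit 0
    }'
}

# Example: swap_state 20 10 < /proc/meminfo
```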

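On the monitored host, add a definition for the check_swap command
vi /usr/local/nagios/etc/nrpe.cfg
Add this line:
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
check_swap can now be used as the -c argument of check_nrpe.
After changing a configuration file you would normally restart the service, but:
if NRPE runs as a standalone daemon, you must restart it manually;
if it runs under xinetd or inetd, no restart is needed, because a fresh nrpe process, which rereads nrpe.cfg, is started for each connection.
Since ours runs under xinetd, no restart is needed.

On the monitoring host, add this check: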
define service{
        host_name               dbpi
        service_description     check-swap
        check_command           check_nrpe!check_swap
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          sagroup
        }
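1. How to restart Nagios
So far I have restarted Nagios by killing the process, but that is not necessary. If the init script was installed along with Nagios (make install-init), you can run /etc/init.d/nagios restart; the other accepted arguments are stop, start and status.
If this reports an error, the path set in the script may be wrong. To fix it:
vi /etc/init.d/nagios
Change the prefix= line (for example a wrong value such as prefix=/usr/local/nagiosaa) to the actual installation directory, /usr/local/nagios.
Note: the Nagios installation says the script is installed to /etc/rc.d/init.d; on this system that is the same directory as /etc/init.d.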

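2. Running NRPE without xinetd
Following the NRPE installation documentation, NRPE ends up running under xinetd. I prefer to run it as a standalone daemon, like Nagios, because it is easier to control.
Method:
Edit /etc/services and comment out nrpe: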
# Local services
#nrpe           5666/tcp                        # nrpe
Edit nrpe.cfg and add the monitoring host's address:
# NOTE: This option is ignored if NRPE is running under either inetd or xinetd
allowed_hosts=127.0.0.1,192.168.0.111
Note that the two addresses are separated by a comma here, whereas xinetd's only_from uses a space.
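Before starting the standalone daemon, you can confirm that the monitoring host's address really is in allowed_hosts. A minimal sketch, assuming a single comma-separated list of plain IP addresses (it does not handle host names or network ranges); nrpe_allows is a hypothetical name, and if the option appears more than once it only looks at the last occurrence.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if IP $2 appears in the allowed_hosts=
# list of the nrpe.cfg file $1 (exact string match only).
nrpe_allows() {
  local cfg=$1 ip=$2 hosts h
  local -a entries
  hosts=$(sed -n 's/^allowed_hosts=//p' "$cfg" | tail -n 1)
  IFS=',' read -r -a entries <<< "$hosts"   # split the comma-separated list
  for h in "${entries[@]}"; do
    h=${h// /}                               # ignore stray spaces
    [ "$h" = "$ip" ] && return 0
  done
  return 1
}

# Example: nrpe_allows /usr/local/nagios/etc/nrpe.cfg 192.168.0.111 && echo allowed
```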
Start nrpe as a standalone daemon:
[root@dbpi etc]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
Check:
[root@dbpi etc]# ps -ef|grep nrpe
nagios   22125     1 0 14:04          00:00:00 [nrpe]
[root@dbpi nagios]# netstat -an|grep 5666
tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN
This shows it has started normally.
To start nrpe at boot, add the following line to /etc/rc.d/rc.local:
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
Likewise, to run Nagios at boot, add this line to /etc/rc.d/rc.local:
/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

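3. Usage and meaning of check_load
This plugin checks the current system (CPU) load. Usage:
check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15
On Unix, the load averages are usually reported over 1, 5 and 15 minutes and give the average number of processes running or waiting to run.
For example, check_load -w 15,10,5 -c 30,25,20 means:
WARNING if the 1-minute load exceeds 15, the 5-minute load exceeds 10, or the 15-minute load exceeds 5 (any one is enough)
CRITICAL if the 1-minute load exceeds 30, the 5-minute load exceeds 25, or the 15-minute load exceeds 20 (any one is enough)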
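The check_load rule can be sketched the same way: any one of the three averages exceeding its critical threshold gives CRITICAL, otherwise any one exceeding its warning threshold gives WARNING. This is an illustration of the rule, not check_load's source; load_state is a hypothetical name, and it takes the three load averages as one comma-separated argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify 1/5/15-minute load averages ($3, "l1,l5,l15")
# against warning ($1) and critical ($2) lists in check_load's -w/-c format.
load_state() {
  local warn=$1 crit=$2 loads=$3
  awk -v w="$warn" -v c="$crit" -v l="$loads" 'BEGIN {
    split(w, W, ","); split(c, C, ","); split(l, L, ",")
    state = 0
    for (i = 1; i <= 3; i++) {
      if (L[i] + 0 > C[i] + 0) state = 2                      # any critical wins
      else if (L[i] + 0 > W[i] + 0 && state < 1) state = 1    # never downgrade
    }
    names[0] = "OK"; names[1] = "WARNING"; names[2] = "CRITICAL"
    print names[state]
    exit state
  }'
}

# Example using the current load averages:
# read -r l1 l5 l15 _ < /proc/loadavg
# load_state 15,10,5 30,25,20 "$l1,$l5,$l15"
```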

wget http://oss.oetiker.ch/rrdtool/pub/rrdtool-1.3.7.tar.gz
Install PHP:
yum install php-gd php

/etc/init.d/httpd restart    # restart the Apache service

wget http://downloads.sourceforge.net/pnp4nagios/pnp-0.4.3.tar.gz

PNP is a small open-source package, based on PHP and Perl, that uses RRDtool to turn the data collected by Nagios into graphs.

Install the RRDtool package
Before installing RRDtool, make sure these libraries are installed: zlib libpng freetype libart_lgpl cairo-devel pango-devel

tar zxvf rrdtools
./configure --prefix=/usr/local/rrdtool
make && make install
#########################################################

RRDs Perl Modules: *** NOT FOUND ***

Environment: the monitoring host, while installing the PNP plugin

Solution:
cp -r /usr/local/rrdtool/lib/perl/5.8.8/i386-linux-thread-multi/* /usr/lib/perl5/5.8.8/i386-linux-thread-multi/

##########################################################
Install the PNP package
tar zxvf pnp
./configure --with-rrdtool=/usr/local/rrdtool/bin/rrdtool --with-perfdata-dir=/usr/local/nagios/share/perfdata
make all
make install

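2. Configure Nagios
Edit the main configuration file nagios.cfg
Both host and service definitions have a directive named process_perf_data, whose value can be 0 or 1; it controls whether Nagios outputs performance data for that object. If you set it to 1, Nagios writes the collected data to a file for later processing.
vi /usr/local/nagios/etc/nagios.cfg
process_performance_data=1
Uncomment the following line:
service_perfdata_command=process-service-perfdata
To graph a particular monitored object, add the following to the corresponding host or service definition:
process_perf_data 1
For example: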
define host{
                use                            generic-host
                host_name                  web-1.72
                alias                           web-server
                address                       192.168.1.72
                check_command            check-host-alive
                max_check_attempts     10
                check_period                24x7
                notification_interval       20
                notification_period         24x7
                notification_options        d,r
                contact_groups             admins
                process_perf_data         1
                }

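Nagios will then call the corresponding command to output the data. Nagios's command definitions include a default command, "process-service-perfdata", which determines which values Nagios writes to the output file. Its default definition is fairly simple, however; PNP provides a Perl script that defines a more detailed way of outputting the data. To use PNP, replace the command line of the "process-service-perfdata" command definition with this script: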

define command{
        command_name process-service-perfdata
        command_line /usr/local/nagios/libexec/process_perfdata.pl
}
The process_perfdata.pl file is generated automatically in /usr/local/nagios/libexec when the PNP package is installed.
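Nagios plugins report performance data after a | character in their output, as space-separated items of the form label=value[unit];warn;crit;min;max; this is the part that process_perfdata.pl picks up and PNP graphs. The sketch below splits such a line to show the structure. perfdata_values is a hypothetical name, the sample plugin output in the example is made up for illustration, and quoted labels containing spaces are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "label value" for each performance-data item in
# a plugin output line of the form "TEXT|label=value;warn;crit;min;max ...".
perfdata_values() {
  local output=$1 perf item label rest
  local -a items
  [[ $output == *'|'* ]] || return 1        # no performance data present
  perf=${output#*'|'}                        # everything after the first |
  read -r -a items <<< "$perf"               # items are space-separated
  for item in "${items[@]}"; do
    label=${item%%=*}
    rest=${item#*=}
    printf '%s %s\n' "$label" "${rest%%;*}"  # value (with unit), drop thresholds
  done
}

# Example with made-up output:
# perfdata_values 'DISK OK - free space: / 3326 MB|/=2643MB;5948;5958;0;5968'
```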
3. Check that the Nagios configuration is correct
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Reload the Nagios configuration and restart the service:
/etc/init.d/nagios reload
/etc/init.d/nagios restart
4. Test
Open http://localhost/nagios/pnp in a browser such as IE.

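Nagios parameter reference

define host{          #defines a monitored host
host_name             #a short name identifying the host; it is used to refer to the host in host groups and services. A host can have multiple services. Where applicable, the $HOSTNAME$ macro contains this value.
alias                 #a longer name or description of the host, mainly to make it easier to identify. Where applicable, the $HOSTALIAS$ macro contains this value.
address               #the host's address, normally its IP address. You can also use an FQDN, but that causes problems when no DNS server is reachable. Where applicable, the $HOSTADDRESS$ macro contains this value.
max_check_attempts    #how many times Nagios retries the check command when it returns a non-OK result. Setting it to 1 makes Nagios alert without retrying at all.
check_period          #the name of a time period during which active checks of this host are enabled. Time periods are defined elsewhere and referenced here by name.
contact_groups        #a list of contact groups, referenced by name; separate multiple groups with commas.
notification_interval #how long to wait before re-sending a notification to the contact groups while the host is still down or unreachable.
notification_period   #the name of a time period during which notifications may be sent to the contact groups.
notification_options  #when to send notifications: d = host goes DOWN, u = host becomes UNREACHABLE, r = host recovers, f = host starts or stops flapping. If you specify n, no notifications are ever sent.
}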

define hostgroup{       #defines a group of monitored hosts
hostgroup_name          #host group name, usually short
alias                   #host group alias, usually longer
members                 #host group members
}

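define service{       #defines a monitored service
host_name             #host name
service_description   #service description
check_command         #command run for the check
max_check_attempts    #maximum number of check attempts; with a value of 1 Nagios alerts without re-checking
normal_check_interval #interval between regularly scheduled checks, in time units (minutes with the default interval_length of 60 seconds); used while the service is OK or in a HARD non-OK state
retry_check_interval  #interval between re-checks while the service is in a SOFT non-OK state, i.e. before max_check_attempts is reached; also in time units
check_period          #time period during which checks run
notification_interval #minimum interval (in time units, minutes by default) between repeated notifications while the service stays in a problem state; 0 means notify only once
notification_period   #time period during which contacts may be notified
notification_options  #when to notify contacts: w warning, u unknown, c critical, r recovery, f flapping starts or stops, n no notifications
contact_groups        #contact groups
}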
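As a worked example of how these service parameters combine: after a failure, Nagios first notices it at the next regular check, then retries (max_check_attempts - 1) more times at retry_check_interval before the state becomes HARD and notifications are sent. The sketch computes that rough worst case; it assumes checks run exactly on schedule and ignores check execution time, and worst_case_alert_delay is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough worst-case delay, in seconds, between a service
# failing and Nagios reaching a HARD state (when the first notification goes out).
#   $1 normal_check_interval  $2 retry_check_interval
#   $3 max_check_attempts     $4 interval_length in seconds (default 60)
worst_case_alert_delay() {
  local normal=$1 retry=$2 attempts=$3 unit=${4:-60}
  # up to one normal interval before the failure is first seen (attempt 1),
  # then (attempts - 1) retries spaced retry_check_interval apart
  echo $(( (normal + (attempts - 1) * retry) * unit ))
}
```

With the values used in the service definitions earlier in this article (normal_check_interval 3, retry_check_interval 2, max_check_attempts 5), worst_case_alert_delay 3 2 5 prints 660, i.e. about 11 minutes.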

define servicegroup{    #defines a group of monitored services
servicegroup_name       #service group name, usually short
alias                   #service group alias, usually longer
members                 #service group members
}

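define contact{               #defines a contact
contact_name                  #a short name for the contact, referenced when defining contact groups. Where applicable, the $CONTACTNAME$ macro contains this value.
alias                         #a longer description of the contact. Where applicable, the $CONTACTALIAS$ macro contains this value.
host_notification_period      #the time period during which this contact can be notified about host problems or recoveries; think of it as the contact's on-call hours for hosts.
service_notification_period   #the time period during which this contact can be notified about service problems or recoveries.
host_notification_options     #host states for which this contact is notified: d = host is DOWN, u = host is UNREACHABLE, r = host recovers to UP, f = host starts or stops flapping, n = no notifications.
service_notification_options  #service states for which this contact is notified: w = WARNING, u = UNKNOWN, c = CRITICAL, r = recovery to OK, f = service starts or stops flapping, n = no notifications.
host_notification_commands    #a comma-separated list of commands used to notify the contact about host problems or recoveries.
service_notification_commands #a comma-separated list of commands used to notify the contact about service problems or recoveries.
email                         #the contact's email address. How it is used depends on your notification commands; it can be used to send alert emails to the contact. Where applicable, the $CONTACTEMAIL$ macro contains this value.
pager                         #the contact's pager number or pager email gateway address. Where applicable, the $CONTACTPAGER$ macro contains this value.
}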

define contactgroup{    #defines a contact group
contactgroup_name       #contact group name, usually short
alias                   #contact group alias, usually longer
members                 #contact group members
}

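define command{
command_name            #the short name of the command
command_line            #the action Nagios performs when the check runs. Before the command is executed, all valid macros are replaced with their values.
}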