Our company recently needed to set up a Nagios monitoring system. The configuration and installation steps are recorded below.


Operating system: Ubuntu 12.04.3 LTS

Kernel: 3.2.0-29-generic

Architecture: x86_64


First install the environment Nagios needs: Apache + PHP + MySQL

apt-get install apache2 libapache2-mod-php5 php5 mysql-server mysql-client libssl0.9.8 libssl-dev openssl gcc make libmysqlclient-dev

1. Server-side installation

tar zxvf nagios-3.0.2.tar.gz
cd nagios-3.0.2
./configure --prefix=/usr/local/nagios
make all
make install
make install-init
make install-config
make install-commandmode

chkconfig --add nagios
chkconfig --level 35 nagios on
chkconfig --list nagios

tar zxvf nagios-plugins-1.4.16.tar.gz
cd nagios-plugins-1.4.16
./configure --prefix=/usr/local/nagios
make && make install

About this build error:
Validate.xs:208:5: error: duplicate case value
Validate.xs:205:5: error: previously used here
The problem lies in Params::Validate 0.88. Testing showed that 0.90 builds cleanly, so replace the Params::Validate bundled with nagios-plugins with 0.90:
Download http://cpan.metacpan.org/authors/id/D/DR/DROLSKY/Params-Validate-0.90.tar.gz
Copy it into nagios-plugins-1.4.16/perlmods and delete the old 0.88 version.



Edit httpd.conf and add the following:

ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin" 
<Directory "/usr/local/nagios/sbin"> 
 AuthType Basic 
 Options ExecCGI 
 AllowOverride None 
 Order allow,deny 
 Allow from all 
 AuthName "Nagios Access" 
 # File used to authenticate access to this directory
 AuthUserFile /usr/local/nagios/etc/htpasswd
 Require valid-user 
</Directory> 
Alias /nagios "/usr/local/nagios/share" 
<Directory "/usr/local/nagios/share"> 
 AuthType Basic 
 Options None 
 AllowOverride None 
 Order allow,deny 
 Allow from all 
 AuthName "nagios Access" 
 AuthUserFile /usr/local/nagios/etc/htpasswd 
 Require valid-user 
</Directory> 

Add a web login account and password for Nagios (the user name is up to you):
/usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd david

vi /usr/local/nagios/etc/cgi.cfg
Change use_authentication=1 to use_authentication=0. Otherwise the Nagios web interface reports:
It appears as though you do not have permission to view information for any of the services you requested...

# www is the user authorized for HTTP access
authorized_for_system_commands=nagiosadmin,www
authorized_for_all_services=nagiosadmin,www
authorized_for_all_hosts=nagiosadmin,www
authorized_for_all_service_commands=nagiosadmin,www
authorized_for_all_host_commands=nagiosadmin,www

tar -zxvf nrpe-2.8.1.tar.gz
cd nrpe-2.8.1
./configure
make all
make install
make install-plugin
make install-daemon
make install-daemon-config

chown -R nagios:nagios /usr/local/nagios/

Edit commands.cfg
and add the following:
define command{
 command_name check_nrpe
 command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 30
 }
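
Before relying on this definition, it can help to see the command line Nagios ends up running once the macros are filled in. The sketch below is only an illustration: the function name expand_check_nrpe is made up for this post, and it assumes $USER1$ points at /usr/local/nagios/libexec, which matches the prefix used here but should be checked against resource.cfg.

```shell
#!/bin/sh
# Hypothetical helper: print the command line that the check_nrpe
# definition expands to for a given host address and NRPE command name.
# Assumption: $USER1$ resolves to /usr/local/nagios/libexec.
USER1=/usr/local/nagios/libexec

expand_check_nrpe() {
    host_address=$1   # stands in for $HOSTADDRESS$
    arg1=$2           # stands in for $ARG1$
    printf '%s/check_nrpe -H %s -c %s -t 30\n' "$USER1" "$host_address" "$arg1"
}

# Example values: a monitored host and the check_load NRPE command
expand_check_nrpe 192.168.1.101 check_load
```

Running the printed command by hand as the nagios user is a quick way to separate NRPE connectivity problems from Nagios configuration problems.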


That completes the software installation on the server side.
Start the Nagios service:
service nagios start
Nagios prints the following warning on startup:
Starting nagios:No directory, logging in with HOME=/
done.
Fix: edit /etc/passwd and change
nagios:x:1001:1001::/home/nagios:/bin/sh
to:
nagios:x:1001:1001::/usr/local/nagios:/bin/sh
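
To confirm which home directory is recorded for the nagios user before and after the change, a small lookup like the one below may be handy. It is a sketch only: passwd_home is a made-up name, and the function simply reads the sixth colon-separated field of a passwd-format file.

```shell
#!/bin/sh
# Hypothetical helper: print the home directory (6th field) recorded for
# user $1 in the passwd-format file $2 (defaults to /etc/passwd).
passwd_home() {
    awk -F: -v user="$1" '$1 == user { print $6 }' "${2:-/etc/passwd}"
}
```

For example, [ -d "$(passwd_home nagios)" ] || echo "home directory missing" flags the situation that triggers the warning. As an alternative to editing the file by hand, usermod -d /usr/local/nagios nagios changes the same field.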


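Now set up the server-side configuration files.

First edit the NRPE configuration file, nrpe.cfg:
# Separate multiple IP addresses with commas. This lists the IPs that NRPE allows to fetch data (add the server's IP address).
allowed_hosts=127.0.0.1,192.168.1.11,192.168.1.8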

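Otherwise the following error appears:
CHECK_NRPE: Error - Could not complete SSL handshake
This error mainly occurs when the server has several IP addresses. Adding all of the server's IPs to the allowed_hosts parameter in nrpe.cfg resolves it.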
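
When the handshake error shows up, the first thing to check is whether the address the server connects from is really in the list. The following is a rough sketch, not part of NRPE: ip_allowed is a hypothetical helper that compares each comma-separated entry exactly, and it does not understand network ranges.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the IP in $1 is listed in the
# allowed_hosts line of the nrpe.cfg-style file in $2.
ip_allowed() {
    ip=$1
    cfg=$2
    # Take the last uncommented allowed_hosts line, then strip the key,
    # any trailing comment, and all whitespace.
    list=$(grep '^allowed_hosts=' "$cfg" | tail -n 1 |
        sed 's/^allowed_hosts=//; s/#.*//; s/[[:space:]]//g')
    old_ifs=$IFS
    IFS=,
    for entry in $list; do
        if [ "$entry" = "$ip" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

Typical use would be ip_allowed 192.168.1.11 /usr/local/nagios/etc/nrpe.cfg && echo allowed, repeated for every address the Nagios server owns.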

Start the NRPE service:
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

In nagios.cfg you can list individual configuration files or whole directories, for example:
# You can specify individual object config files as shown below:
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/service.cfg
cfg_dir=/usr/local/nagios/etc/objects/service
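I simply added a service directory so that adding configuration files later is easier.
In the service directory, add the configuration file 192.168.1.101.cfg. Remember that the file name must end in .cfg, otherwise Nagios will not recognize it.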

define host{
 use linux-server
 host_name 192.168.1.101
 address 192.168.1.101
 } 


define service{
 use generic-service
 host_name 192.168.1.101
 service_description current load
 check_command check_nrpe!check_load
 }

define service{
 use generic-service
 host_name 192.168.1.101
 service_description current disk
 check_command check_nrpe!check_hda1
 }

define service{
 use generic-service
 host_name 192.168.1.101
 service_description current procs
 check_command check_nrpe!check_total_procs
 }

define service{
 use generic-service
 host_name 192.168.1.101
 service_description current users
 check_command check_nrpe!check_users
 }

define service{
 use generic-service
 host_name 192.168.1.101
 service_description current zombie procs
 check_command check_nrpe!check_zombie_procs
 }

define service{
 use generic-service
 host_name 192.168.1.101
 service_description nginx
 check_command check_http!-H bj.hao123.com.cn -u /1.html 
 }


define service{
 use generic-service
 host_name 192.168.1.101
 service_description tomcat
 check_command check_http!-H bj.hao123.com.cn -u /test_jk.jsp
 }




define service{
 use generic-service
 host_name 192.168.1.101
 service_description ssh
 check_command check_ssh
}


define service{
 use generic-service
 host_name 192.168.1.101
 service_description ping
 check_command check_ping!100.0,20%!200.0,50%
}


define service{
 use generic-service
 host_name 192.168.1.101
 service_description current swap
 check_command check_nrpe!check_swap
 }

define service{
 use generic-service
 host_name 192.168.1.101
 service_description mysql
 check_command check_mysql!192.168.1.101!3306!nagios!nagios123
 }

The configuration file above monitors a handful of services. To monitor more, just add them to the same file.
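
Since every monitored host gets its own file with near-identical blocks, generating the file can save typing. The script below is a sketch under assumptions: gen_host_cfg is a made-up name, 192.168.1.103 is a hypothetical new host, and only NRPE-backed services are covered (checks such as check_http or check_mysql would still be added by hand).

```shell
#!/bin/sh
# Hypothetical generator: print a host definition plus one NRPE-backed
# service per "description:nrpe_command" pair, in the layout used above.
gen_host_cfg() {
    host=$1
    shift
    printf 'define host{\n use linux-server\n host_name %s\n address %s\n }\n' \
        "$host" "$host"
    for pair in "$@"; do
        desc=${pair%%:*}   # text before the first colon
        cmd=${pair#*:}     # text after the first colon
        printf '\ndefine service{\n use generic-service\n host_name %s\n service_description %s\n check_command check_nrpe!%s\n }\n' \
            "$host" "$desc" "$cmd"
    done
}

# Example: print to stdout; redirect into the service directory when happy.
gen_host_cfg 192.168.1.103 "current load:check_load" "current users:check_users"
```

After adding a file, /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg verifies the whole configuration before Nagios is restarted.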

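A few errors came up while installing Nagios on the server. They are summarized here in case you run into them.
If libmysqlclient-dev is not installed on Ubuntu, the plugin build produces no check_mysql command in nagios/libexec. Install the package and rebuild nagios-plugins, then define the command:
vi commands.cfg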
define command{
 command_name check_mysql
 command_line $USER1$/check_mysql -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$
 }


2. Installing the Nagios client on the monitored host

tar zxvf nagios-plugins-1.4.16.tar.gz
cd nagios-plugins-1.4.16/perlmods
rm -rf Params-Validate-0.8*
wget http://cpan.metacpan.org/authors/id/D/DR/DROLSKY/Params-Validate-0.90.tar.gz
cd ..
./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios --enable-perl-modules
make && make install

cd ..

tar -zxvf nrpe-2.8.1.tar.gz
cd nrpe-2.8.1
./configure
make all
make install
make install-plugin
make install-daemon
make install-daemon-config 

chown -R nagios:nagios /usr/local/nagios/

sed -i 's#allowed_hosts=127.0.0.1#allowed_hosts=192.168.1.102,127.0.0.1#g' /usr/local/nagios/etc/nrpe.cfg 

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

To monitor MySQL on the client, create a monitoring account in the client's MySQL.
Create the monitoring database and account:
create database nagios;
grant select on nagios.* to nagios@'%' identified by 'nagios123';
flush privileges;
select User,Host from mysql.user;

If MySQL monitoring fails with:

ERROR 1045 (28000): Access denied for user 'nagios'@'localhost' (using password: YES)

Fix:

Edit my.cnf and change

bind-address = 127.0.0.1

to

bind-address = 0.0.0.0

Restart MySQL afterwards so the change takes effect.
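
To see which address MySQL is currently configured to listen on, the value can be pulled out of my.cnf. This is a sketch only: mysql_bind_address is a hypothetical helper, the path /etc/mysql/my.cnf in the usage note is an assumption about where the file lives, and included files are not followed.

```shell
#!/bin/sh
# Hypothetical helper: print the last bind-address value set in the
# my.cnf-style file given as $1 (commented-out lines are ignored).
mysql_bind_address() {
    sed -n 's/^[[:space:]]*bind-address[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" |
        tail -n 1
}
```

For example, mysql_bind_address /etc/mysql/my.cnf printing 127.0.0.1 means remote checks from the Nagios server cannot reach MySQL at all.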


3. Nagios email alert configuration

On the Nagios server, install sendEmail:

wget http://caspian.dotconf.net/menu/Software/SendEmail/sendEmail-v1.56.tar.gz
tar -zxvf sendEmail-v1.56.tar.gz && cd sendEmail-v1.56
cp sendEmail /usr/local/bin
chmod 0755 /usr/local/bin/sendEmail


/usr/local/bin/sendEmail -f wangba123@163.com -t wangba@126.com -s smtp.163.com -u "send by 123123" -xu wangba123 -xp wangba234 -m "222222222222222222222222222222211111111111111111111111111111"

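The options mean:

-f the sender's address (can be set freely)

-t the recipient's address

-s the SMTP server's domain name or IP

-u the mail subject

-m the mail body

-xu the SMTP authentication user name; add it if the SMTP server requires authentication, which is generally the case when sending to external addresses (here, the 163 mailbox account)

-xp the SMTP authentication password; add it under the same conditions as -xu (here, the 163 mailbox password)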


Once the test succeeds, edit commands.cfg:

define command{
 command_name notify-host-by-email
 command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/local/bin/sendEmail -f wangba123@163.com -t $CONTACTEMAIL$ -u "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" -s smtp.163.com -xu wangba123 -xp wangba234
 }

# 'notify-service-by-email' command definition
define command{
 command_name notify-service-by-email
 command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$" | /usr/local/bin/sendEmail -f wangba123@163.com -t $CONTACTEMAIL$ -u "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" -s smtp.163.com -xu wangba123 -xp wangba234
 }
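
Nagios substitutes the $...$ macros before handing the line to the shell, so the mail body can be previewed by feeding the same printf format sample values. The sketch below does that for the host alert. render_host_alert is a made-up name and every value in the example call is invented.

```shell
#!/bin/sh
# Hypothetical preview: build the host-alert body the same way the
# notify-host-by-email command does, with the macros passed as arguments.
# $1=notification type  $2=host name  $3=host state
# $4=host address       $5=host output  $6=date/time
render_host_alert() {
    printf "%b" "***** Nagios *****\n\nNotification Type: $1\nHost: $2\nState: $3\nAddress: $4\nInfo: $5\n\nDate/Time: $6\n"
}

# Example with invented sample values
render_host_alert PROBLEM web01 DOWN 192.168.1.101 \
    "CRITICAL - Host Unreachable" "Tue Oct 29 10:00:00 CST 2013"
```

Piping this output into the sendEmail test command shown earlier (without -m, so the body is read from standard input) gives a preview of what a real alert looks like in the mailbox.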


That completes the email alert setup.


4. Adding PNP graphing support to Nagios


First install rrdtool:

apt-get install rrdtool librrds-perl php5-gd


tar zxvf pnp-0.4.12.tar.gz

cd pnp-0.4.12
./configure --with-nagios-user=nagios --with-nagios-group=nagios --with-rrdtool=/usr/bin/rrdtool --with-perfdata=/usr/local/nagios/share/perfdata
make 
make all 
make install 
make install-config 
make install-init



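To generate graph data, the command must also be redefined in commands.cfg. Comment out the original process-service-perfdata command first, otherwise the two definitions conflict: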

define command{ 
command_name process-service-perfdata 
command_line /usr/local/nagios/libexec/process_perfdata.pl 
}


Edit nagios.cfg and change the following:

#process_performance_data=0
process_performance_data=1
service_perfdata_command=process-service-perfdata
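
Because the old value is left in place as a comment, it is easy to misread which setting is active. A quick lookup such as the one below can confirm it. This is a sketch: cfg_value is a hypothetical helper that prints the last uncommented key=value entry and knows nothing else about the Nagios configuration format.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last uncommented
# "key=value" line for key $1 in the file $2.
cfg_value() {
    key=$1
    file=$2
    grep "^${key}=" "$file" | tail -n 1 | cut -d= -f2-
}
```

With the settings above, cfg_value process_performance_data /usr/local/nagios/etc/nagios.cfg should print 1.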

While installing rrdtool you may see the following error:

RRDs... ***FAIL***

Installing these two packages resolves it:

sudo apt-get install rrdtool

sudo apt-get install librrds-perl



If you see GD... ***FAIL***

install the following packages:

apt-get install libgd-gd2-perl

apt-get install php5-gd



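Fixing a Status Map that does not display in Nagios

The /usr/local/nagios/sbin directory contains no statusmap.cgi file, which is why the Status Map cannot be displayed.

statusmap.cgi is missing because there was no gd-devel package. The gd package was installed on my server, but the file was still not built.

Recompile the source package directly:

./configure --prefix=/usr/local/nagios

Run make all, but do not run make install afterwards. Instead, look in the cgi directory of the source tree for statusmap.cgi.

The file is there. Copy it into /usr/local/nagios/sbin; this way the existing configuration is not overwritten. Problem solved. Restart Nagios.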
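
The copy step can be made a little safer by refusing to overwrite a file that is already installed. The function below is a sketch under that assumption. install_if_missing is a made-up name, and the source path in the usage note assumes the nagios-3.0.2 tree unpacked earlier.

```shell
#!/bin/sh
# Hypothetical helper: copy the file $1 into the directory $2 only when
# no file of the same name exists there yet.
install_if_missing() {
    src=$1
    dst_dir=$2
    name=$(basename "$src")
    if [ -e "$dst_dir/$name" ]; then
        echo "skipped: $dst_dir/$name already exists"
        return 0
    fi
    # -p keeps the mode bits so the CGI stays executable
    cp -p "$src" "$dst_dir/" && echo "installed: $dst_dir/$name"
}
```

For example: install_if_missing nagios-3.0.2/cgi/statusmap.cgi /usr/local/nagios/sbin. The copied file should end up with the same owner and permissions as the other CGIs in that directory.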





Reposted from: https://blog.51cto.com/lpy123/1316121