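I. How Nagios Works
Nagios is a free, open-source network monitoring tool. It can monitor the state of Windows, Linux, and Unix hosts, network devices such as switches and routers, printers, and more. Nagios itself contains none of this checking logic: every check is performed by a plugin. Nagios periodically invokes the plugins to check server state and maintains a queue. The status information returned by the plugins enters the queue, and Nagios always reads from the head of the queue, processes the result, and displays the status in the web interface. When a system or service enters an abnormal state, it sends an alert by email or SMS so that operations staff are notified immediately. In short, Nagios is a monitoring system that watches system state and network information for specified local or remote hosts and services, and sends notifications when something goes wrong.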
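What Nagios can monitor:
1) Network services: SMTP, POP3, HTTP, ICMP, FTP, SSH, etc.;
2) Host resources: CPU load, disk usage, system logs;
3) Scripts run on remote hosts through the Nagios Remote Plugin Executor (NRPE);
4) Almost anything else, such as temperature or alarms, by pointing Nagios at a plugin you write yourself that collects data over the network.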
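Nagios components:
Nagios-plugins is the official plugin package. Nagios monitors hosts by executing these plugin programs. They are installed by default in /usr/local/nagios/libexec, which holds the bundled plugins: check_disk checks disk space, check_load checks CPU load, and so on.
NRPE is a Nagios add-on. The check_nrpe plugin lives on the monitoring host and the nrpe daemon runs on the remote host. NRPE can check in two ways:
1) Direct check: nrpe runs on the remote host; the monitoring host sends the check request to nrpe, and nrpe calls a plugin to perform the check;
2) Indirect check: when the monitoring host running Nagios cannot reach a monitored host but a machine running NRPE can, nrpe acts as a proxy and forwards the check request to the monitored host (the monitored host and the monitoring host are usually on the same network segment).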
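The usual sequence when Nagios checks a service or resource on a remote host:
1) Nagios runs the check_nrpe plugin and tells it what to check;
2) check_nrpe connects to the remote nrpe daemon over SSL;
3) the nrpe daemon runs the corresponding Nagios plugin to check the local service or state;
4) the nrpe daemon returns the result to check_nrpe on the monitoring host, and the plugin places the status information in the Nagios status queue;
5) finally, Nagios reads the entries in the queue in order and displays the results.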
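Steps 1) and 2) come down to a single check_nrpe call from the monitoring host. The following is a minimal sketch, not part of the original setup: the wrapper name `remote_check` and the `CHECK_NRPE` variable are hypothetical, and the default path assumes check_nrpe was installed into the plugin directory used in this article.

```shell
#!/bin/bash
# Path to check_nrpe. The default is an assumption based on the plugin
# directory used in this article; override it through CHECK_NRPE if needed.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Hypothetical wrapper: ask the nrpe daemon on HOST to run the command
# that its nrpe.cfg knows as COMMAND.
#   -H  address of the remote host running the nrpe daemon
#   -c  name of the command defined on the remote side
remote_check() {
    local host="$1" command="$2"
    "$CHECK_NRPE" -H "$host" -c "$command"
}
```

With this in place, `remote_check 172.25.16.3 check_load` would ask the daemon on the monitored host to run whatever its nrpe.cfg defines as `check_load`; that command name is only an example and has to exist on the remote side.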
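Nagios recognizes four return states: 0 (green) means OK, 1 (yellow) means WARNING, 2 (red) means CRITICAL, and 3 (dark yellow) means UNKNOWN. Nagios determines the state of the monitored object from the value the plugin returns and shows it in the web interface so that administrators can spot problems promptly.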
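This return-code convention is all a custom plugin has to implement. A minimal sketch in shell, assuming a hypothetical function name `check_threshold` and an integer reading passed in as an argument (neither comes from the original setup):

```shell
#!/bin/bash
# Hypothetical helper: compare an integer reading with warning and critical
# thresholds and report the result the way a Nagios plugin would.
# Usage: check_threshold VALUE WARN CRIT
check_threshold() {
    local value="$1" warn="$2" crit="$3"

    # Anything that is not a non-negative integer is reported as UNKNOWN (3).
    case "$value" in
        ''|*[!0-9]*) echo "UNKNOWN - invalid reading '$value'"; return 3 ;;
    esac

    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"; return 1
    fi
    echo "OK - value is $value"; return 0
}
```

A standalone plugin would call the function with its own arguments and exit with the returned code, for example `check_threshold "$1" "$2" "$3"; exit $?`.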
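II. Configuring Nagios to Monitor Services on the Local Host
Lab environment: 172.25.16.2 monitoring host
172.25.16.3 monitored host
iptables and SELinux disabled
1. Install the dependency packages on the Nagios server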
[root@server2 ~]# yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel
[root@server2 ~]# rpm -qa gcc glibc glibc-common gd gd-devel xinetd openssl-devel
gcc-4.4.7-4.el6.x86_64
xinetd-2.3.14-39.el6_4.x86_64
glibc-common-2.12-1.132.el6.x86_64
glibc-2.12-1.132.el6.x86_64
openssl-devel-1.0.1e-15.el6.x86_64
gd-2.0.35-11.el6.x86_64
gd-devel-2.0.35-11.el6.x86_64
[root@server2 nagios-cn-3.2.3]# useradd -d /usr/local/nagios/ -M nagios
2. Install Nagios
[root@server2 ~]# tar jxf nagios-cn-3.2.3.tar.bz2
[root@server2 ~]# cd nagios-cn-3.2.3
[root@server2 nagios-cn-3.2.3]# yum install -y perl-ExtUtils-Embed
[root@server2 nagios-cn-3.2.3]# ./configure --prefix=/usr/local/nagios --enable-embedded-perl # build with the embedded Perl interpreter
General Options:
-------------------------
Nagios executable: nagios
Nagios user/group: nagios,nagios
Command user/group: nagios,nagios
Embedded Perl: yes, with caching
Event Broker: yes
Install ${prefix}: /usr/local/nagios
Lock file: ${prefix}/var/nagios.lock
Check result directory: ${prefix}/var/spool/checkresults
Init directory: /etc/rc.d/init.d
Apache conf.d directory: /etc/httpd/conf.d
Mail program: /bin/mail
Host OS: linux-gnu
Web Interface Options:
------------------------
HTML URL: http://localhost/nagios/
CGI URL: http://localhost/nagios/cgi-bin/
Traceroute (used by WAP):
3. Install the nagios-plugins package
[root@server2 ~]# tar zxf nagios-plugins-2.0.3.tar.gz
[root@server2 ~]# cd nagios-plugins-2.0.3
[root@server2 nagios-plugins-2.0.3]# rpm -q mysql-devel
package mysql-devel is not installed
[root@server2 nagios-plugins-2.0.3]# ./configure --enable-perl-modules --enable-libtap
[root@server2 nagios-plugins-2.0.3]# make && make install
[root@server2 nagios-plugins-2.0.3]# cd /etc/httpd/conf.d/
[root@server2 conf.d]# ls
nagios.conf README welcome.conf
[root@server2 conf.d]# vim nagios.conf
4. Create the nagios system user
[root@server2 nagios-cn-3.2.3]# useradd -d /usr/local/nagios/ -M nagios
[root@server2 nagios]# chown -R nagios.nagios * # give the nagios user ownership so it can write to the nagios directory
[root@server2 nagios]# id apache
uid=48(apache) gid=48(apache) groups=48(apache)
[root@server2 nagios]# usermod -G nagios apache # add apache to the nagios group so it can write to the nagios directory
[root@server2 nagios]# id apache
uid=48(apache) gid=48(apache) groups=48(apache),1001(nagios)
5. Create the user and password for the Nagios web interface
[root@server2 conf.d]# cat /usr/local/nagios/etc/htpasswd.users
nagiosadmin:gCWSDnqEHR45c
[root@server2 conf.d]# htpasswd -m /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Updating password for user nagiosadmin
6. Create hosts.cfg and services.cfg, then edit the main configuration file nagios.cfg
[root@server2 objects]# cp -p localhost.cfg hosts.cfg
[root@server2 objects]# cp -p localhost.cfg services.cfg
[root@server2 ~]# vim /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg
cfg_file=/usr/local/nagios/etc/objects/services.cfg
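Because nagios.cfg now references files that were just created by hand, it can help to confirm that every cfg_file entry points to an existing file before running the full verification. A minimal sketch, assuming a hypothetical helper name `missing_cfg_files` and one `cfg_file=/path` entry per line:

```shell
#!/bin/bash
# Hypothetical helper: list every cfg_file entry in a nagios.cfg that points
# to a file that does not exist.
# Returns 0 when all referenced files exist, 1 otherwise.
missing_cfg_files() {
    local main_cfg="$1" paths path status=0

    # Collect the paths from lines of the form "cfg_file=/some/path".
    paths=$(sed -n 's/^cfg_file=//p' "$main_cfg")

    while IFS= read -r path; do
        [ -n "$path" ] || continue      # skip the empty line of an empty list
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done <<EOF
$paths
EOF
    return "$status"
}
```

For the layout in this article the call would be `missing_cfg_files /usr/local/nagios/etc/nagios.cfg`.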
7. Edit hosts.cfg
[root@server2 ~]# vim /usr/local/nagios/etc/objects/hosts.cfg
define host{ # definition of the local host
use linux-server
host_name server2.example.com
alias Manager
address 172.25.16.2
icon_image switch.gif
statusmap_image switch.gd2
2d_coords 500,200
3d_coords 500,200,100
}
# Define an optional hostgroup for Linux machines
define hostgroup{ # definition of the local host group
hostgroup_name linux-servers ; #The name of the hostgroup
alias Linux Servers ; #Long name of the group
members * ; # Comma separated list of hosts that belong to this group
}
8. Edit services.cfg
[root@server2 ~]# vim /usr/local/nagios/etc/objects/services.cfg
define servicegroup{ # define the service group
servicegroup_name 系统负荷检查
alias 负荷检查
members server2.example.com,进程总数,server2.example.com,登录用户数,server2.example.com,根分区,server2.example.com,交换空间利用率
}
define service{ # define the PING service
use local-service ; Name of service template to use
host_name *
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{ # define the root partition check
use local-service ; Name of service template to use
host_name server2.example.com
service_description 根分区
check_command check_local_disk!20%!10%!/
}
define service{
use local-service ; Name of service template to use
host_name server2.example.com
service_description 登录用户数
check_command check_local_users!20!50
}
define service{
use local-service ; Name of service template to use
host_name server2.example.com
service_description 进程总数
check_command check_local_procs!250!400!RSZDT
}
[root@server2 objects]# ll templates.cfg timeperiods.cfg contacts.cfg commands.cfg
-rw-rw-r--. 1 nagios nagios 7790 Aug 26 21:28 commands.cfg
-rw-rw-r--. 1 nagios nagios 2166 Aug 28 18:39 contacts.cfg
-rw-rw-r--. 1 nagios nagios 10887 Aug 26 21:28 templates.cfg
-rw-rw-r--. 1 nagios nagios 3209 Aug 26 21:28 timeperiods.cfg
9. Verify the Nagios configuration and restart
[root@server2 ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Total Warnings: 0
Total Errors: 0
[root@server2 objects]# /etc/init.d/nagios reload
Running configuration check...done.
Reloading nagios configuration...done.
[root@server2 nagios]# /etc/init.d/nagios restart
# From a client, log in to the server at http://172.25.16.2/nagios to view the status of the monitored hosts and services
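The verify-then-reload steps can be combined so that a broken configuration never reaches the running daemon. A minimal sketch: the function name `verify_and_reload` and the three variables are hypothetical, their defaults are the paths used in this article, and the check relies on `nagios -v` exiting with a non-zero status when verification fails.

```shell
#!/bin/bash
# Defaults follow the paths used in this article; override them if needed.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios}"

# Hypothetical helper: reload Nagios only when the pre-flight check passes.
verify_and_reload() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        "$NAGIOS_INIT" reload
    else
        echo "configuration check failed, not reloading" >&2
        return 1
    fi
}
```

Calling `verify_and_reload` after each edit to hosts.cfg or services.cfg keeps the two steps from drifting apart.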
III. Monitoring the MySQL Service on a MySQL Host
1. Install MySQL on the monitored host
[root@server3 ~]# yum install -y mysql-server
[root@server3 ~]# /etc/init.d/mysqld start
[root@server3 ~]# mysql_secure_installation
2. Create the monitoring account nagios
[root@server3 ~]# mysql -uroot -pwestos
mysql> create database nagios;
mysql> grant all on nagios.* to nagios@'172.25.16.2' identified by 'westos';
Query OK, 0 rows affected (0.01 sec)
mysql> flush privileges;
3. Check that the Nagios host can connect to the MySQL service on the MySQL host
[root@server2 libexec]# mysql -h 172.25.16.3 -unagios -pwestos nagios
[root@server2 libexec]# ./check_mysql -H 172.25.16.3 -n
MySQL OK - Version: 5.1.71 (protocol 10)
[root@server2 libexec]# ./check_mysql -H 172.25.16.3 -u nagios -p westos -d nagios
Uptime: 454 Threads: 1 Questions: 34 Slow queries: 0 Opens: 16 Flush tables: 1 Open tables: 9 Queries per second avg: 0.74|Connections=14c;;; Open_files=18;;; Open_tables=9;;; Qcache_free_memory=0;;; Qcache_hits=0c;;; Qcache_inserts=0c;;; Qcache_lowmem_prunes=0c;;; Qcache_not_cached=0c;;; Qcache_queries_in_cache=0;;; Queries=34c;;; Questions=34c;;; Table_locks_waited=0c;;; Threads_connected=1;;; Threads_running=1;;; Uptime=454c;;;
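Everything after the `|` in this output is performance data in the form `label=value;warn;crit;min;max`. A minimal sketch of pulling one value out of such a line, assuming a hypothetical helper name `perfdata_value` and labels without spaces, as in the check_mysql output above:

```shell
#!/bin/bash
# Hypothetical helper: print the value of one performance-data label.
# Usage: perfdata_value "PLUGIN OUTPUT" LABEL
# Returns 1 when the output has no performance data or the label is absent.
perfdata_value() {
    local output="$1" label="$2" perf item

    case "$output" in
        *'|'*) perf="${output#*|}" ;;   # keep only the part after the first "|"
        *) return 1 ;;                  # no performance data at all
    esac

    # Items are separated by spaces; labels here contain no spaces or globs.
    for item in $perf; do
        case "$item" in
            "$label"=*)
                item="${item#*=}"       # drop "label="
                echo "${item%%;*}"      # drop the thresholds after the first ";"
                return 0 ;;
        esac
    done
    return 1
}
```

The unit suffix is kept as part of the value, so a counter such as `Connections=14c;;;` is printed as `14c`.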
4. Edit the configuration files and add the definitions for monitoring the MySQL service
[root@server2 objects]# vim hosts.cfg # add the following definition
define host{
use linux-server
host_name server3.example.com
alias Mysql服务器
parents server2.example.com
address 172.25.16.3
icon_image switch.gif
statusmap_image switch.gd2
2d_coords 400,300
3d_coords 400,300,100
}
[root@server2 objects]# vim services.cfg
define service{
use local-service ; Name of service template to use
host_name server3.example.com
service_description Mysql
check_command check_mysql!nagios!westos!nagios
notifications_enabled 0
}
[root@server2 objects]# vim commands.cfg
# 'check_mysql' command definition
define command{
command_name check_mysql
command_line $USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -d $ARG3$
}
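The service's check_command and the command's command_line are tied together by macros: the fields separated by `!` become `$ARG1$`, `$ARG2$`, and so on. The following bash sketch only illustrates that substitution. It is not how Nagios is implemented; the function name `expand_command` is hypothetical and only `$USER1$`, `$HOSTADDRESS$`, and `$ARGn$` are handled.

```shell
#!/bin/bash
# Illustration only: expand a command_line template the way the macros in
# this article are substituted.
# Usage: expand_command TEMPLATE CHECK_COMMAND HOSTADDRESS PLUGIN_DIR
expand_command() {
    local line="$1" check="$2" address="$3" plugin_dir="$4"
    local -a parts
    local i

    # "check_mysql!nagios!westos!nagios" -> (check_mysql nagios westos nagios)
    IFS='!' read -r -a parts <<< "$check"

    line="${line//\$USER1\$/$plugin_dir}"
    line="${line//\$HOSTADDRESS\$/$address}"

    # parts[0] is the command name; parts[1..] become $ARG1$, $ARG2$, ...
    for ((i = 1; i < ${#parts[@]}; i++)); do
        line="${line//\$ARG$i\$/${parts[i]}}"
    done
    echo "$line"
}
```

Called with a template of the form `$USER1$/check_mysql -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -d $ARG3$`, the check_command `check_mysql!nagios!westos!nagios`, the address 172.25.16.3, and the plugin directory /usr/local/nagios/libexec, it prints the same check_mysql command line that was run by hand in step 3.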
5. Verify the syntax and restart
[root@server2 objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@server2 objects]# /etc/init.d/nagios restart
# From a client, log in to the server and check the monitored services