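What is Nagios?
Nagios is an application for monitoring systems and networks. It watches the hosts and services you specify and sends alerts both when their state gets worse and when it recovers.
Nagios was originally designed to run on Linux, but it also runs on other Unix-like systems.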
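Further features of Nagios include:
1. Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.);
2. Monitoring of host resources (processor load, disk usage, etc.);
3. A simple plugin design that lets users easily write their own service checks;
4. Parallelized service checks;
5. The ability to define a network hierarchy with "parent" host definitions that express the relationships between hosts, so that Nagios can tell hosts that are down apart from hosts that are unreachable;
6. Notifications to contacts when service or host problems occur and when they are resolved (by email, SMS, or a user-defined method);
7. The ability to define event handlers that run when a host or service event occurs, to help locate the problem;
8. Automatic log file rotation;
9. Support for implementing redundant monitoring hosts;
10. An optional web interface for viewing the current network status, notification and problem history, log files, etc.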
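Feature 3 works because a plugin is just a program that prints one line of status text and reports its result through the exit status: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN. The following is a minimal sketch of that convention. The function name check_value and its three arguments are made up for this illustration and are not part of Nagios or nagios-plugins.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nagios or nagios-plugins.
# Compares an integer VALUE with WARN and CRIT thresholds and reports the
# result the way a plugin does: one line of text plus an exit status
# (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
check_value() {
    value=${1-} warn=${2-} crit=${3-}
    # Anything that is missing or not a non-negative integer is UNKNOWN.
    for arg in "$value" "$warn" "$crit"; do
        case "$arg" in
            ''|*[!0-9]*)
                echo "UNKNOWN - usage: check_value VALUE WARN CRIT"
                return 3 ;;
        esac
    done
    if [ "$value" -gt "$crit" ]; then
        echo "CRITICAL - value is $value"
        return 2
    elif [ "$value" -gt "$warn" ]; then
        echo "WARNING - value is $value"
        return 1
    fi
    echo "OK - value is $value"
    return 0
}
```

Calling check_value 5 2 10 prints "WARNING - value is 5" and returns 1. A real plugin would be a standalone script that ends with exit instead of return.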
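System requirements
Nagios needs a machine that runs Linux (or a Unix variant) and has a C compiler. The TCP/IP stack must be configured correctly, because most service checks are performed over the network.
Using the CGIs included with Nagios is optional. If you do want to use them, you must install the following software:
1. A web server (preferably Apache)
2. Thomas Boutell's gd library, version 1.6.3 or higher (required by the statusmap and trends CGIs)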
Installing Nagios
1. Preparation
a. Install the required packages
]# yum -y install httpd gcc glibc glibc-common gd gd-devel
b. Create the user and groups
[root@localhost ~]# useradd nagios
[root@localhost ~]# groupadd nagcmd
[root@localhost ~]# usermod -a -G nagcmd nagios
[root@localhost ~]# usermod -a -G nagcmd apache
2. Install Nagios itself
]# tar zxvf nagios-3.2.0.tar.gz
]# cd nagios-3.2.0
]# ./configure --with-command-group=nagcmd
]# make all
The following message appears; follow its instructions:
*** Compile finished ***
If the main program and CGIs compiled without any errors, you
can continue with installing Nagios as follows (type 'make'
without any arguments for a list of all possible options):
make install
- This installs the main program, CGIs, and HTML files
make install-init
- This installs the init script in /etc/rc.d/init.d
make install-commandmode
- This installs and configures permissions on the
directory for holding the external command file
make install-config
- This installs *SAMPLE* config files in /usr/local/nagios/etc
You'll have to modify these sample files before you can
use Nagios. Read the HTML documentation for more info
on doing this. Pay particular attention to the docs on
object configuration files, as they determine what/how
things get monitored!
make install-webconf
- This installs the Apache config file for the Nagios
web interface
*** Support Notes *******************************************
If you have questions about configuring or running Nagios,
please make sure that you:
- Look at the sample config files
- Read the HTML documentation
- Read the FAQs online at http://www.nagios.org/faqs
before you post a question to one of the mailing lists.
Also make sure to include pertinent information that could
help others help you. This might include:
- What version of Nagios you are using
- What version of the plugins you are using
- Relevant snippets from your config files
- Relevant error messages from the Nagios log file
For more information on obtaining support for Nagios, visit:
http://www.nagios.org/support/
*************************************************************
Enjoy.
make install ------- /usr/local/nagios/share/ (pages of the monitoring web site)
make install-init ----- /etc/init.d/nagios
make install-commandmode
make install-config ------ /usr/local/nagios/etc/ (Nagios configuration files)
make install-webconf ------ /etc/httpd/conf.d/nagios.conf
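After running the make targets it can help to confirm that the locations listed above really exist. This is a small sketch of such a check; the function name missing_paths is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every path from the argument list that does
# not exist. Returns 0 when nothing is missing, 1 otherwise.
missing_paths() {
    status=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}

# Example call with the locations from the list above
# (run it after the install steps):
# missing_paths /usr/local/nagios/share /etc/init.d/nagios \
#     /usr/local/nagios/etc /etc/httpd/conf.d/nagios.conf
```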
Read this file to see the URL aliases and the access control settings:
/etc/httpd/conf.d/nagios.conf
Set up the users and passwords used for authentication:
[root@localhost nagios-3.2.0]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
[root@localhost nagios-3.2.0]# htpasswd /usr/local/nagios/etc/htpasswd.users user1
New password:
Re-type new password:
Adding password for user user1
[root@localhost nagios-3.2.0]# cat /usr/local/nagios/etc/htpasswd.users
nagiosadmin:FuD2.sNj9En4c
user1:hDlnlLQBCPmCA
]# vim /etc/httpd/conf/httpd.conf
DirectoryIndex index.php index.html index.html.var
]# service httpd restart
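The monitoring web site is written in PHP and CGI, so Apache needs both PHP and CGI support.
CGI support comes from the cgi module, which ships with Apache itself and must be included when Apache is installed.
PHP support comes from the php package, which provides the PHP module (a third-party module) for Apache.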
]# rpm -qf /etc/httpd/modules/libphp5.so
php-5.1.6-27.el5
]# rpm -qf /etc/httpd/modules/mod_cgi.so
httpd-2.2.3-43.el5
]# yum -y install php
]# service httpd restart
Stopping httpd: [ OK ]
Starting httpd: [ OK ]
http://192.168.1.254/nagios/
Configuring Nagios
1. Install the monitoring plugins
]# tar zxvf nagios-plugins-1.4.13.tar.gz
]# cd nagios-plugins-1.4.13
]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
]# make && make install
The plugin files are installed in this directory:
]# ls /usr/local/nagios/libexec/
check_apt check_imap check_pop
check_breeze check_ircd check_procs
check_by_ssh check_ldap check_real
check_clamd check_ldaps check_rpc
check_cluster check_load check_sensors
check_dhcp check_log check_smtp
check_dig check_mailq check_ssh
check_disk check_mrtg check_swap
check_disk_smb check_mrtgtraf check_tcp
check_dns check_nagios check_time
check_dummy check_nntp check_udp
check_file_age check_nt check_ups
check_flexlm check_ntp check_users
check_ftp check_ntp_peer check_wave
check_http check_ntp_time negate
check_icmp check_nwstat urlize
check_ide_smart check_oracle utils.pm
check_ifoperstatus check_overcr utils.sh
check_ifstatus check_ping
]#
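Use the checks these plugins provide to monitor:
private resources of a host: CPU, disk
public services of nearby hosts: HTTP, FTP, Samba, ...
1. How to monitor the local host
2. How to monitor other hosts
Monitoring the local host
private resources of the host: CPU, disk
public services of the host: HTTP, FTP, Samba, ...
1. How to monitor the local host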
cd /usr/local/nagios/etc/objects
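commands.cfg
defines command names and their command lines
---- the command names are then referenced from files such as localhost.cfg
check_command command_name ---- a name defined here
contacts.cfg
defines contact names and their email addresses
---- the contact names are then referenced from other configuration files
contacts contact_name
templates.cfg
defines templates
------- the template names are then referenced from files such as localhost.cfg
use template_name
timeperiods.cfg
defines time periods such as workhours / 24x7
check_period / notification_period period_name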
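Because every other file refers to these objects by name, it is handy to list the names a file defines. The sketch below prints the value of one naming directive (for example command_name or contact_name) from an object file. The function name list_names is made up for this illustration, and the parsing is deliberately simple: it assumes one directive per line and treats everything after a ";" as a comment, which is fine for names but not for command_line values.

```shell
#!/bin/sh
# Hypothetical helper: print the value of DIRECTIVE on every line of FILE
# that sets it. Comment lines are skipped and trailing "; ..." comments
# are removed.
list_names() {
    directive=$1 file=$2
    awk -v key="$directive" '
        /^[ \t]*#/ { next }                   # skip comment lines
        $1 == key {
            sub(/[ \t]*;.*$/, "")             # drop an inline comment
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # drop the directive itself
            sub(/[ \t]+$/, "")                # drop trailing blanks
            print
        }
    ' "$file"
}

# Example (path taken from the notes above):
# list_names command_name /usr/local/nagios/etc/objects/commands.cfg
```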
The local host is monitored through one file, which exists by default:
/usr/local/nagios/etc/objects/localhost.cfg
localhost.cfg
[root@localhost objects]# cat /usr/local/nagios/etc/objects/localhost.cfg
# HOST DEFINITION
# Define a host for the local machine
define host{
use linux-server ; Name of host template to use
host_name localhost
alias localhost
address 127.0.0.1
}
The name after use (here linux-server) must already be defined in one of the other .cfg files.
# HOST GROUP DEFINITION
define hostgroup{
hostgroup_name linux-servers ; The name of the hostgroup
alias Linux Servers ; Long name of the group
members localhost ; Comma separated list of hosts that belong to this group
}
# SERVICE DEFINITIONS
# Define a service to "ping" the local machine
define service{
use local-service ; Name of service template to use
host_name localhost
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
The command must be defined in commands.cfg.
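In a check_command value such as check_ping!100.0,20%!500.0,60%, the part before the first "!" is the command name and the following "!"-separated parts become $ARG1$, $ARG2$ and so on in the command_line of that command. The sketch below only mimics the visible effect of that substitution for the first two arguments. The function name expand_args is made up, and the check_ping template in the usage note is an assumption about what the sample commands.cfg contains, so compare it with your own file.

```shell
#!/bin/sh
# Illustration only, not how Nagios itself is implemented.
# Substitutes $ARG1$ and $ARG2$ in a command_line TEMPLATE with the
# "!"-separated arguments of a CHECK_COMMAND value.
expand_args() {
    check_command=$1 template=$2
    arg1="" arg2=""
    case "$check_command" in
        *!*)
            rest=${check_command#*!}      # everything after the command name
            arg1=${rest%%!*}              # text up to the next "!"
            case "$rest" in
                *!*) arg2=${rest#*!}; arg2=${arg2%%!*} ;;
            esac ;;
    esac
    awk -v t="$template" -v a1="$arg1" -v a2="$arg2" '
        # Literal (non-regex) replacement of every occurrence of "from".
        function replace(s, from, to,    i, out) {
            out = ""
            while ((i = index(s, from)) > 0) {
                out = out substr(s, 1, i - 1) to
                s = substr(s, i + length(from))
            }
            return out s
        }
        BEGIN { print replace(replace(t, "$ARG1$", a1), "$ARG2$", a2) }
    '
}
```

For example, expand_args 'check_ping!100.0,20%!500.0,60%' '$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5' prints the template with -w 100.0,20% and -c 500.0,60% filled in. $USER1$ and $HOSTADDRESS$ are left alone because Nagios resolves them from resource.cfg and the host definition.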
# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Current Users
check_command check_local_users!20!50
}
# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 users.
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Total Running Processes
check_command check_local_procs!3!5!R
}
# Define a service to check the load on the local machine.
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
# Define a service to check the swap usage the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Swap Usage
check_command check_local_swap!20!10
}
# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.
define service{
use local-service ; Name of service template to use
host_name localhost
service_description SSH
check_command check_ssh
notifications_enabled 0
}
# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.
# Step 1: the check_http and check_dns commands must be defined in /usr/local/nagios/etc/objects/commands.cfg.
# The services below pass the target host as $ARG1$ and the plugin options as $ARG2$, so define them as follows:
#define command{
# command_name check_http
# command_line $USER1$/check_http -I $ARG1$ $ARG2$
# }
#define command{
# command_name check_dns
# command_line $USER1$/check_dns -H $ARG1$ $ARG2$
#}
define service{
use local-service ; Name of service template to use
host_name localhost
service_description HTTP
check_command check_http!localhost!-u /test.html -t 3 -s "TEST"
# Make sure the command succeeds when you run it by hand on the command line
#[root@localhost libexec]# ./check_http -I localhost -u /test.html -t 3 -s "TEST"
#HTTP OK HTTP/1.1 200 OK - 0.001 second response time |time=0.000792s;;;0.000000 size=266B;;;0
notifications_enabled 0
}
define service{
use local-service ; Name of service template to use
host_name localhost
service_description DNS
check_command check_dns!localhost!-s localhost -w 2 -c 10
notifications_enabled 0
}
# Make sure the command succeeds when you run it by hand on the command line
#[root@localhost libexec]# ./check_dns -H localhost -s localhost -w 1 -c 3
#DNS WARNING: 1.006 second response time. localhost returns 127.0.0.1|time=1.006259s;;;0.000000
The check_dns and check_http commands are the ones you defined yourself in commands.cfg.
The plugins $USER1$/check_dns and $USER1$/check_http must exist.
$USER1$ is a macro defined in resource.cfg.
]# grep USER1 /usr/local/nagios/etc/resource.cfg
$USER1$=/usr/local/nagios/libexec
[root@localhost nagios]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@localhost nagios]# /etc/init.d/nagios start
Starting nagios: done.
[root@localhost nagios]#
http://192.168.1.254/nagios/index.php
Note 1:
The DNS service must be running and able to resolve the name localhost, otherwise the following command fails:
[root@localhost libexec]# ./check_dns -H localhost -s localhost -w 1 -c 3
#DNS WARNING: 1.006 second response time. localhost returns 127.0.0.1|time=1.006259s;;;0.000000
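The output line above has two parts separated by "|": the human-readable status text and the performance data, whose items have the form label=value followed by optional warn, crit, min and max fields separated by ";". The following sketch splits such a line. The function names are made up for this illustration, and the code assumes perfdata labels without spaces or quotes.

```shell
#!/bin/sh
# Hypothetical helpers for one line of plugin output.

# Text before the first "|".
plugin_text() { printf '%s\n' "${1%%|*}"; }

# Performance data after the first "|" (empty line if there is none).
plugin_perfdata() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*|}" ;;
        *)     printf '\n' ;;
    esac
}

# Value recorded for one perfdata LABEL in LINE, without the trailing
# ";warn;crit;min;max" fields. Returns 1 if the label is not present.
perf_value() {
    label=$1 line=$2
    for item in $(plugin_perfdata "$line"); do
        case "$item" in
            "$label"=*)
                item=${item#*=}
                printf '%s\n' "${item%%;*}"
                return 0 ;;
        esac
    done
    return 1
}
```

With the DNS line shown above as input, perf_value time returns 1.006259s.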
Note 2:
The HTTPD service
cd /var/www/html
touch test.html
echo TEST >> test.html
vim /etc/httpd/conf/httpd.conf
DirectoryIndex test.html index.php index.html index.html.var
service httpd restart
Note 3:
SSH is also checked, so sshd must be running.
Monitoring other hosts
1. Make sure nagios.cfg loads the configuration file for the new host:
]# grep 101 /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/192.168.1.101.cfg
2. Create that file by copying localhost.cfg:
]# cd /usr/local/nagios/etc/objects/
[root@localhost objects]# cp -a localhost.cfg 192.168.1.101.cfg
3. Edit 192.168.1.101.cfg to define what is monitored
libexec]# ./check_dns -H www.baidu.com -s 192.168.1.101 -w 1 -c 3
192.168.1.101 really does run a DNS server and can resolve www.baidu.com.
Monitoring the public services of another host: 192.168.1.101.cfg
[root@localhost objects]# grep '^[^#]' 192.168.1.101.cfg
define host{
use linux-server ; Name of host template to use
host_name baidu
alias 101 host
address 192.168.1.101
}
define hostgroup{
hostgroup_name linux-servers1 ; The name of the hostgroup
alias Linux Servers ; Long name of the group
members baidu ; Comma separated list of hosts that belong to this group
}
define service{
use local-service ; Name of service template to use
host_name baidu
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use local-service ; Name of service template to use
host_name baidu
service_description SSH
check_command check_ssh
notifications_enabled 0
}
define service{
use local-service ; Name of service template to use
host_name baidu
service_description HTTP
check_command check_http!192.168.1.101!-u /test.html -t 3 -s "TEST"
notifications_enabled 0
}
define service{
use local-service ; Name of service template to use
host_name baidu
service_description DNS
check_command check_dns!www.baidu.com!-s dns.up1.com -w 5 -c 10
notifications_enabled 0
}
About use xxx
It refers to a name (a template) that has already been defined.
[root@localhost objects]# pwd
/usr/local/nagios/etc/objects
[root@localhost objects]# grep 'name.*local-service' *
templates.cfg: name local-service ; The name of this service template
[root@localhost objects]# vim templates.cfg
[root@localhost objects]# grep 'name.*generic-service' *
templates.cfg: name generic-service ; The 'name' of this service template
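The two grep calls above follow the inheritance chain by hand: a service uses local-service, and that template may in turn use another one. The sketch below looks up which template a given template inherits from. The function name template_parent is made up for this illustration, and the parsing assumes the layout of the sample files (one directive per line, names without spaces).

```shell
#!/bin/sh
# Hypothetical helper: print the template that template NAME inherits
# from (its "use" value) according to FILE. Prints nothing if NAME is not
# found or has no "use" directive.
template_parent() {
    awk -v want="$1" '
        /^[ \t]*#/      { next }                     # skip comment lines
        $1 == "define"  { name = ""; parent = "" }   # a new block starts
        $1 == "name"    { name = $2 }
        $1 == "use"     { parent = $2 }
        substr($1, 1, 1) == "}" {                    # the block ends
            if (name == want && parent != "") print parent
            name = ""; parent = ""
        }
    ' "$2"
}

# Example (file name taken from the grep output above):
# template_parent local-service /usr/local/nagios/etc/objects/templates.cfg
```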
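Using the NRPE plugin to monitor private resources of other hosts
[192.168.1.254] ------------ 192.168.1.101
monitor host remote host
check_nrpe NRPE daemon
192.168.1.101
Configure the NRPE program on the remote host so that it collects the private information itself and hands it to the monitoring host.
1. Create the nagios user
2. Install the nagios-plugins package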
]# tar zxvf nagios-plugins-1.4.13.tar.gz
]# cd nagios-plugins-1.4.13
]# ./configure --with-nagios-user=nagios --with-nagios-group=nagios
]# make
]# make install
This installs the following plugin files:
/usr/local/nagios/libexec/check_*
3. Install xinetd, because the nrpe service runs under xinetd
4. Install nrpe
]# tar zxvf nrpe-2.12.tar.gz
]# cd nrpe-2.12
]# evince docs/NRPE.pdf
]# ./configure
]# make all (compile)
]# make install-plugin (installs /usr/local/nagios/libexec/check_nrpe)
]# make install-daemon (installs /usr/local/nagios/bin/nrpe)
]# make install-daemon-config (installs the nrpe configuration file /usr/local/nagios/etc/nrpe.cfg)
]# make install-xinetd (installs /etc/xinetd.d/nrpe)
]# vim /etc/xinetd.d/nrpe
only_from = 127.0.0.1 192.168.1.254
]# vim /etc/services
nrpe 5666/tcp # NRPE
5. Configure NRPE: define which checks of this host it offers
]# vim /usr/local/nagios/etc/nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 2 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk_boot]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda1
command[check_disk_root]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda2
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_run_procs]=/usr/local/nagios/libexec/check_procs -w 3 -c 10 -s R
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 50% -c 30%
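The name inside the brackets of each command[...] line is what the monitoring host later passes to check_nrpe with -c. A quick way to see which names a file offers is sketched below; the function name nrpe_command_names is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every active command[...] entry
# in an nrpe.cfg-style file. Commented-out entries are ignored because
# the pattern is anchored at the start of the line.
nrpe_command_names() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)][[:space:]]*=.*/\1/p' "$1"
}

# Example (path taken from the notes above):
# nrpe_command_names /usr/local/nagios/etc/nrpe.cfg
```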
6. Start the service
[root@15 ~]# /etc/init.d/xinetd restart
[root@15 ~]# netstat -tnlp | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 20956/xinetd
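netstat only shows that something is listening locally. To check from another machine that the port is reachable without installing anything, bash can open a TCP connection by itself. This is a minimal sketch: the function name port_open is made up, and it relies on the /dev/tcp feature of bash, so it does not work in plain sh.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if a TCP connection to HOST PORT can be
# opened, non-zero otherwise. Uses the bash-only /dev/tcp pseudo-device;
# the connection is closed again when the subshell exits.
port_open() {
    local host=$1 port=$2
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}

# Example, using the address and port from these notes:
# port_open 192.168.1.15 5666 && echo "NRPE port reachable"
```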
7. Test
On the remote host itself, check that port 5666 on 127.0.0.1 accepts connections:
]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
NRPE v2.12
On the remote host itself, check that the commands defined in nrpe.cfg work:
]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_users
USERS WARNING - 5 users currently logged in |users=5;2;10;0
]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_load
OK - load average: 0.13, 0.04, 0.17|load1=0.130;15.000;30.000;0; load5=0.040;10.000;25.000;0; load15=0.170;5.000;20.000;0;
]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_disk_boot
DISK OK - free space: /boot 82 MB (88% inode=99%);| /boot=11MB;78;88;0;98
]# /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_disk_root
DISK OK - free space: / 24436 MB (53% inode=97%);| /=21495MB;38744;43587;0;48431
++++++++++++++++++++++++++
The monitoring server
192.168.1.254
]# yum -y install openssl-devel
1. Install the NRPE plugin ---- /usr/local/nagios/libexec/check_nrpe
]# tar zxvf nrpe-2.12.tar.gz
]# cd nrpe-2.12
]# evince docs/NRPE.pdf
]# ./configure
]# make all (compile)
]# make install-plugin (installs /usr/local/nagios/libexec/check_nrpe)
]# ls /usr/local/nagios/libexec/check_nrpe
Test: try to connect to NRPE on port 5666 and see whether it answers
]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.15
NRPE v2.12
]# /usr/local/nagios/libexec/check_nrpe -H 192.168.1.15 -c check_users
USERS WARNING - 5 users currently logged in |users=5;2;10;0
2. Register the configuration file for the monitored host
]# grep 15 /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/192.168.1.15.cfg
3. Define the services to monitor
]# cp -a localhost.cfg 192.168.1.15.cfg
]# vim /usr/local/nagios/etc/objects/commands.cfg
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
Edit the configuration file for the host: vim /usr/local/nagios/etc/objects/192.168.1.15.cfg
(its contents are shown in the next section)
[root@localhost nrpe-2.12]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@localhost nrpe-2.12]# service nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.
192.168.1.15.cfg
define host{
use linux-server ; Name of host template to use
host_name 1.15
alias 15
address 192.168.1.15
}
define hostgroup{
hostgroup_name linux-servers2 ; The name of the hostgroup
alias Linux Servers ; Long name of the group
members 1.15 ; Comma separated list of hosts that belong to this group
}
# SERVICE DEFINITIONS
# Define a service to "ping" the local machine
define service{
use local-service ; Name of service template to use
host_name 1.15
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
# Define a service to check the disk space of the root partition
# on the local machine. Warning if < 20% free, critical if
# < 10% free space on partition.
define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Root Partition
check_command check_nrpe!check_disk_root
}
define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Boot Partition
check_command check_nrpe!check_disk_boot
}
# Define a service to check the number of currently logged in
# users on the local machine. Warning if > 20 users, critical
# if > 50 users.
define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Current Users
check_command check_nrpe!check_users
}
# Define a service to check the number of currently running procs
# on the local machine. Warning if > 250 processes, critical if
# > 400 users.
define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Total Processes
check_command check_nrpe!check_total_procs
}
define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Total Running Processes
check_command check_nrpe!check_run_procs
}
define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Total Zombie Processes
check_command check_nrpe!check_zombie_procs
}
# Define a service to check the load on the local machine.
define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Current Load
check_command check_nrpe!check_load
}
# Define a service to check the swap usage the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free
define service{
use local-service ; Name of service template to use
host_name 1.15
service_description Swap Usage
check_command check_nrpe!check_swap
}
# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.
define service{
use local-service ; Name of service template to use
host_name 1.15
service_description SSH
check_command check_ssh
notifications_enabled 0
}
# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.
# The HTTP and DNS services below pass the target host as $ARG1$ and the plugin options as $ARG2$:
#define command{
# command_name check_http
# command_line $USER1$/check_http -I $ARG1$ $ARG2$
# }
#define command{
# command_name check_dns
# command_line $USER1$/check_dns -H $ARG1$ $ARG2$
#}
define service{
use local-service ; Name of service template to use
host_name 1.15
service_description HTTP
check_command check_http!192.168.1.15!-u /test.html -t 3 -s "TEST"
#[root@localhost libexec]# ./check_http -I localhost -u /test.html -t 3 -s "TEST"
#HTTP OK HTTP/1.1 200 OK - 0.001 second response time |time=0.000792s;;;0.000000 size=266B;;;0
notifications_enabled 1
}
define service{
use local-service ; Name of service template to use
host_name 1.15
service_description DNS
check_command check_dns!192.168.1.15!-s 192.168.1.15 -w 2 -c 10
notifications_enabled 1
}
# To monitor DNS, the DNS server must be able to resolve itself in both the forward and the reverse direction
dns.baidu.com IN A 192.168.1.15
15 IN PTR dns.baidu.com
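Every check_nrpe!NAME in the host file above only works if the remote nrpe.cfg has a matching command[NAME] entry. The sketch below compares the two files. It assumes you have a copy of the remote nrpe.cfg on the monitoring server (the files normally live on different machines), and the function name undefined_nrpe_commands is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every NRPE command name that OBJECT_FILE
# requests through "check_nrpe!NAME" but that has no active command[NAME]
# entry in NRPE_CFG. Returns 1 if at least one name is missing.
undefined_nrpe_commands() {
    object_file=$1 nrpe_cfg=$2
    status=0
    for name in $(sed -n \
        's/^[[:space:]]*check_command[[:space:]]*check_nrpe!\([^![:space:]]*\).*/\1/p' \
        "$object_file" | sort -u); do
        # Ignore commented-out lines, then look for the literal entry.
        if ! grep -v '^[[:space:]]*#' "$nrpe_cfg" |
                grep -qF "command[$name]="; then
            echo "not defined in nrpe.cfg: $name"
            status=1
        fi
    done
    return "$status"
}

# Example (the second path is an assumed local copy of the remote file):
# undefined_nrpe_commands /usr/local/nagios/etc/objects/192.168.1.15.cfg ./nrpe.cfg
```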