Setting Up Nagios

Featured repost

weili163, 2013-04-22 09:33:46. Blog category: linux

Tags: nagios, nagios setup. Category: servers

Nagios installation manual

1. Install OS

2. Do not select a full package install; install the system default package set.

3. Assign an IP address.

4. Set up the YUM repository

4.1 [root@localhost ~]# cd /etc/yum.repos.d/

4.2 Set up mount directory

[root@localhost ~]# mkdir /aaa

Mount the installation disc directly; there is no need to inspect it first:

[root@localhost ~]# mount /dev/cdrom /aaa/

Check that the disc is mounted:

[root@localhost ~]# mount

[root@localhost mnt]# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/mapper/VolGroup00-LogVol00

11G 1.9G 8.1G 19% /

/dev/sda1 99M 12M 82M 13% /boot

tmpfs 420M 0 420M 0% /dev/shm

/dev/hdc 2.8G 2.8G 0 100% /mnt

4.3 Edit the yum configuration file

[root@localhost yum.repos.d]# cp -rap rhel-debuginfo.repo rhel-debuginfo.repo.bk

[root@localhost Server]# vim /etc/yum.repos.d/rhel-debuginfo.repo

[rhel-suibianqiming]

name=Sui bian qi ming

baseurl=file:///aaa/Server

enabled=1

gpgcheck=0
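The repository file above can also be written non-interactively. The following is a minimal sketch, assuming bash; the function name write_local_repo and the section name rhel-local-media are hypothetical, chosen only for illustration, and the mount point is passed in rather than hard-coded.

```shell
#!/usr/bin/env bash
# Write a yum repository file that points at installation media
# mounted locally. Both arguments are supplied by the caller.
#   $1 = path of the .repo file to create
#   $2 = mount point of the disc (for example /aaa)
write_local_repo() {
    local repo_file=$1 mount_point=$2
    cat > "$repo_file" <<EOF
[rhel-local-media]
name=RHEL local installation media
baseurl=file://$mount_point/Server
enabled=1
gpgcheck=0
EOF
}
```

For the layout used in this article, it would be called as write_local_repo /etc/yum.repos.d/local-media.repo /aaa, followed by yum clean all.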

4.4 The YUM repository is now configured, so the yum command can install any package on the disc.

4.5 Clear the yum cache

[root@localhost yum.repos.d]# yum clean all

Loaded plugins: rhnplugin, security

Cleaning up Everything

[root@localhost yum.repos.d]#

4.6 [root@localhost yum.repos.d]# yum update -y

4.7 List the installed and the available package groups

[root@localhost Server]# yum grouplist

4.8 Install the package groups listed under Available Groups:

Available Groups:

Authoring and Publishing

DNS Name Server

Development Libraries

Engineering and Scientific

GNOME Desktop Environment

GNOME Software Development

Java Development

KDE (K Desktop Environment)

KDE Software Development

MySQL Database

News Server

OpenFabrics Enterprise Distribution

PostgreSQL Database

Web Server

X Software Development

Done

4.9 Installation example

[root@localhost Server]# yum groupinstall -y "X Software Development"

4.10 Next, install whichever of the available packages you need.

5. Install the LAMP stack from RPM packages

6. [root@localhost yum.repos.d]# yum install *gd*

7. [root@localhost yum.repos.d]# yum install -y *http*

8. [root@localhost yum.repos.d]# yum install -y *mysql*

9. [root@localhost yum.repos.d]# yum install -y *php*

10. [root@localhost yum.repos.d]# yum install *perl* -y

11. [root@localhost yum.repos.d]# yum install -y *zlib*

12. [root@localhost yum.repos.d]# yum install *lib* -y

13. [root@localhost nagios-3.2.0]# yum install *c++*

14. [root@localhost nagios-3.2.0]# yum install *gcc*

15. [root@localhost yum.repos.d]# yum install gcc gcc-c++ wget bison mysql-devel mysql-server php php-mysql php-pear php-pear-DB php-mbstring nano tftp-server httpd make ncurses-devel libtermcap-devel sendmail sendmail-cf caching-nameserver sox newt-devel libxml2-devel libtiff-devel php-gd audiofile-devel gtk2-devel subversion kernel-devel -y

16. [root@localhost yum.repos.d]# yum install e2fsprogs-devel keyutils-libs-devel krb5-devel libogg libselinux-devel libsepol-devel libxml2-devel libtiff-devel gmp php-pear php-pear-DB php-gd php-mysql php-pdo kernel-devel ncurses-devel audiofile-devel libogg-devel openssl-devel mysql-devel zlib-devel perl-DateManip sendmail-cf sox -y

17. [root@localhost yum.repos.d]# yum install *apr* -y

18. [root@localhost yum.repos.d]# yum install freetype*

19. [root@localhost yum.repos.d]# mkdir /var/lib/mysql

20. [root@localhost yum.repos.d]# chown -R mysql.mysql /var/lib/mysql

21. [root@localhost yum.repos.d]# /etc/init.d/httpd start

22. [root@localhost yum.repos.d]# chkconfig --level 35 httpd on

23. [root@localhost yum.repos.d]# mysql_install_db

24. [root@localhost yum.repos.d]# chown -R mysql.mysql /var/lib/mysql

25. [root@localhost yum.repos.d]# /etc/init.d/mysqld start

26. Starting MySQL: [ OK ]

27. [root@localhost yum.repos.d]# mysqladmin -uroot password 123456   # set the MySQL root password to 123456

28. [root@localhost yum.repos.d]# cp -rap /usr/share/doc/mysql-server-5.0.77/my-medium.cnf /etc/my.cnf

29. [root@localhost yum.repos.d]# /etc/init.d/httpd restart

30. Stopping httpd: [ OK ]

31. Starting httpd: [ OK ]

32. [root@localhost yum.repos.d]# vim /var/www/html/index.php

<?php phpinfo(); ?>

33. [root@localhost yum.repos.d]# vim /var/www/html/mysql.php

<?php

$link=mysql_connect("localhost","root","123456");

if(!$link) echo "FAILED!";

else echo "OK!";

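34. Open the page in a browser; it should display OK.

35. The RPM-based LAMP stack is now in place. It is the foundation Nagios runs on; the next step is to install Nagios itself.

36. Install the Nagios server

37. Copy the three files nagios-3.2.0.tar.gz, nagios-plugins-1.4.13.tar.gz and nrpe-2.12.tar.gz to /var/local/src.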

38. [root@localhost nagios-3.2.0]# useradd nagios

39. [root@localhost nagios-3.2.0]# mkdir /usr/local/nagios

40. [root@localhost nagios-3.2.0]# chown nagios.nagios /usr/local/nagios

41. [root@localhost nagios-3.2.0]# ll /usr/local

43. [root@localhost src]# cd nagios-3.2.0/

44. [root@localhost nagios-3.2.0]# ./configure --prefix=/usr/local/nagios

45. [root@localhost nagios-3.2.0]# make all

46. [root@localhost nagios-3.2.0]# make install

47. [root@localhost nagios-3.2.0]# make install-init

48. [root@localhost nagios-3.2.0]# make install-commandmode

49. [root@localhost nagios-3.2.0]# make install-config

50. [root@localhost nagios-3.2.0]# cd ..

51. [root@localhost src]# ls

52. nagios-3.2.0 nagios-plugins-1.4.13 nrpe-2.12

53. [root@localhost src]# cd nagios-plugins-1.4.13/

54. [root@localhost nagios-plugins-1.4.13]# ./configure --prefix=/usr/local/nagios/

55. [root@localhost nagios-plugins-1.4.13]# make

56. [root@localhost nagios-plugins-1.4.13]# make install

57. [root@localhost nagios-plugins-1.4.13]# ls /usr/local/nagios/libexec/

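58. This lists the installed plugin files; all plugins are installed under the libexec directory.

59. Add the user Apache runs as to the nagios group

Extract the current Apache run-as user from httpd.conf:

grep ^User /usr/local/apache2/conf/httpd.conf

Mine is apache; the steps below add this user to the nagios group.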

60. [root@localhost nagios-plugins-1.4.13]# grep ^User /etc/httpd/conf/httpd.conf

User apache

61. [root@localhost nagios-plugins-1.4.13]# usermod -a -G nagios apache

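62. Modify the Apache configuration

Edit the Apache configuration file to add the Nagios directories and to require authentication for access to them.

63. [root@localhost nagios-plugins-1.4.13]# vim /etc/httpd/conf/httpd.conf

Append the following at the end: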

ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin

<Directory "/usr/local/nagios/sbin">

Options ExecCGI

AllowOverride None

Order allow,deny

Allow from all

AuthName "Nagios Access"

AuthType Basic

AuthUserFile /usr/local/nagios/etc/htpasswd

##//authen

Require valid-user

</Directory>

Alias /nagios /usr/local/nagios/share

<Directory "/usr/local/nagios/share">

Options None

AllowOverride None

Order allow,deny

Allow from all

AuthName "Nagios Access"

AuthType Basic

AuthUserFile /usr/local/nagios/etc/htpasswd

##//authen

Require valid-user

</Directory>

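Add an authenticated user

This is the user you must log in as when accessing Nagios through the web. Here we add the user test with the password 123456.

64. [root@localhost nagios-plugins-1.4.13]# htpasswd -c /usr/local/nagios/etc/htpasswd test

Use the which command to locate htpasswd; for a source-built Apache, give the full path to htpasswd here. The user is test. The command prompts for a password, and 123456 is entered here.

65. View the contents of the authentication file

66. [root@localhost nagios-plugins-1.4.13]# less /usr/local/nagios/etc/htpasswd

67. The Nagios installation is now essentially complete, and you can reach it through the web.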

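Next, configure Nagios

On the server

Nagios defines a few basic object types. The commonly used ones are:

Contact	contact	Who is notified when something goes wrong, usually the system administrator
Monitoring time period	timeperiod	Around the clock (24x7), Monday to Friday, or any other custom period
Monitored host	host	A server to be monitored, which can be the monitoring machine itself
Check command	command	The command Nagios runs to perform a given check; these are also user-defined
Monitored service	service	For example whether a host is alive, whether port 80 is open, disk usage, or a custom service

In addition, several monitored hosts can be combined into a host group, several contacts into a contact group, and several services into a service group.

Most of these objects have to be defined by hand, which is what makes a Nagios installation look complicated. Once you understand the principle and have done it once, the rest is copy and paste. Let's get started.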

68. Change to the configuration directory

[root@localhost etc]# ls

cgi.cfg htpasswd nagios.cfg objects resource.cfg

69. Edit the main configuration file

[root@localhost etc]# vim /usr/local/nagios/etc/nagios.cfg

70. Comment out the line cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

71. Add the following lines

cfg_file=/usr/local/nagios/etc/objects/contactgroups.cfg

cfg_file=/usr/local/nagios/etc/objects/hosts.cfg

cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg

cfg_file=/usr/local/nagios/etc/objects/services.cfg

The resulting configuration looks like this

log_file=/usr/local/nagios/var/nagios.log

# OBJECT CONFIGURATION FILE(S)

# These are the object configuration files in which you define hosts,

# host groups, contacts, contact groups, services, etc.

# You can split your object definitions across several config files

# if you wish (as shown below), or keep them all in a single config file.

# You can specify individual object config files as shown below:

cfg_file=/usr/local/nagios/etc/objects/commands.cfg

cfg_file=/usr/local/nagios/etc/objects/contacts.cfg

cfg_file=/usr/local/nagios/etc/objects/contactgroups.cfg

cfg_file=/usr/local/nagios/etc/objects/hosts.cfg

cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg

cfg_file=/usr/local/nagios/etc/objects/services.cfg

cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg

cfg_file=/usr/local/nagios/etc/objects/templates.cfg

# Definitions for monitoring the local (Linux) host

#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

# Definitions for monitoring a Windows machine

#cfg_file=/usr/local/nagios/etc/objects/windows.cfg

# Definitions for monitoring a router/switch

#cfg_file=/usr/local/nagios/etc/objects/switch.cfg

# Definitions for monitoring a network printer

#cfg_file=/usr/local/nagios/etc/objects/printer.cfg

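72. Change check_external_commands=0 to check_external_commands=1. This allows actions such as restarting Nagios or disabling host and service checks to be issued from the web interface.

73. Change command_check_interval from its default of 1 to command_check_interval=10s (choose an interval that suits your setup, neither too long nor too short).

74. Edit the CGI control file cgi.cfg

75. [root@localhost etc]# vim cgi.cfg

76. Add the user test to each of the following options: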

authorized_for_system_information=nagiosadmin,test

authorized_for_configuration_information=nagiosadmin,test

authorized_for_system_commands=nagiosadmin,test

authorized_for_all_services=nagiosadmin,test

authorized_for_all_hosts=nagiosadmin,test

authorized_for_all_service_commands=nagiosadmin,test

authorized_for_all_host_commands=nagiosadmin,test

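77. Where do these user names come from? They are the ones created by running /usr/local/apache/bin/htpasswd -c /usr/local/nagios/etc/htpasswd test. Take care not to list authentication users that do not exist, and for security reasons do not add more users than necessary.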
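Because cgi.cfg must only name users that exist in the htpasswd file, a quick cross-check can catch typos. The following is a minimal sketch, assuming bash; the function name unknown_cgi_users is hypothetical, and it only handles plain comma-separated user lists (not the * wildcard).

```shell
#!/usr/bin/env bash
# Print every user named in an authorized_for_* option of cgi.cfg
# that has no entry in the htpasswd file.
#   $1 = path to cgi.cfg
#   $2 = path to the htpasswd file
unknown_cgi_users() {
    local cgi_file=$1 htpasswd_file=$2 user
    # Keep only the value part of each authorized_for_* line,
    # split it on commas, and de-duplicate the names.
    sed -n 's/^authorized_for_[a-z_]*=//p' "$cgi_file" |
    tr ',' '\n' |
    sort -u |
    while read -r user; do
        [ -n "$user" ] || continue
        # htpasswd entries have the form "user:hash".
        grep -q "^$user:" "$htpasswd_file" || printf '%s\n' "$user"
    done
}
```

With the paths used in this article it would be run as unknown_cgi_users /usr/local/nagios/etc/cgi.cfg /usr/local/nagios/etc/htpasswd; empty output means every listed user exists.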

78. [root@localhost etc]# cd objects/

79. Edit the contacts (mainly check that the email address is correct; several contacts can be defined)

[root@localhost objects]# vim contacts.cfg

define contact{

contact_name test ; contact name, must not contain spaces

alias sys admin

service_notification_period 24x7

host_notification_period 24x7

service_notification_options w,u,c,r

host_notification_options d,u,r

service_notification_commands notify-service-by-email

host_notification_commands notify-host-by-email

email yahoon@test.com

pager 1338757xxxx

address1 xxxxx.xyyy@icq.com

address2 555-555-5555

}

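This creates a contact named test. The important options are explained below.

service_notification_period 24x7

The time period during which service problems are notified. The period itself is defined in timeperiods.cfg.

host_notification_period 24x7

The time period during which host problems are notified, also defined in timeperiods.cfg.

service_notification_options w,u,c,r

Notify the contact in four cases: when a service goes to w (warning), u (unknown) or c (critical), or when it r (recovers) from a problem state.

host_notification_options d,u,r

Notify the contact in three cases: when a host is d (down), u (unreachable), or r (recovered) from a problem state.

service_notification_commands notify-service-by-email

The command used for service notifications is notify-service-by-email. It is defined in commands.cfg and sends mail to the contact. commands.cfg is covered separately later.

host_notification_commands notify-host-by-email

Likewise, host problems are also notified by email.

email yahoon@test.com

The contact's email address.

pager 1338757xxxx

The contact's mobile number, which is useful if SMS notification is supported.

alias is the contact's alias; the address fields matter little.

Copy and adjust the definition above to create more contacts.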

80. Define the monitoring time periods in the configuration file timeperiods.cfg

81. vim /usr/local/nagios/etc/objects/timeperiods.cfg

Check that it contains the following definition:

define timeperiod{

timeperiod_name 24x7

alias 24 Hours A Day,7Days A Week

sunday 00:00-24:00

monday 00:00-24:00

tuesday 00:00-24:00

wednesday 00:00-24:00

thursday 00:00-24:00

friday 00:00-24:00

saturday 00:00-24:00

}

Several contacts can now be combined into a contact group. Create the file contactgroups.cfg

[root@localhost etc]# vi contactgroups.cfg

define contactgroup{

contactgroup_name sagroup

# contact group name; again, no spaces

alias System Administrators ; alias

members test

# group members, taken from contacts.cfg above; separate multiple contacts with commas

}

As follows:

define contactgroup{

contactgroup_name sagroup

alias System Administrators

members test,nagiosadmin

}

Define the monitored hosts. Create the file hosts.cfg

[root@localhost etc]# vi hosts.cfg

define host{

host_name nagios-server

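# name of the monitored host; best without spaces

alias nagios server

# alias

address 192.168.0.111

# IP address of the monitored host; for now this is the local machine's IP

check_command check-host-alive

# the check command check-host-alive comes from commands.cfg and tests whether the host is alive

max_check_attempts 5

# number of retries after a failed check

check_period 24x7

# the check period 24x7, as defined in timeperiods.cfg

contact_groups sagroup

# contact group: sagroup, defined in contactgroups.cfg above

notification_interval 10

# interval between repeat notifications: 10 time units, which is 10 minutes with the default interval_length of 60 seconds

notification_period 24x7

# notification period 24x7, as defined in timeperiods.cfg

notification_options d,u,r

# when to notify; see the explanation in the contacts.cfg section

}

As follows: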

define host{

host_name nagios-server

alias nagios server

address 30.232.120.40

check_command check-host-alive

max_check_attempts 5

check_period 24x7

contact_groups sagroup

notification_interval 10

notification_period 24x7

notification_options d,u,r

}

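Copy and adjust the definition to add more hosts. Here we add two more machines:

a Linux host named dbpi, IP 192.168.0.111

a Windows XP host named yahoon, IP 192.168.0.28

Just as contacts can form a contact group, hosts can form a host group. Create the file hostgroups.cfg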

[root@localhost etc]# vi hostgroups.cfg

define hostgroup{

hostgroup_name sa-servers ; host group name

alias sa Servers ; alias

members nagios-server

# member hosts, separated by commas; each must be defined in hosts.cfg above

}

As follows:

define hostgroup{

hostgroup_name sa-servers

alias sa Servers

members nagios-server

}

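82. Now for the most important part. Nagios is mainly used to monitor all kinds of information about a host, including local resources and the services it exposes. In Nagios each of these is defined as an item (Nagios calls it a service; to distinguish it from the services a host provides, I call it a check here), and every check is implemented through a command defined in commands.cfg.

83. Suppose one check monitors whether a machine's web service is healthy. What does it need? Three things matter most: which machine to monitor, which command implements the check, and which contact to notify when something goes wrong.

84. Define the checks, also called services. Create services.cfg

85. [root@localhost etc]# vi services.cfg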

#service definition

define service{

host_name nagios-server

# the monitored host, as defined in hosts.cfg

service_description check-host-alive

# description of this check (in effect its name); spaces are allowed. This one checks whether the host is alive

check_command check-host-alive

# the command to use, as defined in commands.cfg

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

# the check period, as defined in timeperiods.cfg

notification_interval 10

notification_period 24x7

# the notification period, as defined in timeperiods.cfg

notification_options w,u,c,r

# notify the contacts when the result is w, u, c or r; see above for the meanings

contact_groups sagroup

# the contact group, as defined in contactgroups.cfg

}

As follows:

#service definition

define service{

host_name nagios-server

service_description check-host-alive

check_command check-host-alive

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

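86. As you can see, nearly every value refers to something already defined. Copy and adjust the block above to add two more checks that monitor whether yahoon and dbpi are alive.

87. That completes the configuration. The functionality is simple, but it lays a solid foundation for later extension, and from here on the work is mostly copy and paste.

Test the Nagios server

88. Before starting Nagios, verify the configuration

89. /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

90. If you see the following, there are no problems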

Total Warnings: 0

Total Errors: 0

Things look okay - No serious problems were detected during the pre-flight check

91. If there are problems, use the output to track them down
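The pre-flight result can also be checked in a script before a restart. The following is a minimal sketch, assuming bash and the "Total Warnings" / "Total Errors" summary lines shown above; the function name preflight_ok is hypothetical, and treating warnings as a failure is a choice made here, not something Nagios requires.

```shell
#!/usr/bin/env bash
# Return success only if the given "nagios -v" output reports
# zero errors and zero warnings.
#   $1 = full text printed by the pre-flight check
preflight_ok() {
    local output=$1 warnings errors
    warnings=$(printf '%s\n' "$output" |
        sed -n 's/^Total Warnings:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
    errors=$(printf '%s\n' "$output" |
        sed -n 's/^Total Errors:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
    # A missing error count means the summary was never printed,
    # which is treated as a failure.
    [ -n "$errors" ] && [ "$errors" -eq 0 ] && [ "${warnings:-0}" -eq 0 ]
}
```

A typical use would be to capture the output of /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg in a variable, pass it to preflight_ok, and restart Nagios only when the function succeeds.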

92. Start Nagios in the background as a daemon

93./usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

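94. Change to the plugin directory

[root@localhost libexec]# cd /usr/local/nagios/libexec/

95. This directory holds all the plugins, that is, the commands that do the actual monitoring. To read a plugin's help text, run:

[root@localhost libexec]# ./check_disk -h

96. Every plugin accepts -h, and the help is detailed. For example: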

[root@localhost libexec]# ./check_disk -w 10% -c 5% /

DISK OK - free space: / 6064 MB (59% inode=93%);| /=4077MB;9623;10158;0;10693

[root@localhost libexec]# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/mapper/VolGroup00-LogVol00

11G 4.0G 6.0G 41% /

/dev/sda1 99M 12M 82M 13% /boot

tmpfs 420M 0 420M 0% /dev/shm

[root@localhost libexec]#

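97. That is how to call a plugin by hand. The following explains how the monitor calls plugins automatically.

First, understand the chain of an automatic call:

The files under /usr/local/nagios/libexec/ are the plugins, that is, the actual check programs.

/usr/local/nagios/etc/objects/commands.cfg defines how each plugin is invoked, with all its arguments filled in; it is the exact command line the monitor runs. For example:

define command{

command_name check-host-alive

command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1

}

This gives the command name and the command line that is executed.

Finally, something has to call the command: services.cfg refers to it by name. The chain looks like this:

services.cfg defines a check that uses some command

↓

that command must be defined in commands.cfg

↓

the command definition uses a plugin under libexec

98. Now monitor FTP on nagios-server

Edit services.cfg and add the following. It is essentially a copy of the host-alive check defined in the previous section, slightly modified.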

define service{

host_name nagios-server

# the machine to monitor, given by name; it must be defined in hosts.cfg

service_description check ftp

# a name for this check; anything you will recognize

check_command check_ftp

# the command to use; it must be defined in commands.cfg

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

99. Monitor SSH on dbpi

define service{

host_name dbpi

# as above

service_description check-ssh

# as above

check_command check_tcp!22

# SSH uses TCP port 22, so the check_tcp command defined in commands.cfg is used; !22 passes 22 to it as its argument

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

100. Now define a check yourself, following the chain above. The HTTP service serves as the example.

101. [root@localhost objects]# vim commands.cfg

102. Find the relevant definition:

# 'check_http' command definition

define command{

command_name check_http

command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$

}

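The command name is check_http

For its usage, run the following in the /usr/local/nagios/libexec/ directory:

[root@localhost libexec]# ./check_http -h

This prints the detailed usage. The rules are as follows:

Now run a plugin on its own, for example to check usage of the root partition:

[root@server1 libexec]# ./check_disk -w 10% -c 5% /

The command checks usage of the / partition: less than 10% free is a warning state, less than 5% free is a critical state.

After running it we see this message

DISK WARNING - free space: / 487 MB (6% inode=78%);| /=7449MB;7524;7942;0;8361

This says the current state is warning, with only 6% free space left. When Nagios receives such a result it takes action such as sending alerts.

You may be confused at this point: the check commands used when defining a check all come from commands.cfg, so how do they relate to these plugins? As you may have guessed, the commands defined in commands.cfg are what call the plugins. For example, we have used check-host-alive more than once; open commands.cfg to see its definition: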

################################################################################

# SAMPLE HOST CHECK COMMANDS

################################################################################

# This command checks to see if a host is "alive" by pinging it

# The check must result in a 100% packet loss or 5 second (5000ms) round trip

# average time to produce a critical error.

# Note: Only one ICMP echo packet is sent (determined by the '-p 1' argument)

# 'check-host-alive' command definition

define command{

command_name check-host-alive

command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1

}

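command_name check-host-alive

This line names the command check-host-alive, which is the name used in services.cfg.

The command line that is executed is

$USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1

$USER1$ is defined in resource.cfg and stands for the plugin installation path, here $USER1$=/usr/local/nagios/libexec. $HOSTADDRESS$ expands to the address of the monitored host.

In short, defining a check-host-alive check for dbpi in services.cfg actually runs

/usr/local/nagios/libexec/check_ping -H <IP address of dbpi> -w 3000.0,80% -c 5000.0,100% -p 1

check-host-alive is just a short name for that long command line, and services.cfg always uses the short names.

commands.cfg defines many such short names, covering most common checks: ftp, http, local disk, load, and so on.

Here is another command. check_local_disk is defined as follows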

# 'check_local_disk' command definition

define command{

command_name check_local_disk

command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$

}

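check_local_disk actually runs the check_disk plugin. What do $ARG1$, $ARG2$ and $ARG3$ mean? The usage of check_disk was covered earlier: -w sets how much free space is a warning state, -c sets how much is a critical state, and -p gives the path.

When using check-host-alive, services.cfg just names the command check-host-alive with no arguments. check_local_disk is different; in services.cfg it is written like this

check_local_disk!10%!5%!/

The ! after the command name separates three arguments: 10% is the value of $ARG1$, 5% is the value of $ARG2$, and / is the value of $ARG3$.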
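The substitution of $USER1$ and $ARGn$ can be imitated in a few lines to see what command line a given check_command would produce. This is only an illustrative sketch, assuming bash; expand_check_command and PLUGIN_DIR are hypothetical names, and real Nagios macro handling covers far more than these macros.

```shell
#!/usr/bin/env bash
# Plugin directory that $USER1$ stands for in this article's setup.
PLUGIN_DIR=/usr/local/nagios/libexec

# Expand a command_line template using a check_command value.
#   $1 = command_line template from commands.cfg
#   $2 = check_command value from services.cfg (name!arg1!arg2...)
expand_check_command() {
    local template=$1 check_command=$2
    local -a parts
    local line i
    # parts[0] is the command name, parts[1..] are $ARG1$, $ARG2$, ...
    IFS='!' read -r -a parts <<< "$check_command"
    line=${template//\$USER1\$/$PLUGIN_DIR}
    for ((i = 1; i < ${#parts[@]}; i++)); do
        line=${line//\$ARG${i}\$/${parts[i]}}
    done
    printf '%s\n' "$line"
}
```

Calling it with the check_local_disk template from commands.cfg (in single quotes, so the shell leaves the $ signs alone) and the value check_local_disk!10%!5%!/ prints the check_disk command line with -w 10% -c 5% -p / filled in.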

103. [root@localhost objects]# vim services.cfg

104. Copy any existing block and modify it. For example, copy this one:

define service{

host_name nagios-server

service_description check ftp

check_command check_ftp

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

105. Adjust each field as needed.

106. The command is invoked through the check_command field. The final result is:

define service{

host_name nagios-server

service_description check http

check_command check_http

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

107. Restart the service

108. [root@localhost objects]# ps aux | grep nagios

nagios 2555 0.0 0.1 12908 1200 ? Ssl 20:53 0:01 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

109. [root@localhost objects]# kill 2555

110. [root@localhost objects]# /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

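111. Look at the services again after a few minutes and the newly defined ones will be there.

112. An automatic HTTP check has now been added. Any other service can be added the same way.

Getting local information through nrpe

113. Install the check_nrpe plugin on the Nagios server; on the client, install nrpe and nagios-plugins

114. On the monitored host

115. Add a user

116. [root@dbpi root]# useradd nagios

117. Set its password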

118. [root@dbpi root]# passwd nagios

119. [root@localhost src]# cd nagios-plugins-1.4.13/

120. [root@localhost nagios-plugins-1.4.13]# ./configure

121. [root@localhost nagios-plugins-1.4.13]# make

122. [root@localhost nagios-plugins-1.4.13]# make install

123. When this step finishes, two directories, libexec and share, exist under /usr/local/nagios/

124. [root@localhost nagios-plugins-1.4.13]# ls /usr/local/nagios/

libexec share

125. Change the directory ownership

126. [root@localhost nagios-plugins-1.4.13]# chown nagios.nagios /usr/local/nagios/

127. [root@localhost nagios-plugins-1.4.13]# chown -R nagios.nagios /usr/local/nagios/libexec/

128. Install nrpe

129. [root@localhost src]# cd nrpe-2.12/

130. [root@localhost nrpe-2.12]# ./configure

131. [root@localhost nrpe-2.12]# make all

132. [root@localhost nrpe-2.12]# make install-plugin

133. [root@localhost nrpe-2.12]# make install-daemon

134. [root@localhost nrpe-2.12]# make install-daemon-config

135. The nagios directory now contains four directories

136. [root@localhost nrpe-2.12]# ls /usr/local/nagios/

bin etc libexec share

137. [root@localhost nrpe-2.12]# make install-xinetd

138. [root@localhost nrpe-2.12]# vim /etc/xinetd.d/nrpe

# default: on

# description: NRPE (Nagios Remote Plugin Executor)

service nrpe

{

flags = REUSE

socket_type = stream

port = 5666

wait = no

user = nagios

group = nagios

server = /usr/local/nagios/bin/nrpe

server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd

log_on_failure += USERID

disable = no

only_from = 127.0.0.1

}

On the only_from line, append the address of the monitoring server (0.111), separated by a space.

After the change:

only_from = 127.0.0.1 30.203.222.90

155. Save and exit.

156. Edit /etc/services and add the NRPE service

157. [root@localhost nrpe-2.12]# vim /etc/services

158. Add the following

# Local services

nrpe 5666/tcp # nrpe

159. Restart the xinetd service

[root@localhost nrpe-2.12]# service xinetd restart

Stopping xinetd: [ OK ]

Starting xinetd: [ OK ]

160. Check that NRPE is running

[root@localhost nrpe-2.12]# netstat -at | grep nrpe

getnameinfo failed

tcp 0 0 *:nrpe *:* LISTEN

[root@localhost nrpe-2.12]# netstat -an | grep 5666

tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN

[root@localhost nrpe-2.12]#

161. Test that the nrpe plugin works

162. [root@localhost nrpe-2.12]# /usr/local/nagios/libexec/check_nrpe -H localhost

NRPE v2.12

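163. A version string in the reply shows that check_nrpe can reach the nrpe daemon locally.

164. Run /usr/local/nagios/libexec/check_nrpe -h to see how the command is used

The usage is check_nrpe -H <monitored host> -c <check command to run>

Note: the command given after -c must be defined in nrpe.cfg. The NRPE daemon only runs commands defined in nrpe.cfg.

165. Look at the NRPE check commands (the file lists the commands it can run, together with their names)

166. [root@localhost nrpe-2.12]# cd /usr/local/nagios/etc/

167. [root@localhost etc]# vim nrpe.cfg

Find the following section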

# The following examples use hardcoded command arguments...

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1

command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z

command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

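168. The name in square brackets is the command name, which is what the -c option of check_nrpe accepts. What follows the equals sign is the plugin command line that actually runs (much like a command definition in commands.cfg, only written on one line). In other words, check_users is a short name for /usr/local/nagios/libexec/check_users -w 5 -c 10.

169. It is easy to see that the five commands above check the number of logged-in users, the CPU load, the capacity of hda1, zombie processes, and the total number of processes. For the exact meaning of each, see the plugin's usage (run the plugin with -h).

170. Because -c only accepts commands defined in nrpe.cfg, only these five commands are available for now. Try them on the local machine by running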

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_users

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_load

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_hda1

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_zombie_procs

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_total_procs
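To see at a glance which names check_nrpe -c will accept on a host, the command names can be pulled out of nrpe.cfg. This is a minimal sketch, assuming bash and the command[name]=... line format shown above; the function name list_nrpe_commands is hypothetical.

```shell
#!/usr/bin/env bash
# Print the name of every command defined in an nrpe.cfg file,
# one per line, in file order. Commented-out lines are ignored
# because they do not start with "command[".
#   $1 = path to nrpe.cfg
list_nrpe_commands() {
    local nrpe_file=$1
    sed -n 's/^command\[\([^]]*\)\]=.*/\1/p' "$nrpe_file"
}
```

On the monitored host it would be run as list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg.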

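On the monitoring host (Nagios server)

171. Install the check_nrpe plugin

172. Create a command definition for check_nrpe in commands.cfg, because only commands defined in commands.cfg can be used in services.cfg

173. Create the checks for the monitored host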

174. [root@localhost src]# cd nrpe-2.12/

175. [root@localhost nrpe-2.12]# ./configure

176. [root@localhost nrpe-2.12]# make all

177. [root@localhost nrpe-2.12]# make install-plugin

178. On the monitoring host, test communication with the monitored host

179. [root@localhost nrpe-2.12]# /usr/local/nagios/libexec/check_nrpe -H 30.233.222.70

NRPE v2.12

180. A version string in the reply shows that communication works.

181. Add a definition for check_nrpe to commands.cfg

182. [root@localhost nrpe-2.12]# vim /usr/local/nagios/etc/objects/commands.cfg

183. Append the following at the end

########################################################################

# 2007.9.5 add by yahoon

# NRPE COMMAND

########################################################################

# 'check_nrpe ' command definition

define command{

command_name check_nrpe

command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

}

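184. The meaning is as follows

command_name check_nrpe

Names the command check_nrpe; this is the name to use in services.cfg.

command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

Defines the plugin command line that actually runs. It must follow the usage of check_nrpe exactly; if unsure, run check_nrpe -h.

185. The $ARG1$ after -c is the check command passed to the nrpe daemon. As said before, it must be one of the five commands defined in nrpe.cfg. When using check_nrpe in services.cfg, pass this argument after a !

186. Now a check of the CPU load on host dbpi can be defined in services.cfg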

define service{

host_name dbpi

# the monitored host; it must be a Linux host running nrpe, and it must be defined in hosts.cfg

service_description check-load

# name of the check

check_command check_nrpe!check_load

# the check command is check_nrpe, defined in commands.cfg; its argument check_load is defined in nrpe.cfg

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

Add the other four checks in the same way.

203. [root@localhost nrpe-2.12]# vim /usr/local/nagios/etc/objects/services.cfg

204. Note: only load is monitored here, but many more checks are available. On the Nagios client, open /usr/local/nagios/etc/nrpe.cfg; near the bottom of the file there are many command definitions, each bound to the command line it runs. To use one, reference its name in this file (services.cfg). Note that these commands run on the client, not on the server. This was mentioned earlier, but not spelled out so clearly.

205. My configuration file is below:

define service{

host_name nagios-server

service_description check-load

check_command check_nrpe!check_load

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name nagios-server

service_description check-users

check_command check_nrpe!check_users

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name nagios-server

service_description check-zombie-procs

check_command check_nrpe!check_zombie_procs

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name nagios-server

service_description check-total-procs

check_command check_nrpe!check_total_procs

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name nagios-server

service_description check-sda1

check_command check_nrpe!check_sda1

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

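206. As mentioned earlier, another task is to monitor swap usage on dbpi. Unfortunately nrpe.cfg has no command for this by default. The solution is to add one to nrpe.cfg by hand, that is, to define a custom NRPE command.

207. We want to monitor the swap partition: warning if free space drops below 20%, critical if it drops below 10%. The plugin to use is check_swap, and the full command line looks like this.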

208. /usr/local/nagios/libexec/check_swap -w 20% -c 10%

209. Add a check that is not preconfigured:

210. Approach:

On the monitored host, find the plugin in the directory /usr/local/nagios/libexec

See how the plugin is used:

[root@localhost libexec]# ./check_swap -h

Run the plugin by hand:

[root@localhost libexec]# /usr/local/nagios/libexec/check_swap -w 20% -c 10%

SWAP OK - 100% free (1151 MB out of 1151 MB) |swap=1151MB;230;115;0;1151

[root@localhost libexec]#

The command runs successfully.

Define the command in nrpe.cfg (on the monitored host).

[root@localhost libexec]# vim /usr/local/nagios/etc/nrpe.cfg

Add the following line:

command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%

check_swap can now be used as the -c argument of check_nrpe.

A configuration change normally calls for a restart. However:

If nrpe runs as a standalone daemon, it must be restarted by hand.

If it runs under xinetd or inetd, no restart is needed.

Ours runs under xinetd, so there is no need to restart the service.

Add the check on the monitoring server:

[root@localhost nrpe-2.12]# vim /usr/local/nagios/etc/objects/services.cfg

define service{

host_name nagios-server

service_description check-swap

check_command check_nrpe!check_swap

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

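Now run it by hand on the server to see whether the swap check works.

211. All configuration files are now updated, so restart Nagios: kill the nagios process and start it again.

212. Restart the Nagios server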

/etc/init.d/nagios restart

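Summary of the call chain described in this article:

Local information, collected through the client software (nrpe):

services.cfg on the server (vim /usr/local/nagios/etc/objects/services.cfg)

-- calls -- commands.cfg on the server (vim /usr/local/nagios/etc/objects/commands.cfg)

-- calls -- nrpe.cfg on the client (vim /usr/local/nagios/etc/nrpe.cfg)

-- calls -- the actual program in the client's libexec directory (cd /usr/local/nagios/libexec)

-- which performs the check.

Remote information (without the client software nrpe):

services.cfg on the server -- calls -- commands.cfg on the server -- calls -- the actual program in the server's libexec directory -- which performs the check

The command names must match at every step!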
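Since a name mismatch between the server's services.cfg and the client's nrpe.cfg only shows up when the check runs, it can help to compare the two files beforehand. This is a minimal sketch, assuming bash and that a copy of the client's nrpe.cfg is available locally; the function name missing_nrpe_commands is hypothetical.

```shell
#!/usr/bin/env bash
# Print every command name used as "check_nrpe!NAME" in a services
# file that has no matching command[NAME]= line in an nrpe.cfg file.
#   $1 = path to services.cfg (server side)
#   $2 = path to a copy of nrpe.cfg (client side)
missing_nrpe_commands() {
    local services_file=$1 nrpe_file=$2 name
    # Capture the first argument after "check_nrpe!".
    sed -n 's/^[[:space:]]*check_command[[:space:]]*check_nrpe!\([^![:space:]]*\).*/\1/p' \
        "$services_file" |
    sort -u |
    while read -r name; do
        # -F: match "command[NAME]=" as a fixed string, not a regex.
        grep -qF "command[$name]=" "$nrpe_file" || printf '%s\n' "$name"
    done
}
```

Empty output means every check_nrpe argument on the server has a definition on the client; any name printed would fail at check time.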

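Appendix:

1. How to restart Nagios

Earlier I restarted Nagios by killing the process, but that is not necessary. If the init script was installed along with Nagios, use /etc/init.d/nagios restart. The script also accepts stop, start and status.

If it reports an error, the path set inside the script may be wrong. To fix it:

vi /etc/init.d/nagios

Change prefix=/usr/local/nagiosaa to the actual installation directory, /usr/local/nagios

Note: the Nagios installer reports that the script was installed to /etc/rc.d/init.d; this is the same directory as /etc/init.d

2. Running nrpe without xinetd

Following the nrpe installation document, nrpe runs under xinetd. I prefer to run it as a standalone daemon like Nagios, which is easier to control.

Method:

Edit /etc/services and comment out nrpe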

# Local services

#nrpe 5666/tcp # nrpe

Edit nrpe.cfg and add the address of the monitoring host

# NOTE: This option is ignored if NRPE is running under either inetd or xinetd

allowed_hosts=127.0.0.1,192.168.0.111

Note that the two addresses are separated by a comma

Start nrpe as a standalone daemon

[root@dbpi etc]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Check

[root@dbpi etc]# ps -ef|grep nrpe

nagios 22125 1 0 14:04 ? 00:00:00 [nrpe]

[root@dbpi nagios]# netstat -an|grep 5666

tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN

This shows that it started correctly

Add the following line to /etc/rc.d/rc.local to start nrpe at boot

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Likewise, to start Nagios at boot, add this line to /etc/rc.d/rc.local

/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

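3. Usage and meaning of check_load

This plugin checks the current CPU load of the system. Its usage is

check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15

On Unix the load average usually expresses the average number of processes waiting to run over the last 1, 5 and 15 minutes.

For example, check_load -w 15,10,5 -c 30,25,20 means the following

warning when more than 15 processes are waiting on the 1-minute average, more than 10 on the 5-minute average, or more than 5 on the 15-minute average

critical when more than 30 processes are waiting on the 1-minute average, more than 25 on the 5-minute average, or more than 20 on the 15-minute average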
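The threshold logic can be sketched as a small function to make the comparison concrete. This is an illustration only, assuming bash with awk available; classify_load is a hypothetical name, and the strict "greater than" comparison is an assumption about the plugin's behaviour rather than something stated in this article.

```shell
#!/usr/bin/env bash
# Classify three load averages against warning and critical
# thresholds, each given as "1min,5min,15min".
#   $1 = load averages      (for example 16,3,1)
#   $2 = warning thresholds (for example 15,10,5)
#   $3 = critical thresholds (for example 30,25,20)
classify_load() {
    awk -v loads="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        split(loads, l, ","); split(warn, w, ","); split(crit, c, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            # Any single average over its critical threshold wins.
            if (l[i] + 0 > c[i] + 0) { state = "CRITICAL"; break }
            if (l[i] + 0 > w[i] + 0) state = "WARNING"
        }
        print state
    }'
}
```

With the thresholds from the example, loads of 16,3,1 classify as WARNING because only the 1-minute average exceeds its warning threshold.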

Monitoring a Windows server with Nagios

Notes:

To monitor Windows, the checks go in windows.cfg rather than services.cfg, and hosts.cfg and hostgroups.cfg are not used either; everything is defined in windows.cfg. Open it and configure it as follows:

[root@corshetlpro01 objects]# cd /usr/local/nagios/etc/objects

[root@corshetlpro01 objects]# vim commands.cfg

Check that it contains the following:

define command{

command_name check_nt

command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$

}

Save and exit.

vim /usr/local/nagios/etc/nagios.cfg

# Definitions for monitoring the local (Linux) host

cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

# Definitions for monitoring a Windows machine

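# Uncomment the following line

cfg_file=/usr/local/nagios/etc/objects/windows.cfg

With that file enabled, configure windows.cfg

vim /usr/local/nagios/etc/objects/windows.cfg

Edit this file: change host_name to the name of the Windows machine to be monitored, then configure the host and hostgroup.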

define host{

use windows-server ; Inherit default values from a template

host_name corshetlstg01 ; The name we're giving to this host

alias My Windows Server ; A longer name associated with the host

address 3.242.120.10 ; IP address of the host

}

define hostgroup{

hostgroup_name windows-servers ; The name of the hostgroup

alias Win servers ; Long name of the group

}

define service{

use generic-service

host_name corshetlstg01

service_description Uptime

check_command check_nt!UPTIME

}

# Create a service for monitoring CPU load

# Change the host_name to match the name of the host you defined above

define service{

use generic-service

host_name corshetlstg01

service_description CPU Load

check_command check_nt!CPULOAD!-l 5,80,90

}

# Create a service for monitoring memory usage

# Change the host_name to match the name of the host you defined above

define service{

use generic-service

host_name corshetlstg01

service_description Memory Usage

check_command check_nt!MEMUSE!-w 80 -c 90

}

# Create a service for monitoring C:\ disk usage

# Change the host_name to match the name of the host you defined above

define service{

use generic-service

host_name corshetlstg01

service_description C:\ Drive Space

check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90

}

define service{

use generic-service

host_name corshetlstg01

service_description D:\ Drive Space

check_command check_nt!USEDDISKSPACE!-l d -w 80 -c 90

}

define service{

use generic-service

host_name corshetlstg01

service_description E:\ Drive Space

check_command check_nt!USEDDISKSPACE!-l e -w 80 -c 90

}

define service{

use generic-service

host_name corshetlstg01

service_description PING

check_command check_ping!100.0,20%!500.0,60%

}

Save and exit.

[root@corshetlpro01 libexec]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Check the syntax and fix anything it reports.

Restart:

/etc/init.d/nagios restart

/etc/init.d/apachectl restart

killall -HUP nagios

Restart commands

On the server:

killall -HUP nagios

/etc/init.d/nagios restart

/etc/init.d/apachectl restart

On the client:

/etc/init.d/xinetd restart

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Verify the Nagios server configuration syntax:

[root@corshetlpro01 objects]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
