Linux Monitoring

Nagios  

 

1 What is monitoring? Watching systems and keeping them under control.

 

2 Who is monitored? Servers of all kinds.

 

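3 What is monitored? Network traffic (eth0, eth1); service state (running, stopped); hardware resources (CPU, memory, storage);

                          process status (total, running, sleeping, zombie)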

 

 

 

[root@room1pc01 桌面]# uptime

 09:15:02 up  15min,  4 users,  load average: 0.01, 0.10, 0.08

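                  (the "up" value is how long the host has been online: the larger, the better; the three load averages cover the last 1, 5, and 15 minutes)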
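A script usually needs the load averages as plain numbers. The following is a minimal sketch, assuming the English "load average:" label shown in the transcript above; the function name load_averages is made up for this example.

```shell
# Extract the 1-, 5-, and 15-minute load averages from one line of
# `uptime` output read on standard input.
load_averages() {
    # Keep only the text after "load average:", then drop the commas.
    sed -n 's/.*load average: *//p' | tr -d ','
}

# Sample line copied from the transcript above.
# On a live system: uptime | load_averages
echo ' 09:15:02 up  15min,  4 users,  load average: 0.01, 0.10, 0.08' | load_averages
```

For the sample line this prints 0.01 0.10 0.08.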

 

[root@room1pc01 桌面]# top  (view CPU usage)

 

 

 

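4 How do we monitor? Run commands by hand, write scripts, or use monitoring software.

 

Automated monitoring:

Scheduled tasks (cron) plus monitoring scripts (chkconfig crond on)

Set up a monitoring server (software: Nagios, Cacti, Zabbix)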
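The "cron plus script" approach can be sketched as follows. This is only an illustration: the function name disk_over_limit, the 90% limit, and the script path in the crontab comment are assumptions, not part of the original notes. The function reads `df -P` output on standard input so it can be tried with sample data.

```shell
# Hypothetical helper: read `df -P` output on stdin and print one line for
# every filesystem whose Use% is at or above the given limit.
disk_over_limit() {
    local limit="$1"
    awk -v limit="$limit" 'NR > 1 {
        use = $5; sub(/%/, "", use)          # "97%" -> "97"
        if (use + 0 >= limit) print $6 " is " use "% full"
    }'
}

# Sample data in `df -P` format.
# On a live system: df -P | disk_over_limit 90
printf '%s\n' \
  'Filesystem 1024-blocks Used Available Capacity Mounted on' \
  '/dev/vda1 487652 445440 15360 97% /boot' \
  'tmpfs 510976 0 510976 0% /dev/shm' | disk_over_limit 90

# A crontab entry could run such a script every five minutes
# (the path is hypothetical):
# */5 * * * * /usr/local/bin/disk_alert.sh
```

With the sample data only /boot is reported, because it is the only filesystem at or above 90%.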

 

 

5 How are alerts received? (email, SMS, WeChat, instant messaging)

 

++++++++++++++++++++++++++++++++

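Steps for configuring a monitoring server:

1.  Deploy the runtime environment: a web server (nginx or Apache) with PHP: yum -y install httpd php

2.  Install the software that provides the service

2.1 Prepare for the installation

2.2 Install the monitoring service and the monitoring plugins

2.3 Edit the service configuration files

2.4 Start the monitoring service

3.  Configure the monitoring service

3.1 Configure monitoring of remote servers

3.2 Configure monitoring of the local host (a special case: a small company with a single web server may run the website and the monitoring service on the same machine, so the monitor watches the local website)

Do not monitor the local swap partition

Monitor usage of the boot partition

Monitor the running state of FTP

Set the mailbox that receives alerts to nagios@localhost

4  Configure alerting

5  View the monitoring information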

 

 

 

 

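Set the username and password used to log in to the monitoring web page. The username cannot be chosen freely: by default it must be nagiosadmin.

How Nagios monitors: while running, the service calls plugins. Each plugin call can carry thresholds; the plugin compares the measured value against them and reports the result, and Nagios displays the corresponding state.

Plugin thresholds come in two kinds: a warning value and a critical value (the operator chooses both).

States: OK (normal), WARNING, CRITICAL (serious error), UNKNOWN (for example a configuration error), PENDING (check scheduled but not yet run)

A measured value below the warning value is OK; above the warning value but below the critical value is WARNING; above the critical value is CRITICAL.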
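The threshold rule above can be sketched in a few lines of shell. This is an illustration rather than plugin source code; the function name classify is made up. The return codes follow the plugin convention of 0 for OK, 1 for WARNING, and 2 for CRITICAL.

```shell
# Map a measured integer to a state using warning and critical thresholds.
# Return codes: 0 = OK, 1 = WARNING, 2 = CRITICAL.
classify() {
    local value="$1" warn="$2" crit="$3"
    if [ "$value" -gt "$crit" ]; then
        echo CRITICAL; return 2
    elif [ "$value" -gt "$warn" ]; then
        echo WARNING; return 1
    fi
    echo OK; return 0
}

# The "|| true" keeps the demo going when the function returns non-zero.
classify 100 100 110 || true   # not above the warning value
classify 100 90 110 || true    # above warning, not above critical
classify 100 90 95 || true     # above critical
```

The three calls print OK, WARNING, and CRITICAL. Checks that measure free space work in the opposite direction: the state gets worse as the value falls.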

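Which local resources does Nagios monitor by default?

CPU load

Total number of logged-in users

Web service state

Whether the host is online

Root partition usage

sshd service state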

 

[root@99 libexec]# ./check_users -h (every plugin provides help with -h)

 

 

 

_____________________________________________________

———————————————————————————————————————————————————-

Setting up the Nagios server (LAP environment): host 99 is the monitoring server, the physical machine is the client

————————————————————————————————————————————————————

_____________________________________________________

[root@99 ~]# yum -y install httpd php

 

[root@99 ~]# service httpd restart ; chkconfig httpd on

Stopping httpd:                                            [FAILED]

Starting httpd: httpd: Could not reliably determine the server's fully qualified domain name, using 0.0.0.99 for ServerName

                                                           [  OK  ]

[root@99 ~]# echo 123 >/var/www/html/index.html

[root@99 ~]# yum -y install elinks

 

[root@99 ~]# elinks --dump http://localhost

   123

 

[root@99 ~]# vim /var/www/html/test.php

<?php

phpinfo();

?>

[root@room1pc01 桌面]# firefox http://192.168.4.99/test.php  (or type the same URL into the browser's address bar)

 

 

[root@99 ~]# unzip nagios.zip

[root@99 ~]# cd nagios

[root@99 nagios]# ls

nagios-3.2.1.tar.gz           nrpe-2.12.tar.gz

nagios-plugins-1.4.14.tar.gz  ntop-3.3.7.tar.gz

 

[root@99 nagios]# yum -y install gcc gcc-c++

 

[root@99 nagios]# useradd nagios  (the default user and group that the service and its monitoring processes run as)

[root@99 nagios]# groupadd nagcmd

[root@99 nagios]# usermod -G nagcmd nagios

 

[root@99 nagios]# tar -zxvf nagios-3.2.1.tar.gz

[root@99 nagios]# cd nagios-3.2.1

[root@99 nagios-3.2.1]# ./configure --help

 

[root@99 nagios-3.2.1]# ./configure --with-nagios-user=nagios --with-nagios-group=nagcmd   --with-command-user=nagios  --with-command-group=nagcmd

[root@99 nagios-3.2.1]# make all

 

[root@99 nagios-3.2.1]# make install

 

[root@99 nagios-3.2.1]# ls /usr/local/nagios

bin  libexec  sbin  share  var

[root@99 nagios-3.2.1]# make install-init

/usr/bin/install -c -m 755 -d -o root -g root /etc/rc.d/init.d

/usr/bin/install -c -m 755 -o root -g root daemon-init /etc/rc.d/init.d/nagios

 

*** Init script installed ***

 

[root@99 nagios-3.2.1]# ls /etc/rc.d/init.d/nagios  (the init script is installed here)

/etc/rc.d/init.d/nagios

[root@99 nagios-3.2.1]# make install-commandmode

/usr/bin/install -c -m 775 -o nagios -g nagcmd -d /usr/local/nagios/var/rw

chmod g+s /usr/local/nagios/var/rw

 

*** External command directory configured ***

 

[root@99 nagios-3.2.1]# make install-config

 

[root@99 nagios-3.2.1]# ls /usr/local/nagios/etc

cgi.cfg  nagios.cfg  objects  resource.cfg

[root@99 nagios-3.2.1]# make install-webconf

 

[root@99 nagios-3.2.1]# ll /etc/rc.d/init.d/nagios (the init script is here)

-rwxr-xr-x. 1 root root 5178 Mar  9 02:00 /etc/rc.d/init.d/nagios

[root@99 nagios-3.2.1]# ll /etc/init.d/nagios   (the same script is also reachable under /etc/init.d)

-rwxr-xr-x. 1 root root 5178 Mar  9 02:00 /etc/init.d/nagios

 

[root@99 nagios-3.2.1]# /etc/init.d/nagios   status (check the service status)

No lock file found in /usr/local/nagios/var/nagios.lock

[root@99 nagios-3.2.1]# /etc/init.d/nagios   start  (start the service)

Starting nagios: done.

[root@99 nagios-3.2.1]# /etc/init.d/nagios   status (check again)

nagios (pid 6807) is running.

 

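[root@room1pc01 桌面]# firefox http://192.168.4.99/nagios  (the page is not served until httpd is restarted and loads the new nagios.conf)

[root@99 nagios-3.2.1]# service httpd restart

[root@room1pc01 桌面]# firefox http://192.168.4.99/nagios  (in the browser, select the Services option)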

 

 

[root@99 nagios-3.2.1]# vim /etc/httpd/conf.d/nagios.conf

 22    AuthUserFile /usr/local/nagios/etc/htpasswd.users (the file that stores the web login users)

 

[root@99 nagios-3.2.1]# ls /usr/local/nagios/etc/htpasswd.users (the file does not exist yet and must be created)

ls: cannot access /usr/local/nagios/etc/htpasswd.users: No such file or directory

[root@99 nagios-3.2.1]# which htpasswd

/usr/bin/htpasswd

[root@99 nagios-3.2.1]# rpm -qf /usr/bin/htpasswd

httpd-tools-2.2.15-45.el6.x86_64

 

[root@99 nagios-3.2.1]# htpasswd -h (show help)

 

[root@99 nagios-3.2.1]# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin (create the web login user and password and write them to the file)

New password:

Re-type new password:

Adding password for user nagiosadmin

[root@99 nagios-3.2.1]# ls /usr/local/nagios/etc/htpasswd.users (the file now exists)

/usr/local/nagios/etc/htpasswd.users

 

[root@99 nagios-3.2.1]# cat /usr/local/nagios/etc/htpasswd.users (view the contents: username and password hash)

nagiosadmin:SdiDPECEPUFkM

 

[root@room1pc01 桌面]# firefox http://192.168.4.99/nagios

Enter the username and password

 

 

 

Install the plugins (results are viewed in the web interface; the command-line steps come first):

[root@99 nagios-3.2.1]# cd ..  (back to the directory that holds the unpacked tarballs)

[root@99 nagios]# tar -zxvf nagios-plugins-1.4.14.tar.gz

 

[root@99 nagios]# cd nagios-plugins-1.4.14

 

[root@99 nagios-plugins-1.4.14]# ./configure && make && make install

 

[root@99 nagios-plugins-1.4.14]# ls /usr/local/nagios/libexec/ (check that the plugins were installed)

 

On room1pc01, refresh the browser, click Services again, and wait a moment

 

 

Using the monitoring plugins

/usr/local/nagios/libexec/<plugin name> -h

 

[root@99 libexec]# ./check_users -w 1  -c 2

USERS CRITICAL - 3 users currently logged in |users=3;1;2;0 (critical: 3 users is above the critical value of 2)
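The text after the "|" in plugin output is performance data in the form label=value;warn;crit;min. A minimal sketch of splitting it into fields follows; the function name perfdata_fields is made up, and the sketch handles a single item with up to four fields only.

```shell
# Split plugin output at the "|" and break one performance-data item
# (label=value;warn;crit;min) into named fields.
perfdata_fields() {
    local output="$1" perf label values value warn crit min
    perf=${output#*"|"}        # text after the first "|"
    label=${perf%%=*}          # e.g. "users"
    values=${perf#*=}          # e.g. "3;1;2;0"
    IFS=';' read -r value warn crit min <<EOF
$values
EOF
    echo "label=$label value=$value warn=$warn crit=$crit min=$min"
}

perfdata_fields 'USERS CRITICAL - 3 users currently logged in |users=3;1;2;0'
```

For the check_users output above this prints label=users value=3 warn=1 crit=2 min=0, which matches the -w 1 -c 2 options on the command line.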

 

[root@99 libexec]# ./check_http -h

 

[root@99 libexec]# ./check_http -I 192.168.4.254

HTTP WARNING: HTTP/1.1 403 Forbidden - 5159 bytes in 0.047 second response time |time=0.047352s;;;0.000000 size=5159B;;;0

[root@99 libexec]# ./check_http -I 192.168.4.254 -p 80

HTTP WARNING: HTTP/1.1 403 Forbidden - 5159 bytes in 0.002 second response time |time=0.001934s;;;0.000000 size=5159B;;;0

[root@99 libexec]# ./check_http -I 192.168.4.254 -p 80

Connection refused

HTTP CRITICAL - Unable to open TCP socket

[root@99 libexec]# ./check_http -I localhost -p 80

HTTP OK: HTTP/1.1 200 OK - 271 bytes in 0.001 second response time |time=0.001469s;;;0.000000 size=271B;;;0

 

[root@99 libexec]# ./check_ping  -H 192.168.4.253  -w 10,50%    -c 10,60%

PING OK - Packet loss = 0%, RTA = 0.15 ms|rta=0.152000ms;10.000000;10.000000;0.000000 pl=0%;50;60;0

 

 

[root@99 libexec]# ./check_disk -h

[root@99 libexec]# df -h

Filesystem            Size  Used Avail Use% Mounted on

/dev/mapper/VolGroup-lv_root

                       47G  1.6G   43G   4% /

tmpfs                 499M     0  499M   0% /dev/shm

/dev/vda1             477M   36M  416M   8% /boot

[root@99 libexec]# ./check_disk -w 50%  -c 40%   -p /

DISK OK - free space: / 43791 MB (96% inode=98%);| /=1560MB;23893;28671;0;47786

 

[root@99 libexec]# dd if=/dev/zero of=/boot/test.txt bs=1M count=400

400+0 records in

400+0 records out

419430400 bytes (419 MB) copied, 7.00428 s, 59.9 MB/s

[root@99 libexec]# ./check_disk -w 50%  -c 40%   -p /boot

DISK CRITICAL - free space: /boot 15 MB (3% inode=99%);| /boot=435MB;238;285;0;476

[root@99 libexec]# ./check_disk -w 50%  -c 40%   -p /dev/shm

DISK OK - free space: /dev/shm 498 MB (100% inode=99%);| /dev/shm=0MB;249;298;0;498
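Note that check_disk thresholds refer to free space, so the comparison runs the other way: the state is WARNING when less than the -w percentage is free and CRITICAL when less than the -c percentage is free. A minimal sketch of that rule, with a made-up function name classify_free:

```shell
# For free-space checks the state gets worse as the measured value
# (percent free) falls BELOW the thresholds.
classify_free() {
    local free_pct="$1" warn="$2" crit="$3"
    if [ "$free_pct" -lt "$crit" ]; then
        echo CRITICAL; return 2
    elif [ "$free_pct" -lt "$warn" ]; then
        echo WARNING; return 1
    fi
    echo OK; return 0
}

# The "|| true" keeps the demo going when the function returns non-zero.
classify_free 96 50 40 || true   # "/" with 96% free
classify_free 3 50 40 || true    # "/boot" with 3% free after the dd test
```

The two calls print OK and CRITICAL, matching the check_disk results in the transcript above.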

 

[root@99 libexec]# ./check_ssh -p 22 localhost

SSH OK - OpenSSH_5.3 (protocol 2.0)

[root@99 libexec]# ./check_ssh  192.168.4.254

SSH OK - OpenSSH_5.3 (protocol 2.0)

 

[root@99 libexec]# ./check_swap

 

[root@99 libexec]# ./check_procs -h

 

 

[root@99 libexec]# ./check_procs -w 100 -c 110

PROCS OK: 100 processes

[root@99 libexec]# ./check_procs -w 90 -c 110

PROCS WARNING: 100 processes

[root@99 libexec]# ./check_procs -w 90 -c 95

PROCS CRITICAL: 100 processes

 

[root@99 libexec]# ./check_procs  -w 60 -c 65 -s R

PROCS OK: 0 processes with STATE = R

[root@99 libexec]# ./check_procs  -w 60 -c 65 -s Z

PROCS OK: 0 processes with STATE = Z

[root@99 libexec]# ./check_procs  -w 60 -c 65 -s ZR

PROCS OK: 0 processes with STATE = ZR
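To see how many processes are in each state without a plugin, the STAT column of ps can be tallied. A minimal sketch, with a made-up function name count_states; it reads one STAT value per line and looks only at the first letter, which is the main state.

```shell
# Count processes per state letter. Input: one STAT value per line
# (for example from `ps -eo stat=`); only the first character is used.
count_states() {
    cut -c1 | sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Sample data.
# On a live system: ps -eo stat= | count_states
printf '%s\n' 'Ss' 'S' 'R+' 'S<' 'Z' 'S' | count_states
```

For the sample data the result is one R (running), four S (sleeping), and one Z (zombie).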

 

Some services have no dedicated plugin; they can be checked by port number instead (all over TCP):

[root@99 libexec]# ./check_tcp -H 192.168.4.254 -p 80

TCP OK - 0.000 second response time on port 80|time=0.000351s;;;0.000000;10.000000

[root@99 libexec]# ./check_tcp -H 192.168.4.254 -p 25

Connection refused

[root@99 libexec]# ./check_tcp -H 192.168.4.254 -p 22

TCP OK - 0.000 second response time on port 22|time=0.000243s;;;0.000000;10.000000

 

 

 

____ Edit the service configuration files to add your own monitored services (start with services on the local host) _____________________

 

[root@99 nagios-3.2.1]# ls /usr/local/nagios

bin  etc  libexec  sbin  share  var

 

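libexec: plugin directory

bin: executables

etc: configuration files

sbin: CGI programs (they generate the pages shown when you click around the monitoring web interface)

var: variable data such as logs and cache

share: web interface files (HTML pages, images, stylesheets)

 

Nagios configuration files explained

 

[root@99 nagios]#  /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg   (check the configuration for errors)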

 

[root@99 nagios]# vim /usr/local/nagios/etc/nagios.cfg

  19 log_file=/usr/local/nagios/var/nagios.log

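  30 cfg_file=/usr/local/nagios/etc/objects/commands.cfg  (defines the check commands, that is, which plugins the service runs)

  31 cfg_file=/usr/local/nagios/etc/objects/contacts.cfg    (sets the mailbox that receives alerts)

  32 cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg   (defines the time period templates)

  33 cfg_file=/usr/local/nagios/etc/objects/templates.cfg     (defines the host and service templates)

  36 cfg_file=/usr/local/nagios/etc/objects/localhost.cfg      (monitors the local host; what makes it so is the object definitions inside, not the file name, which can be anything)

 

/usr/local/nagios/etc/resource.cfg  macro definitions: $USER1$ is the plugin directory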

 

_______________________

 

1)

[root@99 ~]# vim  /usr/local/nagios/etc/objects/commands.cfg  (define the commands used by the newly added services)

 

define command {

        command_name monitor_localhost_boot

         command_line  /usr/local/nagios/libexec/check_disk -w 20% -c 10%  -p  /boot

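}

(Only one of the definition above and the definition below is needed. The one above hard-codes the values; the one below takes them as arguments.)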

 

define command {

        command_name monitor_localhost_boot

         command_line  /usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$  -p  $ARG3$

}
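The argument form works because a service's check_command value is split on "!": the first part names the command, and the remaining parts fill $ARG1$, $ARG2$, and so on. The following is a hypothetical illustration of that substitution in plain shell, not Nagios code; the function name expand_command is made up.

```shell
# Illustration of argument macros: split a check_command value on "!"
# and substitute $ARG1$, $ARG2$, ... into a command_line template.
expand_command() {
    local template="$1" check_command="$2" old_ifs="$IFS" i=1 arg macro
    IFS='!'
    set -- $check_command        # $1 = command name, $2... = arguments
    IFS="$old_ifs"
    shift                        # drop the command name
    for arg in "$@"; do
        macro="\$ARG$i\$"
        case "$template" in
            # Replace the first occurrence of the macro with the argument.
            *"$macro"*) template="${template%%"$macro"*}$arg${template#*"$macro"}" ;;
        esac
        i=$((i + 1))
    done
    echo "$template"
}

expand_command '/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' \
               'monitor_localhost_boot!20%!10%!/boot'
```

The call prints /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /boot, the same command line as the hard-coded definition.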

 

define command {

     command_name monitor_localhost_ftp

     command_line $USER1$/check_ftp -H localhost -p 21

 

}

 

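2) Set the mailbox that receives alerts

[root@99 nagios]# /etc/init.d/postfix status

master (pid  1766) is running...

 

[root@99 ~]# vim /usr/local/nagios/etc/objects/contacts.cfg (set the alert recipient address; the mail service must be running first. If you use an external address such as a 163.com mailbox, first confirm that this server can actually deliver mail to it)

 

3) The file that defines the time period templates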

[root@99 ~]# vim /usr/local/nagios/etc/objects/timeperiods.cfg

define timeperiod{

        timeperiod_name 24x7

        alias           24 Hours A Day, 7 Days A Week

        sunday          00:00-24:00

        monday          00:00-24:00

        tuesday         00:00-24:00

        wednesday       00:00-24:00

        thursday        00:00-24:00

        friday          00:00-24:00

        saturday        00:00-24:00

        }

 

# 'workhours' timeperiod definition

define timeperiod{

        timeperiod_name workhours

        alias           Normal Work Hours

        monday          09:00-17:00

        tuesday         09:00-17:00

        wednesday       09:00-17:00

        thursday        09:00-17:00

        friday          09:00-17:00

        }
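Each day in a time period carries a range in HH:MM-HH:MM form. As a rough sketch of what such a range means, the helper below tests whether a given time falls inside one. The function names to_minutes and in_range are made up, and the sketch assumes two-digit hours and minutes.

```shell
# Convert "HH:MM" (two digits each) to minutes since midnight.
to_minutes() {
    local h="${1%:*}" m="${1#*:}"
    # Strip one leading zero so "09" is not read as an octal number.
    echo $(( ${h#0} * 60 + ${m#0} ))
}

# Succeed if time $2 lies inside range $1 ("HH:MM-HH:MM", end exclusive).
in_range() {
    local start="${1%-*}" end="${1#*-}" now
    now=$(to_minutes "$2")
    [ "$now" -ge "$(to_minutes "$start")" ] && [ "$now" -lt "$(to_minutes "$end")" ]
}

in_range 09:00-17:00 10:30 && echo "10:30 is inside workhours"
in_range 09:00-17:00 18:00 || echo "18:00 is outside workhours"
```

Both messages are printed: 10:30 falls inside the workhours range and 18:00 does not.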

 

 

4) The configuration file that defines the monitoring templates

[root@99 ~]# vim /usr/local/nagios/etc/objects/templates.cfg

 

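5) Add the new local services to the local-host configuration file, appending them at the bottom. (What makes this the local-host file is the object definitions inside it, not its name; the file name can be anything.)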

[root@99 ~]# vim /usr/local/nagios/etc/objects/localhost.cfg

 

define host{

        use                     linux-server            ; Name of host template to use

; This host definition will inherit all variables that are defined

; in (or inherited by) the linux-server host template definition.

        host_name               localhost

        alias                   localhost

        address                 127.0.0.1

        }

 

define service{

        use                             local-service         ; Name of service template to use

        host_name                       localhost

        service_description             PING

        check_command                   check_ping!100.0,20%!500.0,60%

        }

 

.........................

.........................

 

##################################myset################################333

 

define service{

        use                             local-service         ; Name of service template to use

        host_name                       localhost

        service_description             boot

        check_command                   monitor_localhost_boot!20%!10%!/boot

        notifications_enabled           0

        }

 

define service{

        use                             local-service         ; Name of service template to use

        host_name                       localhost

        service_description             ftp

        check_command                   monitor_localhost_ftp

        notifications_enabled           0

        }

 

[root@99 nagios]#  /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg  (check the configuration for syntax errors)

 

6) Set the user allowed to access the CGI pages

[root@99 nagios]# cd /usr/local/nagios/sbin/

[root@99 sbin]# ls

avail.cgi   extinfo.cgi        outages.cgi  statuswml.cgi  tac.cgi

cmd.cgi     history.cgi        showlog.cgi  statuswrl.cgi

config.cgi  notifications.cgi  status.cgi   summary.cgi

 

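[root@99 etc]# vim /usr/local/nagios/etc/cgi.cfg   (set the user allowed to access the CGI pages)

119 authorized_for_system_information=nagiosadmin  (the web login username)

 

 

[root@room1pc01 桌面]# firefox http://192.168.4.99/nagios  (restart the nagios service so the new configuration is loaded, then click Services in the browser and refresh; the added services appear after a few minutes)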

 

————————————————————————————————————————————————————————————————

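————————— Edit the configuration files to add a remote host to monitor —————————

Monitoring a remote host (192.168.4.98):

Public data of the remote host: the state of its network services (website, FTP, database, sshd)

Private data of the remote host: disk, processes, users

 

Monitoring remote public data: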

 

 

[root@99 sbin]# /usr/local/nagios/libexec/check_http -H 192.168.4.98 -p 80

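Connection refused

HTTP CRITICAL - Unable to open TCP socket

 

[root@99 sbin]# vim /usr/local/nagios/etc/nagios.cfg       (1. in the main Nagios configuration file, list the file that describes the remote host; it is loaded when the service starts)

  36 cfg_file=/usr/local/nagios/etc/objects/localhost.cfg (local host)

  37 cfg_file=/usr/local/nagios/etc/objects/otherser.cfg  (remote host)

 

[root@99 objects]# vim /usr/local/nagios/etc/objects/otherser.cfg  (2. write the object definitions that say which services on the remote host to monitor)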

 

define host{

        use                     linux-server

        host_name               server98

        address                 192.168.4.98

        }

 

define service{

        use                             local-service

        host_name                       server98

        service_description             httpd

        check_command                monitor_server98_httpd

        }

 

define service{

        use                             local-service

        host_name                       server98

        service_description             sshd

        check_command                monitor_server98_sshd

        }

 

[root@99 objects]# vim /usr/local/nagios/etc/objects/commands.cfg        (3. define the commands used to monitor the remote services)

..................................

..................................

######################monitor##################################################

 

define command {

     command_name monitor_server98_httpd

     command_line $USER1$/check_http -H 192.168.4.98 -p 80

}

 

define command {

     command_name monitor_server98_sshd

     command_line $USER1$/check_ssh -p 22 192.168.4.98

}

 

 

Test: the page now shows the remote host in addition to the local host.

[root@room1pc01 桌面]# firefox http://192.168.4.99/nagios

 

 

[root@99 objects]# hostname localhost (log out and back in afterwards; a purely numeric hostname interferes with receiving mail)

[root@localhost ~]# /etc/init.d/nagios restart

 

[root@localhost ~]# mail -u nagios

"/var/mail/nagios": 2 messages 2 unread

>U  1 nagios@localhost.loc  Thu Mar  9 08:44  32/924   "** PROBLEM Service Alert: server98/httpd is CRITICAL **"

 U  2 nagios@localhost.loc  Thu Mar  9 08:47  32/894   "** PROBLEM Service Alert: server98/sshd is CRITICAL **"

 

————————————————————————————————————————————————————————————————————————————————

 

Monitoring private data on the remote host (disk, processes, users)

Process counts: total, running, sleeping, zombie

1  Install the monitoring plugins on the monitored host (192.168.4.98)

# yum -y install gcc gcc-c++

# tar -zxvf nagios-plugins-1.4.14.tar.gz

# cd nagios-plugins-1.4.14

# ./configure

# make

# make install

# ls /usr/local/nagios/libexec/

 

[root@stu ~]# /usr/local/nagios/libexec/check_users -w 3 -c 5

USERS OK - 2 users currently logged in |users=2;3;5;0

 

[root@stu ~]# /usr/local/nagios/libexec/check_procs -w 50 -c 60  -s  Z

PROCS OK: 0 processes with STATE = Z

 

[root@stu ~]# /usr/local/nagios/libexec/check_procs -w 50 -c 60  -s  R

PROCS OK: 0 processes with STATE = R

 

[root@stu ~]# /usr/local/nagios/libexec/check_procs -w 50 -c 60   (all processes)

PROCS CRITICAL: 103 processes

 

[root@stu ~]# /usr/local/nagios/libexec/check_disk -w 50% -c 30%  -p  /

DISK OK - free space: / 42137 MB (92% inode=97%);| /=3215MB;23893;33450;0;47786

 

2  Run the NRPE service on the monitored host

#useradd  nagios

#rpm  -q openssl   openssl-devel

#tar -zxvf nrpe-2.12.tar.gz

#cd nrpe-2.12

#./configure

#make all

#make install-plugin

#make install-daemon

#make install-daemon-config

#make install-xinetd

#vim /etc/xinetd.d/nrpe

only_from       = 127.0.0.1  192.168.4.99

:wq

 

#vim /etc/services

nrpe            5666/tcp                # NRPE

:wq

 

#yum -y  install xinetd

#service  xinetd start

# chkconfig xinetd on

#netstat -utnalp  | grep :5666

 

3  Edit the main NRPE configuration file to define the commands that collect local private data (on 192.168.4.98)

 

# vim /usr/local/nagios/etc/nrpe.cfg

#command[<command name>]=<plugin command line to run on this host>

command[check_nrpe_users]=/usr/local/nagios/libexec/check_users -w 2 -c 5

command[check_nrpe_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

command[check_nrpe_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/vda1

command[check_nrpe_root]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /

command[check_nrpe_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z

command[check_nrpe_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

 

:wq

 

# /etc/init.d/xinetd restart
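The name passed to check_nrpe with -c must match a command[...] name in nrpe.cfg exactly, so it helps to list the names that are actually defined. A minimal sketch with a made-up function name nrpe_command_names; it reads nrpe.cfg-style text on standard input and skips commented-out lines.

```shell
# Print the command names defined by command[...] lines read on stdin.
# Lines that start with "#" do not match and are skipped.
nrpe_command_names() {
    sed -n 's/^command\[\([^]]*\)\]=.*/\1/p'
}

# Sample data.
# On a live system: nrpe_command_names < /usr/local/nagios/etc/nrpe.cfg
printf '%s\n' \
  '#command[check_old]=/bin/true' \
  'command[check_nrpe_users]=/usr/local/nagios/libexec/check_users -w 2 -c 5' \
  'command[check_nrpe_root]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /' \
  | nrpe_command_names
```

For the sample data this prints check_nrpe_users and check_nrpe_root, one per line.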

 

 

 

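Verify the NRPE command (on 192.168.4.98)

# /usr/local/nagios/libexec/check_nrpe -H localhost -c  check_nrpe_root

 

 

4  Configure the monitoring server (192.168.4.99)

 

# /usr/local/nagios/libexec/check_nrpe -H 192.168.4.98   -p 5666  -c  check_nrpe_users  (fails with "no such file or directory": check_nrpe is not installed on the monitoring server yet)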

 

1) Install the dependencies

# yum -y install openssl openssl-devel

2) Build and install the check_nrpe plugin

# tar -zxvf  nrpe-2.12.tar.gz

 

# cd nrpe-2.12

#./configure

 

 #make all

#make install-plugin

 

 #ls /usr/local/nagios/libexec/check_nrpe

 

 

 

3) Test from the command line

 

 

# /usr/local/nagios/libexec/check_nrpe -H 192.168.4.98 -p 5666 -c check_nrpe_users

USERS OK - 2 users currently logged in |users=2;2;5;0

 

# /usr/local/nagios/libexec/check_nrpe -H 192.168.4.98 -p 5666 -c check_nrpe_load

OK - load average: 0.00, 0.00, 0.00|load1=0.000;15.000;30.000;0; load5=0.000;10.000;25.000;0; load15=0.000;5.000;20.000;0;

 

 

4) Define commands that let the Nagios service call the check_nrpe plugin

 

# vim /usr/local/nagios/etc/objects/commands.cfg

...............................

 

define command {

     command_name monitor_server98_boot

     command_line $USER1$/check_nrpe -H 192.168.4.98 -p 5666 -c check_nrpe_hda1

}

 

define command {

     command_name monitor_server98_load

     command_line $USER1$/check_nrpe -H 192.168.4.98 -p 5666 -c  check_nrpe_load

}

 

define command {

     command_name monitor_server98_users

     command_line $USER1$/check_nrpe -H 192.168.4.98 -p 5666 -c  check_nrpe_users

}

 

define command {

     command_name monitor_server98_root

     command_line $USER1$/check_nrpe -H 192.168.4.98 -p 5666 -c  check_nrpe_root

}

 

define command {

     command_name monitor_server98_zombie

     command_line $USER1$/check_nrpe -H 192.168.4.98 -p 5666 -c  check_nrpe_zombie_procs

}

 

define command {

     command_name monitor_server98_total_procs

    command_line $USER1$/check_nrpe -H 192.168.4.98 -p 5666 -c  check_nrpe_total_procs

}

 

5) Reference the new commands in the remote host's configuration file

 

# vim /usr/local/nagios/etc/objects/otherser.cfg

 

#################################private###########################

define service{

        use                             local-service

        host_name                       server98

        service_description             users

        check_command                   monitor_server98_users

}

 

define service{

        use                             local-service

        host_name                       server98

        service_description             root

        check_command                   monitor_server98_root

}

 

define service{

        use                             local-service

        host_name                       server98

        service_description             total_procs

        check_command                   monitor_server98_total_procs

}

 

define service{

        use                             local-service

        host_name                       server98

        service_description             boot

        check_command                   monitor_server98_boot

}

 

define service{

        use                             local-service

        host_name                       server98

        service_description             load

        check_command                   monitor_server98_load

}

define service{

        use                             local-service

        host_name                       server98

        service_description             zombie

        check_command                   monitor_server98_zombie

}
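The six service blocks above differ only in the description and the command name, so they could also be generated. A minimal sketch, assuming the monitor_server98_<name> naming used in these notes; the function name service_block is made up.

```shell
# Print one service definition for a host, description, and command name.
service_block() {
    local host="$1" desc="$2" cmd="$3"
    cat <<EOF
define service{
        use                             local-service
        host_name                       $host
        service_description             $desc
        check_command                   $cmd
}
EOF
}

# Generate three of the blocks; redirect the output to a file to reuse it.
for item in users root load; do
    service_block server98 "$item" "monitor_server98_$item"
done
```

The output can be reviewed and then pasted into otherser.cfg before running the syntax check.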

 

 

 

6) Check the configuration files for syntax errors

 

# alias plj='/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg'

# plj

 

 

7) Restart the monitoring service

 

# /etc/init.d/nagios restart

 

8) View the monitoring information