IX. Monitoring the Oracle Database
1. On the monitoring host 10.100.10.11, add a check_nrpe command definition:
# vi /usr/local/nagios/etc/objects/commands.cfg
define command {
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
2. On the monitoring host 10.11, create a file oracle6.cfg that defines the Oracle host and the services to monitor.
# cd /usr/local/nagios/etc/objects
# vi oracle6.cfg
define host {
use linux-server
host_name ledbbackup01
alias Oracle 10g
address 10.100.10.6
}
define service {
use generic-service
host_name ledbbackup01
service_description TNS Check
check_command check_nrpe!check_oracle_tns
}
define service {
use generic-service
host_name ledbbackup01
service_description DB Check
check_command check_nrpe!check_oracle_db
}
define service {
use generic-service
host_name ledbbackup01
service_description Login Check
check_command check_nrpe!check_oracle_login
}
define service {
use generic-service
host_name ledbbackup01
service_description Cache Check
check_command check_nrpe!check_oracle_cache
}
define service {
use generic-service
host_name ledbbackup01
service_description Tablespace Check
check_command check_nrpe!check_oracle_tablespace
}
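The five service blocks above differ only in the description and the NRPE command name, so they could also be generated instead of typed. A minimal sketch, assuming the same host name and generic-service template as above; print_oracle_services is a made-up helper name, not part of Nagios:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): print one service block per check
# for the given host, using the same template and command names as above.
print_oracle_services() {
    host="$1"
    while IFS='|' read -r desc cmd; do
        cat <<EOF
define service {
use generic-service
host_name $host
service_description $desc
check_command check_nrpe!$cmd
}
EOF
    done <<'LIST'
TNS Check|check_oracle_tns
DB Check|check_oracle_db
Login Check|check_oracle_login
Cache Check|check_oracle_cache
Tablespace Check|check_oracle_tablespace
LIST
}
```

Calling print_oracle_services ledbbackup01 and appending the output to oracle6.cfg after the host block would reproduce the definitions above.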
3. Register this file in the main Nagios configuration file:
# cd /usr/local/nagios/etc
# vi nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/oracle6.cfg
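Before restarting Nagios it may be worth confirming that the new line is really in place and that the whole configuration still parses. A small sketch; has_cfg_file is a hypothetical helper, and the path of the nagios binary in the comment is an assumption based on the /usr/local/nagios prefix used in these notes:

```shell
#!/bin/sh
# Hypothetical helper: succeed if MAIN_CFG contains the exact line
# "cfg_file=OBJECT_CFG" (a commented-out line does not count).
has_cfg_file() {
    main_cfg="$1"
    object_cfg="$2"
    grep -Fxq "cfg_file=$object_cfg" "$main_cfg"
}

# Example use on the monitoring host (binary path is an assumption):
#   has_cfg_file /usr/local/nagios/etc/nagios.cfg \
#       /usr/local/nagios/etc/objects/oracle6.cfg \
#     && /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
```

The -v option asks Nagios to verify the configuration without starting the daemon, so a typo in oracle6.cfg shows up before the restart.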
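4. Install NRPE on the Oracle server. The steps are the same as described earlier, so they are not repeated here.
Then add a few check commands: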
# vi /usr/local/nagios/etc/nrpe.cfg
command[check_oracle_tns]=/usr/local/nagios/libexec/check_oracle --tns legold01
command[check_oracle_db]=/usr/local/nagios/libexec/check_oracle --db legold01
command[check_oracle_login]=/usr/local/nagios/libexec/check_oracle --login legold01
command[check_oracle_cache]=/usr/local/nagios/libexec/check_oracle --cache legold01 nagios 123.com 80 90
command[check_oracle_tablespace]=/usr/local/nagios/libexec/check_oracle --tablespace legold01 nagios 123.com USERS 90 80
For the exact argument format, see the help output of check_oracle.
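It can save time to run each plugin command by hand on the Oracle server before NRPE gets involved. The sketch below pulls the command line for one entry out of nrpe.cfg so it can be inspected or run directly; nrpe_command_line is a hypothetical helper name, and it assumes the entries are written exactly in the command[name]=... form shown above:

```shell
#!/bin/sh
# Hypothetical helper: print the command line configured for one NRPE command.
#   $1 = path to nrpe.cfg, $2 = command name, e.g. check_oracle_tns
nrpe_command_line() {
    sed -n "s/^command\[$2]=//p" "$1"
}

# Example use on the Oracle server, to see what NRPE would run:
#   nrpe_command_line /usr/local/nagios/etc/nrpe.cfg check_oracle_tns
```

Running the printed command as the user the NRPE daemon runs under gives the closest match to what the daemon itself will see.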
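5. On the Oracle server, edit the check_oracle plugin and set $ORACLE_HOME and $PATH in it by hand, so that the plugin does not depend on the environment it happens to be started from: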
export ORACLE_HOME=/home/oracle/app/product/11.2.0/db_1
export PATH=$PATH:$ORACLE_HOME/bin
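As far as I know, check_oracle relies on Oracle client tools such as sqlplus and tnsping, so a quick check that they exist under the configured ORACLE_HOME can catch a wrong path early. A minimal sketch; oracle_tools_present is a hypothetical name, and the list of tools it checks is an assumption:

```shell
#!/bin/sh
# Hypothetical helper: report which of the expected Oracle client tools are
# missing or not executable under the given ORACLE_HOME.
# Returns 0 if all are present, 1 otherwise.
oracle_tools_present() {
    home="$1"
    status=0
    for tool in sqlplus tnsping; do
        if [ ! -x "$home/bin/$tool" ]; then
            echo "missing: $home/bin/$tool"
            status=1
        fi
    done
    return "$status"
}

# Example use on the Oracle server:
#   oracle_tools_present /home/oracle/app/product/11.2.0/db_1
```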
The NRPE service can now be started:
# service nrped start
Also, don't forget to restart the nagios service on the monitoring host 10.11:
# service nagios restart
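6. Now open the Nagios web interface in a browser to check:
Something seems to be wrong. A closer look at the Nagios error log shows that the check_nrpe plugin does not exist on the monitoring host 10.11, so the following steps are needed as well:
7. Copy check_nrpe from the Oracle server 10.6 to 10.11.
On 10.11: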
# cd /usr/local/nagios/libexec
# ./check_nrpe
./check_nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
The libssl.so.6 library is missing.
Fix:
On 10.6:
# find / -name libssl.so.6
/lib64/libssl.so.6
/lib/libssl.so.6
8. After copying libssl.so.6 from 10.6 to 10.11, try again on 10.11:
# cd /usr/local/nagios/libexec
# ./check_nrpe
./check_nrpe: error while loading shared libraries: libcrypto.so.6: cannot open shared object file: No such file or directory
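Another library is missing. No need to worry, just go back to 10.6 and copy that one too. Every setback is one more reason to keep going!
On 10.6: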
# find / -name libcrypto.so.6
/lib64/libcrypto.so.6
/lib/libcrypto.so.6
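Instead of discovering missing libraries one error message at a time, ldd can list everything a binary fails to resolve in a single pass. A sketch, assuming ldd is available on 10.11; both function names are made up for illustration:

```shell
#!/bin/sh
# Read ldd output on stdin and print the names of the libraries
# that could not be resolved (lines containing "not found").
filter_missing() {
    awk '/not found/ { print $1 }'
}

# Hypothetical helper: list the unresolved shared libraries of a binary.
missing_libs() {
    ldd "$1" | filter_missing
}

# Example use on 10.11:
#   missing_libs /usr/local/nagios/libexec/check_nrpe
```

An empty result means every shared library the binary needs was found.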
9. With libcrypto.so.6 copied over as well, go back to 10.11 and try once more:
# ./check_nrpe
Incorrect command line arguments supplied
NRPE Plugin for Nagios
Copyright (c) 1999-2008 Ethan Galstad (nagios@nagios.org)
Version: 2.13
Last Modified: 11-11-2011
License: GPL v2 with exemptions (-l for more info)
SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required
Usage: check_nrpe -H <host> [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
Options:
-n = Do no use SSL
-u = Make socket timeouts return an UNKNOWN state instead of CRITICAL
<host> = The address of the host running the NRPE daemon
[port] = The port on which the daemon is running (default=5666)
[timeout] = Number of seconds before connection times out (default=10)
[command] = The name of the command that the remote daemon should run
[arglist] = Optional arguments that should be passed to the command. Multiple
arguments should be separated by a space. If provided, this must be
the last option supplied on the command line.
Getting the usage text means check_nrpe now runs. That took some effort! A quick test against the Oracle server:
# ./check_nrpe -H 10.100.10.6 -p 5666
NRPE v2.13
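The version reply only shows that the NRPE daemon is reachable. To exercise one of the Oracle checks end to end, check_nrpe can be given -c with a command name from nrpe.cfg; the exit status then follows the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A small sketch; state_name is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: translate a Nagios plugin exit status into its state name.
# Anything other than 0, 1 or 2 is reported as UNKNOWN.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Example use on 10.11:
#   ./check_nrpe -H 10.100.10.6 -c check_oracle_tns
#   state_name $?
```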
10. Restart the nagios service:
# service nagios restart
Then enter the IP address in the browser and take a look:
That's it for now. These notes are kept as a reference for later.
May the days stay calm and quiet.
(*^__^*) Hee hee...