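Nagios ships a plugin named check_nt in its libexec directory, which is used to check services on Windows machines. It plays a role similar to check_nrpe, but it has to be paired with another piece of software, NSClient++, which in turn is comparable to NRPE. We need to download a suitable version of NSClient++ and install it on the Windows host to be monitored.
The biggest difference between NSClient++ and NRPE is this:
•NRPE: the monitored host runs nrpe together with a set of plugins, and the plugins do the actual checking. When the monitoring server sends a check request to nrpe, nrpe invokes the appropriate plugin to carry it out.
•NSClient++: here only NSClient++ is installed on the monitored host, with no separate plugins. When the monitoring server sends a check request to NSClient++, NSClient++ performs the check itself.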
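This also points to a significant limitation of NSClient++ when it is used this way: it is inflexible, because check_nt can only request the checks that NSClient++ itself implements and cannot be extended with plugins. Fortunately NSClient++ already covers a lot, and its built-in checks meet basically all of our monitoring needs.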
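1. Install NSClient++ on the monitored Windows host
Download NSClient++ from http://www.nsclient.org/nscp/downloads. Installation works like any other Windows application: click Next and keep the default options. During installation you are asked for the IP address of the monitoring server and a password; also tick all of the Modules listed below those fields.
After installation, check whether ports 5666 and 12489 are listening. If they are, the NSClient++ service has started correctly.
Type services.msc in the Run dialog to open Services.
If NSClient++ appears in the list, the service is installed correctly.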
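Then enable desktop interaction for the service (the "Allow service to interact with desktop" option in its properties).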
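2. Configure NSClient++
In this setup NSClient++ is installed in D:\Program Files\NSClient++, and nsclient.ini is the service's configuration file. It normally needs no changes, but if the monitoring server's IP address changes or the password is forgotten, both can be changed here.
3. Configure the Nagios monitoring server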
[root@bairui212 ~]# vi /usr/local/nagios/etc/nagios.cfg
Search for:
# Definitions for monitoring a Windows machine
# cfg_file=/usr/local/nagios/etc/objects/windows.cfg
Remove the # from the second line to enable the Windows monitoring definitions in Nagios, then save the file.
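The same edit can be scripted instead of done by hand in vi. The following is a minimal sketch, assuming GNU sed (for -i) and that the line is commented out as shown above. The function name enable_windows_cfg is made up for illustration, and the demo runs on a scratch file rather than the real nagios.cfg.

```shell
#!/usr/bin/env bash
# Hypothetical helper: uncomment the windows.cfg cfg_file line in a nagios.cfg.
enable_windows_cfg() {
    local nagios_cfg=$1
    # Remove the leading "#" only from the commented-out windows.cfg line;
    # ordinary comment lines are left alone. Requires GNU sed for -i.
    sed -i 's|^#[[:space:]]*\(cfg_file=.*/objects/windows\.cfg\)[[:space:]]*$|\1|' "$nagios_cfg"
}

# Demo on a scratch file holding the two lines quoted from nagios.cfg.
demo_cfg=$(mktemp)
printf '%s\n' \
    '# Definitions for monitoring a Windows machine' \
    '# cfg_file=/usr/local/nagios/etc/objects/windows.cfg' > "$demo_cfg"
enable_windows_cfg "$demo_cfg"
cat "$demo_cfg"
rm -f "$demo_cfg"
```

On the real server the call would be enable_windows_cfg /usr/local/nagios/etc/nagios.cfg, ideally after taking a backup copy of the file.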
Next, edit:
[root@bairui212 ~]# vi /usr/local/nagios/etc/objects/windows.cfg
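Note: apart from the host name and IP address, which you change to your own, nothing else needs to be modified. The host_name in every service definition in this file must match the host_name defined here.
In the host definition you can change host_name, alias, and address. The first directive, use, inherits the Windows monitoring parameters from templates.cfg.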
# Define a host for the Windows machine we'll be monitoring
# Change the host_name, alias, and address to fit your situation
define host{
use windows-server ; Inherit default values from a template
host_name BR56 ; The name we're giving to this host
alias ZWBR56 ; A longer name associated with the host
address 192.168.0.56 ; IP address of the host
}
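4. Service definitions and their parameters
First define the following service, which is used to confirm that the NSClient++ version on the monitored host is correct: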
define service{
use generic-service
host_name winserver
service_description NSClient++ Version
check_command check_nt!CLIENTVERSION
}
Next, monitor the uptime of the host:
check_command check_nt!UPTIME
Then the CPU load. The definition below raises a WARNING when the average load over the last 5 minutes exceeds 80% and a CRITICAL alert when it exceeds 90%:
check_command check_nt!CPULOAD!-l 5,80,90
Memory usage: WARNING at 80% used, CRITICAL at 90%:
check_command check_nt!MEMUSE!-w 80 -c 90
Disk space on drive C: WARNING at 80% used, CRITICAL at 90%:
check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
Service state: a CRITICAL alert is sent when the service is stopped:
check_command check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
Process state: a CRITICAL alert is sent when the process is not running:
check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
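Each check_command line above belongs inside its own define service block, like the NSClient++ Version one. As a rough sketch, the shell function below prints such blocks so they can be reviewed and pasted into windows.cfg. emit_service is a hypothetical helper, the service descriptions are only illustrative, and BR56 is the host defined earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Nagios service definition.
# Arguments: host_name, service_description, check_command.
emit_service() {
    local host=$1 description=$2 check_command=$3
    cat <<EOF
define service{
    use                     generic-service
    host_name               $host
    service_description     $description
    check_command           $check_command
}
EOF
}

# Single quotes keep the "!" separators literal in interactive shells too.
emit_service BR56 "Uptime"   'check_nt!UPTIME'
emit_service BR56 "CPU Load" 'check_nt!CPULOAD!-l 5,80,90'
emit_service BR56 "Explorer" 'check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe'
```

Passing the host name as an argument keeps every service tied to the same host_name as the host definition.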
5. Check the check_nt command and the Windows host template
Check that /usr/local/nagios/etc/objects/commands.cfg contains the following, which allows check_nt to be used for monitoring Windows:
# 'check_nt' command definition
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}
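To see how a check_command from windows.cfg maps onto this command_line: the text after the first ! becomes $ARG1$ and the text after the second ! becomes $ARG2$. The sketch below mimics that expansion in shell, for illustration only. expand_check_nt is a hypothetical helper, not part of Nagios, and it assumes $USER1$ points at /usr/local/nagios/libexec as in this installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the command line Nagios would build for a
# check_nt service, given its check_command value and the host address.
expand_check_nt() {
    local check_command=$1 host_address=$2
    local arg1 arg2
    # Split on "!": command name (ignored), then $ARG1$, then $ARG2$.
    IFS='!' read -r _ arg1 arg2 <<< "$check_command"
    # Append $ARG2$ only when it is non-empty, to avoid a trailing space.
    printf '%s\n' "/usr/local/nagios/libexec/check_nt -H $host_address -p 12489 -v $arg1${arg2:+ $arg2}"
}

expand_check_nt 'check_nt!CPULOAD!-l 5,80,90' 192.168.0.56
expand_check_nt 'check_nt!UPTIME' 192.168.0.56
```

Running the printed command by hand on the Nagios server is a convenient way to debug a single service check.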
Also check that /usr/local/nagios/etc/objects/templates.cfg contains the following, which serves as the template when new Windows hosts are added later:
# Windows host definition template - This is NOT a real host, just a template!
define host{
name windows-server ; The name of this host template
use generic-host ; Inherit default values from the generic-host template
check_period 24x7 ; By default, Windows servers are monitored round the clock
check_interval 5 ; Actively check the server every 5 minutes
retry_interval 1 ; Schedule host check retries at 1 minute intervals
max_check_attempts 10 ; Check each server 10 times (max)
check_command check-host-alive; Default command to check if servers are "alive"
notification_period 24x7 ; Send notification out at any time - day or night
notification_interval 30 ; Resend notifications every 30 minutes
notification_options d,r ; Only send notifications for specific host states
contact_groups admins ; Notifications get sent to the admins by default
hostgroups windows-servers ; Host groups that Windows servers should be a member of
register 0 ; DONT REGISTER THIS - ITS JUST A TEMPLATE
}
6. Set the password
Edit /usr/local/nagios/etc/objects/commands.cfg and, in the check_nt definition, add the -s option followed by the NSClient++ password to command_line:
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s 123456 -v $ARG1$ $ARG2$
The following command tests whether the host responds correctly:
/usr/local/nagios/libexec/check_nt -H HOST_IP -p 12489 -s PASSWORD -v UPTIME
[root@bairui212 ~]# /usr/local/nagios/libexec/check_nt -H 192.168.0.56 -p 12489 -s 123456 -v UPTIME
System Uptime - 0 day(s) 5 hour(s) 21 minute(s) |uptime=321
If the reply looks like System Uptime - 0 day(s) 5 hour(s) 21 minute(s), the connection is working.
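The part after the | in that output is performance data. As a small sketch, a script could pull the number out as shown here. uptime_minutes is a hypothetical helper, and reading the value as minutes is an inference from the sample (5 hours 21 minutes = 321), not something the output itself states.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read check_nt UPTIME output on stdin and print the
# numeric value of the "uptime=" performance-data field, if present.
uptime_minutes() {
    sed -n 's/.*|[[:space:]]*uptime=\([0-9][0-9]*\).*/\1/p'
}

# Sample line copied from the test run above.
sample='System Uptime - 0 day(s) 5 hour(s) 21 minute(s) |uptime=321'
printf '%s\n' "$sample" | uptime_minutes
```

On the Nagios server the function could be fed directly from the plugin, for example by piping the check_nt command shown above into uptime_minutes.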
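If it shows could not fetch information from server, the likely causes are:
•The password is wrong (the silliest cause, and also a frequent one).
•A firewall on the server is blocking the connection; port 12489 must be opened.
•The allowed hosts setting in nsclient.ini on the Windows host (allowed_hosts in NSC.ini on older versions) does not include the correct IP address of the Nagios server.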
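To narrow down the firewall case, one option is to probe the port from the Nagios server before suspecting the password. This is only a sketch: it assumes a bash built with /dev/tcp support and the coreutils timeout command, and port_reachable is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST PORT can be opened.
port_reachable() {
    local host=$1 port=$2
    # bash opens a TCP connection when redirecting to /dev/tcp/HOST/PORT;
    # timeout gives up after 3 seconds if packets are silently dropped.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

if port_reachable 192.168.0.56 12489; then
    echo "port 12489 reachable: check the password and allowed hosts next"
else
    echo "port 12489 not reachable: check the firewall and the NSClient++ service"
fi
```

A reachable port does not prove the password or the allowed hosts list is right; it only rules out the network path.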
7. Restart the Nagios service
After modifying the Nagios configuration files, run /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg to verify that they are correct:
[root@bairui212 ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.0.6
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 04-29-2014
License: GPL
Website: http://www.nagios.org
Reading configuration data...
Read main config file okay...
Error: Could not find any host matching 'winserver' (config file '/usr/local/nagios/etc/objects/windows.cfg', starting on line 138)
Error: Failed to expand host list 'winserver' for service 'Explorer' (/usr/local/nagios/etc/objects/windows.cfg:138)
Error processing object config files!
***> One or more problems was encountered while processing the config files...
Check your configuration file(s) to ensure that they contain valid
directives and data defintions. If you are upgrading from a previous
version of Nagios, you should be aware that some variables/definitions
may have been removed or modified in this version. Make sure to read
the HTML documentation regarding the config files, as well as the
'Whats New' section to find out what has changed.
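The error above means a service definition names a host_name for which no host is defined; here a service still used the sample name winserver while the host was defined as BR56. A quick way to spot such mismatches before running nagios -v is sketched below. undefined_service_hosts is a hypothetical helper and is deliberately simplified: it assumes one directive per line, a single name per host_name directive, and that all relevant host and service definitions are in the files passed to it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every host_name used in a "define service" block
# that is not declared in any "define host" block of the given files.
undefined_service_hosts() {
    awk '
        /^[ \t]*define[ \t]+host[ \t]*[{]/    { kind = "host"; next }
        /^[ \t]*define[ \t]+service[ \t]*[{]/ { kind = "service"; next }
        /^[ \t]*[}]/                          { kind = ""; next }
        $1 == "host_name" && kind == "host"    { defined[$2] = 1 }
        $1 == "host_name" && kind == "service" { used[$2] = 1 }
        END { for (h in used) if (!(h in defined)) print h }
    ' "$@"
}

# Demo: a host named BR56 and a service that still points at winserver.
demo=$(mktemp)
cat > "$demo" <<'EOF'
define host{
use windows-server
host_name BR56
address 192.168.0.56
}
define service{
use generic-service
host_name winserver
service_description Explorer
check_command check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
}
EOF
undefined_service_hosts "$demo"
rm -f "$demo"
```

On the server it would be run against /usr/local/nagios/etc/objects/windows.cfg; no output means every service refers to a defined host.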
After some changes, verification finally passed:
[root@bairui212 etc]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.0.6
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 04-29-2014
License: GPL
Website: http://www.nagios.org
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
Checked 15 services.
Checked 2 hosts.
Checked 2 host groups.
Checked 0 service groups.
Checked 1 contacts.
Checked 1 contact groups.
Checked 24 commands.
Checked 5 time periods.
Checked 0 host escalations.
Checked 0 service escalations.
Checking for circular paths...
Checked 2 hosts
Checked 0 service dependencies
Checked 0 host dependencies
Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
Stop Apache: [root@bairui212 bin]# /usr/local/apache/bin/apachectl stop
Start Apache: [root@bairui212 bin]# /usr/local/apache/bin/apachectl start
Restart Nagios: service nagios restart
Then check the Nagios web interface to see whether the monitored Windows server appears.
Log in at http://192.168.0.212:80/nagios/ to verify.