Zabbix 监控进程宕机

  业务需求后端进程宕机以后能在短时间内迅速拉起,业务影响不大,但是开发需要查看coredump,要求能监控到pid变化;在现有构架下zabbix能监控并报警;

当然zabbix设置报警设置就不再一一

在每台服务器/etc/zabbix/zabbix_agentd.conf设置路径:此例只需要piddiff.sh

UserParameter=checkpid,sh /usr/local/script/piddiff.sh

UserParameter=test,sh /usr/local/script/test.sh

UserParameter=discovery.process,/usr/local/script/disprocess.sh

UserParameter=process.check[*],/usr/local/script/proc_check.sh $1 $2 $3


/usr/local/script下面存放脚本

Vim piddiff.sh

aapid为业务监控id 取值根据业务需求;

#/bin/sh

onl_ok=1

onl_cored=3

dir=/usr/local/script

if [[ ! -f "$dir/old.txt" ]];then

 ps aux|grep aapid |grep -v grep|grep -v /bin/bash|awk  '{print $2,$11}' > $dir/old.txt

      else

         sleep 1s           

fi

 ps aux|grep aapid |grep -v grep|grep -v /bin/bash|awk  '{print $2,$11}' > $dir/now.txt

if ! diff -q $dir/old.txt  $dir/now.txt > /dev/null; then

          echo $onl_cored

          diff -c $dir/old.txt  $dir/now.txt > $dir/`date "+%Y%m%d%H%M"`_diff.txt

          cat $dir/now.txt >$dir/old.txt

     else

          echo $onl_ok    

fi

一个简单的判断脚本;

Zabbix30秒会抓取一次,正常没变化为1,有变化为3,那么zabbix抓取数值为3则表示pid有变化,会发出警报;

Zabbix设置:

 

监控项模板添加如下:

1.png


触发器:{Template OS Linux:checkpid.last()}=3

2.png