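I. Runtime environment
The soft router runs ESXi 6.7. An OpenWrt virtual machine on ESXi serves as the main router, and a UPS powers the soft router.
Features
1. Automatic ESXi shutdown while running on UPS power during an outage
The script pings LAN devices that run on mains power to tell whether mains power is present. If none of them replies for a set number of consecutive checks, mains power is assumed to have failed and ESXi shuts down. The host therefore powers off automatically during an outage.
2. Automatic recovery from network failures
The script pings public IPs. If they fail for a set number of consecutive checks, it restarts the OpenWrt VM. If the network is still down after a set number of consecutive VM restarts, it reboots the whole soft router.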
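Both features come down to counting consecutive failed checks and acting once a threshold is reached. The sketch below is a minimal version of that counting logic, kept separate from the pinging so it can be tested on its own. The class names FailureCounter and NetworkWatch are hypothetical and do not appear in the script.

```python
class FailureCounter:
    """Counts consecutive failed checks; any success resets the count."""

    def __init__(self, limit):
        self.limit = limit  # e.g. ping_sum from config.ini
        self.count = 0

    def update(self, ok):
        """Record one check result; return True once the limit is reached."""
        if ok:
            self.count = 0
            return False
        self.count += 1
        return self.count >= self.limit


class NetworkWatch:
    """Two-level escalation: restart the VM first, reboot the host later."""

    def __init__(self, ping_sum, reset_esxi):
        self.failures = FailureCounter(ping_sum)  # plays the role of openwrt_cnt
        self.resets = 0                           # plays the role of openwrt_reset_wan_cnt
        self.reset_esxi = reset_esxi

    def update(self, ok):
        """Return 'ok', 'wait', 'reset_vm' or 'reboot_host'."""
        if ok:
            self.failures.update(True)
            self.resets = 0
            return "ok"
        if not self.failures.update(False):
            return "wait"
        self.failures.count = 0  # start counting again after each escalation
        self.resets += 1
        if self.resets >= self.reset_esxi:
            return "reboot_host"  # the real script also requires mains power to be OK here
        return "reset_vm"
```

For example, with ping_sum = 2 and reset_esxi = 2, four failed network checks in a row yield wait, reset_vm, wait, reboot_host. Unlike the real script, this sketch does not re-check mains power before a reboot.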
II. File structure
config.ini: configuration file
log.txt: log file, generated by the script
power_and_network.py: the script itself
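III. Python script
File path: /vmfs/volumes/datastore1/python_scripts/power_and_network/power_and_network.py
Pitfall:
Make sure the file has execute permission! cron runs it directly through its #!/bin/python3 line.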
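The usual way to do this is chmod. If you would rather do it from Python, the sketch below adds the execute bit for owner, group and others, which is equivalent to chmod a+x. The helper name make_executable is hypothetical.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others (like chmod a+x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Usage: make_executable("/vmfs/volumes/datastore1/python_scripts/power_and_network/power_and_network.py").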
#!/bin/python3
# -*- coding: utf-8 -*-
import configparser
import logging
import os
import subprocess
import sys
import time

config = {}  # filled in by readConf()

def process_detect():
    '''Exit if another instance of this script is already running.'''
    # Count processes whose command line contains this script's file name
    status, output = subprocess.getstatusoutput("ps -c | grep %s | grep -v grep | wc -l" % os.path.basename(__file__))
    print("Number of processes: %s" % output)
    process_sum = int(output)
    if process_sum > 1:
        print("Process already running, exit!")
        sys.exit()
    else:
        logging.info("Start ===== ")

def ping(addr):
    '''Ping the given IP once with a 1-second timeout; return True if it replies.'''
    ret = os.system("ping -c 1 -W 1 %s > /dev/null" % addr)
    return ret == 0

def power_220v_is_ok():
    '''Ping LAN devices that run on mains power; True means 220 V mains power is OK.'''
    return ping(config['220v_ip1']) or ping(config['220v_ip2'])

def network_is_ok():
    '''Ping public IPs; True means the internet connection is OK.'''
    return ping(config['network_ip1']) or ping(config['network_ip2'])

def openwrt_reset():
    '''Power-cycle the OpenWrt VM, then wait for it to come back up.'''
    logging.info("openwrt_reset!")
    os.system(config['openwrt_off'])
    time.sleep(2)
    os.system(config['openwrt_on'])
    time.sleep(config['openwrt_wait'])

def log_config():
    '''Write the log to log.txt in the script's directory.'''
    logging.basicConfig(filename=os.path.dirname(os.path.realpath(__file__)) + '/log.txt', level=logging.DEBUG, format="%(asctime)s %(message)s", datefmt="%Y/%m/%d %H:%M:%S -> ")
    # logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S ->")

def readConf():
    '''Read config.ini from the script's directory into the global config dict.'''
    conf = configparser.ConfigParser()
    file_path = os.path.dirname(os.path.realpath(__file__)) + '/config.ini'  # absolute path of the config file
    print(file_path)
    conf.read(file_path, encoding="utf-8")
    config['220v_ip1'] = conf.get("power_220v", "ip1")
    config['220v_ip2'] = conf.get("power_220v", "ip2")
    config['220v_ping_sum'] = conf.getint("power_220v", "ping_sum")
    config['network_ip1'] = conf.get("network", "ip1")
    config['network_ip2'] = conf.get("network", "ip2")
    config['network_ping_sum'] = conf.getint("network", "ping_sum")
    config['network_reset_esxi'] = conf.getint("network", "reset_esxi")
    config['openwrt_off'] = conf.get("openwrt", "off_cmd")
    config['openwrt_on'] = conf.get("openwrt", "on_cmd")
    config['openwrt_wait'] = conf.getint("openwrt", "wait")
    config['sleep_sec'] = conf.getint("default", "sleep_sec")
    print("===== Configuration =====")
    print(config)
    print("====================")

def main():
    log_config()
    readConf()
    process_detect()
    power_220v_cnt = 0          # consecutive failed mains-power checks
    openwrt_cnt = 0             # consecutive failed network checks
    openwrt_reset_wan_cnt = 0   # consecutive OpenWrt VM restarts
    while True:
        sleep_sec = config['sleep_sec']
        if power_220v_is_ok():
            print("Power 220V ok!")
            power_220v_cnt = 0
        else:
            sleep_sec -= 4  # compensate for the time the failed pings took
            power_220v_cnt += 1
            logging.info("[%d/%d] 220V Power maybe off,checking again after %d seconds!" % (power_220v_cnt, config['220v_ping_sum'], config['sleep_sec']))
            if power_220v_cnt >= config['220v_ping_sum']:
                logging.info("Poweroff!")
                os.system("poweroff")
                while True:  # wait here until the host goes down
                    time.sleep(1)
        if network_is_ok():
            print("Network ok!")
            openwrt_cnt = 0
            openwrt_reset_wan_cnt = 0
        else:
            sleep_sec -= 4  # compensate for the time the failed pings took
            openwrt_cnt += 1
            logging.info("[%d/%d] Network maybe disconnected!(reboot %d/%d)" % (openwrt_cnt, config['network_ping_sum'], openwrt_reset_wan_cnt, config['network_reset_esxi']))
            if openwrt_cnt >= config['network_ping_sum']:
                openwrt_cnt = 0
                openwrt_reset_wan_cnt += 1
                if openwrt_reset_wan_cnt >= config['network_reset_esxi']:
                    # Reboot only while mains power is OK. Otherwise, if the power-failure
                    # shutdown delay is longer than the reboot delay, the host would keep
                    # rebooting and never shut down.
                    if power_220v_is_ok():
                        logging.info("Reboot!")
                        os.system("reboot")
                        while True:  # wait here until the host goes down
                            time.sleep(1)
                openwrt_reset()
        time.sleep(max(sleep_sec, 0))  # never pass a negative value to sleep()

if __name__ == '__main__':
    main()
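The main loop keeps its period close to sleep_sec by subtracting a fixed 4 seconds for every failed check. The script does not use the alternative technique shown below, which schedules against time.monotonic() so the period stays fixed however long the pings take. This is a minimal sketch, and run_every and its parameters are hypothetical.

```python
import time


def run_every(period, check, rounds=None):
    """Call check() once every `period` seconds, absorbing the time check() takes.

    rounds=None loops forever; an integer stops after that many calls.
    """
    next_run = time.monotonic()
    done = 0
    while rounds is None or done < rounds:
        check()
        done += 1
        next_run += period
        delay = next_run - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        else:
            # check() overran the period (e.g. a VM restart): resynchronise
            # instead of firing several checks back to back
            next_run = time.monotonic()
```

In the script, check would be one pass of the power and network checks, for example run_every(config['sleep_sec'], one_pass) with a hypothetical one_pass() function. With this approach the loop no longer relies on the "must not be less than 8" rule, because the sleep is skipped rather than going negative.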
IV. Configuration file
[power_220v]
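#Two LAN IPs used to check mains power
ip1 = 192.168.2.2
ip2 = 192.168.2.234
#After this many consecutive failed checks ESXi shuts down; time (s) = ping_sum * sleep_sec + (ping_sum / network_ping_sum) * wait
#which simplifies to: time = ping_sum * (sleep_sec + wait / network_ping_sum), where network_ping_sum is ping_sum in [network]
ping_sum = 40
[network]
#Two public IPs used to check whether the network is up
ip1 = 114.114.114.114
ip2 = 223.6.6.6
#After this many consecutive failed checks ESXi restarts the OpenWrt VM; time (s) = ping_sum * sleep_sec
ping_sum = 4
#If the network is still down after this many consecutive OpenWrt VM restarts, reboot ESXi as a last resort; time (s) = (ping_sum * sleep_sec + wait) * reset_esxi
#!!! If the broadband line itself is down, the soft router will reboot over and over, so don't make this time too short
reset_esxi = 30
[openwrt]
#Restart the OpenWrt VM when the WAN is down
#List all VMs with their vmids: vim-cmd vmsvc/getallvms (the trailing number in the commands below is the vmid)
#Power off a VM: vim-cmd vmsvc/power.off 3
#Power on a VM: vim-cmd vmsvc/power.on 3
#Reset a VM: vim-cmd vmsvc/power.reset 3
#In my tests the network came back about 30 seconds after resetting the VM
#Note: append & to the power-on command so it runs in the background; otherwise it blocks until the VM has booted, which makes each cycle much longer
off_cmd = vim-cmd vmsvc/power.off 3
on_cmd = vim-cmd vmsvc/power.on 3 &
#Seconds to wait after running the power-on command
wait = 30
[default]
#The script corrects the loop timing internally (it subtracts 4 s for each failed check, up to 8 s per loop), so this must not be less than 8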
sleep_sec = 10
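Plugging the example values into the formulas in the comments gives the actual delays. The sketch below reads the same keys with configparser and evaluates the author's formulas. The helper estimate_delays is hypothetical, and the results are approximate because the time spent on pings and commands is ignored.

```python
import configparser

# Only the keys the formulas need, copied from the example config.ini above
SAMPLE_CONFIG = """
[power_220v]
ping_sum = 40
[network]
ping_sum = 4
reset_esxi = 30
[openwrt]
wait = 30
[default]
sleep_sec = 10
"""


def estimate_delays(text):
    """Return approximate delays in seconds, using the formulas from config.ini."""
    conf = configparser.ConfigParser()
    conf.read_string(text)
    sleep_sec = conf.getint("default", "sleep_sec")
    power_sum = conf.getint("power_220v", "ping_sum")
    net_sum = conf.getint("network", "ping_sum")
    reset_esxi = conf.getint("network", "reset_esxi")
    wait = conf.getint("openwrt", "wait")
    if sleep_sec < 8:
        raise ValueError("sleep_sec must be at least 8")
    return {
        "openwrt_restart": net_sum * sleep_sec,                   # ping_sum * sleep_sec
        "esxi_reboot": (net_sum * sleep_sec + wait) * reset_esxi,
        "power_off": power_sum * (sleep_sec + wait / net_sum),
    }
```

With the values above, the OpenWrt VM is restarted after about 40 s, ESXi reboots after about 2100 s (35 min), and the host shuts down after about 700 s (roughly 11.7 min) without mains power. The shutdown formula appears to assume that the WAN also goes down during a power cut, so the VM restarts' wait time adds to the delay.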
Adjust the IPs and times to your own environment.
Be sure to get the OpenWrt VM's ID with vim-cmd vmsvc/getallvms and update off_cmd and on_cmd accordingly.
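You can also script the step of picking the vmid out of the getallvms listing. The minimal sketch below assumes that each VM row starts with the numeric Vmid, followed by the VM name and then the [datastore] path. The sample text is made up, so check the real output on your ESXi build. find_vmid is a hypothetical helper.

```python
import re

# Made-up listing in the assumed `vim-cmd vmsvc/getallvms` layout
SAMPLE_LISTING = """Vmid    Name           File                                Guest OS            Version
3       openwrt        [datastore1] openwrt/openwrt.vmx    otherLinux64Guest   vmx-14
4       debian test    [datastore1] debian/debian.vmx      debian10_64Guest    vmx-14
"""

# Vmid, then the (possibly multi-word) name up to the "[datastore]" column
ROW = re.compile(r"^(\d+)\s+(.+?)\s+\[")


def find_vmid(listing, name):
    """Return the vmid of the VM called `name`, or None if it is not listed."""
    for line in listing.splitlines():
        m = ROW.match(line)
        if m and m.group(2) == name:
            return int(m.group(1))
    return None
```

On the host you could pass it subprocess.getoutput("vim-cmd vmsvc/getallvms") and build off_cmd and on_cmd from the result.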
V. Log file
The log looks like this:
2021/05/23 22:01:54 -> Start =====
2021/05/23 22:01:58 -> [1/40] 220V Power maybe off,checking again after 10 seconds!
2021/05/23 22:02:02 -> [1/4] Network maybe disconnected!(reboot 0/30)
2021/05/23 22:02:08 -> [2/4] Network maybe disconnected!(reboot 0/30)
2021/05/23 22:02:18 -> [3/4] Network maybe disconnected!(reboot 0/30)
2021/05/23 22:02:28 -> [4/4] Network maybe disconnected!(reboot 0/30)
2021/05/23 22:02:28 -> openwrt_reset!
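Each entry consists of a timestamp, an arrow and a message, which makes the log easy to summarize. The minimal sketch below counts start-ups, OpenWrt resets, reboots and shutdowns, matching the exact messages the script writes. The function count_events is hypothetical. The regex tolerates extra spaces after "->", because the datefmt already ends in a space before the format string adds another.

```python
import re

# "YYYY/MM/DD HH:MM:SS ->" followed by the message (spacing may vary)
LOG_LINE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ->\s*(.*?)\s*$")

# Messages written by power_and_network.py -> counter name
EVENTS = {
    "Start =====": "start",
    "openwrt_reset!": "openwrt_reset",
    "Reboot!": "reboot",
    "Poweroff!": "poweroff",
}


def count_events(lines):
    """Count the script's key events in an iterable of log lines."""
    counts = dict.fromkeys(EVENTS.values(), 0)
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(2) in EVENTS:
            counts[EVENTS[m.group(2)]] += 1
    return counts
```

Usage: with open("log.txt", encoding="utf-8") as f: print(count_events(f)).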
VI. Starting the script on a schedule
Set up a cron job: open /etc/rc.local.d/local.sh with vi and add the following commands just before the exit 0 at the end of the file:
/bin/kill $(cat /var/run/crond.pid)
/bin/echo "*/1 * * * * /vmfs/volumes/datastore1/python_scripts/power_and_network/power_and_network.py > /dev/null &" >> /var/spool/cron/crontabs/root
/usr/lib/vmware/busybox/bin/busybox crond
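What the commands do:
Line 1: cat /var/run/crond.pid prints the PID of crond, and kill then terminates that crond process.
Line 2: echo appends our scheduled job to /var/spool/cron/crontabs/root. "*/1 * * * * xxxxxx" means run xxxxxx once every minute.
Pitfall: be sure to append > /dev/null & so the script runs in the background!
Line 3: start crond again.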
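Because cron starts a fresh copy every minute, the script relies on process_detect() to make the extra copies exit immediately. In effect, cron acts as a watchdog that restarts the script if it ever dies. process_detect() counts ps output with grep. The author does not use the technique below, which takes an exclusive flock on a lock file instead. Unlike the grep count, it cannot be fooled by unrelated processes whose command line happens to contain the file name. This is a minimal sketch that assumes the ESXi Python build includes the fcntl module. The function name and the lock path are hypothetical.

```python
import fcntl
import os


def acquire_single_instance_lock(lock_path):
    """Try to take an exclusive, non-blocking flock on lock_path.

    Returns the open file object (keep it referenced: the lock lasts until
    it is closed or the process exits), or None if another process already
    holds the lock.
    """
    f = open(lock_path, "a")  # "a" so a failed attempt does not truncate the file
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:  # BlockingIOError: another instance holds the lock
        f.close()
        return None
    # We own the lock: record our PID (purely informational)
    f.truncate(0)
    f.write("%d\n" % os.getpid())
    f.flush()
    return f
```

In main(), the call could become lock = acquire_single_instance_lock("/tmp/power_and_network.lock"), followed by sys.exit() when it returns None. Keep lock referenced until the program ends.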
Finally, run the /sbin/auto-backup.sh script so that the changes to local.sh are actually saved. Otherwise they may be reverted after a reboot.