Nginx High Availability with keepalived

1. Environment

IP              Service              Role
192.168.1.101   nginx + keepalived   master
192.168.1.102   nginx + keepalived   backup
192.168.1.103   virtual IP (VIP)

  • Notes: OS is CentOS 6.10. There is exactly one master; any number of backups may be configured. The virtual IP (VIP) 192.168.1.103 is the address that serves client traffic, also called the floating IP.

How the components relate to each other is shown in the accompanying diagram (image omitted here). Installing tomcat is outside the scope of this post.

2. Installing and Configuring nginx

2.1. Install nginx

Install nginx on all nodes, master and backup. First configure the official nginx repository:

vim /etc/yum.repos.d/nginx.repo

Add the following:

[nginx]
name=nginx repo
baseurl=http://nginx.org/packages/centos/$releasever/$basearch/
gpgcheck=0
enabled=1

Install:

yum install nginx -y
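
A quick sanity check that the package landed (standard rpm/nginx commands):

rpm -q nginx    # confirm the package is installed
nginx -v        # print the installed version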

2.2. Master Node Configuration

2.2.1. Remove Unneeded Configuration (Optional)

vim /etc/nginx/conf.d/default.conf

Change it to the following:

server {
    listen       80;
    server_name  localhost;

    access_log  /var/log/nginx/host.access.log  main;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }

    error_page  404              /404.html;

    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }
}
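
After saving the file it is worth validating the syntax; once nginx is running, a reload applies the change without dropping connections:

nginx -t               # test the configuration syntax
service nginx reload   # apply the change (CentOS 6 init script)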

2.2.2. Change the Default nginx Page

vim /usr/share/nginx/html/index.html

Modify only line 14, as shown below:

 1 <!DOCTYPE html>
 2 <html>
 3 <head>
 4 <title>Welcome to nginx!</title>
 5 <style>
 6     body {
 7         width: 35em;
 8         margin: 0 auto;
 9         font-family: Tahoma, Verdana, Arial, sans-serif;
10     }
11 </style>
12 </head>
13 <body>
14 Welcome to nginx! test keepalived master!
15 <p>If you see this page, the nginx web server is successfully installed and
16 working. Further configuration is required.</p>
17 
18 <p>For online documentation and support please refer to
19 <a rel="nofollow" href="http://nginx.org/">nginx.org</a>.<br/>
20 Commercial support is available at
21 <a rel="nofollow" href="http://nginx.com/">nginx.com</a>.</p>
22
23 <p><em>Thank you for using nginx.</em></p>
24 </body>
25 </html>

2.3. Backup Node Configuration

Only change line 14 of /usr/share/nginx/html/index.html as shown below; everything else is identical to the master.

Welcome to nginx! test keepalived backup!
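
At this point each node serves a distinguishable page. Assuming nginx is already started on both nodes, a quick check from any machine in the segment:

curl http://192.168.1.101    # expect: Welcome to nginx! test keepalived master!
curl http://192.168.1.102    # expect: Welcome to nginx! test keepalived backup!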

3. The keepalived Service

3.1. What Is keepalived?

Keepalived does two things: on one hand it configures and manages LVS and health-checks the nodes behind it; on the other hand it provides high availability for system network services, protecting against single points of failure.

3.2. How keepalived Works

keepalived is built on VRRP, the Virtual Router Redundancy Protocol. VRRP can be seen as a protocol for making routers highly available: N routers providing the same function form a group with one master and several backups. The master holds a VIP that serves external traffic (the other machines on the LAN use this VIP as their default route) and sends multicast VRRP advertisements to tell the backups that it is still alive. When the backups stop receiving those advertisements, they conclude that the master is down and elect a new master according to the VRRP priorities. This keeps the router highly available and the service continuous; at best, takeover completes in under one second.
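
This is easy to observe in practice: VRRP is IP protocol 112 and advertisements go to multicast group 224.0.0.18, so capturing on the interface carrying VRRP (eth1 in this lab) shows the master's heartbeat. The exact output format varies with the tcpdump version:

tcpdump -i eth1 -nn vrrp
# roughly one line per advert_int, e.g.:
# IP 192.168.1.101 > 224.0.0.18: VRRPv2, Advertisement, vrid 51, prio 100, ...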

3.3. keepalived's Three Main Modules: core, check, and vrrp

The core module is the heart of keepalived; it starts and maintains the main process and loads and parses the global configuration file.

The check module performs health checks and supports the common check types.

The vrrp module implements the VRRP protocol.

3.4. keepalived vs. ZooKeeper for High Availability

  • Keepalived:
    • Pros: simple. High availability and master/backup failover come essentially for free, with no changes required at the application layer, and downtime during a failover is short.
    • Cons: that same simplicity. VRRP and the master/backup switch involve no complex logic, so certain scenarios are not handled; for example, if the communication link between master and backup breaks, a split-brain occurs. keepalived is also not well suited to load balancing.
  • ZooKeeper:
    • Pros: supports both high availability and load balancing, and is itself a distributed service.
    • Cons: tightly coupled to the application. The business code must implement the ZooKeeper logic, e.g. registering names and looking up the service addresses behind them.

4. Configuring keepalived

4.1. Install keepalived

Install on all nodes, master and backup:

[root@node1 ~]# yum install keepalived -y
[root@node1 ~]# rpm -ql keepalived
/etc/keepalived
/etc/keepalived/keepalived.conf     # main keepalived configuration file
/etc/rc.d/init.d/keepalived         # service start script (releases before CentOS 7 start it via init.d scripts; CentOS 7 and later via systemd)
/etc/sysconfig/keepalived
/usr/bin/genhash
/usr/libexec/keepalived
/usr/sbin/keepalived
/usr/share/doc/keepalived-1.2.13
... ...
/usr/share/man/man1/genhash.1.gz
/usr/share/man/man5/keepalived.conf.5.gz
/usr/share/man/man8/keepalived.8.gz
/usr/share/snmp/mibs/KEEPALIVED-MIB.txt

4.2. Default Configuration, Annotated

[root@node1 keepalived]# cat /etc/keepalived/keepalived.conf
! Configuration File for keepalived

# global definitions
global_defs {
   notification_email {          # addresses to notify when keepalived raises an event (such as a failover)
     acassen@firewall.loc        # alert recipients; one per line, several allowed. Requires a working local sendmail service
     failover@firewall.loc
     sysadmin@firewall.loc
   }
   notification_email_from Alexandre.Cassen@firewall.loc   # From: address used on notification mail sent during events such as failover
   smtp_server 192.168.200.1    # SMTP server used to send the mail
   smtp_connect_timeout 30      # timeout for connecting to the SMTP server
   router_id LVS_DEVEL          # identifier of the machine running keepalived, usually the hostname; shown in the subject of alert mail when a failure occurs
   vrrp_skip_check_adv_addr
   vrrp_strict
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}

# virtual IP (VRRP) configuration
vrrp_instance VI_1 {        # within the same virtual_router_id, the node with the highest priority becomes master and takes over the VIP; when it fails, the node with the next-highest priority takes over
    state MASTER
	# role of this keepalived node: MASTER marks the primary server, BACKUP the standby.
	# note that state only sets the initial state of the instance; after startup the working master is still decided by the priority-based election.
	# if this node is configured as MASTER but its priority is lower than another node's, the other node sees the lower priority in the advertisements and preempts to become MASTER.

    interface eth1          # network interface the VIP is bound to; the same interface that carries this host's own IP address (eth1 here)
    virtual_router_id 51    # virtual router ID; a number that uniquely identifies one VRRP instance, so MASTER and BACKUP in the same vrrp_instance must use the same value
    priority 100            # priority: the higher the number, the higher the priority; within one vrrp_instance the MASTER's priority must be greater than every BACKUP's (valid range 1-254)
    advert_int 1            # interval, in seconds, between the VRRP advertisements the MASTER sends to the BACKUPs
    authentication {        # authentication type and password; must match on master and backup
        auth_type PASS      # VRRP authentication type: PASS or AH
        auth_pass 1111      # VRRP password; MASTER and BACKUP in the same vrrp_instance must share it to communicate
    }

    ## add the track_script block inside the instance block
    track_script {
        chk_nginx           ## the nginx monitoring script
    }

    virtual_ipaddress {     # VRRP HA virtual addresses; for multiple VIPs, continue on new lines
        192.168.200.16
        192.168.200.17
        192.168.200.18
    }
}

virtual_server 192.168.200.100 443 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    nat_mask 255.255.255.0
    persistence_timeout 50
    protocol TCP

    real_server 192.168.201.100 443 {
        weight 1
        SSL_GET {
            url {
              path /
              digest ff20ad2481f97b1754ef3e12ecd3a9cc
            }
            url {
              path /mrtg/
              digest 9b3a0c85a887a256d6939da88aabd8cd
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}

virtual_server 10.10.10.2 1358 {
    delay_loop 6
    lb_algo rr
    lb_kind NAT
    persistence_timeout 50
    protocol TCP

    sorry_server 192.168.200.200 1358

    real_server 192.168.200.2 1358 {
        weight 1
        HTTP_GET {
            url {
              path /testurl/test.jsp
              digest 640205b7b0fc66c1ea91c463fac6334d
            }
            url {
              path /testurl2/test.jsp
              digest 640205b7b0fc66c1ea91c463fac6334d
            }
            url {
              path /testurl3/test.jsp
              digest 640205b7b0fc66c1ea91c463fac6334d
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }

    real_server 192.168.200.3 1358 {
        weight 1
        HTTP_GET {
            url {
              path /testurl/test.jsp
              digest 640205b7b0fc66c1ea91c463fac6334c
            }
            url {
              path /testurl2/test.jsp
              digest 640205b7b0fc66c1ea91c463fac6334c
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}

virtual_server 10.10.10.3 1358 {
    delay_loop 3
    lb_algo rr
    lb_kind NAT
    nat_mask 255.255.255.0
    persistence_timeout 50
    protocol TCP

    real_server 192.168.200.4 1358 {
        weight 1
        HTTP_GET {
            url {
              path /testurl/test.jsp
              digest 640205b7b0fc66c1ea91c463fac6334d
            }
            url {
              path /testurl2/test.jsp
              digest 640205b7b0fc66c1ea91c463fac6334d
            }
            url {
              path /testurl3/test.jsp
              digest 640205b7b0fc66c1ea91c463fac6334d
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }

    real_server 192.168.200.5 1358 {
        weight 1
        HTTP_GET {
            url {
              path /testurl/test.jsp
              digest 640205b7b0fc66c1ea91c463fac6334d
            }
            url {
              path /testurl2/test.jsp
              digest 640205b7b0fc66c1ea91c463fac6334d
            }
            url {
              path /testurl3/test.jsp
              digest 640205b7b0fc66c1ea91c463fac6334d
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}

4.3. Master Load Balancer Configuration

[root@master ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   router_id LVS_01
}

## keepalived runs this script periodically and adjusts the vrrp_instance priority based on the result.
#  if the script exits 0 and weight > 0, the priority is increased accordingly.
#  if the script exits non-zero and weight < 0, the priority is decreased accordingly.
#  in all other cases the priority stays at the configured value (priority in the config file).
vrrp_script chk_nginx {
    script "/etc/keepalived/nginx_check.sh"      # path of the nginx health-check script
    interval 2          # run the script every 2 seconds
    weight -5           # priority change on failure: a non-zero exit lowers the priority by 5
    fall 2              # require 2 consecutive failures before the check counts as failed (weight is then applied)
    rise 1              # a single success marks the check healthy again (the weight penalty is removed)
}

vrrp_instance VI_1 {
    state MASTER
    interface eth1
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }

    ## add the track_script block inside the instance block
    track_script {     # runs the monitoring script. Note: this block must live inside vrrp_instance rather than directly after the vrrp_script block (a pitfall hit during testing), otherwise nginx monitoring silently fails!
        chk_nginx      # references the VRRP script by the name defined in vrrp_script; keepalived runs it periodically, adjusts the priority, and can ultimately trigger a master/backup switch
    }

    virtual_ipaddress {
        192.168.1.103
    }
}
... ...
[root@master ~]#
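
The arithmetic here is worth working through: with the master at priority 100, weight -5 and fall 2, two consecutive check failures only lower the master's effective priority to 100 - 5 = 95, which is still above the backup's 90, so the weight change alone never moves the VIP. In this setup the failover is actually driven by nginx_check.sh killing keepalived when nginx cannot be restarted (section 5.1). If you wanted the priority mechanism itself to trigger the switch, the weight would have to exceed the priority gap; a sketch:

vrrp_script chk_nginx {
    script "/etc/keepalived/nginx_check.sh"
    interval 2
    weight -20    # 100 - 20 = 80 < 90, so the backup preempts while the check fails
    fall 2
    rise 1
}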

4.4. Backup Load Balancer Configuration

[root@slave ~]# vim /etc/keepalived/keepalived.conf
! Configuration File for keepalived

global_defs {
   router_id LVS_02
}

vrrp_script chk_nginx {
    script "/etc/keepalived/nginx_check.sh"
    interval 3
    weight -20
}

vrrp_instance VI_1 {
    state BACKUP    # the standby's state must be BACKUP; keepalived only recognizes MASTER and BACKUP (SLAVE is not a valid state)
    interface eth1
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }

    # add the track_script block inside the instance block
    track_script {
        chk_nginx  # the nginx monitoring script
    }
    }

    virtual_ipaddress {
        192.168.1.103
    }
}
... ...
[root@slave ~]#

5. Testing

5.1. Write the nginx Monitoring Script

On every node, write the nginx health-check script /etc/keepalived/nginx_check.sh (already referenced in keepalived.conf above). What the script must do: if nginx has stopped, try to start it; if it cannot be started, kill the local keepalived process so that keepalived binds the virtual IP to the BACKUP machine. The content:

[root@master ~]# vim /etc/keepalived/nginx_check.sh
#!/bin/bash
set -x

# number of running nginx processes
nginx_status=`ps -C nginx --no-header |wc -l`
if [ ${nginx_status} -eq 0 ];then
    service nginx start
    sleep 1

    if [ `ps -C nginx --no-header |wc -l` -eq 0 ];then    # nginx failed to restart
        echo -e "$(date):  nginx is not healthy, try to killall keepalived!"  >> /etc/keepalived/keepalived.log
        killall keepalived
    fi
fi
echo $?    # print the last exit status (visible with set -x tracing)
[root@master ~]# chmod +x /etc/keepalived/nginx_check.sh
[root@master ~]# ll /etc/keepalived/nginx_check.sh
-rwxr-xr-x 1 root root 338 2019-02-15 14:11 /etc/keepalived/nginx_check.sh
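
The script can be exercised by hand before trusting it with failover (assuming nginx is currently running):

[root@master ~]# service nginx stop
[root@master ~]# /etc/keepalived/nginx_check.sh    # should start nginx again
[root@master ~]# service nginx status              # expect: nginx is running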

5.2. Start nginx and keepalived on All Nodes

  • Start nginx:
service nginx start
  • Start keepalived. The relevant commands are as follows (systemd equivalents for CentOS 7+ are shown after this list):
chkconfig keepalived on    # start keepalived at boot
service keepalived start   # start the service
service keepalived stop    # stop the service
service keepalived restart # restart the service
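
On CentOS 7 and later (which, as noted in section 4.1, use systemd), the equivalents are:

systemctl enable keepalived     # start at boot
systemctl start keepalived      # start the service
systemctl stop keepalived       # stop the service
systemctl restart keepalived    # restart the service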

Once keepalived is running normally, it has three processes: a parent process that supervises its children, a vrrp child process, and a checkers child process.

[root@master ~]# ps -ef | grep keepalived
root       3653      1  0 14:18 ?        00:00:00 /usr/sbin/keepalived -D
root       3654   3653  0 14:18 ?        00:00:02 /usr/sbin/keepalived -D
root       3655   3653  0 14:18 ?        00:00:03 /usr/sbin/keepalived -D
root       7481   3655  0 15:19 ?        00:00:00 /usr/sbin/keepalived -D
root       7483   1323  0 15:19 pts/0    00:00:00 grep --color=auto keepalived
[root@master ~]#

5.3. Master Load Balancer IP Information: 192.168.1.101

[root@master ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether b0:51:8e:01:9b:b0 brd ff:ff:ff:ff:ff:ff		 
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:20:ae:75 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.101/24 brd 192.168.1.255 scope global eth1
    inet 192.168.1.103/32 scope global eth1
    inet6 fe80::20c:29ff:fe20:ae75/64 scope link
       valid_lft forever preferred_lft forever
[root@master ~]#

5.4. Backup Load Balancer IP Information: 192.168.1.102

[root@slave ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether b0:51:8e:01:9b:b0 brd ff:ff:ff:ff:ff:ff		 
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:7d:6a:24 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.102/24 brd 192.168.1.255 scope global eth1
    inet6 fe80::20c:29ff:fe7d:6a24/64 scope link
       valid_lft forever preferred_lft forever
[root@slave ~]#

As shown above, the virtual IP (VIP) is currently active on 192.168.1.101.
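
For the failover tests in the next section, a convenient way to watch which node is serving the VIP is a small loop run from a third machine (a throwaway helper, assuming curl is installed):

while true; do
    # line 14 of index.html identifies the node that answered
    curl -s http://192.168.1.103 | grep keepalived
    sleep 1
done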

5.5. Failover Test

Access nginx through the VIP (192.168.1.103). The page served is the master's ("Welcome to nginx! test keepalived master!"), i.e. the active nginx is the one on 192.168.1.101 (screenshot omitted). Now stop keepalived on 192.168.1.101:

[root@master ~]# service keepalived stop
Stopping keepalived:                                       [  OK  ]

Access nginx through the VIP (192.168.1.103) again: the backup's page ("test keepalived backup!") is now served, i.e. the active nginx is the one on 192.168.1.102 (screenshot omitted). Next, start keepalived on 192.168.1.101 again:

[root@master ~]# service keepalived start
Starting keepalived:                                       [  OK  ]

Access the service through the VIP once more: the master's page is served again (screenshot omitted), because the higher-priority master preempts the VIP when it comes back.

Finally, stop nginx to check whether the nginx monitoring script works:

[root@master ~]# service nginx status
nginx (pid  19617) is running...
[root@master ~]# service nginx stop
Stopping nginx:                                            [  OK  ]
[root@master ~]# service nginx status
nginx (pid  23595) is running...
[root@master ~]#

nginx is immediately running again under a new pid (19617 to 23595): the check script noticed the stop and restarted nginx within its 2-second interval.

At this point, the highly available web load-balancing setup with Keepalived + Nginx is complete!

5.6. keepalived Watchdog Script

Since the keepalived service itself may also stop, write a keepalived watchdog script and run it from cron; do this on all servers.

[root@master ~]# vim /opt/scripts/keepalived_monitor.sh
#!/bin/bash
set -x

# number of running keepalived processes
keepalived_status=`ps -C keepalived --no-header |wc -l`
if [ ${keepalived_status} -eq 0 ];then
    echo -e "$(date): keepalived is not healthy!\n"  >> /etc/keepalived/keepalived.log
    service keepalived start
    sleep 1

    if [ `ps -C keepalived --no-header |wc -l` -eq 0 ];then    # keepalived failed to restart
        echo -e "$(date): failed to restart keepalived!\n"  >> /etc/keepalived/keepalived.log
    fi
fi
echo $?    # print the last exit status (visible with set -x tracing)
[root@master ~]# chmod +x /opt/scripts/keepalived_monitor.sh
[root@master ~]# echo "* * * * *  /opt/scripts/keepalived_monitor.sh  > /dev/null 2>&1" >> /var/spool/cron/root    # run the watchdog every minute from root's crontab
[root@master ~]# crontab -l
*/30 * * * * /usr/sbin/ntpdate ntp1.aliyun.com > /dev/null 2>&1;/sbin/hwclock -w
* * * * *  /opt/scripts/keepalived_monitor.sh  > /dev/null 2>&1
[root@master ~]#

5.7. keepalived Logs

By default keepalived logs to the system log, /var/log/messages. If master/backup failover does not happen, analyze this log:

tail -100f /var/log/messages
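
To keep keepalived's messages in a dedicated file instead, a common approach on CentOS 6 (a sketch; the facility number local0 is an arbitrary choice) is to pass a syslog facility to the daemon and route it in rsyslog:

# /etc/sysconfig/keepalived: log to syslog facility local0
KEEPALIVED_OPTIONS="-D -S 0"

# /etc/rsyslog.conf: send local0 to its own file
local0.*    /var/log/keepalived.log

# apply the change
service rsyslog restart
service keepalived restart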

6. Troubleshooting

6.1. Consequences of Reusing the Same virtual_router_id Across Servers in One Network Segment

$ tail -1000f /var/log/messages |grep VRRP
Jun 12 17:06:31 gj-dev-192-168-145-112 Keepalived_vrrp[14755]: bogus VRRP packet received on eth0 !!!
Jun 12 17:06:31 gj-dev-192-168-145-112 Keepalived_vrrp[14755]: VRRP_Instance(VIP_W_G2) ignoring received advertisment...
Jun 12 17:06:32 gj-dev-192-168-145-112 Keepalived_vrrp[14755]: bogus VRRP packet received on eth0 !!!
Jun 12 17:06:32 gj-dev-192-168-145-112 Keepalived_vrrp[14755]: VRRP_Instance(VIP_W_G2) ignoring received advertisment...
Jun 12 17:06:33 gj-dev-192-168-145-112 Keepalived_vrrp[14755]: bogus VRRP packet received on eth0 !!!
Jun 12 17:06:33 gj-dev-192-168-145-112 Keepalived_vrrp[14755]: VRRP_Instance(VIP_W_G2) ignoring received advertisment...
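
These messages mean the instance received advertisements for its VRID that do not match its own configuration, which typically happens when an unrelated keepalived pair on the same segment reuses the same virtual_router_id. To find the offender, capture VRRP traffic, note which source IPs advertise the conflicting vrid, and then give each cluster a unique ID:

tcpdump -i eth0 -nn vrrp
# any source IP advertising the same "vrid" that is not part of this
# cluster is a conflicting instance; change its virtual_router_id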

References

  • https://www.cnblogs.com/kevingrace/p/6138185.html
  • nginx official documentation
  • keepalived official documentation

END