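First:
When nginx is used as a load balancer, its communication model resembles LVS-NAT. In some cases, as the number of cluster nodes grows, nginx becomes the bottleneck for network traffic, because every response packet must pass through nginx. A 400 MHz processor can handle a 100 Mbps connection, so in general the network is more likely than the LVS Director to become the bottleneck. In that situation, LVS-DR is more reliable than nginx as the load balancer.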
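Feasibility of nginx + keepalived:
Keepalived is highly reliable software for Linux that implements VRRP router backup. A service design based on Keepalived can hand the IP over almost instantly and seamlessly when the master or backup server fails. On the Sina dynamic application platform, Keepalived combined with LVS has been very stable in production.
Nginx is an HTTP server built on the epoll model of the Linux 2.6 kernel. Unlike Apache's process-forking model, Nginx uses a Master + Slave multi-process model (nginx itself calls the Slave processes workers) with very stable built-in child-process management. Under this model the Master process never handles requests; it only dispatches tasks, which keeps the Master process itself highly reliable. All signals to the Slave processes come from the Master, and any Slave task that times out is terminated by the Master, so this is a non-blocking task model. On the Sina blog platform, after nearly 8 months of operation, there has been no service interruption caused by the master process exiting or a child process hanging.
In production, the loss caused by any machine going down must be kept to a minimum. Traditionally, servers are placed directly behind a layer 4/7 switch to avoid service interruptions caused by server or server-software failures. Under the current business model there are many high-concurrency services (small JS files, fast dynamic interfaces, Nginx layer-7 services), and all of them want every socket operation to finish as quickly as possible to reduce the time users wait. Because the layer 4/7 switches serve many products across the whole Sina site, they often become a constraint for high-concurrency applications. This led to the idea of using Keepalived + Nginx for two-node cross hot standby, served through public IPs with DNS round-robin. The scheme applies to any environment that needs high-concurrency service: the fewer socket communication layers there are, the faster data reaches the user's desktop.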
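1. Server IP liveness detection:
Server IP liveness detection is done by Keepalived itself. The two servers are configured so that each is the other's Keepalived backup; if either machine fails, the other takes over its IP.
2. Application service liveness detection:
A healthy service needs more than a live server; the application itself must also be alive. Apache servers used to stop answering HTTP because of hung processes, and that was a consequence of Apache's process model. Under Nginx's process model, the service can be considered healthy as long as the Nginx processes are alive, so checking that the process is alive is enough to check that the service is alive. The health of the Slave processes is handled by Nginx's own Master process, and the Master process can be monitored by a dedicated script on the server: as soon as the Nginx Master process is found to have exited abnormally, Nginx is restarted immediately. This scheme has been running on the Sina blog system for nearly half a year.
3. Online server maintenance:
Keepalived's service IPs are managed through its configuration file, and its own process determines whether the server is alive. To maintain a server online, simply stop the Keepalived process on the machine being maintained; the other server then takes over all of that machine's applications.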
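The master-process monitoring described in item 2 could look roughly like the sketch below. It is not the script used at Sina, only an illustration: the function names master_alive and watch_once are made up, and the PID-file path and restart command in the example call are assumptions. It should run as root, since kill -0 reports failure for a process owned by another user.

```shell
#!/bin/sh
# Sketch of a watchdog for the nginx master process (names and paths are assumptions).

# Return 0 if the PID stored in the given file belongs to a live process.
master_alive() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;    # empty or non-numeric content
    esac
    kill -0 "$pid" 2>/dev/null      # signal 0 only tests for existence
}

# Run the restart command (every argument after the PID file)
# only when the master process is gone.
watch_once() {
    pidfile=$1
    shift
    if master_alive "$pidfile"; then
        return 0
    fi
    echo "nginx master not running, restarting" >&2
    "$@"
}

# Example call with hypothetical paths:
# watch_once /usr/local/nginx/logs/nginx.pid /etc/init.d/nginxd start
```

Checking the PID with signal 0 rather than only testing that the PID file exists avoids treating a stale file left behind by a crash as a running server.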
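The feasibility discussion above is reposted from another blog; what follows are my configuration notes based on that scheme. I have not yet worked out how keepalived prevents split-brain, so for now I personally think heartbeat is the more reliable choice for two-node hot standby. A heartbeat configuration for two-node hot standby appears later in the article.
Option 1: Using keepalived for hot standby of the nginx load balancer
Keepalived provides a robust health-checking mechanism for LVS clusters. It implements a multi-layer (L3, L4, L5/7) fault-tolerant health-check framework: when a server in the Server Pool goes down, it notifies the kernel through a socket to remove that server from the Server Pool, which further improves the high availability of the Linux Virtual Server project. It also provides an independent VRRPv2 stack to handle director failover promptly, covering both health checks for LVS cluster nodes and failover of LVS directors.
Here we use only keepalived's VRRP function, so that the IP is handed over almost instantly when the master or backup server fails.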
1 Installation
./configure --prefix=/usr/local/keepalived
make
make install
2 Configuration file
vi /usr/local/keepalived/etc/keepalived/keepalived.conf
Master configuration file:
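vrrp_instance VI_INET1 {
    state MASTER #(MASTER on the primary machine, BACKUP on the standby)
    interface eth0 #(network interface used for HA monitoring)
    mcast_src_ip 192.168.7.191 #(source address for VRRP multicast; use each machine's own address on master and backup; must not equal virtual_ipaddress)
    track_interface { # other interfaces whose state should be monitored
        eth1
    }
    virtual_router_id 53 #(must be identical on master and backup)
    priority 200 #(master and backup use different priorities; the master's is larger; a larger value means higher priority)
    advert_int 5 #(VRRP multicast advertisement interval in seconds)
    authentication {
        auth_type PASS #(VRRP authentication type; the keyword is upper case)
        auth_pass yourpass #(VRRP password)
    }
    virtual_ipaddress {
        192.168.7.100 #(VRRP HA virtual address)
    }
}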
Slave configuration file:
vrrp_instance VI_INET1 {
    state BACKUP
    interface eth0
    track_interface {
        eth1
    }
    virtual_router_id 53
    priority 100
    advert_int 5
    authentication {
        auth_type PASS
        auth_pass yourpass
    }
    virtual_ipaddress {
        192.168.7.100
    }
}
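The two rules stated in the master configuration's comments (identical virtual_router_id, higher priority on the master) are easy to get wrong when editing two files by hand. The following is a minimal sketch of a sanity check, assuming each file contains a single vrrp_instance block; conf_value and check_pair are hypothetical helper names, not part of keepalived.

```shell
#!/bin/sh
# Sketch: compare a master and a backup keepalived.conf (single vrrp_instance assumed).

# Print the first value of a keyword (e.g. priority) found in a config file.
conf_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Return 0 when both files share virtual_router_id and the master priority is higher.
check_pair() {
    master=$1
    backup=$2
    [ "$(conf_value virtual_router_id "$master")" = "$(conf_value virtual_router_id "$backup")" ] || {
        echo "virtual_router_id differs" >&2
        return 1
    }
    [ "$(conf_value priority "$master")" -gt "$(conf_value priority "$backup")" ] || {
        echo "master priority is not higher than backup priority" >&2
        return 1
    }
}

# Example call with hypothetical file names:
# check_pair master.conf backup.conf && echo "pair looks consistent"
```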
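track_interface lists the network interfaces on this Linux machine that you want keepalived to monitor; if any one of them fails, keepalived treats the whole router as failed.
Startup
Check the IP addresses before starting. Note: ifconfig cannot be used for this (it does not list the additional address that keepalived adds to the interface).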
[root@real1 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:c8:9b:3c brd ff:ff:ff:ff:ff:ff
inet 192.168.7.191/24 brd 192.168.7.255 scope global eth0
/usr/local/keepalived/sbin/keepalived -D -f /usr/local/keepalived/etc/keepalived/keepalived.conf
After starting:
[root@real1 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:c8:9b:3c brd ff:ff:ff:ff:ff:ff
inet 192.168.7.191/24 brd 192.168.7.255 scope global eth0
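inet 192.168.7.100/32 scope global eth0
When the Master fails, the Backup learns of it through the multicast address 224.0.0.18 (the default VRRP address), because the Master's advertisements stop arriving there, and it takes over the address 192.168.7.100.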
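To confirm after a failover test which machine currently holds the virtual address, the `ip addr show` output can be checked mechanically instead of by eye. This is a small sketch under the assumption that the VIP is 192.168.7.100 on eth0 as in the configuration above; has_vip is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: detect whether a given IPv4 address appears in `ip addr show` output.

# Reads `ip addr show` (or `ip -o -4 addr show`) output on stdin;
# returns 0 if the address given as $1 is configured, 1 otherwise.
has_vip() {
    awk -v vip="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == "inet") {
                    split($(i + 1), a, "/")   # strip the /prefix length
                    if (a[1] == vip) found = 1
                }
        }
        END { exit (found ? 0 : 1) }'
}

# Example call:
# ip addr show dev eth0 | has_vip 192.168.7.100 && echo "this node holds the VIP"
```

The address is compared as a whole field, so 192.168.7.10 does not match a line that contains 192.168.7.100.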
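Do not forget to add the following to the iptables configuration:
-I INPUT -s <master/backup server IP> -d 224.0.0.18 -j ACCEPT
An HA configuration from an English-language source: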
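As a sketch of how that rule might be applied for both peers, the helper below prints one iptables command per peer address instead of running it, so the result can be reviewed first. The function name vrrp_accept_rules and the second address in the example are assumptions; the added `-p 112` narrows the rule to VRRP, which is IP protocol number 112.

```shell
#!/bin/sh
# Sketch: print (not execute) iptables commands that accept VRRP adverts from each peer.

vrrp_accept_rules() {
    for peer in "$@"; do
        # 224.0.0.18 is the VRRP multicast group; 112 is the VRRP protocol number.
        printf 'iptables -I INPUT -p 112 -s %s -d 224.0.0.18 -j ACCEPT\n' "$peer"
    done
}

# Example call (192.168.7.192 is a made-up backup address):
# vrrp_accept_rules 192.168.7.191 192.168.7.192
# Pipe the output to `sh` as root once it has been reviewed.
```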
Using keepalived to failover routers
vrrpd is a router failover daemon protocol. While keepalived uses it to failover LVS, vrrpd can be used independently of LVS
to failover a pair of routers.
Graeme Fowler graeme (at) graemef (dot) net 11 Sep 2007
config for the ACTIVE router looks like:
# keepalived.conf for HA "routers"
global_defs {
    notification_email {
        recipient@mail.domain
    }
    notification_email_from root@fqdn.of.machine
    smtp_server 1.2.3.4
    smtp_connect_timeout 60
    router_id router_1
}
vrrp_script check_running {
    script "/usr/local/bin/check_running"
    interval 10
    weight 10
}
vrrp_script always_succeed {
    script "/bin/date"
    interval 10
    weight 10
}
vrrp_script always_fail {
    script "/usr/local/bin/always_fail"
    interval 10
    weight 10
}
vrrp_instance ROUTER_1 {
    state MASTER
    smtp_alert
    interface eth0
    virtual_router_id 101
    priority 100
    advert_int 3
    authentication {
        auth_type PASS
        auth_pass whatever
    }
    virtual_ipaddress {
        1.1.1.1
    }
    track_script {
        check_running weight 20
    }
}
...the corresponding config for the BACKUP looks like:
# keepalived.conf for HA "routers"
global_defs {
    notification_email {
        recipient@mail.domain
    }
    notification_email_from root@fqdn.of.machine
    smtp_server 1.2.3.4
    smtp_connect_timeout 60
    router_id router_2
}
vrrp_script check_running {
    script "/usr/local/bin/check_running"
    interval 10
    weight 10
}
vrrp_script always_succeed {
    script "/bin/date"
    interval 10
    weight 10
}
vrrp_script always_fail {
    script "/usr/local/bin/always_fail"
    interval 10
    weight 10
}
vrrp_instance ROUTER_1 {
    state BACKUP
    smtp_alert
    interface eth0
    virtual_router_id 101
    priority 90
    advert_int 3
    authentication {
        auth_type PASS
        auth_pass whatever
    }
    virtual_ipaddress {
        1.1.1.1
    }
    track_script {
        check_running weight 20
    }
}
i.e. it differs in the "priority" setting of the VRRP definition (90 instead of 100) and there are cosmetic differences to the
name (router_id).
The "check_running" script is simply a wrapper round:
killall -0 procname
if the result code ($?) is 0, it exits with 0. If not, it exits with 1.
If it exits with 1, the effective priority of this machine's VRRP announcements ends up 20 lower than when the check passes - this makes sure that the critical process on
this machine is up, and if it isn't then we play a smaller part in the VRRP adverts (these are derived from a pair of
frontend mail servers).
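The wrapper itself is not shown in the quoted text. A minimal sketch of what /usr/local/bin/check_running might contain, assuming the process to watch is nginx (the process name is an assumption; the original only says procname):

```shell
#!/bin/sh
# Sketch of the check_running wrapper described above.

# Return 0 if at least one process with the given name exists, 1 otherwise.
check_running() {
    procname=$1
    # Signal 0 delivers nothing; killall only reports whether a match exists.
    if killall -0 "$procname" 2>/dev/null; then
        return 0
    else
        return 1
    fi
}

# As a standalone script, the last line would be (process name assumed):
# check_running nginx; exit $?
```

With the priorities in the two configurations above (100 and 90) and a track_script weight of 20, a failing check on the master leaves it 20 below its healthy value, which puts it under the healthy backup, so the backup takes over the address.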
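For the script that checks nginx status, use a ready-made startup script and test whether its status reports running; run the check periodically from crontab.
The nginx startup script, placed at /etc/init.d/nginxd: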
#!/bin/bash
# nginx Startup script for the Nginx HTTP Server
# this script create it by jackbillow at 2007.10.15.
# it is v.0.0.2 version.
# if you find any errors on this scripts,please contact jackbillow.
# and send mail to jackbillow at gmail dot com.
#
# chkconfig: - 85 15
# description: Nginx is a high-performance web and proxy server.
# It has a lot of features, but it's not for everyone.
# processname: nginx
# pidfile: /usr/local/nginx/logs/nginx.pid
# config: /usr/local/nginx/conf/nginx.conf
nginxd=/usr/local/nginx/sbin/nginx
nginx_config=/usr/local/nginx/conf/nginx.conf
# Must match the "pid" directive in nginx.conf (the header above lists a different path).
nginx_pid=/var/run/nginx.pid
RETVAL=0
prog="nginx"
# Source function library.
. /etc/rc.d/init.d/functions
# Source networking configuration.
. /etc/sysconfig/network
# Check that networking is up.
[ "${NETWORKING}" = "no" ] && exit 0
[ -x "$nginxd" ] || exit 0
# Start nginx daemons functions.
start() {
    if [ -e "$nginx_pid" ]; then
        echo "nginx already running...."
        exit 1
    fi
    echo -n $"Starting $prog: "
    daemon $nginxd -c ${nginx_config}
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && touch /var/lock/subsys/nginx
    return $RETVAL
}
# Stop nginx daemons functions.
stop() {
    echo -n $"Stopping $prog: "
    killproc $nginxd
    RETVAL=$?
    echo
    [ $RETVAL = 0 ] && rm -f /var/lock/subsys/nginx "$nginx_pid"
}
# reload nginx service functions.
reload() {
    echo -n $"Reloading $prog: "
    #kill -HUP `cat ${nginx_pid}`
    killproc $nginxd -HUP
    RETVAL=$?
    echo
}
# See how we were called.
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    reload)
        reload
        ;;
    restart)
        stop
        start
        ;;
    status)
        status $prog
        RETVAL=$?
        ;;
    *)
        echo $"Usage: $prog {start|stop|restart|reload|status}"
        exit 1
        ;;
esac
exit $RETVAL
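The crontab-driven check mentioned before the script could be sketched as follows. It relies only on the script's `status` and `start` actions; the function name ensure_running, the file name nginx_watch and the one-minute schedule are assumptions.

```shell
#!/bin/sh
# Sketch: start a service through its init script when "status" says it is not running.

# All arguments form the init-script command; "status" and "start" are appended to it.
ensure_running() {
    if "$@" status >/dev/null 2>&1; then
        return 0            # already running, nothing to do
    fi
    "$@" start
}

# Saved as /usr/local/bin/nginx_watch (hypothetical), the last line would be:
# ensure_running /etc/init.d/nginxd
# and a crontab entry running it every minute:
# * * * * * /usr/local/bin/nginx_watch >/dev/null 2>&1
```

Note that start() in the script above refuses to run while the PID file exists, so if nginx dies and leaves that file behind, the file has to be removed before this check can bring the service back.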