References
From Paxos to ZooKeeper: Distributed Consistency Principles and Practice (<<从PAXOS到ZOOKEEPER分布式一致性原理与实践>>)
zookeeper-3.0.0
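Zookeeper Overview
Zookeeper is a distributed, open-source coordination service for distributed applications. It aims to provide a high-performance, highly available coordination service with strict ordering guarantees (write operations are applied in a strict order).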
Zookeeper Cluster Startup
Startup method and configuration file
The contents of bin/zkServer.sh:
ZOOBIN=`readlink -f "$0"`
ZOOBINDIR=`dirname "$ZOOBIN"`
. $ZOOBINDIR/zkEnv.sh # set up the runtime environment variables
case $1 in
start)
echo -n "Starting zookeeper ... "
java "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
-cp $CLASSPATH $JVMFLAGS org.apache.zookeeper.server.quorum.QuorumPeerMain $ZOOCFG & # start ZooKeeper in the background; the entry class is QuorumPeerMain
echo STARTED
;;
stop)
echo -n "Stopping zookeeper ... "
echo kill | nc localhost $(grep clientPort $ZOOCFG | sed -e 's/.*=//') # send "kill" to the client port to stop the server
echo STOPPED
;;
upgrade)
shift
echo "upgrading the servers to 3.*"
java "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" \
-cp $CLASSPATH $JVMFLAGS org.apache.zookeeper.server.upgrade.UpgradeMain ${@}
echo "Upgrading ... "
;;
restart)
shift
$0 stop ${@}
sleep 3
$0 start ${@} # restart = stop the server, wait 3 seconds, then start it again
;;
status)
STAT=`echo stat | nc localhost $(grep clientPort $ZOOCFG | sed -e 's/.*=//') 2> /dev/null| grep Mode`
if [ "x$STAT" = "x" ]
then
echo "Error contacting service. It is probably not running." # no Mode line came back from "stat": the server is probably down
else
echo $STAT
fi
;;
*)
echo "Usage: $0 {start|stop|restart|status}" >&2
esac
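Both the stop and the status branches locate the client port with grep clientPort $ZOOCFG | sed -e 's/.*=//' before talking to it with nc. A rough Java equivalent of that lookup, as a sketch only (ClientPortLookup is a hypothetical helper, not part of ZooKeeper):

```java
import java.util.List;

// Hypothetical Java equivalent of: grep clientPort $ZOOCFG | sed -e 's/.*=//'
final class ClientPortLookup {
    static int clientPort(List<String> cfgLines) {
        for (String line : cfgLines) {
            if (line.contains("clientPort")) {              // grep clientPort
                String value = line.replaceAll(".*=", ""); // sed: delete everything up to the last '='
                return Integer.parseInt(value.trim());
            }
        }
        throw new IllegalArgumentException("no clientPort line in the configuration");
    }
}
```

Unlike the shell pipeline, which prints every matching line, this sketch stops at the first match.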
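Run zkServer.sh start in a terminal to start Zookeeper. The server reads the configuration file given by $ZOOCFG, which zkEnv.sh sets up; the distribution ships zoo_sample.cfg as the sample configuration. To start a cluster, edit the configuration file so that it lists the IP address and ports of every server in the cluster, and start all servers with the same configuration. A reference configuration: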
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/export/crawlspace/mahadev/zookeeper/server1/data
# the port at which the clients will connect
clientPort=2181
# cluster members: server.<myid>=host:quorumPort:electionPort
server.1=192.168.0.1:2888:3888
server.2=192.168.0.2:2888:3888
server.3=192.168.0.3:2888:3888
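As the comments in the file say, initLimit and syncLimit are counted in ticks, so the actual time limits are those values multiplied by tickTime. A small sketch of that arithmetic for the file above (CfgMath is a hypothetical helper; the majority rule is ZooKeeper's usual quorum definition, given here as background rather than read from this file):

```java
// Hypothetical helper: derived values for a zoo.cfg like the one above.
final class CfgMath {
    // initLimit is counted in ticks, so the wall-clock limit is initLimit * tickTime.
    static long initTimeoutMs(int tickTime, int initLimit) {
        return (long) tickTime * initLimit;
    }

    // syncLimit is also counted in ticks.
    static long syncTimeoutMs(int tickTime, int syncLimit) {
        return (long) tickTime * syncLimit;
    }

    // Majority quorum: strictly more than half of the configured servers.
    static int quorumSize(int servers) {
        return servers / 2 + 1;
    }

    // How many servers can fail while a majority is still available.
    static int toleratedFailures(int servers) {
        return servers - quorumSize(servers);
    }
}
```

With tickTime=2000, initLimit=10 and syncLimit=5 this gives 20000 ms and 10000 ms; with three server.N entries a majority is 2 servers, so the cluster tolerates 1 failure.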
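The purpose of each parameter becomes clear when following the parameter-parsing code executed during startup.
Zookeeper cluster startup flow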
public class QuorumPeerMain {
private static final Logger LOG = Logger.getLogger(QuorumPeerMain.class);
/**
* To start the replicated server specify the configuration file name on the
* command line.
* @param args command line
*/
public static void main(String[] args) {
if (args.length == 2) {
ZooKeeperServerMain.main(args); // two arguments (port dataDir): start a standalone server directly
return;
}
QuorumPeerConfig.parse(args);
if (!QuorumPeerConfig.isStandalone()) {
runPeer(new QuorumPeer.Factory() { // anonymous implementation of QuorumPeer.Factory, providing create and createConnectionFactory
public QuorumPeer create(NIOServerCnxn.Factory cnxnFactory) throws IOException {
QuorumPeer peer = new QuorumPeer(); // create the peer instance
peer.setClientPort(ServerConfig.getClientPort()); // port on which this peer listens for clients
peer.setTxnFactory(new FileTxnSnapLog(
new File(QuorumPeerConfig.getDataLogDir()),
new File(QuorumPeerConfig.getDataDir())));
peer.setQuorumPeers(QuorumPeerConfig.getServers()); // cluster members from the server.N entries
peer.setElectionType(QuorumPeerConfig.getElectionAlg()); // election algorithm
peer.setMyid(QuorumPeerConfig.getServerId()); // this server's id, read from the myid file
peer.setTickTime(QuorumPeerConfig.getTickTime());
peer.setInitLimit(QuorumPeerConfig.getInitLimit());
peer.setSyncLimit(QuorumPeerConfig.getSyncLimit());
peer.setCnxnFactory(cnxnFactory); // factory that handles client network requests
return peer;
}
public NIOServerCnxn.Factory createConnectionFactory() throws IOException {
return new NIOServerCnxn.Factory(getClientPort()); // NIO (I/O multiplexing) connection factory on the client port
}
});
}else{
// there is only server in the quorum -- run as standalone
ZooKeeperServerMain.main(args);
}
}
public static void runPeer(QuorumPeer.Factory qpFactory) {
try {
QuorumStats.registerAsConcrete();
QuorumPeer self = qpFactory.create(qpFactory.createConnectionFactory()); // build the peer
self.start(); // start its thread
self.join(); // block until that thread exits
} catch (Exception e) {
LOG.fatal("Unexpected exception",e);
}
System.exit(2);
}
}
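The startup logic is simple. If main receives exactly two command-line arguments (port and dataDir), it starts a standalone server through ZooKeeperServerMain. Otherwise QuorumPeerConfig parses the configuration file, and the parsed configuration decides whether this is a cluster. For a cluster, runPeer is called with a QuorumPeer.Factory: it calls create to build the QuorumPeer, calls start to launch the peer's thread, and then blocks in join until that thread exits.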
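Flowchart: start → exactly two command-line arguments? yes: standalone start; no: parse the configuration file → standalone according to the configuration? yes: standalone start; no: cluster start.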
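The same decision as a small Java function. This is a hypothetical restatement (StartupDispatch is not ZooKeeper code): it assumes isStandalone() amounts to the configuration listing at most one server, and it receives the parsed server map as a parameter instead of reading the file itself.

```java
import java.util.Map;

// Hypothetical restatement of the branching in QuorumPeerMain.main.
final class StartupDispatch {
    enum Mode { STANDALONE, QUORUM }

    // args: the command-line arguments; servers: the parsed server.N entries
    // (a stand-in for what QuorumPeerConfig.parse would read from args[0]).
    static Mode decide(String[] args, Map<Long, String> servers) {
        if (args.length == 2) {
            return Mode.STANDALONE; // "port dataDir": ZooKeeperServerMain.main(args)
        }
        // Assumption: standalone means the file lists at most one server.
        return servers.size() > 1 ? Mode.QUORUM : Mode.STANDALONE;
    }
}
```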
The configuration class QuorumPeerConfig
public class QuorumPeerConfig extends ServerConfig { // extends ServerConfig, which keeps the parsed configuration in a single static instance
private static final Logger LOG = Logger.getLogger(QuorumPeerConfig.class);
private int tickTime;
private int initLimit;
private int syncLimit;
private int electionAlg;
private int electionPort;
private HashMap<Long,QuorumServer> servers = null;
private long serverId;
private QuorumPeerConfig(int port, String dataDir, String dataLogDir) { // delegates to the ServerConfig constructor
super(port, dataDir, dataLogDir);
}
public static void parse(String[] args) { // parse the configuration file
if(instance!=null)
return;
try {
if (args.length != 1) { // the only argument must be the path of the configuration file
System.err.println("USAGE: configFile");
System.exit(2);
}
File zooCfgFile = new File(args[0]); // the configuration file
if (!zooCfgFile.exists()) { // make sure it exists
LOG.error(zooCfgFile.toString() + " file is missing");
System.exit(2);
}
Properties cfg = new Properties();
FileInputStream zooCfgStream = new FileInputStream(zooCfgFile); // read the file
try {
cfg.load(zooCfgStream);
} finally {
zooCfgStream.close();
}
HashMap<Long,QuorumServer> servers = new HashMap<Long,QuorumServer>(); // the cluster members, keyed by server id
String dataDir = null;
String dataLogDir = null;
int clientPort = 0;
int tickTime = 0;
int initLimit = 0;
int syncLimit = 0;
int electionAlg = 3;
int electionPort = 2182;
for (Entry<Object, Object> entry : cfg.entrySet()) { // iterate over the parsed key/value pairs
String key = entry.getKey().toString(); // convert to String
String value = entry.getValue().toString();
if (key.equals("dataDir")) { // data (snapshot) directory
dataDir = value;
} else if (key.equals("dataLogDir")) { // transaction log directory
dataLogDir = value;
} else if (key.equals("clientPort")) { // port that clients connect to
clientPort = Integer.parseInt(value);
} else if (key.equals("tickTime")) { // basic time unit in milliseconds
tickTime = Integer.parseInt(value);
} else if (key.equals("initLimit")) { // ticks allowed for the initial synchronization phase
initLimit = Integer.parseInt(value);
} else if (key.equals("syncLimit")) { // ticks allowed between sending a request and getting an acknowledgement
syncLimit = Integer.parseInt(value);
} else if (key.equals("electionAlg")) { // election algorithm; several strategies are available
electionAlg = Integer.parseInt(value);
} else if (key.startsWith("server.")) { // cluster member: server.<sid>=host:port[:electionPort]
int dot = key.indexOf('.');
long sid = Long.parseLong(key.substring(dot + 1)); // the sid after "server."
String parts[] = value.split(":"); // host, quorum port, optional election port
if ((parts.length != 2) &&
(parts.length != 3)){
LOG.error(value
+ " does not have the form host:port or host:port:port");
}
InetSocketAddress addr = new InetSocketAddress(parts[0],
Integer.parseInt(parts[1])); // quorum address host:port
if(parts.length == 2)
servers.put(Long.valueOf(sid), new QuorumServer(sid, addr));
else if(parts.length == 3){
InetSocketAddress electionAddr = new InetSocketAddress(parts[0],
Integer.parseInt(parts[2])); // election address host:electionPort
servers.put(Long.valueOf(sid), new QuorumServer(sid, addr, electionAddr)); // store the server entry
}
} else {
System.setProperty("zookeeper." + key, value); // any other key becomes the system property zookeeper.<key>
}
}
if (dataDir == null) { // required settings: log an error and exit if one is missing
LOG.error("dataDir is not set");
System.exit(2);
}
if (dataLogDir == null) {
dataLogDir = dataDir;
} else {
if (!new File(dataLogDir).isDirectory()) {
LOG.error("dataLogDir " + dataLogDir+ " is missing.");
System.exit(2);
}
}
if (clientPort == 0) {
LOG.error("clientPort is not set");
System.exit(2);
}
if (tickTime == 0) {
LOG.error("tickTime is not set");
System.exit(2);
}
if (servers.size() > 1 && initLimit == 0) {
LOG.error("initLimit is not set");
System.exit(2);
}
if (servers.size() > 1 && syncLimit == 0) {
LOG.error("syncLimit is not set");
System.exit(2);
}
QuorumPeerConfig conf = new QuorumPeerConfig(clientPort, dataDir,
dataLogDir); // create the instance and copy in the parsed values
conf.tickTime = tickTime;
conf.initLimit = initLimit;
conf.syncLimit = syncLimit;
conf.electionAlg = electionAlg;
conf.servers = servers;
if (servers.size() > 1) { // more than one server: cluster mode
/*
* If using FLE, then every server requires a separate election port.
*/
if(electionAlg != 0){
for(QuorumServer s : servers.values()){
if(s.electionAddr == null)
LOG.error("Missing election port for server: " + s.id);
}
}
File myIdFile = new File(dataDir, "myid"); // the myid file holds this server's id and must exist
if (!myIdFile.exists()) {
LOG.error(myIdFile.toString() + " file is missing");
System.exit(2);
}
BufferedReader br = new BufferedReader(new FileReader(myIdFile)); // read the server id
String myIdString;
try {
myIdString = br.readLine();
} finally {
br.close();
}
try {
conf.serverId = Long.parseLong(myIdString);
} catch (NumberFormatException e) {
LOG.error(myIdString + " is not a number");
System.exit(2);
}
}
instance=conf; // publish the parsed config; all later getters read from this instance
} catch (Exception e) {
LOG.error("FIXMSG",e);
System.exit(2);
}
}
...
}
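The server.N branch is the densest part of parse. A standalone sketch of the same parsing (ServerEntry is hypothetical) that deliberately differs from the original in two ways: it throws on a malformed value instead of logging the error and continuing, and it uses InetSocketAddress.createUnresolved so the example needs no DNS lookup, whereas the original new InetSocketAddress(host, port) resolves the host.

```java
import java.net.InetSocketAddress;

// Hypothetical parser for one "server.<sid>=host:port[:electionPort]" entry,
// following the server. branch of QuorumPeerConfig.parse.
final class ServerEntry {
    final long sid;
    final InetSocketAddress addr;         // host:port, the quorum port (2888 in the sample)
    final InetSocketAddress electionAddr; // host:electionPort (3888 in the sample), or null

    private ServerEntry(long sid, InetSocketAddress addr, InetSocketAddress electionAddr) {
        this.sid = sid;
        this.addr = addr;
        this.electionAddr = electionAddr;
    }

    static ServerEntry parse(String key, String value) {
        long sid = Long.parseLong(key.substring(key.indexOf('.') + 1)); // "server.2" -> 2
        String[] parts = value.split(":");
        if (parts.length != 2 && parts.length != 3) {
            // The original only logs this; failing fast avoids a later index error.
            throw new IllegalArgumentException(
                    value + " does not have the form host:port or host:port:port");
        }
        // createUnresolved skips the DNS lookup that new InetSocketAddress(host, port) performs.
        InetSocketAddress addr =
                InetSocketAddress.createUnresolved(parts[0], Integer.parseInt(parts[1]));
        InetSocketAddress electionAddr = parts.length == 3
                ? InetSocketAddress.createUnresolved(parts[0], Integer.parseInt(parts[2]))
                : null;
        return new ServerEntry(sid, addr, electionAddr);
    }
}
```

For server.2=192.168.0.2:2888:3888 this yields sid 2, quorum port 2888 and election port 3888.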
The parent class is ServerConfig; its static getters and its parse method are shown below.
public static int getClientPort(){
assert instance!=null;
return instance.clientPort;
}
public static String getDataDir(){
assert instance!=null;
return instance.dataDir;
}
public static String getDataLogDir(){
assert instance!=null;
return instance.dataLogDir;
}
public static boolean isStandalone(){
assert instance!=null;
return instance.isStandaloneServer();
}
protected static ServerConfig instance=null;
public static void parse(String[] args) { // creates the singleton on the first call
if(instance!=null)
return;
if (args.length != 2) { // expects exactly two arguments: port and dataDir
System.err.println("USAGE: ZooKeeperServer port datadir\n");
System.exit(2);
}
try {
instance=new ServerConfig(Integer.parseInt(args[0]),args[1],args[1]);
} catch (NumberFormatException e) {
System.err.println(args[0] + " is not a valid port number");
System.exit(2);
}
}
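Both configuration classes follow the same pattern: parse() builds the instance once, and every later read goes through static getters. A minimal sketch of that pattern (ConfigHolder is hypothetical); unlike the original, it throws exceptions instead of calling System.exit or relying on assert, and it makes parse() synchronized.

```java
// Hypothetical holder showing the parse-once singleton used by ServerConfig.
final class ConfigHolder {
    private static ConfigHolder instance = null; // set by the first successful parse()

    private final int clientPort;
    private final String dataDir;

    private ConfigHolder(int clientPort, String dataDir) {
        this.clientPort = clientPort;
        this.dataDir = dataDir;
    }

    static synchronized void parse(String[] args) {
        if (instance != null) {
            return; // already parsed: later calls are ignored, as in the original
        }
        if (args.length != 2) {
            throw new IllegalArgumentException("USAGE: port datadir");
        }
        instance = new ConfigHolder(Integer.parseInt(args[0]), args[1]);
    }

    private static synchronized ConfigHolder get() {
        if (instance == null) {
            throw new IllegalStateException("parse() has not been called");
        }
        return instance;
    }

    static int getClientPort() { return get().clientPort; }

    static String getDataDir() { return get().dataDir; }
}
```

Because the first parse wins, a second call with different arguments leaves the stored configuration unchanged.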
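QuorumPeer execution flow
QuorumPeer extends Thread and overrides start(), so the call self.start() in runPeer runs QuorumPeer.start(). That method sets up leader election first and then calls super.start(), which launches a new thread that executes run().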
@Override
public synchronized void start() {
startLeaderElection(); // set up leader election
super.start(); // Thread.start(): the new thread then runs this class's run()
}
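Overriding start() in a Thread subclass lets setup run on the calling thread before run() begins on the new thread. A small sketch of that ordering (PeerThread is hypothetical; its setup step only stands in for startLeaderElection()):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical Thread subclass mirroring how QuorumPeer overrides start().
class PeerThread extends Thread {
    // Thread-safe list: written by the caller thread and by the new thread.
    final List<String> events = new CopyOnWriteArrayList<>();

    @Override
    public synchronized void start() {
        // Runs on the caller's thread; stands in for startLeaderElection().
        events.add("setup on " + Thread.currentThread().getName());
        super.start(); // Thread.start(): the JVM then calls run() on the new thread
    }

    @Override
    public void run() {
        events.add("run on " + Thread.currentThread().getName());
    }
}
```

After start() and join(), the setup entry always comes first and names the caller's thread, while the run entry names the peer thread itself.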
So start() first calls startLeaderElection to set up leader election for the cluster:
synchronized public void startLeaderElection() {
currentVote = new Vote(myid, getLastLoggedZxid()); // first vote for ourselves, with our last logged zxid
for (QuorumServer p : quorumPeers.values()) { // find our own entry in the peer list
if (p.id == myid) {
myQuorumAddr = p.addr; // our quorum address
break;
}
}
if (myQuorumAddr == null) { // our id is not configured: fail
throw new RuntimeException("My id " + myid + " not in the peer list");
}
if (electionType == 0) { // election algorithm 0
try {
udpSocket = new DatagramSocket(myQuorumAddr.getPort()); // bind a UDP socket on our port; algorithm 0 elects over UDP
responder = new ResponderThread(); // thread that answers UDP vote queries
responder.start();
} catch (SocketException e) {
throw new RuntimeException(e);
}
}
this.electionAlg = createElectionAlgorithm(electionType); // create the configured election algorithm
}
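The first part of startLeaderElection only has to find this server's own quorum address. A sketch of that lookup, assuming the peers are kept in a map keyed by server id as QuorumPeerConfig.parse builds them (SelfLookup is hypothetical):

```java
import java.net.InetSocketAddress;
import java.util.Map;

// Hypothetical version of the self-lookup at the start of startLeaderElection().
final class SelfLookup {
    static InetSocketAddress myQuorumAddr(long myid, Map<Long, InetSocketAddress> peers) {
        InetSocketAddress addr = peers.get(myid); // peers are keyed by server id
        if (addr == null) {
            // Same failure as the original: this server's id is not configured.
            throw new RuntimeException("My id " + myid + " not in the peer list");
        }
        return addr;
    }
}
```

The original loops over the values and compares p.id; a keyed lookup is equivalent as long as every map key equals its entry's id, which holds for the map built with servers.put(Long.valueOf(sid), new QuorumServer(sid, ...)).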
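With election algorithm 0 the election runs over UDP, and ResponderThread answers other peers' UDP queries with this server's current view of the vote.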
class ResponderThread extends Thread {
ResponderThread() {
super("ResponderThread");
}
volatile boolean running = true;
@Override
public void run() {
try {
byte b[] = new byte[36];
ByteBuffer responseBuffer = ByteBuffer.wrap(b);
DatagramPacket packet = new DatagramPacket(b, b.length);
while (running) {
udpSocket.receive(packet); // wait for a query packet
if (packet.getLength() != 4) {
LOG.warn("Got more than just an xid! Len = "
+ packet.getLength());
} else {
responseBuffer.clear();
responseBuffer.getInt(); // Skip the xid
responseBuffer.putLong(myid);
Vote current = getCurrentVote(); // our current vote
switch (getPeerState()) {
case LOOKING: // still electing
responseBuffer.putLong(current.id); // the id we vote for, then its zxid
responseBuffer.putLong(current.zxid);
break;
case LEADING:
responseBuffer.putLong(myid); // as leader, report our own id
try {
responseBuffer.putLong(leader.lastProposed); // and the last zxid the leader proposed
} catch (NullPointerException npe) {
// This can happen in state transitions,
// just ignore the request
}
break;
case FOLLOWING:
responseBuffer.putLong(current.id); // the id of the leader we follow
try {
responseBuffer.putLong(follower.getZxid()); // and our zxid
} catch (NullPointerException npe) {
// This can happen in state transitions,
// just ignore the request
}
}
packet.setData(b);
udpSocket.send(packet); // send the reply
}
packet.setLength(b.length);
}
} catch (Exception e) {
LOG.warn("Unexpected exception",e);
} finally {
LOG.warn("QuorumPeer responder thread exited");
}
}
}
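The reply depends on the server's current role. During the election, peers exchange server ids and transaction ids (zxids) so that they can converge on a single leader; the election itself is analyzed in detail in a later article.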
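The reply layout can be read off ResponderThread.run: the 4-byte xid from the request stays at the front of the buffer, followed by three longs. That fills 28 of the 36 buffer bytes, and the code sends the whole buffer. A sketch that encodes and decodes those 28 meaningful bytes (ElectionReply and its field names are hypothetical):

```java
import java.nio.ByteBuffer;

// Hypothetical codec for the meaningful part of a ResponderThread reply.
final class ElectionReply {
    static final int SIZE = 4 + 8 + 8 + 8; // xid + three longs = 28 bytes

    final int xid;     // echoed from the 4-byte request
    final long myid;   // id of the responding server
    final long voteId; // LOOKING: id it votes for; LEADING: its own id; FOLLOWING: leader it follows
    final long zxid;   // vote zxid, leader's last proposed zxid, or follower zxid

    ElectionReply(int xid, long myid, long voteId, long zxid) {
        this.xid = xid;
        this.myid = myid;
        this.voteId = voteId;
        this.zxid = zxid;
    }

    byte[] encode() {
        ByteBuffer b = ByteBuffer.allocate(SIZE); // big-endian, ByteBuffer's default
        b.putInt(xid).putLong(myid).putLong(voteId).putLong(zxid);
        return b.array();
    }

    static ElectionReply decode(byte[] data) {
        ByteBuffer b = ByteBuffer.wrap(data);
        int xid = b.getInt();
        long myid = b.getLong();
        long voteId = b.getLong();
        long zxid = b.getLong();
        return new ElectionReply(xid, myid, voteId, zxid);
    }
}
```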
Starting execution
@Override
public void run() {
setName("QuorumPeer:" + cnxnFactory.getLocalAddress()); // thread name includes the client listening address
/*
* Main loop
*/
while (running) {
switch (getPeerState()) { // act on the current state
case LOOKING:
try {
LOG.info("LOOKING");
setCurrentVote(makeLEStrategy().lookForLeader()); // run the election and store the winning vote
} catch (Exception e) {
LOG.warn("Unexpected exception",e); // on error, stay in LOOKING
setPeerState(ServerState.LOOKING);
}
break;
case FOLLOWING:
try {
LOG.info("FOLLOWING");
setFollower(makeFollower(logFactory)); // FOLLOWING: create a Follower and follow the leader
follower.followLeader();
} catch (Exception e) {
LOG.warn("Unexpected exception",e);
} finally {
follower.shutdown();
setFollower(null);
setPeerState(ServerState.LOOKING);
}
break;
case LEADING:
LOG.info("LEADING");
try {
setLeader(makeLeader(logFactory)); // LEADING: create the Leader
leader.lead(); // serve followers until leadership ends
setLeader(null); // lead() returned: clear the leader
} catch (Exception e) {
LOG.warn("Unexpected exception",e);
} finally {
if (leader != null) { // still set after an exception: shut it down, then reset the state
leader.shutdown("Forcing shutdown");
setLeader(null);
}
setPeerState(ServerState.LOOKING);
}
break;
}
}
LOG.warn("QuorumPeer main thread exited");
}
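Each state triggers different work: a leader accepts connections from followers and processes the messages they send, while a follower forwards requests such as create to the leader for processing. The whole process is described in detail later.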
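Stripped of I/O, the main loop is a small state machine. A simplified model (PeerLoop is hypothetical), assuming, as ZooKeeper's election algorithms do, that a peer moves to LEADING when the elected id equals its own myid and to FOLLOWING otherwise; leaving either role always returns it to LOOKING, matching the finally blocks in run():

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified model of the QuorumPeer.run() state loop.
final class PeerLoop {
    enum State { LOOKING, FOLLOWING, LEADING }

    // electedLeaders supplies the winner of each successive election
    // (a stand-in for lookForLeader()); steps bounds the simulation.
    static List<State> trace(long myid, Iterator<Long> electedLeaders, int steps) {
        List<State> seen = new ArrayList<>();
        State state = State.LOOKING;
        for (int i = 0; i < steps; i++) {
            seen.add(state);
            switch (state) {
                case LOOKING: {
                    long leader = electedLeaders.next();
                    state = (leader == myid) ? State.LEADING : State.FOLLOWING;
                    break;
                }
                case FOLLOWING: // followLeader() returned or failed
                case LEADING:   // lead() returned or failed
                    state = State.LOOKING; // the finally blocks reset to LOOKING
                    break;
            }
        }
        return seen;
    }
}
```

For myid=2 and two successive elections won by servers 1 and then 2, the trace is LOOKING, FOLLOWING, LOOKING, LEADING.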
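Client server startup
While the peer is being created, the server also opens the port on which clients connect. This happens during initialization:
new NIOServerCnxn.Factory(getClientPort())
The constructor is: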
public Factory(int port) throws IOException {
super("NIOServerCxn.Factory:" + port); // thread name includes the client port
setDaemon(true);
this.ss = ServerSocketChannel.open(); // open the server socket channel
ss.socket().bind(new InetSocketAddress(port)); // bind to the client port
ss.configureBlocking(false); // non-blocking mode
ss.register(selector, SelectionKey.OP_ACCEPT); // register for accept events
start(); // start the thread
}
This class also extends Thread, so calling start() runs its overridden run method:
public void run() {
while (!ss.socket().isClosed()) { // loop until the server socket is closed
try {
selector.select(1000); // I/O multiplexing: wait up to 1 s for ready channels
Set<SelectionKey> selected;
synchronized (this) {
selected = selector.selectedKeys(); // under the lock, get the keys that are ready
}
ArrayList<SelectionKey> selectedList = new ArrayList<SelectionKey>(
selected);
Collections.shuffle(selectedList);
for (SelectionKey k : selectedList) { // iterate over the ready keys
if ((k.readyOps() & SelectionKey.OP_ACCEPT) != 0) { // a new client connection
SocketChannel sc = ((ServerSocketChannel) k
.channel()).accept(); // accept it
sc.configureBlocking(false); // non-blocking mode
SelectionKey sk = sc.register(selector,
SelectionKey.OP_READ); // register for read events
NIOServerCnxn cnxn = createConnection(sc, sk); // wrap the channel in an NIOServerCnxn
sk.attach(cnxn); // attach the connection to its key
addCnxn(cnxn); // and track it in the factory
} else if ((k.readyOps() & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0) { // readable or writable: handle I/O on this connection
NIOServerCnxn c = (NIOServerCnxn) k.attachment();
c.doIO(k); // let the connection handle the event
}
}
selected.clear(); // clear the handled keys
} catch (Exception e) {
LOG.error("FIXMSG",e); // log the error and keep looping
}
}
ZooTrace.logTraceMessage(LOG, ZooTrace.getTextTraceLevel(),
"NIOServerCnxn factory exitedloop.");
clear();
LOG.error("=====> Goodbye cruel world <======");
// System.exit(0);
}
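The run method is a classic I/O multiplexing loop: every new client connection and every client request is handled by this service. Its processing flow is analyzed in detail later.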
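To see the same select/accept/read pattern in isolation, here is a self-contained sketch (MiniSelectorServer is hypothetical). Instead of handing ready keys to NIOServerCnxn.doIO, it echoes whatever a client sends, binds to an ephemeral port on 127.0.0.1, and removes each key from the selected set as it goes, which has the same effect as the selected.clear() call in the original.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// Hypothetical echo server built on the accept/read loop of NIOServerCnxn.Factory.run().
class MiniSelectorServer implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel ss;
    private volatile boolean running = true;

    MiniSelectorServer() throws IOException {
        selector = Selector.open();
        ss = ServerSocketChannel.open();
        ss.socket().bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: the OS picks a free port
        ss.configureBlocking(false);                              // must be non-blocking to register
        ss.register(selector, SelectionKey.OP_ACCEPT);            // interested in new connections
    }

    int port() {
        return ss.socket().getLocalPort();
    }

    void stop() {
        running = false;
        selector.wakeup(); // return from select() right away
    }

    @Override
    public void run() {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        try {
            while (running) {
                selector.select(1000); // wait up to 1 s for ready channels, as in the original
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey k = it.next();
                    it.remove(); // drop the key from the selected set once handled
                    if (k.isAcceptable()) {
                        SocketChannel sc = ss.accept();
                        if (sc == null) {
                            continue; // nothing to accept after all
                        }
                        sc.configureBlocking(false);
                        sc.register(selector, SelectionKey.OP_READ);
                    } else if (k.isReadable()) {
                        SocketChannel sc = (SocketChannel) k.channel();
                        buf.clear();
                        if (sc.read(buf) < 0) { // peer closed its side
                            k.cancel();
                            sc.close();
                            continue;
                        }
                        buf.flip();
                        while (buf.hasRemaining()) {
                            sc.write(buf); // echo the bytes back
                        }
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                for (SelectionKey k : selector.keys()) {
                    k.channel().close(); // close the listening and all accepted channels
                }
                selector.close();
            } catch (IOException ignored) {
                // best-effort cleanup
            }
        }
    }
}
```

A client that connects to port() and writes hello reads hello back; stop() clears the flag and wakes the selector so the loop exits without waiting for the 1-second timeout.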
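Summary
This article gave a rough overview of how a Zookeeper cluster starts and of the main work done during startup. First a thread is started to accept and handle client requests, then an election port is opened for leader election, and then a port for data synchronization between cluster members is opened. These three ports each provide a different service, and with them the main part of Zookeeper cluster startup is complete. My knowledge is limited, so corrections are welcome if you find mistakes.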