Now let's walk through the ZooKeeper server startup process. The logic is fairly long but not very complex, and along the way you'll see that the ZooKeeper code is worth borrowing ideas from in your own development. Every program's entry point is main(), so that's where we start. Go!
First a QuorumPeerMain instance is created, then it is run.

public static void main(String[] args) {
        QuorumPeerMain main = new QuorumPeerMain();
        try {
            main.initializeAndRun(args);
        } catch (IllegalArgumentException e) {
            LOG.error("Invalid arguments, exiting abnormally", e);
            LOG.info(USAGE);
            System.err.println(USAGE);
            System.exit(2);
        } catch (ConfigException e) {
            LOG.error("Invalid config, exiting abnormally", e);
            System.err.println("Invalid config, exiting abnormally");
            System.exit(2);
        } catch (Exception e) {
            LOG.error("Unexpected exception, exiting abnormally", e);
            System.exit(1);
        }
        LOG.info("Exiting normally");
        System.exit(0);
    }

Initialization

Remember passing the config file path as an argument when starting ZooKeeper? This is where that argument gets parsed. After parsing, the code checks whether this is a cluster deployment. We'll start with standalone mode, so the path taken is ZooKeeperServerMain.main(args);

protected void initializeAndRun(String[] args)
        throws ConfigException, IOException
    {
        QuorumPeerConfig config = new QuorumPeerConfig();
        if (args.length == 1) {
            config.parse(args[0]);
        }

        // Start and schedule the purge task
        DatadirCleanupManager purgeMgr = new DatadirCleanupManager(config
                .getDataDir(), config.getDataLogDir(), config
                .getSnapRetainCount(), config.getPurgeInterval());
        purgeMgr.start();

        if (args.length == 1 && config.servers.size() > 0) {
            runFromConfig(config);
        } else {
            LOG.warn("Either no config or no quorum defined in config, running "
                    + " in standalone mode");
            // there is only server in the quorum -- run as standalone
            ZooKeeperServerMain.main(args);
        }
    }

Let's see how the config file gets parsed; you can try writing config handling like this yourself. A File object is built from the path, the configuration entries are loaded into a Properties object, and ZooKeeper then processes that Properties object in parseProperties(cfg);

public void parse(String path) throws ConfigException {
        File configFile = new File(path);

        LOG.info("Reading configuration from: " + configFile);

        try {
            if (!configFile.exists()) {
                throw new IllegalArgumentException(configFile.toString()
                        + " file is missing");
            }

            Properties cfg = new Properties();
            FileInputStream in = new FileInputStream(configFile);
            try {
                cfg.load(in);
            } finally {
                in.close();
            }

            parseProperties(cfg);
        } catch (IOException e) {
            throw new ConfigException("Error processing " + path, e);
        } catch (IllegalArgumentException e) {
            throw new ConfigException("Error processing " + path, e);
        }
    }
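As an aside, the same Properties-based loading can be sketched in isolation. The zoo.cfg-style snippet below is a hypothetical example (the values are illustrative, not defaults):

```java
import java.io.StringReader;
import java.util.Properties;

public class ConfigSketch {
    public static void main(String[] args) throws Exception {
        // A minimal zoo.cfg-style configuration (example values)
        String cfg =
                "tickTime=2000\n" +
                "dataDir=/tmp/zookeeper\n" +
                "clientPort=2181\n" +
                "server.1=127.0.0.1:2888:3888\n";
        Properties props = new Properties();
        // the same mechanism parse() uses: Properties.load on the file contents
        props.load(new StringReader(cfg));
        System.out.println(props.getProperty("clientPort"));
        System.out.println(props.getProperty("server.1"));
    }
}
```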

This method is quite long, so I'll only excerpt the key parts. It iterates over all the properties, records the address of each node in the cluster while checking whether the node is an observer, then builds the listen address from the client port, and finally reads the myid file.

// parseProperties
    for (Entry<Object, Object> entry : zkProp.entrySet()) {
        // iterate over all properties, getting each key and value in turn
        String key = entry.getKey().toString().trim();
        String value = entry.getValue().toString().trim();
        // compare the key against the known config keys and apply the value
        if (key.equals("dataDir")) {
            dataDir = value;
        } 
        ...
        // cluster configuration entries
        else if (key.startsWith("server.")) {
            // the digits after the dot become the sid
            int dot = key.indexOf('.');
            long sid = Long.parseLong(key.substring(dot + 1));
            String parts[] = splitWithLeadingHostname(value);
            if ((parts.length != 2) && (parts.length != 3) && (parts.length != 4)) {
                LOG.error(value
                    + " does not have the form host:port or host:port:port " +
                    " or host:port:port:type");
            }
            // set the host, data port, and election port in turn
            LearnerType type = null;
            String hostname = parts[0];
            Integer port = Integer.parseInt(parts[1]);
            Integer electionPort = null;
            if (parts.length > 2){
                electionPort=Integer.parseInt(parts[2]);
            }
            // check whether this server runs in observer mode
            if (parts.length > 3){
                if (parts[3].toLowerCase().equals("observer")) {
                    type = LearnerType.OBSERVER;
                } else if (parts[3].toLowerCase().equals("participant")) {
                    type = LearnerType.PARTICIPANT;
                } else {
                    throw new ConfigException("Unrecognised peertype: " + value);
                }
            }
            // store the server under the appropriate role
            if (type == LearnerType.OBSERVER){
                observers.put(Long.valueOf(sid), new QuorumServer(sid, hostname, port, electionPort, type));
            } else {
                servers.put(Long.valueOf(sid), new QuorumServer(sid, hostname, port, electionPort, type));
            }
        }
    }
    // after the loop: build the listen address the server binds to from clientPort
    if (clientPortAddress != null) {
        this.clientPortAddress = new InetSocketAddress(
                InetAddress.getByName(clientPortAddress), clientPort);
    } else {
        this.clientPortAddress = new InetSocketAddress(clientPort);
    }
    // read this server's id from the myid file and save it
    File myIdFile = new File(dataDir, "myid");
    if (!myIdFile.exists()) {
        throw new IllegalArgumentException(myIdFile.toString()
                + " file is missing");
    }
    BufferedReader br = new BufferedReader(new FileReader(myIdFile));
    String myIdString;
    try {
        myIdString = br.readLine();
    } finally {
        br.close();
    }
    try {
        serverId = Long.parseLong(myIdString);
        MDC.put("myid", myIdString);
    } catch (NumberFormatException e) {
        throw new IllegalArgumentException("serverid " + myIdString
                + " is not a number");
    }
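The server.N parsing above can be condensed into a standalone sketch. The config entry below is a made-up example, and split(":") stands in for splitWithLeadingHostname (it does not handle IPv6 literals the way the real helper does):

```java
public class ServerLineSketch {
    public static void main(String[] args) {
        // hypothetical config entry, in the form parseProperties handles
        String key = "server.2";
        String value = "192.168.0.12:2888:3888:observer";
        // the digits after the dot become the sid
        long sid = Long.parseLong(key.substring(key.indexOf('.') + 1));
        String[] parts = value.split(":");
        String hostname = parts[0];
        int port = Integer.parseInt(parts[1]);          // data (follower) port
        Integer electionPort = parts.length > 2
                ? Integer.parseInt(parts[2]) : null;    // leader election port
        boolean observer = parts.length > 3
                && parts[3].equalsIgnoreCase("observer");
        System.out.println(sid + " " + hostname + " " + port
                + " " + electionPort + " observer=" + observer);
    }
}
```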

Since we are starting in standalone mode, execution enters ZooKeeperServerMain's initializeAndRun() method. The config file is parsed the same way as above, so let's go straight to the startup method, runFromConfig(config);

// ZooKeeperServerMain.java
    protected void initializeAndRun(String[] args) throws ConfigException, IOException
    {
        try {
            ManagedUtil.registerLog4jMBeans();
        } catch (JMException e) {
            LOG.warn("Unable to register log4j JMX control", e);
        }

        ServerConfig config = new ServerConfig();
        if (args.length == 1) {
            config.parse(args[0]);
        } else {
            config.parse(args);
        }

        runFromConfig(config);
    }

This is the key method; all of the startup logic lives here. It instantiates the ZooKeeperServer class, creates the snapshot/txn-log handler FileTxnSnapLog, listens for client connections via NIO, and then starts the ZooKeeper server.

public void runFromConfig(ServerConfig config) throws IOException {
        LOG.info("Starting server");
        FileTxnSnapLog txnLog = null;
        try {
            // Note that this thread isn't going to be doing anything else,
            // so rather than spawning another thread, we will just call
            // run() in this thread.
            // create a file logger url from the command line args
            // instantiate the ZooKeeper server
            final ZooKeeperServer zkServer = new ZooKeeperServer();
            // Registers shutdown handler which will be used to know the
            // server error or shutdown state changes.
            final CountDownLatch shutdownLatch = new CountDownLatch(1);
            zkServer.registerServerShutdownHandler(
                    new ZooKeeperServerShutdownHandler(shutdownLatch));
            // FileTxnSnapLog handles the transaction log and snapshot data
            txnLog = new FileTxnSnapLog(new File(config.dataLogDir), new File(
                    config.dataDir));
            txnLog.setServerStats(zkServer.serverStats());
            zkServer.setTxnLogFactory(txnLog);
            zkServer.setTickTime(config.tickTime);
            zkServer.setMinSessionTimeout(config.minSessionTimeout);
            zkServer.setMaxSessionTimeout(config.maxSessionTimeout);
            // create the socket that listens for client connections
            cnxnFactory = ServerCnxnFactory.createFactory();
            // open and bind the socket
            cnxnFactory.configure(config.getClientPortAddress(),
                    config.getMaxClientCnxns());
            // start the ZooKeeper service
            cnxnFactory.startup(zkServer);
            // Watch status of ZooKeeper server. It will do a graceful shutdown
            // if the server is not running or hits an internal error.
            shutdownLatch.await();
            shutdown();

            cnxnFactory.join();
            if (zkServer.canShutdown()) {
                zkServer.shutdown(true);
            }
        } catch (InterruptedException e) {
            // warn, but generally this is ok
            LOG.warn("Server interrupted", e);
        } finally {
            if (txnLog != null) {
                txnLog.close();
            }
        }
    }
  1. Create the server socket factory. Here you can see that ServerCnxnFactory is implemented by NIOServerCnxnFactory by default.
static public ServerCnxnFactory createFactory() throws IOException {
        String serverCnxnFactoryName =
            System.getProperty(ZOOKEEPER_SERVER_CNXN_FACTORY);
        if (serverCnxnFactoryName == null) {
            serverCnxnFactoryName = NIOServerCnxnFactory.class.getName();
        }
        try {
            ServerCnxnFactory serverCnxnFactory = (ServerCnxnFactory) Class.forName(serverCnxnFactoryName)
                    .getDeclaredConstructor().newInstance();
            LOG.info("Using {} as server connection factory", serverCnxnFactoryName);
            return serverCnxnFactory;
        } catch (Exception e) {
            IOException ioe = new IOException("Couldn't instantiate "
                    + serverCnxnFactoryName);
            ioe.initCause(e);
            throw ioe;
        }
    }
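The lookup pattern in createFactory() — a system property naming the class, a fallback default, reflective instantiation — can be sketched on its own. The property name demo.factory and the java.util.ArrayList stand-in are assumptions for this demo, not ZooKeeper names:

```java
public class FactorySketch {
    public static void main(String[] args) throws Exception {
        // a system property picks the implementation class,
        // with a default used when the property is unset;
        // ArrayList stands in here for NIOServerCnxnFactory
        String name = System.getProperty("demo.factory", "java.util.ArrayList");
        Object instance = Class.forName(name)
                .getDeclaredConstructor().newInstance();
        System.out.println(instance.getClass().getName());
    }
}
```

This is a handy way to make an implementation swappable without a code change: run with -Ddemo.factory=some.other.Class to substitute another implementation.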
  2. Save the current NIOServerCnxnFactory object as a thread. This is the NIO setup: open a ServerSocketChannel, bind the address, register for the OP_ACCEPT connection-pending event, and listen for client connection requests.
public void configure(InetSocketAddress addr, int maxcc) throws IOException {
        configureSaslLogin();

        thread = new ZooKeeperThread(this, "NIOServerCxn.Factory:" + addr);
        thread.setDaemon(true);
        maxClientCnxns = maxcc;
        this.ss = ServerSocketChannel.open();
        ss.socket().setReuseAddress(true);
        LOG.info("binding to port " + addr);
        ss.socket().bind(addr);
        ss.configureBlocking(false);
        ss.register(selector, SelectionKey.OP_ACCEPT);
    }
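Putting configure() and the accept loop together, here is a self-contained sketch of the same NIO pattern: bind, register OP_ACCEPT, accept, register OP_READ, read. The in-process blocking client and the buffer size are demo assumptions, only there to drive the loop:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class NioAcceptSketch {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel ss = ServerSocketChannel.open();
        ss.socket().setReuseAddress(true);
        ss.socket().bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
        ss.configureBlocking(false);
        ss.register(selector, SelectionKey.OP_ACCEPT);

        // a blocking client in the same process, only to drive the demo
        SocketChannel client = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", ss.socket().getLocalPort()));
        client.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));

        boolean done = false;
        while (!done) {
            selector.select(1000);
            for (SelectionKey k : selector.selectedKeys()) {
                if (k.isAcceptable()) {
                    // accept, then register the new channel for read events
                    SocketChannel sc = ((ServerSocketChannel) k.channel()).accept();
                    sc.configureBlocking(false);
                    sc.register(selector, SelectionKey.OP_READ);
                } else if (k.isReadable()) {
                    ByteBuffer buf = ByteBuffer.allocate(16);
                    ((SocketChannel) k.channel()).read(buf);
                    buf.flip();
                    System.out.println(StandardCharsets.UTF_8.decode(buf));
                    done = true;
                }
            }
            selector.selectedKeys().clear();
        }
        client.close();
        ss.close();
        selector.close();
    }
}
```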

Taking startup() step by step: it first starts the NIOServerCnxnFactory thread, then loads data from the snapshot files, and finally builds the request processor chain.

public void startup(ZooKeeperServer zks) throws IOException,InterruptedException {
        // start the NIOServerCnxnFactory thread
        start();
        setZooKeeperServer(zks);
        // load data from the snapshot files
        zks.startdata();
        // build the request processor chain
        zks.startup();
    }

Socket data handling

The main job of this thread is to listen for client connection requests and exchange data with clients; the core method is doIO().

public void run() {
        while (!ss.socket().isClosed()) {
            try {
                selector.select(1000);
                Set<SelectionKey> selected;
                synchronized (this) {
                    // get all the selected keys
                    selected = selector.selectedKeys();
                }
                ArrayList<SelectionKey> selectedList = new ArrayList<SelectionKey>(
                        selected);
                Collections.shuffle(selectedList);
                for (SelectionKey k : selectedList) {
                    // handle an incoming connection request from a client
                    if ((k.readyOps() & SelectionKey.OP_ACCEPT) != 0) {
                        SocketChannel sc = ((ServerSocketChannel) k
                                .channel()).accept();
                        InetAddress ia = sc.socket().getInetAddress();
                        int cnxncount = getClientCnxnCount(ia);
                        // the per-IP connection count must not exceed the maximum, 60 by default
                        if (maxClientCnxns > 0 && cnxncount >= maxClientCnxns){
                            LOG.warn("Too many connections from " + ia
                                     + " - max is " + maxClientCnxns );
                            sc.close();
                        } else {
                            // accept the connection and register the socket with the selector for read events
                            LOG.info("Accepted socket connection from "
                                     + sc.socket().getRemoteSocketAddress());
                            sc.configureBlocking(false);
                            SelectionKey sk = sc.register(selector,
                                    SelectionKey.OP_READ);
                            NIOServerCnxn cnxn = createConnection(sc, sk);
                            // attach the new NIOServerCnxn object to the SelectionKey
                            sk.attach(cnxn);
                            // record the connection
                            addCnxn(cnxn);
                        }
                    } 
                    // exchange IO data with the client
                    else if ((k.readyOps() & (SelectionKey.OP_READ | SelectionKey.OP_WRITE)) != 0) {
                        NIOServerCnxn c = (NIOServerCnxn) k.attachment();
                        c.doIO(k);
                    } else {
                        if (LOG.isDebugEnabled()) {
                            LOG.debug("Unexpected ops in select "
                                      + k.readyOps());
                        }
                    }
                }
                selected.clear();
            } catch (RuntimeException e) {
                LOG.warn("Ignoring unexpected runtime exception", e);
            } catch (Exception e) {
                LOG.warn("Ignoring exception", e);
            }
        }
        closeAll();
        LOG.info("NIOServerCnxn factory exited run method");
    }

Here we only look at the read event, i.e. the socket handling for data the client sends to the server. The bytes that were read are processed in readPayload().

void doIO(SelectionKey k) throws InterruptedException {
        try {
            if (isSocketOpen() == false) {
                LOG.warn("trying to do i/o on a null socket for session:0x"
                         + Long.toHexString(sessionId));

                return;
            }
            if (k.isReadable()) {
                // read data from the socket
                int rc = sock.read(incomingBuffer);
                if (rc < 0) {
                    throw new EndOfStreamException(
                            "Unable to read additional data from client sessionid 0x"
                            + Long.toHexString(sessionId)
                            + ", likely client has closed socket");
                }
                if (incomingBuffer.remaining() == 0) {
                    boolean isPayload;
                    if (incomingBuffer == lenBuffer) { // start of next request
                        incomingBuffer.flip();
                        isPayload = readLength(k);
                        incomingBuffer.clear();
                    } else {
                        // continuation
                        isPayload = true;
                    }
                    // process the data that was read
                    if (isPayload) { // not the case for 4letterword
                        readPayload();
                    }
                    else {
                        // four letter words take care
                        // need not do anything else
                        return;
                    }
                }
            
            }
        ......
    }

This is the first-stage handling of a client's request. In processPacket() the client's byte stream is deserialized to obtain a RequestHeader, and the handling branches on the header's type. For an ordinary request, such as creating a node, a Request is built, its owner is set to the connection itself with si.setOwner(ServerCnxn.me);, and the request is submitted via firstProcessor.processRequest(si);. From there the request processor chain takes over. You can also see the core idea of NIO programming here: separating data transport from data processing. Because processing happens on another thread, throughput and processing efficiency improve greatly.

private void readPayload() throws IOException, InterruptedException {
        if (incomingBuffer.remaining() != 0) { // have we read length bytes?
            int rc = sock.read(incomingBuffer); // sock is non-blocking, so ok
            if (rc < 0) {
                throw new EndOfStreamException(
                        "Unable to read additional data from client sessionid 0x"
                        + Long.toHexString(sessionId)
                        + ", likely client has closed socket");
            }
        }
        if (incomingBuffer.remaining() == 0) { // have we read length bytes?
            packetReceived();
            incomingBuffer.flip();
            // first packet: read the connect request, set up the connection, create the session
            if (!initialized) {
                readConnectRequest();
            } else {
                // subsequent packets: read an ordinary request
                readRequest();
            }
            lenBuffer.clear();
            incomingBuffer = lenBuffer;
        }
    }
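The lenBuffer/incomingBuffer dance in doIO() and readPayload() implements ordinary length-prefixed framing: read a 4-byte length, then read exactly that many payload bytes. A minimal sketch (the payload string is a made-up example):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FramingSketch {
    public static void main(String[] args) {
        byte[] payload = "create /foo".getBytes(StandardCharsets.UTF_8);
        // sender side: a 4-byte big-endian length prefix, then the payload
        ByteBuffer wire = ByteBuffer.allocate(4 + payload.length);
        wire.putInt(payload.length).put(payload).flip();

        // receiver side: first fill the 4-byte length buffer...
        ByteBuffer lenBuffer = ByteBuffer.allocate(4);
        while (lenBuffer.hasRemaining()) lenBuffer.put(wire.get());
        lenBuffer.flip();
        // ...then allocate incomingBuffer of exactly that length and fill it
        ByteBuffer incomingBuffer = ByteBuffer.allocate(lenBuffer.getInt());
        while (incomingBuffer.hasRemaining()) incomingBuffer.put(wire.get());
        incomingBuffer.flip();
        System.out.println(StandardCharsets.UTF_8.decode(incomingBuffer));
    }
}
```

When incomingBuffer still has bytes remaining after a read, the server simply waits for the next readable event, which is why partial reads are harmless.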

Loading data from snapshot files

The heart of data loading is the restore() method of the FileTxnSnapLog class, which first deserializes the snapshot file.

public long restore(DataTree dt, Map<Long, Integer> sessions, 
            PlayBackListener listener) throws IOException {
        snapLog.deserialize(dt, sessions);
        return fastForwardFromEdits(dt, sessions, listener);
    }

The deserialization logic is also clear: collect all the snapshot files under the configured snapDir, deserialize a file's byte stream, and load the result into the DataTree. After that, the transaction log records with zxids greater than the snapshot's are located and replayed in order.

public long deserialize(DataTree dt, Map<Long, Integer> sessions)
            throws IOException {
        // we run through 100 snapshots (not all of them)
        // if we cannot get it running within 100 snapshots
        // we should  give up
        List<File> snapList = findNValidSnapshots(100);
        if (snapList.size() == 0) {
            return -1L;
        }
        File snap = null;
        boolean foundValid = false;
        for (int i = 0; i < snapList.size(); i++) {
            snap = snapList.get(i);
            InputStream snapIS = null;
            CheckedInputStream crcIn = null;
            try {
                LOG.info("Reading snapshot " + snap);
                snapIS = new BufferedInputStream(new FileInputStream(snap));
                crcIn = new CheckedInputStream(snapIS, new Adler32());
                InputArchive ia = BinaryInputArchive.getArchive(crcIn);
                deserialize(dt,sessions, ia);
                long checkSum = crcIn.getChecksum().getValue();
                long val = ia.readLong("val");
                if (val != checkSum) {
                    throw new IOException("CRC corruption in snapshot :  " + snap);
                }
                foundValid = true;
                break;
            } catch(IOException e) {
                LOG.warn("problem reading snap file " + snap, e);
            } finally {
                if (snapIS != null) 
                    snapIS.close();
                if (crcIn != null) 
                    crcIn.close();
            } 
        }
        if (!foundValid) {
            throw new IOException("Not able to find valid snapshots in " + snapDir);
        }
        dt.lastProcessedZxid = Util.getZxidFromName(snap.getName(), SNAPSHOT_FILE_PREFIX);
        return dt.lastProcessedZxid;
    }
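The checksum step works as in this sketch: the writer records an Adler32 over the snapshot body, and the reader recomputes it with a CheckedInputStream and compares, just as deserialize() does before trusting a snapshot. The byte contents here are invented:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CheckedInputStream;

public class ChecksumSketch {
    public static void main(String[] args) throws IOException {
        byte[] body = "snapshot-bytes".getBytes(StandardCharsets.UTF_8);
        // writer side: the checksum is computed over the body as it is written
        Adler32 outCrc = new Adler32();
        outCrc.update(body, 0, body.length);
        long written = outCrc.getValue();

        // reader side: recompute while reading, then compare
        CheckedInputStream crcIn =
                new CheckedInputStream(new ByteArrayInputStream(body), new Adler32());
        while (crcIn.read() != -1) { /* consume, updating the checksum */ }
        if (crcIn.getChecksum().getValue() != written) {
            throw new IOException("CRC corruption in snapshot");
        }
        System.out.println("checksum ok");
    }
}
```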

Collect all the transaction log files under the configured path, find the records with zxids greater than lastProcessedZxid, and process them in order.

public long fastForwardFromEdits(DataTree dt, Map<Long, Integer> sessions,
                                     PlayBackListener listener) throws IOException {
        FileTxnLog txnLog = new FileTxnLog(dataDir);
        // find the txn records with zxids greater than lastProcessedZxid
        TxnIterator itr = txnLog.read(dt.lastProcessedZxid+1);
        long highestZxid = dt.lastProcessedZxid;
        TxnHeader hdr;
        try {
            while (true) {
                // iterator points to 
                // the first valid txn when initialized
                hdr = itr.getHeader();
                if (hdr == null) {
                    //empty logs 
                    return dt.lastProcessedZxid;
                }
                if (hdr.getZxid() < highestZxid && highestZxid != 0) {
                    LOG.error("{}(higestZxid) > {}(next log) for type {}",
                            new Object[] { highestZxid, hdr.getZxid(),
                                    hdr.getType() });
                } else {
                    highestZxid = hdr.getZxid();
                }
                try {
                    processTransaction(hdr,dt,sessions, itr.getTxn());
                } catch(KeeperException.NoNodeException e) {
                   throw new IOException("Failed to process transaction type: " +
                         hdr.getType() + " error: " + e.getMessage(), e);
                }
                listener.onTxnLoaded(hdr, itr.getTxn());
                if (!itr.next()) 
                    break;
            }
        } finally {
            if (itr != null) {
                itr.close();
            }
        }
        return highestZxid;
    }
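Conceptually, fastForwardFromEdits() is just "replay every transaction newer than the snapshot". A toy sketch with an invented in-memory txn log standing in for FileTxnLog:

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class ReplaySketch {
    public static void main(String[] args) {
        long lastProcessedZxid = 5;           // recovered from the snapshot
        // hypothetical txn log: zxid -> operation description
        SortedMap<Long, String> txnLog = new TreeMap<>();
        txnLog.put(3L, "create /a");          // already covered by the snapshot
        txnLog.put(6L, "create /b");
        txnLog.put(7L, "setData /b");
        long highestZxid = lastProcessedZxid;
        // replay only the txns newer than the snapshot, like fastForwardFromEdits
        for (Map.Entry<Long, String> e
                : txnLog.tailMap(lastProcessedZxid + 1).entrySet()) {
            System.out.println("apply " + e.getValue());
            highestZxid = e.getKey();
        }
        System.out.println("highestZxid=" + highestZxid);
    }
}
```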

How the request processor chain handles client requests will be covered in the next post, since this one is already long enough.
To sum up: ZooKeeper server startup loads the configuration file, starts a thread that listens for client connections, exchanges data with connected clients, and loads the data from the snapshot and transaction log files into memory. If anything here is wrong, please point it out; discussion and exchange are welcome.