Recently I have been trying to integrate ZooKeeper data into our company's back-office system behind a visual interface, which led me to read up on how ZooKeeper is used. While reading, I got the itch to use its API to implement a simple distributed lock.
The distributed lock implemented by this program is suitable for scenarios such as cluster single-point-of-failure handling and leader election.

Below, the distributed lock is described from the following angles: an overview, the problems involved, and how this program addresses them.

1. Overview

A distributed lock provides a mutual-exclusion mechanism among a group of processes: at any point in time, only one process may hold the lock. Distributed locks can be used to implement leader election in large distributed systems; at any point in time, the process holding the lock is the leader of the system.

Note: do not confuse ZooKeeper's own leader election with a general leader-election service built on top of ZooKeeper's primitives. ZooKeeper's internal leader-election mechanism is not exposed to users; the general leader-election service described here is different: it is designed for distributed systems whose processes need to agree on a master process.

2. How it works

To implement a distributed lock service with ZooKeeper, we use sequential znodes to impose an ordering on the processes competing for the lock.

The idea is simple:

  1. First designate a znode to represent the lock, typically named after the entity being locked; here it is called /distributed (any other name would do);
  2. Clients that want to acquire the lock then create ephemeral sequential znodes as children of the lock znode;
  3. At any point in time, the client whose znode has the lowest sequence number holds the lock.

For example, if two clients create znodes at roughly the same time, say /distributed/lock-1 and /distributed/lock-2, then the client that created /distributed/lock-1 holds the lock, because its znode has the lowest sequence number. The ZooKeeper service is the arbiter of this ordering, because it is the one that assigns the sequence numbers.

  1. The lock is released simply by deleting the znode /distributed/lock-1;
  2. In addition, if the client process dies, its ephemeral znode is deleted automatically;
  3. The client that created /distributed/lock-2 then holds the lock, because it has the next-lowest sequence number;
  4. By setting a watch on znode deletion, a client can be notified when it acquires the lock.

From this idea we can derive a first-cut algorithm (sketched in code right after the steps):

  1. Create an ephemeral sequential znode named lock- under the lock znode and remember its actual path name (the return value of the create operation).
  2. Query the children of the lock znode and set a watch.
  3. If the znode created in step 1 has the lowest sequence number among the children returned in step 2, the lock has been acquired; exit.
  4. Wait for the notification from the watch set in step 2 and go back to step 2.
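
To make these steps concrete, here is a minimal, self-contained sketch of this first-cut algorithm. The NaiveLock class is hypothetical and is not part of the program later in this post; it assumes the lock znode /distributed already exists and that the ZooKeeper handle is supplied by the caller.

//NaiveLock.java -- illustrative sketch of the first-cut algorithm only
package zookeeper.application.my.distributedlock;

import org.apache.zookeeper.*;

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class NaiveLock {
    private final ZooKeeper zk;
    private final String lockPath = "/distributed";

    public NaiveLock(ZooKeeper zk) { this.zk = zk; }

    public String acquire() throws KeeperException, InterruptedException {
        // Step 1: create an ephemeral sequential child and remember its actual path
        String myPath = zk.create(lockPath + "/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        final String myName = myPath.substring(myPath.lastIndexOf('/') + 1);

        while (true) {
            final CountDownLatch latch = new CountDownLatch(1);
            // Step 2: list the children and watch the lock znode for any child change
            List<String> children = zk.getChildren(lockPath, new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                    latch.countDown();
                }
            });
            Collections.sort(children);
            // Step 3: the lowest sequence number holds the lock
            if (myName.equals(children.get(0))) {
                return myPath;
            }
            // Step 4: wait for the watch to fire, then go back to step 2
            latch.await();
        }
    }
}

Note that the watch here covers the whole set of children, so every waiting client wakes up on every change; this is exactly the weakness addressed next.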

3. Problems and solutions

The algorithm above is still fairly rough; thinking it through carefully reveals the following problems:
1. The herd effect

Problem:
Although this algorithm is correct, it still has some issues. The first is that the implementation suffers from the herd effect. Consider hundreds or thousands of clients, all trying to acquire the lock; each client sets a watch on the lock znode to catch changes in its set of children. Every time the lock is released, or another process starts requesting the lock, the watch fires and every client receives a notification. The herd effect refers to a large number of clients being notified of the same event when only a small fraction of them actually needs to act on it. Here, only one client will succeed in acquiring the lock, yet maintaining all those watches and sending watch events to every client produces a spike of traffic that puts pressure on the ZooKeeper servers.

Solution (implemented in this program):
To avoid the herd effect, we need to refine the notification condition. The key observation is that a client only needs to be notified when the child with the immediately preceding sequence number disappears, not whenever any child is deleted (or created). In our example, if clients have created /distributed/lock-1, /distributed/lock-2 and /distributed/lock-3, then the client holding /distributed/lock-3 only needs to be notified when /distributed/lock-2 disappears; it does not need to be notified when /distributed/lock-1 disappears or when a new znode /distributed/lock-4 is added.
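
The program below implements this in DistributedLock.tryLock() and waitForLock(). Stripped of error handling, the core of the optimization is just the following (a condensed sketch using the same variable names as the classes below; it assumes the current child is present in the list):

// Condensed sketch of the predecessor-watch logic; see DistributedLock below for the full version
List<String> children = zkClient.getChildren(lockNodePath, false);
Collections.sort(children);
int myIndex = Collections.binarySearch(children, currNodeName);
if (myIndex == 0) {
    // lowest sequence number: we already hold the lock
} else {
    // watch only the child immediately before ours, not the whole parent
    String waitNodePath = lockNodePath + "/" + children.get(myIndex - 1);
    zkClient.exists(waitNodePath, true);  // notified when this particular znode is deleted
}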
2. Recoverable exceptions

Problem:
The lock-acquisition algorithm has another problem: it cannot handle a create operation that fails because the connection was lost. As discussed before, in that situation we do not know whether the operation succeeded or failed. Because creating a sequential znode is not an idempotent operation, we cannot simply retry: if the first create actually succeeded, a retry would leave behind an orphaned znode that is never deleted (at least not until the client's session ends). The unfortunate consequence is a deadlock: the session's second znode would be waiting for its own first znode to be deleted.

Solution (not implemented in this program):

The problem is that, after reconnecting, the client cannot tell whether it has already created a child znode. The solution is to embed an identifier in the znode name: if the client loses its connection, then after reconnecting it can check all the children of the lock znode and see whether any of them contains its identifier. If one does, it knows the create operation already succeeded and it does not need to create another child; if none does, it can safely create a new sequential child.

A client's session ID is a long integer that is unique within the ZooKeeper service, which makes it well suited for identifying a client after a connection loss. The session ID can be obtained by calling the getSessionId() method of the Java ZooKeeper class.

The ephemeral sequential znode should therefore be created with a name of the form lock-<sessionId>-; after ZooKeeper appends the sequence number, the znode's name becomes lock-<sessionId>-<sequenceNumber>. Because the sequence numbers are unique within the parent, this naming scheme lets each child znode identify its creator while still preserving the order of creation.
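
This program does not implement that scheme, but a rough sketch of the create-or-reuse step might look like the following. getSessionId(), getChildren() and create() are real ZooKeeper API calls; the helper method itself is hypothetical, and a complete implementation would also have to sort children by their sequence suffix rather than by full name.

// Sketch only: create a lock child whose name embeds the session ID, or reuse one
// created before a connection loss, so a retry never leaves a duplicate znode behind.
private static String createLockNode(ZooKeeper zk, String lockPath)
        throws KeeperException, InterruptedException {
    String prefix = "lock-" + zk.getSessionId() + "-";
    // After reconnecting, first look for a child we may already have created
    for (String child : zk.getChildren(lockPath, false)) {
        if (child.startsWith(prefix)) {
            return lockPath + "/" + child;   // the earlier create succeeded; reuse it
        }
    }
    // No child carries our session ID, so it is safe to create a new sequential znode
    return zk.create(lockPath + "/" + prefix, new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
}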

3. Unrecoverable exceptions

Problem:
If a client's ZooKeeper session expires, the ephemeral znodes it created are deleted, so it effectively releases any lock it holds, or gives up its place in the queue of lock requests. An application using the lock should realize that it no longer holds the lock, clean up its state, and then start over by creating a new lock object and trying to acquire it again. Note that this is driven by the application, not by the lock, because the lock cannot know how the application needs to clean up its own state.
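
For example, the application's Watcher could check for session expiry and rebuild the lock. A sketch along these lines (KeeperState.Expired is part of the ZooKeeper API, while resetState() and restart() stand for whatever cleanup and restart logic the application needs):

// Sketch: reacting to session expiry at the application level
@Override
public void process(WatchedEvent event) {
    if (event.getState() == Watcher.Event.KeeperState.Expired) {
        // The session is gone, so our ephemeral lock znode has been deleted: we no longer hold the lock.
        resetState();   // hypothetical: clean up application state
        restart();      // hypothetical: create a new ZooKeeper handle and a new lock, then re-acquire
    }
}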

The complete program is as follows:

//LockException.java source code
package zookeeper.application.my.distributedlock;

/**
 * Created by chenyuzhi on 17-10-26.
 */
public class LockException extends RuntimeException {
    public LockException(String message) {
        super(message);
    }

    public LockException(Throwable cause) {
        super(cause);
    }
}
//ZkNode.java source code
package zookeeper.application.my.distributedlock;

import org.apache.zookeeper.*;

import java.io.IOException;
import java.text.MessageFormat;
import java.util.Random;
import java.util.concurrent.CountDownLatch;

/**
 * Created by chenyuzhi on 17-10-26.
 */
public class ZkNode implements Watcher {
    private String id;
    private ZooKeeper zkClient;
    private String lockPath;
    private String currNodePath;
    private CountDownLatch latch;
    private DistributedLock lock;
    private final int sessionTimeout = 30000;

    public void doSomethingAsLeader(){
        try {
            lock.lock();
            System.out.println(MessageFormat.format("[{0}-{1}]: Now I am leader, all must follow!",
                    id,currNodePath));

            Thread.sleep(new Random().nextInt(3000)+500);

        }catch (InterruptedException e) {
            e.printStackTrace();
        }finally {
            lock.unlock();
            lock = null;
            clear();
        }


    }

    public ZkNode(String config,String lockPath,int id) {

        try {
            zkClient = new ZooKeeper(config, sessionTimeout, this);

            if(null == zkClient.exists(lockPath,false)){
                synchronized (ZkNode.class){
                    if(null == zkClient.exists(lockPath,false)){
                        zkClient.create(lockPath,new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.PERSISTENT);
                    }
                }
            }

            this.lockPath = lockPath;
            this.id = "node" + id;
            this.currNodePath = zkClient.create(lockPath+"/lock",new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE,CreateMode.EPHEMERAL_SEQUENTIAL);

            this.lock = new DistributedLock(this);


        } catch (KeeperException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void process(WatchedEvent watchedEvent) {
        if(null != latch){
            latch.countDown();
        }
    }

    public void clear(){

        try {
            if(null != zkClient) {
                zkClient.close();
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

    }

    public String getId() {
        return id;
    }

    public void setLatch(CountDownLatch latch) {
        this.latch = latch;
    }

    public String getLockPath() {
        return lockPath;
    }

    public String getCurrNodePath() {
        return currNodePath;
    }

    public ZooKeeper getZkClient() {
        return zkClient;
    }
    public CountDownLatch getLatch() {
        return latch;
    }
}
//DistributedLock.java source code
package zookeeper.application.my.distributedlock;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

import java.text.MessageFormat;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;

/**
 * Created by chenyuzhi on 17-10-26.
 */
public class DistributedLock implements Lock {
    private final ZooKeeper zkClient;
    private final String lockNodePath;
    private final String currNodePath;
    private final ZkNode zkNode;
    private String waitNodePath;



    public DistributedLock(ZkNode zkNode) {
        this.lockNodePath = zkNode.getLockPath();
        this.currNodePath = zkNode.getCurrNodePath();
        this.zkClient = zkNode.getZkClient();
        this.zkNode = zkNode;
    }

    @Override
    public void lock() {
        if(tryLock()){
            return;
        }
        waitForLock();
    }

    @Override
    public void lockInterruptibly() throws InterruptedException {

    }

    @Override
    public boolean tryLock() {
        try {
            List<String> nodeChildren = zkClient.getChildren(lockNodePath,false);
            String currNodeName = this.currNodePath.substring(currNodePath.lastIndexOf("/") + 1);
            Collections.sort(nodeChildren);
            if(currNodeName.equals(nodeChildren.get(0)))
                return true;
            this.waitNodePath = this.lockNodePath + "/" +
                    nodeChildren.get(Collections.binarySearch(nodeChildren,currNodeName) - 1);
            return false;

        }catch (Exception e){
            throw new LockException(e);
        }
    }

    @Override
    public boolean tryLock(long time, TimeUnit unit) throws InterruptedException {
        if(tryLock()){
            return true;
        }
        return waitForLock(time,unit);
    }

    @Override
    public void unlock() {
        try {
            if(null != zkClient.exists(currNodePath,false)){
                System.out.println(MessageFormat.format("[{0}-{1}]: Unlocking!",
                        zkNode.getId(),currNodePath));
                zkClient.delete(currNodePath,-1);
            }
        } catch (KeeperException e) {
            throw new LockException(e);
        } catch (InterruptedException e) {
            throw new LockException(e);
        }
    }

    private boolean waitForLock(long time, TimeUnit unit){
        boolean retValue = false;
        try {
            Stat stat = zkClient.exists(this.waitNodePath,true); 

            if(stat != null){
                System.out.println(MessageFormat.format("[{0}-{1}]: waiting for  {2}!",
                        zkNode.getId(),currNodePath,waitNodePath));
                this.zkNode.setLatch(new CountDownLatch(1)); 
                retValue = this.zkNode.getLatch().await(time,unit);
                this.zkNode.setLatch(null);
            }
            return retValue;
        } catch (InterruptedException e) {
            throw new LockException(e);
        } catch (KeeperException e) {
            throw new LockException(e);
        }
    }

    private boolean waitForLock(){

        try {
            Stat stat = zkClient.exists(this.waitNodePath,true);  // possible bug: not atomic with the await() below (see discussion after the code)

            if(stat != null){
                System.out.println(MessageFormat.format("[{0}-{1}]: waiting for  {2}!",
                        zkNode.getId(),currNodePath,waitNodePath));
                this.zkNode.setLatch(new CountDownLatch(1));  // possible bug: the watched znode may already be gone by now (see discussion after the code)
                this.zkNode.getLatch().await();
                this.zkNode.setLatch(null);
            }
            return true;
        } catch (InterruptedException e) {
            throw new LockException(e);
        } catch (KeeperException e) {
            throw new LockException(e);
        }
    }

    @Override
    public Condition newCondition() {
        return null;
    }
}
//ConcurrentTest.java source code
package zookeeper.application.my.distributedlock;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Created by chenyuzhi on 17-10-26.
 */
public class ConcurrentTest {
    private CopyOnWriteArrayList<Long> list = new CopyOnWriteArrayList<Long>();
    private CountDownLatch doneSignal;
    private AtomicInteger err = new AtomicInteger();  // atomic counter (not used in this test)
    public static void main(String[] args){
        new ConcurrentTest().test(15);
    }

    private void test(int nodeNum){
        doneSignal = new CountDownLatch(nodeNum);
        for(int i=0;i<nodeNum;i++){
            final int  index = i;
            new Thread(new Runnable() {
                @Override
                public void run() {
                    ZkNode zkNode = new ZkNode("127.0.0.1:2183","/distributed",index);
                    long start = System.currentTimeMillis();
                    zkNode.doSomethingAsLeader();
                    long end = (System.currentTimeMillis() - start);
                    list.add(end);
                    doneSignal.countDown();
                }
            }).start();
        }

        try {
            doneSignal.await();
            getExeTime();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    /**
     * Print the minimum, maximum and average execution time.
     */
    private void getExeTime() {
        int size = list.size();
        List<Long> _list = new ArrayList<Long>(size);
        _list.addAll(list);
        Collections.sort(_list);
        long min = _list.get(0);
        long max = _list.get(size-1);
        long sum = 0L;
        for (Long t : _list) {
            sum += t;
        }
        long avg = sum/size;
        System.out.println("min: " + min);
        System.out.println("max: " + max);
        System.out.println("avg: " + avg);

    }
}

Required Maven dependency:

<dependency>
    <groupId>org.apache.zookeeper</groupId>
    <artifactId>zookeeper</artifactId>
    <version>3.4.6</version>
</dependency>

Run output:

zookeeper.application.my.distributedlock.ConcurrentTest
log4j:WARN No appenders could be found for logger (org.apache.zookeeper.ZooKeeper).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[node3-/distributed/lock0000000017]: Now I am leader, all must follow!
[node14-/distributed/lock0000000023]: waiting for  /distributed/lock0000000022!
[node5-/distributed/lock0000000021]: waiting for  /distributed/lock0000000020!
[node8-/distributed/lock0000000025]: waiting for  /distributed/lock0000000024!
[node4-/distributed/lock0000000028]: waiting for  /distributed/lock0000000027!
[node1-/distributed/lock0000000019]: waiting for  /distributed/lock0000000018!
[node9-/distributed/lock0000000029]: waiting for  /distributed/lock0000000028!
[node0-/distributed/lock0000000022]: waiting for  /distributed/lock0000000021!
[node10-/distributed/lock0000000030]: waiting for  /distributed/lock0000000029!
[node7-/distributed/lock0000000026]: waiting for  /distributed/lock0000000025!
[node6-/distributed/lock0000000031]: waiting for  /distributed/lock0000000030!
[node12-/distributed/lock0000000027]: waiting for  /distributed/lock0000000026!
[node11-/distributed/lock0000000024]: waiting for  /distributed/lock0000000023!
[node2-/distributed/lock0000000018]: waiting for  /distributed/lock0000000017!
[node13-/distributed/lock0000000020]: waiting for  /distributed/lock0000000019!
[node3-/distributed/lock0000000017]: Unlocking!
[node2-/distributed/lock0000000018]: Now I am leader, all must follow!
[node2-/distributed/lock0000000018]: Unlocking!
[node1-/distributed/lock0000000019]: Now I am leader, all must follow!
[node1-/distributed/lock0000000019]: Unlocking!
[node13-/distributed/lock0000000020]: Now I am leader, all must follow!
[node13-/distributed/lock0000000020]: Unlocking!
[node5-/distributed/lock0000000021]: Now I am leader, all must follow!
[node5-/distributed/lock0000000021]: Unlocking!
[node0-/distributed/lock0000000022]: Now I am leader, all must follow!
[node0-/distributed/lock0000000022]: Unlocking!
[node14-/distributed/lock0000000023]: Now I am leader, all must follow!
[node14-/distributed/lock0000000023]: Unlocking!
[node11-/distributed/lock0000000024]: Now I am leader, all must follow!
[node11-/distributed/lock0000000024]: Unlocking!
[node8-/distributed/lock0000000025]: Now I am leader, all must follow!
[node8-/distributed/lock0000000025]: Unlocking!
[node7-/distributed/lock0000000026]: Now I am leader, all must follow!
[node7-/distributed/lock0000000026]: Unlocking!
[node12-/distributed/lock0000000027]: Now I am leader, all must follow!
[node12-/distributed/lock0000000027]: Unlocking!
[node4-/distributed/lock0000000028]: Now I am leader, all must follow!
[node4-/distributed/lock0000000028]: Unlocking!
[node9-/distributed/lock0000000029]: Now I am leader, all must follow!
[node9-/distributed/lock0000000029]: Unlocking!
[node10-/distributed/lock0000000030]: Now I am leader, all must follow!
[node10-/distributed/lock0000000030]: Unlocking!
[node6-/distributed/lock0000000031]: Now I am leader, all must follow!
[node6-/distributed/lock0000000031]: Unlocking!
min: 1032
max: 31075
avg: 16046

Process finished with exit code 0

In the code I marked two places where a bug may occur, for the following reasons:
1. If the current thread has just executed

Stat stat = zkClient.exists(this.waitNodePath,true);

and is then preempted, and at exactly that moment the znode it had just set the watch on is deleted, then after the following two statements

this.zkNode.setLatch(new CountDownLatch(1)); 
this.zkNode.getLatch().await();

are executed, the current thread will stay blocked forever, because the watched znode has already been deleted, so no watcher callback will ever execute latch.countDown().

2. If the current thread has just executed
this.zkNode.setLatch(new CountDownLatch(1));

and is then preempted, and at exactly that moment the znode it had set the watch on is deleted, then for the same reason the statement

this.zkNode.getLatch().await();

will leave the current thread blocked forever, because the watched znode has already been deleted, so no watcher callback will execute latch.countDown().

Solution
The best fix would be to force this stretch of code to execute as an uninterruptible unit, somewhat like a transaction in a traditional database, so that it either succeeds or fails as a whole; but for this example I have not yet found a simple way to implement that idea.

Some suggest using an in-process synchronization lock (such as synchronized), but in a distributed setting you cannot rule out the node behind the preceding znode crashing at just the wrong moment, so the watcher callback can still race with the two places above.
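
One workaround I can think of (a sketch only, not verified in this program) is to install the latch before calling exists(), so that even if the watched znode is deleted between the two statements, the watcher callback already has a latch to count down and await() returns immediately:

// Sketch: a reordered waitForLock() -- the latch exists before the watch is registered,
// so a deletion that arrives between exists() and await() still counts it down.
private boolean waitForLock() {
    try {
        this.zkNode.setLatch(new CountDownLatch(1));           // latch ready before the watch
        Stat stat = zkClient.exists(this.waitNodePath, true);  // register the watch on the predecessor
        if (stat != null) {
            this.zkNode.getLatch().await();                    // woken by the watcher's countDown()
        }
        this.zkNode.setLatch(null);
        return true;
    } catch (InterruptedException e) {
        throw new LockException(e);
    } catch (KeeperException e) {
        throw new LockException(e);
    }
}

Since ZkNode.process() counts the latch down on any event (including connection-state changes), a more robust variant would loop back to tryLock() after await() returns instead of assuming the lock has been acquired.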

My programming skills are still limited and I may not have covered every corner case, so please point out anything I missed, or share a more complete program with everyone; consider this post merely a starting point for discussion.