How Hadoop high-concurrency technology implements a highly concurrent HashMap

Reposted

我心依旧 2024-06-27 21:08:01


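I. HashMap

HashMap is a hash-table-based implementation of the Map interface that stores key-value pairs. Its data structure is a "chained hash", that is, an array plus linked lists (since Java 8, a long list can also be converted into a red-black tree). Keys are unique while values may repeat, a null key and null values are allowed, and the elements are unordered.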

The figure below is a simplified diagram of HashMap's structure.

[Figure: HashMap structure, an array whose slots each hold a linked list]

A brief look at the HashMap implementation

Initialization

public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

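loadFactor is the load factor; its default value is 0.75f.

The default initial capacity is 16.

The maximum capacity is 2^30.

The source shows that the constructor does not allocate the Node array. It only rounds the requested capacity up to a power of two with tableSizeFor and stores the result in threshold.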
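To make the rounding step concrete, here is a minimal sketch. The class name CapacityDemo is made up for illustration, and the method body follows the bit-smearing approach used by JDK 8's tableSizeFor; later JDK versions compute the same result differently.

```java
final class CapacityDemo {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Round cap up to the nearest power of two, capped at 2^30.
    // Assumption: this mirrors the JDK 8 approach; it is a re-implementation for illustration.
    static int tableSizeFor(int cap) {
        int n = cap - 1;      // so that an exact power of two maps to itself
        n |= n >>> 1;         // smear the highest set bit into every lower bit
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        System.out.println(tableSizeFor(10)); // 16
        System.out.println(tableSizeFor(16)); // 16
        System.out.println(tableSizeFor(17)); // 32
    }
}
```

Keeping the capacity a power of two is what lets the bucket index be computed with a bit mask, (n - 1) & hash, instead of a modulo.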

Next, let's look at the put method.

public V put(K key, V value) {
        return putVal(hash(key), key, value, false, true);
    }
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        // Declare a Node array tab and a node reference p
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        // If the table is null or empty, initialize it via resize()
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        // If the bucket selected by the hash is empty, create a new node there
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            // If the bucket holds a red-black tree node, insert into the tree
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
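                        // When the list grows past 8 nodes, convert it to a red-black tree
                        // (treeifyBin resizes instead if the table has fewer than 64 slots)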
                        if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

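The array is created lazily, on the first put (through resize()). We can also see that each bucket is stored either as a linked list or as a red-black tree: once a list grows past 8 nodes (TREEIFY_THRESHOLD), it is converted into a red-black tree, provided the table has at least 64 slots; otherwise the table is resized instead.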
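The return values of putVal are easy to miss when reading the source, so here is a small usage sketch against java.util.HashMap. The class name PutSemanticsDemo and the run() helper are made up for illustration; the calls themselves are standard Map methods.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PutSemanticsDemo {
    // Collects the result of each call so the behaviour can be inspected.
    static List<Object> run() {
        Map<String, Integer> map = new HashMap<>();
        List<Object> out = new ArrayList<>();
        out.add(map.put("a", 1));          // no previous mapping: returns null
        out.add(map.put("a", 2));          // key exists: value replaced, old value 1 returned
        out.add(map.putIfAbsent("a", 3));  // onlyIfAbsent path: mapping kept, current value 2 returned
        out.add(map.get("a"));             // still 2
        map.put(null, 0);                  // a null key is allowed
        out.add(map.containsKey(null));    // true
        out.add(map.size());               // 2 entries: "a" and null
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [null, 1, 2, 2, true, 2]
    }
}
```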

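The get() method likewise first computes the hash of the key to locate the corresponding slot in the array, then uses the key's equals method to find the wanted element in the list at that slot.

1. The get method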

public V get(Object key) {
        Node<K,V> e;

        // Look up the node with getNode; return null if it is absent, otherwise return the node's value
        return (e = getNode(hash(key), key)) == null ? null : e.value;
    }

The getNode method

final Node<K,V> getNode(int hash, Object key) {

        Node<K,V>[] tab; Node<K,V> first, e; int n; K k;
        // Read the table and its length, then use a bit mask (equivalent to a modulo) to find the bucket's first node
        if ((tab = table) != null && (n = tab.length) > 0 &&
            (first = tab[(n - 1) & hash]) != null) {
            if (first.hash == hash && // always check first node
                ((k = first.key) == key || (key != null && key.equals(k))))
                return first;
            // If the first node does not match, check for a next node and keep comparing;
            // if there is none, the key has no mapping
            if ((e = first.next) != null) {
                if (first instanceof TreeNode)
                    return ((TreeNode<K,V>)first).getTreeNode(hash, key);
                do {
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        return e;
                } while ((e = e.next) != null);
            }
        }
        return null;
    }

HashMap resizing

Resizing a HashMap takes two steps:

1. Create a new Node array twice as long as the old one.

2. Rehash the existing entries into the new array.

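Why rehashing is needed

The index formula is: index = HashCode(Key) & (Length - 1)

Plug in some values and you will see that the result of the bit operation changes when Length changes. Doubling Length adds one more bit to the mask, so each entry either keeps its old index or moves to old index + old Length.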
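The effect of doubling the length can be checked with a short sketch. The class name ResizeIndexDemo and the helper indexFor are made up for illustration; hash() follows the spreading step JDK 8's HashMap applies before masking (XOR of the high 16 bits into the low 16 bits).

```java
final class ResizeIndexDemo {
    // Spread the hash code as HashMap does in JDK 8: fold the high bits into the low bits.
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // index = hash & (length - 1); length must be a power of two.
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    public static void main(String[] args) {
        int oldLen = 16, newLen = 32;
        // All four keys share bucket 5 in a table of 16 slots.
        for (int key : new int[] {5, 21, 37, 53}) {
            int h = hash(key);
            System.out.println(key + ": " + indexFor(h, oldLen) + " -> " + indexFor(h, newLen));
        }
        // 5: 5 -> 5
        // 21: 5 -> 21
        // 37: 5 -> 5
        // 53: 5 -> 21
    }
}
```

Keys 21 and 53 have the bit worth 16 set, so they move from index 5 to 5 + 16 = 21; keys 5 and 37 do not, so they stay at index 5.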

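Note in particular

Since Java 8, new nodes are inserted at the tail of the list. Java 7 and earlier inserted at the head, which could reverse the list order during a concurrent resize and leave a cycle in the list.

I have not yet had time to write up the details, so here is a link: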

https://www.jianshu.com/p/72181e25afb9

HashMap is not thread-safe, so we first introduce Hashtable.

public class Hashtable<K,V> extends Dictionary<K,V>
        implements Map<K,V>, Cloneable, java.io.Serializable {

    public synchronized V put(K key, V value) {
        // code omitted
    }
    public synchronized V remove(Object key) {
        // code omitted
    }
    public synchronized V get(Object key) {
        // code omitted
    }
    public synchronized int size() {
        return count;
    }
}

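As we can see, Hashtable adds the synchronized keyword to its methods. This guarantees thread safety, but every method locks the whole table, so threads block each other and performance suffers.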
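A minimal sketch of what this table-wide lock buys: several threads increment one counter in a shared Hashtable and no update is lost, because each call runs under the table's single monitor. The class name HashtableDemo and the helper countWith are made up for illustration; it relies on Hashtable overriding merge as a synchronized method.

```java
import java.util.Hashtable;
import java.util.Map;

final class HashtableDemo {
    // Starts `threads` workers that each increment the "hits" counter `perThread` times.
    static int countWith(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.merge("hits", 1, Integer::sum); // atomic here only because the map synchronizes merge
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countWith(new Hashtable<>(), 4, 10_000)); // 40000
    }
}
```

Passing a plain HashMap to the same helper is not safe: increments can be lost or the map can be corrupted, and the outcome varies from run to run.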

Next, let's look at ConcurrentHashMap.

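Before JDK 1.8, ConcurrentHashMap partitioned the data into segments. Each Segment is effectively a small Hashtable with its own lock, so threads working on different segments do not block each other.

[Figure: segmented structure of ConcurrentHashMap before JDK 1.8]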
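The segment idea can be sketched in a few lines. This is a simplified illustration of lock striping, not the JDK 7 source: the class StripedMap and its methods are hypothetical, each segment is a plain HashMap guarded by its own monitor, and reads take the lock too, which the real implementation mostly avoided.

```java
import java.util.HashMap;
import java.util.Map;

final class StripedMap<K, V> {
    private final Map<K, V>[] segments;

    @SuppressWarnings({"unchecked", "rawtypes"})
    StripedMap(int segmentCount) {
        segments = (Map<K, V>[]) new Map[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new HashMap<>();
        }
    }

    // Pick the segment from the key's hash; keys in different segments never contend.
    private Map<K, V> segmentFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return segments[(h & 0x7fffffff) % segments.length];
    }

    V put(K key, V value) {
        Map<K, V> seg = segmentFor(key);
        synchronized (seg) {            // lock only this segment, not the whole map
            return seg.put(key, value);
        }
    }

    V get(Object key) {
        Map<K, V> seg = segmentFor(key);
        synchronized (seg) {
            return seg.get(key);
        }
    }

    // Sums segment sizes one at a time; not an atomic snapshot under concurrent writes.
    int size() {
        int total = 0;
        for (Map<K, V> seg : segments) {
            synchronized (seg) {
                total += seg.size();
            }
        }
        return total;
    }
}
```

With 16 segments, up to 16 writers can proceed at once as long as their keys fall into different segments, whereas Hashtable allows only one.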

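Since JDK 1.8, the lock is taken per bucket instead: it locks the linked list in that bucket.

A linked list is a small data structure that holds only a few entries, so the lock granularity is fine and the achievable concurrency is much higher.

The source code is as follows: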

final V putVal(K key, V value, boolean onlyIfAbsent) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int binCount = 0;
        for (Node<K,V>[] tab = table;;) {
            Node<K,V> f; int n, i, fh;
            if (tab == null || (n = tab.length) == 0)
                tab = initTable();
            else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
                if (casTabAt(tab, i, null,
                             new Node<K,V>(hash, key, value, null)))
                    break;                   // no lock when adding to empty bin
            }
            else if ((fh = f.hash) == MOVED)
                tab = helpTransfer(tab, f);
            else {
                V oldVal = null;
                synchronized (f) {
                    if (tabAt(tab, i) == f) {
                        if (fh >= 0) {
                            binCount = 1;
                            for (Node<K,V> e = f;; ++binCount) {
                                K ek;
                                if (e.hash == hash &&
                                    ((ek = e.key) == key ||
                                     (ek != null && key.equals(ek)))) {
                                    oldVal = e.val;
                                    if (!onlyIfAbsent)
                                        e.val = value;
                                    break;
                                }
                                Node<K,V> pred = e;
                                if ((e = e.next) == null) {
                                    pred.next = new Node<K,V>(hash, key,
                                                              value, null);
                                    break;
                                }
                            }
                        }
                        else if (f instanceof TreeBin) {
                            Node<K,V> p;
                            binCount = 2;
                            if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                           value)) != null) {
                                oldVal = p.val;
                                if (!onlyIfAbsent)
                                    p.val = value;
                            }
                        }
                    }
                }
                if (binCount != 0) {
                    if (binCount >= TREEIFY_THRESHOLD)
                        treeifyBin(tab, i);
                    if (oldVal != null)
                        return oldVal;
                    break;
                }
            }
        }
        addCount(1L, binCount);
        return null;
    }

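We can see that when the bucket's head node is null, a CAS operation installs the new node and guarantees thread safety without a lock; when it is not null, synchronized locks the head node of the list before operating on it.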
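The same two-path idea can be reduced to a small sketch: CAS for an empty bucket, synchronized on the head node otherwise. The class BinLockedMap is hypothetical and leaves out resizing, treeification, removal and counting, so it shows only the locking pattern, not ConcurrentHashMap itself.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

final class BinLockedMap<K, V> {
    private static final class Node<K, V> {
        final int hash;
        final K key;
        volatile V val;            // volatile so lock-free readers see updates
        volatile Node<K, V> next;
        Node(int hash, K key, V val) { this.hash = hash; this.key = key; this.val = val; }
    }

    private final AtomicReferenceArray<Node<K, V>> table;

    BinLockedMap(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        table = new AtomicReferenceArray<>(capacity);
    }

    private static int spread(int h) {
        return (h ^ (h >>> 16)) & 0x7fffffff;
    }

    V put(K key, V value) {
        if (key == null || value == null) throw new NullPointerException();
        int hash = spread(key.hashCode());
        int i = hash & (table.length() - 1);
        for (;;) {
            Node<K, V> head = table.get(i);
            if (head == null) {
                // Empty bucket: install the first node with CAS, no lock needed.
                if (table.compareAndSet(i, null, new Node<>(hash, key, value))) {
                    return null;
                }
                // Another thread won the race; loop and take the locked path.
            } else {
                synchronized (head) {                  // lock only this bucket
                    if (table.get(i) != head) continue; // head changed: retry
                    for (Node<K, V> e = head;; e = e.next) {
                        if (e.hash == hash && key.equals(e.key)) {
                            V old = e.val;
                            e.val = value;
                            return old;
                        }
                        if (e.next == null) {          // reached the tail: append
                            e.next = new Node<>(hash, key, value);
                            return null;
                        }
                    }
                }
            }
        }
    }

    // Reads traverse the bucket without locking.
    V get(Object key) {
        int hash = spread(key.hashCode());
        for (Node<K, V> e = table.get(hash & (table.length() - 1)); e != null; e = e.next) {
            if (e.hash == hash && key.equals(e.key)) return e.val;
        }
        return null;
    }
}
```

In this sketch a bucket's head never changes once installed, so the recheck inside the lock always passes; it is kept because the real putVal needs it when a resize or removal replaces the head.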
