Python thread isolation: how thread-local storage works

Reposted

mob64ca13f587aa 2023-11-09 16:20:58


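Thread-local objects

A thread-local object keeps each thread's data isolated from every other thread's. Several threads operate on the same thread-local object, typically a single global variable, but the object isolates the data per thread internally. Each thread therefore only ever sees its own values, and sharing the object causes no thread-safety problem. It achieves this by storing the data per thread. The basic principle is as follows:

A thread-local object uses one large dictionary to hold the data of all threads. That dictionary contains several small dictionaries (the source calls each one a thread dict), one per thread, each stored under a key derived from its thread: the id() of the thread object. When a thread touches data on the thread-local object, it first gets the id of the current thread, uses that id to find its small dictionary, and reads or writes there. The id of each live thread object is unique, so each thread only reads and writes its own small dictionary, which isolates the data automatically. That alone is not enough, though: all threads share the one thread-local object and the one large dictionary, so contention between threads could still cause thread-safety problems, and a lock is used to prevent them.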
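
The isolation described above can be observed from the outside before reading any source. The following is a minimal sketch using the standard threading.local; the names shared, results and worker are made up for this illustration.

```python
import threading

# One shared thread-local object, visible to every thread.
shared = threading.local()
shared.value = "main"

results = {}  # hypothetical helper: collects what each thread observed


def worker(name):
    # A new thread starts with an empty namespace on `shared`.
    results[name + "_initially_set"] = hasattr(shared, "value")
    shared.value = name
    results[name] = shared.value


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
print(shared.value)  # the main thread's value is untouched by the workers
```

Each worker writes to the same global name, yet neither sees the value set by the main thread or by the other worker.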

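Source code walkthrough

The local class

The code of the thread-local class is quite compact (what follows is the pure-Python implementation in the standard library's _threading_local module). It defines a few simple attributes and methods. Its __new__ method binds an object to the instance's _local__impl attribute, which is the large dictionary object that stores every thread's data, and then creates a small dictionary for the current thread inside it.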

from threading import RLock

class local:
    # Instances may only have these two attributes.
    __slots__ = '_local__impl', '__dict__'

    def __new__(cls, *args, **kw):
        # Construct the object.
        if (args or kw) and (cls.__init__ is object.__init__):
            raise TypeError("Initialization arguments are not supported")
        self = object.__new__(cls)
        # Create a _localimpl(), which stores all the small dictionaries
        # (thread dicts); each thread that accesses it gets its own thread dict.
        impl = _localimpl()      # explained in detail later
        impl.localargs = (args, kw)
        impl.locallock = RLock()    # every thread accesses impl, so an RLock is created

        # Bind the _localimpl() to self._local__impl on the local object.
        object.__setattr__(self, '_local__impl', impl)

        # create_dict() creates a small dictionary for the current thread and
        # stores it in the big dictionary under the id of the thread object:
        # {id(Thread): (ref(Thread), {thread dict}), id(Thread): (ref(Thread), {thread dict})}
        impl.create_dict()
        return self

    def __getattribute__(self, name):
        # Every attribute or method lookup on the thread-local object is routed through this method.
        with _patch(self):
            return object.__getattribute__(self, name)

    def __setattr__(self, name, value):
        # Runs whenever an attribute of the thread-local object is assigned.
        if name == '__dict__':
            raise AttributeError(
                "%r object attribute '__dict__' is read-only"
                % self.__class__.__name__)
        with _patch(self):
            return object.__setattr__(self, name, value)

    def __delattr__(self, name):
        # Runs whenever an attribute of the thread-local object is deleted.
        if name == '__dict__':
            raise AttributeError(
                "%r object attribute '__dict__' is read-only"
                % self.__class__.__name__)
        with _patch(self):
            return object.__delattr__(self, name)

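The __new__ method shows that once a thread-local object is instantiated, i.e. local = threading.local(), a _localimpl() object is bound to its _local__impl attribute. The class also overrides the three magic methods __getattribute__, __setattr__ and __delattr__, which correspond to reading an attribute, setting an attribute and deleting an attribute on the object.

Operation            Method called
local.attr           __getattribute__
local.attr = 123     __setattr__
del local.attr       __delattr__

These three methods are overridden to change the default behaviour of the three operations. Accessing an attribute of the object really means looking up the corresponding key in the object's __dict__, so before each access the current thread's data is bound to the object's __dict__. At that point the object's attributes are backed by the current thread's data held in the _localimpl(), and the value is then fetched through ordinary attribute access.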
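
One observable consequence is that __dict__ itself looks different depending on which thread reads it. The sketch below assumes nothing beyond the standard threading.local; seen_in_worker and seen_in_main are illustrative names.

```python
import threading

loc = threading.local()
loc.a = 1  # stored in the main thread's namespace

seen_in_worker = {}


def worker():
    loc.b = 2
    # Copy the namespace exactly as this thread sees it.
    seen_in_worker.update(loc.__dict__)


t = threading.Thread(target=worker)
t.start()
t.join()

seen_in_main = dict(loc.__dict__)
print(seen_in_worker)  # only the attribute set by the worker
print(seen_in_main)    # only the attribute set by the main thread
```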

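The _localimpl object

This object builds the large dictionary. Each key is the id of a thread that has accessed the thread-local object. Each thread uses its own id to reach its key, and the value under each key holds a weak reference (ref) to the thread together with the data the thread has stored. That is: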

{
    id(Thread1):(ref(Thread1), {thread dict}),
    id(Thread2):(ref(Thread2), {thread dict})
}

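The thread dict holds the thread's data. ref(Thread1) is kept so that a deletion callback can be attached to the weak reference: when the thread object is destroyed, the data stored for that thread in the _localimpl() object is dropped and garbage-collected along with it.

The source is as follows: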
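
The weak-reference callback mechanism can be tried in isolation. The following is a simplified sketch rather than the library's code: Owner stands in for a Thread object, registry for the large dictionary, and register is a hypothetical helper.

```python
import gc
import weakref


class Owner:
    """Hypothetical stand-in for a Thread object."""


registry = {}  # plays the role of the large dictionary


def register(owner):
    key = id(owner)

    def on_owner_deleted(_ref, key=key):
        # Invoked by the weakref machinery once `owner` has been collected.
        registry.pop(key, None)

    # Keep the weak reference alive next to the data, as _localimpl does.
    registry[key] = (weakref.ref(owner, on_owner_deleted), {"data": 42})
    return key


owner = Owner()
key = register(owner)
present_before = key in registry

del owner
gc.collect()  # CPython frees the object on `del`; this call is a safeguard
present_after = key in registry

print(present_before, present_after)
```

The callback only receives the dead weak reference, which is why the key is captured as a default argument instead of being read from the collected object.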

from threading import current_thread
from weakref import ref

class _localimpl:
    """A class managing thread-local dicts"""
    __slots__ = 'key', 'dicts', 'localargs', 'locallock', '__weakref__'

    def __init__(self):
        # The key used in the Thread objects' attribute dicts.
        # We keep it a string for speed but make it unlikely to clash with
        # a "real" attribute.
        # The key is built from this object's id: one thread-local object has
        # one _localimpl() and therefore one such key.
        self.key = '_threading_local._localimpl.' + str(id(self))
        # Dictionary holding every thread's data: { id(Thread) -> (ref(Thread), thread-local dict) }
        self.dicts = {}

    def get_dict(self):
        """Return the dict for the current thread. Raises KeyError if none
        defined."""
        # Looks up the current thread's small dictionary in the big dictionary by thread id and returns it.
        thread = current_thread()
        return self.dicts[id(thread)][1]

    def create_dict(self):
        """Create a new dict for the current thread, and return it."""
        # The small dictionary for this thread.
        localdict = {}
        key = self.key
        thread = current_thread()   # get the current Thread object, then its id
        idt = id(thread)
        def local_deleted(_, key=key):
            # When the localimpl is deleted, remove the thread attribute.
            thread = wrthread()
            if thread is not None:
                del thread.__dict__[key]
        def thread_deleted(_, idt=idt):
            # When the thread is deleted, remove the local dict.
            # Note that this is suboptimal if the thread object gets
            # caught in a reference loop. We would like to be called
            # as soon as the OS-level thread ends instead.
            local = wrlocal()
            if local is not None:
                dct = local.dicts.pop(idt)
        # Weak references: local_deleted is called when self is destroyed, and
        # thread_deleted is called when the Thread object is destroyed, to clean up the data.
        wrlocal = ref(self, local_deleted)
        wrthread = ref(thread, thread_deleted)
        thread.__dict__[key] = wrlocal
        # Store the weak reference and the small dictionary in the big dictionary self.dicts under this thread's key.
        self.dicts[idt] = wrthread, localdict
        return localdict
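
To see the large dictionary in practice, the private state of the pure-Python implementation can be inspected directly. This is an exploratory sketch only: it relies on the private attributes _local__impl and dicts shown in the source above, which are implementation details and may differ between Python versions.

```python
import threading
import _threading_local  # the pure-Python implementation discussed here

loc = _threading_local.local()
loc.x = "main"


def worker(name):
    loc.x = name


workers = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(2)]
for t in workers:
    t.start()
for t in workers:
    t.join()

# Bypass local.__getattribute__ to reach the private _localimpl instance.
impl = object.__getattribute__(loc, "_local__impl")

# Entries stay in place while the Thread objects themselves are still referenced.
main_entry = impl.dicts[id(threading.current_thread())]
worker_dicts = [impl.dicts[id(t)][1] for t in workers]

print(main_entry[1])
print(worker_dicts)
```

Each value is a (weak reference, thread dict) pair, keyed by the id of the Thread object as in the diagram earlier.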

The _patch context manager

All three magic methods defined on the local class contain code like the following.

with _patch(self):
    return object....

The local object can always fetch exactly the current thread's data from its own __dict__ because _patch manipulates the __dict__ attribute of self. The source is as follows:

from contextlib import contextmanager

@contextmanager
def _patch(self):
    # Get the _localimpl object.
    impl = object.__getattribute__(self, '_local__impl')
    try:
        # get_dict() returns the current thread's thread dict.
        dct = impl.get_dict()
    except KeyError:   # the current thread has no dict yet, so create one now
        dct = impl.create_dict()
        args, kw = impl.localargs
        self.__init__(*args, **kw)
    # Bind the thread dict to the local object. Until this access finishes,
    # the object's __dict__ must hold only the current thread's data, so the
    # lock keeps other threads from rebinding it in the meantime.
    with impl.locallock:
        object.__setattr__(self, '__dict__', dct)
        yield
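
The KeyError branch above means that a subclass's __init__ runs once for every thread that touches the object, with the arguments saved in localargs. The sketch below illustrates this with a subclass of the standard threading.local; Counter, init_calls and worker_saw are names invented for the example.

```python
import threading

init_calls = []  # records the name of each thread that ran __init__


class Counter(threading.local):
    def __init__(self, start):
        init_calls.append(threading.current_thread().name)
        self.count = start


counter = Counter(10)  # __init__ runs for the creating thread
worker_saw = []


def worker():
    counter.count += 1  # first access in this thread triggers __init__(10)
    worker_saw.append(counter.count)


t = threading.Thread(target=worker, name="worker-1")
t.start()
t.join()

print(init_calls)
print(worker_saw, counter.count)
```

The worker starts from the same initial value as the main thread, and its increment never reaches the main thread's copy.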

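@contextmanager is a decorator that turns a generator function into a context manager: a function decorated with it satisfies the context manager protocol, so it can be used with the with _patch() syntax. On entering the with block, the code before the generator's yield runs; on leaving the with block, the code after the yield runs. The source above clearly only cares about entry. Before the __dict__ of the local object is read, _patch() fetches the current thread's small thread dict from the large dictionary in the _localimpl() object and binds it to the local object's __dict__, so for the duration of this access the local object holds only the current thread's data. When another thread accesses the local object, _patch() binds that thread's data from the _localimpl() to the object instead. The local object always serves only the thread currently calling it, and that is how the data that multiple threads store on one local object stays isolated.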
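
The enter and exit ordering of a @contextmanager generator can be checked with a small standalone example. This is an illustrative sketch, not the library's code: swapped, holder and events are hypothetical names, and unlike _patch it also restores the previous value on exit.

```python
from contextlib import contextmanager

events = []  # records the order in which the pieces run


@contextmanager
def swapped(holder, replacement):
    # Runs on entering the with block.
    events.append("enter")
    previous = holder["current"]
    holder["current"] = replacement
    try:
        yield
    finally:
        # Runs on leaving the with block, even if the body raised.
        events.append("exit")
        holder["current"] = previous


holder = {"current": "thread A data"}
with swapped(holder, "thread B data"):
    events.append("body")
    inside = holder["current"]
after = holder["current"]

print(events)
print(inside, "|", after)
```

_patch has nothing after its yield apart from releasing the lock, because the next access rebinds __dict__ anyway.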