I. Introduction to Multithreading

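  A thread, sometimes called a lightweight process, is the smallest unit the operating system can schedule. It lives inside a process and is the unit that actually executes within it. A thread owns almost no system resources of its own, only the few it needs in order to run, but it shares all the resources of its process with the other threads of that process. One thread can create and terminate another, and multiple threads in the same process can run concurrently.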

 

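Most of the threading API mirrors Python's multiprocessing API so closely that I would rather not write it all out again (see the multiprocessing notes). The two are nearly identical, and having one common interface makes things much simpler.

Multithreading does differ in a few ways, though: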

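  • Processes do not share memory by default, whereas threads of the same process share memory very easily.
  • When the operating system creates a process it has to allocate a fresh set of system resources for it, while creating a thread costs far less. Running concurrent tasks on threads therefore usually carries less overhead than running them on processes (the GIL discussion below qualifies this for CPU-bound work).
  • Python has built-in multithreading support in the language's standard library rather than exposing only the operating system's scheduling primitives, which simplifies multithreaded programming in Python.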

To borrow someone else's explanation (reference link):

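GIL (Global Interpreter Lock)
Outside Python, a single core can run only one task at a time, while multiple cores can run several threads simultaneously. In CPython, however, no matter how many cores there are, only one thread executes Python bytecode at any moment. The reason is the GIL.

GIL stands for Global Interpreter Lock. It goes back to Python's original design and was chosen to keep the interpreter's data safe. A thread that wants to run must first acquire the GIL, which can be thought of as a "pass", and a Python process has only one. A thread that cannot get the pass is not allowed to execute Python code. The GIL is a property of the CPython implementation: CPython runs Python threads on native OS threads, and its memory management is not thread-safe, so the GIL guarantees that only one thread touches interpreter state at a time. PyPy also has a GIL, while Jython and IronPython do not.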

How Python multithreading works:
When Python uses multiple threads, it relies on native OS threads created at the C level.

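1. Get the shared data.
2. Request the GIL.
3. The Python interpreter calls the OS native thread.
4. The OS has the CPU run the computation.
5. When the thread's time slice is used up, the GIL must be released whether or not the computation has finished.
6. Another thread then repeats the steps above.
7. After the other threads have had their turn, execution switches back to the earlier thread, which resumes from its saved context. Throughout, each thread runs its own computation and is switched out when its time is up (a context switch).
How well Python performs also depends on the type of code: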

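1. CPU-bound code (loops, calculations, and so on): because there is so much computation, the ticks counter reaches its threshold quickly (in Python 3, the switch interval expires), which triggers a release of the GIL and a new round of competition for it. Switching back and forth between threads costs resources, so Python multithreading is not friendly to CPU-bound code.
2. I/O-bound code (file processing, web crawlers, and other work dominated by reads and writes): multithreading improves efficiency noticeably. A single thread has to sit through every I/O wait, which wastes time; with several threads, execution switches to thread B while thread A waits, so CPU time is not wasted and the program finishes sooner. Python multithreading is therefore friendly to I/O-bound code.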
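The I/O-bound case above can be sketched with a small timing comparison. This is a minimal illustration, not a benchmark: `fake_io`, `run_serial`, and `run_threaded` are hypothetical names, and `time.sleep` stands in for a real blocking read or network call (it releases the GIL while waiting, as blocking I/O does).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.2):
    """Hypothetical stand-in for a blocking I/O call."""
    time.sleep(delay)  # the GIL is released while sleeping
    return task_id


def run_serial(n):
    """Run n fake I/O tasks one after another; return (results, seconds)."""
    start = time.perf_counter()
    results = [fake_io(i) for i in range(n)]
    return results, time.perf_counter() - start


def run_threaded(n):
    """Run n fake I/O tasks on n threads; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(fake_io, range(n)))  # map keeps input order
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, serial = run_serial(4)
    _, threaded = run_threaded(4)
    print(f"serial: {serial:.2f}s, threaded: {threaded:.2f}s")
```

With four tasks the serial run waits for each delay in turn, while the threaded run overlaps the waits. Replacing `time.sleep` with a pure-Python computation would be expected to remove most of the gain, since the threads would then compete for the GIL.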

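Usage advice

To make full use of a multi-core CPU in Python, use multiple processes. Each process has its own GIL and the processes do not interfere with one another, so execution is truly parallel. For CPU-bound work on a multi-core CPU, multiprocessing therefore outperforms multithreading.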

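GIL differences between Python versions:

1. In Python 2.x, the GIL is released when the current thread hits an I/O operation or when the ticks counter reaches 100. (ticks is an interpreter-internal counter used only for the GIL; it resets after each release, and the threshold can be adjusted with sys.setcheckinterval.) Every release makes the threads compete for the lock and switch, which costs resources. Because of the GIL, a Python process can only ever run one thread at a time (the one holding the GIL), which is why Python multithreading is not efficient on multi-core CPUs.
2. In Python 3.x (since 3.2), the GIL no longer counts ticks and uses a timer instead: the running thread is asked to release the GIL once the switch interval has elapsed (5 ms by default, adjustable with sys.setswitchinterval). This is friendlier to CPU-bound programs, but it does not remove the restriction that only one thread runs at a time, so efficiency is still unsatisfactory.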
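The timer-based behaviour in Python 3 can be inspected through `sys.getswitchinterval()` and `sys.setswitchinterval()`. The sketch below is only an illustration of those two calls; `with_switch_interval` is a hypothetical helper, and the value 0.001 is an arbitrary choice, not a recommendation.

```python
import sys


def with_switch_interval(seconds, func, *args):
    """Call func(*args) under a temporary switch interval, then restore it."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func(*args)
    finally:
        # Always put the old interval back, even if func raises.
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    print("current interval:", sys.getswitchinterval())
    print("inside helper:", with_switch_interval(0.001, sys.getswitchinterval))
```

A shorter interval makes threads hand over the GIL more often, which can improve responsiveness at the cost of more switching; the right value depends on the workload.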

 

Timer: the threading module has its own Timer class, which runs a function in a new thread after a delay

import os
import time
from threading import BrokenBarrierError, Barrier, Timer


def pass_func():
    # Barrier action: runs once each time three threads have arrived.
    print("3 Process passed")


def func(num, barrier):
    print("I'm [{}], my pid: [{}] start at time: [{}]".format(num, os.getpid(), time.ctime()))
    try:
        # wait() returns an integer from 0 to parties - 1, different for each thread.
        bid = barrier.wait(10)
        time.sleep(num)
        print("[{}] already gathered [{}]".format(num, bid))
        if int(bid) == 2:
            # Break the barrier; threads still waiting get BrokenBarrierError.
            barrier.abort()
    except BrokenBarrierError:
        pass
    else:
        print("[{}] go ahead".format(num))


if __name__ == '__main__':
    barrier = Barrier(parties=3, action=pass_func, timeout=0.5)
    processes = []
    print("start ctime", time.ctime())
    for i in range(10):
        # Each Timer starts func in its own thread after a 5-second delay.
        p = Timer(5, function=func, args=(i, barrier))
        p.start()
        processes.append(p)

    for i in processes:
        i.join()

    Output:

start ctime Sun Dec 25 22:17:53 2022
I'm [5], my pid: [58112] start at time: [Sun Dec 25 22:17:58 2022]I'm [3], my pid: [58112] start at time: [Sun Dec 25 22:17:58 2022]I'm [4], my pid: [58112] start at time: [Sun Dec 25 22:17:58 2022]
I'm [1], my pid: [58112] start at time: [Sun Dec 25 22:17:58 2022]I'm [0], my pid: [58112] start at time: [Sun Dec 25 22:17:58 2022]
I'm [8], my pid: [58112] start at time: [Sun Dec 25 22:17:58 2022]
3 Process passed

[0] already gathered [1]
I'm [2], my pid: [58112] start at time: [Sun Dec 25 22:17:58 2022]I'm [6], my pid: [58112] start at time: [Sun Dec 25 22:17:58 2022]
3 Process passed
I'm [9], my pid: [58112] start at time: [Sun Dec 25 22:17:58 2022]I'm [7], my pid: [58112] start at time: [Sun Dec 25 22:17:58 2022]




[0] go ahead
3 Process passed
[1] already gathered [1]
[1] go ahead
[2] already gathered [1]
[2] go ahead
[4] already gathered [0]
[4] go ahead
[5] already gathered [2]
[5] go ahead
[6] already gathered [2]
[6] go ahead
[7] already gathered [0]
[7] go ahead
[8] already gathered [0]
[8] go ahead
[9] already gathered [2]
[9] go ahead

Process finished with exit code 0

As the output shows, every thread started 5 seconds after its timer was created.

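Barrier's effect is easy to miss in this run, but it does work: "3 Process passed" is printed once for each complete group of three threads. Because all ten timers fire at the same moment, each group fills almost instantly, so no waiting is visible. The tenth thread (thread 3 in this run) has no group to join and never prints "go ahead": its wait() ends with BrokenBarrierError once another thread calls abort(), and the except branch swallows the error silently.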
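The blocking behaviour is easier to see when the threads arrive at different times. The following is a minimal sketch under that assumption; `run_barrier_demo` and its parameters are hypothetical names introduced for illustration.

```python
import threading
import time


def run_barrier_demo(parties=3, stagger=0.1):
    """Start `parties` threads that arrive at a barrier at staggered times.

    Returns two lists of timestamps: when each thread arrived at the
    barrier and when each thread was released from it.
    """
    barrier = threading.Barrier(parties)
    arrived, released = [], []
    lock = threading.Lock()

    def worker(n):
        time.sleep(n * stagger)          # thread n arrives later than thread n-1
        with lock:
            arrived.append(time.monotonic())
        barrier.wait()                   # blocks until all parties have arrived
        with lock:
            released.append(time.monotonic())

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arrived, released


if __name__ == "__main__":
    arrived, released = run_barrier_demo()
    print("last arrival :", max(arrived))
    print("first release:", min(released))
```

No thread is released before the last one has arrived, so the earliest release time is never earlier than the latest arrival time.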

 

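Threads can communicate directly through global variables, because threads of the same process share its resources. To keep the data correct, guard access with a mutex (Lock).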
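A minimal sketch of a shared global protected by a mutex might look like this. The names `counter`, `counter_lock`, `add_many`, and `run` are hypothetical and chosen only for the example.

```python
import threading

counter = 0                       # shared by every thread in the process
counter_lock = threading.Lock()   # mutex guarding `counter`


def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        with counter_lock:        # only one thread may update at a time
            counter += 1


def run(workers=4, times=10_000):
    """Reset the counter, run the workers, and return the final value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())
```

Holding the lock around the read-modify-write keeps the total equal to `workers * times` regardless of how the threads are scheduled.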

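The threading module has no Manager: Manager exists to share resources between processes, whereas threads already share them. Threading does still provide Event and Condition.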
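As a small illustration of Event between threads, the sketch below has one thread wait until the main thread signals it. `run_event_demo` is a hypothetical name; Condition could be used in a similar way when the waiters also need a lock around shared state.

```python
import threading


def run_event_demo():
    """Signal a waiting thread with an Event and return the order of steps."""
    ready = threading.Event()
    log = []

    def waiter():
        ready.wait()                     # blocks until the event is set
        log.append("waiter saw event")

    t = threading.Thread(target=waiter)
    t.start()
    log.append("main sets event")        # recorded before the signal is sent
    ready.set()                          # wakes every thread waiting on the event
    t.join()
    return log


if __name__ == "__main__":
    print(run_event_demo())
```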

 

 

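In a process pool or thread pool, results can be collected by attaching a callback with add_done_callback(). With the Process class, a Pipe can carry back both exceptions and results (adapt the earlier multiprocessing exception example so that it sends the result instead of the exception information). With threads, the result can be obtained by subclassing Thread directly, as follows: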

import random
import time
from threading import Thread, get_ident


def func(num):
    time.sleep(random.randrange(5))
    print("I'm [{}], my ident: [{}]".format(num, get_ident()))
    return "return from [{}]".format(num)


class MyThread(Thread):
    def __init__(self, *args, **kwargs):
        super(MyThread, self).__init__(*args, **kwargs)
        self._result = None

    def run(self) -> None:
        try:
            if self._target is not None:
                self._result = self._target(*self._args, **self._kwargs)
        finally:
            del self._target, self._args, self._kwargs

    @property
    def result(self):
        return self._result


threads = []

for i in range(5):
    t = MyThread(target=func, args=(i,))
    t.start()
    threads.append(t)
    
for i in threads:
    i.join()

for i in threads:
    print(i.result)

    Output:

I'm [2], my ident: [56676]
I'm [3], my ident: [50876]
I'm [4], my ident: [52848]
I'm [0], my ident: [56660]
I'm [1], my ident: [56812]
return from [0]
return from [1]
return from [2]
return from [3]
return from [4]

Process finished with exit code 0
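For comparison, the thread-pool callback approach mentioned earlier can be sketched with `concurrent.futures`. `square` and `collect_with_callbacks` are hypothetical names; the callback receives the finished Future and reads its result.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


def collect_with_callbacks(values):
    """Submit one task per value and gather results through callbacks."""
    results = []

    def on_done(future):
        # Called when the task finishes; result() re-raises any exception.
        results.append(future.result())

    with ThreadPoolExecutor(max_workers=4) as pool:
        for value in values:
            pool.submit(square, value).add_done_callback(on_done)
    # Leaving the with-block waits for all submitted tasks to finish.
    return sorted(results)               # completion order is not guaranteed


if __name__ == "__main__":
    print(collect_with_callbacks([1, 2, 3]))
```

The results are sorted before returning because tasks may complete in any order.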

 

 Threads can exchange data through queue.Queue; the Queue from multiprocessing is not needed.
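A minimal producer/consumer sketch with `queue.Queue` is shown below. The names `producer`, `consumer`, `run_pipeline`, and the `SENTINEL` end marker are assumptions made for the example, not part of the queue API.

```python
import queue
import threading

SENTINEL = object()   # hypothetical end-of-stream marker


def producer(q, items):
    """Put every item on the queue, then signal that no more will come."""
    for item in items:
        q.put(item)
    q.put(SENTINEL)


def consumer(q, out):
    """Take items off the queue until the sentinel arrives."""
    while True:
        item = q.get()            # blocks until something is available
        if item is SENTINEL:
            break
        out.append(item * 2)


def run_pipeline(items):
    q = queue.Queue()
    out = []
    threads = [
        threading.Thread(target=producer, args=(q, items)),
        threading.Thread(target=consumer, args=(q, out)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

`queue.Queue` handles the locking internally, so neither thread needs an explicit Lock around `put()` or `get()`.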

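The current_thread() function in threading (currentThread() is an older, deprecated alias) returns the Thread object of the calling thread; for example, current_thread().name is the name of the current thread.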

 

setprofile(): 

setprofile(func)

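setprofile() is a function in Python's threading module. It sets a profile function for all threads started from the threading module: func is passed to sys.setprofile() in each thread. It returns None.

 

func: a required argument, the function that is passed to sys.setprofile() for each thread before the thread's run() method is called.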

# Python program to explain the use of 
# setprofile()  method in Threading Module
 
import time
import threading
 
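def trace_profile():
    # Note: the calls below are written as threading.setprofile(trace_profile()),
    # which runs this function at once and hands its return value (None) to
    # setprofile(). The printed lines come from those direct calls; no profile
    # function is installed. A real profile function accepts (frame, event, arg)
    # and is passed without parentheses.
    print("Current thread's profile")
    print("Name:", threading.current_thread().name)
    print("Thread id:", threading.get_ident())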
 
def thread_1(i):
    time.sleep(5)
    threading.setprofile(trace_profile())
    print("Value by Thread-1:",i)
    print()
    
def thread_2(i):
    threading.setprofile(trace_profile())
    print("Value by Thread-2:",i)
    print()
    
def thread_3(i):
    time.sleep(4)
    threading.setprofile(trace_profile())
    print("Value by Thread-3:",i)
    print()
    
def thread_4(i):
    time.sleep(1)
    threading.setprofile(trace_profile())
    print("Value by Thread-4:",i)
    print()
 
# Creating sample threads 
threading.setprofile(trace_profile())
thread1 = threading.Thread(target=thread_1, args=(1,))
thread2 = threading.Thread(target=thread_2, args=(2,))
thread3 = threading.Thread(target=thread_3, args=(3,))
thread4 = threading.Thread(target=thread_4, args=(4,))
 
# Starting the threads
thread1.start()
thread2.start()
thread3.start()
thread4.start()

    Output:

Current thread's profile
Name: MainThread
Thread id: 140461120771840
Current thread's profile
Name: Thread-2
Thread id: 140461086283520
Value by Thread-2: 2
 
Current thread's profile
Name: Thread-4
Thread id: 140461086283520
Value by Thread-4: 4
 
Current thread's profile
Name: Thread-3
Thread id: 140461077890816
Value by Thread-3: 3
 
Current thread's profile
Name: Thread-1
Thread id: 140461094676224
Value by Thread-1: 1
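In the example above, `threading.setprofile(trace_profile())` calls the function immediately and passes `None`, so the output comes from the direct calls rather than from a profile hook. A sketch of passing a real profile function follows; `count_calls_in_thread`, `helper`, and `work` are hypothetical names, and the hook simply records the name of every Python function called in the new thread.

```python
import threading


def helper():
    return 1


def work():
    helper()
    helper()


def count_calls_in_thread(target):
    """Run target in a new thread and return the names of functions it called."""
    calls = []

    def profiler(frame, event, arg):
        # The interpreter invokes this hook with (frame, event, arg).
        if event == "call":
            calls.append(frame.f_code.co_name)

    threading.setprofile(profiler)       # pass the function itself, no parentheses
    try:
        t = threading.Thread(target=target)
        t.start()
        t.join()
    finally:
        threading.setprofile(None)       # stop profiling threads started later
    return calls


if __name__ == "__main__":
    print(count_calls_in_thread(work))
```

The recorded list also contains internal calls such as the thread's own `run` method, so its exact contents may vary between Python versions.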

 

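settrace() is a function in Python's threading module. It sets a trace function for all threads started from the threading module: func is passed to sys.settrace() in each thread before the thread's run() method is called.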

# Python program to explain the use of 
# settrace()  method in Threading Module
 
import time
import threading
 
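def trace_function():
    # Note: the calls below are written as threading.settrace(trace_function()),
    # which runs this function at once and hands its return value (None) to
    # settrace(). The printed lines come from those direct calls; no trace
    # function is installed. A real trace function accepts (frame, event, arg)
    # and is passed without parentheses.
    print("Passing the trace function and current thread is:", threading.current_thread().name)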
 
def thread_1(i):
    time.sleep(5)
    threading.settrace(trace_function())
    print("Value by Thread-1:",i)
    print()
    
def thread_2(i):
    threading.settrace(trace_function())
    print("Value by Thread-2:",i)
    print()
    
def thread_3(i):
    time.sleep(4)
    threading.settrace(trace_function())
    print("Value by Thread-3:",i)
    print()
    
def thread_4(i):
    time.sleep(1)
    threading.settrace(trace_function())
    print("Value by Thread-4:",i)
    print()
 
# Creating sample threads 
threading.settrace(trace_function())
thread1 = threading.Thread(target=thread_1, args=(1,))
thread2 = threading.Thread(target=thread_2, args=(2,))
thread3 = threading.Thread(target=thread_3, args=(3,))
thread4 = threading.Thread(target=thread_4, args=(4,))
 
# Starting the threads
thread1.start()
thread2.start()
thread3.start()
thread4.start()

    Output:

Passing the trace function and current thread is: MainThread
Passing the trace function and current thread is: Thread-2
Value by Thread-2: 2
 
Passing the trace function and current thread is: Thread-4
Value by Thread-4: 4
 
Passing the trace function and current thread is: Thread-3
Value by Thread-3: 3
 
Passing the trace function and current thread is: Thread-1
Value by Thread-1: 1
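As with setprofile(), the example above passes the result of calling `trace_function()` (that is, `None`) instead of the function. A sketch of installing a real trace function follows; `trace_thread_lines` and `work` are hypothetical names, and the tracer records which lines of the target function run, counted from its `def` line.

```python
import threading


def work():
    a = 1
    b = a + 1
    return b


def trace_thread_lines(target):
    """Run target in a new thread and return the line offsets it executed."""
    lines = []

    def tracer(frame, event, arg):
        if frame.f_code is not target.__code__:
            return None                  # do not trace other functions
        if event == "line":
            lines.append(frame.f_lineno - frame.f_code.co_firstlineno)
        return tracer                    # keep tracing inside this frame

    threading.settrace(tracer)           # pass the function itself, no parentheses
    try:
        t = threading.Thread(target=target)
        t.start()
        t.join()
    finally:
        threading.settrace(None)         # stop tracing threads started later
    return lines


if __name__ == "__main__":
    print(trace_thread_lines(work))
```

Returning the tracer from the "call" event is what enables the per-line events for that frame; returning `None` switches tracing off for the frame.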

 

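local: a thread-local object can be instantiated once at global scope, but the data that one thread stores on it is invisible to every other thread.

First, an example with a local variable: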

import threading


def worker():
    sum = 0
    for i in range(10):
        sum += 1
    print("thread id : [{}], sum = [{}]".format(threading.current_thread(), sum))


for i in range(10):
    threading.Thread(target=worker).start()

    Output:

thread id : [<Thread(Thread-1 (worker), started 53336)>], sum = [10]
thread id : [<Thread(Thread-2 (worker), started 63024)>], sum = [10]
thread id : [<Thread(Thread-3 (worker), started 68592)>], sum = [10]
thread id : [<Thread(Thread-4 (worker), started 68560)>], sum = [10]
thread id : [<Thread(Thread-5 (worker), started 58188)>], sum = [10]
thread id : [<Thread(Thread-6 (worker), started 52368)>], sum = [10]
thread id : [<Thread(Thread-7 (worker), started 60332)>], sum = [10]
thread id : [<Thread(Thread-8 (worker), started 65924)>], sum = [10]
thread id : [<Thread(Thread-9 (worker), started 65740)>], sum = [10]
thread id : [<Thread(Thread-10 (worker), started 64800)>], sum = [10]

Process finished with exit code 0

 

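Next, an example with a global variable (note where sum is defined). The unsynchronized sum += 1 is not guaranteed to be atomic, so a shared counter like this should normally be protected by a lock: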

import threading

sum = 0

def worker():
    global sum
    for i in range(10):
        sum += 1
    print("thread id : [{}], sum = [{}]".format(threading.current_thread(), sum))


for i in range(10):
    threading.Thread(target=worker).start()

    Output:

thread id : [<Thread(Thread-1 (worker), started 64552)>], sum = [10]
thread id : [<Thread(Thread-2 (worker), started 66748)>], sum = [20]
thread id : [<Thread(Thread-3 (worker), started 67532)>], sum = [30]
thread id : [<Thread(Thread-4 (worker), started 53156)>], sum = [40]
thread id : [<Thread(Thread-5 (worker), started 63684)>], sum = [50]
thread id : [<Thread(Thread-6 (worker), started 63164)>], sum = [60]
thread id : [<Thread(Thread-7 (worker), started 58192)>], sum = [70]
thread id : [<Thread(Thread-8 (worker), started 54532)>], sum = [80]
thread id : [<Thread(Thread-9 (worker), started 67728)>], sum = [90]
thread id : [<Thread(Thread-10 (worker), started 67320)>], sum = [100]

Process finished with exit code 0

 

What changes once local is introduced:

from threading import Thread, local, current_thread

sum = local()


def worker():
    global sum
    sum.val = 1
    for i in range(10):
        sum.val += 1
    print("thread id : [{}], sum = [{}]".format(current_thread().name, sum.val))


for i in range(10):
    Thread(target=worker).start()

    Output:

thread id : [Thread-1 (worker)], sum = [11]
thread id : [Thread-2 (worker)], sum = [11]
thread id : [Thread-3 (worker)], sum = [11]
thread id : [Thread-4 (worker)], sum = [11]
thread id : [Thread-5 (worker)], sum = [11]
thread id : [Thread-6 (worker)], sum = [11]
thread id : [Thread-7 (worker)], sum = [11]
thread id : [Thread-8 (worker)], sum = [11]
thread id : [Thread-9 (worker)], sum = [11]
thread id : [Thread-10 (worker)], sum = [11]

Process finished with exit code 0

 

A mistake to avoid:

from threading import Thread, local, current_thread

string_a = "abcde"
sum = local()
sum.val = 123
print("sum ==>", sum, type(sum), sum.val)


def worker():
    print("string_a ==> ", string_a)
    print("sum ==> ", sum)
    print("sum.val ==> ", sum.val)  # attributes stored on a local by one thread are not visible in another thread


Thread(target=worker).start()

    Output:

sum ==> <_thread._local object at 0x0000020592B3A2F0> <class '_thread._local'> 123
string_a ==>  abcde
sum ==>  <_thread._local object at 0x0000020592B3A2F0>
Exception in thread Thread-1 (worker):
Traceback (most recent call last):
    .........
    line 12, in worker
    print("sum.val ==> ", sum.val)
                          ^^^^^^^
AttributeError: '_thread._local' object has no attribute 'val'

Process finished with exit code 0
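One way to avoid the AttributeError above is to give every thread its own default value. The sketch below assumes a subclass of `threading.local` whose `__init__` sets the default; `Context`, `ctx`, and `read_in_thread` are hypothetical names.

```python
import threading


class Context(threading.local):
    """Thread-local storage whose attributes start from a default value."""

    def __init__(self, val=0):
        # __init__ runs again, with the same arguments, the first time
        # each new thread touches the object.
        self.val = val


ctx = Context(val=123)


def read_in_thread():
    """Return (value first seen by a worker thread, value in the calling thread)."""
    seen = []

    def worker():
        seen.append(ctx.val)   # the default, not an AttributeError
        ctx.val = 999          # changes only this thread's copy

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return seen[0], ctx.val


if __name__ == "__main__":
    print(read_in_thread())
```

The worker's assignment does not leak back: the calling thread still reads its own value afterwards.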

 

 

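stack_size(size): a function in Python's threading module. It returns the thread stack size used when creating new threads.

size: an optional argument that specifies the stack size to use for threads created afterwards. It must be 0 (use the platform or configured default) or a positive integer of at least 32,768 (32 KiB). Called without an argument, stack_size() only returns the current setting, where 0 means the platform default.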

# Python program to explain the use of 
# stack_size()  method in Threading Module
 
import time
import threading
 
def thread_1(i):
    time.sleep(5)
    print("Value by Thread-1:",i)
    print()
 
def thread_2(i):
    time.sleep(4)
    print("Value by Thread-2:",i)
    print()
    
def thread_3(i):
    print("Value by Thread-3:",i)
    print()
    
def thread_4(i):
    time.sleep(1)
    print("Value by Thread-4:",i)
    print()        
 
# Creating sample threads 
thread1 = threading.Thread(target=thread_1, args=(100,))
thread2 = threading.Thread(target=thread_2, args=(200,))
thread3 = threading.Thread(target=thread_3, args=(300,))
thread4 = threading.Thread(target=thread_4, args=(400,))
 
print(threading.stack_size())
 
# Starting the threads
thread1.start()
thread2.start()
thread3.start()
thread4.start()

    Output:

0
Value by Thread-3: 300
 
Value by Thread-4: 400
 
Value by Thread-2: 200
 
Value by Thread-1: 100
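The example above only reads the current setting. A sketch of changing the stack size for one thread and then restoring the old value follows. `run_with_stack_size` is a hypothetical helper, 512 KiB is an arbitrary choice, and on some platforms the call may raise ValueError (invalid size) or RuntimeError (changing the size is not supported).

```python
import threading


def run_with_stack_size(size, target):
    """Run target in a thread created with the given stack size.

    Returns the size that was configured while the thread was created.
    """
    previous = threading.stack_size()    # 0 means the platform default
    threading.stack_size(size)           # applies to threads created from now on
    try:
        configured = threading.stack_size()
        t = threading.Thread(target=target)
        t.start()
        t.join()
    finally:
        threading.stack_size(previous)   # restore the earlier setting
    return configured


if __name__ == "__main__":
    print(run_with_stack_size(512 * 1024, lambda: None))
    print(threading.stack_size())
```

A larger stack can help threads that recurse deeply, while a smaller one reduces the memory reserved per thread.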

 

 

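excepthook() (to be confirmed)

1. A custom hook for uncaught exceptions in the main thread is assigned to sys.excepthook.
2. At program start, the built-in sys.excepthook is saved in sys.__excepthook__.
3. When a worker thread raises an exception that nothing catches, Python 3.8 and later call threading.excepthook(args). If the user has assigned a custom function to threading.excepthook, that function handles the exception; otherwise the traceback is printed to sys.stderr.

Further reading: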

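"""
1. For an exception that nothing catches, the Python interpreter finally calls sys.excepthook() with
   three arguments: the exception type, the exception instance, and the traceback object, i.e. the
   three values in the tuple returned by sys.exc_info(). By default it prints the traceback.
2. To customize what happens when sys.excepthook() is called, assign a function that accepts three
   arguments to sys.excepthook.
"""

import sys
import time


def m():
    return 1 / 0


def n():
    m()


def p():
    n()


def myExcepthook(ttype, tvalue, ttraceback):
    print("Exception type: {}".format(ttype))
    print("Exception object: {}".format(tvalue))
    i = 1
    # Walk the traceback from the outermost frame to the one that raised.
    while ttraceback:
        print("Stack frame {}".format(i))
        tracebackCode = ttraceback.tb_frame.f_code
        print("File name: {}".format(tracebackCode.co_filename))
        print("Function or module name: {}".format(tracebackCode.co_name))
        ttraceback = ttraceback.tb_next
        i += 1


if __name__ == '__main__':
    sys.excepthook = myExcepthook
    p()  # raises ZeroDivisionError; the hook runs and the program exits

    # Never reached: the uncaught exception above ends the program with exit code 1.
    time.sleep(3)
    print("continue")

    Output:

Exception type: <class 'ZeroDivisionError'>
Exception object: division by zero
Stack frame 1
File name: .........\study_1.py
Function or module name: <module>
Stack frame 2
File name: .........\study_1.py
Function or module name: p
Stack frame 3
File name: .........\study_1.py
Function or module name: n
Stack frame 4
File name: .........\study_1.py
Function or module name: m

Process finished with exit code 1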
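For exceptions raised inside worker threads, the corresponding hook is `threading.excepthook` (Python 3.8+). The sketch below temporarily installs a hook that records what it receives; `run_and_capture` and `divide_by_zero` are hypothetical names, and the thread name "worker-1" is chosen only for the example.

```python
import threading


def divide_by_zero():
    return 1 / 0


def run_and_capture(target):
    """Run target in a thread and return what threading.excepthook received."""
    captured = []

    def hook(args):
        # args carries exc_type, exc_value, exc_traceback and thread.
        captured.append((args.thread.name, args.exc_type.__name__, str(args.exc_value)))

    original = threading.excepthook
    threading.excepthook = hook
    try:
        t = threading.Thread(target=target, name="worker-1")
        t.start()
        t.join()
    finally:
        threading.excepthook = original  # restore the previous hook
    return captured


if __name__ == "__main__":
    print(run_and_capture(divide_by_zero))
```

Unlike the sys.excepthook case above, the exception ends only the worker thread; the main thread carries on after `join()` returns.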