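Python process write locks and Python process pool locks

Reposted

hackernew 2024-03-07 21:05:06

Article tags: Python process write lock, html, multithreading and multiprocessing. Category: Python, backend development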

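Lesson 11: Advanced Python: multithreading, multiprocessing, and thread pool programming

tags:

Docker
imooc (慕课网)

categories:

Multithreading
Multiprocessing
Thread pools
Process pools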

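Contents

Lesson 11: Advanced Python: multithreading, multiprocessing, and thread pool programming

Section 1: The GIL and multithreading

1.1 The GIL (Global Interpreter Lock)
1.2 Multithreaded programming (subclassing Thread is the common approach)
1.3 Thread communication: shared variables
1.4 Thread communication: Queue

Section 2: Thread synchronization

2.1 Thread synchronization: Lock and RLock
2.2 Thread synchronization: Condition variables
2.3 Thread synchronization: Semaphore

Section 3: Thread pools and process pools

3.1 Thread pools
3.2 Comparing multiprocessing and multithreading

Section 4: Multiprocessing

4.1 Multiprocess programming
4.2 Inter-process communication: Queue
4.3 Inter-process communication: Pipe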

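Section 1: The GIL and multithreading

1.1 The GIL (Global Interpreter Lock)

GIL stands for Global Interpreter Lock.

In CPython, each Python thread corresponds to one native (C-level) operating system thread.
The GIL lets only one thread execute bytecode at any given moment, so the threads of one process cannot run Python code on several CPU cores in parallel.
The interpreter releases the GIL periodically: older versions counted executed bytecode instructions, while Python 3.2 and later use a time slice (the switch interval). The GIL is also released voluntarily when a thread performs blocking I/O.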

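# The dis module shows the bytecode that a piece of code compiles to
import dis


def add(a):
    a = a+1
    return a

# dis.dis() prints the disassembly itself and returns None,
# so it should not be wrapped in print()
dis.dis(add)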
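The time slice mentioned above can be inspected directly. The following is a minimal sketch, assuming CPython 3.2 or later, where the slice is exposed through sys.getswitchinterval() and sys.setswitchinterval(); the value 0.01 is only an illustration and not a recommendation.

```python
import sys

# Interval (in seconds) after which CPython asks the running thread
# to give up the GIL so that another thread can run.
interval = sys.getswitchinterval()
print("switch interval: {} s".format(interval))

# The interval can be tuned; the value used here is only an illustration,
# and the original value is restored right afterwards.
sys.setswitchinterval(0.01)
changed = sys.getswitchinterval()
sys.setswitchinterval(interval)
```

A longer interval means fewer forced switches and less overhead, at the cost of slower response from other threads.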

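How the GIL behaves: under multithreading, each Python thread runs as follows:

Acquire the GIL
Execute code until it sleeps, blocks on I/O, or the Python virtual machine suspends it
Release the GIL

Only one thread executes Python bytecode at a time. Suppose a process has three threads: thread A runs first; when A meets a release condition it gives up the GIL; threads B and C then compete for the GIL, and whichever wins continues executing.
The GIL does not guarantee thread safety. In the example below, total += 1 compiles to several bytecode instructions, and a thread switch between them can lose an update, so the final value is not guaranteed to be 0.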

total = 0

def add():
    global total
    for i in range(1000000):
        total += 1
def desc():
    global total
    for i in range(1000000):
        total -= 1

import threading
thread1 = threading.Thread(target=add)
thread2 = threading.Thread(target=desc)
thread1.start()
thread2.start()

# Block until thread 1 and thread 2 have both finished
thread1.join()
thread2.join()
print(total)
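To see why the GIL alone does not make an increment safe, it may help to list the bytecode behind it. This is a small sketch using the dis module; the names counter and increment are made up for illustration, and the exact opcode names vary between CPython versions.

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# Collect the opcode names that make up the function body.
op_names = [ins.opname for ins in dis.get_instructions(increment)]
print(op_names)
```

The load of the global and the store back to it are separate instructions, so the statement is a read-modify-write sequence rather than a single atomic step.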

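1.2 Multithreaded programming (subclassing Thread is the common approach)

Creating threads directly. Two controls matter here: the daemon flag (daemon threads) and join (block until the thread finishes).

# For I/O-bound work, multithreading and multiprocessing perform about the same
# 1. Create threads by instantiating the Thread class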

import time
import threading

def get_detail_html(url):
    print("get detail html started")
    time.sleep(2)
    print("get detail html end")


def get_detail_url(url):
    print("get detail url started")
    time.sleep(4)
    print("get detail url end")


if  __name__ == "__main__":
    thread1 = threading.Thread(target=get_detail_html, args=("https://qnhyn.com/{}".format(1),))
    thread2 = threading.Thread(target=get_detail_url, args=("https://qnhyn.com/{}".format(2),))
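    # Daemon threads: when the main thread exits, daemon threads are killed.
    # (setDaemon() is deprecated since Python 3.10; set the daemon attribute.)
    # thread1.daemon = True
    # thread2.daemon = True
    start_time = time.time()

    thread1.start()
    thread2.start()
    # Block until both threads have finished
    thread1.join()
    thread2.join()
    print("last time: {}".format(time.time()-start_time))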

The second way is to subclass Thread and implement run() (the more common choice in real projects). It suits larger code bases with more complex logic.

class GetDetailHtml(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name)

    def run(self):
        print("get detail html started")
        time.sleep(2)
        print("get detail html end")

class GetDetailUrl(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name)

    def run(self):
        print("get detail url started")
        time.sleep(4)
        print("get detail url end")

if  __name__ == "__main__":
    thread1 = GetDetailHtml("get_detail_html")
    thread2 = GetDetailUrl("get_detail_url")
    start_time = time.time()
    thread1.start()
    thread2.start()

    # Wait for both threads to finish before the main thread continues
    thread1.join()
    thread2.join()

    # The total is about 4 seconds (the longer task), not 2 + 4
    print("last time: {}".format(time.time()-start_time))

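1.3 Thread communication: shared variables

Use a global variable that different threads can read and modify.
This approach is not recommended. The example below is not thread-safe: the length check and the pop are separate steps, so another thread can empty the list in between, and careless use can process data twice or lose it.
Unless we understand locks well, we should not use shared variables to pass data between threads.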

# Sharing data between threads through a shared variable
import time
import threading

detail_url_list = []
# Pitfall: prefer "from chapter11 import variables" and then refer to
# variables.detail_url_list. With
#     from chapter11.variables import detail_url_list
# this module gets its own name bound to the list object; if another thread
# rebinds variables.detail_url_list to a new list, we will not see the change.

def get_detail_html(detail_url_list):
    # Crawl article detail pages
    while True:
        # Not thread-safe: another thread may pop the last url between
        # the length check and the pop
        if len(detail_url_list):
            url = detail_url_list.pop()
            print("get detail html started")
            time.sleep(2)
            print("get detail html end")


def get_detail_url(detail_url_list):
    # Crawl the article list page
    while True:
        print("get detail url started")
        time.sleep(4)
        for i in range(20):
            detail_url_list.append("http://projectsedu.com/{id}".format(id=i))
        print("get detail url end")


if  __name__ == "__main__":
    start_time = time.time()
    thread_detail_url = threading.Thread(target=get_detail_url, args=(detail_url_list,))
    # Start the producer thread so that urls are actually added
    thread_detail_url.start()
    # One list page yields 20 urls at once, so it outpaces the detail crawler;
    # start several detail threads (too many makes switching costly)
    for i in range(10):
        html_thread = threading.Thread(target=get_detail_html, args=(detail_url_list,))
        html_thread.start()
    # Both loops run forever, so this line prints right away
    print("last time: {}".format(time.time()-start_time))
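If a shared list has to be used anyway, the check and the pop can be made one step by holding a lock around both. The following is a minimal sketch, not the author's code; drain_shared_list and its parameters are hypothetical names chosen for illustration.

```python
import threading

def drain_shared_list(num_items=1000, num_workers=4):
    """Pop every item from a shared list using several threads and one lock."""
    shared = list(range(num_items))
    lock = threading.Lock()
    popped = []

    def worker():
        while True:
            with lock:
                # Check and pop happen under the same lock, so no other
                # thread can empty the list between the two steps.
                if not shared:
                    return
                popped.append(shared.pop())

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return popped

result = drain_shared_list()
print(len(result), len(set(result)))
```

Every item is popped exactly once, at the price of serializing access to the list.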

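1.4 Thread communication: Queue

Synchronize threads through a queue: from queue import Queue
The queue's put and get are thread-safe operations. Internally Queue stores its items in a collections.deque and guards every access with a lock and condition variables.
Commonly used methods:

qsize returns the approximate number of items in the queue
empty reports whether the queue is empty (get blocks on an empty queue)
full reports whether the queue is full (put blocks on a full queue)
get_nowait and put_nowait are the non-blocking variants: they return immediately and raise queue.Empty or queue.Full instead of waiting.
join blocks the caller (usually the main thread) until task_done has been called once for every item that was put on the queue. It is useful, for example, to exit after crawling 1000 items.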
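The methods listed above fit together roughly as follows. This is a sketch under stated assumptions: process_all is a hypothetical helper, squaring stands in for real work, and None is used as a stop sentinel (a convention, not part of the Queue API).

```python
import queue
import threading

def process_all(items, num_workers=3):
    """Square every item using worker threads fed from a Queue."""
    tasks = queue.Queue(maxsize=10)
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()          # blocks while the queue is empty
            if item is None:            # sentinel: no more work
                tasks.task_done()
                return
            with results_lock:
                results.append(item * item)
            tasks.task_done()           # one task_done() per get()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for item in items:
        tasks.put(item)                 # blocks while the queue is full
    for _ in workers:
        tasks.put(None)
    tasks.join()                        # returns once every item is marked done
    for w in workers:
        w.join()
    return sorted(results)

print(process_all(range(5)))

# get_nowait() never blocks; it raises queue.Empty on an empty queue.
empty_queue = queue.Queue()
try:
    empty_queue.get_nowait()
    raised_empty = False
except queue.Empty:
    raised_empty = True
```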

# Synchronizing threads through a queue
from queue import Queue


import time
import threading


def get_detail_html(queue):
    # Crawl article detail pages
    while True:
        # get() blocks: if the queue is empty the thread waits here
        url = queue.get()
        print("get detail html started")
        time.sleep(2)
        print("get detail html end")


def get_detail_url(queue):
    # Crawl the article list page
    while True:
        print("get detail url started")
        time.sleep(4)
        for i in range(20):
            queue.put("http://projectsedu.com/{id}".format(id=i))
        print("get detail url end")


# 2. Thread communication: Queue
if __name__ == "__main__":
    # Set a maximum size; an overly large queue can hurt memory usage
    detail_url_queue = Queue(maxsize=1000)
    thread_detail_url = threading.Thread(target=get_detail_url, args=(detail_url_queue,))
    for i in range(10):
        html_thread = threading.Thread(target=get_detail_html, args=(detail_url_queue,))
        html_thread.start()
    start_time = time.time()
    thread_detail_url.start()

    # queue.join() blocks until task_done() has been called for every item
    # detail_url_queue.task_done()
    # detail_url_queue.join()

    # The worker loops never end, so this line prints right away
    print("last time: {}".format(time.time()-start_time))

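Section 2: Thread synchronization

2.1 Thread synchronization: Lock and RLock

A lock protects a section of code so that only one thread runs it at a time. from threading import Lock
Acquire with lock.acquire() and release with lock.release()

Locks cost performance (both acquiring and releasing take time)
Locks can easily cause deadlock (RLock fixes only the case where one thread acquires the same lock again)

Common deadlock situations

Acquiring the same Lock twice in a row in one thread
Opposite acquisition order: thread A takes a then b, while thread B takes b then a

RLock is a reentrant lock. Within one thread acquire may be called several times in a row; the number of acquire calls must equal the number of release calls.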
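Reentrancy matters most when one locked function calls another that takes the same lock. The sketch below illustrates that case with the with statement, which pairs every acquire with a release automatically; add_one and add_two are hypothetical names, not part of the original lesson.

```python
import threading

rlock = threading.RLock()
total = 0

def add_one():
    global total
    with rlock:              # acquired a second time by the same thread
        total += 1

def add_two():
    with rlock:              # first acquisition
        add_one()            # re-enters the lock without blocking
        add_one()

threads = [threading.Thread(target=add_two) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)

# A plain Lock is not reentrant: a second acquire by the holder cannot succeed.
plain = threading.Lock()
plain.acquire()
second_attempt = plain.acquire(blocking=False)   # returns False instead of hanging
plain.release()
```

With a plain Lock in place of the RLock, add_two would block forever inside add_one.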

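from threading import Lock, RLock

# Within one thread an RLock can be acquired several times in a row;
# the number of acquire calls must match the number of release calls
total = 0
lock = RLock()
def add():
    # 1. do something
    # 2. I/O operation
    # 3. do something else
    global lock
    global total
    for i in range(1000000):
        # Acquire twice (allowed only because lock is an RLock)
        lock.acquire()
        lock.acquire()
        total += 1
        # Release once per acquire
        lock.release()
        lock.release()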


def desc():
    global total
    global lock
    for i in range(1000000):
        lock.acquire()
        total -= 1
        lock.release()

import threading
thread1 = threading.Thread(target=add)
thread2 = threading.Thread(target=desc)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print(total)

# 1. Locks cost performance
# 2. Locks can cause deadlock
# A deadlock case: A and B take locks a and b in opposite order
"""
A(a, b)
acquire (a)
acquire (b)

B(a, b)
acquire (b)
acquire (a)
"""

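2.2 Thread synchronization: Condition variables

A condition variable is used for more complex synchronization between threads.
Consider the example below, a conversation between Tmall Genie (天猫精灵) and XiaoAi (小爱同学). The key requirement is that they take turns, one line each.
Suppose we implement it with a Lock, as follows.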

# Dialogue with a Lock: the turns cannot be controlled, each speaker says everything at once.

import threading

class XiaoAi(threading.Thread):
    def __init__(self, lock):
        super().__init__(name="XiaoAi")
        self.lock = lock

    def run(self):
        self.lock.acquire()
        print("{} : I'm here ".format(self.name))
        self.lock.release()

        self.lock.acquire()
        print("{} : Sure ".format(self.name))
        self.lock.release()

class TianMao(threading.Thread):
    def __init__(self, lock):
        super().__init__(name="Tmall Genie")
        self.lock = lock

    def run(self):

        self.lock.acquire()
        print("{} : XiaoAi ".format(self.name))
        self.lock.release()

        self.lock.acquire()
        print("{} : Let's recite a poem together ".format(self.name))
        self.lock.release()

if __name__ == "__main__":
    lock = threading.Lock()
    xiaoai = XiaoAi(lock)
    tianmao = TianMao(lock)

    xiaoai.start()
    tianmao.start()

# Output
XiaoAi : I'm here
XiaoAi : Sure
Tmall Genie : XiaoAi
Tmall Genie : Let's recite a poem together

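For the situation above we can use a condition variable (Condition) to implement the conversation between XiaoAi and Tmall Genie.

Condition implements __enter__ and __exit__, so it can be used in a with statement.
acquire calls the underlying lock's acquire.
release calls the underlying lock's release.
wait lets us wait for a notification on the condition variable. The lock must be held first (for example through with), otherwise a RuntimeError is raised.
notify sends a notification; it also requires the lock to be held.
A Condition involves two layers of locks. The underlying lock belongs to the Condition itself and is released when a thread calls wait (via self._release_save()).
In addition, every call to wait allocates a new lock and appends it to the Condition's waiting queue, _waiters.
When notify is called, a lock is popped from that queue and released, which wakes the waiting thread.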
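A bare wait() depends on the waiter already being asleep when notify() is sent. One common way to remove that dependency is to keep the "whose turn is it" state in a variable and wait on it with wait_for. The following is a sketch of that idea, not the approach used in this lesson; run_dialogue and the speaker labels A and B are hypothetical, and both speakers are assumed to have the same number of lines.

```python
import threading

def run_dialogue(lines_a, lines_b):
    """Alternate lines from two speakers; speaker A always goes first."""
    cond = threading.Condition()
    state = {"turn": "A"}
    transcript = []

    def speaker(name, other, lines):
        for line in lines:
            with cond:
                # wait_for re-checks the predicate, so a notify that arrives
                # before this thread starts waiting is not lost.
                cond.wait_for(lambda: state["turn"] == name)
                transcript.append("{}: {}".format(name, line))
                state["turn"] = other
                cond.notify_all()

    a = threading.Thread(target=speaker, args=("A", "B", lines_a))
    b = threading.Thread(target=speaker, args=("B", "A", lines_b))
    # Start order does not matter here because the turn is stored in state.
    b.start()
    a.start()
    a.join()
    b.join()
    return transcript

for entry in run_dialogue(["hello", "shall we recite a poem?"], ["I'm here", "sure"]):
    print(entry)
```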

import threading


# Reciting a poem in turns with a Condition
class XiaoAi(threading.Thread):
    def __init__(self, cond):
        super().__init__(name="XiaoAi")
        self.cond = cond

    def run(self):
        with self.cond:
            self.cond.wait()
            print("{} : I'm here ".format(self.name))
            self.cond.notify()

            self.cond.wait()
            print("{} : Sure ".format(self.name))
            self.cond.notify()

            self.cond.wait()
            print("{} : You live at the Yangtze's end ".format(self.name))
            self.cond.notify()

            self.cond.wait()
            print("{} : Yet we drink the same Yangtze water ".format(self.name))
            self.cond.notify()

            self.cond.wait()
            print("{} : When will this sorrow end ".format(self.name))
            self.cond.notify()

            self.cond.wait()
            print("{} : I will surely not fail your longing ".format(self.name))
            self.cond.notify()

class TianMao(threading.Thread):
    def __init__(self, cond):
        super().__init__(name="Tmall Genie")
        self.cond = cond

    def run(self):
        with self.cond:
            print("{} : XiaoAi ".format(self.name))
            self.cond.notify()
            self.cond.wait()

            print("{} : Let's recite a poem together ".format(self.name))
            self.cond.notify()
            self.cond.wait()

            print("{} : I live at the Yangtze's source ".format(self.name))
            self.cond.notify()
            self.cond.wait()

            print("{} : Day after day I miss you but cannot see you ".format(self.name))
            self.cond.notify()
            self.cond.wait()

            print("{} : When will this water cease to flow ".format(self.name))
            self.cond.notify()
            self.cond.wait()

            print("{} : I only wish your heart were like mine ".format(self.name))
            self.cond.notify()
            self.cond.wait()



if __name__ == "__main__":
    cond = threading.Condition()
    xiaoai = XiaoAi(cond)
    tianmao = TianMao(cond)

    # Start order matters: XiaoAi must already be waiting when Tmall Genie
    # sends the first notify, otherwise that notify is lost and both block
    # wait and notify may only be called while holding the condition (with cond)
    # A Condition has two layers of locks: the underlying lock is released when a
    # thread calls wait; each wait also allocates a lock, queues it in the
    # Condition's waiters, and blocks on it until notify releases it
    xiaoai.start()
    tianmao.start()

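2.3 Thread synchronization: Semaphore

A Semaphore is a lock that limits how many threads may enter a section of code at the same time.

A semaphore manages an internal counter: each acquire() decrements it and each release() increments it; acquire() blocks while the counter is 0
With a semaphore of 3, three threads can hold it at once and the remaining threads block and wait.
Internally it is implemented with a Condition

A typical use: a file is usually written by one thread only, while several threads may be allowed to read it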

# A crawler that limits its concurrency with a semaphore
import threading
import time

class HtmlSpider(threading.Thread):
    def __init__(self, url, sem):
        super().__init__()
        self.url = url
        self.sem = sem

    def run(self):
        time.sleep(2)
        print("got html text success")
        self.sem.release()

class UrlProducer(threading.Thread):
    def __init__(self, sem):
        super().__init__()
        self.sem = sem

    def run(self):
        for i in range(20):
            self.sem.acquire()
            html_thread = HtmlSpider("https://baidu.com/{}".format(i), self.sem)
            html_thread.start()

if __name__ == "__main__":
    sem = threading.Semaphore(3)
    url_producer = UrlProducer(sem)
    url_producer.start()
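The limit a semaphore enforces can be observed by counting how many threads are inside the guarded section at once. This is an illustrative sketch; run_limited and the stats dictionary are hypothetical, and the short sleep merely stands in for a network request.

```python
import threading
import time

def run_limited(num_tasks=10, limit=3):
    """Run num_tasks threads but let at most `limit` of them work at once."""
    sem = threading.Semaphore(limit)
    counter_lock = threading.Lock()
    stats = {"active": 0, "max_active": 0}

    def task():
        with sem:                       # acquire() on entry, release() on exit
            with counter_lock:
                stats["active"] += 1
                stats["max_active"] = max(stats["max_active"], stats["active"])
            time.sleep(0.05)            # stand-in for a network request
            with counter_lock:
                stats["active"] -= 1

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats["max_active"]

print("peak concurrency:", run_limited())
```

Using the semaphore as a context manager guarantees the release even if the task raises an exception.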

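Section 3: Thread pools and process pools

3.1 Thread pools

The concurrent.futures package makes thread pool and process pool programming very easy, and the two pools share a nearly identical interface, so multithreaded and multiprocess code can be written the same way.
With a Semaphore, every thread you create really exists and runs; only the number running the guarded section at once is limited.
With a thread pool, what you create is only submitted as a task; the worker threads are created by the pool, and the pool manages how many of them there are.
Why use a thread pool when a semaphore seems to achieve something similar?

The main thread can query the state of a thread or task, and get its return value
The main thread knows immediately when a task finishes
executor = ThreadPoolExecutor(max_workers=2) creates a thread pool
task1 = executor.submit(get_html, 3) submits a function to the pool. submit returns immediately without blocking, and the Future it returns exposes the task's state.

Methods on the Future returned by submit:

done() reports whether the task has finished
cancel() cancels the task (only possible while it has not started running)
result() blocks until the task finishes and returns its value

Getting the results of completed tasks:

as_completed
executor.map

Waiting for an event: wait(all_task, return_when=FIRST_COMPLETED)
Reading the source is worthwhile, because the design behind it is what matters most. from concurrent.futures import Future

A Future is a "future object": a container for the result of a task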

from concurrent.futures import ThreadPoolExecutor, as_completed, wait, FIRST_COMPLETED

# A Future is a "future object": the container for a task's result

# Why a thread pool?
# The main thread can query the state of a thread or task, and get its return value
# The main thread knows immediately when a task finishes
# futures give threads and processes the same coding interface
import time

def get_html(times):
    time.sleep(times)
    print("get page {} success".format(times))
    return times



executor = ThreadPoolExecutor(max_workers=2)
# submit hands a function to the pool and returns immediately (non-blocking)
# task1 = executor.submit(get_html, 3)
# task2 = executor.submit(get_html, 2)


# Collect the results of tasks that have finished
urls = [3,2,4]
all_task = [executor.submit(get_html, url) for url in urls]
wait(all_task, return_when=FIRST_COMPLETED)
print("main")
# for future in as_completed(all_task):
#     data = future.result()
#     print("get {} page".format(data))
# executor.map yields the results in the order of the inputs
# for data in executor.map(get_html, urls):
#     print("get {} page".format(data))


# # done() reports whether a task has finished
# print(task1.done())
# print(task2.cancel())
# time.sleep(3)
# print(task1.done())
#
# # result() returns the task's return value
# print(task1.result())
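The difference between as_completed and executor.map is easiest to see side by side. The sketch below assumes a hypothetical fetch function whose short sleep stands in for a download; the delays are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(delay):
    """Stand-in for a download that takes `delay` seconds."""
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(fetch, d) for d in delays]
    # as_completed yields futures in the order they finish.
    completion_order = [f.result() for f in as_completed(futures)]
    # map yields results in the order of the inputs.
    input_order = list(executor.map(fetch, delays))
    all_done = all(f.done() for f in futures)

print(completion_order)
print(input_order)
```

Using the executor as a context manager also waits for outstanding tasks and shuts the pool down when the block exits.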

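3.2 Comparing multiprocessing and multithreading

For CPU-bound work (numerical computation, image processing, mining), use multiprocessing
For I/O-bound work, use multithreading
Switching between processes costs more than switching between threads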

# Multithreading, CPU-bound: 4.288530349731445 seconds
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
# Use multiprocessing for CPU-bound work and multithreading for I/O-bound work;
# switching between processes costs more than switching between threads

# 1. For CPU-bound work, multiprocessing beats multithreading
def fib(n):
    if n<=2:
        return 1
    return fib(n-1)+fib(n-2)

# For I/O-bound work, multithreading beats multiprocessing
# def random_sleep(n):
#     time.sleep(n)
#     return n

if __name__ == "__main__":
    with ThreadPoolExecutor(3) as executor:
        # Start the clock before submitting so that all the work is timed
        start_time = time.time()
        all_task = [executor.submit(fib, num) for num in range(20,35)]
        for future in as_completed(all_task):
            data = future.result()
            print("exe result: {}".format(data))

        print("last time is: {}".format(time.time()-start_time))

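A note on multiprocessing.

Whether you use from concurrent.futures import ProcessPoolExecutor
or import multiprocessing,
the code that starts processes should sit under if __name__ == "__main__":. On platforms that start child processes with spawn (Windows, and macOS by default), omitting the guard raises an error, because each child re-imports the main module.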

# Multiprocessing, CPU-bound: 2.4265103340148926 seconds
import time
from concurrent.futures import ProcessPoolExecutor, as_completed
# Use multiprocessing for CPU-bound work and multithreading for I/O-bound work;
# switching between processes costs more than switching between threads
# 1. For CPU-bound work, multiprocessing beats multithreading
def fib(n):
    if n<=2:
        return 1
    return fib(n-1)+fib(n-2)

# For I/O-bound work, multithreading beats multiprocessing
# def random_sleep(n):
#     time.sleep(n)
#     return n
if __name__ == "__main__":
    with ProcessPoolExecutor(3) as executor:
        # Start the clock before submitting so that all the work is timed
        start_time = time.time()
        all_task = [executor.submit(fib, num) for num in range(20,35)]
        for future in as_completed(all_task):
            data = future.result()
            print("exe result: {}".format(data))

        print("last time is: {}".format(time.time()-start_time))

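Section 4: Multiprocessing

4.1 Multiprocess programming

Write and run the following on Linux.

import os
# fork is only available on Linux/Unix
pid = os.fork()
# Runs in both the parent and the child
print("bobby")
if pid == 0:
  print('Child process {}, parent process is {}.'.format(os.getpid(), os.getppid()))
else:
  print('I am the parent process; my child is {}.'.format(pid))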

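The child process gets a complete copy of the parent's data, so the data of the two processes is fully isolated.
from concurrent.futures import ProcessPoolExecutor is the first choice for multiprocess programming: it is well designed, offers the same interface for threads and processes, and is built on multiprocessing.
import multiprocessing is more flexible and makes the underlying mechanics easier to understand, but it is not recommended for everyday use.

Notes on the process pool in multiprocessing, multiprocessing.Pool:

Call close before join, otherwise an error is raised
imap yields results in input order; imap_unordered yields them as they complete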

import multiprocessing

# Multiprocess programming
import time
def get_html(n):
    time.sleep(n)
    print("sub_progress success")
    return n


if __name__ == "__main__":
    progress = multiprocessing.Process(target=get_html, args=(2,))
    print(progress.pid)
    progress.start()
    print(progress.pid)
    progress.join()
    print("main progress end")

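    # Using a process pool: a pool size equal to the CPU count usually works best for CPU-bound work
    # pool = multiprocessing.Pool(multiprocessing.cpu_count())
    # result = pool.apply_async(get_html, args=(3,))
    #
    # # Wait for all tasks to finish (close must come before join)
    # pool.close()
    # pool.join()
    #
    # print(result.get())

    # imap yields results in input order
    # for result in pool.imap(get_html, [1,5,3]):
    #     print("{} sleep success".format(result))

    # imap_unordered yields results as they complete
    # for result in pool.imap_unordered(get_html, [1,5,3]):
    #     print("{} sleep success".format(result))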
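The commented-out pool calls above can be put together into one runnable sketch. It is illustrative only: square and run_pool are hypothetical names, and the pool size of 2 is arbitrary.

```python
import multiprocessing

def square(n):
    return n * n

def run_pool(values):
    # Leaving the with block shuts the pool down (it calls terminate()),
    # so every result is collected inside the block.
    with multiprocessing.Pool(processes=2) as pool:
        async_result = pool.apply_async(square, args=(3,))
        single = async_result.get(timeout=30)
        ordered = list(pool.imap(square, values))
        unordered = sorted(pool.imap_unordered(square, values))
    return single, ordered, unordered

if __name__ == "__main__":
    print(run_pool([1, 5, 3]))
```

The worker function is defined at module level so that child processes can import it, and the pool is only created under the __main__ guard.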

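4.2 Inter-process communication: Queue

The Queue from from queue import Queue cannot be used between processes.
The Queue from from multiprocessing import Process, Queue, however, can pass data between processes, and its interface is almost the same as that of queue.Queue.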

import time
from multiprocessing import Process, Queue
# from queue import Queue

def producer(queue):
    queue.put("a")
    time.sleep(2)

def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)

if __name__ == "__main__":
    queue = Queue(10)
    my_producer = Process(target=producer, args=(queue,))
    my_consumer = Process(target=consumer, args=(queue,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()

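Shared variables cannot be used to share data between processes. They work only between threads.

# Shared variable: a is isolated in each process, so a change in one does not affect the other (the consumer prints 1)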
import time
from multiprocessing import Process


def producer(a):
    a += 100
    time.sleep(2)

def consumer(a):
    time.sleep(2)
    print(a)

if __name__ == "__main__":
    a = 1
    my_producer = Process(target=producer, args=(a,))
    my_consumer = Process(target=consumer, args=(a,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()

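A multiprocessing.Queue cannot be passed to the workers of a Pool: the task fails, and because apply_async only reports errors through result.get(), the program simply prints nothing.
Communication between pool processes needs the Queue from a Manager, as the code below does.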

import time
from multiprocessing import Process, Queue, Pool, Manager


def producer(queue):
    queue.put("a")
    time.sleep(2)

def consumer(queue):
    time.sleep(2)
    data = queue.get()
    print(data)

if __name__ == "__main__":
    # queue = Queue(10)  # with a plain multiprocessing.Queue nothing is printed
    queue = Manager().Queue(10)
    pool = Pool(2)

    pool.apply_async(producer, args=(queue,))
    pool.apply_async(consumer, args=(queue,))

    pool.close()
    pool.join()

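4.3 Inter-process communication: Pipe

A pipe also provides inter-process communication, but a Pipe connects only two endpoints, so it suits communication between two processes.
A Pipe is faster than a Queue, because the Queue implementation adds a fair amount of locking.
Manager offers many shared objects and synchronization tools for processes that are used just like their threading counterparts; the code below shares a dict. Try them yourself.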

from multiprocessing import Process, Manager


def add_data(p_dict, key, value):
    p_dict[key] = value

if __name__ == "__main__":
    # A dict proxy that is shared between processes
    progress_dict = Manager().dict()

    first_progress = Process(target=add_data, args=(progress_dict, "bobby1", 22))
    second_progress = Process(target=add_data, args=(progress_dict, "bobby2", 23))

    first_progress.start()
    second_progress.start()
    first_progress.join()
    second_progress.join()

    print(progress_dict)
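The Pipe described at the start of this section could be used roughly as follows. This is a minimal sketch that mirrors the producer and consumer of section 4.2; the one-way pipe (duplex=False) is a choice made here for clarity, not something the lesson prescribes.

```python
from multiprocessing import Pipe, Process

def producer(conn):
    conn.send("a")      # any picklable object can be sent
    conn.close()

def consumer(conn):
    print(conn.recv())  # blocks until something arrives
    conn.close()

if __name__ == "__main__":
    # One-way pipe: the first connection only receives, the second only sends
    receive_conn, send_conn = Pipe(duplex=False)
    my_producer = Process(target=producer, args=(send_conn,))
    my_consumer = Process(target=consumer, args=(receive_conn,))
    my_producer.start()
    my_consumer.start()
    my_producer.join()
    my_consumer.join()
```

With the default Pipe() both ends can send and receive, which suits a request and reply exchange between the two processes.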
