Python multiprocessing combined with multithreading: multiprocessing and multithreading in Python 3

Reposted

mob6454cc73e9a6 2024-04-16 20:36:21


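Contents

Processes and threads
Multiprocessing

os basics
multiprocessing
Pool
Subprocesses
Inter-process communication
Summary

Multithreading

Creating a thread
lock
Multi-core CPUs
Summary

Processes vs. threads
References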

Processes and threads

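To do multitasking, we can use multiple processes or multiple threads. There are three ways to implement multitasking:

Multi-process mode
Multi-thread mode
Multi-process + multi-thread mode
When multitasking, the tasks usually need to communicate and coordinate with each other instead of running in isolation, which is why multi-process and multi-threaded programs are relatively hard to write.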
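As a minimal sketch of the third model (processes plus threads), assuming the per-item work is I/O-bound: a Pool hands each worker process a batch, and inside that process one thread is started per item. The names io_task, process_batch and process_and_threads, and the batching scheme itself, are hypothetical and only for illustration.

```python
import threading
from multiprocessing import Pool


def io_task(item):
    # Hypothetical stand-in for I/O-bound work (e.g. a network request); here it just doubles the value
    return item * 2


def process_batch(batch):
    """Runs inside one worker process and uses one thread per item in the batch."""
    results = [None] * len(batch)

    def run(i, item):
        # Each thread writes only its own slot, so no lock is needed here
        results[i] = io_task(item)

    threads = [threading.Thread(target=run, args=(i, item)) for i, item in enumerate(batch)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def process_and_threads(batches, processes=2):
    # The Pool spreads the batches over worker processes; map() keeps the batch order
    with Pool(processes) as pool:
        return pool.map(process_batch, batches)


if __name__ == "__main__":
    print(process_and_threads([[1, 2], [3, 4, 5]]))  # [[2, 4], [6, 8, 10]]
```

Threads are used inside each process only because the assumed work is I/O-bound; for CPU-bound work the threads would mostly take turns (see the GIL discussion later).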

Multiprocessing

os basics

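Unix/Linux provides a fork() system call. Unlike an ordinary function, it is called once but returns twice: the operating system copies the current process (the two copies are called the parent process and the child process), and fork() returns in both of them. The child always gets 0, while the parent gets the child's process ID. Because a parent can fork many children, it has to keep track of each child's ID, whereas a child only needs to call getppid() to get its parent's ID.
Python's os module wraps common system calls, so a Python program can create a child process directly: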

#!/usr/bin/python3
# coding=utf-8
import os
print('Process (%d) start...' % os.getpid())
# fork() returns twice: 0 in the child, the child's PID in the parent
pid = os.fork()
if pid == 0:
    print('child process (%d) and parent is (%d)'%(os.getpid(),os.getppid()))
else:
    print('I (%s) just created a child process (%s).'%(os.getpid(),pid))

Output:

Process (13491) start...
I (13491) just created a child process (13492).
child process (13492) and parent is (13491)
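The example above never waits for its child. A parent normally reaps its children with os.waitpid(); otherwise a finished child stays a zombie until the parent reaps it or exits. A minimal Unix-only sketch, where fork_and_wait is a hypothetical helper name:

```python
import os


def fork_and_wait(exit_code):
    """Fork a child that exits with exit_code; the parent waits and returns that code."""
    pid = os.fork()  # Unix/Linux only
    if pid == 0:
        # Child: leave immediately with os._exit() so it does not run the parent's remaining code
        os._exit(exit_code)
    # Parent: block until this particular child has terminated, then decode its status
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(fork_and_wait(3))  # 3
```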

multiprocessing

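multiprocessing is a cross-platform multiprocessing module. It provides a Process class that represents a process object. The following example starts a child process and waits for it to finish: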

#!/usr/bin/python3
# coding=utf-8
import os
from multiprocessing import Process 
def run_proc(name):
    print('Run child process %s (%s)...'%(name, os.getpid()))
if __name__ == "__main__":
    print('Parent process %s.' % os.getpid())
    p = Process(target=run_proc,args=('test',))
    print('child process will start')
    p.start()
    p.join()
    print('child process end')

Output:

Parent process 1280.
child process will start
Run child process test (1281)...
child process end

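To create a child process, you only need to pass a target function and its arguments to create a Process instance, then call start() to launch it. This is even simpler than fork().
The join() method waits for the child process to finish before continuing, and is typically used to synchronize processes.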
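join() is also what makes a child's exit status available: Process.exitcode stays None until the child has terminated. A sketch that starts several children, joins all of them, and collects their exit codes; worker and run_all are hypothetical names:

```python
from multiprocessing import Process


def worker(n):
    # Hypothetical task: odd n ends the process with exit status n, even n ends normally (status 0)
    if n % 2:
        raise SystemExit(n)


def run_all(count):
    procs = [Process(target=worker, args=(i,)) for i in range(count)]
    for p in procs:
        p.start()  # launch every child first, so they run concurrently
    for p in procs:
        p.join()   # then wait for each one before reading its exit code
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    print(run_all(4))  # [0, 1, 0, 3]
```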

Pool

If you need to start a large number of child processes, you can create them in batches with a process pool:

#!/usr/bin/python3
# coding=utf-8
from multiprocessing import Pool
import os, time, random
def long_time_task(name):
    print('Run task %s (%s)...' % (name, os.getpid()))
    start = time.time()
    time.sleep(random.random() * 3)
    end = time.time()
    print('Task %s runs %0.2f seconds.' % (name, (end - start)))

if __name__=='__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(4)
    for i in range(5):
        p.apply_async(long_time_task, args=(i,))
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')

Output:

Parent process 2728.
Waiting for all subprocesses done...
Run task 0 (2729)...
Run task 1 (2730)...
Run task 2 (2731)...
Run task 3 (2732)...
Task 3 runs 0.34 seconds.
Run task 4 (2732)...
Task 4 runs 0.18 seconds.
Task 0 runs 2.40 seconds.
Task 1 runs 2.42 seconds.
Task 2 runs 2.59 seconds.
All subprocesses done.

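Calling join() on a Pool object waits for all worker processes to finish. You must call close() before join(), and after close() no new tasks can be submitted to the pool.

Note the output: tasks 0, 1, 2 and 3 start immediately, while task 4 only starts after one of the earlier tasks has finished. This is because the pool was created with Pool(4), so at most 4 processes run at the same time. The limit is a deliberate property of the Pool, not of the operating system; if no size is given, Pool uses the number of CPUs reported by the system.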
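The example above discards the return values of long_time_task. apply_async() actually returns an AsyncResult whose get() method waits for and returns the task's result, and Pool.map() is a shorter way to express the same pattern. A sketch with a hypothetical square() task:

```python
from multiprocessing import Pool


def square(x):
    return x * x


def squares_async(values, workers=4):
    # "with Pool(...)" calls terminate() on exit, so collect every result inside the block
    with Pool(workers) as pool:
        # apply_async() returns an AsyncResult immediately; get() blocks until that task is done
        pending = [pool.apply_async(square, (v,)) for v in values]
        return [r.get() for r in pending]


def squares_map(values, workers=4):
    # map() blocks until all tasks finish and returns the results in input order
    with Pool(workers) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print(squares_async(range(5)))  # [0, 1, 4, 9, 16]
    print(squares_map([1, 2, 3]))   # [1, 4, 9]
```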

Subprocesses

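A child process is not necessarily a copy of the program itself; it can also be an external program. After creating a child process, we usually also need to control its input and output.
The subprocess module makes it easy to start a child process and then control its input and output.
The following example shows how to run the command nslookup www.python.org from Python code, with the same effect as running it directly on the command line: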

#!/usr/bin/python3
# coding=utf-8
import subprocess
print('$ nslookup www.python.org')
r = subprocess.call(['nslookup','www.python.org'])
print('exit code:',r)

Output:

$ nslookup www.python.org
Server: 192.168.1.1
Address: 192.168.1.1#53
Non-authoritative answer:
www.python.org canonical name = dualstack.python.map.fastly.net.
Name: dualstack.python.map.fastly.net
Address: 151.101.76.223
exit code: 0

If the child process needs input, use the communicate() method:

import subprocess

print('$ nslookup')
p = subprocess.Popen(['nslookup'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, err = p.communicate(b'set q=mx\npython.org\nexit\n')
print(output.decode('utf-8'))
print('Exit code:', p.returncode)

When run, this prints nslookup's answer to the MX (mail exchanger) record query for python.org, followed by the line Exit code: and nslookup's return code.
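On Python 3.7 and later, subprocess.run() wraps the same Popen plus communicate() sequence in a single call. The sketch below swaps nslookup for the current Python interpreter (sys.executable) so it runs without network access; run_upper is a hypothetical helper:

```python
import subprocess
import sys


def run_upper(text):
    # Child process: a second Python interpreter that upper-cases everything read from stdin
    result = subprocess.run(
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"],
        input=text,            # written to the child's stdin, like communicate(input)
        capture_output=True,   # collect stdout and stderr through pipes
        text=True,             # exchange str instead of bytes
        check=True,            # raise CalledProcessError on a non-zero exit code
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    print(run_upper("set q=mx\n"))  # (0, 'SET Q=MX\n')
```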

Inter-process communication

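Processes inevitably need to communicate, and the operating system provides many mechanisms for inter-process communication. Python's multiprocessing module wraps these low-level mechanisms and offers several ways to exchange data, such as Queue and Pipe.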
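Pipe() returns a pair of connection objects joined by a pipe, which suits exactly two endpoints, whereas a Queue can be shared by many producers and consumers. A minimal round-trip sketch; child and round_trip are hypothetical names:

```python
from multiprocessing import Process, Pipe


def child(conn):
    # Child side: receive one message, send back an upper-cased reply, then close its end
    msg = conn.recv()
    conn.send(msg.upper())
    conn.close()


def round_trip(text):
    parent_conn, child_conn = Pipe()  # duplex by default: both ends can send and receive
    p = Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send(text)            # objects are pickled on send() and unpickled on recv()
    reply = parent_conn.recv()
    p.join()
    return reply


if __name__ == "__main__":
    print(round_trip("hello"))  # HELLO
```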

Taking Queue as an example, the parent process creates two child processes: one writes data into the Queue and the other reads data from it:

#!/usr/bin/python3
# coding=utf-8
from multiprocessing import Process, Queue
import os, time, random
# Code executed by the writer process:
def write(q):
    print('Process to write: %s' % os.getpid())
    for value in ['A', 'B', 'C']:
        print('Put %s to queue...' % value)
        q.put(value)
        time.sleep(random.random())

# Code executed by the reader process:
def read(q):
    print('Process to read: %s' % os.getpid())
    while True:
        value = q.get(True)
        print('Get %s from queue.' % value)

if __name__=='__main__':
    # The parent process creates the Queue and passes it to each child process:
    q = Queue()
    pw = Process(target=write, args=(q,))
    pr = Process(target=read, args=(q,))
    # Start child process pw, which writes:
    pw.start()
    # Start child process pr, which reads:
    pr.start()
    # Wait for pw to finish:
    pw.join()
    # pr runs an infinite loop, so it cannot be joined and has to be terminated forcibly:
    pr.terminate()

Output:

Process to write: 4557
Put A to queue...
Process to read: 4558
Get A from queue.
Put B to queue...
Get B from queue.
Put C to queue...
Get C from queue.
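Calling terminate() on the reader can kill it while it is still handling data. A common alternative, swapped in here rather than taken from the example above, is to have the writer send a sentinel value that tells the reader to stop, so the reader can be joined normally. The names below are hypothetical:

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical end-of-data marker; any value the writer never sends as data works


def write(q, items):
    for item in items:
        q.put(item)
    q.put(SENTINEL)  # tell the reader that no more data is coming


def read(q, out):
    while True:
        value = q.get()
        if value is SENTINEL:
            break  # leave the loop normally instead of being terminated
        out.put(value.lower())


def pipeline(items):
    q, out = Queue(), Queue()
    pw = Process(target=write, args=(q, items))
    pr = Process(target=read, args=(q, out))
    pw.start()
    pr.start()
    # Drain the results before joining: a child that still has queued data may not exit yet
    results = [out.get() for _ in items]
    pw.join()
    pr.join()
    return results


if __name__ == "__main__":
    print(pipeline(["A", "B", "C"]))  # ['a', 'b', 'c']
```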

Summary

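On Unix/Linux, multiple processes can be created with the fork() call.
For cross-platform multiprocessing, use the multiprocessing module.
Inter-process communication is done with Queue, Pipe, and similar mechanisms.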

Multithreading

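Multitasking can be done with multiple processes, or with multiple threads inside one process.
Because threads are execution units supported directly by the operating system, high-level languages usually have built-in multithreading support, and Python is no exception. Python's threads are real native threads (POSIX threads on Unix), not simulated ones.

Python's standard library provides two modules for this: _thread and threading. _thread is the low-level module, and threading is a higher-level module that wraps _thread. In the vast majority of cases, you only need the threading module.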

Creating a thread

Starting a thread means passing a function in to create a Thread instance and then calling start() to run it:

#!/usr/bin/python3
# coding=utf-8
import time, threading

# Code executed by the new thread:
def loop():
    print('thread %s is running...' % threading.current_thread().name)
    n = 0
    while n < 5:
        n = n + 1
        print('thread %s >>> %s' % (threading.current_thread().name, n))
        time.sleep(1)
    print('thread %s ended.' % threading.current_thread().name)

print('thread %s is running...' % threading.current_thread().name)
t = threading.Thread(target=loop, name='LoopThread')
t.start()
t.join()
print('thread %s ended.' % threading.current_thread().name)

Output:

thread MainThread is running...
thread LoopThread is running...
thread LoopThread >>> 1
thread LoopThread >>> 2
thread LoopThread >>> 3
thread LoopThread >>> 4
thread LoopThread >>> 5
thread LoopThread ended.
thread MainThread ended.

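Every process starts one thread by default, called the main thread, and the main thread can start new threads. The threading module's current_thread() function always returns the instance of the current thread. The main thread's instance is named MainThread, and a child thread's name is specified when it is created; here we name it LoopThread. The name is only used for display when printing and has no other meaning. If you don't give a name, Python names the threads automatically as Thread-1, Thread-2, and so on (since Python 3.10 the target function's name is appended, e.g. Thread-1 (loop)).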
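A small sketch (record_name and collect_names are hypothetical) that records the name current_thread() reports inside two threads, one named explicitly and one left to Python's default naming:

```python
import threading


def record_name(names, lock):
    # current_thread() returns the Thread object that is running this function
    with lock:
        names.append(threading.current_thread().name)


def collect_names():
    names, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=record_name, args=(names, lock), name="Worker-A"),
        threading.Thread(target=record_name, args=(names, lock)),  # no name: Python picks one
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(names)


if __name__ == "__main__":
    print(threading.current_thread().name)  # MainThread
    print(collect_names())
```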

lock

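The key difference between multiprocessing and multithreading is the scope of variables. With multiple processes, each process has its own copy of the same variable, whereas multiple threads share the same variable. A very dangerous consequence is that several threads may modify a variable at the same time.

In the example below, to make sure balance is computed correctly, we put a lock around change_it(). When a thread starts executing change_it(), it has acquired the lock, so other threads cannot execute change_it() at the same time; they must wait until the lock is released and they acquire it themselves before making changes. Because there is only one lock, at most one thread can hold it at any moment no matter how many threads there are, so the modifications cannot conflict. A lock is created with threading.Lock():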

#!/usr/bin/python3
# coding=utf-8
import threading

# Suppose this is your bank balance:
balance = 0
lock = threading.Lock()

def change_it(n):
    # Deposit then withdraw, so the result should be 0:
    global balance
    balance = balance + n
    balance = balance - n

def run_thread(n):
    for i in range(100000):
        # Acquire the lock first:
        lock.acquire()
        try:
            # Now it is safe to modify balance:
            change_it(n)
        finally:
            # Always release the lock when done:
            lock.release()

t1 = threading.Thread(target=run_thread, args=(5,))
t2 = threading.Thread(target=run_thread, args=(8,))
t1.start()
t2.start()
t1.join()
t2.join()
print(balance)

Output:

0

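When several threads call lock.acquire() at the same time, only one of them acquires the lock and continues executing; the others keep waiting until they get the lock.
A thread that holds the lock must release it when it is done, otherwise the threads waiting for it will wait forever and become dead threads. That is why we use try...finally to guarantee that the lock is released.
The advantage of a lock is that a critical section of code is executed completely by one thread at a time. The downside is that it prevents that code from running concurrently, and when there are many locks, deadlocks can easily occur.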
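threading.Lock also supports the with statement, which performs the same acquire, try...finally, release sequence with less room for mistakes. A sketch with hypothetical names counter, add_many and run:

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        # "with lock:" calls acquire() on entry and release() on exit, even if an exception
        # is raised, which is the same guarantee as acquire() followed by try...finally
        with counter_lock:
            counter += 1


def run(workers=4, times=10000):
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run())  # 40000
```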

Multi-core CPUs

Luckily, my computer has 4 cores, so let's write an infinite loop and watch the CPU usage!

import threading, multiprocessing

def loop():
    x = 0
    while True:
        x = x ^ 1

for i in range(multiprocessing.cpu_count()):
    t = threading.Thread(target=loop)
    t.start()

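Fun! The joy of maxing out... just one core? Boo hoo. Even though one busy thread is started per CPU core, the total CPU usage only reaches about one core's worth.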


Summary

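Multithreaded programming has a complex model and is prone to conflicts: shared data must be protected with locks, while taking care to avoid deadlocks.
Because the CPython interpreter was designed with a global lock (the GIL), multithreaded Python code cannot take advantage of multiple cores.
When the Python interpreter executes code, there is a GIL (Global Interpreter Lock): any Python thread must acquire the GIL before it runs. The interpreter periodically makes the running thread release the GIL so that other threads get a chance to run; Python 2 did this every 100 bytecode instructions, while Python 3 uses a time-based switch interval (5 ms by default, see sys.getswitchinterval()). The GIL effectively locks the execution of all threads, so Python threads can only run Python code alternately: even 100 threads on a 100-core CPU can use only one core.
Although Python cannot use multithreading to run tasks on multiple cores, it can do so with multiprocessing. Each Python process has its own independent GIL, so the processes don't affect each other.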
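Following the last point, CPU-bound work can be spread over several cores by using processes instead of threads. A sketch using concurrent.futures.ProcessPoolExecutor from the standard library (swapped in here for multiprocessing.Pool; both work for this). count_primes and parallel_prime_count are hypothetical names, and trial division is used only because it is simple CPU-bound work:

```python
import math
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi) by trial division (CPU-bound on purpose)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def parallel_prime_count(limit, chunks=4):
    # Split [0, limit) into chunks; each chunk runs in its own process, which has its own GIL
    step = max(1, (limit + chunks - 1) // chunks)
    ranges = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    with ProcessPoolExecutor() as pool:
        return sum(pool.map(count_primes, ranges))


if __name__ == "__main__":
    print(parallel_prime_count(100))  # 25
```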

Processes vs. threads

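Multitasking is usually implemented with a master-worker model: the master assigns tasks and the workers execute them. In a multi-threaded or multi-process program, the main thread or main process is usually the master and the other threads or processes are the workers.
The biggest advantage of the multi-process model is stability: the master process rarely crashes, and one crashed child process does not affect the others. Its biggest drawback is that creating processes is expensive. With fork() on Unix/Linux this is not a big problem, but on Windows it is costly. In addition, because of memory and CPU limits, with too many processes the operating system may struggle even to schedule them.
Thread switching also has a cost: saving the current execution context, restoring the next one, and so on.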
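A minimal thread-based sketch of the master-worker model described above, using the standard queue.Queue; worker, master and the squaring task are hypothetical:

```python
import queue
import threading


def worker(tasks, results):
    # Worker: take tasks until the master sends the None sentinel
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put(item * item)  # hypothetical "work": squaring the number


def master(items, num_workers=3):
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(num_workers)]
    for w in workers:
        w.start()
    # Master: hand out every task, then one sentinel per worker so each of them stops
    for item in items:
        tasks.put(item)
    for _ in workers:
        tasks.put(None)
    for w in workers:
        w.join()
    # Workers finish in any order, so sort the collected results
    return sorted(results.get() for _ in range(results.qsize()))


if __name__ == "__main__":
    print(master(range(6)))  # [0, 1, 4, 9, 16, 25]
```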