Python threading and concurrency: concurrency in Python

Reposted

网猴儿 2023-07-10 19:33:19

Tags: main thread, multithreading, child threads. Category: Python, backend development

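1. Basic concepts of concurrency

Concurrency and parallelism

Concurrency: a small number of CPUs handle many tasks by switching between them; the tasks make progress over the same period but do not necessarily run at the same instant.
Parallelism: tasks truly run at the same instant, so N CPUs can run at most N tasks at once.

Processes / threads / coroutines

Process: the smallest unit of resource allocation; each process has its own independent memory.
Thread: the smallest unit of CPU scheduling; threads in a process share memory, and switching threads is faster than switching processes.
Coroutine: many coroutines run inside a single thread (the operating system is unaware of them), and the program itself decides the order in which code blocks run. Processes and threads, by contrast, are scheduled by the operating system, and switching between them is relatively expensive.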

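Server concurrency with processes / threads / coroutines

Multiprocessing: simple to implement, but high overhead and poor performance. A new process is created for each incoming request, e.g. ForkingTCPServer.
Multithreading: requires thread synchronization and can deadlock. A new thread is created for each incoming request, e.g. ThreadingTCPServer.
Coroutines (event-driven): more complex to implement, but high performance. Each incoming request is added to an event list, and the main process handles the requests with non-blocking I/O. nginx uses this event-driven model.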

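Python's GIL

Use multiprocessing for CPU-bound work, and threads or coroutines for I/O-bound work.

CPython has a GIL (global interpreter lock): at any moment only one thread can execute Python bytecode, so a single Python process running pure Python code effectively uses only one CPU core, even with multiple threads. However, every standard-library function that performs blocking I/O releases the GIL while waiting for the operating system to return a result. At that level multithreading does work, and a program can create thousands of threads for I/O-bound work. In other words, the GIL mainly affects CPU-bound programs. There are two common ways around it: use multiple processes, or move the compute-intensive work into a C extension that runs independently of the Python interpreter and releases the GIL inside the C code.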
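
A rough way to see the effect of the GIL is to time a CPU-bound function run serially and then in a thread pool. The sketch below is illustrative only: count_down, timed, the workload size and the thread count are made-up names and values, not from the original article, and the timings depend on the machine and on the interpreter build.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_down(n):
    # Pure-Python CPU-bound loop: the thread holds the GIL while it runs
    while n > 0:
        n -= 1
    return n

def timed(label, fn):
    # Hypothetical helper: run fn once and print how long it took
    start = time.perf_counter()
    result = fn()
    print('{}: {:.2f}s'.format(label, time.perf_counter() - start))
    return result

N = 2_000_000  # hypothetical workload size

serial = timed('serial ', lambda: [count_down(N) for _ in range(4)])

def threaded_run():
    with ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(count_down, [N] * 4))

threaded = timed('threads', threaded_run)
```

On a standard CPython build with the GIL the two timings are usually close, because only one thread executes bytecode at a time.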

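Open question: Python uses multiple processes for CPU-bound work; can multithreading alone handle CPU-bound work in other languages? (In languages without a GIL, threads can run on several cores at once, so generally yes.)

Threads can also handle I/O-bound work, so why have coroutines at all? See Fluent Python, p. 448: coroutines have several advantages, such as fewer thread switches, and because a coroutine can only be suspended at explicit points, its code is protected against interruption by default and is effectively synchronized. Are coroutines less important in other languages, where multithreading can already deal with blocking I/O?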

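I/O operations (quoted from Liao Xuefeng's (廖雪峰) blog)

As covered in the section on I/O programming, the CPU is far faster than disks, networks and other I/O devices. Within a thread the CPU executes code extremely quickly, but as soon as it reaches an I/O operation, such as reading or writing a file or sending data over the network, it has to wait for that operation to finish before moving on. This is called synchronous I/O.

While the I/O operation is in progress, the current thread is suspended, and any other code that needs the CPU cannot be run by that thread.

Because one I/O operation blocks the current thread and prevents other code from running, we have to use multiple threads or processes to run code concurrently and serve multiple users. Each user gets a thread; if one thread is suspended by I/O, the other users' threads are unaffected.

The multithreading and multiprocessing models do solve the concurrency problem, but the system cannot add threads without limit. Switching threads is itself expensive, so once there are too many threads the CPU spends its time on switching and has little left for running real code, and performance drops sharply.

The problem we are trying to solve is the severe mismatch between the CPU's high speed and the slowness of I/O devices; multithreading and multiprocessing are only one way of solving it.

Another way is asynchronous I/O. When code needs to perform a slow I/O operation, it only issues the I/O request and does not wait for the result; it goes on to run other code. Some time later, when the I/O result is available, the CPU is notified to process it.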

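Synchronous and asynchronous

Processes / threads: a synchronous mechanism. "Synchronous" here means that code within a single process or thread runs synchronously, so it may block.
Coroutines / callbacks: an asynchronous mechanism; asynchronous code need not block.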

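The five I/O models

Blocking I/O
Non-blocking I/O
I/O multiplexing: select, poll, epoll. An advantage of epoll is that it has no fixed limit on the number of file descriptors it can watch. Nginx and Twisted use epoll; according to the source, Nginx can support 100,000 connections with 1 GB of memory. Windows does not support epoll. In practice you are only likely to use it directly for things like game servers or particularly complex crawlers.
Signal-driven I/O
Asynchronous I/O: in Python, asyncio provides the asynchronous programming model (on Unix its default event loop is itself built on I/O multiplexing)

Synchronous I/O (blocking I/O, non-blocking I/O, I/O multiplexing): once the data is ready, the user program still has to call read itself, i.e. it still waits while the data is copied from kernel space to user space, and it may stall there

Asynchronous I/O: the program does not wait for the kernel-to-user-space copy at all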
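
As a small illustration of I/O multiplexing, the sketch below uses the standard-library selectors module (which picks epoll, kqueue or select depending on the platform) to wait until a socket becomes readable. The socket pair and the message are made up for the example; a real server would register its listening and client sockets instead.

```python
import selectors
import socket

sel = selectors.DefaultSelector()      # epoll on Linux, kqueue on BSD/macOS, select elsewhere
writer, reader = socket.socketpair()   # two connected sockets standing in for a client and a server
reader.setblocking(False)
sel.register(reader, selectors.EVENT_READ)

# Nothing has been sent yet, so no socket is reported as ready
ready_before = sel.select(timeout=0)

writer.send(b'ping')

# Now the selector reports that the reader is readable
received = []
for key, mask in sel.select(timeout=1):
    # The program still calls recv() itself: this is the "synchronous" part
    received.append(key.fileobj.recv(1024))

sel.unregister(reader)
sel.close()
writer.close()
reader.close()
print(ready_before, received)
```

One selector can watch many sockets at once, which is why a single thread can serve a large number of connections with this model.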

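2. Multithreading - threading

2.1 Implementing multithreading

The main APIs:

threading.current_thread(): returns the current thread, so you can see whether it is the main thread or a child thread
threading.active_count(): returns the number of threads currently alive
t = threading.Thread(target=..., args=...): creates a thread object
t.start(): starts the thread
t.join(): joins the thread; after the main thread joins t, it waits until t finishes

Start several threads by running the following code: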

import threading
import time

st = time.time()

def run(n):
    time.sleep(2)
    print('thread {}: {}'.format(n, threading.current_thread()))

#### create and start the child threads ####
ts = []
for i in range(6):
    t = threading.Thread(target=run, args=(i,))
    t.start()
    ts.append(t)

#### print the current number of threads ####
print('active threads:', threading.active_count())

#### "join" the main thread to every child thread: wait for all of them to finish before continuing ####
for t in ts:
    t.join()

#### output from the main thread ####
print('main thread:', threading.current_thread())
print('elapsed:', time.time() - st)

The whole run takes only about 2 s; calling run six times serially would take 2*6=12 s. Output:

active threads: 7 # printed before join(), so the 6 child threads plus the main thread are alive; printed after join() it would show 1
thread 4: <Thread(Thread-5, started 29768)>
thread 5: <Thread(Thread-6, started 11404)>
thread 2: <Thread(Thread-3, started 8368)>
thread 1: <Thread(Thread-2, started 19244)>
thread 0: <Thread(Thread-1, started 1120)>
thread 3: <Thread(Thread-4, started 10296)>
main thread: <_MainThread(MainThread, started 26604)>
elapsed: 2.0163538455963135

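Notes:

If the main thread does not join the child threads, it prints its output without waiting for them to finish, and the reported elapsed time is about 0.0 s.

If t.join() is called immediately after each t.start() inside the loop, the threads run one after another, i.e. serially.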
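
The second note can be checked with a small timing comparison. This is a minimal sketch: work, run_joined_in_loop and run_joined_afterwards are hypothetical names, and the 0.2 s sleep stands in for real work.

```python
import threading
import time

def work():
    time.sleep(0.2)

def run_joined_in_loop(n):
    # start() immediately followed by join(): each thread finishes before the next starts
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=work)
        t.start()
        t.join()
    return time.perf_counter() - start

def run_joined_afterwards(n):
    # start all threads first, then join them: the sleeps overlap
    start = time.perf_counter()
    threads = [threading.Thread(target=work) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

serial_time = run_joined_in_loop(4)
parallel_time = run_joined_afterwards(4)
print('join inside the loop:    {:.2f}s'.format(serial_time))
print('join after starting all: {:.2f}s'.format(parallel_time))
```

The first figure should be roughly four times the second, since joining inside the loop removes all overlap.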

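2.2 Daemon threads - daemon

Ways to make a thread a daemon:

t = threading.Thread(target=run, args=(1,), daemon=True) # set at creation time
t.daemon = True # set after creation (t.setDaemon(True) is the older, deprecated spelling); must be done before t.start(), otherwise an error is raised

If the main thread does not join a child thread, the main thread prints first without waiting for it, but the program still waits for the child thread to run to completion before finally exiting: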

import threading
import time

st = time.time()
def run(n):
    time.sleep(2)
    print('thread {}: {}'.format(n, threading.current_thread()))

t = threading.Thread(target=run, args=(1,))
t.start()

print('main thread:', threading.current_thread())
print('active threads:', threading.active_count())
print('elapsed:', time.time() - st)

Output:

main thread: <_MainThread(MainThread, started 25672)>
active threads: 2
elapsed: 0.0
thread 1: <Thread(Thread-1, started 21288)>   # the main thread finished its prints first, but the program still waited for the child thread before exiting

If the thread is a daemon, the program does not wait for it: the main thread ends first, and the daemon thread is destroyed automatically when the main thread exits:

import threading
import time

st = time.time()
def run(n):
    time.sleep(2)
    print('thread {}: {}'.format(n, threading.current_thread()))

t = threading.Thread(target=run, args=(1,), daemon=True)   # way 1 of making a daemon thread
# t.daemon = True    # way 2 of making a daemon thread
t.start()

print('main thread:', threading.current_thread())
print('active threads:', threading.active_count())
print('elapsed:', time.time() - st)

Output:

main thread: <_MainThread(MainThread, started 25672)>
active threads: 2
elapsed: 0.0

Note: the Python Cookbook says that daemon threads cannot be joined, but in an experiment with Python 3.6.4 a daemon thread could be joined.
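
The experiment mentioned in the note can be reproduced with a few lines. This is a minimal sketch, with a 0.2 s sleep standing in for real work:

```python
import threading
import time

def run():
    time.sleep(0.2)

t = threading.Thread(target=run, daemon=True)
t.start()
t.join()   # joining a daemon thread is allowed; this waits until run() returns

print(t.daemon, t.is_alive())   # True False
```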

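3. Multithreading - concurrent.futures.ThreadPoolExecutor()

futures.ThreadPoolExecutor(max_workers): max_workers is the maximum number of threads. The default depends on the Python version: in Python 3.5 to 3.7 it is 5 times the number of CPUs (40 threads on an 8-core machine); since Python 3.8 it is min(32, number of CPUs + 4). With 40 threads, running 90 tasks takes 6 s (three rounds of 2 s each); with max_workers=100 the same 90 tasks take 2 s.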
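
The effect of max_workers can be seen on a smaller scale. The sketch below is illustrative: task, run_with, the 0.2 s sleep and the worker counts are made-up names and values, not from the original article.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task(n):
    time.sleep(0.2)   # stand-in for an I/O-bound call
    return n

def run_with(max_workers, n_tasks=6):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = list(executor.map(task, range(n_tasks)))
    return results, time.perf_counter() - start

results_small, t_small = run_with(2)   # 6 tasks / 2 workers = 3 rounds
results_large, t_large = run_with(6)   # 6 tasks / 6 workers = 1 round
print('2 workers: {:.2f}s, 6 workers: {:.2f}s'.format(t_small, t_large))
```

The elapsed time is roughly the number of rounds times the duration of one task, which is the same arithmetic as the 90-task figures above.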

3.1 map

Run the same function over several inputs; the results come back in input order:

import threading
import time
from concurrent import futures

st = time.time()

def run(n):
    time.sleep(2)
    print('thread {}: {}'.format(n, threading.current_thread()))
    return n

with futures.ThreadPoolExecutor() as executor:   # create the executor
    results = executor.map(run, range(6))        # results is an iterator over the return values

print(list(results))        # [0, 1, 2, 3, 4, 5]
print(time.time() - st)     # 2.0158472061157227

To create the executor and get the results without with...as...:

executor = futures.ThreadPoolExecutor()
results = executor.map(run, range(6))
executor.shutdown()   # wait for the tasks and free the threads; the with block does this automatically

Passing multiple arguments:

def run(m, n):
    time.sleep(1)
    print('thread {} result: {}\n'.format(threading.current_thread(), m + n))


with futures.ThreadPoolExecutor() as executor:
    executor.map(run, (i for i in range(10, 16)), (j for j in range(0, 6)))

3.2 submit + as_completed

Different functions can be submitted, and results come back in completion order: whichever task finishes first is returned first

import threading
import time
from concurrent import futures

st = time.time()

def run1(n):
    time.sleep(2)
    print('thread {}: {}'.format(n, threading.current_thread()))
    return n

def run2(n):
    time.sleep(4)
    print('thread {}: {}'.format(n, threading.current_thread()))
    return n

with futures.ThreadPoolExecutor() as executor:  # create the executor
    do = [executor.submit(run1, i) for i in range(3)] + [executor.submit(run2, i) for i in range(3, 6)]
    results = [i.result() for i in futures.as_completed(do)]  # collect the results in completion order

print(list(results))      # e.g. [1, 0, 2, 3, 5, 4]: the first 3 (the 2 s tasks) come back first
print(time.time() - st)   # 4.016391754150391

To create the executor and get the results without with...as...:

executor = futures.ThreadPoolExecutor()
do = [executor.submit(run1, i) for i in range(3)] + [executor.submit(run2, i) for i in range(3, 6)]
results = [i.result() for i in futures.as_completed(do)]
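
With as_completed the results arrive out of order, so it is common to keep a mapping from each future back to its input, and to handle exceptions per task. The following is a sketch of that pattern; fetch, the simulated failure for n == 2 and the result values are invented for illustration.

```python
import time
from concurrent import futures

def fetch(n):
    # Stand-in for an I/O-bound call; n == 2 simulates a failure
    time.sleep(0.1 * n)
    if n == 2:
        raise ValueError('task {} failed'.format(n))
    return n * 10

results, errors = {}, {}
with futures.ThreadPoolExecutor() as executor:
    # Map each future back to the argument it was submitted with
    future_to_arg = {executor.submit(fetch, n): n for n in range(4)}
    for fut in futures.as_completed(future_to_arg):
        n = future_to_arg[fut]
        try:
            results[n] = fut.result()   # re-raises any exception from the worker
        except ValueError as exc:
            errors[n] = str(exc)

print(results)
print(errors)
```

Calling result() inside try/except keeps one failing task from discarding the results of the others.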

3.3 Example: downloading images from a website concurrently with threads

A page about some topic may contain N images (N<40) whose names follow a pattern. If each download takes about 1 s, downloading serially takes about N seconds, while with multithreading it takes only about 1 s:

import os
import requests
from lxml import etree
from concurrent.futures import ThreadPoolExecutor

def get_jpg(jpg_path, save_path, fname):  # download a single jpg
    # jpg_path: image URL, save_path: directory to save into, fname: image name
    if not os.path.exists('{}/{}.jpg'.format(save_path, fname)):  # skip files that already exist
        headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
        resp = requests.get(jpg_path, headers=headers, stream=True)
        if resp.status_code == 200:
            with open('{}/{}.jpg'.format(save_path, fname), 'wb') as f:
                for chunk in resp:
                    f.write(chunk)

def get_jpgs(content, save_path, fname):   # download several jpgs
    # content: page HTML, save_path: directory to save into, fname: base image name
    select = etree.HTML(content)
    jpgs = select.xpath('//div[@class="photo-frame"]/img/@src')
    jpgs = [i for i in jpgs if '-' in i]       # only image paths containing '-' are wanted

    if jpgs:
        args = [[] for i in range(3)]
        for i, jpg in enumerate(jpgs):
            fname_new = '{}-{}'.format(fname, i)
            jpg_path = '{}jp-{}'.format(*jpg.split('-'))
            args[0].append(jpg_path)
            args[1].append(save_path)
            args[2].append(fname_new)

        with ThreadPoolExecutor() as executor:
            res = executor.map(get_jpg, *args)  # when passing arguments to map, a function with n parameters needs n lists
        list(res)                               # consume the results so that any exception raised in a worker surfaces here

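4. Communication between threads

Python Cookbook: simple mechanisms (Event, Semaphore, Condition) and more complex ones (Queue, the actor pattern)

Most of this chapter comes from the Oldboy (老男孩) course material

4.1 threading.Lock()

User-level locks in Python 2.x: even with the GIL, two threads can still modify the same data at the same time, because the interpreter may switch threads between bytecode instructions (in Python 2 the GIL is released roughly every 100 instructions). The solution is to add a user-level lock so that the actual read-modify-write runs serially.

Python 3 switches threads on a time interval (5 ms by default) instead of an instruction count, so the race shows up less often, but nothing guarantees that an operation such as num += 1 is atomic. A user-level lock is therefore still recommended.

Keep in mind that code inside a lock runs serially, so hold the lock only around the shared update.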

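import threading
import time

num = 0
lock = threading.Lock()

def run(n):
    global num
    with lock:         # acquire the lock; it is released automatically when the block exits
        num += 1
        time.sleep(1)  # sleeping while holding the lock makes the program serial: 50 threads take 50 seconds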
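
For comparison, here is a minimal sketch of a lock that protects only the shared update, so threads are serialized for as short a time as possible. The names counter and add, and the iteration counts, are illustrative.

```python
import threading

counter = 0
lock = threading.Lock()

def add(times):
    global counter
    for _ in range(times):
        with lock:          # only the read-modify-write is protected
            counter += 1

threads = [threading.Thread(target=add, args=(10000,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # 50000
```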

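4.2 Recursive locks (RLock)

Rarely needed. When code that already holds a lock tries to acquire the same lock again (for example, a locked function calling another function that takes the same lock), an ordinary Lock deadlocks the program. A recursive lock, threading.RLock(), can be acquired repeatedly by the thread that already holds it.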
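
A minimal sketch of the nested-acquisition case follows; outer, inner and log are hypothetical names chosen for the example.

```python
import threading

rlock = threading.RLock()
log = []

def inner():
    with rlock:             # the same thread acquires the lock a second time
        log.append('inner')

def outer():
    with rlock:             # first acquisition
        log.append('outer')
        inner()             # would block forever with a plain threading.Lock()

t = threading.Thread(target=outer)
t.start()
t.join()
print(log)   # ['outer', 'inner']
```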

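4.3 Semaphores

A mutex (Lock) lets only one thread in at a time. A semaphore lets up to n threads modify the data at the same time; loosely speaking, a semaphore is like having several locks available at once

threading.BoundedSemaphore(5) # at most 5 threads at a time

Use case: limiting how many connections are allowed at the same time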
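
The connection-limiting use case might look like the sketch below. It is an illustration under assumptions: connect, the limit of 3, the 10 threads and the bookkeeping variables are all made up, and the sleep stands in for using a real connection.

```python
import threading
import time

semaphore = threading.BoundedSemaphore(3)   # at most 3 threads inside at once
state_lock = threading.Lock()               # protects the two counters below
active = 0
max_active = 0

def connect(n):
    global active, max_active
    with semaphore:
        with state_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.1)                      # pretend to use the connection
        with state_lock:
            active -= 1

threads = [threading.Thread(target=connect, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print('peak concurrent connections:', max_active)
```

The peak never exceeds the semaphore's initial value, however many threads are started.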

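4.4 threading.Event()

Common methods:

event = threading.Event() # create an event
event.set()               # set the flag
event.clear()             # clear the flag
event.wait()              # block until the flag is set
event.is_set()            # check whether the flag is set

Example: a traffic light implemented with an event: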

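import threading
import time

event = threading.Event()

def light():
    event.set()  # set the flag at the start: the light begins green
    count = 0
    while True:
        if 20 < count < 31:
            event.clear()  # red light: clear the flag
            print('\033[41;1m red light \033[0m')
        elif count > 30:
            event.set()    # back to green: set the flag and restart the cycle
            print('\033[42;1m green light \033[0m')
            count = 0
        else:
            print('\033[42;1m green light \033[0m')
        time.sleep(1)
        count += 1

def car():
    while True:
        if event.is_set():  # flag set means green light
            print('green light, go')
            time.sleep(1)
        else:
            print('red light, stop')
            event.wait()    # block until the flag is set again

light_thread = threading.Thread(target=light)
light_thread.start()
car_thread = threading.Thread(target=car)
car_thread.start()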

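4.5 threading.Condition()

4.6 threading.Semaphore()

4.7 queue.Queue()

Two purposes: decoupling (the producing side and the consuming side have no direct dependency on each other, i.e. low coupling) and improving running efficiency

A queue can be thought of simply as an ordered container

Lists and tuples are ordered; dicts were unordered in older Python versions (since Python 3.7 they preserve insertion order)

After you read an item from a list it is still in the list; after you get an item from a queue it is no longer in the queue

Queue classes and methods:

class queue.Queue(maxsize=0) # first in, first out (FIFO)
class queue.LifoQueue(maxsize=0) # last in, first out (LIFO)
class queue.PriorityQueue(maxsize=0) # items are stored with a priority, and the priority determines the order in which they come out
Queue.qsize()
Queue.empty() # return True if empty
Queue.full() # return True if full
Queue.put(item, block=True, timeout=None)  # block=True waits while the queue is full; block=False raises queue.Full immediately
Queue.put_nowait(item)
Queue.get(block=True, timeout=None)   # block=True (the default) waits while the queue is empty; block=False raises queue.Empty immediately; timeout is the maximum time to wait
Queue.get_nowait()
Queue.task_done()
Queue.join()    # block until every item in the queue has been retrieved and marked with task_done()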

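For example:

q1 = queue.Queue()
q1.put('d1')
q1.put('d2')
q1.get()  # returns 'd1'
q1.get()  # returns 'd2'
q1.get()  # blocks and waits, because the queue is empty

Ways to avoid blocking:

1) q1.get_nowait() # does not block; raises queue.Empty if nothing is available

2) Check qsize() before calling get(); with several consumer threads this check can be out of date by the time get() runs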
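
The non-blocking variants can be sketched as follows; the '<empty>' marker and the 0.1 s timeout are arbitrary choices for the example.

```python
import queue

q = queue.Queue()
q.put('d1')

taken = []
for _ in range(2):
    try:
        taken.append(q.get_nowait())   # same as q.get(block=False)
    except queue.Empty:
        taken.append('<empty>')

try:
    q.get(timeout=0.1)                 # wait at most 0.1 s, then give up
    timed_out = False
except queue.Empty:
    timed_out = True

print(taken, timed_out)   # ['d1', '<empty>'] True
```

Catching queue.Empty is usually more reliable than checking qsize() first, because the check and the get() are not one atomic step.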

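Queue example: the producer-consumer model

import queue
import threading

q = queue.Queue()

def producer(name):
    for i in range(10):
        q.put('bone %s' % i)
    q.put(None)          # sentinel: tells the consumer that nothing more is coming

def consumer(name):
    while True:
        item = q.get()   # blocks until an item is available
        if item is None:
            break
        print('%s eats %s' % (name, item))

p = threading.Thread(target=producer, args=('producer',))  # args must be a tuple, hence the trailing comma
c = threading.Thread(target=consumer, args=('consumer',))
p.start()
c.start()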
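
Queue.task_done() and Queue.join() from the method list let the producing side wait until everything it queued has actually been processed. The sketch below assumes a single worker thread and uses None as a stop signal; both are choices made for the example.

```python
import queue
import threading

q = queue.Queue()
eaten = []

def consumer():
    while True:
        item = q.get()
        if item is None:        # stop signal for the worker
            q.task_done()
            break
        eaten.append(item)
        q.task_done()           # tell the queue this item is fully processed

worker = threading.Thread(target=consumer)
worker.start()

for i in range(5):
    q.put('bone %s' % i)

q.join()                        # returns once every item put so far has been marked done
done_after_join = list(eaten)

q.put(None)
worker.join()
print(done_after_join)
```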

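4.8 The actor pattern

5. Multiprocessing - multiprocessing

5.1 Implementing multiprocessing

Basically the same as with threads.

multiprocessing.current_process(): returns the current process, so you can see whether it is the main process or a child process
multiprocessing.active_children(): returns the list of child processes currently alive
p = multiprocessing.Process(target=..., args=...): creates a process object
p.start(): starts the child process
p.join(): joins the child process; after the main process joins p, it waits until p finishes
os.getppid(): returns the parent process id
os.getpid(): returns the id of the current process

The code that starts processes must run under if __name__ == '__main__': (required on platforms that start child processes by launching a fresh interpreter, such as Windows).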

import multiprocessing
import time

def run(n):
    time.sleep(2)
    print('process {}: {}'.format(n, multiprocessing.current_process()))

if __name__ == '__main__':
    st = time.time()

    #### create and start the child processes ####
    ps = []
    for i in range(6):
        p = multiprocessing.Process(target=run, args=(i, ))
        p.start()
        ps.append(p)

    #### print the current list of child processes ####
    print('active children:', multiprocessing.active_children())

    #### "join" the main process to every child process: wait for all of them to finish before continuing ####
    for p in ps:
        p.join()

    #### output from the main process ####
    print('main process:', multiprocessing.current_process())
    print('elapsed:', time.time() - st)

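5.2 Process pools

apply: blocking and synchronous. Each call waits for its result, so only one task runs at a time and it is no different from serial execution. The official documentation notes that apply_async is better suited to performing work in parallel

import multiprocessing
import time

def run(n):
    time.sleep(2)
    print('process {}: {}'.format(n, multiprocessing.current_process()))

if __name__ == '__main__':
    st = time.time()

    #### create a pool of 6 processes and call run 6 times, one call after another ####
    pool = multiprocessing.Pool(6)
    for i in range(6):
        pool.apply(run, (i, ))

    #### output from the main process ####
    print('main process:', multiprocessing.current_process())
    print('elapsed:', time.time() - st)   # serial execution, 12 s in total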

apply_async: non-blocking and asynchronous; several processes can run at the same time

import multiprocessing
import time

def run(n):
    time.sleep(2)
    print('process {}: {}'.format(n, multiprocessing.current_process()))

if __name__ == '__main__':
    st = time.time()

    #### create a pool of 3 processes and submit 3 tasks ####
    pool = multiprocessing.Pool(3)
    for i in range(3):
        pool.apply_async(run, (i, ))
    pool.close()
    pool.join()

    #### output from the main process ####
    print('main process:', multiprocessing.current_process())
    print('elapsed:', time.time() - st)  # parallel execution, 2 s in total

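Example of a process-pool callback: whatever a task wants written to a file is returned as its return value and handed to a callback in the parent process, and the callback does the writing. The example below appends the digits 0 to 7 to 123.txt:

import multiprocessing
import time

def mycallback(x):
    with open('123.txt', 'a+') as f:
        f.write(str(x))
    time.sleep(2)

def fun(num):
    return num

if __name__ == '__main__':
    st = time.time()
    pool = multiprocessing.Pool()

    #### submit 8 tasks to the pool ####
    for i in range(8):
        pool.apply_async(fun, args=(i,), callback=mycallback)  # each result is passed to the callback, which writes it out
    pool.close()
    pool.join()   # note: close() must be called before join()


    #### output from the main process ####
    print('main process:', multiprocessing.current_process())
    print('elapsed:', time.time() - st)  # about 16 s in total: the callbacks run one at a time in the parent process, and each sleeps 2 s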

6. Multiprocessing - concurrent.futures.ProcessPoolExecutor()

This works like the ThreadPoolExecutor described in chapter 3; just use futures.ProcessPoolExecutor() instead.

The code that creates the executor must run under if __name__ == '__main__':.

import multiprocessing
import time
from concurrent import futures

def run(n):
    time.sleep(2)
    print('process {}: {}'.format(n, multiprocessing.current_process()))
    return n

if __name__ == '__main__':
    st = time.time()
    with futures.ProcessPoolExecutor(max_workers=8) as executor:  # create the executor
        results = executor.map(run, range(6))  # results is an iterator over the return values
    print(list(results))     # [0, 1, 2, 3, 4, 5]
    print(time.time() - st)  # 2.3747103214263916

Simple example 1, running a thread inside a process (from the Oldboy Python course)

import multiprocessing
import threading
import time

def thread_run():
    print(threading.get_ident())  # print the thread id

def run(name):
    time.sleep(2)
    print('hello', name)
    t = threading.Thread(target=thread_run)
    t.start()

if __name__ == '__main__':
    for i in range(10):
        p = multiprocessing.Process(target=run, args=('bob',))
        p.start()

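Simple example 2, printing parent and child process ids (from the Oldboy Python course)

import multiprocessing
import os

def info(title):
    print(title)
    print('module name', __name__)
    print('parent process', os.getppid())
    print('process id', os.getpid())

def f(name):
    info('\033[31;1m called from process function f \033[0m')  # info called in the child process: its parent process id is the main process id (16346 in the original run)
    print('hello', name)

if __name__ == '__main__':
    info('\033[32;1m main process line \033[0m')  # info called in the main process: its process id was 16346 in the original run
    p = multiprocessing.Process(target=f, args=('bob',))
    p.start()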

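7. Communication between processes

From the Oldboy Python course

Threads that interact need locks, because they share the same data. Processes do not share memory: data passed between processes is copied, so each process works on its own copy. An intermediary (a "translator") is therefore needed to let processes exchange data.

7.1 multiprocessing.Queue()

The Queue in the previous chapter was the thread queue (queue.Queue), which cannot be reached from outside its process. For example, if you open two cmd windows, put data in one and call get in the other, the get finds nothing

The Queue in this section is the process queue (multiprocessing.Queue), which can be used across processes. Each process really holds its own queue object and the data is copied from one to the other, so in use it looks like a shared queue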

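Thread queue, case 1:

import queue
import threading

def f():
    q.put([42, None, 'hello'])  # a child thread can access the thread queue

if __name__ == '__main__':
    q = queue.Queue()
    p = threading.Thread(target=f)  # the main thread starts a child thread
    p.start()
    print(q.get())  # gets the data put by the child thread

Thread queue, case 2:

import multiprocessing
import queue

def f():
    q.put([42, None, 'hello'])  # the child process cannot reach the parent's thread queue: with spawn this raises NameError, with fork it writes to a private copy

if __name__ == '__main__':
    q = queue.Queue()
    p = multiprocessing.Process(target=f)  # the main process starts a child process
    p.start()
    print(q.get())  # never receives the child's data, so this blocks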

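Process queue:

import multiprocessing

def f(qq):
    qq.put([42, None, 'hello'])

if __name__ == '__main__':
    qq = multiprocessing.Queue()
    p = multiprocessing.Process(target=f, args=(qq, ))  # the process queue has to be passed to the child process
    p.start()
    print(qq.get())  # OK: gets the data put by the child process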

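7.2 Pipes

import multiprocessing

def f(conn):
    conn.send([42, None, 'Hello from child'])  # send a message from the child end of the pipe
    conn.close()  # close the child end

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=f, args=(child_conn,))  # hand the child end to another process
    p.start()
    print(parent_conn.recv())  # receive on the parent end

Note: the pipe is two-way by default, so the parent process can also send messages to the child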
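
A sketch of the two-way case: the parent sends a message, and the child replies on the same connection. The function names child and main and the message texts are made up for the example.

```python
import multiprocessing

def child(conn):
    msg = conn.recv()                       # wait for a message from the parent
    conn.send('child got: {}'.format(msg))  # reply on the same connection
    conn.close()

def main():
    parent_conn, child_conn = multiprocessing.Pipe()   # duplex (two-way) by default
    p = multiprocessing.Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send('hello from parent')
    reply = parent_conn.recv()
    p.join()
    return reply

if __name__ == '__main__':
    print(main())
```

Passing duplex=False to Pipe() would instead give a one-way pipe, with the first connection only receiving and the second only sending.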

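7.3 Managers
Queues and pipes only pass data between processes; they do not share it. A Manager does let processes share data

import multiprocessing
import os

def f(d, l):
    d[1] = '1'  # add an item to the dict
    l.append(os.getpid())  # append this process's id to the list
    print('list in child process', l)

if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        d = manager.dict()  # create a dict that can be shared and passed between processes
        l = manager.list(range(5))  # create a list that can be shared and passed between processes
        p_list = []
        for i in range(10):
            p = multiprocessing.Process(target=f, args=(d, l))
            p.start()
            p_list.append(p)
        for res in p_list:
            res.join()
        print('final dict', d)  # every process writes the same key, so the dict has one item; with different keys it would keep growing like the list
        print('final list', l)  # the list gains one process id per process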

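7.4 Process locks

from multiprocessing import Process, Lock

def f(l, i):
    l.acquire()  # acquire the lock
    print('hello world', i)
    l.release()  # release the lock
if __name__ == '__main__':
    lock = Lock()  # create a lock
    for num in range(10):
        Process(target=f, args=(lock, num)).start()  # pass the lock to each process

Note: processes are independent and do not need a lock for their own data, but they do share the screen. If they all print at once, one process's "hello world" may be interrupted partway by output from another process; the lock prevents that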

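8. Coroutines

A thread switch is done by the operating system, which saves and restores the CPU registers; a coroutine switch happens inside the program, in user space, so the operating system is unaware of coroutines

Coroutines do not need locks, because they all run in a single thread

Drawback: because coroutines run in a single thread, they cannot use multiple cores, and a blocking call blocks the whole program

I/O is slow. Coroutines can handle high concurrency mainly because they switch away whenever they hit I/O. When do they switch back? Typically when the I/O has completed and the event loop resumes the waiting coroutine.

8.1 Implementing coroutines - the built-in yield

Basic coroutine operations:

Priming: 1) call next() manually, 2) use a decorator, 3) yield from
Sending a value: send
Terminating: 1) send a sentinel value, 2) close(), 3) throw() an exception that the coroutine does not handle
Exception handling: throw() an exception that the coroutine catches
Getting the return value: 1) catch StopIteration and read its value attribute (PEP 380), 2) yield from

Example 1, priming a coroutine and sending values: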

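def test1(a):
    b = yield a
    c = yield a + b
    d = yield a + b + c
    print(d)

t1 = test1(1)      # create the generator object; nothing runs yet
print(next(t1))    # prints 1: primes the coroutine, which yields a=1
print(t1.send(1))  # prints 2: the value is sent to b, and a+b=2 is yielded
print(t1.send(1))  # prints 3: the value is sent to c, and a+b+c=3 is yielded
t1.send(10)        # prints 10: the value is sent to d and print(d) runs; then StopIteration is raised because the generator has finished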
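
Priming with a decorator, the second option in the list of basic operations, could look like the sketch below. The names primed and averager are illustrative and not part of the original article.

```python
from functools import wraps

def primed(func):
    """Decorator: advance a generator-based coroutine to its first yield."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)               # prime it so that send() can be used right away
        return gen
    return wrapper

@primed
def averager():
    total, count, average = 0.0, 0, None
    while True:
        value = yield average   # receive a value, hand back the running average
        total += value
        count += 1
        average = total / count

avg = averager()                # already primed by the decorator
first = avg.send(10)
second = avg.send(30)
avg.close()                     # terminate the coroutine
print(first, second)            # 10.0 20.0
```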

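Example 2, terminating a coroutine and handling exceptions:

class DemoException(Exception):
    """An exception type used only for this demonstration."""

def test2():
    t = 0
    while True:
        try:
            x = yield t
        except DemoException:
            print('DemoException handled. Continuing...')
        else:
            t += x
            print(t)


t2 = test2()
next(t2)
t2.send(1)    # prints 1
t2.send(1)    # prints 2
t2.throw(DemoException)      # prints DemoException handled. Continuing...
t2.send(1)    # prints 3
t2.send(1)    # prints 4
t2.throw(ZeroDivisionError)  # raises ZeroDivisionError: the coroutine does not handle it, so it terminates
t2.send(1)    # never runs, because the previous line raised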

Example 3, getting a coroutine's return value (the PEP 380 way):

def test3():
    total = 0
    while True:
        a = yield
        if a is None:
            break
        total += a
    return total

t3 = test3()
next(t3)
t3.send(1)
t3.send(1)
t3.send(1)
try:
    t3.send(None)
except StopIteration as exc:
    res = exc.value   # the return value travels in the StopIteration exception
print(res)  # prints 3
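
The second way of getting the return value, yield from, can be sketched as follows. The names accumulate and delegator and the results list are made up for the example.

```python
def accumulate():
    total = 0
    while True:
        a = yield
        if a is None:
            break
        total += a
    return total                # becomes the value of the yield from expression

def delegator(results):
    # yield from primes accumulate(), forwards send() calls to it,
    # and catches its StopIteration to obtain the return value
    total = yield from accumulate()
    results.append(total)

results = []
d = delegator(results)
next(d)                         # prime the delegator (and, through it, accumulate)
for value in (1, 1, 1):
    d.send(value)
try:
    d.send(None)                # ends accumulate(); the delegator then finishes too
except StopIteration:
    pass
print(results)                  # [3]
```

With yield from, the caller no longer has to read the value out of StopIteration by hand.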

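9. Asynchronous I/O

9.1 asyncio - standard library

A spinner animation implemented three ways: synchronous -> threads -> asynchronous I/O

1) Synchronous

import sys
import time

i = 0
while i < 50:
    char = '|/-\\'[i % 4]
    status = char + ' thinking!'
    sys.stdout.write(status)
    sys.stdout.flush()
    sys.stdout.write('\x08' * len(status))  # backspaces: move the cursor back to the start of the status text
    time.sleep(.1)
    i += 1
sys.stdout.write('answer: 42')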

2) Threads, with threading

import itertools
import sys
import threading
import time

s = True

def spin(msg):
    for char in itertools.cycle('|/-\\'):
        status = char + ' ' + msg
        sys.stdout.write(status)
        sys.stdout.flush()
        sys.stdout.write('\x08' * len(status))
        time.sleep(.1)
        if not s:
            break

def slow_function():
    time.sleep(3)
    return 42

def supervisor():
    global s   # global is required: s is assigned below, so without this declaration the interpreter would treat it as a local
    spinner = threading.Thread(target=spin, args=('thinking!', ))
    spinner.start()
    result = slow_function()
    s = False
    spinner.join()
    return result

def main():
    result = supervisor()
    print('answer:', result)

if __name__ == '__main__':
    main()

3) Asynchronous, with asyncio

import asyncio
import itertools
import sys

async def spin(msg):
    for char in itertools.cycle('|/-\\'):
        status = char + ' ' + msg
        sys.stdout.write(status)
        sys.stdout.flush()
        sys.stdout.write('\x08' * len(status))
        try:
            await asyncio.sleep(.1)
        except asyncio.CancelledError:
            break

async def slow_function():
    await asyncio.sleep(3)
    return 42

async def supervisor():
    spinner = asyncio.ensure_future(spin('thinking!'))  # schedule spin() to run as a task
    result = await slow_function()
    spinner.cancel()
    return result

def main():
    result = asyncio.run(supervisor())
    print('answer:', result)

if __name__ == '__main__':
    main()

Note: the original version of this example used the generator-based style: @asyncio.coroutine, yield from, and asyncio.async(...). asyncio.async fails on Python 3.7 and later because async became a reserved keyword, and @asyncio.coroutine was removed in Python 3.11, so the version above uses async def, await and asyncio.ensure_future instead.

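9.2 gevent - third party

Note: the text in this section comes from Liao Xuefeng's blog; the examples come from the Oldboy Python course

gevent is a third-party library that implements coroutines on top of greenlet. The basic idea is:

When a greenlet hits an I/O operation, such as a network access, gevent automatically switches to another greenlet, and switches back at a suitable moment once the I/O has completed. I/O is slow and often leaves a program waiting; because gevent switches coroutines for us automatically, there is always some greenlet running instead of waiting on I/O.

Because the switching happens automatically during I/O operations, gevent has to modify parts of the Python standard library. This is done at startup through a monkey patch (see the end of this section).

Example 1: switching greenlets manually, similar to yield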

import greenlet

def test1():
    print(12)
    gr2.switch()  # switch to gr2
    print(34)
    gr2.switch()  # switch to gr2
def test2():
    print(56)
    gr1.switch()  # switch back to gr1
    print(78)

gr1 = greenlet.greenlet(test1)  # create a greenlet; it starts running on its first switch()
gr2 = greenlet.greenlet(test2)
gr1.switch()

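Example 2: automatic switching with gevent. gevent wraps greenlet and adds automatic switching. The original article states that gevent only runs on Unix/Linux and is not guaranteed to install or run correctly on Windows; this restriction may no longer apply to current gevent releases.

import gevent

def func1():
    print('\033[31;1mLi Chuang is working with Haitao...\033[0m')
    gevent.sleep(2)
    print('\033[31;1mLi Chuang goes back to working with Haitao...\033[0m')

def func2():
    print('\033[32;1mLi Chuang switches to working with Hailong...\033[0m')
    gevent.sleep(1)
    print('\033[32;1mLi Chuang is done with Haitao and comes back to working with Hailong...\033[0m')

gevent.joinall([gevent.spawn(func1), gevent.spawn(func2)])

gevent detects the blocking call itself: func1 and func2 switch automatically when they reach gevent.sleep. If there were also a func3, gevent would run func1, func2 and func3 in turn and then cycle back

Example 3: a concurrent crawler with gevent coroutines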

from urllib import request
import gevent

def f(url, filename):
    print('get:%s' % url)
    resp = request.urlopen(url)
    data = resp.read()
    with open('%s.html' % filename, 'wb') as out:
        out.write(data)
    print('%d bytes received from %s' % (len(data), url))

gevent.joinall([  # three coroutines, each running f with its own arguments
    gevent.spawn(f, 'https://www.python.org/', '111'),
    gevent.spawn(f, 'https://www.yahoo.com/', '222'),
    gevent.spawn(f, 'https://github.com/', '333'),
])

gevent cannot detect urllib's I/O operations, so the downloads above still run serially. It needs a monkey patch; add the following at the very top of the script, before the other imports:

from gevent import monkey
monkey.patch_all()  # patch the blocking standard-library I/O functions so that gevent can switch on them
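10. A combined concurrency experiment

Using an 8-core CPU as the example. On platforms that start child processes by launching a fresh interpreter (such as Windows), the multiprocessing snippets below need to run under if __name__ == '__main__':.

import time
from functools import partial
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor
import multiprocessing
import asyncio

def run(x, y):
    time.sleep(5)
    return '{}:{}'.format(x, y)

p = partial(run, 'No.')  # to fix one argument in advance, freeze it with partial; to freeze y instead, use partial(run, y='No.')

Single thread, serial: 100 s

for i in range(20):
    res = run('No.', i)
    print(res)

Multithreading (futures.ThreadPoolExecutor): 5 s when the pool has at least 20 threads, as with the default of 40 threads on Python 3.5 to 3.7. On Python 3.8 and later the default is min(32, CPUs + 4) = 12 threads on 8 cores, so pass max_workers=20 to get the same result

with ThreadPoolExecutor() as executor:
    res = executor.map(p, range(20))

for i in res:
    print(i)

Multiprocessing (futures.ProcessPoolExecutor): 15 s (8 worker processes, so 20 tasks need 3 rounds); all 20 results are printed together at the end

with ProcessPoolExecutor() as executor:
    res = executor.map(p, range(20))

for i in res:
    print(i)

Multiprocessing (process pool + map): 15 s; all 20 results are printed together at the end

ps = multiprocessing.Pool()
res = ps.map(p, range(20))
for i in res:
    print(i)

Multiprocessing (process pool + apply_async): 15 s; the 20 results are printed in 3 batches, with the first 8 to finish printed first

res = []
ps = multiprocessing.Pool()
for i in range(20):
    res.append(ps.apply_async(run, ('No.', i)))  # apply runs synchronously, so use apply_async

for i in res:
    print(i.get())

Coroutines (asynchronous I/O with asyncio): 5 s

async def run(x, y):
    await asyncio.sleep(5)
    return '{}:{}'.format(x, y)

async def main():
    # gather runs the 20 coroutines concurrently and returns their results in order
    return await asyncio.gather(*(run('No.', i) for i in range(20)))

for r in asyncio.run(main()):
    print(r)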
