Python: Updating the UI from an Asynchronous Thread

Reposted

jimoshalengzhou 2024-07-15 06:19:08

Tags: python async thread UI update, python, programming language, event loop, multithreading. Category: Python, backend development

Single-threaded approach:

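In any programming language, concurrent programming is a common and important technique. Web crawlers are one example: they are used widely across industry, and much of the news we read every day on websites and in apps is collected by crawlers built on concurrent programming.
Used correctly, concurrency can give a program a large performance boost. This section therefore looks at Futures-based concurrent programming in Python. We start by understanding Futures from the code's point of view, and then compare their performance with a single-threaded version.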

# -*- coding:utf-8 -*-


import requests
import time


def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))


def download_all(sites):
    for site in sites:
        download_one(site)


def main():
    sites = [
        'https://www.tuicool.com/search?kw=pytorch+&t=1',
        'https://www.tuicool.com/a/',
        'https://www.tuicool.com/ah/101000000/'
    ]
    start_time = time.time()
    download_all(sites)
    end_time = time.time()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))


if __name__ == '__main__':
    main()

Output:

Read 6215 from https://www.tuicool.com/search?kw=pytorch+&t=1
Read 8021 from https://www.tuicool.com/a/
Read 44835 from https://www.tuicool.com/ah/101000000/
Download 3 sites in 1.9893198013305664 seconds

Process finished with exit code 0

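This is the most direct and simplest approach:
first, iterate over the list of sites;
then download the current site;
once that download has finished, do the same for the next site, and continue until the list is exhausted.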

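As the output shows, the run took about 1.99 s in total. The single-threaded version is simple and easy to follow, but it is clearly inefficient, because the program spends most of its time waiting on I/O. Each download can only start after the previous one has finished. In a real production setting, where the number of sites to download is in the tens of thousands at least, this approach is simply not workable.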

Multithreaded approach:
Next, here is the multithreaded version of the code:

# -*- coding:utf-8 -*-



import concurrent.futures
import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))

def download_all(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_one, sites)

def main():
    sites = [
        'https://www.tuicool.com/search?kw=pytorch+&t=1',
        'https://www.tuicool.com/a/',
        'https://www.tuicool.com/ah/101000000/'
    ]
    start_time = time.time()
    download_all(sites)
    end_time = time.time()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

Output:

Read 44835 from https://www.tuicool.com/ah/101000000/
Read 6220 from https://www.tuicool.com/search?kw=pytorch+&t=1
Read 8021 from https://www.tuicool.com/a/
Download 3 sites in 0.6527125835418701 seconds

Process finished with exit code 0

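As the output shows, the total time is about 0.65 s, a substantial improvement.
Note that although you can choose the number of threads yourself, more threads are not always better: creating, maintaining, and tearing down threads all carry a cost, so a very large pool may actually make the program slower. In practice you usually need to run some tests against the real workload to find the best thread count.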
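One way to look for a reasonable thread count is to time the same batch of work under several max_workers values. The following is only a sketch, not a benchmark from the original article: fake_download and time_with_workers are hypothetical names, and time.sleep stands in for the network wait so that the example runs without network access.

```python
import concurrent.futures
import time


def fake_download(delay):
    # Stand-in for requests.get(): sleeping simulates waiting on network I/O.
    time.sleep(delay)
    return delay


def time_with_workers(max_workers, delays):
    # Run every simulated download on a pool of the given size and
    # return the elapsed wall-clock time in seconds.
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() consumes the results, so an exception raised in a worker
        # would surface here instead of being silently dropped.
        list(executor.map(fake_download, delays))
    return time.perf_counter() - start


if __name__ == '__main__':
    delays = [0.1] * 8  # eight simulated downloads of 0.1 s each
    for workers in (1, 2, 4, 8, 16):
        elapsed = time_with_workers(workers, delays)
        print('max_workers={:2d}: {:.2f} seconds'.format(workers, elapsed))
```

With a simulated wait like this, the timings should stop improving once the pool is as large as the batch. Real downloads also depend on bandwidth and on the remote server, so the numbers that matter are the ones measured against the actual workload.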

Multi-process version:

# -*- coding:utf-8 -*-



import concurrent.futures
import requests
import time

def download_one(url):
    resp = requests.get(url)
    print('Read {} from {}'.format(len(resp.content), url))

def download_all(sites):
    with concurrent.futures.ProcessPoolExecutor() as executor:
        executor.map(download_one, sites)

def main():
    sites = [
        'https://www.tuicool.com/search?kw=pytorch+&t=1',
        'https://www.tuicool.com/a/',
        'https://www.tuicool.com/ah/101000000/'
    ]
    start_time = time.time()
    download_all(sites)
    end_time = time.time()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))

if __name__ == '__main__':
    main()

Output:

Read 44835 from https://www.tuicool.com/ah/101000000/
Read 8021 from https://www.tuicool.com/a/
Read 6229 from https://www.tuicool.com/search?kw=pytorch+&t=1
Download 3 sites in 1.164093255996704 seconds

Process finished with exit code 0

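ProcessPoolExecutor() creates a process pool, which runs the work in parallel across several processes. The max_workers argument is usually omitted here, because it defaults to the number of processors on the machine.
However, parallelism of this kind is generally suited to CPU-heavy workloads. For I/O-heavy operations most of the time is spent waiting, so using multiple processes brings no gain over multiple threads. In many cases it is actually slower than the multithreaded version, because the number of CPUs limits how many processes can run.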
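For contrast, here is a sketch of the kind of CPU-heavy job where a process pool is the natural fit. It is an illustration, not code from the original article: count_primes_below and count_all are hypothetical names, and the naive prime test is chosen only because it keeps the CPU busy.

```python
import concurrent.futures


def count_primes_below(limit):
    # Deliberately naive, CPU-bound work: trial division for every candidate.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_all(limits):
    # max_workers is omitted, so the pool size defaults to the number of
    # processors on the machine.
    with concurrent.futures.ProcessPoolExecutor() as executor:
        return list(executor.map(count_primes_below, limits))


if __name__ == '__main__':
    # The guard matters: on platforms that spawn worker processes, each worker
    # re-imports this module and must not start a pool of its own.
    print(count_all([50_000, 60_000, 70_000, 80_000]))
```

Each call to count_primes_below runs in its own process, so the calls can use separate CPU cores at the same time. Arguments and results are pickled to cross the process boundary, which is one reason this approach tends not to pay off for short or I/O-bound tasks.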

Asynchronous approach:

# -*- coding:utf-8 -*-

import asyncio
import aiohttp
import time


async def download_one(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            print('Read {} from {}'.format(resp.content_length, url))


async def download_all(sites):
    tasks = [asyncio.ensure_future(download_one(site)) for site in sites]
    await asyncio.gather(*tasks)


def main():
    sites = [
        'https://www.tuicool.com/search?kw=pytorch+&t=1',
        'https://www.tuicool.com/a/',
        'https://www.tuicool.com/ah/101000000/'
    ]
    start_time = time.perf_counter()

    # asyncio.run() creates the event loop, runs the coroutine to completion,
    # and closes the loop afterwards (Python 3.7+).
    asyncio.run(download_all(sites))

    end_time = time.perf_counter()
    print('Download {} sites in {} seconds'.format(len(sites), end_time - start_time))


if __name__ == '__main__':
    main()

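Output (each size prints as None because resp.content_length only reflects the Content-Length response header, which these responses evidently did not include; len(await resp.read()) would report the body size as the earlier versions do):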

Read None from https://www.tuicool.com/search?kw=pytorch+&t=1
Read None from https://www.tuicool.com/ah/101000000/
Read None from https://www.tuicool.com/a/
Download 3 sites in 1.1439555240569648 seconds

Process finished with exit code 0

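In fact, asyncio, like an ordinary Python program, is single-threaded: it has only one main thread, yet it can work on many different tasks. A task here is a special kind of future object, and we can think of the tasks as the counterpart of the individual threads in the multithreaded version.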
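To see several tasks making progress on one thread, the following sketch replaces the real downloads with asyncio.sleep so that it runs without aiohttp or network access. The names fake_download and run_demo are hypothetical, and asyncio.create_task and asyncio.run assume Python 3.7 or later.

```python
import asyncio
import time


async def fake_download(name, delay):
    # asyncio.sleep() stands in for a network wait; awaiting it hands
    # control back to the event loop so that other tasks can run.
    await asyncio.sleep(delay)
    return name


async def download_all(jobs):
    # create_task() wraps each coroutine in a Task and schedules it on the running loop.
    tasks = [asyncio.create_task(fake_download(name, delay)) for name, delay in jobs]
    # A Task is a subclass of asyncio.Future.
    assert all(isinstance(task, asyncio.Future) for task in tasks)
    # gather() returns the results in the order the tasks were passed in.
    return await asyncio.gather(*tasks)


def run_demo():
    jobs = [('a', 0.1), ('b', 0.1), ('c', 0.1)]
    start = time.perf_counter()
    results = asyncio.run(download_all(jobs))
    return results, time.perf_counter() - start


if __name__ == '__main__':
    results, elapsed = run_demo()
    print('Finished {} in {:.2f} seconds'.format(results, elapsed))
```

The three simulated waits overlap, so the elapsed time should be close to a single 0.1 s wait rather than the 0.3 s that running them one after another would take.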

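These tasks are controlled by an object called the event loop. In an event loop, each time the main thread has emptied its execution sequence, it checks the event queue for tasks waiting to run; if there are any, it takes them out one at a time and pushes each onto the execution sequence to run. This process repeats over and over.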

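To keep the explanation simple, assume a task has only two states, ready and waiting:
the ready state means the task is currently idle but can run at any moment;
the waiting state means the task has started running but is waiting for an external operation, such as I/O, to complete.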

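In this model, the event loop maintains two task lists, one for each state. It picks one task from the ready list (which one depends on factors such as how long the task has been waiting and the resources it uses) and runs it until the task hands control back to the event loop.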

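When a task hands control back, the event loop puts it on the ready list or the waiting list depending on whether it has completed. The loop then walks the waiting list and checks whether the operation each task is waiting on has finished: if it has, the task moves to the ready list; if not, the task stays on the waiting list. Tasks that were already on the ready list keep their positions, since they have not run yet.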

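Once every task has been placed on the appropriate list, a new round begins: the event loop again picks a task from the ready list and runs it, and so on, round after round, until all tasks are complete.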
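The two-list model described above can be sketched as a toy scheduler. This is a simplified illustration, not how asyncio is implemented: tasks are plain generators, ready tasks are taken in first-in, first-out order, a finished task goes on neither list, and a simulated clock counted in ticks replaces real I/O. The names task and run_loop are hypothetical.

```python
from collections import deque


def task(name, waits, log):
    # A toy task: each yield hands control back to the loop and tells it
    # how many ticks of simulated I/O the task has to wait for.
    for ticks in waits:
        log.append('{} starts waiting {} tick(s)'.format(name, ticks))
        yield ticks
    log.append('{} done'.format(name))


def run_loop(tasks):
    ready = deque(tasks)  # tasks that can run right now
    waiting = []          # (wake_up_tick, task) pairs blocked on simulated I/O
    now = 0               # simulated clock
    while ready or waiting:
        if ready:
            current = ready.popleft()
            try:
                ticks = next(current)  # run until the task yields control
            except StopIteration:
                pass                   # finished: the task is dropped
            else:
                waiting.append((now + ticks, current))
        else:
            # Nothing is ready: advance the clock to the earliest wake-up.
            now = min(wake for wake, _ in waiting)
        # Move every task whose wait is over back to the ready list.
        still_waiting = []
        for wake, t in waiting:
            if wake <= now:
                ready.append(t)
            else:
                still_waiting.append((wake, t))
        waiting = still_waiting
    return now


if __name__ == '__main__':
    log = []
    finished_at = run_loop([task('A', [2], log), task('B', [1], log)])
    print('\n'.join(log))
    print('All tasks finished at tick {}'.format(finished_at))
```

In this run B finishes before A even though A started first, because B's simulated wait is shorter. The whole run ends at tick 2 rather than tick 3, since the two waits overlap.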

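It is worth noting that an asyncio task is not interrupted by outside factors while it is running; it gives up control only when it chooses to. As a result, the kind of resource contention in which several threads use the same resource at the same time does not arise inside asyncio, and thread safety stops being a concern.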
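A small sketch of what this means in practice: several tasks update a shared counter without any lock. The names add_many and main are hypothetical, and asyncio.sleep(0) is used only to create an explicit point where the task hands control back to the event loop.

```python
import asyncio


async def add_many(counter, times):
    for _ in range(times):
        # No await sits between reading and writing the value, so no other
        # task can run in the middle of this update.
        counter['value'] += 1
        # Explicit switch point: other tasks may run here.
        await asyncio.sleep(0)


async def main():
    counter = {'value': 0}
    # Five tasks share one counter and no lock is used.
    await asyncio.gather(*(add_many(counter, 1000) for _ in range(5)))
    return counter['value']


if __name__ == '__main__':
    print(asyncio.run(main()))  # prints 5000
```

The guarantee holds only between switch points. If a task read the value, awaited something, and then wrote the value back, another task could change the counter in between, so shared state should be left consistent before each await.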

