Python multi-threaded processing

In this article, you will learn:

  • The difference between multi-threading and multiprocessing, and when to use each
  • How to implement multiprocessing in Python using multiprocessing and concurrent.futures

(What is MultiProcessing?)

  • Multiprocessing allows you to spawn multiple processes within a program.
  • It allows you to leverage multiple CPU cores on your machine.
  • Multiple processes within a program do not share memory; each process has its own memory space.
  • It sidesteps Python's GIL (Global Interpreter Lock), which allows only one thread at a time to hold control of the Python interpreter.
  • It is used for computation-heavy, CPU-intensive programs.
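As a minimal sketch of the points above, the following spreads a CPU-bound toy task over a pool of worker processes. The function name sum_of_squares and the input sizes are illustrative assumptions, not part of the original article.

```python
import multiprocessing


def sum_of_squares(n):
    """CPU-bound toy task (hypothetical): sum of i*i for i in range(n)."""
    return sum(i * i for i in range(n))


def main():
    inputs = [10, 100, 1000]
    # Pool() starts one worker process per CPU core by default.
    with multiprocessing.Pool() as pool:
        # map() sends each input to a worker and collects results in order.
        results = pool.map(sum_of_squares, inputs)
    print(results)


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by re-importing the main module, it keeps the children from re-running main().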

So what is multi-threading, and when should you use it?

Key points about threads:

  • A thread is the smallest set of independent commands executed in a program.
  • Multiple threads within an application can run concurrently, which is referred to as multi-threading. In CPython, the GIL still lets only one thread execute Python bytecode at any given moment.
  • A thread always runs within a program and cannot run on its own.
  • Threads are used when programs are network bound or perform heavy I/O.
  • Memory is shared between the threads within a process, so threads consume fewer resources than separate processes.
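To illustrate why threads suit I/O-bound work, here is a small sketch in which time.sleep stands in for a network call. The names fake_download and run_in_threads are hypothetical helpers chosen for this example only.

```python
import threading
import time


def fake_download(name, delay, results):
    """Stand-in for a network call; like real blocking I/O, sleep releases the GIL."""
    time.sleep(delay)
    results[name] = f"{name} done"


def run_in_threads(names, delay=0.3):
    """Start one thread per name and return (results, elapsed seconds)."""
    results = {}
    threads = [
        threading.Thread(target=fake_download, args=(name, delay, results))
        for name in names
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = run_in_threads(["a", "b", "c", "d"])
    print(results, f"{elapsed:.2f}s")
```

Because the waits overlap, the total time should be close to a single delay rather than the sum of all four.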

The code below demonstrates that multiprocessing does not share memory between processes, whereas multi-threading does share memory between threads.

In each example, we check whether each number in a list is prime. We will do this first using multi-threading and then using multiprocessing.

(Shared Global variable using Multi-threading)

We create a global list variable, prime_list, to store all the prime numbers.

We then iterate through list_of_num and check whether each number is prime.

import threading
import time

prime_list = []  # global list, shared by every thread in this process


def check_if_prime(num):
    # numbers less than or equal to 1 are not prime
    if num <= 1:
        return False
    # check for factors
    for i in range(2, num):
        if num % i == 0:
            return False
    return True


def get_prime_numbers(numbers):
    for num in numbers:
        time.sleep(1)
        if check_if_prime(num):
            prime_list.append(num)
    return prime_list


def main():
    start_time = time.perf_counter()
    list_of_num = [9, 13, 12312, 121, 1913, 97, 57, 34, 37, 89, 81, 87, 23, 27]

    t1 = threading.Thread(target=get_prime_numbers, args=[list_of_num])
    t1.start()
    t1.join()
    end_time = time.perf_counter()

    print(f"Total time of Thread execution {round(end_time - start_time, 4)} for the function")
    print(f"Prime numbers {prime_list} from {list_of_num}")


if __name__ == '__main__':
    main()

Because threads share the process's memory, including the global variable, the values appended by the worker thread persist, and we can see them in the output of the program.
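Shared memory also means that several threads can touch the same object at once. The sketch below splits the work across more than one thread and guards the shared list with a lock. The names prime_lock, collect_primes and find_primes_with_threads are assumptions introduced for this illustration; the lock is a defensive choice for shared state in general rather than something the original article uses.

```python
import threading

prime_list = []                # shared by all threads in this process
prime_lock = threading.Lock()  # guards updates to prime_list


def check_if_prime(num):
    if num <= 1:
        return False
    return all(num % i for i in range(2, num))


def collect_primes(numbers):
    """Thread target: append the primes found in numbers to the shared list."""
    for num in numbers:
        if check_if_prime(num):
            with prime_lock:
                prime_list.append(num)


def find_primes_with_threads(numbers, n_threads=2):
    prime_list.clear()
    # Give each thread every n-th number so the work is split evenly.
    chunks = [numbers[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=collect_primes, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(prime_list)


if __name__ == "__main__":
    print(find_primes_with_threads([9, 13, 97, 121, 37, 89]))
```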

(Shared Global variable using Multiprocessing)

This is the same functionality as above, but using the multiprocessing library instead of threading.

For multiprocessing, create an instance of Process, passing the function to execute as target and its arguments as args. This is the same as in multi-threading.

Use process.start() to start the process.

The main process waits for the sub-process to complete using join().

import multiprocessing
import time

prime_list = []  # global list; each process gets its own copy


def check_if_prime(num):
    # numbers less than or equal to 1 are not prime
    if num <= 1:
        return False
    # check for factors
    for i in range(2, num):
        if num % i == 0:
            return False
    return True


def get_prime_numbers(numbers):
    for num in numbers:
        time.sleep(1)
        if check_if_prime(num):
            prime_list.append(num)
    return prime_list


def main():
    start_time = time.perf_counter()
    list_of_num = [9, 13, 12312, 121, 1913, 97, 57, 34, 37, 89, 81, 87, 23, 27]
    print("No. of CPUs", multiprocessing.cpu_count())
    p1 = multiprocessing.Process(target=get_prime_numbers, args=[list_of_num])
    p1.start()
    p1.join()
    end_time = time.perf_counter()

    print(f"Total time of Process execution {round(end_time - start_time, 4)} for the function")
    print(f"Prime numbers {prime_list} from {list_of_num}")


if __name__ == '__main__':
    main()

Because each process has its own copy of the global variable, the child process appends to its own prime_list, and the parent's prime_list is still empty when it is printed.
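If the parent does need the child's results, they have to be sent back explicitly. One common option is a multiprocessing.Queue, sketched below. The helper name collect_primes and the short input list are assumptions made for this example, not part of the original code.

```python
import multiprocessing


def check_if_prime(num):
    if num <= 1:
        return False
    return all(num % i for i in range(2, num))


def collect_primes(numbers, result_queue):
    """Runs in the child process and sends its result back to the parent."""
    result_queue.put([n for n in numbers if check_if_prime(n)])


def main():
    result_queue = multiprocessing.Queue()
    p1 = multiprocessing.Process(
        target=collect_primes, args=([9, 13, 97, 121], result_queue)
    )
    p1.start()
    # Read the result before join() so the child never blocks on a full pipe.
    primes = result_queue.get()
    p1.join()
    print(f"Prime numbers received from the child process: {primes}")


if __name__ == "__main__":
    main()
```

The list is pickled in the child and rebuilt in the parent, so the two processes still do not share memory; they only exchange a copy of the data.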

You can also open the Task Manager and, under the Details tab, see multiple python.exe processes running.

Figure: Task Manager showing multiple python processes

(ProcessPoolExecutor)

ProcessPoolExecutor is an easy way to spawn and manage multiple processes using concurrent.futures.

concurrent.futures has an abstract class, Executor, with two concrete subclasses:

  • ThreadPoolExecutor: for multi-threading
  • ProcessPoolExecutor: for multiprocessing

Below, we use ProcessPoolExecutor to read CSV files and then print the number of records in each file.

import concurrent.futures
import time

import pandas as pd


def read_data(file):
    t1 = time.perf_counter()
    data = pd.read_csv(file)
    #data = data.sort_index(ascending=False)
    time.sleep(1)
    t2 = time.perf_counter()
    print(f"Took {round(t2 - t1, 4)} (sec) time to read data in {file}")
    return f"No. of Records in file {file} are {len(data)}"


def main():
    start_time = time.perf_counter()

    # ProcessPoolExecutor runs each read_data call in a separate process
    with concurrent.futures.ProcessPoolExecutor() as executor:
        file_list = ['pollution.csv', 'good_inpu.csv', 'iris.csv']
        return_results = [executor.submit(read_data, file) for file in file_list]

        # print each result as soon as its process finishes
        for f in concurrent.futures.as_completed(return_results):
            print(f.result())

    end_time = time.perf_counter()
    print(f"Total time of execution {round(end_time - start_time, 4)} for the function")


if __name__ == '__main__':
    main()
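When the results should come back in input order rather than completion order, Executor.map is an alternative to submit and as_completed. The following is a sketch on a CPU-bound toy task; count_primes_below and the chosen limits are hypothetical and only serve the illustration.

```python
import concurrent.futures


def count_primes_below(limit):
    """CPU-bound toy task (hypothetical): count the primes smaller than limit."""
    count = 0
    for n in range(2, limit):
        if all(n % i for i in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def main():
    limits = [1_000, 2_000, 3_000]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # map() yields results in the same order as the inputs.
        for limit, count in zip(limits, executor.map(count_primes_below, limits)):
            print(f"{count} primes below {limit}")


if __name__ == "__main__":
    main()
```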

(Conclusion:)

For I/O-bound or network-intensive programs, use multi-threading through the threading module or ThreadPoolExecutor. For computation-heavy or CPU-intensive programs in Python, sidestep the GIL limitation by using multiprocessing, through either the multiprocessing module or ProcessPoolExecutor.
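The rule of thumb above can be written down as a small helper. This is only a sketch: the function executor_class_for and its "io"/"cpu" labels are made up for illustration and are not part of concurrent.futures.

```python
import concurrent.futures


def executor_class_for(workload):
    """Return the executor class suggested for a workload type.

    workload is a hypothetical label: "io" for I/O or network-bound work,
    "cpu" for computation-heavy work.
    """
    if workload == "io":
        return concurrent.futures.ThreadPoolExecutor
    if workload == "cpu":
        return concurrent.futures.ProcessPoolExecutor
    raise ValueError(f"unknown workload type: {workload!r}")


if __name__ == "__main__":
    # Both classes share the Executor interface, so calling code stays the same.
    with executor_class_for("io")() as executor:
        print(list(executor.map(len, ["pollution.csv", "iris.csv"])))
```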

Translated from: https://levelup.gitconnected.com/multi-threading-and-multiprocessing-in-python-3d5662f4a528
