When using Python for system administration, especially when operating on many files and directories at once or controlling many remote hosts, running tasks in parallel can save a great deal of time. If the number of targets is small, you can spawn processes directly with Process from multiprocessing; a dozen or so is manageable, but with hundreds or thousands of targets, manually limiting the number of processes becomes tedious. This is where a process pool shines.
Pool maintains a specified number of worker processes for the caller. When a new request is submitted to the pool, a new process is created to run it if the pool is not yet full; if the pool has already reached its maximum number of processes, the request waits until a worker in the pool finishes and becomes free to run it.
Example 1: Using a process pool
import multiprocessing
import time, os

def func(msg):
    print(f"msg: {msg} processed by {os.getpid()} and parent pid is {os.getppid()}")
    time.sleep(3)
    print("end")

if __name__ == "__main__":
    print(os.getpid())
    pool = multiprocessing.Pool(processes=4)
    for i in range(5):
        msg = "hello %d" % (i)
        pool.apply_async(func, (msg, ))
        # The pool keeps at most `processes` workers busy; when one task
        # finishes, a waiting task is handed to the freed worker
    print("Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~")
    pool.close()
    pool.join()
    # close() must be called before join(), otherwise join() raises an error.
    # After close(), no new tasks can be submitted to the pool;
    # join() waits for all worker processes to finish
    print("Sub-process done.")
Output from one run:
17296
Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~
msg: hello 0 processed by 47052 and parent pid is 17296
msg: hello 1 processed by 15264 and parent pid is 17296
msg: hello 2 processed by 52680 and parent pid is 17296
msg: hello 3 processed by 18600 and parent pid is 17296
end
end
msg: hello 4 processed by 47052 and parent pid is 17296
end
end
end
Sub-process done.
Function reference:
- apply_async(func, args=(), kwds={}, callback=None, error_callback=None) is non-blocking, while apply(func, args=(), kwds={}) is blocking (compare the outputs of Example 1 and Example 2 to see the difference)
- close() closes the pool so that it accepts no new tasks.
- terminate() stops the worker processes immediately, without completing outstanding tasks.
- join() blocks the main process until the worker processes exit; join() must be called after close() or terminate().
Because apply_async is non-blocking, the main process carries on with its own work without waiting for the workers, so after the for loop it immediately prints "Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~", and then waits at pool.join() for all the workers to finish.
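apply_async also accepts callback and error_callback arguments, which run in the main process when a task returns successfully or raises. A minimal sketch, assuming a made-up task function and callbacks (square, on_success, on_error are illustrative names, not from the examples above):

import multiprocessing

def square(x):
    if x < 0:
        raise ValueError("negative input")
    return x * x

def on_success(result):
    # called in the main process with the task's return value
    print("callback got:", result)

def on_error(exc):
    # called in the main process with the exception the worker raised
    print("error_callback got:", exc)

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=2)
    pool.apply_async(square, (3,), callback=on_success, error_callback=on_error)
    pool.apply_async(square, (-1,), callback=on_success, error_callback=on_error)
    pool.close()
    pool.join()

Keep callbacks short: they run in a helper thread of the main process, and a slow callback delays handling of other results.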
Example 2: Using a process pool (blocking)
import multiprocessing
import time, os

def func(msg):
    print(f"msg: {msg} processed by {os.getpid()} and parent pid is {os.getppid()}")
    time.sleep(3)
    print("end")

if __name__ == "__main__":
    print(os.getpid())
    pool = multiprocessing.Pool(processes=4)
    for i in range(5):
        msg = "hello %d" % (i)
        # pool.apply_async(func, (msg, ))
        pool.apply(func, (msg, ))
        # apply blocks until the submitted task has finished
    print("Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~")
    pool.close()
    pool.join()
    # close() must be called before join(), otherwise join() raises an error.
    # After close(), no new tasks can be submitted to the pool;
    # join() waits for all worker processes to finish
    print("Sub-process done.")
Output from one run:
3552
msg: hello 0 processed by 40088 and parent pid is 3552
end
msg: hello 1 processed by 44100 and parent pid is 3552
end
msg: hello 2 processed by 13824 and parent pid is 3552
end
msg: hello 3 processed by 24148 and parent pid is 3552
end
msg: hello 4 processed by 40088 and parent pid is 3552
end
Mark~ Mark~ Mark~~~~~~~~~~~~~~~~~~~~~~
Sub-process done.
The effect of blocking: this code now runs serially.
Example 3: Using a process pool (and collecting results)
import multiprocessing
import time, os

def func(msg):
    print(f"msg: {msg} processed by {os.getpid()} and parent pid is {os.getppid()}")
    time.sleep(3)
    print("end")
    return "done " + msg

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    result = []
    for i in range(5):
        msg = "hello %d" % (i)
        result.append(pool.apply_async(func, (msg, )))
    pool.close()
    pool.join()
    for res in result:
        print(":::", res.get())
    print("Sub-process(es) done.")
Output from one run:
msg: hello 0 processed by 50408 and parent pid is 40748
msg: hello 1 processed by 6236 and parent pid is 40748
msg: hello 2 processed by 52408 and parent pid is 40748
msg: hello 3 processed by 52316 and parent pid is 40748
end
end
msg: hello 4 processed by 50408 and parent pid is 40748
end
end
end
::: done hello 0
::: done hello 1
::: done hello 2
::: done hello 3
::: done hello 4
Sub-process(es) done.
Call get() on each AsyncResult to retrieve the worker's return value.
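get() also accepts an optional timeout and re-raises any exception the worker raised. A small sketch of the timeout behavior, assuming a made-up slow_square function:

import multiprocessing

def slow_square(x):
    import time
    time.sleep(2)
    return x * x

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=2)
    res = pool.apply_async(slow_square, (3,))
    try:
        # the task needs ~2 s, so a 1 s timeout raises TimeoutError
        print(res.get(timeout=1))
    except multiprocessing.TimeoutError:
        print("not ready yet")
    print(res.get())  # blocks until done, then prints 9
    pool.close()
    pool.join()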
Example 4: Running several different tasks in one pool
import multiprocessing
import os, time, random

def Lee():
    print("\nRun task Lee-%s" % (os.getpid()))  # os.getpid() returns the current process ID
    start = time.time()
    time.sleep(random.random() * 10)  # random.random() returns a float in [0, 1)
    end = time.time()
    print('Task Lee, runs %0.2f seconds.' % (end - start))

def Marlon():
    print("\nRun task Marlon-%s" % (os.getpid()))
    start = time.time()
    time.sleep(random.random() * 40)
    end = time.time()
    print('Task Marlon runs %0.2f seconds.' % (end - start))

def Allen():
    print("\nRun task Allen-%s" % (os.getpid()))
    start = time.time()
    time.sleep(random.random() * 30)
    end = time.time()
    print('Task Allen runs %0.2f seconds.' % (end - start))

def Frank():
    print("\nRun task Frank-%s" % (os.getpid()))
    start = time.time()
    time.sleep(random.random() * 20)
    end = time.time()
    print('Task Frank runs %0.2f seconds.' % (end - start))

if __name__ == '__main__':
    function_list = [Lee, Marlon, Allen, Frank]
    print("parent process %s" % (os.getpid()))
    pool = multiprocessing.Pool(4)
    for func in function_list:
        pool.apply_async(func)  # submit each task asynchronously;
        # a free worker in the pool picks it up
    print('Waiting for all subprocesses done...')
    pool.close()
    pool.join()  # close() must be called before join(), otherwise join() raises an error;
    # after close() no new tasks can be submitted, and join() waits for all workers to finish
    print('All subprocesses done.')
Output from one run:
parent process 29224
Waiting for all subprocesses done...
Run task Lee-33772
Run task Marlon-14784
Run task Allen-24860
Run task Frank-29684
Task Lee, runs 2.53 seconds.
Task Allen runs 6.69 seconds.
Task Marlon runs 9.31 seconds.
Task Frank runs 15.88 seconds.
All subprocesses done.
Example 5: multiprocessing.Pool.map
import multiprocessing
import os

def m1(x):
    print('%s is running and parent is %s' % (os.getpid(), os.getppid()))
    print(x * x)

if __name__ == '__main__':
    print(os.getpid())
    # pool = multiprocessing.Pool(multiprocessing.cpu_count())
    pool = multiprocessing.Pool(4)
    print(multiprocessing.cpu_count())
    i_list = range(8)
    pool.map(m1, i_list)
Output from one run:
26040
4
23768 is running and parent is 26040
0
20896 is running and parent is 26040
1
23768 is running and parent is 26040
4
20896 is running and parent is 26040
9
20896 is running and parent is 26040
25
23768 is running and parent is 26040
16
30380 is running and parent is 26040
49
20896 is running and parent is 26040
36
A question about this: http://bbs.chinaunix.net/thread-4111379-1-1.html
import multiprocessing

class someClass(object):
    def __init__(self):
        pass

    def f(self, x):
        return x * x

    def go(self):
        pool = multiprocessing.Pool(processes=4)
        result = pool.apply_async(self.f, [10])
        print(result.get())
        print(pool.map(self.f, range(10)))

if __name__ == '__main__':  # without this guard the script raises an error
    s = someClass()
    s.go()
import multiprocessing
import logging

def create_logger(i):
    print(i)

class CreateLogger(object):
    def __init__(self, func):
        self.func = func

if __name__ == '__main__':
    ilist = range(10)
    cl = CreateLogger(create_logger)
    pool = multiprocessing.Pool(multiprocessing.cpu_count())
    pool.map(cl.func, ilist)
    print("hello------------>")
Example 6: Another way to use a process pool
The concurrent.futures module
import time
from concurrent.futures import ProcessPoolExecutor

def func(name):
    print(f"{name} started")
    time.sleep(0.5)
    print(f"{name} finished")

if __name__ == '__main__':
    p = ProcessPoolExecutor(max_workers=3)  # create a process pool
    for i in range(1, 10):
        p.submit(func, f"process {i}")  # submit a task to the pool
    p.shutdown()  # the main process waits for the workers to finish
    print("main process done")
Example 7: Using a thread pool
import time
from concurrent.futures import ThreadPoolExecutor

def func(name):
    print(f"{name} started")
    time.sleep(0.5)
    print(f"{name} finished")

if __name__ == '__main__':
    p = ThreadPoolExecutor(max_workers=3)  # create a thread pool with at most 3 threads working at once
    for i in range(1, 10):
        p.submit(func, f"thread {i}")
    p.shutdown()  # the main thread waits for the worker threads to finish
    print("main thread done")
In concurrent.futures, thread pools and process pools are driven through the same API. The next example also shows how to retrieve return values, by calling the result method.
import os
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import threading
import random

def f(n):
    time.sleep(random.randint(1, 3))
    # print(n)
    # print("process (%s): square of %s is %s" % (os.getpid(), n, n * n))
    print("thread (%s): square of %s is %s" % (threading.current_thread().getName(), n, n * n))
    return n * n

if __name__ == '__main__':
    pool = ThreadPoolExecutor(max_workers=5)
    # pool = ProcessPoolExecutor(max_workers=5)
    ret_list = []
    for i in range(10):
        ret = pool.submit(f, i)  # submit asynchronously; f is the callable, i its argument
        # print(ret.result())  # would block here, like a join
        ret_list.append(ret)
    # pool.shutdown()  # stop accepting new tasks and wait for all of them; optional here
    for i in ret_list:
        print(i.result())
thread (ThreadPoolExecutor-0_2): square of 2 is 4
thread (ThreadPoolExecutor-0_0): square of 0 is 0
thread (ThreadPoolExecutor-0_3): square of 3 is 9
0
thread (ThreadPoolExecutor-0_1): square of 1 is 1
1
4
9
thread (ThreadPoolExecutor-0_3): square of 7 is 49thread (ThreadPoolExecutor-0_4): square of 4 is 16
thread (ThreadPoolExecutor-0_2): square of 5 is 25
16
25
thread (ThreadPoolExecutor-0_0): square of 6 is 36
36
49
thread (ThreadPoolExecutor-0_4): square of 9 is 81
thread (ThreadPoolExecutor-0_1): square of 8 is 64
64
81
Output when the shutdown method is used:
thread (ThreadPoolExecutor-0_0): square of 0 is 0
thread (ThreadPoolExecutor-0_1): square of 1 is 1
thread (ThreadPoolExecutor-0_4): square of 4 is 16
thread (ThreadPoolExecutor-0_3): square of 3 is 9thread (ThreadPoolExecutor-0_0): square of 5 is 25
thread (ThreadPoolExecutor-0_2): square of 2 is 4
thread (ThreadPoolExecutor-0_1): square of 6 is 36
thread (ThreadPoolExecutor-0_2): square of 9 is 81thread (ThreadPoolExecutor-0_0): square of 8 is 64
thread (ThreadPoolExecutor-0_4): square of 7 is 49
0
1
4
9
16
25
36
49
64
81
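Besides submit, executors also provide a map method, which returns results in submission order regardless of which task finishes first. A small sketch, with a made-up square function for illustration:

import time
from concurrent.futures import ThreadPoolExecutor

def square(n):
    time.sleep(0.1)
    return n * n

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=5) as pool:
        # the with block calls shutdown() on exit;
        # map yields results in the order the inputs were given
        print(list(pool.map(square, range(10))))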
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import random

def f(n):
    time.sleep(random.randint(1, 3))
    return n * n

def call_back(m):
    # m is the finished Future; call m.result() to get the return value
    print(m)
    print(m.result())

if __name__ == '__main__':
    pool = ThreadPoolExecutor(max_workers=5)
    pool.submit(f, 2).add_done_callback(call_back)
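When you want to process results as soon as each task completes rather than in submission order, concurrent.futures.as_completed yields futures in completion order. A minimal sketch, again with an illustrative square function:

import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    time.sleep(random.random())
    return n * n

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=5) as pool:
        # map each future back to its input so we know which task finished
        futures = {pool.submit(square, n): n for n in range(5)}
        for fut in as_completed(futures):  # yields each future as it finishes
            print(futures[fut], "->", fut.result())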