Passing State Information Between Processes
Similarly, the Event class can be used to pass state information between processes. An event can be toggled between set and unset states. Callers can also wait() with an optional timeout, after which wait() returns whether or not the event has been set.
import multiprocessing
import time


def wait_for_event(e):
    print("wait for event:starting")
    e.wait()
    print("wait for event:e_is_set()->", e.is_set())


def wait_for_event_timeout(e, t):
    print("wait_for_event_timeout:starting")
    e.wait(t)
    print("wait_for_event_timeout:e.is_set()->", e.is_set())


if __name__ == '__main__':
    e = multiprocessing.Event()
    w1 = multiprocessing.Process(
        name="block",
        target=wait_for_event,
        args=(e, )
    )
    w1.start()
    w2 = multiprocessing.Process(
        name="nonblock",
        target=wait_for_event_timeout,
        args=(e, 2)
    )
    w2.start()
    print("main:waiting before calling Event.set()")
    time.sleep(3)
    e.set()
    print("main:event is set")
As the output shows, wait(t) returns when the timeout expires even though the event has not been set, while calling e.set() wakes the waiter that blocked without a timeout.
main:waiting before calling Event.set()
wait for event:starting
wait_for_event_timeout:starting
wait_for_event_timeout:e.is_set()-> False
main:event is set
wait for event:e_is_set()-> True
Controlling Access to Resources
When a single resource needs to be shared among multiple processes, a Lock can be used to avoid conflicting accesses.
The API is as follows:
lock = multiprocessing.Lock()  # create a lock object
lock.acquire()  # acquire the lock
lock.release()  # release the lock
with lock:  # acquire the lock, run the body, then release automatically; do not nest
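As a quick illustration of this API, the following sketch (the worker function and the job count are illustrative assumptions, not from the original) uses the lock to serialize access to standard output:

```python
import multiprocessing


def worker(lock, num):
    # Hold the lock while printing so output lines do not interleave.
    with lock:
        print("worker {} has the lock".format(num))


if __name__ == '__main__':
    lock = multiprocessing.Lock()
    jobs = [
        multiprocessing.Process(target=worker, args=(lock, i))
        for i in range(3)
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
```

The with form is usually preferable to explicit acquire()/release() because the lock is released even if the body raises an exception.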
Locks are error-prone and relatively inefficient, however, so it is generally better to avoid shared data altogether and use message passing with queues (Queue) instead.
Synchronizing Operations
A Condition object can be used to synchronize the parts of a workflow so that some run in parallel and others run sequentially, even when they are in separate processes.
A simple example:
import multiprocessing
import time


def stage_1(cond):
    name = multiprocessing.current_process().name
    print("starting", name)
    with cond:
        print("{} done and ready for stage 2".format(name))
        # Wake up the waiting processes
        cond.notify_all()


def stage_2(cond):
    name = multiprocessing.current_process().name
    print("starting", name)
    with cond:
        cond.wait()
        print("{} running".format(name))


if __name__ == '__main__':
    condition = multiprocessing.Condition()
    s1 = multiprocessing.Process(
        name="s1",
        target=stage_1,
        args=(condition, ),
    )
    s2_client = [
        multiprocessing.Process(
            name="stage_2[{}]".format(i),
            target=stage_2,
            args=(condition, ),
        ) for i in range(1, 3)]
    for c in s2_client:
        c.start()
    time.sleep(1)
    s1.start()

    s1.join()
    for c in s2_client:
        c.join()
Output (the exact ordering varies slightly from machine to machine):
starting stage_2[1]
starting stage_2[2]
starting s1
s1 done and ready for stage 2
stage_2[1] running
stage_2[2] running
Controlling Concurrent Access to Resources
Sometimes it is useful to allow several processes to access a resource at the same time while still limiting the total number, the way a network application might support a fixed number of concurrent downloads. A Semaphore is used to manage these connections; here the argument 3 is the maximum number of processes allowed to hold the semaphore at once.
s = multiprocessing.Semaphore(3)
jobs = [
    multiprocessing.Process(
        target=worker,
        name=str(i),
        args=(s, ),
    )
    for i in range(10)]
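The fragment above leaves worker undefined; a minimal runnable sketch (the sleep that simulates holding the resource is an assumption) could look like this:

```python
import multiprocessing
import time


def worker(s):
    # Block until one of the three semaphore slots is free.
    with s:
        name = multiprocessing.current_process().name
        print("{}: acquired".format(name))
        time.sleep(0.1)  # simulate using the limited resource
    print("{}: released".format(name))


if __name__ == '__main__':
    s = multiprocessing.Semaphore(3)
    jobs = [
        multiprocessing.Process(target=worker, name=str(i), args=(s, ))
        for i in range(10)
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
```

At any moment at most three of the ten workers are inside the with block; the rest wait in s until a slot is released.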
Managing Shared State
In addition to dictionaries, the manager returned by multiprocessing.Manager() also supports lists.
import multiprocessing


def worker(d, key, value):
    d[key] = value


if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    d = mgr.dict()
    jobs = [
        multiprocessing.Process(
            target=worker,
            args=(d, i, i*2),
        ) for i in range(10)
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    print("D->", d)
Because the dictionary is created through the manager, it is shared by all of the processes and updates made in any process are visible everywhere. The output:
D-> {1: 2, 3: 6, 0: 0, 5: 10, 8: 16, 2: 4, 7: 14, 6: 12, 4: 8, 9: 18}
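Since lists are supported as well, here is a minimal sketch of the same pattern with mgr.list() (the worker and the job count are illustrative assumptions):

```python
import multiprocessing


def worker(shared, value):
    # append() is forwarded to the manager process,
    # so every process sees the update.
    shared.append(value)


if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    shared = mgr.list()
    jobs = [
        multiprocessing.Process(target=worker, args=(shared, i))
        for i in range(5)
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    print("L->", sorted(shared))  # append order is nondeterministic
```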
Shared Namespaces
A manager can also create a shared Namespace, whose attributes are visible to every process holding a reference to it:
namespace = mgr.Namespace()
A simple example:
import multiprocessing


def producer(ns, event):
    ns.value = "this is a value"
    event.set()


def consumer(ns, event):
    try:
        print("Before event:{}".format(ns.value))
    except Exception as err:
        print("Before event error:", str(err))
    event.wait()
    print("After event:", ns.value)


if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    namespace = mgr.Namespace()
    event = multiprocessing.Event()
    p = multiprocessing.Process(
        target=producer,
        args=(namespace, event)
    )
    c = multiprocessing.Process(
        target=consumer,
        args=(namespace, event),
    )
    c.start()
    p.start()
    c.join()
    p.join()
Output:
Before event error: 'Namespace' object has no attribute 'value'
After event: this is a value
It is important to know that updates to the contents of mutable values held in the namespace are not propagated automatically; to publish a change, the modified object must be assigned back to the namespace attribute.
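A sketch of this pitfall, using a hypothetical my_list attribute: reading the attribute returns a copy, so in-place mutation is silently lost, while reassignment goes back through the manager:

```python
import multiprocessing


def producer(ns, event):
    # ns.my_list returns a copy, so this append is silently lost.
    ns.my_list.append("hidden")
    # Reassigning the attribute sends the new value to the manager.
    ns.my_list = ns.my_list + ["visible"]
    event.set()


def consumer(ns, event):
    event.wait()
    print("After event:", ns.my_list)  # only "visible" survives


if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    namespace = mgr.Namespace()
    namespace.my_list = []
    event = multiprocessing.Event()
    p = multiprocessing.Process(target=producer, args=(namespace, event))
    c = multiprocessing.Process(target=consumer, args=(namespace, event))
    c.start()
    p.start()
    c.join()
    p.join()
```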
Process Pools
The Pool class manages a fixed number of worker processes and is convenient for map-style parallel work.
import multiprocessing


def do_calculation(data):
    return data * 2


def start_process():
    print("starting", multiprocessing.current_process().name)


if __name__ == '__main__':
    inputs = list(range(10))
    print("Input :", inputs)
    builtin_outputs = list(map(do_calculation, inputs))
    print("Built-in:", builtin_outputs)
    pool_size = multiprocessing.cpu_count()*2
    pool = multiprocessing.Pool(
        processes=pool_size,
        initializer=start_process,
    )
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()
    pool.join()
    print("Pool:", pool_outputs)
close() prevents any more tasks from being submitted, and join() waits for the worker processes to exit, keeping the pool synchronized with the main process. Output:
Input : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Built-in: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
starting SpawnPoolWorker-1
starting SpawnPoolWorker-2
starting SpawnPoolWorker-4
starting SpawnPoolWorker-5
starting SpawnPoolWorker-6
starting SpawnPoolWorker-3
starting SpawnPoolWorker-8
starting SpawnPoolWorker-7
Pool: [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
The Pool class also accepts a maxtasksperchild parameter, which tells the pool to restart a worker process after it has completed a set number of tasks. This keeps long-running workers from consuming ever more system resources.
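A minimal sketch of that option (the pool size and maxtasksperchild value here are arbitrary illustrative choices):

```python
import multiprocessing


def do_calculation(data):
    return data * 2


def start_process():
    print("starting", multiprocessing.current_process().name)


if __name__ == '__main__':
    inputs = list(range(10))
    pool = multiprocessing.Pool(
        processes=2,
        initializer=start_process,
        maxtasksperchild=2,  # replace each worker after two batches of tasks
    )
    outputs = pool.map(do_calculation, inputs)
    pool.close()
    pool.join()
    print("Pool:", outputs)
```

Because the workers are periodically replaced, more "starting" lines appear in the output than with the default configuration, even though the results are the same.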