Python: Sampling by Probability

Reposted

blueice 2024-01-17 06:03:25


1. Probability list + sample list

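Task: we often have a probability list and a sample list, where each probability gives the chance that the corresponding sample is selected, and the probabilities sum to 1. For example, [0.7, 0.2, 0.1] and ['Iron Man', 'Captain America', 'Thor'] correspond element by element: 'Iron Man' is selected with probability 0.7, 'Captain America' with probability 0.2, and 'Thor' with probability 0.1. The goal is to sample from ['Iron Man', 'Captain America', 'Thor'] according to the discrete distribution [0.7, 0.2, 0.1], drawing a single sample (drawing several is also possible).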
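To make the mechanism concrete, here is a minimal sketch of inverse transform sampling using only the standard library: lay the probabilities end to end as intervals, pick a uniform point, and return the item whose interval contains it. The helper name sample_by_probability is hypothetical and not part of any library. CPython's random.choices uses a broadly similar running-total-plus-bisect approach, but this is an illustration, not its source.

```python
import bisect
import itertools
import random


def sample_by_probability(items, probabilities, rng=random):
    """Hypothetical helper: return items[i] with probability probabilities[i]."""
    # Running totals, e.g. [0.7, 0.2, 0.1] -> roughly [0.7, 0.9, 1.0]
    cumulative = list(itertools.accumulate(probabilities))
    # Uniform point in [0, total); scaling by the total means the
    # weights do not strictly have to sum to 1
    u = rng.random() * cumulative[-1]
    # First running total strictly greater than u; capping the search at
    # the last index guards against floating-point rounding at the top end
    i = bisect.bisect_right(cumulative, u, 0, len(cumulative) - 1)
    return items[i]


print(sample_by_probability(['Iron Man', 'Captain America', 'Thor'], [0.7, 0.2, 0.1]))
```

An item with probability 0 gets an empty interval, so it can never be returned.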

In Python this task can be done quite simply with random.choices from the standard library; see the code below.

Code

import random

# input: probability distribution and the corresponding samples
list_probability = [0.005, 0.015, 0.08, 0.25, 0.3, 0.25, 0.08, 0.015, 0.005]
list_player_role = ['Black Widow', 'Spider-Man', 'Hulk', 'Thor', 'Iron Man', 'Doctor Strange', 'Captain America', 'Black Panther', 'Hawkeye']
# sampling: random.choices returns a list of k samples, so take element [0]
result = random.choices(list_player_role, weights=list_probability, k=1)[0]
# output: one sample drawn according to the probability distribution
print(result)

# check whether the sampling follows the probability distribution
frequency = [0] * len(list_player_role)
trying_times = 100000
for _ in range(trying_times):
    result = random.choices(list_player_role, weights=list_probability, k=1)[0]
    # list.index maps the sampled role back to its position in the list
    frequency[list_player_role.index(result)] += 1
for i in range(len(frequency)):
    print('Role: %s\tProbability: %.3f\tFrequency: %d/%d=%.4f' % (list_player_role[i], list_probability[i], frequency[i], trying_times, frequency[i] / trying_times))

Output

Iron Man
Role: Black Widow   Probability: 0.005   Frequency: 489/100000=0.0049
Role: Spider-Man   Probability: 0.015   Frequency: 1558/100000=0.0156
Role: Hulk   Probability: 0.080   Frequency: 8011/100000=0.0801
Role: Thor   Probability: 0.250   Frequency: 25094/100000=0.2509
Role: Iron Man   Probability: 0.300   Frequency: 29957/100000=0.2996
Role: Doctor Strange   Probability: 0.250   Frequency: 24958/100000=0.2496
Role: Captain America   Probability: 0.080   Frequency: 7867/100000=0.0787
Role: Black Panther   Probability: 0.015   Frequency: 1551/100000=0.0155
Role: Hawkeye   Probability: 0.005   Frequency: 515/100000=0.0052

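Over 100,000 trials, every observed frequency is close to its corresponding probability, which indicates that the sampling does follow the specified distribution.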
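The same check can be written more compactly: random.choices draws all samples in one call when k is set to the number of trials, and collections.Counter tallies them. A minimal sketch, using English role names as the sample list:

```python
import random
from collections import Counter

list_probability = [0.005, 0.015, 0.08, 0.25, 0.3, 0.25, 0.08, 0.015, 0.005]
list_player_role = ['Black Widow', 'Spider-Man', 'Hulk', 'Thor', 'Iron Man',
                    'Doctor Strange', 'Captain America', 'Black Panther', 'Hawkeye']
trying_times = 100000

# One call draws all samples at once (sampling with replacement)
samples = random.choices(list_player_role, weights=list_probability, k=trying_times)
counts = Counter(samples)

for role, p in zip(list_player_role, list_probability):
    # Counter returns 0 for a role that was never drawn
    freq = counts[role] / trying_times
    print(f'Role: {role}\tProbability: {p:.3f}\tFrequency: {counts[role]}/{trying_times}={freq:.4f}')
```

Besides being shorter, this avoids calling random.choices once per trial.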

2. Probability list only

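Task: no sample list is given, only a probability list, and sampling should return an index into that list. For example, with input [0.7, 0.2, 0.1], an output of 1 means the probability 0.2 was drawn; an output of 2 means 0.1 was drawn; and an output of 0 means 0.7 was drawn.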

Code

import random

# input: discrete probability distribution (no sample list needed)
list_probability = [0.005, 0.015, 0.08, 0.25, 0.3, 0.25, 0.08, 0.015, 0.005]

# sampling
index = list(range(len(list_probability)))
probability_index = random.choices(index, weights=list_probability, k=1)[0]

# output: sampling one by probability distribution
print(probability_index)

Output

5
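When many indices are drawn from the same distribution, random.choices also accepts a precomputed cum_weights list in place of weights (passing both raises a TypeError). Since choices converts weights to cumulative weights internally, supplying them up front avoids redoing that work on every call. A minimal sketch:

```python
import itertools
import random

list_probability = [0.005, 0.015, 0.08, 0.25, 0.3, 0.25, 0.08, 0.015, 0.005]
# range is a sequence, so it can serve directly as the population of indices
indices = range(len(list_probability))

# Precompute the running totals once
cum_weights = list(itertools.accumulate(list_probability))

# Draw 10 indices; index i appears with probability list_probability[i]
sampled_indices = random.choices(indices, cum_weights=cum_weights, k=10)
print(sampled_indices)
```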

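The sampling above was tested only on plain Python lists. Libraries such as NumPy and PyTorch should offer equivalent methods, and an online search confirms that they do; for PyTorch, see sampling from a tensor in torch.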
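For the NumPy side, a minimal sketch assuming NumPy 1.17 or later (for np.random.default_rng): Generator.choice samples from range(len(p)) when its first argument is an integer. Unlike random.choices, it requires p to sum to 1 and raises a ValueError otherwise; it also supports replace=False for drawing distinct indices.

```python
import numpy as np

p = np.array([0.005, 0.015, 0.08, 0.25, 0.3, 0.25, 0.08, 0.015, 0.005])
rng = np.random.default_rng()

# One index, chosen with probability p[i]
index = rng.choice(len(p), p=p)
print(index)

# Five distinct indices (no index repeats), still weighted by p
distinct = rng.choice(len(p), size=5, replace=False, p=p)
print(distinct)
```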

3. PyTorch implementation

Code

import torch

# input: discrete probability distribution
p = torch.tensor([0.005, 0.015, 0.08, 0.25, 0.3, 0.25, 0.08, 0.015, 0.005])

# sampling: sample one by given probability distribution
index = p.multinomial(num_samples=1, replacement=False)     # replacement=False means sampling without replacement (no effect when num_samples=1)

# output: the sampled index as a Python int
print(index.item())

Output
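The PyTorch code above draws a single index. A sketch of the same frequency check in PyTorch, assuming torch is installed: drawing many indices in one call requires replacement=True, because without replacement torch.multinomial cannot draw more samples than there are categories with nonzero probability. torch.bincount then counts how often each index occurred. The input to multinomial does not have to sum to 1, since it is treated as nonnegative weights.

```python
import torch

p = torch.tensor([0.005, 0.015, 0.08, 0.25, 0.3, 0.25, 0.08, 0.015, 0.005])
trying_times = 100000

# replacement=True lets num_samples exceed the 9 categories
indices = torch.multinomial(p, num_samples=trying_times, replacement=True)

# Count occurrences of each index; minlength keeps never-drawn indices as 0
counts = torch.bincount(indices, minlength=len(p))
frequency = counts.float() / trying_times
for i in range(len(p)):
    print(f'index {i}\tprobability {p[i].item():.3f}\tfrequency {frequency[i].item():.4f}')
```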