OpenPose algorithm in Python

Reposted

langrisser 2024-02-13 10:09:17

The data and code are linked at the end of the article.

1. Dataset and path configuration

First, download the COCO dataset and set the dataset path. The download link is given at the end of the article.

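2. Data processing and label creation

First, read the images and the annotation files. Because the network outputs keypoint heatmaps and part affinity fields (PAFs), matching labels have to be built for both.

Reading the data and annotation files

The data comes from the COCO dataset, so the images and annotations are read with the pycocotools library. Each annotated keypoint consists of x and y coordinates plus a flag k: k = 0 means the keypoint is not labeled, k = 1 means it is labeled but occluded, and k = 2 means it is labeled and visible. Keypoints with k = 0 therefore have to be filtered out.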
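
The filtering step can be sketched as follows. COCO stores the keypoints of one person as a flat list `[x1, y1, k1, x2, y2, k2, ...]`. The helper name `labeled_keypoints` and the sample values are made up for illustration; in practice the flat list would come from an annotation loaded through pycocotools.

```python
import numpy as np

def labeled_keypoints(flat_keypoints):
    """Return (indices, xy) of the keypoints whose flag k is not 0."""
    kps = np.asarray(flat_keypoints, dtype=float).reshape(-1, 3)
    keep = kps[:, 2] > 0            # k == 0 means "not labeled"
    return np.flatnonzero(keep), kps[keep, :2]

# Three keypoints with made-up values: visible, unlabeled, occluded
sample = [120, 80, 2,   0, 0, 0,   131, 95, 1]
idx, xy = labeled_keypoints(sample)
print(idx)   # indices of the keypoints that were kept
print(xy)    # their (x, y) coordinates
```

Returning the indices alongside the coordinates keeps track of which body part each surviving point belongs to.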

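Label creation

Two kinds of labels are needed: one heatmap per keypoint and the PAF labels. The number of heatmaps equals the number of keypoints plus one background channel. For the PAFs, the authors predefine 19 limbs (keypoint pairs) so that keypoint matching is reduced from a complete-graph problem to bipartite-graph problems; each limb has an x component and a y component. Note that the network outputs feature maps that are downsampled by a factor of 8, so the labels are built at the feature-map resolution.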
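
A quick way to check the resulting label sizes is shown below. The 368x368 input size is an assumption (it is the size that gives the 46x46 feature map mentioned in the code comments later on), and the height-width-channel layout and the name `label_shapes` are only illustrative.

```python
def label_shapes(input_size=368, stride=8, num_keypoints=18, num_limbs=19):
    """Shapes of the heatmap and PAF labels at feature-map resolution."""
    grid = input_size // stride                       # downsampled by 8
    heatmap_shape = (grid, grid, num_keypoints + 1)   # keypoints + background
    paf_shape = (grid, grid, num_limbs * 2)           # x and y per limb
    return heatmap_shape, paf_shape

heatmap_shape, paf_shape = label_shapes()
print(heatmap_shape)  # (46, 46, 19)
print(paf_shape)      # (46, 46, 38)
```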

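Keypoint heatmap labels

The keypoint heatmap labels are built in the following steps:

First, get the positions of each type of keypoint, such as the eyes and the arms.
Filter the keypoints and drop those that fall outside the image.
Build a heatmap for each keypoint: within a limited range, place a Gaussian centered on the keypoint, and set everything outside that range to 0. If the same keypoint of several people overlaps, the values are accumulated.

The paper defines the heatmap value as follows: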

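$$S^{*}_{j,k}(\mathbf{p}) = \exp\left(-\frac{\lVert \mathbf{p} - \mathbf{x}_{j,k} \rVert_2^2}{\sigma^2}\right)$$

Here $\mathbf{p}$ is a location in the map, $\mathbf{x}_{j,k}$ is the ground-truth position of keypoint $j$ of person $k$, and $\sigma$ controls the spread of the peak. Note that the code in this article divides by $2\sigma^2$ rather than $\sigma^2$, and that it sums overlapping peaks and clips the result at 1.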

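The code is as follows:

import numpy as np


def putGaussianMaps(center, accumulate_confid_map, sigma, grid_y, grid_x, stride):
    """Add a Gaussian peak centered at `center` (image coordinates) to the heatmap."""
    # Image coordinate of the center of the first feature-map cell
    start = stride / 2.0 - 0.5
    y_range = [i for i in range(int(grid_y))]
    x_range = [i for i in range(int(grid_x))]
    xx, yy = np.meshgrid(x_range, y_range)
    # Map every feature-map cell back to image coordinates
    xx = xx * stride + start
    yy = yy * stride + start
    # Squared distance from every cell to the keypoint
    d2 = (xx - center[0]) ** 2 + (yy - center[1]) ** 2
    exponent = d2 / 2.0 / sigma / sigma
    # 4.6052 is about ln(100), so values below 0.01 are set to 0
    mask = exponent <= 4.6052
    confid_map = np.exp(-exponent)
    confid_map = np.multiply(mask, confid_map)
    accumulate_confid_map += confid_map  # overlapping keypoints accumulate
    accumulate_confid_map[accumulate_confid_map > 1.0] = 1.0  # clip at 1

    return accumulate_confid_map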
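
To see what the function above produces, here is a compact, self-contained variant for a single keypoint. The name `gaussian_heatmap`, the keypoint position, and `sigma=7.0` are illustrative assumptions, and the cutoff uses `np.log(100)` directly instead of the rounded constant 4.6052.

```python
import numpy as np

def gaussian_heatmap(center, sigma, grid_y, grid_x, stride):
    """Heatmap for one keypoint given in image coordinates."""
    start = stride / 2.0 - 0.5
    xx, yy = np.meshgrid(np.arange(grid_x), np.arange(grid_y))
    xx = xx * stride + start          # cell centers in image coordinates
    yy = yy * stride + start
    exponent = ((xx - center[0]) ** 2 + (yy - center[1]) ** 2) / (2.0 * sigma ** 2)
    heat = np.exp(-exponent)
    heat[exponent > np.log(100)] = 0.0   # drop values below 0.01
    return heat

heat = gaussian_heatmap(center=(100.0, 60.0), sigma=7.0,
                        grid_y=46, grid_x=46, stride=8)
row, col = np.unravel_index(np.argmax(heat), heat.shape)
print(row, col)                 # feature-map cell closest to the keypoint
print(round(np.log(100), 4))    # 4.6052
```

The peak lands in the cell whose center is nearest to the keypoint once its image coordinates are divided by the stride, which is why the labels line up with the 8x downsampled network output.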

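PAF vector labels

The PAF vector labels are built in the following steps:

First, compute the direction vector of the limb; every point on the limb gets this same direction. Only the direction matters, not the length, so the vector is normalized to a unit vector.
Build the candidate region of the limb: the bounding rectangle spanned by the two keypoints, padded by a threshold.
Use the cross product with a threshold to select the limb region. The magnitude of the cross product of any vector with a unit vector is the area of the parallelogram they span, which here equals the perpendicular distance from the point to the limb axis. This value is small for points on the limb and large for points away from it, and thresholding it selects the limb region.
Points outside the limb region get the zero vector; every point inside the limb region gets the limb's direction vector.
Where limbs overlap, take the average of the direction vectors.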
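
The cross-product test in the third step can be checked on a small example. The two keypoints and the test points below are made up, and `distance_to_limb` is a hypothetical helper, not part of the original code.

```python
import numpy as np

a = np.array([2.0, 2.0])     # keypoint A on the feature map (made-up)
b = np.array([10.0, 8.0])    # keypoint B on the feature map (made-up)
unit = (b - a) / np.linalg.norm(b - a)   # unit direction of the limb

def distance_to_limb(point):
    """|d x unit|: perpendicular distance from the point to the line through A and B."""
    d = np.asarray(point, dtype=float) - a
    return abs(d[0] * unit[1] - d[1] * unit[0])

thre = 1.0
on_limb = distance_to_limb((6.0, 5.0))     # lies on the segment AB
off_limb = distance_to_limb((6.0, 10.0))   # well above the segment
print(on_limb < thre, off_limb < thre)     # True False
```

Because the second vector has unit length, the parallelogram area and the perpendicular distance are the same number, so the threshold is effectively half the limb width measured in feature-map cells.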

The construction involves a number of further details; see the comments in the code below:

def putVecMaps(centerA, centerB, accumulate_vec_map, count, grid_y, grid_x, stride):
    centerA = centerA.astype(float)
    centerB = centerB.astype(float)
    # Map the keypoints onto the feature map
    thre = 1  # limb width
    centerB = centerB / stride
    centerA = centerA / stride

    # Direction vector of the limb, i.e. from keypoint A to keypoint B
    limb_vec = centerB - centerA
    # Only the direction of AB matters, not its length, so build the unit vector
    norm = np.linalg.norm(limb_vec)
    if (norm == 0.0):
        # print 'limb is too short, ignore it...'
        return accumulate_vec_map, count
    limb_vec_unit = limb_vec / norm  # unit vector
    # print 'limb unit vector: {}'.format(limb_vec_unit)

    # Candidate region of the limb: the bounding rectangle of the two keypoints,
    # padded by thre and clipped so that it stays inside the feature map
    min_x = max(int(round(min(centerA[0], centerB[0]) - thre)), 0)
    max_x = min(int(round(max(centerA[0], centerB[0]) + thre)), grid_x)
    min_y = max(int(round(min(centerA[1], centerB[1]) - thre)), 0)
    max_y = min(int(round(max(centerA[1], centerB[1]) + thre)), grid_y)

    range_x = list(range(int(min_x), int(max_x), 1))
    range_y = list(range(int(min_y), int(max_y), 1))
    xx, yy = np.meshgrid(range_x, range_y)
    # Vector from centerA to every grid point (x, y), split into x and y components
    ba_x = xx - centerA[0]
    ba_y = yy - centerA[1]
    # Cross product with the unit vector: the parallelogram area, which equals the
    # perpendicular distance to the limb axis. It is small for points on the limb
    # and large for points away from it.
    limb_width = np.abs(ba_x * limb_vec_unit[1] - ba_y * limb_vec_unit[0])
    mask = limb_width < thre  # mask is 2D; True where the point is on the limb

    # All-zero map for this limb: vectors outside the limb region stay 0
    vec_map = np.copy(accumulate_vec_map) * 0.0

    # Write 1 inside the limb region and 0 outside (for both components)
    vec_map[yy, xx] = np.repeat(mask[:, :, np.newaxis], 2, axis=2)
    # Multiply by the unit vector: 0 stays 0 outside the limb,
    # 1 becomes the limb direction inside the limb
    vec_map[yy, xx] *= limb_vec_unit[np.newaxis, np.newaxis, :]

    # Cells of the feature map (46x46) that belong to this limb:
    # any cell whose x or y component is non-zero
    mask = np.logical_or.reduce(
        (np.abs(vec_map[:, :, 0]) > 0, np.abs(vec_map[:, :, 1]) > 0))

    # accumulate_vec_map holds averages; turn it back into the accumulated sum
    accumulate_vec_map = np.multiply(
        accumulate_vec_map, count[:, :, np.newaxis])
    # Add the vectors of the current limb
    accumulate_vec_map += vec_map
    # Increase the counter for the cells covered by this limb
    count[mask == True] += 1

    # count == 0 marks cells that no limb has covered so far
    mask = count == 0

    # Temporarily set those cells to 1 to avoid division by zero
    count[mask == True] = 1

    # Current average vector
    accumulate_vec_map = np.divide(accumulate_vec_map, count[:, :, np.newaxis])
    # Restore count to 0 for the uncovered cells
    count[mask == True] = 0

    return accumulate_vec_map, count
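
The averaging at the end of the function is easier to follow on a tiny grid. The sketch below isolates that step under simplified assumptions: a 1x2 grid, two made-up limbs that both cover the first cell, and a hypothetical helper `add_limb` that takes an already masked vector map.

```python
import numpy as np

def add_limb(avg, count, vec_map):
    """Fold one limb's vector map into the running per-cell average."""
    covered = np.any(vec_map != 0, axis=2)             # cells this limb touches
    total = avg * count[:, :, np.newaxis] + vec_map    # average -> sum, then add
    count = count + covered
    safe = np.where(count == 0, 1, count)              # avoid division by zero
    return total / safe[:, :, np.newaxis], count

avg = np.zeros((1, 2, 2))      # 1x2 grid of 2-D vectors
count = np.zeros((1, 2))
limb1 = np.array([[[1.0, 0.0], [0.0, 0.0]]])   # points along x in cell (0, 0)
limb2 = np.array([[[0.0, 1.0], [0.0, 0.0]]])   # points along y in cell (0, 0)

avg, count = add_limb(avg, count, limb1)
avg, count = add_limb(avg, count, limb2)
print(avg[0, 0])   # [0.5 0.5]
print(count)       # cell (0, 0) was covered twice, cell (0, 1) never
```

Storing the average together with a per-cell counter lets the label be updated one limb at a time without keeping every individual vector map in memory.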

3.网络结构

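The first stage uses a backbone such as VGG to extract features and then outputs the keypoint heatmaps (18 keypoints + 1 background) and the PAFs (19 limbs × 2 vector components). The heatmaps and PAFs are then concatenated, together with the backbone features, and fed into the next stage, where two convolutional branches again predict heatmaps and PAFs. These outputs are concatenated in the same way and passed on to the following stage, and so on for six stages in total.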
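
The channel counts can be verified with a little arithmetic. The constant names below are only illustrative; the value 128 is the number of output channels of the last backbone layer (`conv4_4_CPM`) in the code that follows.

```python
NUM_KEYPOINTS = 18
NUM_LIMBS = 19
BACKBONE_CHANNELS = 128   # output channels of conv4_4_CPM

heatmap_channels = NUM_KEYPOINTS + 1    # keypoints + background
paf_channels = NUM_LIMBS * 2            # x and y component per limb
# Input of stages 2 to 6: PAFs + heatmaps + backbone features
next_stage_in = paf_channels + heatmap_channels + BACKBONE_CHANNELS
print(heatmap_channels, paf_channels, next_stage_in)  # 19 38 185
```

This is where the 185 input channels of the first convolution in stages 2 to 6 come from.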

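The code is as follows:

import torch
import torch.nn as nn
from torch.nn import init

# make_vgg19_block and make_stages are helper functions that are not shown in this article.


def get_model(trunk='vgg19'):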
    """Creates the whole CPM model
    Args:
        trunk: string, 'vgg19' or 'mobilenet'
    Returns: Module, the defined model
    """
    blocks = {}
    # block0 is the preprocessing stage
    if trunk == 'vgg19':
        block0 = [{'conv1_1': [3, 64, 3, 1, 1]},
                  {'conv1_2': [64, 64, 3, 1, 1]},
                  {'pool1_stage1': [2, 2, 0]},
                  {'conv2_1': [64, 128, 3, 1, 1]},
                  {'conv2_2': [128, 128, 3, 1, 1]},
                  {'pool2_stage1': [2, 2, 0]},
                  {'conv3_1': [128, 256, 3, 1, 1]},
                  {'conv3_2': [256, 256, 3, 1, 1]},
                  {'conv3_3': [256, 256, 3, 1, 1]},
                  {'conv3_4': [256, 256, 3, 1, 1]},
                  {'pool3_stage1': [2, 2, 0]},
                  {'conv4_1': [256, 512, 3, 1, 1]},
                  {'conv4_2': [512, 512, 3, 1, 1]},
                  {'conv4_3_CPM': [512, 256, 3, 1, 1]},
                  {'conv4_4_CPM': [256, 128, 3, 1, 1]}]

    elif trunk == 'mobilenet':
        block0 = [{'conv_bn': [3, 32, 2]},  # out: 3, 32, 184, 184
                  {'conv_dw1': [32, 64, 1]},  # out: 32, 64, 184, 184
                  {'conv_dw2': [64, 128, 2]},  # out: 64, 128, 92, 92
                  {'conv_dw3': [128, 128, 1]},  # out: 128, 256, 92, 92
                  {'conv_dw4': [128, 256, 2]},  # out: 256, 256, 46, 46
                  {'conv4_3_CPM': [256, 256, 1, 3, 1]},
                  {'conv4_4_CPM': [256, 128, 1, 3, 1]}]

    # Stage 1
    blocks['block1_1'] = [{'conv5_1_CPM_L1': [128, 128, 3, 1, 1]},
                          {'conv5_2_CPM_L1': [128, 128, 3, 1, 1]},
                          {'conv5_3_CPM_L1': [128, 128, 3, 1, 1]},
                          {'conv5_4_CPM_L1': [128, 512, 1, 1, 0]},
                          {'conv5_5_CPM_L1': [512, 38, 1, 1, 0]}]

    blocks['block1_2'] = [{'conv5_1_CPM_L2': [128, 128, 3, 1, 1]},
                          {'conv5_2_CPM_L2': [128, 128, 3, 1, 1]},
                          {'conv5_3_CPM_L2': [128, 128, 3, 1, 1]},
                          {'conv5_4_CPM_L2': [128, 512, 1, 1, 0]},
                          {'conv5_5_CPM_L2': [512, 19, 1, 1, 0]}]

    # Stages 2 - 6
    for i in range(2, 7):
        blocks['block%d_1' % i] = [
            {'Mconv1_stage%d_L1' % i: [185, 128, 7, 1, 3]},
            {'Mconv2_stage%d_L1' % i: [128, 128, 7, 1, 3]},
            {'Mconv3_stage%d_L1' % i: [128, 128, 7, 1, 3]},
            {'Mconv4_stage%d_L1' % i: [128, 128, 7, 1, 3]},
            {'Mconv5_stage%d_L1' % i: [128, 128, 7, 1, 3]},
            {'Mconv6_stage%d_L1' % i: [128, 128, 1, 1, 0]},
            {'Mconv7_stage%d_L1' % i: [128, 38, 1, 1, 0]}
        ]

        blocks['block%d_2' % i] = [
            {'Mconv1_stage%d_L2' % i: [185, 128, 7, 1, 3]},
            {'Mconv2_stage%d_L2' % i: [128, 128, 7, 1, 3]},
            {'Mconv3_stage%d_L2' % i: [128, 128, 7, 1, 3]},
            {'Mconv4_stage%d_L2' % i: [128, 128, 7, 1, 3]},
            {'Mconv5_stage%d_L2' % i: [128, 128, 7, 1, 3]},
            {'Mconv6_stage%d_L2' % i: [128, 128, 1, 1, 0]},
            {'Mconv7_stage%d_L2' % i: [128, 19, 1, 1, 0]}
        ]

    models = {}

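    if trunk == 'vgg19':
        print("Building VGG19")
        models['block0'] = make_vgg19_block(block0)
    else:
        # Only the VGG19 trunk is built in this snippet; without this guard any
        # other trunk would fail later with a KeyError on 'block0'.
        raise NotImplementedError("trunk '%s' is not built here" % trunk)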

    for k, v in blocks.items():
        models[k] = make_stages(list(v))

    class rtpose_model(nn.Module):
        def __init__(self, model_dict):
            super(rtpose_model, self).__init__()
            self.model0 = model_dict['block0']
            self.model1_1 = model_dict['block1_1']
            self.model2_1 = model_dict['block2_1']
            self.model3_1 = model_dict['block3_1']
            self.model4_1 = model_dict['block4_1']
            self.model5_1 = model_dict['block5_1']
            self.model6_1 = model_dict['block6_1']

            self.model1_2 = model_dict['block1_2']
            self.model2_2 = model_dict['block2_2']
            self.model3_2 = model_dict['block3_2']
            self.model4_2 = model_dict['block4_2']
            self.model5_2 = model_dict['block5_2']
            self.model6_2 = model_dict['block6_2']

            self._initialize_weights_norm()

        def forward(self, x):
            saved_for_loss = []
            out1 = self.model0(x)  # 46x46 feature map
            out1_1 = self.model1_1(out1)  # PAF output
            out1_2 = self.model1_2(out1)  # keypoint heatmap output
            out2 = torch.cat([out1_1, out1_2, out1], 1)
            saved_for_loss.append(out1_1)
            saved_for_loss.append(out1_2)

            out2_1 = self.model2_1(out2)
            out2_2 = self.model2_2(out2)
            out3 = torch.cat([out2_1, out2_2, out1], 1)
            saved_for_loss.append(out2_1)
            saved_for_loss.append(out2_2)

            out3_1 = self.model3_1(out3)
            out3_2 = self.model3_2(out3)
            out4 = torch.cat([out3_1, out3_2, out1], 1)
            saved_for_loss.append(out3_1)
            saved_for_loss.append(out3_2)

            out4_1 = self.model4_1(out4)
            out4_2 = self.model4_2(out4)
            out5 = torch.cat([out4_1, out4_2, out1], 1)
            saved_for_loss.append(out4_1)
            saved_for_loss.append(out4_2)

            out5_1 = self.model5_1(out5)
            out5_2 = self.model5_2(out5)
            out6 = torch.cat([out5_1, out5_2, out1], 1)
            saved_for_loss.append(out5_1)
            saved_for_loss.append(out5_2)

            out6_1 = self.model6_1(out6)
            out6_2 = self.model6_2(out6)
            saved_for_loss.append(out6_1)
            saved_for_loss.append(out6_2)
            return (out6_1, out6_2), saved_for_loss

        def _initialize_weights_norm(self):

            for m in self.modules():
                if isinstance(m, nn.Conv2d):
                    init.normal_(m.weight, std=0.01)
                    if m.bias is not None:  # mobilenet conv2d doesn't add bias
                        init.constant_(m.bias, 0.0)

            # the last layer of each of these blocks has no ReLU
            init.normal_(self.model1_1[8].weight, std=0.01)
            init.normal_(self.model1_2[8].weight, std=0.01)

            init.normal_(self.model2_1[12].weight, std=0.01)
            init.normal_(self.model3_1[12].weight, std=0.01)
            init.normal_(self.model4_1[12].weight, std=0.01)
            init.normal_(self.model5_1[12].weight, std=0.01)
            init.normal_(self.model6_1[12].weight, std=0.01)

            init.normal_(self.model2_2[12].weight, std=0.01)
            init.normal_(self.model3_2[12].weight, std=0.01)
            init.normal_(self.model4_2[12].weight, std=0.01)
            init.normal_(self.model5_2[12].weight, std=0.01)
            init.normal_(self.model6_2[12].weight, std=0.01)

    model = rtpose_model(models)
    return model
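
The indices 8 and 12 in `_initialize_weights_norm` can be understood with a small sketch. It assumes that `make_stages` (not shown here) builds a sequential container in which every convolution except the last is followed by a ReLU, as the comment in that method suggests; `conv_indices` is a hypothetical helper written only for this check.

```python
def conv_indices(num_convs):
    """Positions of the conv layers when a ReLU follows every conv except the last."""
    layers = []
    for i in range(num_convs):
        layers.append('conv')
        if i < num_convs - 1:
            layers.append('relu')
    return [pos for pos, name in enumerate(layers) if name == 'conv']

print(conv_indices(5)[-1])   # 8  (stage 1 has 5 convolutions)
print(conv_indices(7)[-1])   # 12 (stages 2 to 6 have 7 convolutions)
```

Under that assumption, `self.model1_1[8]` and `self.model2_1[12]` are the final convolutions of their branches, i.e. the layers that produce the PAF and heatmap outputs.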