PyTorch GRU Network Forward Pass / Python Implementation (Runnable)

I. Background

For a trained neural network model, actual deployment only requires the forward-pass computation; the backpropagation step can be ignored entirely. For hybrid models such as the RNNoise noise-suppression algorithm, putting the algorithm into production means running the network on low-power devices, which usually requires C. This post is a personal note: it walks through reimplementing a GRU forward pass in plain Python, as a stepping stone toward a C implementation (future work).

II. GRU network details in PyTorch

This post will not rehash the theory behind GRUs; instead, let's look at the computation PyTorch performs at each time step:

$$r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr})$$
$$z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz})$$
$$n_t = \tanh(W_{in} x_t + b_{in} + r_t \odot (W_{hn} h_{t-1} + b_{hn}))$$
$$h_t = (1 - z_t) \odot n_t + z_t \odot h_{t-1}$$

In these equations, $\sigma$ is the sigmoid function, $x_t$ is the input at the current time step, and $h_{t-1}$ is the hidden state from the previous time step. Counting them up, there are six weight matrices $W$ and six bias vectors $b$ in total. The PyTorch documentation describes the parameters as follows:

  • ~GRU.weight_ih_l[k] – the learnable input-hidden weights of the k-th layer (W_ir|W_iz|W_in), of shape (3*hidden_size, input_size) for k = 0. Otherwise, the shape is (3*hidden_size, num_directions * hidden_size)
  • ~GRU.weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer (W_hr|W_hz|W_hn), of shape (3*hidden_size, hidden_size)
  • ~GRU.bias_ih_l[k] – the learnable input-hidden bias of the k-th layer (b_ir|b_iz|b_in), of shape (3*hidden_size)
  • ~GRU.bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer (b_hr|b_hz|b_hn), of shape (3*hidden_size)

Considering a single layer only, each GRU layer's parameters consist of four tensors: weight_ih, weight_hh, bias_ih, and bias_hh. Take weight_ih and bias_ih as an example: weight_ih is a matrix of shape (3*hidden_size, input_size), formed by stacking the three (hidden_size, input_size) matrices W_ir, W_iz, W_in along dim 0. Likewise, bias_ih is the (3*hidden_size,) vector formed by stacking the three (hidden_size,) vectors b_ir, b_iz, b_in along dim 0 (viewed as column vectors in the equations above). The input $x_t$ has shape (input_size, 1), so $W_{ir} x_t$ comes out as a (hidden_size, 1) matrix; by the same token, $W_{hr} h_{t-1}$ is also (hidden_size, 1), and the later computations of $z_t$ and $n_t$ follow exactly the same pattern. The snippet below verifies this stacking.
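A minimal sketch of that check (using the same input_size=10 and hidden_size=5 as the example in the next section): torch.chunk splits the stacked parameters back into their per-gate blocks.

import torch
from torch import nn

gru = nn.GRU(input_size=10, hidden_size=5, batch_first=True)

# weight_ih_l0 has shape (3*hidden_size, input_size) = (15, 10);
# chunking along dim 0 recovers W_ir, W_iz, W_in, each (5, 10).
W_ir, W_iz, W_in = gru.weight_ih_l0.chunk(3, dim=0)
# bias_ih_l0 is stored flat with shape (15,), so each block is (5,).
b_ir, b_iz, b_in = gru.bias_ih_l0.chunk(3, dim=0)
print(W_ir.shape, b_ir.shape)   # torch.Size([5, 10]) torch.Size([5])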

III. Code walkthrough

  1. Extracting the network parameters
    The code below builds a single-layer GRU with input_size=10 and hidden_size=5 (batch_first=True); its parameters are randomly initialized, and we print them out:
import torch
from torch import nn


class GRUtest(nn.Module):
    def __init__(self, input, hidden, act):
        super().__init__()
        self.gru = nn.GRU(input, hidden, batch_first=True)
        if act == 'sigmoid':             # activation function; defined but never used below
            self.act = nn.Sigmoid()
        elif act == 'tanh':
            self.act = nn.Tanh()
        elif act == 'relu':
            self.act = nn.ReLU()

    def forward(self, x):  
        self.gru.flatten_parameters()
        gru_out, gru_state = self.gru(x)   
        return gru_out, gru_state
    
if __name__ == '__main__':
    insize = 10
    hsize = 5
    net1 = GRUtest(insize, hsize, 'tanh')
    for name, parameters in net1.named_parameters():
        print(name)
        print(parameters)

The output is as follows:

gru.weight_ih_l0
Parameter containing:
tensor([[-0.2723,  0.3715,  0.2461,  0.1564, -0.3429,  0.3451,  0.1402,  0.3094,
         -0.1759,  0.0948],
       ...
        [-0.2211, -0.3684,  0.1786, -0.0130, -0.0834, -0.0744, -0.3496,  0.1268,
          0.0111, -0.3086]], requires_grad=True)
gru.weight_hh_l0
Parameter containing:
tensor([[ 0.1683, -0.0090, -0.4325,  0.2406,  0.2392],
        ...
        [ 0.1703,  0.3895,  0.1127, -0.1311,  0.1465],
        [-0.0391, -0.3496, -0.1727,  0.2034,  0.0147]], requires_grad=True)
gru.bias_ih_l0
Parameter containing:
tensor([ 0.1650, -0.2618,  0.4228, -0.1866,  0.0954, -0.2185, -0.2157,  0.2003,
        -0.1248, -0.2836, -0.1828,  0.3261,  0.2692,  0.2722, -0.3817],
       requires_grad=True)
gru.bias_hh_l0
Parameter containing:
tensor([ 0.2106,  0.1117, -0.3007,  0.0141,  0.0894, -0.2416, -0.1887,  0.3648,
        -0.0361, -0.0047, -0.2830, -0.2674,  0.4117,  0.1664, -0.0708],
       requires_grad=True)

As expected, the output is exactly the four tensors described above: weight_ih, weight_hh, bias_ih, and bias_hh.
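For the comparison in the next step, the randomly initialized module also needs to be saved to disk. A one-line sketch: the whole module is saved, which matches the torch.load / state_dict usage in the verification script below.

torch.save(net1, './nn_test.pkl')   # save the entire module for later reloading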

  2. Forward-pass Python code
    To verify the computation, we first print and save the parameters of a randomly generated GRU network, then compare two approaches: loading the model with PyTorch's built-in load function, versus writing the forward pass by hand from the printed parameters. One point to note: the GRU has no output gate. That is, for a given GRU layer, once $x_t$ enters the network, a series of computations updates the hidden state $h_{t-1}$ to $h_t$, and this $h_t$ is the layer's output at that time step; concatenating the $h_t$ of every time step gives the GRU's overall output. The code is as follows:
import torch
from torch import nn


weight_ih = torch.tensor([[ 0.3162,  0.0833,  0.1223,  0.4317, -0.2017,  0.1417, -0.1990,  0.3196,
          0.3572, -0.4123],
        [ 0.3818,  0.2136,  0.1949,  0.1841,  0.3718, -0.0590, -0.3782, -0.1283,
         -0.3150,  0.0296],
        [-0.0835, -0.2399, -0.0407,  0.4237, -0.0353,  0.0142, -0.0697,  0.0703,
          0.3985,  0.2735],
        [ 0.1587,  0.0972,  0.1054,  0.1728, -0.0578, -0.4156, -0.2766,  0.3817,
          0.0267, -0.3623],
        [ 0.0705,  0.3695, -0.4226, -0.3011, -0.1781,  0.0180, -0.1043, -0.0491,
         -0.4360,  0.2094],
        [ 0.3925,  0.2734, -0.3167, -0.3605,  0.1857,  0.0100,  0.1833, -0.4370,
         -0.0267,  0.3154],
        [ 0.2075,  0.0163,  0.0879, -0.0423, -0.2459, -0.1690, -0.2723,  0.3715,
          0.2461,  0.1564],
        [-0.3429,  0.3451,  0.1402,  0.3094, -0.1759,  0.0948,  0.4367,  0.3008,
          0.3587, -0.0939],
        [ 0.3407, -0.3503,  0.0387, -0.2518, -0.1043, -0.1145,  0.0335,  0.4070,
          0.2214, -0.0019],
        [ 0.3175, -0.2292,  0.2305, -0.0415, -0.0778,  0.0524, -0.3426,  0.0517,
          0.1504,  0.3823],
        [-0.1392,  0.1610,  0.4470, -0.1918,  0.4251, -0.2220,  0.1971,  0.1752,
          0.1249,  0.3537],
        [-0.1807,  0.1175,  0.0025, -0.3364, -0.1086, -0.2987,  0.1977,  0.0402,
          0.0438, -0.1357],
        [ 0.0022, -0.1391,  0.1285,  0.4343,  0.0677, -0.1981, -0.2732,  0.0342,
         -0.3318, -0.3361],
        [-0.2911, -0.1519,  0.0331,  0.3080,  0.1732,  0.3426, -0.2808,  0.0377,
         -0.3975,  0.2565],
        [ 0.0932,  0.4326, -0.3181,  0.3586,  0.3775,  0.3616,  0.0638,  0.4066,
          0.2987,  0.3337]])
weight_hh = torch.tensor([[-0.0291, -0.3432, -0.0056,  0.0839, -0.3046],
        [-0.2565, -0.4288, -0.1568,  0.3896,  0.0765],
        [-0.0273,  0.0180,  0.2789, -0.3949, -0.3451],
        [-0.1487, -0.2574,  0.2307,  0.3160, -0.4339],
        [-0.3795, -0.4355,  0.1687,  0.3599, -0.3467],
        [-0.2070,  0.1423, -0.2920,  0.3799,  0.1043],
        [-0.1245,  0.0290,  0.1394, -0.1581, -0.3465],
        [ 0.0030,  0.0081,  0.0090, -0.0653,  0.2871],
        [-0.1248, -0.0433,  0.1839, -0.2815,  0.1197],
        [-0.0989,  0.2145, -0.2426,  0.0165,  0.0438],
        [-0.3598, -0.3252,  0.1715, -0.1302,  0.2656],
        [-0.4418, -0.2211, -0.3684,  0.1786, -0.0130],
        [-0.0834, -0.0744, -0.3496,  0.1268,  0.0111],
        [-0.3086,  0.1683, -0.0090, -0.4325,  0.2406],
        [ 0.2392, -0.0843, -0.3088,  0.0180,  0.3375]])
bias_ih = torch.tensor([ 0.4094, -0.3376, -0.2020,  0.3482,  0.2186,  0.2768, -0.2226,  0.3853,
        -0.3676, -0.0215,  0.0093,  0.0751, -0.3375,  0.4103,  0.4395])
bias_hh = torch.tensor([-0.3088,  0.0165, -0.2382,  0.4288,  0.2494,  0.2634,  0.1443, -0.0445,
         0.2518,  0.0076, -0.1631,  0.2309,  0.1403, -0.1159, -0.1226])

class GRUtest(nn.Module):     # the PyTorch GRU, used as the reference implementation
    def __init__(self, input, hidden, act):
        super().__init__()
        self.gru = nn.GRU(input, hidden, batch_first=True)
        if act == 'sigmoid':
            self.act = nn.Sigmoid()
        elif act == 'tanh':
            self.act = nn.Tanh()
        elif act == 'relu':
            self.act = nn.ReLU()

    def forward(self, x):  
        self.gru.flatten_parameters()
        gru_out, gru_state = self.gru(x)   
        return gru_out, gru_state

class GRULayer:               # hand-rolled GRU layer: just holds the parameters
    def __init__(self, input_size, hidden_size, act):
        # Flatten everything row-major, mimicking the 1-D arrays a C port would use.
        self.bias_ih = bias_ih.reshape(-1)
        self.bias_hh = bias_hh.reshape(-1)
        self.weight_ih = weight_ih.reshape(-1)
        self.weight_hh = weight_hh.reshape(-1)
        self.nb_input = input_size
        self.nb_neurons = hidden_size
        self.activation = act


def compute_gru(gru, state, x):
    # One GRU time step; updates `state` in place. x is the current input.
    M = gru.nb_input
    N = gru.nb_neurons
    r = torch.zeros(N)        # reset gate
    z = torch.zeros(N)        # update gate
    n = torch.zeros(N)        # candidate hidden state
    h_new = torch.zeros(N)

    # Reset gate (block 0): r = sigmoid(W_ir x + b_ir + W_hr h + b_hr)
    for i in range(N):
        acc = gru.bias_ih[0*N + i] + gru.bias_hh[0*N + i]
        for j in range(M):
            acc += x[j] * gru.weight_ih[0*M*N + i*M + j]
        for j in range(N):
            acc += state[j] * gru.weight_hh[0*N*N + i*N + j]
        r[i] = torch.sigmoid(acc)

    # Update gate (block 1): z = sigmoid(W_iz x + b_iz + W_hz h + b_hz)
    for i in range(N):
        acc = gru.bias_ih[1*N + i] + gru.bias_hh[1*N + i]
        for j in range(M):
            acc += x[j] * gru.weight_ih[1*M*N + i*M + j]
        for j in range(N):
            acc += state[j] * gru.weight_hh[1*N*N + i*N + j]
        z[i] = torch.sigmoid(acc)

    # Candidate state (block 2): n = tanh(W_in x + b_in + r * (W_hn h + b_hn)).
    # Note that b_hn sits inside the r-gated term, matching PyTorch's equations.
    for i in range(N):
        acc = gru.bias_ih[2*N + i]
        tmp = 0
        for j in range(M):
            acc += x[j] * gru.weight_ih[2*M*N + i*M + j]
        for j in range(N):
            tmp += state[j] * gru.weight_hh[2*N*N + i*N + j]
        acc += r[i] * (tmp + gru.bias_hh[2*N + i])
        n[i] = torch.tanh(acc)

    # New hidden state: h = (1 - z) * n + z * h_prev, written back into `state`.
    for i in range(N):
        h_new[i] = (1 - z[i]) * n[i] + z[i] * state[i]
        state[i] = h_new[i]

b = torch.randn((1, 5, 10))   # random test input: (batch=1, seq_len=5, input_size=10)
   

if __name__ == '__main__':
    insize = 10
    hsize = 5
    net1 = GRUtest(insize, hsize, 'tanh')
    model_ckpt1 = torch.load('./nn_test.pkl')    # adjust the path as needed
    net1.load_state_dict(model_ckpt1.state_dict())
    gru = GRULayer(insize, hsize, 'tanh')   # hand-rolled GRU holding the same parameters
    out = torch.zeros((5, 5))    # buffer for the hand-computed outputs
    state = torch.zeros(5)       # hidden state, initialized to zeros
    for i in range(5):
        x = b[0][i]              # input at time step i
        compute_gru(gru, state, x)
        out[i] = state
    print("Hand-rolled forward pass:")
    print(out)
    print("PyTorch forward pass:")
    torch_out, _ = net1(b)
    print(torch_out)

The results:

Hand-rolled forward pass:
tensor([[-0.1810,  0.1028, -0.2076, -0.0975,  0.1328],
        [-0.2521, -0.4217,  0.1996,  0.4948,  0.2553],
        [-0.1471,  0.2741,  0.0375, -0.1926, -0.1080],
        [-0.7646,  0.0691, -0.1276,  0.0147, -0.0271],
        [-0.6323,  0.1059,  0.0936,  0.1193, -0.2436]])
PyTorch forward pass:
tensor([[[-0.1810,  0.1028, -0.2076, -0.0975,  0.1328],
         [-0.2522, -0.4217,  0.1996,  0.4948,  0.2553],
         [-0.1471,  0.2741,  0.0375, -0.1926, -0.1079],
         [-0.7646,  0.0691, -0.1276,  0.0147, -0.0271],
         [-0.6323,  0.1059,  0.0937,  0.1193, -0.2436]]],
       grad_fn=<TransposeBackward1>)

As the printout shows, the two implementations agree almost exactly; the occasional difference in the last digit (e.g. -0.2521 vs. -0.2522) is ordinary floating-point rounding.
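For reference, the same per-step update can also be written with matrix operations instead of element-wise loops; this form maps one-to-one onto the stacked-parameter layout from section II. A minimal sketch (gru_step is a hypothetical helper; it assumes the unflattened weight_ih, weight_hh, bias_ih, bias_hh tensors and the input b defined in the script above):

import torch

def gru_step(x, h, weight_ih, weight_hh, bias_ih, bias_hh):
    # Split the stacked parameters into their r/z/n gate blocks.
    W_ir, W_iz, W_in = weight_ih.chunk(3, dim=0)
    W_hr, W_hz, W_hn = weight_hh.chunk(3, dim=0)
    b_ir, b_iz, b_in = bias_ih.chunk(3)
    b_hr, b_hz, b_hn = bias_hh.chunk(3)
    # PyTorch's GRU equations for a single time step.
    r = torch.sigmoid(W_ir @ x + b_ir + W_hr @ h + b_hr)
    z = torch.sigmoid(W_iz @ x + b_iz + W_hz @ h + b_hz)
    n = torch.tanh(W_in @ x + b_in + r * (W_hn @ h + b_hn))
    return (1 - z) * n + z * h

# Usage: iterate over the 5 time steps of the random input b.
h = torch.zeros(5)
for t in range(5):
    h = gru_step(b[0][t], h, weight_ih, weight_hh, bias_ih, bias_hh)
    print(h)   # matches out[t] from the loop version, up to rounding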

IV. Extending to C

Following the Python code above, a C version is straightforward to write. My C code is still being cleaned up and will be added in a future update.

V. Supplementary note

The model nn_test.pkl is available on Baidu Netdisk: https://pan.baidu.com/s/1wu-i_1X1YuDJygcxPsKi2w (extraction code: razn)