LSTM CNN Python code: LSTM attention in PyTorch

Reposted

mob6454cc6ba5a5 2023-09-05 22:18:59


Bi-LSTM (Attention) Code Walkthrough: Based on PyTorch

Below is attention code built on a bidirectional LSTM and written in PyTorch. The rest of this post walks through the code, drawing on PyTorch syntax and the principle behind attention.

import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt
import torch.utils.data as Data

# input_dim, n_hidden and output_dim are hyperparameters that must be
# defined at module level before the model is instantiated.
class BiLSTM_Attention(nn.Module):
    def __init__(self):
        super(BiLSTM_Attention, self).__init__()

        self.lstm = nn.LSTM(input_dim, n_hidden, bidirectional=True)
        self.out = nn.Linear(n_hidden * 2, output_dim)

    def attention_net(self, lstm_output, final_state):
        # lstm_output : [batch_size, n_step, n_hidden * num_directions(=2)]
        # final_state : [num_layers(=1) * num_directions(=2), batch_size, n_hidden]
        batch_size = len(lstm_output)
        # Move the batch dimension to the front before flattening, so each row holds
        # one sample's forward and backward states. A bare
        # final_state.view(batch_size, -1, 1) would mix samples when batch_size > 1.
        # hidden : [batch_size, n_hidden * num_directions(=2), n_layer(=1)]
        hidden = final_state.permute(1, 0, 2).reshape(batch_size, -1, 1)
        attn_weights = torch.bmm(lstm_output, hidden).squeeze(2)  # attn_weights : [batch_size, n_step]
        soft_attn_weights = F.softmax(attn_weights, 1)
        # context : [batch_size, n_hidden * num_directions(=2)]
        context = torch.bmm(lstm_output.transpose(1, 2), soft_attn_weights.unsqueeze(2)).squeeze(2)
        return context, soft_attn_weights

    def forward(self, X):
        '''
        X: [batch_size, seq_len, input_dim]
        '''
        input = X.transpose(0, 1)  # input : [seq_len, batch_size, input_dim]
        # final_hidden_state, final_cell_state : [num_layers(=1) * num_directions(=2), batch_size, n_hidden]
        output, (final_hidden_state, final_cell_state) = self.lstm(input)
        output = output.transpose(0, 1)  # output : [batch_size, seq_len, n_hidden * num_directions(=2)]
        attn_output, attention = self.attention_net(output, final_hidden_state)
        # model output : [batch_size, output_dim], attention : [batch_size, n_step]
        return self.out(attn_output), attention

Part 1: The initial PyTorch definition

class BiLSTM_Attention(nn.Module):
    def __init__(self):
        super(BiLSTM_Attention, self).__init__()

This part defines a Python class named BiLSTM_Attention that inherits from Module in torch.nn; it relies on ordinary Python class inheritance.
__init__ is the class's initializer. It is called whenever an instance of the class is created, so it is the place to set up the instance's attributes (here, the network's layers).
super() is used to call a method of the parent class (superclass). Here it runs nn.Module's own initializer, which has to happen before any layer is assigned to self:

super(BiLSTM_Attention, self).__init__()
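To make the role of the super().__init__() call concrete, here is a minimal sketch. The class names WithSuper and WithoutSuper and the layer sizes are illustrative assumptions, not part of the original code.

```python
import torch.nn as nn

class WithSuper(nn.Module):
    def __init__(self):
        super().__init__()          # sets up nn.Module's internal registries
        self.fc = nn.Linear(4, 2)   # now registered as a submodule

class WithoutSuper(nn.Module):
    def __init__(self):
        # super().__init__() deliberately omitted
        self.fc = nn.Linear(4, 2)   # nn.Module refuses this assignment

model = WithSuper()
param_names = [name for name, _ in model.named_parameters()]
print(param_names)

try:
    WithoutSuper()
    failed = False
except AttributeError as err:
    failed = True
    print("AttributeError:", err)
```

Because the parent initializer ran first, the Linear layer's weight and bias show up in named_parameters(), which is what an optimizer later iterates over.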

Part 2: Defining the layers used in the computation

self.lstm = nn.LSTM(input_dim, n_hidden, bidirectional=True)
self.out = nn.Linear(n_hidden * 2, output_dim)

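This defines the layers the network computes with. There are two: an LSTM and a fully connected layer.

First, a brief introduction to nn.LSTM.

The input to the LSTM is a three-dimensional tensor (with the default batch_first=False):

The first dimension is the sequence dimension, i.e. the number of time steps (the sequence length)
The second dimension is batch_size, i.e. how many samples are fed to the network at once
The third dimension holds the features of each input element (feature of input)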

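The LSTM returns output together with (hn, cn), each of them a three-dimensional tensor:

output, (hn, cn)
output holds the hidden state of the last LSTM layer at every time step, for every sample
hn holds the hidden state at the final time step, for every layer and direction
cn holds the cell (memory) state at the final time step, for every layer and direction
Note that at every time step the LSTM cell receives both the hidden state and the cell state from the previous time step. Gates computed from the current input and the previous hidden state decide how much of the previous cell state to keep and how much new information to write into it; the new hidden state is the output gate multiplied by tanh of the new cell state, and both states are passed on to the next time step.
Example:

Define LSTM = nn.LSTM(20, 40, bidirectional=True) and feed it data with 24 time steps and 64 samples, i.e. an input of shape torch.Size([24, 64, 20]). Then output has shape torch.Size([24, 64, 80]) (torch.Size([64, 24, 80]) after the transpose(0, 1) applied in forward), and hn has shape torch.Size([2, 64, 40])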
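The shapes in this example can be checked directly. The sketch below uses random input as a stand-in for real data and also checks how hn relates to output for a single-layer bidirectional LSTM.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(20, 40, bidirectional=True)
x = torch.randn(24, 64, 20)          # [seq_len, batch_size, input_dim]
output, (hn, cn) = lstm(x)

print(output.shape)                  # [seq_len, batch_size, 2 * n_hidden]
print(hn.shape)                      # [num_directions, batch_size, n_hidden]
print(cn.shape)

# The forward direction finishes at the last time step,
# the backward direction finishes at the first one.
fwd_matches = torch.allclose(output[-1, :, :40], hn[0])
bwd_matches = torch.allclose(output[0, :, 40:], hn[1])
print(fwd_matches, bwd_matches)
```

The last two checks show that hn is not extra information: it is the forward half of the last output step and the backward half of the first output step.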

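Defining the LSTM: nn.LSTM from torch.nn is used; its arguments are the number of input features (input_dim), the number of hidden units (n_hidden), and the flag that enables bidirectional processing (bidirectional).
Fully connected layer: defined with nn.Linear from torch.nn; its arguments are the number of input features (n_hidden * 2) and the number of output features (output_dim).
Note that the LSTM returns two things, output and the hidden states. Because the LSTM is bidirectional, the forward and backward hidden states are concatenated, so the vector passed to nn.Linear has n_hidden * 2 features.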
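As a small illustration of why the fully connected layer needs n_hidden * 2 input features, the sketch below passes a stand-in context tensor through two Linear layers. The value output_dim = 3 is an assumption chosen for the example; the original does not state it.

```python
import torch
import torch.nn as nn

n_hidden, output_dim = 40, 3               # output_dim is an assumed value
context = torch.randn(64, n_hidden * 2)    # stand-in for the attention output

out_layer = nn.Linear(n_hidden * 2, output_dim)
logits = out_layer(context)
print(logits.shape)

# A layer sized for a unidirectional LSTM rejects the 80-feature input.
wrong_layer = nn.Linear(n_hidden, output_dim)
try:
    wrong_layer(context)
    shape_error = False
except RuntimeError as err:
    shape_error = True
    print("RuntimeError:", err)
```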

Part 3: Computing the attention mechanism

def attention_net(self, lstm_output, final_state):
    batch_size = len(lstm_output)
    # hidden : [batch_size, n_hidden * num_directions(=2), n_layer(=1)]
    # permute first so each row holds one sample's forward and backward states;
    # a bare final_state.view(batch_size, -1, 1) mixes samples when batch_size > 1
    hidden = final_state.permute(1, 0, 2).reshape(batch_size, -1, 1)
    # attn_weights : [batch_size, n_step]
    attn_weights = torch.bmm(lstm_output, hidden).squeeze(2)
    soft_attn_weights = F.softmax(attn_weights, 1)
    # context : [batch_size, n_hidden * num_directions(=2)]
    context = torch.bmm(lstm_output.transpose(1, 2),
                        soft_attn_weights.unsqueeze(2)).squeeze(2)
    return context, soft_attn_weights

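This part is the core of the attention implementation, so it is analysed line by line.
attention_net takes two arguments: lstm_output and final_state

lstm_output is the output of the last LSTM layer at every time step
final_state is the final hidden state (hn) of the LSTM, one slice per direction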

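batch_size = len(lstm_output): reads the number of samples from the length of lstm_output's first dimension
hidden: the final hidden state hn (both directions) is reshaped into a single column vector per sample, i.e. from torch.Size([2, 64, 40]) to torch.Size([64, 80, 1]). A bare final_state.view(batch_size, -1, 1) produces this shape, but view only reinterprets the underlying memory, so when batch_size > 1 each row mixes values from different samples. final_state.permute(1, 0, 2).reshape(batch_size, -1, 1) moves the batch dimension to the front first, so each row is one sample's forward state followed by its backward state.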
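The difference between the two reshapes is easy to see on a tiny tensor. In this sketch (sizes chosen only for illustration) every entry of hn is filled with 10 * direction + sample, so each value reveals where it came from.

```python
import torch

batch_size, n_hidden = 3, 2
hn = torch.zeros(2, batch_size, n_hidden)   # [num_directions, batch_size, n_hidden]
for d in range(2):
    for b in range(batch_size):
        hn[d, b, :] = 10 * d + b            # encode (direction, sample) in the value

via_view = hn.view(batch_size, -1, 1).squeeze(2)
via_permute = hn.permute(1, 0, 2).reshape(batch_size, -1, 1).squeeze(2)

print(via_view)      # rows mix different samples
print(via_permute)   # row b holds sample b's forward and backward state
```

With view, the first row contains values from samples 0 and 1 of the forward direction only; with permute followed by reshape, row b contains b (forward) and 10 + b (backward).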
attn_weights = torch.bmm(lstm_output, hidden).squeeze(2)

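This step performs a batched matrix multiplication of lstm_output with the reshaped hidden. In the example: [64, 24, 80] × [64, 80, 1] → [64, 24, 1], i.e. one score per time step, equal to the dot product of that step's output with hidden.
squeeze(2) reduces the dimensionality of the result by dropping its third dimension, giving torch.Size([64, 24])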
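A minimal sketch of this step on random stand-in tensors, confirming the shapes and that each score is a plain dot product. The indices 5 and 7 are arbitrary picks for the spot check.

```python
import torch

torch.manual_seed(0)
lstm_output = torch.randn(64, 24, 80)   # [batch_size, n_step, n_hidden * 2]
hidden = torch.randn(64, 80, 1)         # [batch_size, n_hidden * 2, 1]

scores = torch.bmm(lstm_output, hidden) # [64, 24, 1]
attn_weights = scores.squeeze(2)        # [64, 24]

# Score of sample 5 at time step 7, computed by hand as a dot product.
manual = (lstm_output[5, 7] * hidden[5, :, 0]).sum()
print(scores.shape, attn_weights.shape)
print(attn_weights[5, 7].item(), manual.item())
```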

soft_attn_weights = F.softmax(attn_weights, 1)

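This step applies the softmax function from F to attn_weights.

The argument 1 is the dimension along which softmax is computed. For this two-dimensional tensor, dim=1 normalizes across the columns within each row, so each row (the 24 time-step scores of one sample) sums to 1; dim=0 would instead normalize down each column, so each column sums to 1.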
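The effect of the dim argument can be checked on a small hand-written score matrix (the values are made up for illustration):

```python
import torch
import torch.nn.functional as F

scores = torch.tensor([[1.0, 2.0, 3.0],
                       [1.0, 1.0, 1.0]])   # 2 samples, 3 time steps

by_row = F.softmax(scores, dim=1)   # normalize within each row
by_col = F.softmax(scores, dim=0)   # normalize within each column

row_sums = by_row.sum(dim=1)
col_sums = by_col.sum(dim=0)
print(by_row)
print(row_sums)   # one value per sample
print(col_sums)   # one value per time step
```

In attention_net the rows are samples and the columns are time steps, so dim=1 gives each sample its own distribution over time steps.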

context = torch.bmm(lstm_output.transpose(1, 2), soft_attn_weights.unsqueeze(2)).squeeze(2)

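This performs another batched matrix multiplication. lstm_output.transpose(1, 2) swaps the second and third dimensions of lstm_output ([64, 24, 80] → [64, 80, 24]); soft_attn_weights.unsqueeze(2) adds a third dimension of size 1 to soft_attn_weights ([64, 24] → [64, 24, 1]); and .squeeze(2) drops the third dimension of the product, leaving a context of shape [64, 80], which is the attention-weighted sum of the LSTM outputs over the time steps.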
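To show that this bmm call is just a weighted sum over time steps, the sketch below compares it with an explicit broadcast-and-sum on random stand-in tensors.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
lstm_output = torch.randn(64, 24, 80)                       # [batch, n_step, n_hidden * 2]
soft_attn_weights = F.softmax(torch.randn(64, 24), dim=1)   # [batch, n_step]

# Formulation used in attention_net.
context = torch.bmm(lstm_output.transpose(1, 2),
                    soft_attn_weights.unsqueeze(2)).squeeze(2)

# Same quantity written as an explicit weighted sum over the time axis.
weighted_sum = (lstm_output * soft_attn_weights.unsqueeze(2)).sum(dim=1)

print(context.shape)
print(torch.allclose(context, weighted_sum, atol=1e-5))
```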

Part 4: The forward pass

def forward(self, X):
    # X : [batch_size, seq_len, input_dim]
    input = X
    input = input.transpose(0, 1)
    # input : [seq_len, batch_size, input_dim]
    # finalHiddenState, finalCellState :
    # [num_layers(=1) * num_directions(=2), batch_size, n_hidden]
    output, (finalHiddenState, finalCellState) = self.lstm(input)
    output = output.transpose(0, 1)
    # output : [batch_size, seq_len, n_hidden * num_directions(=2)]
    attn_output, attention = self.attention_net(output, finalHiddenState)
    out = self.out(attn_output)
    out = out.unsqueeze(2)
    # out : [batch_size, output_dim, 1], attention : [batch_size, n_step]
    return out, attention

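The forward pass is the path each input takes through the network.
X is the data fed in on each call. Training uses mini-batches: each batch holds 64 samples, each time step has 54 features, and there are 24 time steps, so X has shape torch.Size([64, 24, 54])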
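For a concrete end-to-end check, the following sketch runs a batch of that shape through a compact copy of the model. It is not the author's training setup: n_hidden = 40 and output_dim = 1 are assumed values, the input is random, and hidden is built with permute(1, 0, 2).reshape(...) so that each row belongs to a single sample.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

input_dim, n_hidden, output_dim = 54, 40, 1   # n_hidden, output_dim: assumed values

class BiLSTM_Attention(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, n_hidden, bidirectional=True)
        self.out = nn.Linear(n_hidden * 2, output_dim)

    def attention_net(self, lstm_output, final_state):
        batch_size = len(lstm_output)
        # per-sample [forward; backward] final state as a column vector
        hidden = final_state.permute(1, 0, 2).reshape(batch_size, -1, 1)
        attn_weights = torch.bmm(lstm_output, hidden).squeeze(2)
        soft_attn_weights = F.softmax(attn_weights, 1)
        context = torch.bmm(lstm_output.transpose(1, 2),
                            soft_attn_weights.unsqueeze(2)).squeeze(2)
        return context, soft_attn_weights

    def forward(self, X):
        input = X.transpose(0, 1)                 # [24, 64, 54]
        output, (h_n, c_n) = self.lstm(input)
        output = output.transpose(0, 1)           # [64, 24, 80]
        attn_output, attention = self.attention_net(output, h_n)
        return self.out(attn_output).unsqueeze(2), attention

torch.manual_seed(0)
model = BiLSTM_Attention()
X = torch.randn(64, 24, 54)                       # [batch_size, seq_len, input_dim]
out, attention = model(X)
print(out.shape)         # [batch_size, output_dim, 1]
print(attention.shape)   # [batch_size, seq_len]
```

Each row of attention is one sample's distribution over the 24 time steps, which is the quantity usually plotted when inspecting what the model attends to.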
input = input.transpose(0, 1)