Testing with a trained BERT model in PyTorch: the BERT PyTorch source code

Reposted

mob64ca13f2b62d 2023-08-26 23:54:40


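This is probably the easiest way into the BERT source code. If reading English makes your head spin, or you just don't want the hassle, this article is for you. I won't go into much depth, but I will touch on roughly every component.

The file is pytorch-transformers/pytorch_transformers/modeling_bert.py

You still need some background (attention, seq2seq, masking, embeddings) to follow along quickly. When I mention self-attention or a context vector, you should already know what I mean, because I won't explain those terms. I simply walk through the code structure, so reading in order is enough.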

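Classes in the file

BertModel

forward takes: input_ids (the inputs), token_type_ids' (the segment ids), attention_mask' (the mask), position_ids', head_mask'. A trailing ' marks an argument that may be None.

Output: a tuple (hidden states of the last layer, pooled output computed from the first token of the last layer, then optionally the hidden states of every layer and the attention weights of every layer).

Flow: embeddings (see BertEmbeddings) -> encoder (see BertEncoder) -> pooler (see BertPooler)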

BertEmbeddings

forward takes: input_ids, token_type_ids', position_ids'

Output: the sum of the word, position and segment embeddings, after normalization and dropout.

Flow: look up the word, position and segment embeddings with nn.Embedding -> add the three together -> LayerNorm (see BertLayerNorm) -> dropout
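The flow above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the code from modeling_bert.py: the class name ToyBertEmbeddings, the tiny sizes and the default-argument handling are my own assumptions, and nn.LayerNorm stands in for BertLayerNorm.

```python
import torch
from torch import nn


class ToyBertEmbeddings(nn.Module):
    """Hypothetical sketch: word + position + segment embeddings, LayerNorm, dropout."""

    def __init__(self, vocab_size=100, hidden_size=16, max_positions=32,
                 type_vocab_size=2, dropout_prob=0.1):
        super().__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
        self.position_embeddings = nn.Embedding(max_positions, hidden_size)
        self.token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)
        self.layer_norm = nn.LayerNorm(hidden_size, eps=1e-12)
        self.dropout = nn.Dropout(dropout_prob)

    def forward(self, input_ids, token_type_ids=None, position_ids=None):
        seq_length = input_ids.size(1)
        if position_ids is None:
            # Assumed default: positions 0..seq_length-1, repeated for every row.
            position_ids = torch.arange(seq_length, dtype=torch.long,
                                        device=input_ids.device)
            position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
        if token_type_ids is None:
            # Assumed default: every token belongs to segment 0.
            token_type_ids = torch.zeros_like(input_ids)
        summed = (self.word_embeddings(input_ids)
                  + self.position_embeddings(position_ids)
                  + self.token_type_embeddings(token_type_ids))
        return self.dropout(self.layer_norm(summed))


input_ids = torch.tensor([[5, 7, 9, 2], [1, 3, 0, 0]])  # (batch=2, seq=4)
embeddings = ToyBertEmbeddings()(input_ids)
print(embeddings.shape)  # one hidden vector per token
```

All three lookups produce tensors of the same shape (batch, seq, hidden), which is what makes the plain element-wise sum possible.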

BertLayerNorm

forward: layer normalization over the hidden dimension, nothing more to say.
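For readers who want to see the normalization spelled out, here is a small sketch. ToyLayerNorm is a hypothetical name and the epsilon value is an assumption; the check against nn.LayerNorm is only there to show that the arithmetic is the standard one.

```python
import torch
from torch import nn


class ToyLayerNorm(nn.Module):
    """Hypothetical sketch of layer normalization over the last dimension."""

    def __init__(self, hidden_size, eps=1e-12):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))   # learned scale
        self.bias = nn.Parameter(torch.zeros(hidden_size))    # learned shift
        self.eps = eps

    def forward(self, x):
        mean = x.mean(-1, keepdim=True)
        variance = (x - mean).pow(2).mean(-1, keepdim=True)   # biased variance
        normalized = (x - mean) / torch.sqrt(variance + self.eps)
        return self.weight * normalized + self.bias


x = torch.randn(2, 4, 16)
ours = ToyLayerNorm(16)(x)
reference = nn.LayerNorm(16, eps=1e-12)(x)
matches_reference = torch.allclose(ours, reference, atol=1e-5)
print(matches_reference)
```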

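BertEncoder

forward takes: hidden_states (the output of BertEmbeddings), attention_mask, head_mask'

Output: a tuple whose first element is the hidden states of the last layer, followed, when the config asks for them, by the hidden states of every layer and the attention weights of every layer.

Flow: iterate over layer, an nn.ModuleList of BertLayer instances, feeding each layer's output into the next -> collect every layer's hidden states and attention weights along the way -> return the tuple described above.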
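A sketch of that loop, under stated assumptions: ToyLayer is a made-up stand-in for BertLayer (a single linear layer), the attention mask and attention outputs are left out, and the flag name output_hidden_states is assumed to mirror the config option.

```python
import torch
from torch import nn


class ToyLayer(nn.Module):
    """Stand-in for BertLayer: returns a tuple whose first item is the new hidden state."""

    def __init__(self, hidden_size):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states):
        return (torch.tanh(self.dense(hidden_states)),)


class ToyEncoder(nn.Module):
    """Hypothetical sketch of the layer loop and the tuple it returns."""

    def __init__(self, hidden_size=16, num_layers=3, output_hidden_states=True):
        super().__init__()
        self.output_hidden_states = output_hidden_states
        self.layer = nn.ModuleList([ToyLayer(hidden_size) for _ in range(num_layers)])

    def forward(self, hidden_states):
        all_hidden_states = ()
        for layer_module in self.layer:
            if self.output_hidden_states:
                # Record the input of each layer (the first one is the embedding output).
                all_hidden_states = all_hidden_states + (hidden_states,)
            hidden_states = layer_module(hidden_states)[0]
        if self.output_hidden_states:
            all_hidden_states = all_hidden_states + (hidden_states,)

        outputs = (hidden_states,)  # last-layer hidden states always come first
        if self.output_hidden_states:
            outputs = outputs + (all_hidden_states,)
        return outputs


encoder_outputs = ToyEncoder()(torch.randn(2, 4, 16))
last_hidden, all_hidden = encoder_outputs
print(len(all_hidden))  # embedding output plus one entry per layer
```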

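BertLayer

forward takes: hidden_states (the output of the previous BertLayer, or of BertEmbeddings for the first layer), attention_mask, head_mask'

Output: a tuple (this layer's hidden states, this layer's attention weights if they were requested)

Flow: call attention (see BertAttention) to get attention_outputs -> take the first element, attention_outputs[0], as the input to intermediate (see BertIntermediate) -> call output (see BertOutput) on the intermediate result and attention_outputs[0] to get layer_output -> return layer_output together with attention_outputs[1:] as a tuple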

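BertAttention

forward takes: input_tensor (the hidden_states passed to BertLayer), attention_mask, head_mask'

Output: a tuple (attention_output, followed by the remaining elements of self_outputs)

Flow: run self-attention (see BertSelfAttention) to get self_outputs -> pass self_outputs[0] and input_tensor to the self-output block (see BertSelfOutput) to get attention_output -> return the tuple (attention_output, self_outputs[1:]). The first element is the context vector; the rest, when present, are the attention probabilities.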

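BertSelfAttention

forward takes: hidden_states (the input_tensor of BertAttention), attention_mask, head_mask'

Output: (context_layer, the context vector; attention_probs, the attention probabilities)

Flow: standard multi-head self-attention, nothing more to say.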
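Since the article skips the details here, a compact sketch of multi-head scaled dot-product self-attention may help. Everything in it is illustrative: ToySelfAttention and split_heads are hypothetical names, head_mask is left out, and the additive mask built from 0/1 padding flags with a factor of -10000.0 is an assumption about how the mask is prepared before it reaches this module.

```python
import math

import torch
from torch import nn


class ToySelfAttention(nn.Module):
    """Hypothetical sketch of multi-head scaled dot-product self-attention."""

    def __init__(self, hidden_size=16, num_heads=4, dropout_prob=0.0):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_size = hidden_size // num_heads
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout_prob)

    def split_heads(self, x):
        # (batch, seq, hidden) -> (batch, heads, seq, head_size)
        batch, seq, _ = x.shape
        return x.view(batch, seq, self.num_heads, self.head_size).permute(0, 2, 1, 3)

    def forward(self, hidden_states, attention_mask=None):
        q = self.split_heads(self.query(hidden_states))
        k = self.split_heads(self.key(hidden_states))
        v = self.split_heads(self.value(hidden_states))

        # Similarity of every query with every key, scaled by sqrt(head_size).
        scores = torch.matmul(q, k.transpose(-1, -2)) / math.sqrt(self.head_size)
        if attention_mask is not None:
            # Additive mask: 0 for visible tokens, a large negative number for padding.
            scores = scores + attention_mask
        attention_probs = self.dropout(torch.softmax(scores, dim=-1))

        context = torch.matmul(attention_probs, v)
        batch, _, seq, _ = context.shape
        # Merge the heads back: (batch, heads, seq, head_size) -> (batch, seq, hidden)
        context = context.permute(0, 2, 1, 3).contiguous().view(batch, seq, -1)
        return context, attention_probs


hidden_states = torch.randn(2, 4, 16)
padding_mask = torch.tensor([[1, 1, 1, 1], [1, 1, 0, 0]], dtype=torch.float)
additive_mask = (1.0 - padding_mask)[:, None, None, :] * -10000.0
context_layer, attention_probs = ToySelfAttention()(hidden_states, additive_mask)
print(context_layer.shape, attention_probs.shape)
```

Adding the mask before the softmax, rather than zeroing probabilities afterwards, keeps each row of attention_probs summing to one over the visible tokens.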

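BertSelfOutput

forward takes: hidden_states (the output of BertSelfAttention), input_tensor (the input_tensor of BertAttention, i.e. the input to BertSelfAttention)

Output: hidden_states

Flow: dense layer on hidden_states -> dropout -> add input_tensor to the result and apply LayerNorm (see BertLayerNorm). Adding the input back is the residual connection known from residual networks, usually explained as a guard against vanishing gradients: output = LayerNorm(output + input_tensor).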
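The dense, dropout, add and normalize steps fit in a handful of lines. A minimal sketch, with ToySelfOutput as a hypothetical name and nn.LayerNorm standing in for BertLayerNorm:

```python
import torch
from torch import nn


class ToySelfOutput(nn.Module):
    """Hypothetical sketch: dense -> dropout -> residual add -> LayerNorm."""

    def __init__(self, hidden_size=16, dropout_prob=0.1):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)
        self.dropout = nn.Dropout(dropout_prob)
        self.layer_norm = nn.LayerNorm(hidden_size, eps=1e-12)

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dropout(self.dense(hidden_states))
        # Residual connection: add the sub-layer input back before normalizing.
        return self.layer_norm(hidden_states + input_tensor)


input_tensor = torch.randn(2, 4, 16)    # what went into self-attention
context_layer = torch.randn(2, 4, 16)   # what came out of self-attention
attention_output = ToySelfOutput()(context_layer, input_tensor)
print(attention_output.shape)
```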

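BertIntermediate

forward takes: hidden_states (the attention_output returned by BertAttention)

Output: hidden_states

Flow: dense layer on hidden_states with output size intermediate_size -> apply intermediate_act_fn, which is selected by config.hidden_act and is one of gelu, relu or swish. In Transformer terms this is the expanding half of the position-wise feed-forward network. As for why an intermediate layer exists: I skimmed the literature, and my guess is that it lets the model learn information at several levels, from surface features to syntax to semantics. Some studies also report that intermediate layers transfer better.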

BertOutput

forward takes: hidden_states (the output of BertIntermediate), input_tensor (the attention_output returned by BertAttention, i.e. the input to BertIntermediate)

Output: hidden_states

Flow: dense layer on hidden_states, projecting from intermediate_size back to hidden_size -> dropout -> add input_tensor to the result and apply LayerNorm (see BertLayerNorm). This is the same residual connection as in BertSelfOutput: output = LayerNorm(output + input_tensor).
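BertIntermediate and BertOutput together form the feed-forward half of a layer, so I sketch them as one module here. ToyFeedForward and its attribute names are hypothetical, gelu is picked as one of the activations the article lists, and the sizes are toy values.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ToyFeedForward(nn.Module):
    """Hypothetical sketch: expand to intermediate_size, project back, residual + LayerNorm."""

    def __init__(self, hidden_size=16, intermediate_size=64, dropout_prob=0.1):
        super().__init__()
        self.expand = nn.Linear(hidden_size, intermediate_size)    # BertIntermediate's dense
        self.project = nn.Linear(intermediate_size, hidden_size)   # BertOutput's dense
        self.dropout = nn.Dropout(dropout_prob)
        self.layer_norm = nn.LayerNorm(hidden_size, eps=1e-12)

    def forward(self, attention_output):
        intermediate = F.gelu(self.expand(attention_output))
        projected = self.dropout(self.project(intermediate))
        # Residual connection around the whole feed-forward block.
        return self.layer_norm(projected + attention_output)


ffn = ToyFeedForward()
attention_output = torch.randn(2, 4, 16)
intermediate_output = F.gelu(ffn.expand(attention_output))
layer_output = ffn(attention_output)
print(intermediate_output.shape, layer_output.shape)
```

The hidden vector is widened only inside the block; the output has the same size as the input, which is what allows the residual add.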

BertPooler

forward takes: hidden_states (the last-layer hidden states returned by BertEncoder)

Output: pooled_output, a pooled representation of the whole sequence

Flow: take the hidden state of the first token -> dense layer -> Tanh activation
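The pooling step is short enough to show in full. A sketch with a hypothetical class name and toy sizes:

```python
import torch
from torch import nn


class ToyPooler(nn.Module):
    """Hypothetical sketch: first token -> dense -> tanh."""

    def __init__(self, hidden_size=16):
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states):
        first_token = hidden_states[:, 0]          # (batch, hidden)
        return torch.tanh(self.dense(first_token))


sequence_output = torch.randn(2, 4, 16)            # (batch, seq, hidden)
pooled_output = ToyPooler()(sequence_output)
print(pooled_output.shape)
```

"Pooling" here does not average over tokens; it relies on the first token having attended to the whole sequence.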

If you have read this far, you understand the whole encoding process. What follows matters less, so I will go through it briefly.

BertConfig

Holds BERT's configuration parameters.

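BertOnlyMLMHead

Used when training the language model with the masked language modeling (MLM) objective; returns the prediction scores.

Flow: call BertLMPredictionHead and return its prediction_scores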

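BertLMPredictionHead

Acts as the decoder: it maps hidden states back to scores over the vocabulary.

Flow: call BertPredictionHeadTransform -> linear layer whose output dimension is vocab_size

BertPredictionHeadTransform

Flow: dense -> activation (gelu, relu or swish) -> LayerNorm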
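Put together, the transform and the vocabulary projection look roughly like this. ToyLMPredictionHead is a hypothetical name, gelu is one of the listed activations, and the decoder here is a plain nn.Linear; sharing its weights with the input word embeddings, as BERT does, is left out of this sketch.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ToyLMPredictionHead(nn.Module):
    """Hypothetical sketch: dense -> gelu -> LayerNorm -> linear to vocab_size."""

    def __init__(self, hidden_size=16, vocab_size=100):
        super().__init__()
        self.transform_dense = nn.Linear(hidden_size, hidden_size)
        self.transform_norm = nn.LayerNorm(hidden_size, eps=1e-12)
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, sequence_output):
        hidden = self.transform_norm(F.gelu(self.transform_dense(sequence_output)))
        return self.decoder(hidden)  # one score per vocabulary entry, per token


sequence_output = torch.randn(2, 4, 16)
prediction_scores = ToyLMPredictionHead()(sequence_output)
print(prediction_scores.shape)
```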

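BertOnlyNSPHead

Used when training with the next sentence prediction (NSP) objective; returns a score for each of the two classes (is or is not the next sentence).

Flow: a linear layer with output size 2, returning seq_relationship_score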

BertPreTrainingHeads

Contains both the MLM and the NSP heads. Its inputs are the encoder outputs sequence_output and pooled_output.

Returns (prediction_scores, seq_relationship_score), the scores for MLM and NSP respectively.

BertPreTrainedModel

Loads pretrained BERT weights, looking them up through the global variable BERT_PRETRAINED_MODEL_ARCHIVE_MAP.

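BertForPreTraining

Computes the scores and the loss.

It runs BertPreTrainingHeads to get the predictions and computes the loss from them; that loss is what gets backpropagated during training.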
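How the two scores turn into one loss can be sketched with random tensors in place of a real model. The label convention (-1 for tokens that should be ignored) and the plain sum of the two losses are assumptions for this illustration; check the docstring of the version you are using.

```python
import torch
from torch import nn

torch.manual_seed(0)
vocab_size, batch, seq = 100, 2, 4

# Random stand-ins for the outputs of the pre-training heads.
prediction_scores = torch.randn(batch, seq, vocab_size, requires_grad=True)  # MLM logits
seq_relationship_score = torch.randn(batch, 2)                               # NSP logits

# Assumed convention: -1 marks positions that were not masked and carry no loss.
masked_lm_labels = torch.tensor([[-1, 17, -1, -1], [-1, -1, 42, -1]])
next_sentence_label = torch.tensor([0, 1])

loss_fct = nn.CrossEntropyLoss(ignore_index=-1)
masked_lm_loss = loss_fct(prediction_scores.view(-1, vocab_size),
                          masked_lm_labels.view(-1))
next_sentence_loss = loss_fct(seq_relationship_score.view(-1, 2),
                              next_sentence_label.view(-1))
total_loss = masked_lm_loss + next_sentence_loss

# Backpropagation is a separate, explicit step on the returned loss.
total_loss.backward()
print(total_loss.item())
```

Positions labeled -1 receive no gradient, so only the masked tokens drive the MLM part of the update.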

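BertForMaskedLM

Computes the loss for the MLM objective only.

BertForNextSentencePrediction

Computes the loss for the NSP objective only.

BertForSequenceClassification

Computes the loss for sentence classification.

BertForMultipleChoice

Computes the loss for multiple-choice tasks.

BertForTokenClassification

Computes the loss for token classification or tagging.

BertForQuestionAnswering

Computes the loss for question answering.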
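These task models all follow the same pattern: a small head on top of the encoder output plus a loss. As one example, a sentence classification head could look like the sketch below. ToySequenceClassifier is a hypothetical name, the random pooled_output stands in for a real BertModel output, and only the cross-entropy case is shown.

```python
import torch
from torch import nn


class ToySequenceClassifier(nn.Module):
    """Hypothetical sketch: pooled_output -> dropout -> linear -> optional loss."""

    def __init__(self, hidden_size=16, num_labels=3, dropout_prob=0.1):
        super().__init__()
        self.num_labels = num_labels
        self.dropout = nn.Dropout(dropout_prob)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, pooled_output, labels=None):
        logits = self.classifier(self.dropout(pooled_output))
        if labels is None:
            return (logits,)
        loss = nn.CrossEntropyLoss()(logits.view(-1, self.num_labels), labels.view(-1))
        return (loss, logits)  # loss first when labels are given


pooled_output = torch.randn(2, 16)   # stand-in for the pooler output
labels = torch.tensor([0, 2])
loss, logits = ToySequenceClassifier()(pooled_output, labels)
print(loss.item(), logits.shape)
```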

Global variables

BERT_START_DOCSTRING

BERT_INPUTS_DOCSTRING

BERT_PRETRAINED_CONFIG_ARCHIVE_MAP

BERT_PRETRAINED_MODEL_ARCHIVE_MAP

Decorators

Looking at the overall structure, BertModel is the place to start reading.
