SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Smoothness-inducing Adversarial Regularization

The fine-tuning problem is formulated as the following optimization:

$$\min_{\theta}\; \mathcal{F}(\theta) = \mathcal{L}(\theta) + \lambda_s \mathcal{R}_s(\theta)$$

  • $\theta$ denotes the fine-tuning parameters and $\mathcal{L}(\theta)$ is the ordinary task loss
  • $\mathcal{R}_s(\theta)$ is the smoothness-inducing adversarial regularizer
  • $$\mathcal{R}_s(\theta) = \frac{1}{n}\sum_{i=1}^{n}\max_{\lVert \tilde{x}_i - x_i \rVert_p \le \epsilon} \ell_s\big(f(\tilde{x}_i;\theta),\, f(x_i;\theta)\big)$$
  • $\ell_s$ measures how close two output distributions are; for classification it is the symmetrized KL divergence $\ell_s(P,Q) = \mathcal{D}_{\mathrm{KL}}(P\,\Vert\,Q) + \mathcal{D}_{\mathrm{KL}}(Q\,\Vert\,P)$
  • For regression, the symmetrized KL divergence above is replaced by the squared loss $\ell_s(p,q) = (p-q)^2$
  • This largely follows VAT: the adversarial objective is placed in a regularization term that smooths the model around each data point (see VAT for details)
  • As a result, the model produces (almost) the same output distribution under small input perturbations, which improves robustness; a minimal sketch of this regularizer follows the list
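
A minimal PyTorch sketch of this regularizer, as a sketch under assumptions rather than the paper's exact implementation: `model(inputs_embeds=...)` is assumed to be a HuggingFace-style classifier wrapper that returns logits, the inner maximization is approximated with a single sign-gradient ascent step instead of the paper's multi-step projected ascent, and `smart_regularizer`, `epsilon`, and `step_size` are illustrative names.

```python
import torch
import torch.nn.functional as F

def symmetrized_kl(logits_p, logits_q):
    """ℓ_s for classification: D_KL(P‖Q) + D_KL(Q‖P), averaged over the batch."""
    p, q = F.log_softmax(logits_p, dim=-1), F.log_softmax(logits_q, dim=-1)
    kl_pq = F.kl_div(q, p.exp(), reduction="batchmean")  # D_KL(P‖Q)
    kl_qp = F.kl_div(p, q.exp(), reduction="batchmean")  # D_KL(Q‖P)
    return kl_pq + kl_qp

def smart_regularizer(model, embeds, epsilon=1e-5, step_size=1e-3):
    """R_s(θ): worst-case output change under a small perturbation of the input embeddings."""
    clean_logits = model(inputs_embeds=embeds)
    # Inner maximization: start from small random noise, take one ascent step on ℓ_s.
    noise = (torch.randn_like(embeds) * epsilon).requires_grad_()
    adv_logits = model(inputs_embeds=embeds.detach() + noise)
    inner_loss = symmetrized_kl(adv_logits, clean_logits.detach())
    grad, = torch.autograd.grad(inner_loss, noise)
    # One sign-gradient step, projected back onto the ℓ∞ ball of radius ε.
    noise = (noise + step_size * grad.sign()).clamp(-epsilon, epsilon).detach()
    # Outer value: ℓ_s between perturbed and clean outputs (gradient flows to θ).
    adv_logits = model(inputs_embeds=embeds + noise)
    return symmetrized_kl(adv_logits, clean_logits)
```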

Bregman Proximal Point Optimization

  • A Bregman proximal point style method is used to solve the fine-tuning objective above: each iteration adds a strong penalty that keeps the new iterate close to the previous one, so the model is not updated too aggressively. This keeps the learned manifold smoother, makes the loss change more gradually, strengthens resistance to perturbations, and helps avoid catastrophic forgetting of the pre-trained knowledge.
    $$\theta_{t+1} = \arg\min_{\theta}\; \mathcal{F}(\theta) + \mu\, \mathcal{D}_{\mathrm{Breg}}(\theta, \theta_t), \qquad \mathcal{D}_{\mathrm{Breg}}(\theta, \theta_t) = \frac{1}{n}\sum_{i=1}^{n} \ell_s\big(f(x_i;\theta),\, f(x_i;\theta_t)\big)$$
  • Momentum is added for acceleration (see the sketch after this list):
    $$\theta_{t+1} = \arg\min_{\theta}\; \mathcal{F}(\theta) + \mu\, \mathcal{D}_{\mathrm{Breg}}(\theta, \tilde{\theta}_t), \qquad \tilde{\theta}_t = (1-\beta)\,\theta_t + \beta\,\tilde{\theta}_{t-1}$$
  • This is simply an exponential moving average of the iterates, with $\beta$ as the momentum parameter.
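
A minimal sketch of the proximal term and the momentum (EMA) copy $\tilde{\theta}_t$, reusing the `symmetrized_kl` helper from the previous sketch; `update_ema`, `bregman_divergence`, `tilde_model`, and `beta` are illustrative names, and the EMA copy is assumed to be a frozen clone of the model being fine-tuned.

```python
import torch

@torch.no_grad()
def update_ema(tilde_model, model, beta=0.99):
    """θ̃_t = (1 − β) θ_t + β θ̃_{t−1}: exponential moving average of the iterates."""
    for p_tilde, p in zip(tilde_model.parameters(), model.parameters()):
        p_tilde.mul_(beta).add_(p, alpha=1.0 - beta)

def bregman_divergence(model, tilde_model, embeds):
    """D_Breg(θ, θ̃_t): ℓ_s between the current model and the EMA model on the same inputs."""
    with torch.no_grad():
        ref_logits = tilde_model(inputs_embeds=embeds)  # f(x; θ̃_t), held fixed
    cur_logits = model(inputs_embeds=embeds)            # f(x; θ), receives gradients
    return symmetrized_kl(cur_logits, ref_logits)

# The EMA copy typically starts as a frozen clone of the model being fine-tuned:
#   tilde_model = copy.deepcopy(model).eval()
#   for p in tilde_model.parameters():
#       p.requires_grad_(False)
```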

Final Objective

  • The final per-iteration objective combines everything above:
    $$\theta_{t+1} = \arg\min_{\theta}\; \mathcal{L}(\theta) + \lambda_s \mathcal{R}_s(\theta) + \mu\, \mathcal{D}_{\mathrm{Breg}}(\theta, \tilde{\theta}_t)$$
    The paper's Algorithm 1 gives the corresponding pseudocode (the inner maximization in $\mathcal{R}_s$ is solved by a few projected gradient ascent steps); a Python sketch of one training step follows below.
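
For concreteness, a hedged sketch of one training step that combines the task loss, the adversarial regularizer, and the Bregman proximal term, reusing the helpers from the sketches above; `lambda_s`, `mu`, and `beta` are illustrative hyper-parameter names, and the paper's Algorithm 1 actually runs several optimizer updates per proximal iteration before refreshing $\tilde{\theta}$.

```python
import torch.nn.functional as F

def smart_training_step(model, tilde_model, optimizer, embeds, labels,
                        lambda_s=1.0, mu=1.0, beta=0.99):
    """One update of θ for L(θ) + λ_s R_s(θ) + μ D_Breg(θ, θ̃_t)."""
    optimizer.zero_grad()
    logits = model(inputs_embeds=embeds)
    task_loss = F.cross_entropy(logits, labels)                 # L(θ)
    adv_loss = smart_regularizer(model, embeds)                 # R_s(θ)
    breg_loss = bregman_divergence(model, tilde_model, embeds)  # D_Breg(θ, θ̃_t)
    loss = task_loss + lambda_s * adv_loss + mu * breg_loss
    loss.backward()
    optimizer.step()
    update_ema(tilde_model, model, beta=beta)  # refresh θ̃ for the next iteration
    return loss.item()
```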

Experiments

For ensembles, applying this fine-tuning on top of MT-DNN reached state-of-the-art results at the time.
For single models, combining it with RoBERTa also reached state of the art.

Summary

  • The paper is very concise, yet the results are strong.
  • It offers a new angle on adversarial training in NLP, especially for fine-tuning: treating the adversarial objective as a regularization term is an idea that can strongly inspire follow-up work.