A Probabilistic Formulation of Unsupervised Text Style Transfer

Recasting the unsupervised problem

$X = \{x^{(1)}, \dots, x^{(m)}\}$ is the observed data from domain $D_1$, and $Y = \{y^{(m+1)}, \dots, y^{(n)}\}$ is the observed data from domain $D_2$. Matching superscripts denote parallel sentences.

To complete the data into a parallel corpus, introduce latent sentences: let $\bar{X} = \{\bar{x}^{(m+1)}, \dots, \bar{x}^{(n)}\}$ be the latent part of $D_1$ and $\bar{Y} = \{\bar{y}^{(1)}, \dots, \bar{y}^{(m)}\}$ be the latent part of $D_2$.



The task now becomes inferring the latent sentences $\bar{X}, \bar{Y}$ from the observed $X, Y$, that is, learning the posteriors $p(\bar{y} \mid x)$ and $p(\bar{x} \mid y)$.

Probabilistic model

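Learning $p(\bar{y} \mid x)$ and $p(\bar{x} \mid y)$ directly is difficult, so the joint probability $p(X, \bar{X}, Y, \bar{Y})$ is modeled instead.
Because every observed sentence is generated from its latent counterpart, the joint factorizes as
$$
p(X, \bar{X}, Y, \bar{Y}; \theta_{x|\bar{y}}, \theta_{y|\bar{x}}) = \prod_{i=1}^{m} p(x^{(i)} \mid \bar{y}^{(i)}; \theta_{x|\bar{y}})\, p_{D_2}(\bar{y}^{(i)}) \prod_{j=m+1}^{n} p(y^{(j)} \mid \bar{x}^{(j)}; \theta_{y|\bar{x}})\, p_{D_1}(\bar{x}^{(j)})
$$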

  • $p(x \mid \bar{y}; \theta_{x|\bar{y}})$ and $p(y \mid \bar{x}; \theta_{y|\bar{x}})$ are the transduction models between the two domains, one per direction ($D_2$ to $D_1$ and $D_1$ to $D_2$)
  • $\theta_{x|\bar{y}}$ and $\theta_{y|\bar{x}}$ are the corresponding parameters
  • $p_{D_1}(\cdot)$ and $p_{D_2}(\cdot)$ are the priors over latent sentences (in the paper, language models for each domain)

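The corresponding log marginal likelihood of the observed data sums out the latent sentences:
$$
\log p(X, Y; \theta_{x|\bar{y}}, \theta_{y|\bar{x}}) = \log \sum_{\bar{X}} \sum_{\bar{Y}} p(X, \bar{X}, Y, \bar{Y}; \theta_{x|\bar{y}}, \theta_{y|\bar{x}})
$$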

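The paper uses seq2seq models as the transduction models above.

In principle the model should be trained by maximizing this likelihood. Because the sum over all possible latent sentences is hard to compute, amortized variational inference is used to obtain a lower bound on the log likelihood (the ELBO).

This is essentially the same machinery as a VAE.

$$
\begin{aligned}
\log p(X, Y; \theta_{x|\bar{y}}, \theta_{y|\bar{x}}) \ge{}& \mathcal{L}_{\mathrm{ELBO}}(X, Y; \theta_{x|\bar{y}}, \theta_{y|\bar{x}}, \phi_{\bar{x}|y}, \phi_{\bar{y}|x}) \\
={}& \sum_{i=1}^{m} \Big[ \mathbb{E}_{q(\bar{y} \mid x^{(i)}; \phi_{\bar{y}|x})} \big[ \log p(x^{(i)} \mid \bar{y}; \theta_{x|\bar{y}}) \big] - D_{\mathrm{KL}}\big( q(\bar{y} \mid x^{(i)}; \phi_{\bar{y}|x}) \,\|\, p_{D_2}(\bar{y}) \big) \Big] \\
&+ \sum_{j=m+1}^{n} \Big[ \mathbb{E}_{q(\bar{x} \mid y^{(j)}; \phi_{\bar{x}|y})} \big[ \log p(y^{(j)} \mid \bar{x}; \theta_{y|\bar{x}}) \big] - D_{\mathrm{KL}}\big( q(\bar{x} \mid y^{(j)}; \phi_{\bar{x}|y}) \,\|\, p_{D_1}(\bar{x}) \big) \Big]
\end{aligned}
$$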


$q(\bar{y} \mid x; \phi_{\bar{y}|x})$ and $q(\bar{x} \mid y; \phi_{\bar{x}|y})$ approximate the model's true posteriors $p(\bar{y} \mid x)$ and $p(\bar{x} \mid y)$.
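To make the lower bound concrete, here is a minimal numerical sketch for a single observed sentence x. It assumes a toy latent space of only three candidate sentences, with made-up prior and likelihood values; the function names are hypothetical and not from the paper. It checks that the reconstruction term minus the KL term never exceeds the log marginal likelihood, and that the bound becomes tight when q equals the true posterior.

```python
import math

# Toy numbers, invented for illustration only:
# three candidate latent sentences y_bar for one observed sentence x.
prior = [0.5, 0.3, 0.2]           # stands in for the prior p_D2(y_bar)
likelihood = [0.10, 0.40, 0.05]   # stands in for p(x | y_bar)


def log_marginal(prior, likelihood):
    """log p(x) = log sum over y_bar of p(x | y_bar) * p(y_bar)."""
    return math.log(sum(l * p for l, p in zip(likelihood, prior)))


def elbo(q, prior, likelihood):
    """E_q[log p(x | y_bar)] - KL(q || prior) for a discrete q."""
    recon = sum(qi * math.log(l) for qi, l in zip(q, likelihood) if qi > 0)
    kl = sum(qi * math.log(qi / p) for qi, p in zip(q, prior) if qi > 0)
    return recon - kl


def true_posterior(prior, likelihood):
    """p(y_bar | x) obtained by normalizing the joint p(x | y_bar) * p(y_bar)."""
    joint = [l * p for l, p in zip(likelihood, prior)]
    z = sum(joint)
    return [j / z for j in joint]


q_rough = [0.2, 0.6, 0.2]  # an arbitrary approximate posterior
print("log marginal      :", log_marginal(prior, likelihood))
print("ELBO, rough q     :", elbo(q_rough, prior, likelihood))
print("ELBO, q=posterior :", elbo(true_posterior(prior, likelihood), prior, likelihood))
```

In the real model the latent space is the set of all sentences, so the sums cannot be enumerated; that is why q is a learned seq2seq model and the expectation is approximated by sampling or decoding.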

$p(y \mid \bar{x}; \theta_{y|\bar{x}})$ and $q(\bar{y} \mid x; \phi_{\bar{y}|x})$ both map from $D_1$ to $D_2$, so their parameters can be shared.

  • Hence $\phi_{\bar{y}|x} = \theta_{y|\bar{x}}$
  • Likewise $\phi_{\bar{x}|y} = \theta_{x|\bar{y}}$

So only two encoder-decoder models need to be trained, one per direction.

Going further


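  • For tasks that transfer in both directions between two corpora, a single encoder and decoder can be used for both directions, with a domain embedding c fed in between to indicate the target direction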

Gradients

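The reconstruction term and the KL term are hard to differentiate, because both depend on discrete sentences sampled from q. The Gumbel-softmax estimator is used to estimate the gradients, while the reconstruction term is computed from greedy-decoded sentences without recording gradients.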
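As an illustration of the estimator mentioned above, the following sketch draws one relaxed sample over a tiny vocabulary using only the standard library. The logits, the temperature `tau`, and the function names are assumptions made for this example. The straight-through variant shown is one common way to use Gumbel-softmax: only its forward value is computed here, because plain Python has no autograd. In a real framework the backward pass would use the gradient of the soft sample.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via the inverse-CDF transform."""
    u = max(rng.random(), 1e-12)  # keep u away from 0 so log(u) stays finite
    return -math.log(-math.log(u))


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + gumbel_noise) / tau)."""
    noisy = [(l + sample_gumbel(rng)) / tau for l in logits]
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    z = sum(exps)
    return [e / z for e in exps]


def straight_through_forward(soft):
    """Forward value of the straight-through variant: one-hot at argmax(soft)."""
    k = max(range(len(soft)), key=soft.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(soft))]


rng = random.Random(0)
logits = [2.0, 0.5, -1.0]  # hypothetical scores over a 3-word vocabulary
soft = gumbel_softmax(logits, tau=0.5, rng=rng)
hard = straight_through_forward(soft)
print("soft sample:", soft)
print("hard sample:", hard)
```

Lower values of `tau` push the soft sample closer to one-hot at the cost of higher-variance gradients.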

Self-reconstruction

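  • Early in training the encoder and decoder do not yet produce good outputs, so a self-reconstruction loss is added to the objective
  • $\mathcal{L}_{\mathrm{rec}} = -\alpha \sum_{i=1}^{m} \log p_{\mathrm{rec}}(x^{(i)} \mid x^{(i)}, c_x; \theta) - \alpha \sum_{j=m+1}^{n} \log p_{\mathrm{rec}}(y^{(j)} \mid y^{(j)}, c_y; \theta)$
  • $c_x$ and $c_y$ are the domain embeddings of x and y
  • $\alpha$ is a decay weight that is reduced from 1 to 0 over the first k epochs; k is 3 in the paper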
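The decay of the self-reconstruction weight can be sketched as a small schedule function. The notes only say that the weight goes from 1 to 0 over k epochs, so the linear shape used here is an assumption, and `self_recon_weight` is a hypothetical helper name.

```python
def self_recon_weight(epoch, k=3.0):
    """Weight of the self-reconstruction loss, ASSUMING a linear decay.

    `epoch` may be fractional, e.g. 1.5 means halfway through the second epoch.
    The weight is 1 at epoch 0, reaches 0 at epoch k, and stays 0 afterwards.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1.0 - epoch / k)


# Print the weight at the start of each of the first few epochs.
for epoch in [0, 1, 2, 3, 4]:
    print(epoch, self_recon_weight(epoch))
```

During training this weight would multiply the self-reconstruction loss before it is added to the main objective, so the extra term vanishes once the models have had a few epochs to warm up.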