How Loss Is Computed in NLP: Vector Computation and Loss Values

Reposted

mob64ca14031c97 2023-12-11 15:47:42


1. BCELoss

2. CELoss

3. MSELoss

4. FocalLoss

5. DiceLoss

1. BCELoss

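Binary cross entropy (BCE) is used for binary classification.

The formula is as follows, where $y$ is the ground truth and $\hat{y}$ is the prediction:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

Notes:

The ground truth must be encoded as 0/1 values (for example via one-hot encoding);
The prediction can be mapped into the interval (0, 1) with a sigmoid;
The ground truth and the prediction must have the same shape.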

Usage:

class torch.nn.BCELoss


Examples::
        >>> loss = nn.BCELoss()
        >>> activation = nn.Sigmoid()  # maps to (0, 1)
        >>> input = torch.randn(3, requires_grad=True)  # 3 random numbers, shape = torch.Size([3])
        >>> target = torch.empty(3).random_(2)  # 3 values, each 0 or 1, shape = torch.Size([3])
        >>> output = loss(activation(input), target)  # activation(input) yields probabilities p in (0, 1)
        >>> output.backward()
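
As a quick check of the BCE formula, the following minimal sketch computes the loss by hand and compares it with `nn.BCELoss`. The tensor values are arbitrary, and the comparison with `nn.BCEWithLogitsLoss` (which applies the sigmoid internally) is an addition not covered in the original text.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3)                  # raw scores, before the sigmoid
target = torch.tensor([1.0, 0.0, 1.0])   # ground truth encoded as 0/1
p = torch.sigmoid(logits)                # predictions in (0, 1)

# The formula applied element by element, then averaged (the default reduction).
manual = -(target * torch.log(p) + (1 - target) * torch.log(1 - p)).mean()
builtin = nn.BCELoss()(p, target)
# BCEWithLogitsLoss takes the raw logits and fuses the sigmoid into the loss.
fused = nn.BCEWithLogitsLoss()(logits, target)

print(manual.item(), builtin.item(), fused.item())
```

All three values should agree up to floating-point error. In practice the fused variant is usually the numerically safer choice, because it never evaluates `log` on a probability that has been rounded to 0 or 1.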

2. CELoss

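Cross entropy (CE) is used for multi-class classification.

The formula is as follows, where $y$ is the (one-hot) ground truth and $\hat{y}$ is the predicted probability:

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\log \hat{y}_{i,c}$$

Usage: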

class torch.nn.CrossEntropyLoss


It is useful when training a classification problem with `C` classes.
The `input` is expected to contain scores for each class.

Examples::

        >>> loss = nn.CrossEntropyLoss()
        >>> input = torch.randn(3, 5, requires_grad=True)  # 3 rows, 5 columns; each row holds the class scores of one sample
        >>> target = torch.empty(3, dtype=torch.long).random_(5)  # 3 class indices, each in 0, 1, ..., 4
        >>> output = loss(input, target)
        >>> output.backward()
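
The cross entropy can also be reproduced by hand, which makes clear that `nn.CrossEntropyLoss` expects raw scores and class indices rather than probabilities and one-hot vectors. This is a minimal sketch with arbitrary values; the variable names are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(3, 5)          # 3 samples, 5 classes, raw scores (logits)
target = torch.tensor([1, 0, 4])    # class index of each sample

# Softmax + log, then pick the log-probability of the true class of each row.
log_probs = F.log_softmax(scores, dim=1)
manual = -log_probs[torch.arange(3), target].mean()

builtin = nn.CrossEntropyLoss()(scores, target)
print(manual.item(), builtin.item())
```

Because the softmax is applied inside the loss, the model's last layer should output raw scores when this loss is used.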

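The following version supports both multi-class segmentation and multi-class classification:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CELoss(nn.Module):  # class name and constructor are assumed; the source shows only forward()
    def __init__(self, weight=None, reduction='mean'):
        super().__init__()
        self.weight = weight
        self.reduction = reduction

    def forward(self, predictive, target):
        """
        :param predictive: classification: (b, c); semantic segmentation: (b, c, h, w)
        :param target: classification: (b,); semantic segmentation: one-hot (b, c, h, w)
        :return: the cross-entropy loss
        """
        # seg: predictive.shape [b,c,h,w] -> [n,c]. target.shape [b,c,h,w] -> [n, ]
        # cls: predictive.shape [b,c] -> [n,c]. target.shape [b,] -> [n, ]
        c = predictive.size()[1]
        if len(predictive.shape) == 4:  # seg, [b,c,h,w] -> [b*h*w, c]
            predictive = predictive.permute(0, 2, 3, 1).contiguous().view(-1, c)
            target = torch.argmax(target, dim=1)  # target is one-hot. [b,c,h,w] -> [b, h, w]
            target = target.view(-1)  # [b, h, w] -> (n, )

        # predictive.shape == [n, c], target.shape == [n,]
        # F.cross_entropy takes class indices and handles the one-hot comparison internally
        return F.cross_entropy(predictive, target.long(), weight=self.weight, reduction=self.reduction)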
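
The reshaping used for segmentation above can be checked in isolation. The sketch below, with made-up sizes, flattens a `[b, c, h, w]` score map to `[b*h*w, c]` and compares the result with calling `F.cross_entropy` directly on the 4-D input, which PyTorch also accepts when the target holds class indices of shape `[b, h, w]`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
b, c, h, w = 2, 3, 4, 4
scores = torch.randn(b, c, h, w)                     # per-pixel class scores
labels = torch.randint(0, c, (b, h, w))              # per-pixel class indices
one_hot = F.one_hot(labels, c).permute(0, 3, 1, 2)   # [b,h,w,c] -> [b,c,h,w]

# Flattened path: move the class axis last, then merge batch and spatial axes.
flat_scores = scores.permute(0, 2, 3, 1).reshape(-1, c)   # [b*h*w, c]
flat_labels = one_hot.argmax(dim=1).reshape(-1)           # [b*h*w]
flat_loss = F.cross_entropy(flat_scores, flat_labels)

# Direct path: 4-D scores with 3-D index targets.
direct_loss = F.cross_entropy(scores, labels)
print(flat_loss.item(), direct_loss.item())
```

Both paths average over the same `b*h*w` pixels, so the two losses should match. The flattened form is mainly useful when the target arrives one-hot encoded, as in the code above.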

3. MSELoss

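Computes the mean squared error (MSE), i.e. the squared L2 norm.

The formula is as follows, where $y$ is the ground truth and $\hat{y}$ is the prediction:

$$L = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$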

class torch.nn.MSELoss

Creates a criterion that measures the mean squared error (squared L2 norm) between
    each element in the input `x` and target `y`.

Examples::

        >>> loss = nn.MSELoss()
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.randn(3, 5)
        >>> output = loss(input, target)
        >>> output.backward()
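
A small worked example, with values chosen so the arithmetic is easy to follow, shows what the default `mean` reduction and the optional `sum` reduction return. The numbers are illustrative only.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[2.0, 0.0], [1.0, 3.0]])
target = torch.tensor([[1.0, 0.0], [3.0, 3.0]])

# Differences are 1, 0, -2, 0, so the squared errors are 1, 0, 4, 0.
manual = ((pred - target) ** 2).mean()               # (1 + 0 + 4 + 0) / 4
builtin = nn.MSELoss()(pred, target)                 # default reduction='mean'
summed = nn.MSELoss(reduction="sum")(pred, target)   # 1 + 0 + 4 + 0

print(manual.item(), builtin.item(), summed.item())  # 1.25 1.25 5.0
```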

Handling extreme class imbalance

4. FocalLoss

https://arxiv.org/pdf/1708.02002.pdf

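Focal loss is a binary classification loss designed mainly to address the severe imbalance between positive and negative samples in one-stage object detection. It reduces the weight that the large number of easy negative samples carry during training, and can also be understood as a form of hard example mining.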


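Focal loss is a modification of the cross entropy loss for binary classification (strictly speaking BCE, not CE). The binary cross entropy is:

$$CE(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1 - p) & \text{otherwise} \end{cases}$$

So with plain cross entropy, the loss of a positive sample shrinks as the output probability grows, and the loss of a negative sample shrinks as the output probability falls.

Writing $p_t = p$ for $y = 1$ and $p_t = 1 - p$ otherwise, the focal loss is:

$$FL(p_t) = -(1 - p_t)^{\gamma}\log(p_t)$$

Formula 4 (the focal loss above) gives, for y = 1, the relationship between the focal loss and the predicted probability p. With gamma = 0 it reduces to the binary cross entropy, loss = -log(p) for y = 1. Focal loss simply multiplies the original function by a weight, and the weight function for y = 1 can be plotted on its own: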

[Figure: weight function (1 - p)^gamma for y = 1, plotted against the probability of the ground truth class]

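The original BCE loss is shown in the left figure below. After applying the weight given by the purple line in the figure above, the focal loss looks like the right figure:

[Figure, left: BCE loss]

[Figure, right: focal loss]

Code used for the plots: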

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 1)
y1, y2, y3, y4, y5 = (1-x)**0, (1-x)**0.5, (1-x)**1, (1-x)**2, (1-x)**5
plt.plot(x, y1, 'red')
plt.plot(x, y2, 'green')
plt.plot(x, y3, 'blue')
plt.plot(x, y4, 'yellow')
plt.plot(x, y5, 'purple')
# plt.title('line chart')
plt.xlabel('probability of ground truth class')
plt.ylabel('Weight Value') 
plt.show()

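Note: weighting BCELoss with the weight function reproduces the curves shown in the paper. As the figure above shows, for gamma = 2 the weight function is monotonically decreasing. When the predicted probability is small (a hard sample), the weight applied by focal loss is large, so the loss is kept almost intact and hard samples stand out. When the predicted probability is large (an easy sample), the weight is small and the loss shrinks. This is harmless: with a large predicted probability (note that y = 1 here), a small loss is exactly what we want.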

import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0.01, 1)  # start slightly above 0 to avoid log(0)
ce = -np.log(x)  # cross entropy for y = 1
# focal loss = weight * cross entropy, for gamma = 0, 0.5, 1, 2, 5
y1, y2, y3, y4, y5 = (1-x)**0 * ce, (1-x)**0.5 * ce, (1-x)**1 * ce, (1-x)**2 * ce, (1-x)**5 * ce
plt.plot(x, y1, 'red')
plt.plot(x, y2, 'green')
plt.plot(x, y3, 'blue')
plt.plot(x, y4, 'yellow')
plt.plot(x, y5, 'purple')
# plt.title('line chart')
plt.xlabel('probability of ground truth class')
plt.ylabel('Focal loss')
plt.show()

[Figure: focal loss curves produced by the code above]
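
To put numbers on the weighting effect, the sketch below evaluates the loss of a positive sample (y = 1) at a few probabilities, once with gamma = 0 (plain cross entropy) and once with gamma = 2. The helper name `focal_loss_y1` is hypothetical and exists only for this illustration.

```python
import math

def focal_loss_y1(p, gamma):
    """Focal loss of a positive sample (y = 1) with predicted probability p."""
    return (1 - p) ** gamma * -math.log(p)

for p in (0.1, 0.5, 0.9):
    ce = focal_loss_y1(p, gamma=0)   # gamma = 0 gives plain cross entropy
    fl = focal_loss_y1(p, gamma=2)
    print(f"p={p}: CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.2f}")
```

With gamma = 2, a hard sample at p = 0.1 keeps about 81% of its cross entropy loss, while an easy sample at p = 0.9 keeps only about 1%.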

Remark: the corresponding figure in the paper is shown below.

[Figure: focal loss curves from the paper]

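In addition, a balancing factor alpha is introduced to offset the imbalance in the number of positive and negative samples themselves; the paper uses alpha = 0.25. In defect detection, where defective samples are very rare, the same balancing factor can be added. The paper states: "In practice α may be set by inverse class frequency or treated as a hyperparameter to set by cross validation." That is, alpha can be set inversely proportional to the number of samples in each class, or treated as a hyperparameter.

$$FL(p_t) = -\alpha_t (1 - p_t)^{\gamma}\log(p_t)$$

where $\alpha_t = \alpha$ for $y = 1$ and $\alpha_t = 1 - \alpha$ otherwise.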

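Since defective samples, the positive class, are scarce, one would expect alpha to be greater than 0.5. That holds only without the modulating factor $(1 - p_t)^{\gamma}$. Focal loss is controlled by two factors that interact, so neither can be judged in isolation, and in practice this parameter is found by trial.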

[Figure: experimental results from the paper; left: varying alpha only, right: varying gamma together with alpha]

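The experiments in the paper confirm this. The left part uses only alpha, without the $(1 - p_t)^{\gamma}$ weighting, and alpha = 0.75 performs best. The right part shows that gamma = 2 with alpha = 0.25 performs best. This is consistent with the alpha-balanced form $FL(p_t) = -\alpha_t (1 - p_t)^{\gamma}\log(p_t)$: positive samples, which tend to be the hard ones, are far fewer than negatives, but once $(1 - p_t)^{\gamma}$ has suppressed the loss of the many easy negatives, the positives need less extra emphasis, so a smaller alpha of 0.25 is enough.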

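import torch
import torch.nn as nn
import torch.nn.functional as F

# Binary classification
class FocalLoss(nn.Module):

    def __init__(self, gamma=2, alpha=0.25):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha

    def forward(self, input, target):
        # input: logits of shape (M, 2), where M is the batch size
        # target: labels of shape (M,), with values 0 or 1
        target = target.float()
        log_pt = F.log_softmax(input, dim=1)  # log-probabilities; avoids log(0)
        log_p, log_1mp = log_pt[:, 1], log_pt[:, 0]  # log(p) and log(1 - p)
        p = log_p.exp()  # predicted probability of the positive class
        loss = -self.alpha * (1 - p) ** self.gamma * target * log_p \
               - (1 - self.alpha) * p ** self.gamma * (1 - target) * log_1mp
        return loss.mean()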
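
When the model emits a single logit per sample instead of two, the same loss can be written on top of `F.binary_cross_entropy_with_logits`. The following is a sketch under that assumption, not the method used in the original post; the function name `binary_focal_loss` and its default arguments are hypothetical.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Alpha-balanced focal loss for one logit per sample (illustrative helper)."""
    targets = targets.float()
    # Per-sample BCE equals -log(p_t) and is computed stably from the logits.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # alpha for y=1, 1-alpha for y=0
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

torch.manual_seed(0)
logits = torch.randn(8)
targets = torch.randint(0, 2, (8,))

loss = binary_focal_loss(logits, targets)
# Sanity check: gamma = 0 and alpha = 0.5 leave plain BCE scaled by 0.5.
baseline = binary_focal_loss(logits, targets, gamma=0.0, alpha=0.5)
plain_bce = F.binary_cross_entropy_with_logits(logits, targets.float())
print(loss.item(), baseline.item(), plain_bce.item())
```

The sanity check is a cheap way to confirm that the two factors are wired correctly before tuning gamma and alpha.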

5. DiceLoss

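Reference: 医学图像分割之 Dice Loss - AI备忘录 (Dice Loss for Medical Image Segmentation, AI备忘录)

Dice loss is well suited to extremely imbalanced samples. The formula is:

$$L_{dice} = 1 - \frac{2|A \cap B|}{|A| + |B|}$$

The second term is the Dice coefficient, a similarity measure between the sets A and B.
(1) The numerator is the intersection of matrices A and B: multiply element by element, then sum.
(2) The denominator sums each matrix separately (adding up all of its elements), then adds the two sums.

For a binary problem, the GT mask contains only the values 0 and 1. Where G is 0, the pixel contributes nothing to the numerator whatever the prediction is. For an all-background mask the loss stays at 1 (without a smoothing term), so the gradient is zero and the weights are not updated. The loss therefore pays little attention to background regions.

Where G is 1, the larger the prediction, the smaller the loss, so the loss focuses on the foreground.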
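
The two observations above can be checked on tiny tensors. This minimal sketch uses a hypothetical helper `dice_loss` without any smoothing term; it first works through the formula on four pixels, then shows that an all-background mask yields a loss of 1 and a zero gradient.

```python
import torch

def dice_loss(probs, target):
    """1 - Dice coefficient for one flattened mask, without smoothing."""
    intersection = (probs * target).sum()   # element-wise product, then sum
    return 1 - 2 * intersection / (probs.sum() + target.sum())

# Worked numbers: intersection = 0.9 + 0.8 = 1.7, |A| = 2.0, |B| = 2.0,
# so Dice = 3.4 / 4.0 = 0.85 and the loss is 0.15.
pred = torch.tensor([0.9, 0.8, 0.1, 0.2])
mask = torch.tensor([1.0, 1.0, 0.0, 0.0])
print(dice_loss(pred, mask).item())

# All-background mask: the numerator is always 0, so the loss is stuck at 1.
logits = torch.zeros(4, requires_grad=True)
background_loss = dice_loss(torch.sigmoid(logits), torch.zeros(4))
background_loss.backward()
print(background_loss.item(), logits.grad)
```

This is one reason practical implementations add a smoothing constant to the numerator and denominator.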


import torch
import torch.nn as nn

class SoftDiceLoss(nn.Module):
    def __init__(self, weight=None, size_average=True):
        # weight and size_average are accepted but not used
        super().__init__()

    def forward(self, logits, targets):
        num = targets.size(0)  # batch size
        smooth = 1  # avoids division by zero on empty masks

        probs = torch.sigmoid(logits)  # to (0, 1)
        m1 = probs.reshape(num, -1)  # (b, c, h, w) -> (b, c*h*w)
        m2 = targets.reshape(num, -1).float()  # (b, c, h, w) -> (b, c*h*w)
        intersection = m1 * m2  # A * B, element-wise

        # Dice coefficient per sample, smoothed so it stays within (0, 1]
        score = (2. * intersection.sum(1) + smooth) / (m1.sum(1) + m2.sum(1) + smooth)
        return 1 - score.sum() / num  # mean Dice loss over the batch
