torch.nn.NLLLoss()

torch.nn.NLLLoss comes up frequently among the loss functions used for classification problems. It is usually not used as a standalone loss; instead it is combined with operations such as softmax and log to form the actual loss function.

  • Input shape: (N, C)
  • Target shape: (N)

Official documentation link for torch.nn.NLLLoss

1. Source code

class NLLLoss(_WeightedLoss):
    r"""The negative log likelihood loss. It is useful to train a classification
    problem with `C` classes.

    If provided, the optional argument :attr:`weight` should be a 1D Tensor assigning
    weight to each of the classes. This is particularly useful when you have an
    unbalanced training set.

    The `input` given through a forward call is expected to contain
    log-probabilities of each class. `input` has to be a Tensor of size either
    :math:`(minibatch, C)` or :math:`(minibatch, C, d_1, d_2, ..., d_K)`
    with :math:`K \geq 1` for the `K`-dimensional case (described later).

    Obtaining log-probabilities in a neural network is easily achieved by
    adding a  `LogSoftmax`  layer in the last layer of your network.
    You may use `CrossEntropyLoss` instead, if you prefer not to add an extra
    layer.

    The `target` that this loss expects should be a class index in the range :math:`[0, C-1]`
    where `C = number of classes`; if `ignore_index` is specified, this loss also accepts
    this class index (this index may not necessarily be in the class range).

    The unreduced (i.e. with :attr:`reduction` set to ``'none'``) loss can be described as:

    .. math::
        \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
        l_n = - w_{y_n} x_{n,y_n}, \quad
        w_{c} = \text{weight}[c] \cdot \mathbb{1}\{c \not= \text{ignore\_index}\},

    where :math:`x` is the input, :math:`y` is the target, :math:`w` is the weight, and
    :math:`N` is the batch size. If :attr:`reduction` is not ``'none'``
    (default ``'mean'``), then

    .. math::
        \ell(x, y) = \begin{cases}
            \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n}} l_n, &
            \text{if reduction} = \text{`mean';}\\
            \sum_{n=1}^N l_n,  &
            \text{if reduction} = \text{`sum'.}
        \end{cases}

    Can also be used for higher dimension inputs, such as 2D images, by providing
    an input of size :math:`(minibatch, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`,
    where :math:`K` is the number of dimensions, and a target of appropriate shape
    (see below). In the case of images, it computes NLL loss per-pixel.

    Args:
        weight (Tensor, optional): a manual rescaling weight given to each
            class. If given, it has to be a Tensor of size `C`. Otherwise, it is
            treated as if having all ones.
        size_average (bool, optional): Deprecated (see :attr:`reduction`). By default,
            the losses are averaged over each loss element in the batch. Note that for
            some losses, there are multiple elements per sample. If the field :attr:`size_average`
            is set to ``False``, the losses are instead summed for each minibatch. Ignored
            when :attr:`reduce` is ``False``. Default: ``True``
        ignore_index (int, optional): Specifies a target value that is ignored
            and does not contribute to the input gradient. When
            :attr:`size_average` is ``True``, the loss is averaged over
            non-ignored targets.
        reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the
            losses are averaged or summed over observations for each minibatch depending
            on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per
            batch element instead and ignores :attr:`size_average`. Default: ``True``
        reduction (string, optional): Specifies the reduction to apply to the output:
            ``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no reduction will
            be applied, ``'mean'``: the weighted mean of the output is taken,
            ``'sum'``: the output will be summed. Note: :attr:`size_average`
            and :attr:`reduce` are in the process of being deprecated, and in
            the meantime, specifying either of those two args will override
            :attr:`reduction`. Default: ``'mean'``

    Shape:
        - Input: :math:`(N, C)` where `C = number of classes`, or
          :math:`(N, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`
          in the case of `K`-dimensional loss.
        - Target: :math:`(N)` where each value is :math:`0 \leq \text{targets}[i] \leq C-1`, or
          :math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case of
          K-dimensional loss.
        - Output: scalar.
          If :attr:`reduction` is ``'none'``, then the same size as the target: :math:`(N)`, or
          :math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case
          of K-dimensional loss.

    Examples::

        >>> m = nn.LogSoftmax(dim=1)
        >>> loss = nn.NLLLoss()
        >>> # input is of size N x C = 3 x 5
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> # each element in target has to have 0 <= value < C
        >>> target = torch.tensor([1, 0, 4])
        >>> output = loss(m(input), target)
        >>> output.backward()
        >>>
        >>>
        >>> # 2D loss example (used, for example, with image inputs)
        >>> N, C = 5, 4
        >>> loss = nn.NLLLoss()
        >>> # input is of size N x C x height x width
        >>> data = torch.randn(N, 16, 10, 10)
        >>> conv = nn.Conv2d(16, C, (3, 3))
        >>> m = nn.LogSoftmax(dim=1)
        >>> # each element in target has to have 0 <= value < C
        >>> target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
        >>> output = loss(m(conv(data)), target)
        >>> output.backward()
    """
    __constants__ = ['ignore_index', 'reduction']
    ignore_index: int

    def __init__(self, weight: Optional[Tensor] = None, size_average=None, ignore_index: int = -100, reduce=None, reduction: str = 'mean') -> None:
        super(NLLLoss, self).__init__(weight, size_average, reduce, reduction)
        self.ignore_index = ignore_index

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        assert self.weight is None or isinstance(self.weight, Tensor)
        return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)

2. Example

from torch import nn
import torch

# NLLLoss must be instantiated first
nllloss = nn.NLLLoss(reduction='mean')  # reduction can be 'mean', 'sum' or 'none'; the default is 'mean'

# NLLLoss takes two tensors: the prediction and the label

# --------------------------- 1. predict with shape (1, category) ---------------------------
# predict holds one raw score per class; e.g. the vector (2, 5, 3) gives classes 0, 1, 2 the scores 2, 5 and 3
predict01 = torch.Tensor([[2, 5, 3]])  # shape: (n, category)
# label has shape (n,) and holds the correct class index of each of the n rows; here label is 2, so the correct class for (2, 5, 3) is class 2
label01 = torch.tensor([2])  # shape: (n,)
# NLLLoss takes, from each row of predict, the entry at the index given by label and negates it. The label is 2, so element 2 of (2, 5, 3), i.e. 3, is taken and returned with a minus sign.
loss01 = nllloss(predict01, label01)
print('loss01 = ', loss01)  # loss01 =  tensor(-3.)

# --------------------------- 2. predict with shape (n, category) ---------------------------
predict02 = torch.Tensor([[2, 5, 3],
                          [3, 1, 6]])
label02 = torch.tensor([1, 2])
# NLLLoss again takes, from each row of predict, the entry at the index given by label and negates it.
# The first label is 1, so element 1 of (2, 5, 3), i.e. 5, is taken; the second label is 2, so element 2 of (3, 1, 6), i.e. 6, is taken.
# The two values 5 and 6 are averaged and the result is negated.
loss02 = nllloss(predict02, label02)
print('loss02 = ', loss02)  # loss02 =  tensor(-5.5000)

Output:

loss01 =  tensor(-3.)
loss02 =  tensor(-5.5000)
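
The take-the-indexed-entry-and-negate behavior described in the comments can also be reproduced by hand. Below is a minimal sketch, reusing the predict02 and label02 tensors from above (the variable names picked and manual_loss are only for illustration):

import torch

predict02 = torch.Tensor([[2, 5, 3],
                          [3, 1, 6]])
label02 = torch.tensor([1, 2])

# Pick predict02[n, label02[n]] for each row, negate, then average over the batch
picked = predict02.gather(1, label02.unsqueeze(1)).squeeze(1)  # tensor([5., 6.])
manual_loss = -picked.mean()
print(manual_loss)  # tensor(-5.5000), same as nllloss(predict02, label02)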

torch.nn.CrossEntropyLoss (cross-entropy loss)

The relationship to nn.NLLLoss can be described as: softmax(x) + log(x) + nn.NLLLoss ====> nn.CrossEntropyLoss. In other words, nn.CrossEntropyLoss is equivalent to applying softmax, taking the log, and then feeding the result to nn.NLLLoss (a quick check of this equivalence is sketched after the shape notes below).

  • Input shape: (N, C)
  • Target shape: (N)
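
As a quick sanity check of this equivalence, the functional API can be used; this is only a sketch with random inputs:

import torch
import torch.nn.functional as F

x = torch.randn(4, 3)           # (N, C) raw scores
y = torch.tensor([0, 2, 1, 2])  # (N,) class indices

# cross_entropy == log_softmax followed by nll_loss
a = F.cross_entropy(x, y)
b = F.nll_loss(F.log_softmax(x, dim=1), y)
print(torch.allclose(a, b))  # True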

1. Source code

class CrossEntropyLoss(_WeightedLoss):
    r"""This criterion combines :class:`~torch.nn.LogSoftmax` and :class:`~torch.nn.NLLLoss` in one single class.

    It is useful when training a classification problem with `C` classes.
    If provided, the optional argument :attr:`weight` should be a 1D `Tensor`
    assigning weight to each of the classes.
    This is particularly useful when you have an unbalanced training set.

    The `input` is expected to contain raw, unnormalized scores for each class.

    `input` has to be a Tensor of size either :math:`(minibatch, C)` or
    :math:`(minibatch, C, d_1, d_2, ..., d_K)`
    with :math:`K \geq 1` for the `K`-dimensional case (described later).

    This criterion expects a class index in the range :math:`[0, C-1]` as the
    `target` for each value of a 1D tensor of size `minibatch`; if `ignore_index`
    is specified, this criterion also accepts this class index (this index may not
    necessarily be in the class range).

    The loss can be described as:

    .. math::
        \text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right)
                       = -x[class] + \log\left(\sum_j \exp(x[j])\right)

    or in the case of the :attr:`weight` argument being specified:

    .. math::
        \text{loss}(x, class) = weight[class] \left(-x[class] + \log\left(\sum_j \exp(x[j])\right)\right)

    The losses are averaged across observations for each minibatch. If the
    :attr:`weight` argument is specified then this is a weighted average:

    .. math::
        \text{loss} = \frac{\sum^{N}_{i=1} loss(i, class[i])}{\sum^{N}_{i=1} weight[class[i]]}

    Can also be used for higher dimension inputs, such as 2D images, by providing
    an input of size :math:`(minibatch, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`,
    where :math:`K` is the number of dimensions, and a target of appropriate shape
    (see below).


    Args:
        weight (Tensor, optional): a manual rescaling weight given to each class.
            If given, has to be a Tensor of size `C`
        size_average (bool, optional): Deprecated (see :attr:`reduction`). By default,
            the losses are averaged over each loss element in the batch. Note that for
            some losses, there are multiple elements per sample. If the field :attr:`size_average`
            is set to ``False``, the losses are instead summed for each minibatch. Ignored
            when :attr:`reduce` is ``False``. Default: ``True``
        ignore_index (int, optional): Specifies a target value that is ignored
            and does not contribute to the input gradient. When :attr:`size_average` is
            ``True``, the loss is averaged over non-ignored targets.
        reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the
            losses are averaged or summed over observations for each minibatch depending
            on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per
            batch element instead and ignores :attr:`size_average`. Default: ``True``
        reduction (string, optional): Specifies the reduction to apply to the output:
            ``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no reduction will
            be applied, ``'mean'``: the weighted mean of the output is taken,
            ``'sum'``: the output will be summed. Note: :attr:`size_average`
            and :attr:`reduce` are in the process of being deprecated, and in
            the meantime, specifying either of those two args will override
            :attr:`reduction`. Default: ``'mean'``

    Shape:
        - Input: :math:`(N, C)` where `C = number of classes`, or
          :math:`(N, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`
          in the case of `K`-dimensional loss.
        - Target: :math:`(N)` where each value is :math:`0 \leq \text{targets}[i] \leq C-1`, or
          :math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case of
          K-dimensional loss.
        - Output: scalar.
          If :attr:`reduction` is ``'none'``, then the same size as the target:
          :math:`(N)`, or
          :math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case
          of K-dimensional loss.

    Examples::

        >>> loss = nn.CrossEntropyLoss()
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.empty(3, dtype=torch.long).random_(5)
        >>> output = loss(input, target)
        >>> output.backward()
    """
    __constants__ = ['ignore_index', 'reduction']
    ignore_index: int

    def __init__(self, weight: Optional[Tensor] = None, size_average=None, ignore_index: int = -100, reduce=None, reduction: str = 'mean') -> None:
        super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)
        self.ignore_index = ignore_index

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        assert self.weight is None or isinstance(self.weight, Tensor)
        return F.cross_entropy(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)

2. Example

2.1 Using nn.CrossEntropyLoss directly

from torch import nn
import torch

# CrossEntropyLoss must be instantiated first
crossentropyloss = nn.CrossEntropyLoss(reduction='mean')  # reduction can be 'mean', 'sum' or 'none'; the default is 'mean'

# CrossEntropyLoss takes two tensors: the prediction and the label

# --------------------------- 1. predict with shape (1, category) ---------------------------
# predict holds one raw, unnormalized score per class; e.g. the vector (2, 5, 3) gives classes 0, 1, 2 the scores 2, 5 and 3
predict01 = torch.Tensor([[2, 5, 3]])  # shape: (n, category)
# label has shape (n,) and holds the correct class index of each of the n rows; here label is 2, so the correct class for (2, 5, 3) is class 2
label01 = torch.tensor([2])  # shape: (n,)
loss01 = crossentropyloss(predict01, label01)
print('loss01 = ', loss01)  # loss01 =  tensor(2.1698)

# --------------------------- 2. predict with shape (n, category) ---------------------------
predict02 = torch.Tensor([[2, 5, 3],
                          [3, 1, 6]])
label02 = torch.tensor([1, 2])
loss02 = crossentropyloss(predict02, label02)
print('loss02 = ', loss02)  # loss02 =  tensor(0.1124)

Output:

loss01 =  tensor(2.1698)
loss02 =  tensor(0.1124)
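
These numbers can be checked directly against the formula from the docstring, loss(x, class) = -x[class] + log(sum_j exp(x[j])). A small sketch of that check, reusing predict02 and label02 from the example above:

import torch

predict02 = torch.Tensor([[2, 5, 3],
                          [3, 1, 6]])
label02 = torch.tensor([1, 2])

# loss(x, class) = -x[class] + log(sum_j exp(x[j])), then average over the batch
picked = predict02.gather(1, label02.unsqueeze(1)).squeeze(1)  # tensor([5., 6.])
per_sample = -picked + torch.logsumexp(predict02, dim=1)       # tensor([0.1698, 0.0550])
print(per_sample.mean())  # tensor(0.1124), matches nn.CrossEntropyLoss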

2.2 Reproducing nn.CrossEntropyLoss with softmax(x) + log(x) + nn.NLLLoss

from torch import nn
import torch

# NLLLoss must be instantiated first
nllloss = nn.NLLLoss(reduction='mean')  # reduction can be 'mean', 'sum' or 'none'; the default is 'mean'
softmax_fn = nn.Softmax(dim=1)

# NLLLoss takes two tensors: the prediction and the label

# --------------------------- 1. predict with shape (1, category) ---------------------------
# predict holds one raw score per class; e.g. the vector (2, 5, 3) gives classes 0, 1, 2 the scores 2, 5 and 3
predict01 = torch.Tensor([[2, 5, 3]])  # shape: (n, category)
# Apply softmax to the input; each row now sums to 1
soft_output01 = softmax_fn(predict01)
print("soft_output01 = ", soft_output01)
# Take the log of the softmax output
log_output01 = torch.log(soft_output01)
print("log_output01 = ", log_output01)

# label has shape (n,) and holds the correct class index of each row; here label is 2, so the correct class for (2, 5, 3) is class 2
label01 = torch.tensor([2])  # shape: (n,)
loss01 = nllloss(log_output01, label01)
print('\nloss01 = ', loss01)  # loss01 =  tensor(2.1698)

# --------------------------- 2. predict with shape (n, category) ---------------------------
predict02 = torch.Tensor([[2, 5, 3],
                          [3, 1, 6]])
# Apply softmax to the input; each row now sums to 1
soft_output02 = softmax_fn(predict02)
print("soft_output02 = ", soft_output02)
# Take the log of the softmax output
log_output02 = torch.log(soft_output02)
print("log_output02 = ", log_output02)

label02 = torch.tensor([1, 2])
loss02 = nllloss(log_output02, label02)
print('\nloss02 = ', loss02)  # loss02 =  tensor(0.1124)

Output:

soft_output01 =  tensor([[0.0420, 0.8438, 0.1142]])
log_output01 =  tensor([[-3.1698, -0.1698, -2.1698]])

loss01 =  tensor(2.1698)

soft_output02 =  tensor([[0.0420, 0.8438, 0.1142],
        [0.0471, 0.0064, 0.9465]])
log_output02 =  tensor([[-3.1698, -0.1698, -2.1698],
        [-3.0550, -5.0550, -0.0550]])

loss02 =  tensor(0.1124)

The results above show that computing the loss directly with loss_func = nn.CrossEntropyLoss() gives exactly the same value as the softmax-log-NLLLoss pipeline.
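
Note that the same equivalence is usually written with nn.LogSoftmax, which fuses the softmax and the log into one numerically more stable step. A minimal sketch, with the same predict02 and label02 as above:

from torch import nn
import torch

log_softmax = nn.LogSoftmax(dim=1)
nllloss = nn.NLLLoss(reduction='mean')

predict02 = torch.Tensor([[2, 5, 3],
                          [3, 1, 6]])
label02 = torch.tensor([1, 2])

# LogSoftmax followed by NLLLoss reproduces the nn.CrossEntropyLoss result
loss02 = nllloss(log_softmax(predict02), label02)
print(loss02)  # tensor(0.1124)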

torch.nn.BCELoss

Input and Target have the same shape, (N, *).

1. Source code

class BCELoss(_WeightedLoss):
    r"""Creates a criterion that measures the Binary Cross Entropy
    between the target and the output:

    The unreduced (i.e. with :attr:`reduction` set to ``'none'``) loss can be described as:

    .. math::
        \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
        l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right],

    where :math:`N` is the batch size. If :attr:`reduction` is not ``'none'``
    (default ``'mean'``), then

    .. math::
        \ell(x, y) = \begin{cases}
            \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\
            \operatorname{sum}(L),  & \text{if reduction} = \text{`sum'.}
        \end{cases}

    This is used for measuring the error of a reconstruction in for example
    an auto-encoder. Note that the targets :math:`y` should be numbers
    between 0 and 1.

    Notice that if :math:`x_n` is either 0 or 1, one of the log terms would be
    mathematically undefined in the above loss equation. PyTorch chooses to set
    :math:`\log (0) = -\infty`, since :math:`\lim_{x\to 0} \log (x) = -\infty`.
    However, an infinite term in the loss equation is not desirable for several reasons.

    For one, if either :math:`y_n = 0` or :math:`(1 - y_n) = 0`, then we would be
    multiplying 0 with infinity. Secondly, if we have an infinite loss value, then
    we would also have an infinite term in our gradient, since
    :math:`\lim_{x\to 0} \frac{d}{dx} \log (x) = \infty`.
    This would make BCELoss's backward method nonlinear with respect to :math:`x_n`,
    and using it for things like linear regression would not be straight-forward.

    Our solution is that BCELoss clamps its log function outputs to be greater than
    or equal to -100. This way, we can always have a finite loss value and a linear
    backward method.


    Args:
        weight (Tensor, optional): a manual rescaling weight given to the loss
            of each batch element. If given, has to be a Tensor of size `nbatch`.
        size_average (bool, optional): Deprecated (see :attr:`reduction`). By default,
            the losses are averaged over each loss element in the batch. Note that for
            some losses, there are multiple elements per sample. If the field :attr:`size_average`
            is set to ``False``, the losses are instead summed for each minibatch. Ignored
            when :attr:`reduce` is ``False``. Default: ``True``
        reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the
            losses are averaged or summed over observations for each minibatch depending
            on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per
            batch element instead and ignores :attr:`size_average`. Default: ``True``
        reduction (string, optional): Specifies the reduction to apply to the output:
            ``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no reduction will be applied,
            ``'mean'``: the sum of the output will be divided by the number of
            elements in the output, ``'sum'``: the output will be summed. Note: :attr:`size_average`
            and :attr:`reduce` are in the process of being deprecated, and in the meantime,
            specifying either of those two args will override :attr:`reduction`. Default: ``'mean'``

    Shape:
        - Input: :math:`(N, *)` where :math:`*` means, any number of additional
          dimensions
        - Target: :math:`(N, *)`, same shape as the input
        - Output: scalar. If :attr:`reduction` is ``'none'``, then :math:`(N, *)`, same
          shape as input.

    Examples::

        >>> m = nn.Sigmoid()
        >>> loss = nn.BCELoss()
        >>> input = torch.randn(3, requires_grad=True)
        >>> target = torch.empty(3).random_(2)
        >>> output = loss(m(input), target)
        >>> output.backward()
    """
    __constants__ = ['reduction']

    def __init__(self, weight: Optional[Tensor] = None, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(BCELoss, self).__init__(weight, size_average, reduce, reduction)

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        assert self.weight is None or isinstance(self.weight, Tensor)
        return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)

2. Example

from torch import nn
import torch

# BCELoss must be instantiated first
bce_loss = nn.BCELoss()  # reduction can be 'mean', 'sum' or 'none'; the default is 'mean'
sigmoid = nn.Sigmoid()

input = torch.Tensor([[2, 6, 7]])
m_input = sigmoid(input)
print('m_input = ', m_input)

target = torch.Tensor([[0, 1, 0]])

output = bce_loss(m_input, target)
print('output = ', output)

Output:

m_input =  tensor([[0.8808, 0.9975, 0.9991]])
output =  tensor(3.0435)
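
The printed value can also be reproduced by evaluating the BCE formula element-wise; a small sketch, reusing the input and target from above:

import torch

input = torch.Tensor([[2, 6, 7]])
target = torch.Tensor([[0, 1, 0]])

x = torch.sigmoid(input)
# l_n = -[y_n * log(x_n) + (1 - y_n) * log(1 - x_n)], then mean over all elements
manual = -(target * torch.log(x) + (1 - target) * torch.log(1 - x)).mean()
print(manual)  # tensor(3.0435), matches nn.BCELoss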


