torch.nn.NLLLoss()
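torch.nn.NLLLoss comes up often among the loss functions for classification problems. It is usually not used as a loss function on its own. It expects log-probabilities as input, so it is combined with softmax and log (or a single LogSoftmax) to form the actual loss.
- Input shape: (N, C)
- Target shape: (N)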
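This is a minimal sketch of what nn.NLLLoss computes with its default settings (no `weight`, `reduction='mean'`). For each row it picks the input entry at the target index, negates it, and averages over the batch. The sketch assumes random log-probabilities produced by `torch.log_softmax`. `manual_nll` is a hypothetical helper written for illustration and is not part of PyTorch.

```python
import torch
from torch import nn

def manual_nll(log_probs: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: pick log_probs[n, target[n]] for every row n,
    # negate it, then average (mirrors reduction='mean' without class weights)
    picked = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)
    return -picked.mean()

torch.manual_seed(0)
log_probs = torch.log_softmax(torch.randn(4, 3), dim=1)  # shape (N, C) = (4, 3)
target = torch.tensor([0, 2, 1, 2])                      # shape (N,) = (4,)

builtin = nn.NLLLoss()(log_probs, target)
manual = manual_nll(log_probs, target)
print(builtin, manual)  # the two values should match
```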
1. Source code
# Excerpt from torch/nn/modules/loss.py (relies on _WeightedLoss, Optional, Tensor and F defined or imported in that module)
class NLLLoss(_WeightedLoss):
    r"""The negative log likelihood loss. It is useful to train a classification
    problem with `C` classes.

    If provided, the optional argument :attr:`weight` should be a 1D Tensor assigning
    weight to each of the classes. This is particularly useful when you have an
    unbalanced training set.

    The `input` given through a forward call is expected to contain
    log-probabilities of each class. `input` has to be a Tensor of size either
    :math:`(minibatch, C)` or :math:`(minibatch, C, d_1, d_2, ..., d_K)`
    with :math:`K \geq 1` for the `K`-dimensional case (described later).

    Obtaining log-probabilities in a neural network is easily achieved by
    adding a `LogSoftmax` layer in the last layer of your network.
    You may use `CrossEntropyLoss` instead, if you prefer not to add an extra
    layer.

    The `target` that this loss expects should be a class index in the range :math:`[0, C-1]`
    where `C = number of classes`; if `ignore_index` is specified, this loss also accepts
    this class index (this index may not necessarily be in the class range).

    The unreduced (i.e. with :attr:`reduction` set to ``'none'``) loss can be described as:

    .. math::
        \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
        l_n = - w_{y_n} x_{n,y_n}, \quad
        w_{c} = \text{weight}[c] \cdot \mathbb{1}\{c \not= \text{ignore\_index}\},

    where :math:`x` is the input, :math:`y` is the target, :math:`w` is the weight, and
    :math:`N` is the batch size. If :attr:`reduction` is not ``'none'``
    (default ``'mean'``), then

    .. math::
        \ell(x, y) = \begin{cases}
            \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n}} l_n, &
            \text{if reduction} = \text{`mean';}\\
            \sum_{n=1}^N l_n, &
            \text{if reduction} = \text{`sum'.}
        \end{cases}

    Can also be used for higher dimension inputs, such as 2D images, by providing
    an input of size :math:`(minibatch, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`,
    where :math:`K` is the number of dimensions, and a target of appropriate shape
    (see below). In the case of images, it computes NLL loss per-pixel.

    Args:
        weight (Tensor, optional): a manual rescaling weight given to each
            class. If given, it has to be a Tensor of size `C`. Otherwise, it is
            treated as if having all ones.
        size_average (bool, optional): Deprecated (see :attr:`reduction`). By default,
            the losses are averaged over each loss element in the batch. Note that for
            some losses, there are multiple elements per sample. If the field :attr:`size_average`
            is set to ``False``, the losses are instead summed for each minibatch. Ignored
            when :attr:`reduce` is ``False``. Default: ``True``
        ignore_index (int, optional): Specifies a target value that is ignored
            and does not contribute to the input gradient. When
            :attr:`size_average` is ``True``, the loss is averaged over
            non-ignored targets.
        reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the
            losses are averaged or summed over observations for each minibatch depending
            on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per
            batch element instead and ignores :attr:`size_average`. Default: ``True``
        reduction (string, optional): Specifies the reduction to apply to the output:
            ``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no reduction will
            be applied, ``'mean'``: the weighted mean of the output is taken,
            ``'sum'``: the output will be summed. Note: :attr:`size_average`
            and :attr:`reduce` are in the process of being deprecated, and in
            the meantime, specifying either of those two args will override
            :attr:`reduction`. Default: ``'mean'``

    Shape:
        - Input: :math:`(N, C)` where `C = number of classes`, or
          :math:`(N, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`
          in the case of `K`-dimensional loss.
        - Target: :math:`(N)` where each value is :math:`0 \leq \text{targets}[i] \leq C-1`, or
          :math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case of
          K-dimensional loss.
        - Output: scalar.
          If :attr:`reduction` is ``'none'``, then the same size as the target: :math:`(N)`, or
          :math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case
          of K-dimensional loss.

    Examples::

        >>> m = nn.LogSoftmax(dim=1)
        >>> loss = nn.NLLLoss()
        >>> # input is of size N x C = 3 x 5
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> # each element in target has to have 0 <= value < C
        >>> target = torch.tensor([1, 0, 4])
        >>> output = loss(m(input), target)
        >>> output.backward()
        >>>
        >>>
        >>> # 2D loss example (used, for example, with image inputs)
        >>> N, C = 5, 4
        >>> loss = nn.NLLLoss()
        >>> # input is of size N x C x height x width
        >>> data = torch.randn(N, 16, 10, 10)
        >>> conv = nn.Conv2d(16, C, (3, 3))
        >>> m = nn.LogSoftmax(dim=1)
        >>> # each element in target has to have 0 <= value < C
        >>> target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
        >>> output = loss(m(conv(data)), target)
        >>> output.backward()
    """
    __constants__ = ['ignore_index', 'reduction']
    ignore_index: int

    def __init__(self, weight: Optional[Tensor] = None, size_average=None, ignore_index: int = -100, reduce=None, reduction: str = 'mean') -> None:
        super(NLLLoss, self).__init__(weight, size_average, reduce, reduction)
        self.ignore_index = ignore_index

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        assert self.weight is None or isinstance(self.weight, Tensor)
        return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
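The `weight` and `ignore_index` arguments change the 'mean' reduction. Following the docstring formula, each picked value is scaled by w_{y_n}, and the sum is divided by the total weight of the non-ignored targets. The sketch below checks this against nn.NLLLoss. The class weights are illustrative. One label is set to -100, the default `ignore_index`, so that row is skipped.

```python
import torch
from torch import nn

log_probs = torch.log_softmax(torch.tensor([[2., 5., 3.],
                                            [3., 1., 6.],
                                            [0., 1., 2.]]), dim=1)
target = torch.tensor([1, 2, -100])      # last row uses the default ignore_index, so it is skipped
weight = torch.tensor([1.0, 2.0, 0.5])   # illustrative per-class weights

builtin = nn.NLLLoss(weight=weight)(log_probs, target)

# Manual weighted mean: sum(w[y_n] * l_n) / sum(w[y_n]) over the non-ignored rows
keep = target != -100
t = target[keep]
w = weight[t]
l = -log_probs[keep].gather(1, t.unsqueeze(1)).squeeze(1)
manual = (w * l).sum() / w.sum()
print(builtin, manual)  # the two values should match
```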
2. Example
from torch import nn
import torch

# The loss object has to be created first
nllloss = nn.NLLLoss(reduction='mean')  # reduction can be 'none', 'mean' or 'sum'; the default is 'mean'
# Calling the loss needs two tensors: the predictions and the labels

# --------------------------- 1. predict of shape (1, category) ---------------------------
# Each row holds one score per class: (2, 5, 3) are the scores of classes 0, 1, 2.
# NLLLoss treats these numbers as log-probabilities. They are not real log-probabilities here,
# which is why the losses below come out negative.
predict01 = torch.tensor([[2., 5., 3.]])  # shape: (n, category)
# label has shape (n,) and gives the correct class of each row; here label is 2,
# so the row (2, 5, 3) belongs to class 2
label01 = torch.tensor([2])  # shape: (n,)
# For each row, NLLLoss takes the entry at the index given by label and negates it.
# label is 2, so it takes element 2 (0-based) of (2, 5, 3), i.e. 3, and outputs -3.
loss01 = nllloss(predict01, label01)
print('loss01 = ', loss01)  # loss01 = tensor(-3.)

# --------------------------- 2. predict of shape (n, category) ---------------------------
predict02 = torch.tensor([[2., 5., 3.],
                          [3., 1., 6.]])
label02 = torch.tensor([1, 2])
# Same operation row by row: label 1 picks element 1 of (2, 5, 3), i.e. 5;
# label 2 picks element 2 of (3, 1, 6), i.e. 6.
# With reduction='mean' the picked values are averaged and negated: -(5 + 6) / 2 = -5.5
loss02 = nllloss(predict02, label02)
print('loss02 = ', loss02)  # loss02 = tensor(-5.5000)
Output:
loss01 = tensor(-3.)
loss02 = tensor(-5.5000)
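The averaging step above becomes explicit if you change the `reduction` argument on the same inputs. `reduction='none'` returns one loss per row, and `reduction='sum'` adds those values together. This is a short sketch reusing the example tensors.

```python
import torch
from torch import nn

predict02 = torch.tensor([[2., 5., 3.],
                          [3., 1., 6.]])
label02 = torch.tensor([1, 2])

per_row = nn.NLLLoss(reduction='none')(predict02, label02)  # one value per row
total = nn.NLLLoss(reduction='sum')(predict02, label02)     # sum of the rows
print(per_row)  # tensor([-5., -6.])
print(total)    # tensor(-11.)
```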
torch.nn.CrossEntropyLoss (cross-entropy loss)
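The relationship can be written as softmax(x) + log(x) + nn.NLLLoss ====> nn.CrossEntropyLoss. In other words, nn.CrossEntropyLoss applies softmax, takes the log, and feeds the result to NLLLoss, all inside one class. Its input is therefore raw, unnormalized scores (logits), not log-probabilities.
- Input shape: (N, C)
- Target shape: (N)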
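The relationship above can be checked with the functional API. `F.log_softmax` performs the softmax and log in a single call, and `F.nll_loss` then does the pick-and-negate step. This is a sketch on random logits and labels.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 5)            # raw, unnormalized scores, shape (N, C)
target = torch.randint(0, 5, (4,))    # class indices, shape (N,)

ce = F.cross_entropy(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(ce, nll)  # the two values should agree
```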
1. Source code
# Excerpt from torch/nn/modules/loss.py (relies on _WeightedLoss, Optional, Tensor and F defined or imported in that module)
class CrossEntropyLoss(_WeightedLoss):
    r"""This criterion combines :class:`~torch.nn.LogSoftmax` and :class:`~torch.nn.NLLLoss` in one single class.

    It is useful when training a classification problem with `C` classes.
    If provided, the optional argument :attr:`weight` should be a 1D `Tensor`
    assigning weight to each of the classes.
    This is particularly useful when you have an unbalanced training set.

    The `input` is expected to contain raw, unnormalized scores for each class.

    `input` has to be a Tensor of size either :math:`(minibatch, C)` or
    :math:`(minibatch, C, d_1, d_2, ..., d_K)`
    with :math:`K \geq 1` for the `K`-dimensional case (described later).

    This criterion expects a class index in the range :math:`[0, C-1]` as the
    `target` for each value of a 1D tensor of size `minibatch`; if `ignore_index`
    is specified, this criterion also accepts this class index (this index may not
    necessarily be in the class range).

    The loss can be described as:

    .. math::
        \text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right)
                       = -x[class] + \log\left(\sum_j \exp(x[j])\right)

    or in the case of the :attr:`weight` argument being specified:

    .. math::
        \text{loss}(x, class) = weight[class] \left(-x[class] + \log\left(\sum_j \exp(x[j])\right)\right)

    The losses are averaged across observations for each minibatch. If the
    :attr:`weight` argument is specified then this is a weighted average:

    .. math::
        \text{loss} = \frac{\sum^{N}_{i=1} loss(i, class[i])}{\sum^{N}_{i=1} weight[class[i]]}

    Can also be used for higher dimension inputs, such as 2D images, by providing
    an input of size :math:`(minibatch, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`,
    where :math:`K` is the number of dimensions, and a target of appropriate shape
    (see below).

    Args:
        weight (Tensor, optional): a manual rescaling weight given to each class.
            If given, has to be a Tensor of size `C`
        size_average (bool, optional): Deprecated (see :attr:`reduction`). By default,
            the losses are averaged over each loss element in the batch. Note that for
            some losses, there are multiple elements per sample. If the field :attr:`size_average`
            is set to ``False``, the losses are instead summed for each minibatch. Ignored
            when :attr:`reduce` is ``False``. Default: ``True``
        ignore_index (int, optional): Specifies a target value that is ignored
            and does not contribute to the input gradient. When :attr:`size_average` is
            ``True``, the loss is averaged over non-ignored targets.
        reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the
            losses are averaged or summed over observations for each minibatch depending
            on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per
            batch element instead and ignores :attr:`size_average`. Default: ``True``
        reduction (string, optional): Specifies the reduction to apply to the output:
            ``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no reduction will
            be applied, ``'mean'``: the weighted mean of the output is taken,
            ``'sum'``: the output will be summed. Note: :attr:`size_average`
            and :attr:`reduce` are in the process of being deprecated, and in
            the meantime, specifying either of those two args will override
            :attr:`reduction`. Default: ``'mean'``

    Shape:
        - Input: :math:`(N, C)` where `C = number of classes`, or
          :math:`(N, C, d_1, d_2, ..., d_K)` with :math:`K \geq 1`
          in the case of `K`-dimensional loss.
        - Target: :math:`(N)` where each value is :math:`0 \leq \text{targets}[i] \leq C-1`, or
          :math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case of
          K-dimensional loss.
        - Output: scalar.
          If :attr:`reduction` is ``'none'``, then the same size as the target:
          :math:`(N)`, or
          :math:`(N, d_1, d_2, ..., d_K)` with :math:`K \geq 1` in the case
          of K-dimensional loss.

    Examples::

        >>> loss = nn.CrossEntropyLoss()
        >>> input = torch.randn(3, 5, requires_grad=True)
        >>> target = torch.empty(3, dtype=torch.long).random_(5)
        >>> output = loss(input, target)
        >>> output.backward()
    """
    __constants__ = ['ignore_index', 'reduction']
    ignore_index: int

    def __init__(self, weight: Optional[Tensor] = None, size_average=None, ignore_index: int = -100, reduce=None, reduction: str = 'mean') -> None:
        super(CrossEntropyLoss, self).__init__(weight, size_average, reduce, reduction)
        self.ignore_index = ignore_index

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        assert self.weight is None or isinstance(self.weight, Tensor)
        return F.cross_entropy(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
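The docstring also covers a K-dimensional case, such as per-pixel classification. There the input has shape (N, C, H, W) and the target has shape (N, H, W). The sketch below uses made-up sizes. It checks that this gives the same result as moving the class axis last and flattening every pixel into an ordinary (N*H*W, C) row.

```python
import torch
from torch import nn

torch.manual_seed(0)
N, C, H, W = 2, 3, 4, 4                    # illustrative sizes
logits = torch.randn(N, C, H, W)           # one score per class for every pixel
target = torch.randint(0, C, (N, H, W))    # one class index per pixel

loss = nn.CrossEntropyLoss()
loss_2d = loss(logits, target)

# Move the class axis last, then treat each pixel as its own (1, C) sample
flat_logits = logits.permute(0, 2, 3, 1).reshape(-1, C)
flat_target = target.reshape(-1)
loss_flat = loss(flat_logits, flat_target)
print(loss_2d, loss_flat)  # the two values should agree
```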
2. Examples
2.1 Using nn.CrossEntropyLoss directly
from torch import nn
import torch

# The loss object has to be created first
crossentropyloss = nn.CrossEntropyLoss(reduction='mean')  # reduction can be 'none', 'mean' or 'sum'; the default is 'mean'
# Calling the loss needs two tensors: the predictions and the labels

# --------------------------- 1. predict of shape (1, category) ---------------------------
# Each row holds raw, unnormalized scores (logits) for classes 0, 1, 2.
# CrossEntropyLoss applies log-softmax to them internally, so they need not be probabilities.
predict01 = torch.tensor([[2., 5., 3.]])  # shape: (n, category)
# label has shape (n,) and gives the correct class of each row; here label is 2,
# so the row (2, 5, 3) belongs to class 2
label01 = torch.tensor([2])  # shape: (n,)
loss01 = crossentropyloss(predict01, label01)
print('loss01 = ', loss01)  # loss01 = tensor(2.1698)

# --------------------------- 2. predict of shape (n, category) ---------------------------
predict02 = torch.tensor([[2., 5., 3.],
                          [3., 1., 6.]])
label02 = torch.tensor([1, 2])
loss02 = crossentropyloss(predict02, label02)
print('loss02 = ', loss02)  # loss02 = tensor(0.1124)
Output:
loss01 = tensor(2.1698)
loss02 = tensor(0.1124)
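These numbers follow from the docstring formula loss(x, class) = -x[class] + log(sum_j exp(x[j])). For (2, 5, 3) with class 2, log(e^2 + e^5 + e^3) ≈ 5.1698, so the loss is -3 + 5.1698 = 2.1698. The sketch below repeats this computation with `torch.logsumexp` for the two-row batch and then takes the mean.

```python
import torch

predict02 = torch.tensor([[2., 5., 3.],
                          [3., 1., 6.]])
label02 = torch.tensor([1, 2])

# -x[class] + log(sum_j exp(x[j])) for every row, then the mean over the batch
picked = predict02.gather(1, label02.unsqueeze(1)).squeeze(1)
per_row = -picked + torch.logsumexp(predict02, dim=1)
manual = per_row.mean()
print(per_row, manual)  # manual should match loss02 above up to float rounding
```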
2.2 Reproducing nn.CrossEntropyLoss with softmax(x) + log(x) + nn.NLLLoss
from torch import nn
import torch

# Create the loss and the softmax layer first
nllloss = nn.NLLLoss(reduction='mean')  # reduction can be 'none', 'mean' or 'sum'; the default is 'mean'
softmax_fn = nn.Softmax(dim=1)
# Calling the loss needs two tensors: the log-probabilities and the labels

# --------------------------- 1. predict of shape (1, category) ---------------------------
# Raw scores (logits) for classes 0, 1, 2
predict01 = torch.tensor([[2., 5., 3.]])  # shape: (n, category)
# Apply softmax along dim=1; every row now sums to 1
soft_output01 = softmax_fn(predict01)
print("soft_output01 = ", soft_output01)
# Take the log of the softmax output
log_output01 = torch.log(soft_output01)
print("log_output01 = ", log_output01)
# label has shape (n,) and gives the correct class of each row; here the row (2, 5, 3) belongs to class 2
label01 = torch.tensor([2])  # shape: (n,)
loss01 = nllloss(log_output01, label01)
print('\nloss01 = ', loss01)  # loss01 = tensor(2.1698)

# --------------------------- 2. predict of shape (n, category) ---------------------------
predict02 = torch.tensor([[2., 5., 3.],
                          [3., 1., 6.]])
# Apply softmax along dim=1; every row now sums to 1
soft_output02 = softmax_fn(predict02)
print("soft_output02 = ", soft_output02)
# Take the log of the softmax output
log_output02 = torch.log(soft_output02)
print("log_output02 = ", log_output02)
label02 = torch.tensor([1, 2])
loss02 = nllloss(log_output02, label02)
print('\nloss02 = ', loss02)  # loss02 = tensor(0.1124)
Output:
soft_output01 = tensor([[0.0420, 0.8438, 0.1142]])
log_output01 = tensor([[-3.1698, -0.1698, -2.1698]])
loss01 = tensor(2.1698)
soft_output02 = tensor([[0.0420, 0.8438, 0.1142],
[0.0471, 0.0064, 0.9465]])
log_output02 = tensor([[-3.1698, -0.1698, -2.1698],
[-3.0550, -5.0550, -0.0550]])
loss02 = tensor(0.1124)
The results above show that calling nn.CrossEntropyLoss() directly gives the same values as the softmax-log-NLLLoss pipeline.
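There is a practical caveat to the equivalence. Computing softmax and then `torch.log` as two separate steps can underflow. When the logits are far apart, the smaller softmax probability rounds to exactly 0 in float32. Its log becomes -inf, and the loss becomes inf. `F.log_softmax`, and hence `nn.CrossEntropyLoss`, works in log space and stays finite. The sketch below uses deliberately extreme, made-up logits to trigger this.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[0., 200.]])  # extreme scores chosen to trigger underflow
target = torch.tensor([0])           # the true class has the much smaller score

# Two-step version: softmax gives exactly 0 for class 0 in float32, so log(0) = -inf
naive = F.nll_loss(torch.log(torch.softmax(logits, dim=1)), target)
# Fused version: log_softmax returns -200 for class 0 instead of -inf
stable = F.cross_entropy(logits, target)
print(naive, stable)  # naive is inf, stable is about 200
```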
torch.nn.BCELoss
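Input and Target have the same shape, (N, *). The input must already contain probabilities in [0, 1], typically the output of a Sigmoid, and the target values should also lie between 0 and 1.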
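BCELoss is computed element by element, so any shape works as long as the input and target match. With `reduction='none'`, the result keeps that same shape. The sketch below uses an illustrative (2, 3) batch, for example three independent binary labels per sample, with hand-picked probabilities.

```python
import torch
from torch import nn

probs = torch.tensor([[0.9, 0.2, 0.6],
                      [0.1, 0.8, 0.5]])   # already in [0, 1], shape (N, *) = (2, 3)
target = torch.tensor([[1., 0., 1.],
                       [0., 1., 0.]])     # same shape as the input

per_element = nn.BCELoss(reduction='none')(probs, target)
print(per_element.shape)  # torch.Size([2, 3])
mean_loss = nn.BCELoss()(probs, target)   # default 'mean': average over all 6 elements
```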
1. Source code
# Excerpt from torch/nn/modules/loss.py (relies on _WeightedLoss, Optional, Tensor and F defined or imported in that module)
class BCELoss(_WeightedLoss):
    r"""Creates a criterion that measures the Binary Cross Entropy
    between the target and the output:

    The unreduced (i.e. with :attr:`reduction` set to ``'none'``) loss can be described as:

    .. math::
        \ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad
        l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right],

    where :math:`N` is the batch size. If :attr:`reduction` is not ``'none'``
    (default ``'mean'``), then

    .. math::
        \ell(x, y) = \begin{cases}
            \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\
            \operatorname{sum}(L),  & \text{if reduction} = \text{`sum'.}
        \end{cases}

    This is used for measuring the error of a reconstruction in for example
    an auto-encoder. Note that the targets :math:`y` should be numbers
    between 0 and 1.

    Notice that if :math:`x_n` is either 0 or 1, one of the log terms would be
    mathematically undefined in the above loss equation. PyTorch chooses to set
    :math:`\log (0) = -\infty`, since :math:`\lim_{x\to 0} \log (x) = -\infty`.
    However, an infinite term in the loss equation is not desirable for several reasons.

    For one, if either :math:`y_n = 0` or :math:`(1 - y_n) = 0`, then we would be
    multiplying 0 with infinity. Secondly, if we have an infinite loss value, then
    we would also have an infinite term in our gradient, since
    :math:`\lim_{x\to 0} \frac{d}{dx} \log (x) = \infty`.
    This would make BCELoss's backward method nonlinear with respect to :math:`x_n`,
    and using it for things like linear regression would not be straight-forward.

    Our solution is that BCELoss clamps its log function outputs to be greater than
    or equal to -100. This way, we can always have a finite loss value and a linear
    backward method.

    Args:
        weight (Tensor, optional): a manual rescaling weight given to the loss
            of each batch element. If given, has to be a Tensor of size `nbatch`.
        size_average (bool, optional): Deprecated (see :attr:`reduction`). By default,
            the losses are averaged over each loss element in the batch. Note that for
            some losses, there are multiple elements per sample. If the field :attr:`size_average`
            is set to ``False``, the losses are instead summed for each minibatch. Ignored
            when :attr:`reduce` is ``False``. Default: ``True``
        reduce (bool, optional): Deprecated (see :attr:`reduction`). By default, the
            losses are averaged or summed over observations for each minibatch depending
            on :attr:`size_average`. When :attr:`reduce` is ``False``, returns a loss per
            batch element instead and ignores :attr:`size_average`. Default: ``True``
        reduction (string, optional): Specifies the reduction to apply to the output:
            ``'none'`` | ``'mean'`` | ``'sum'``. ``'none'``: no reduction will be applied,
            ``'mean'``: the sum of the output will be divided by the number of
            elements in the output, ``'sum'``: the output will be summed. Note: :attr:`size_average`
            and :attr:`reduce` are in the process of being deprecated, and in the meantime,
            specifying either of those two args will override :attr:`reduction`. Default: ``'mean'``

    Shape:
        - Input: :math:`(N, *)` where :math:`*` means, any number of additional
          dimensions
        - Target: :math:`(N, *)`, same shape as the input
        - Output: scalar. If :attr:`reduction` is ``'none'``, then :math:`(N, *)`, same
          shape as input.

    Examples::

        >>> m = nn.Sigmoid()
        >>> loss = nn.BCELoss()
        >>> input = torch.randn(3, requires_grad=True)
        >>> target = torch.empty(3).random_(2)
        >>> output = loss(m(input), target)
        >>> output.backward()
    """
    __constants__ = ['reduction']

    def __init__(self, weight: Optional[Tensor] = None, size_average=None, reduce=None, reduction: str = 'mean') -> None:
        super(BCELoss, self).__init__(weight, size_average, reduce, reduction)

    def forward(self, input: Tensor, target: Tensor) -> Tensor:
        assert self.weight is None or isinstance(self.weight, Tensor)
        return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
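The clamping described in the docstring can be observed directly. An input of exactly 0 with target 1 would give -log(0) = inf. BCELoss clamps the log output to -100, so the loss comes out as 100 instead. This is a quick check with a single hand-picked element.

```python
import torch
from torch import nn

bce = nn.BCELoss()
prob = torch.tensor([0.0])    # predicted probability exactly 0
target = torch.tensor([1.0])  # but the true label is 1
loss = bce(prob, target)
print(loss)  # 100, not inf, because log(0) is clamped to -100
```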
2. Example
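from torch import nn
import torch

# The loss object has to be created first
bce_loss = nn.BCELoss()  # reduction can be 'none', 'mean' or 'sum'; the default is 'mean'
sigmoid = nn.Sigmoid()

# BCELoss expects probabilities, so squash the raw scores into (0, 1) with a sigmoid
input = torch.tensor([[2., 6., 7.]])
m_input = sigmoid(input)
print('m_input = ', m_input)
# Targets are 0/1 labels with the same shape as the input
target = torch.tensor([[0., 1., 0.]])
output = bce_loss(m_input, target)
print('output = ', output)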
Output:
m_input = tensor([[0.8808, 0.9975, 0.9991]])
output = tensor(3.0435)
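The printed value can be checked against the formula l_n = -[y_n log x_n + (1 - y_n) log(1 - x_n)], averaged over the three elements. The sketch below also uses `nn.BCEWithLogitsLoss`, which takes the raw scores and folds the sigmoid into the loss. It is a related PyTorch class brought in here for comparison; the original example does not use it.

```python
import torch
from torch import nn

logits = torch.tensor([[2., 6., 7.]])
target = torch.tensor([[0., 1., 0.]])
probs = torch.sigmoid(logits)

# Elementwise binary cross-entropy, then the default 'mean' reduction
manual = -(target * torch.log(probs) + (1 - target) * torch.log(1 - probs)).mean()
builtin = nn.BCELoss()(probs, target)
with_logits = nn.BCEWithLogitsLoss()(logits, target)  # sigmoid + BCE in one step
print(manual, builtin, with_logits)  # all three should be close to 3.043
```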