pytorch计算条件熵 pytorchlinear函数

转载

网猴儿 2024-01-11 10:39:07

文章标签 pytorch计算条件熵 python 深度学习神经网络初始化 文章分类 PyTorch 人工智能

导入方式：

torch.nn.Linear(features_in, features_out, bias=False)

参数说明：

features_in其实就是输入的神经元个数，features_out就是输出神经元个数，bias默认为True，这里为了表达方便，就写了False，一般调用都是torch.nn.Linear(10, 5)，就是输入10个，输出5个神经元，且考虑偏置。

该函数实现的功能：就是制造出一个全连接层的框架，即y=X*W.T + b，对给定一个具体的输入X，就会输出相应的y

下面我们来看看源码：

class Linear(Module):
   ...
   __constants__ = ['bias']

   def __init__(self, in_features, out_features, bias=True):
   super(Linear, self).__init__()
   self.in_features = in_features
   self.out_features = out_features
   self.weight = Parameter(torch.Tensor(out_features, in_features))
   if bias:
   self.bias = Parameter(torch.Tensor(out_features))
   else:
   self.register_parameter('bias', None)
   self.reset_parameters()
   ...
note:可以看到两个重要的属性self.weight、self.bias

@weak_script_method
def forward(self, input):
return F.linear(input, self.weight, self.bias)
note:这是用来计算y=X*W.T + b的前向传播

接下来我用一个具体的例子来实现这个全连接层

import torch
x = torch.rand(10, 5)  # 默认下required_grad=False
Affine_layers = torch.nn.Linear(5, 3)
Out = Affine_layers(x)  # ******************

print(Out, Out.size())
print(Affine_layers.weight, Affine_layers.bias)

输出结果：

pytorch计算条件熵 pytorchlinear函数_初始化

从上面可以看出，输出y是(10,3) ， weight是(3,5) ，b是(1,3)，且w、b的requires_grad均为True

总结：

这个函数Linear做了什么工作呢？

首先我们输入X，一般都是mini-batch形式，然后它会将配置一个转置之后可以与X进行矩阵乘法的W，当然还有偏置，形成y=X*W.T+b的Affine层（全连接层）结构。当我们对这个类实例化成对象Affine_layers并给定具体输入X的时候，该对象就会调用forward进行线性运算并输入结果。

在forward代码中，他的主要源码如下：

if input.dim() == 2 and bias is not None:
    # fused op is marginally faster
    ret = torch.addmm(bias, input, weight.t())
else:
    output = input.matmul(weight.t())
    if bias is not None:
        output += bias
    ret = output
return ret

从matmul可以看出全连接层的输入可以是二维张量，也可以是一维张量，但不能是标量，且bias可有可无。特别要注意的是：如果输入为一维张量，那么根据matmul的特性，输出也是一维张量。

例如：

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(1, 10),
            nn.ReLU(),
            nn.Linear(10, 2)
        )

    def forward(self, x):
        out = self.fc(x)
        return out

net = Net()
# x = torch.tensor(5, dtype=torch.float32)  # 0维不可以
x = torch.tensor([5], dtype=torch.float32)  # 1维可以
# x = torch.arange(6, dtype=torch.float32).view(6, 1)  # 二维可以
y = net(x)
print(y)

注意：

1、这里可能有刚接触python的同学看不懂上面打“******************”的地方，会认为为啥例化后实例对象进行着类似函数的操作。

其实是因为__call__(self, *args, **kwargs)这个魔法方法的存在，Linear这个类继承了nn.Module，Linear提供了自动的__call__实现，那么__call__实现代表了什呢？

这个是官方的解释：

object.__call__(self[, args...])

　　Called when the instance is “called” as a function; if this method is defined, x(arg1, arg2, ...) is a shorthand for x.__call__(arg1, arg2, ...).

简而言之，就是__call__实现了将例化对象当作函数使用

因此Affine_layers(x) 等价于 Affine_layers.forward(x)

理解了这个，就能明白为啥下面out1和out2的内容是相同的了

class LinearNet(nn.Module):
    def __init__(self, featuresin):
        super().__init__()
        self.linear = nn.Linear(featuresin, 1)  # 实例化对象做属性

    def forward(self, input):
        out = self.linear(input)
        return out


x = torch.rand(10, 5)
net = LinearNet(x.size()[1])  #实例化
out1 = net(x)
out2 = net.linear(x)  # out1=out2

2、

全连接层的初始化采用标准初始化（读者可以去看看源码）。也就是说pytorch对全连接层默认使用了标准初始化方式，好处是，不需要自己再去初始化，坏处就是，无法达到最优，比如我们常用Relu做激活函数，那么根据何凯明大神指出，relu函数最佳的初始化方式是He初始化，因此如果想要最优初始化参数的话，就必须要调用torch.nn.init模块中的初始化方式来重新对FC层进行初始化当然，默认的方式问题也不大，也可以一直使用的。

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。