Building a probabilistic neural network in PyTorch: building your own neural network with PyTorch


云端筑梦工匠 2023-08-08 10:46:01 © Copyright


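© Copyright belongs to the author: an original work by 云端筑梦工匠 on the 51CTO blog. Please contact the author for permission to repost; unauthorized reposting will be pursued legally.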

1. nn.Module: building your own neural network

Preface:

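The previous article introduced five ways to define a custom model, and the last one is probably everyone's favorite. It builds complex neural networks quickly, runs on the GPU, and computes gradients and updates parameters automatically. Perfect! Something this good deserves a closer look.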
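For ordinary sequential models, PyTorch's torch.nn.Sequential class is all you need, much as in Keras. More often, though, you face complex models: models with multiple inputs and outputs, multi-branch models, models with skip connections, models containing custom layers, and so on. In those cases you have to define the model yourself.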
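In fact, custom layers in PyTorch are also implemented by subclassing nn.Module. PyTorch generally has no separate concept of a "layer": a layer is treated as a model, which is different from Keras. You can also define a layer directly by subclassing torch.autograd.Function, but this is not recommended; the reason will be explained later. Remember: Keras is built around the Layer, PyTorch is built around the Module.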
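This article explains in detail how to use the Module class to define a custom model. It also takes a close look at the source code of some of PyTorch's predefined layers.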

Note:


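This article only covers implementing custom modules with Module. Later articles will cover custom layers, custom activation functions, custom gradient descent algorithms, and so on.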

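The focus of this article is how to use nn.Module.

The aim is to be able to build your own distinctive neural networks independently,
and later to design your own activation layers, network transformation layers (both linear and nonlinear), gradient updates, optimization algorithms, and so on.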
Summary: in PyTorch, virtually every custom operation is implemented by subclassing nn.Module.

1.1 Overview of the torch.nn.Module class

1.1.1 Introduction to torch.nn.Module

The best way to learn a class is to look at its methods first. So which methods does the Module class in torch.nn provide?

Overview of the methods provided by Module (simplified signatures)

class Module(object):
    # ******************************************************************************************
    # The most important method: __init__, used to pass in and store arguments
    def __init__(self): ...
    # The most important method: forward, which defines the forward pass
    def forward(self, *input): ...
    # ******************************************************************************************

    # add_module: add a layer (submodule) under a given name
    def add_module(self, name, module): ...

    # Set the device used to hold data and run computations
    def cuda(self, device=None): ...
    def cpu(self): ...

    # ******************************************************************************************
    # __call__: a very important magic method. It calls forward, which is why every
    # subclass must override forward. This is what lets layers be chained together.
    def __call__(self, *input, **kwargs): ...
    # ******************************************************************************************

    def parameters(self, recurse=True): ...

    def named_parameters(self, prefix='', recurse=True): ...

    def children(self): ...

    def named_children(self): ...

    def modules(self): ...

    def named_modules(self, memo=None, prefix=''): ...

    def train(self, mode=True): ...

    def eval(self): ...

    def zero_grad(self): ...

    def __repr__(self): ...

    def __dir__(self): ...

'''
Only the most important and most commonly used methods are listed above; some are omitted.
'''
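To see a few of these methods in action, here is a minimal sketch that uses a small, hypothetical nn.Sequential model with arbitrary layer sizes. It shows named_parameters() and parameters(), and how train() and eval() toggle the training flag on the model and all of its children.

```python
import torch.nn as nn

# A tiny hypothetical model, used only to exercise a few Module methods
net = nn.Sequential(nn.Linear(4, 3), nn.Dropout(0.5), nn.Linear(3, 2))

# named_parameters() walks every registered submodule recursively;
# Dropout (index 1) has no parameters, so it contributes no names
names = [name for name, _ in net.named_parameters()]
print(names)  # ['0.weight', '0.bias', '2.weight', '2.bias']

# Total number of learnable scalars: (4*3 + 3) + (3*2 + 2) = 23
n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # 23

# train()/eval() set the `training` flag on the module and all its children
net.eval()
print(net.training, net[1].training)  # False False
net.train()
print(net.training, net[1].training)  # True True
```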

The core of network design: the constructor __init__ and the forward method

This section focuses on the two most important methods when designing a network model: the constructor __init__ and forward.

To define our own network, we subclass nn.Module and reimplement two methods, the constructor __init__ and forward. A few tips:

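(1) Layers with learnable parameters (such as fully connected and convolutional layers) generally go in the constructor __init__(). Layers without parameters can go there too.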

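(2) Layers without learnable parameters (such as ReLU, dropout, and pooling) may or may not be placed in the constructor.
If they are not placed in __init__, use the equivalent functions from torch.nn.functional inside forward. BatchNorm does not belong in this group: it has learnable affine parameters and running statistics, so it should be defined in __init__.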

(3) forward must be overridden. It implements the model's functionality and is where the connections between the layers are defined.
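Here is a quick check of point (2). The sketch below compares the module form (nn.ReLU, nn.MaxPool2d) with the functional form (F.relu, F.max_pool2d). The tensor sizes are arbitrary and chosen only for illustration. For parameter-free layers the two forms should give identical results.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

# Module form: objects that would normally be created in __init__
relu_layer = nn.ReLU()
pool_layer = nn.MaxPool2d(kernel_size=2)

# Functional form: plain function calls that would sit inside forward
y_module = pool_layer(relu_layer(x))
y_functional = F.max_pool2d(F.relu(x), kernel_size=2)

print(torch.equal(y_module, y_functional))  # True
print(y_module.shape)  # torch.Size([1, 3, 4, 4])
# Neither form holds learnable parameters
print(len(list(relu_layer.parameters())), len(list(pool_layer.parameters())))  # 0 0
```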

A simple example illustrates this:

import torch

# Every neural network model inherits from torch.nn.Module
class MyNet(torch.nn.Module):
    
    """    
    1. What a constructor does
        A constructor initializes an object's attributes when the object is
        created; it is called automatically at creation time. In general a
        constructor serves three purposes:
    - give the new object an identity;
    - allocate memory for the object's data members;
    - initialize the object's data members
    """
    
    # Override the constructor (also called the initializer), the first method called automatically when an object is created.
    # As soon as an object is created, this constructor runs and initializes the object's attributes
    def __init__(self):
        ###################################Step 1############################################
        #####################Call the parent constructor to inherit the parent's attributes####
        super(MyNet, self).__init__()  
        
        ###################################Step 2############################################
        #####################Add attributes: Conv2d layers, ReLU activations, MaxPool2d pooling#######   
        # Attach layers to the new instance with self.attribute_name
        # Convolutional layer design will be discussed later
        self.conv1 = torch.nn.Conv2d(3, 32, 3, 1, 1)
        self.relu1=torch.nn.ReLU()
        self.max_pooling1=torch.nn.MaxPool2d(2,1)
 
        # conv2 receives the 32 channels produced by conv1
        self.conv2 = torch.nn.Conv2d(32, 32, 3, 1, 1)
        self.relu2=torch.nn.ReLU()
        self.max_pooling2=torch.nn.MaxPool2d(2,1)
 
        self.dense1 = torch.nn.Linear(32 * 3 * 3, 128)
        self.dense2 = torch.nn.Linear(128, 10)
 

    ###################################Step 3############################################
    ##############Override forward to connect the layers above in order, i.e. the forward pass##########
    def forward(self, x):
        x = self.conv1(x)
        x = self.relu1(x)
        x = self.max_pooling1(x)
        x = self.conv2(x)
        x = self.relu2(x)
        x = self.max_pooling2(x)
        # Flatten (N, 32, 3, 3) to (N, 288) before the fully connected layers;
        # with two stride-1 poolings, a 5x5 input ends up as 3x3 here
        x = x.view(x.size(0), -1)
        x = self.dense1(x)
        x = self.dense2(x)
        # Return the final result of the forward pass
        return x

###################################Step 4: create an object###################################
##############Creating it calls __init__() automatically, so the object now has the layer attributes###########   
model = MyNet()
print(model)
print(model(torch.randn(1, 3, 5, 5)).shape)

'''Output:
MyNet(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (max_pooling1): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu2): ReLU()
  (max_pooling2): MaxPool2d(kernel_size=2, stride=1, padding=0, dilation=1, ceil_mode=False)
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (dense2): Linear(in_features=128, out_features=10, bias=True)
)
torch.Size([1, 10])
'''

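Note: the example above puts all the layers in __init__, but that only defines a set of layers. It says nothing about how they are connected. All the connections are implemented in forward, and here they still form a simple sequential chain. Now look at another example: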
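Because the connections are defined only in forward, the declared layers do not have to be chained in a straight line. The sketch below uses a hypothetical ResidualNet with a skip connection, one of the complex cases mentioned in the preface. The channel count and input size are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class ResidualNet(nn.Module):
    """Hypothetical example: two conv layers plus a skip connection."""
    def __init__(self):
        super().__init__()
        # __init__ only declares the layers ...
        self.conv1 = nn.Conv2d(8, 8, 3, 1, 1)
        self.conv2 = nn.Conv2d(8, 8, 3, 1, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # ... while forward decides how they are connected
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # skip connection: add the input back

net = ResidualNet()
y = net(torch.randn(2, 8, 16, 16))
print(y.shape)  # torch.Size([2, 8, 16, 16])
```

print(net) would still list only conv1, conv2, and relu. The addition exists only in forward.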

import torch
import torch.nn.functional as F
 
class MyNet(torch.nn.Module):
    def __init__(self):
        ############Key point 1: ReLU and max pooling are NOT placed in the constructor, since they have no trainable parameters#########
        super(MyNet, self).__init__()  # Step 1: call the parent constructor
        self.conv1 = torch.nn.Conv2d(3, 32, 3, 1, 1)
        self.conv2 = torch.nn.Conv2d(32, 32, 3, 1, 1)
 
        self.dense1 = torch.nn.Linear(32 * 3 * 3, 128)
        self.dense2 = torch.nn.Linear(128, 10)
        
    ########################Key point 2: because the parameter-free layers are not in the constructor,###########################
    ###########forward applies them through torch.nn.functional######################
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2, 1)  # kernel_size=2, stride=1, as in the previous example
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2, 1)
        x = x.view(x.size(0), -1)  # flatten before the fully connected layers
        x = self.dense1(x)
        x = self.dense2(x)
        return x
 
model = MyNet()
print(model)
'''Output:
MyNet(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (dense2): Linear(in_features=128, out_features=10, bias=True)
)
'''

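Note: here the layers without trainable parameters live outside the constructor. They are therefore not registered as submodules and do not appear when the model is printed. The computation still happens in forward, through the functional API.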
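Here is a minimal comparison using two hypothetical classes, WithModules and WithFunctional. The nn.ReLU version registers ReLU as a child module and the F.relu version does not. Both still have exactly the same learnable parameters.

```python
import torch.nn as nn
import torch.nn.functional as F

class WithModules(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        self.relu = nn.ReLU()          # registered as a submodule
    def forward(self, x):
        return self.relu(self.fc(x))

class WithFunctional(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
    def forward(self, x):
        return F.relu(self.fc(x))      # a plain function call, never registered

a, b = WithModules(), WithFunctional()
print([name for name, _ in a.named_children()])  # ['fc', 'relu']
print([name for name, _ in b.named_children()])  # ['fc']
# The learnable parameters are the same size in both cases
print(sum(p.numel() for p in a.parameters()) == sum(p.numel() for p in b.parameters()))  # True
```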
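Summary:

Every layer placed in the constructor __init__ is an "intrinsic attribute" of the model.
Under the hood, these attributes are stored on the instance, and nn.Module overrides __setattr__ to sort them. Submodules assigned in __init__ go into the dictionary self._modules, parameters go into self._parameters, and ordinary values go into the instance's __dict__, which also holds _modules and _parameters.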

# Attributes added in __init__ are stored on the instance; submodules end up in model.__dict__['_modules']
model.__dict__
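The sketch below shows where things actually end up, using a hypothetical class Tiny. It checks that a submodule is stored in _modules rather than directly in __dict__, while a plain Python attribute goes into __dict__. _modules is an internal attribute, so treat this as an inspection aid rather than a public API.

```python
import torch.nn as nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)  # an nn.Module: stored in self._modules
        self.scale = 0.5           # a plain Python value: stored in __dict__

model = Tiny()
print('fc' in model.__dict__)        # False
print(list(model._modules.keys()))   # ['fc']
print(model.__dict__['scale'])       # 0.5
print('_modules' in model.__dict__)  # True
```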

1.2 Building networks: different ways to use torch.nn.Module

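The models above were kept simple for demonstration, but the Module class is very flexible. The following subsections introduce the different ways torch.nn.Module can be used, one by one.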

Wrapping layers with nn.Sequential

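The idea is to wrap several layers together into one larger layer (a block) and then connect that block to other layers. There are three ways to do this; the details of nn.Sequential will be covered later.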

(1) Method 1: pack the layers directly with nn.Sequential

import torch.nn as nn
from collections import OrderedDict
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        # Pack the convolutional layers together
        self.conv_block = nn.Sequential(
            nn.Conv2d(3, 32, 3, 1, 1),
            nn.ReLU(),
            nn.MaxPool2d(2))
        
        # Pack the fully connected layers together as well
        self.dense_block = nn.Sequential(
            nn.Linear(32 * 3 * 3, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )
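    # Connect the packed blocks (layers) here, i.e. the forward pass
    def forward(self, x):
        conv_out = self.conv_block(x)
        # Flatten to (N, 288); 32 * 3 * 3 features matches e.g. a 6x6 input image
        res = conv_out.view(conv_out.size(0), -1)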
        out = self.dense_block(res)
        return out
 
model = MyNet()
print(model)
'''Output:
MyNet(
  (conv_block): Sequential(
    (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU()
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense_block): Sequential(
    (0): Linear(in_features=288, out_features=128, bias=True)
    (1): ReLU()
    (2): Linear(in_features=128, out_features=10, bias=True)
  )
)
'''

Note: the layers inside each packed block have no names; they are numbered 0, 1, 2, 3, 4, ... by default.
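Because the layers in an unnamed block are keyed 0, 1, 2, ..., a Sequential can be indexed and sliced by position. Here is a small sketch that reuses the convolution block above. The 6x6 input size is an assumption, chosen so that the output is 32 x 3 x 3.

```python
import torch
import torch.nn as nn

block = nn.Sequential(nn.Conv2d(3, 32, 3, 1, 1), nn.ReLU(), nn.MaxPool2d(2))

print(block[0])             # the Conv2d layer, fetched by its position
print(len(block))           # 3
head = block[:2]            # slicing returns a new Sequential (conv + relu)
print(type(head).__name__)  # Sequential
y = block(torch.randn(1, 3, 6, 6))
print(y.shape)              # torch.Size([1, 32, 3, 3])
```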

(2) Method 2: pack the layers as tuples in an OrderedDict, which also lets you name each layer

import torch.nn as nn
from collections import OrderedDict
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        
        # This is a complete convolutional block; doesn't it read more clearly this way?
        self.conv_block = nn.Sequential(
            OrderedDict(
                [
                    ("conv1", nn.Conv2d(3, 32, 3, 1, 1)),
                    ("relu1", nn.ReLU()),
                    ("pool", nn.MaxPool2d(2))
                ]
            ))
         
        # And this is a complete fully connected block
        self.dense_block = nn.Sequential(
            OrderedDict([
                ("dense1", nn.Linear(32 * 3 * 3, 128)),
                ("relu2", nn.ReLU()),
                ("dense2", nn.Linear(128, 10))
            ])
        )
 
    def forward(self, x):
        conv_out = self.conv_block(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense_block(res)
        return out
 
model = MyNet()
print(model)
'''Output:
MyNet(
  (conv_block): Sequential(
    (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu1): ReLU()
    (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense_block): Sequential(
    (dense1): Linear(in_features=288, out_features=128, bias=True)
    (relu2): ReLU()
    (dense2): Linear(in_features=128, out_features=10, bias=True)
  )
)
'''

Note: this approach groups layers into blocks and gives them names, so the structure is clear at a glance.
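With named layers, each layer can also be reached as an attribute of the block, and integer indexing still follows insertion order. Here is a short sketch, assuming the same convolution block as above:

```python
import torch.nn as nn
from collections import OrderedDict

block = nn.Sequential(OrderedDict([
    ("conv1", nn.Conv2d(3, 32, 3, 1, 1)),
    ("relu1", nn.ReLU()),
    ("pool", nn.MaxPool2d(2)),
]))

# Named layers can be reached by attribute name ...
print(block.conv1.out_channels)  # 32
# ... and integer indexing still works, following insertion order
print(block[0] is block.conv1)   # True
print([name for name, _ in block.named_children()])  # ['conv1', 'relu1', 'pool']
```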

(3) Method 3: add layers with add_module, which also lets you name them

import torch
import torch.nn as nn
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv_block=torch.nn.Sequential()
        self.conv_block.add_module("conv1",torch.nn.Conv2d(3, 32, 3, 1, 1))
        self.conv_block.add_module("relu1",torch.nn.ReLU())
        self.conv_block.add_module("pool1",torch.nn.MaxPool2d(2))
 
        self.dense_block = torch.nn.Sequential()
        self.dense_block.add_module("dense1",torch.nn.Linear(32 * 3 * 3, 128))
        self.dense_block.add_module("relu2",torch.nn.ReLU())
        self.dense_block.add_module("dense2",torch.nn.Linear(128, 10))
 
    def forward(self, x):
        conv_out = self.conv_block(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense_block(res)
        return out
 
model = MyNet()
print(model)
'''Output:
MyNet(
  (conv_block): Sequential(
    (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu1): ReLU()
    (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense_block): Sequential(
    (dense1): Linear(in_features=288, out_features=128, bias=True)
    (relu2): ReLU()
    (dense2): Linear(in_features=128, out_features=10, bias=True)
  )
)
'''

Note:

First instantiate a conv_block object with nn.Sequential, i.e. self.conv_block=torch.nn.Sequential()
Then add layers to it as named attributes, i.e. self.conv_block.add_module("conv1",torch.nn.Conv2d(3, 32, 3, 1, 1))
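One practical advantage of add_module is that layer names can be generated programmatically. The sketch below builds a small MLP in a loop. The widths list and the fc{i}/relu{i} naming scheme are hypothetical choices for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical layer widths; add_module lets the names be generated in a loop
widths = [8, 16, 16, 4]
mlp = nn.Sequential()
for i, (n_in, n_out) in enumerate(zip(widths[:-1], widths[1:])):
    mlp.add_module(f"fc{i}", nn.Linear(n_in, n_out))
    if i < len(widths) - 2:  # no activation after the last layer
        mlp.add_module(f"relu{i}", nn.ReLU())

print([name for name, _ in mlp.named_children()])
# ['fc0', 'relu0', 'fc1', 'relu1', 'fc2']
print(mlp(torch.randn(5, 8)).shape)  # torch.Size([5, 4])
```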

1.3 Common Module methods

Four ways to access layers

Important:

Sequential inherits from Module, and the two have a lot in common, but they also differ in many ways. The key difference is:

Sequential implements integer indexing, so you can fetch a layer with model[index].
Module does not implement integer indexing, so its layers cannot be fetched by integer index. What then? Module provides a few key methods instead:

def children(self): ...

def named_children(self): ...

def modules(self): ...

def named_modules(self, memo=None, prefix=''): ...

'''
Note: all of these methods return an iterator, so you can loop over them with for, or fetch items one at a time with next()
'''
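The sketch below contrasts the two behaviors on a hypothetical Net class. Subscripting a plain Module subclass raises TypeError. The children() iterator, consumed here with next(), works, and so does integer indexing on the Sequential it yields.

```python
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
        self.head = nn.Linear(8, 2)

net = Net()

# A plain Module subclass does not define __getitem__ ...
try:
    net[0]
except TypeError as err:
    print("TypeError:", err)

# ... so use the iterator methods instead
it = net.children()
first = next(it)                  # the 'features' Sequential
print(type(first).__name__)       # Sequential
print(first[0])                   # Sequential does support integer indexing
print(len(list(net.children())))  # 2
```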

The network built above illustrates these four methods:

(1) model.children()

import torch
import torch.nn as nn
from collections import OrderedDict
class MyNet(nn.Module):
    def __init__(self):
        super(MyNet, self).__init__()
        self.conv_block=torch.nn.Sequential()
        self.conv_block.add_module("conv1",torch.nn.Conv2d(3, 32, 3, 1, 1))
        self.conv_block.add_module("relu1",torch.nn.ReLU())
        self.conv_block.add_module("pool1",torch.nn.MaxPool2d(2))
 
        self.dense_block = torch.nn.Sequential()
        self.dense_block.add_module("dense1",torch.nn.Linear(32 * 3 * 3, 128))
        self.dense_block.add_module("relu2",torch.nn.ReLU())
        self.dense_block.add_module("dense2",torch.nn.Linear(128, 10))
 
    def forward(self, x):
        conv_out = self.conv_block(x)
        res = conv_out.view(conv_out.size(0), -1)
        out = self.dense_block(res)
        return out
 
model = MyNet()

# Call .children(); it returns an iterator, so loop over it
for i in model.children():
    print(i)
    print(type(i)) # Check the type of each item: here it is Sequential, so each one can in turn be indexed to reach its individual layers
 
'''Output:
Sequential(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
<class 'torch.nn.modules.container.Sequential'>
Sequential(
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (relu2): ReLU()
  (dense2): Linear(in_features=128, out_features=10, bias=True)
)
<class 'torch.nn.modules.container.Sequential'>
'''

(2) model.named_children()

for i in model.named_children():
    print(i)
    print(type(i)) # Check the type of each item: a tuple whose first element is the name and whose second element is the child module
 
'''Output:
('conv_block', Sequential(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
))
<class 'tuple'>
('dense_block', Sequential(
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (relu2): ReLU()
  (dense2): Linear(in_features=128, out_features=10, bias=True)
))
<class 'tuple'>
'''

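Summary:

(1) model.children() and model.named_children() return iterators;
(2) model.children(): each item is a direct child module. In this example every child is a Sequential, which in turn supports integer indexing to reach its individual layers, such as conv or dense layers. Russian nesting dolls, in other words.
(3) model.named_children(): each item is a tuple whose first element is the name and whose second element is the corresponding layer or Sequential.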
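Because named_children() yields (name, module) pairs, it converts directly into a dict for lookup by name. Here is a minimal sketch with a simplified version of the MyNet above. Its forward is omitted because only the structure matters here.

```python
import torch.nn as nn

class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_block = nn.Sequential(nn.Conv2d(3, 32, 3, 1, 1), nn.ReLU(), nn.MaxPool2d(2))
        self.dense_block = nn.Sequential(nn.Linear(288, 128), nn.ReLU(), nn.Linear(128, 10))

model = MyNet()
# Turn the (name, module) pairs into a dict for lookup by name
children = dict(model.named_children())
print(list(children))              # ['conv_block', 'dense_block']
# Each child here is a Sequential, so it can be indexed further
print(children['dense_block'][2])  # Linear(in_features=128, out_features=10, bias=True)
```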

(3) model.modules()

for i in model.modules():
    print(i)
    print("==================================================")
'''Output:
MyNet(
  (conv_block): Sequential(
    (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu1): ReLU()
    (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense_block): Sequential(
    (dense1): Linear(in_features=288, out_features=128, bias=True)
    (relu2): ReLU()
    (dense2): Linear(in_features=128, out_features=10, bias=True)
  )
)
==================================================
Sequential(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
)
==================================================
Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
==================================================
ReLU()
==================================================
MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
==================================================
Sequential(
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (relu2): ReLU()
  (dense2): Linear(in_features=128, out_features=10, bias=True)
)
==================================================
Linear(in_features=288, out_features=128, bias=True)
==================================================
ReLU()
==================================================
Linear(in_features=128, out_features=10, bias=True)
==================================================
'''

(4) model.named_modules()

for i in model.named_modules():
    print(i)
    print("==================================================")
'''Output:
('', MyNet(
  (conv_block): Sequential(
    (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (relu1): ReLU()
    (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (dense_block): Sequential(
    (dense1): Linear(in_features=288, out_features=128, bias=True)
    (relu2): ReLU()
    (dense2): Linear(in_features=128, out_features=10, bias=True)
  )
))
==================================================
('conv_block', Sequential(
  (conv1): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
))
==================================================
('conv_block.conv1', Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))
==================================================
('conv_block.relu1', ReLU())
==================================================
('conv_block.pool1', MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False))
==================================================
('dense_block', Sequential(
  (dense1): Linear(in_features=288, out_features=128, bias=True)
  (relu2): ReLU()
  (dense2): Linear(in_features=128, out_features=10, bias=True)
))
==================================================
('dense_block.dense1', Linear(in_features=288, out_features=128, bias=True))
==================================================
('dense_block.relu2', ReLU())
==================================================
('dense_block.dense2', Linear(in_features=128, out_features=10, bias=True))
==================================================
'''

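Summary:

(1) model.modules() and model.named_modules() return iterators;
(2) Both modules() and named_modules() traverse every component of the model (wrapper blocks, individual layers, custom layers, and so on). They start with the model itself and work from the outer levels inward. However:

modules() yields the layer objects themselves
named_modules() yields tuples whose first element is the (dotted) name and whose second element is the layer object

(3) Why do children and modules behave differently? In PyTorch, models, layers, activation functions, and loss functions are all extensions of Module. modules and named_modules therefore recurse level by level, treating every custom block and every layer inside it as a module. children is more literal: it returns only the direct "children" and does not recurse.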
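The difference is easy to see on a small nested model. This sketch uses an arbitrary two-level nn.Sequential. named_children() returns only the two direct children. named_modules() also yields the model itself (with an empty name) and the nested layers, which get dotted names.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Sequential(nn.Linear(4, 8), nn.ReLU()),
    nn.Linear(8, 2),
)

child_names = [name for name, _ in model.named_children()]
module_names = [name for name, _ in model.named_modules()]
print(child_names)   # ['0', '1']
print(module_names)  # ['', '0', '0.0', '0.1', '1']
```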

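Note:

The four methods above were demonstrated on a model with wrapped layers. They work just as well on a model without any wrapping, with similar results.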

Module's magic method: __call__

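This method lets instances of a class behave like functions: you can call an instance object like a function and pass it arguments. The code below is the __call__ implementation from an older PyTorch release; in recent releases the same logic (pre-hooks, forward, forward hooks) lives in Module._call_impl.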

def __call__(self, *input, **kwargs):
    for hook in self._forward_pre_hooks.values():
        result = hook(self, input)
        if result is not None:
            if not isinstance(result, tuple):
                result = (result,)
            input = result
    if torch._C._get_tracing_state():
        result = self._slow_forward(*input, **kwargs)
    else:
        
        #################Important: forward is called here######################
        result = self.forward(*input, **kwargs)
    for hook in self._forward_hooks.values():
        hook_result = hook(self, input, result)
        if hook_result is not None:
            result = hook_result
    if len(self._backward_hooks) > 0:
        var = result
        while not isinstance(var, torch.Tensor):
            if isinstance(var, dict):
                var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
            else:
                var = var[0]
        grad_fn = var.grad_fn
        if grad_fn is not None:
            for hook in self._backward_hooks.values():
                wrapper = functools.partial(hook, self)
                functools.update_wrapper(wrapper, hook)
                grad_fn.register_hook(wrapper)
    return result
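The hook handling in __call__ has a practical consequence. Hooks only run when the module is called as layer(x), not when forward is called directly. Here is a minimal sketch using register_forward_hook; the hook function record_shape is a hypothetical name.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)
calls = []

# A forward hook receives (module, input, output) after forward has run
def record_shape(module, inputs, output):
    calls.append(tuple(output.shape))

handle = layer.register_forward_hook(record_shape)
x = torch.randn(4, 3)

layer(x)          # goes through __call__, so the hook fires
layer.forward(x)  # calls forward directly and skips the hook
print(calls)      # [(4, 2)]

handle.remove()   # detach the hook when it is no longer needed
layer(x)
print(calls)      # still [(4, 2)]
```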

1.4 A look at PyTorch's predefined layers

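The models built above clearly rely on many layers that PyTorch defines for us in advance. To understand how layers are implemented, it is worth studying these layers, so that we can later design new layers of our own.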

The Linear layer

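import torch
# torch.Tensor(2, 2) allocates an uninitialized 2x2 tensor; its contents are
# whatever happened to be in memory (all zeros in this particular run)
print(torch.Tensor(2, 2))

tensor([[0., 0.],
        [0., 0.]])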

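So where do the weights come from? The parameters are presumably defined inside the layer classes, since we rarely see the weights themselves when calling a layer from outside. Let's take the linear layer class torch.nn.Linear as an example; below is its source from an older PyTorch release.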

import math
import torch
from torch.nn.parameter import Parameter
from .. import functional as F
from .module import Module

class Linear(Module):  # inherits from Module
    r"""Applies a linear transformation to the incoming data: :math:`y = Ax + b`
    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        bias: If set to False, the layer will not learn an additive bias. Default: True
    Shape:
        - Input: :math:`(N, in\_features)`
        - Output: :math:`(N, out\_features)`
    Attributes:
        weight: the learnable weights of the module of shape (out_features x in_features)
        bias:   the learnable bias of the module of shape (out_features)
    Examples::
        >>> m = nn.Linear(20, 30)
        >>> input = autograd.Variable(torch.randn(128, 20))
        >>> output = m(input)
        >>> print(output.size())
    """

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        ##################Parameter definitions##############################
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        ##################Parameter initialization#######################################    
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)
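A quick way to confirm that the parameters live inside the layer is to list them. The sketch below inspects nn.Linear(20, 30) in a current PyTorch release. It also shows that bias=False leaves bias set to None, as the register_parameter('bias', None) branch suggests.

```python
import torch.nn as nn

m = nn.Linear(20, 30)
for name, p in m.named_parameters():
    # each entry is an nn.Parameter, with requires_grad=True by default
    print(name, tuple(p.shape), type(p).__name__, p.requires_grad)
# weight (30, 20) Parameter True
# bias (30,) Parameter True

m_nobias = nn.Linear(20, 30, bias=False)
print(m_nobias.bias)  # None
print([name for name, _ in m_nobias.named_parameters()])  # ['weight']
```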

Annotated source of the Linear class (a newer release)

# The linear layer also inherits from Module, which is very useful
class Linear(Module):
    # Docstring of the Linear layer
    r"""Applies a linear transformation to the incoming data: :math:`y = xA^T + b`

    This module supports :ref:`TensorFloat32<tf32_on_ampere>`.
    
    # Takes three arguments: input size, output size, and whether to use a bias
    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        bias: If set to ``False``, the layer will not learn an additive bias.
            Default: ``True``
    
    # Shape description for y = xA^T + b
    Shape:
        - Input: :math:`(N, *, H_{in})` where :math:`*` means any number of
          additional dimensions and :math:`H_{in} = \text{in\_features}`
        - Output: :math:`(N, *, H_{out})` where all but the last dimension
          are the same shape as the input and :math:`H_{out} = \text{out\_features}`.
          
    # Attribute description
    Attributes:
        weight: the learnable weights of the module of shape
            :math:`(\text{out\_features}, \text{in\_features})`. The values are
            initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
            :math:`k = \frac{1}{\text{in\_features}}`
        bias:   the learnable bias of the module of shape :math:`(\text{out\_features})`.
                If :attr:`bias` is ``True``, the values are initialized from
                :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
                :math:`k = \frac{1}{\text{in\_features}}`
                
    # Usage example
    Examples::
    
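        # Instantiate with in_features = 20, out_features = 30
        # This creates a weight matrix W of shape [30, 20]
        # m is the instantiated object; calling it goes through __call__ to forward, so it can be used like a function
        # The argument passed to m, input of shape [128, 20], is handed by forward to the linear function
        # The output is y = x*W^T + b
        # [128,20]*[20,30] -> [128,30]
# **********************************************************      
#       def linear(input, weight, bias=None)
#       Question: linear also needs weight and bias; where do they come from? Answer: m stores them
#       as the attributes self.weight and self.bias, and forward passes them to F.linear
# **********************************************************   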
        >>> m = nn.Linear(20, 30)
        >>> input = torch.randn(128, 20)
        >>> output = m(input)
        >>> print(output.size())
        torch.Size([128, 30])
    """
    
    __constants__ = ['in_features', 'out_features']
    in_features: int
    out_features: int
    weight: Tensor

    # Initialization method: takes the input size, the output size, and whether to use a bias
    def __init__(self, in_features: int, out_features: int, bias: bool = True) -> None:
        super(Linear, self).__init__()
        
        # Set attributes: input size and output size
        self.in_features = in_features
        self.out_features = out_features
        
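        # With the input and output sizes known, the weight can be defined naturally
        # torch.Tensor only allocates memory here; the actual values are set in reset_parameters()
        # Wrap the parameter to be learned in Parameter so that it is registered with the module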
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        
        # Whether to use a bias term
        if bias:
            # Register the bias as a learnable Parameter of shape (out_features,)
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            # register_parameter adds a parameter (here None) to the module under the name 'bias'
            self.register_parameter('bias', None)
        # Call reset_parameters to initialize the weight and bias
        self.reset_parameters()

    def reset_parameters(self) -> None:
        # Initialize the weight matrix with kaiming_uniform_
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            # Initialize the bias term
            init.uniform_(self.bias, -bound, bound)
    
    # forward performs the forward pass by calling F.linear, which computes the linear transformation
    def forward(self, input: Tensor) -> Tensor:
        # Pass input, weight, and bias to F.linear
        # input is the argument given at call time; weight and bias are the Parameters created in __init__
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self) -> str:
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )
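The docstring says the weights are drawn from U(-sqrt(k), sqrt(k)) with k = 1/in_features. This follows from kaiming_uniform_ with a = sqrt(5). The gain is sqrt(2/(1 + a^2)) = sqrt(1/3), so the uniform bound is sqrt(3) * gain / sqrt(fan_in) = 1/sqrt(fan_in). Here is a minimal numerical check; the tiny tolerance only absorbs float32 rounding.

```python
import math
import torch.nn as nn

in_features = 20
m = nn.Linear(in_features, 30)
bound = 1 / math.sqrt(in_features)  # sqrt(k) with k = 1 / in_features

# Both weight and bias should stay inside [-bound, bound]
print(m.weight.abs().max().item() <= bound + 1e-6)  # True
print(m.bias.abs().max().item() <= bound + 1e-6)    # True
```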

A concrete nn.Linear example

Note:

In fact, Module implements the magic method __call__.
__call__ calls forward, so the arguments passed to an instance are actually passed on to forward.

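# Instantiate with in_features = 20, out_features = 30
# This creates a weight matrix W of shape [30, 20]
# m is the instantiated object; calling it goes through __call__ to forward, so it can be used like a function
# The argument passed to m, input of shape [128, 20], is handed by forward to the linear function
# The output is y = x*W^T + b
# [128,20]*[20,30] -> [128,30]
# **********************************************************      
#       def linear(input, weight, bias=None)
#       Question: linear also needs weight and bias; where do they come from? Answer: m stores them
#       as the attributes self.weight and self.bias, and forward passes them to F.linear
# **********************************************************  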
import torch
import torch.nn as nn

m = nn.Linear(20, 30)
input = torch.randn(128, 20)

########m is the instantiated object; calling it goes through __call__ to forward, so it can be used like a function###################
output = m(input)
print(output.size())

torch.Size([128, 30])

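Note:

The Linear class itself does not seem to contain the actual linear-transformation arithmetic, and indeed it does not implement it directly.
Instead, its forward method calls the linear function, which computes y = x*W^T + b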

def forward(self, input: Tensor) -> Tensor:
    # Pass input, weight, and bias to F.linear
    # input is the argument given at call time; weight and bias are the Parameters created in __init__
    return F.linear(input, self.weight, self.bias)
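To confirm that forward really is just a thin wrapper around F.linear, the sketch below computes the same output three ways. It calls the layer, calls F.linear with the layer's own weight and bias, and writes y = xW^T + b out by hand.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

m = nn.Linear(20, 30)
x = torch.randn(128, 20)

y_layer = m(x)                          # __call__ -> forward -> F.linear
y_func = F.linear(x, m.weight, m.bias)  # the same call, made by hand
y_manual = x @ m.weight.T + m.bias      # y = x W^T + b written out

print(torch.allclose(y_layer, y_func))                # True
print(torch.allclose(y_layer, y_manual, atol=1e-5))   # True
```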

The linear function used inside Linear

The Linear class uses its forward method, and forward calls torch.nn.functional.linear, i.e. the linear function:
torch.nn.functional.linear(input, weight, bias=None)

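# linear takes input and weight plus an optional bias (default bias=None) and computes y = xA^T + b
# The Linear class calls torch.nn.functional.linear with its own self.bias,
# so if you do not want a bias, pass bias=False when instantiating Linear() (the default is bias=True)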

def linear(input, weight, bias=None):
    # type: (Tensor, Tensor, Optional[Tensor]) -> Tensor
    r"""
    Applies a linear transformation to the incoming data: :math:`y = xA^T + b`.

    This operator supports :ref:`TensorFloat32<tf32_on_ampere>`.

    Shape:
        # Input shape is (N, *, in_features); N is the batch size, i.e. how many samples are processed at once
        - Input: :math:`(N, *, in\_features)` N is the batch size, `*` means any number of
          additional dimensions

        # Weight shape is (out_features, in_features)
        - Weight: :math:`(out\_features, in\_features)`
        
        # Bias has shape (out_features,) and is broadcast across the leading dimensions
        - Bias: :math:`(out\_features)`
        
        # Output is (N, *, out_features), because the computation uses the transposed weight: y = xA^T + b
        - Output: :math:`(N, *, out\_features)`
    """
    
    tens_ops = (input, weight)
    if not torch.jit.is_scripting():
        if any([type(t) is not Tensor for t in tens_ops]) and has_torch_function(tens_ops):
            return handle_torch_function(linear, tens_ops, input, weight, bias=bias)
    if input.dim() == 2 and bias is not None:
        # fused op is marginally faster
        ret = torch.addmm(bias, input, weight.t())
    else:
        output = input.matmul(weight.t())
        if bias is not None:
            output += bias
        ret = output
    return ret
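The Shape section says any number of extra dimensions may sit between N and in_features. The small sketch below uses an assumed (2, 3, 4) input and shows the resulting output shape. It also checks that the result matches a matmul followed by a broadcast bias add. In the Python source above, that is the branch a non-2-D input takes instead of addmm.

```python
import torch
import torch.nn as nn

m = nn.Linear(4, 5)
x = torch.randn(2, 3, 4)  # (N, *, in_features) with one extra dimension
y = m(x)
print(y.shape)            # torch.Size([2, 3, 5])

# matmul over the last dimension, then broadcast-add the (out_features,) bias
y_manual = x.matmul(m.weight.t()) + m.bias
print(torch.allclose(y, y_manual, atol=1e-5))  # True
```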

1.5 Implementing a custom Linear layer

Note

nn.Parameter will be explained in detail later.

import torch as t
from torch import nn
 
class Linear(nn.Module):
    def __init__(self, in_features, out_features):
        # nn.Module.__init__(self)  # an equivalent alternative
        super(Linear, self).__init__()
        self.w = nn.Parameter(t.randn(in_features, out_features))  # nn.Parameter is a special Tensor registered as a learnable parameter
        self.b = nn.Parameter(t.randn(out_features))
         
    def forward(self, x):
        x = x.mm(self.w)
        return x + self.b
 
layer = Linear(4, 3)
input = t.randn(2, 4)  # a plain tensor; Variable wrappers are no longer needed since PyTorch 0.4
output = layer(input)
print(output)
 
for name, param in layer.named_parameters():
    print(name, param)

tensor([[1.7390, 2.6249, 1.5991],
        [2.6383, 0.8510, 3.4361]], grad_fn=<AddBackward0>)
w Parameter containing:
tensor([[-0.1486,  0.6831, -1.4095],
        [-1.1092, -0.2992,  0.1042],
        [ 0.8690, -0.3030, -0.9971],
        [-0.9249, -1.0954,  0.2435]], requires_grad=True)
b Parameter containing:
tensor([0.2868, 2.1222, 0.7106], requires_grad=True)
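This layer needs no hand-written backward method. The minimal sketch below repeats the custom Linear, using plain tensors instead of Variable, and lets autograd compute the gradients. The sum() loss is an arbitrary choice that makes the expected gradients easy to state: every entry of b.grad equals the batch size, and w.grad[i, j] is the sum of x[:, i] over the batch.

```python
import torch
import torch.nn as nn

class Linear(nn.Module):
    # same structure as the custom layer above: y = x @ w + b
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w = nn.Parameter(torch.randn(in_features, out_features))
        self.b = nn.Parameter(torch.randn(out_features))

    def forward(self, x):
        return x.mm(self.w) + self.b

layer = Linear(4, 3)
x = torch.randn(2, 4)
loss = layer(x).sum()
loss.backward()            # no backward() method was written; autograd derives it

print(layer.w.grad.shape)  # torch.Size([4, 3])
print(layer.b.grad)        # tensor([2., 2., 2.])
print(torch.allclose(layer.w.grad, x.sum(dim=0).unsqueeze(1).expand(4, 3)))  # True
```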

1.6 Summary

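1. A custom layer such as Linear must inherit from nn.Module, and its constructor must call nn.Module's constructor, either with super(Linear, self).__init__() or with nn.Module.__init__(self). The first form is recommended.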
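2. Define the learnable parameters yourself in the constructor __init__ and wrap them in Parameter; for example, wrap w and b as Parameters. A Parameter is a special kind of tensor (formerly a special Variable) that requires gradients by default (requires_grad = True).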
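3. forward implements the forward pass. Its inputs can be one or more tensors (formerly Variables), and every operation on x must be one that autograd supports.
There is no need to write a backward function. Because the forward pass consists of tensor operations, nn.Module relies on autograd to perform backpropagation automatically, which is much simpler than subclassing Function.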
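4. In use, a layer can intuitively be thought of as a mathematical function: calling layer(input) gives the result for input. This is equivalent to layer.__call__(input). __call__ mainly calls layer.forward(x) and also handles hooks. So in practice, prefer layer(x) over layer.forward(x).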
5. A Module's learnable parameters are returned as iterators by named_parameters() or parameters(). The former attaches a name to each parameter, which makes them easier to identify.
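6. Magic methods are named with a double underscore, the method name, and another double underscore. Python calls them implicitly in response to particular syntax or operations, for example __init__ when an object is created and __call__ when an instance is called. Defining them gives a class specific built-in behaviors, which is very important for customizing classes.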
7. Python's magic method __call__ lets instances of a class behave like functions: you can call an instance object like a function and pass it arguments.
8. Module implements __call__, and __call__ calls forward, so the arguments passed to an instance are actually passed on to forward.
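Points 6 to 8 can be illustrated without PyTorch at all. The sketch below is a hypothetical, heavily simplified TinyModule whose __call__ just delegates to forward. The real nn.Module.__call__ additionally runs the hooks shown earlier.

```python
class TinyModule:
    """Hypothetical stand-in for nn.Module: __call__ delegates to forward."""
    def __call__(self, *args, **kwargs):
        # nn.Module also runs hooks here; this sketch only forwards the call
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError("subclasses must override forward")

class AddOne(TinyModule):
    def forward(self, x):
        return x + 1

f = AddOne()
print(f(41))        # 42: Python turns f(41) into f.__call__(41)
print(callable(f))  # True
```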