Getting into PyTorch, Part 2 | autograd and Variable (version 0.3)

Reposted

我是天才很好 2021-06-18 14:11:38

This post is written for version 0.3 and will be revised later.

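Autograd: automatic differentiation

The autograd package is at the core of neural networks in PyTorch. It provides automatic differentiation for all operations on tensors. It is a define-by-run framework, which means that backpropagation is determined by how your code runs, and every iteration can be different.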

Variable

If a tensor is a coin, then a Variable is the wallet: it records how much money is inside and where the money flows.

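A Variable is a wrapper around a tensor. The data attribute holds the tensor itself, the grad attribute holds the gradient with respect to the Variable, and grad_fn (called creator in earlier releases) refers to the Function that created the Variable.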

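autograd.Variable is the central class of the package. It wraps a Tensor, supports nearly all Tensor operations, and adds extra attributes. Once the computation is done, call .backward() to compute the gradients; they are accumulated into the .grad attribute of each Variable, and the raw tensor is available through .data.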
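The word "accumulated" matters in practice: a second backward pass adds to .grad rather than overwriting it. The following is a minimal sketch of that behavior; the variable names (first, second, cleared) are only illustrative.

```python
import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([3]), requires_grad=True)

# First backward pass: d(2x)/dx = 2
(x * 2).backward()
first = x.grad.data.clone()

# Second backward pass on a fresh graph: the new gradient is added to x.grad
(x * 2).backward()
second = x.grad.data.clone()

# Reset the accumulated gradient in place before the next pass
x.grad.data.zero_()
cleared = x.grad.data.clone()

print(first, second, cleared)
```

This is why training loops usually zero the gradients before each backward pass.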

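Variable and Function are interconnected and build up an acyclic graph that encodes the complete history of the computation. Every Variable has a .grad_fn attribute that references the Function that created it, such as an addition, subtraction, multiplication, or division. The exception is a Variable created directly by the user: it is a first node of the graph, and its .grad_fn is None.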

Variable and Tensor

The Tensor lives in the .data attribute of a Variable, and data is moved between CPU and GPU with .cpu() and .cuda().

>> a=Variable(torch.Tensor([1]),requires_grad=True).cuda()
>> a
Variable containing:
 1
[torch.cuda.FloatTensor of size 1 (GPU 0)]
>> a.data
 1
[torch.cuda.FloatTensor of size 1 (GPU 0)]
>> a.cpu()
Variable containing:
 1
[torch.FloatTensor of size 1]
>> a.cpu().data
 1
[torch.FloatTensor of size 1]
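One detail worth knowing about the snippet above: calling .cuda() on a Variable that already has requires_grad=True is itself a recorded operation, so the result is no longer a leaf and its .grad is not filled in by backward(). A minimal sketch of the alternative, moving the raw tensor first and wrapping it afterwards, is shown below. The use_cuda guard is an addition so that the sketch also runs on a machine without a GPU.

```python
import torch
from torch.autograd import Variable

use_cuda = torch.cuda.is_available()

t = torch.Tensor([1])
if use_cuda:
    t = t.cuda()                     # move the raw tensor to the GPU first
a = Variable(t, requires_grad=True)  # then wrap it, so that a is a leaf

b = a * 2
b.backward()

print(a.is_leaf)  # a was created by the user, not by an operation
print(a.grad)     # gradient of b with respect to a
```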

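Computing gradients automatically

Note that, by design, the grad of an intermediate variable is freed as soon as it has done its job in the backward pass. Also, the grad_fn of a starting (leaf) node is None.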

>> import torch
>> from torch.autograd import Variable

# requires_grad=True turns on gradient tracking
>> a=Variable(torch.Tensor([1]),requires_grad=True) 
>> b=Variable(torch.Tensor([2]),requires_grad=True)
>> c=Variable(torch.Tensor([3]),requires_grad=True)

>> d=a+b
>> e=d+c
>> e.backward()
>> print(a.grad,b.grad,c.grad)
Variable containing:
 1
[torch.FloatTensor of size 1]
Variable containing:
 1
[torch.FloatTensor of size 1]
Variable containing:
 1
[torch.FloatTensor of size 1]

>> d.grad # gradients of intermediate variables are not kept, so this is empty

>> a.grad_fn # .grad_fn of a starting (leaf) node is None
>> e.grad_fn
<AddBackward1 at 0x7f387cf1c588>
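If you do need the gradient of an intermediate variable such as d, one option is to ask autograd to keep it by calling retain_grad() before the backward pass. This is a minimal sketch, assuming retain_grad() is available on Variable in your installed version; the multiplication by 3 is only there to make the gradient distinguishable from 1.

```python
import torch
from torch.autograd import Variable

a = Variable(torch.Tensor([1]), requires_grad=True)
b = Variable(torch.Tensor([2]), requires_grad=True)

d = a + b          # intermediate (non-leaf) variable
d.retain_grad()    # ask autograd to keep d.grad after backward
e = d * 3
e.backward()

print(d.grad)  # de/dd = 3, kept because of retain_grad()
print(a.grad)  # de/da = 3, kept because a is a leaf
```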

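Excluding subgraphs from backpropagation

Every Variable has two flags, requires_grad and volatile. Both can exclude a subgraph from gradient computation, which makes the computation more efficient.

requires_grad: excludes a specific subgraph from backpropagation, so no gradient is recorded or accumulated for it. The output of an operation requires a gradient as soon as any one of its inputs does.

volatile: inference mode. If a single input anywhere in the graph has volatile=True, the whole graph is excluded from backpropagation and calling .backward() is an error.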

>> a=Variable(torch.Tensor([1]),requires_grad=False) 
>> b=Variable(torch.Tensor([2]),requires_grad=True)
>> c=a+b
>> c.backward()
>> a.grad  # a has requires_grad=False, so no gradient is stored
>> b.grad
Variable containing:
 1
[torch.FloatTensor of size 1]

---------------------------------------
>> a=Variable(torch.Tensor([1]),volatile=True) 
>> b=Variable(torch.Tensor([2]),requires_grad=True)
>> c=a+b
>> c.backward() # one of the inputs is volatile, so backpropagation is not possible
RuntimeError: element 0 of variables tuple is volatile
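The way requires_grad spreads through a graph can be checked directly on the outputs. The sketch below is illustrative only (the names frozen and weight are made up): an output built purely from frozen inputs does not require a gradient, while a single trainable input is enough to make the output require one.

```python
import torch
from torch.autograd import Variable

frozen = Variable(torch.Tensor([1, 2]))                      # requires_grad defaults to False
weight = Variable(torch.Tensor([3, 4]), requires_grad=True)

no_grad_out = frozen * 2             # every input is frozen
grad_out = (frozen * weight).sum()   # one input requires a gradient

print(no_grad_out.requires_grad)
print(grad_out.requires_grad)

grad_out.backward()
print(frozen.grad)   # nothing is stored for the frozen input
print(weight.grad)   # d(sum(frozen * weight))/d(weight) = frozen
```

This is the mechanism behind freezing part of a model: set requires_grad=False on the parameters that should stay fixed.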

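Registering hooks

A hook on a Variable works like a plug-in: it attaches extra behavior to the main code without modifying it. Think of a hunter whose jacket has a clip for hanging a gun: he can pick whichever gun he wants to carry and get a different result from the hunt.

In Variable this is done with register_hook. Every hook you register is called during backpropagation; for example, you can define a print function so that the grad is printed on every backward pass.

Note that register_hook takes a function as its argument, and that function has the following form: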

hook(grad) -> Variable or None

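In other words, the hook has the power to change the gradient: if it returns a Variable, that value is used in place of grad.

Example: a hook that prints grad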

import torch
from torch.autograd import Variable


# Define a print function that prints the corresponding grad on every backward pass
def print_grad(grad):
    print(grad)

>> x = Variable(torch.randn([1]), requires_grad=True)
>> y = x+2
>> z = torch.mean(torch.pow(y, 2))
>> y.register_hook(print_grad)  # attach the print function to the variable y
>> z.backward()

Variable containing:
 5.1408
[torch.FloatTensor of size 1]

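So when would we typically register a hook?

There are a few common cases: when we want to extract intermediate-layer parameters for visualization, when we want to save intermediate variables, or when we want to change the gradient during backpropagation.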
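The last case, changing the gradient, can be sketched as follows. The hook here halves the gradient flowing through y; the factor 0.5 and the fixed input value are arbitrary choices for illustration.

```python
import torch
from torch.autograd import Variable

x = Variable(torch.Tensor([1]), requires_grad=True)
y = x + 2                          # y = 3
z = torch.mean(torch.pow(y, 2))    # dz/dy = 2 * y = 6

# The hook returns a new gradient, which replaces the original one
handle = y.register_hook(lambda grad: grad * 0.5)
z.backward()

print(x.grad)    # 6 * 0.5 = 3 instead of 6

handle.remove()  # detach the hook once it is no longer needed
```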

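Custom Function

When prototyping an idea quickly, defining a custom Operation lets us use PyTorch flexibly. To add a custom Operation to autograd, it must subclass autograd.Function. Calling the Operation then lets autograd compute the result and the gradient and record the operation in the history. Each operation needs to implement three methods:

__init__ (optional): if your operation takes non-Variable arguments, pass them to __init__ and use them inside the operation. If your operation needs no extra arguments, you can leave __init__ out.
forward(): the logic of the operation. It can take any number of arguments, but they must all be Variables. It returns either a Variable or a tuple of Variables.
backward(): the gradient computation. It takes as many arguments as forward() returns, each one being the gradient passed back to this Operation. It returns as many values as the operation has inputs; for an input that needs no gradient, return None.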

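Example:

Note: the official definitions usually use @staticmethod and invoke the function through custom_function.apply, but you can also leave out @staticmethod, in which case you call it as custom_function().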

import torch
from torch.autograd import Variable


class custom_add(torch.autograd.Function):
    """
    A custom function is implemented by subclassing torch.autograd.Function.
    The forward and backward passes operate on tensors.
    """

    @staticmethod
    def forward(ctx, input, input2):
        """
        The forward pass receives tensors as input and returns a tensor as output.
        ctx is a context object for stashing information for the backward pass,
        for example through ctx.save_for_backward.
        """
        ctx.save_for_backward(input, input2)
        output = input + input2
        return output

    @staticmethod
    def backward(ctx, grad_output):
        """
        The backward pass receives a tensor holding the gradient of the loss
        (grad_output) and computes the gradients to return for each input.
        """
        input1, input2 = ctx.saved_tensors  # values stored in forward with save_for_backward
        grad_input = grad_output.clone()
        return grad_input, grad_input  # two inputs, so two gradients are returned

new_add=custom_add.apply
a=Variable(torch.Tensor([1]),requires_grad=True)
b=Variable(torch.Tensor([2]),requires_grad=True)
c=new_add(a,b)

>> c
Variable containing:
 3
[torch.FloatTensor of size 1]
>> c.backward()
>> a.grad
Variable containing:
 1
[torch.FloatTensor of size 1]
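With the static-method style used above there is no __init__, so a non-Variable argument can instead be passed straight to forward and stashed on ctx. The sketch below follows that pattern; ScaleBy and factor are hypothetical names chosen for illustration. It also shows backward returning None for an input that needs no gradient.

```python
import torch
from torch.autograd import Variable


class ScaleBy(torch.autograd.Function):
    """Multiply the input by a plain Python number (illustrative example)."""

    @staticmethod
    def forward(ctx, input, factor):
        ctx.factor = factor          # non-tensor state is stored directly on ctx
        return input * factor

    @staticmethod
    def backward(ctx, grad_output):
        # One return value per forward input: a gradient for `input`,
        # and None for `factor`, which is not differentiable
        return grad_output * ctx.factor, None


x = Variable(torch.Tensor([2]), requires_grad=True)
y = ScaleBy.apply(x, 5.0)
y.backward()

print(y)       # 2 * 5 = 10
print(x.grad)  # dy/dx = 5
```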

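Checking a custom Function:

After writing a function, you will probably want to know whether its logic, in particular the backward pass, is correct. You can verify it by comparing against finite-difference results computed with small perturbations, using gradcheck.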

from torch.autograd import gradcheck


# gradcheck takes a tuple of tensors as input, checks whether the gradients
# evaluated with these tensors are close enough to numerical
# approximations, and returns True if they all satisfy this condition.
input = (Variable(torch.Tensor([2]).double(), requires_grad=True), Variable(torch.Tensor([3]).double(), requires_grad=True),)
test = gradcheck(custom_add.apply, input, eps=1e-6, atol=1e-4)

>> print(test)
True
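To see what gradcheck buys you, it helps to run it on a function whose backward is deliberately wrong. The sketch below uses two hypothetical functions, CorrectTriple and WrongTriple, plus a small helper passes_gradcheck. Depending on the version, a failed check may either return False or raise an error, so the helper treats both as a failure; that handling is an assumption, not part of the original example.

```python
import torch
from torch.autograd import Variable, gradcheck


class CorrectTriple(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        return input * 3

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * 3       # matches the forward pass


class WrongTriple(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        return input * 3

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * 2       # deliberately wrong: should be 3


def passes_gradcheck(fn, inputs):
    """Return True only if gradcheck succeeds (helper for this sketch)."""
    try:
        return bool(gradcheck(fn, inputs, eps=1e-6, atol=1e-4))
    except RuntimeError:
        return False


# gradcheck needs double precision for the finite differences to be accurate
inputs = (Variable(torch.Tensor([2]).double(), requires_grad=True),)

correct_ok = passes_gradcheck(CorrectTriple.apply, inputs)
wrong_ok = passes_gradcheck(WrongTriple.apply, inputs)
print(correct_ok, wrong_ok)
```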

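Profiling tools:

If you want to see how much time your operations take, the autograd profiler gives you insight into the cost of each operation on the CPU and the GPU. Use profile for CPU profiling, and emit_nvtx for nvprof-based profiling.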

>>> x = Variable(torch.randn(1, 1), requires_grad=True)
>>> with torch.autograd.profiler.profile() as prof:
...     y = x ** 2
...     y.backward()
>>> # NOTE: some columns were removed for brevity
... print(prof)
-------------------------------------  ---------------  ---------------
Name                                          CPU time        CUDA time
-------------------------------------  ---------------  ---------------
PowConstant                                  142.036us          0.000us
N5torch8autograd9GraphRootE                   63.524us          0.000us
PowConstantBackward                          184.228us          0.000us
MulConstant                                   50.288us          0.000us
PowConstant                                   28.439us          0.000us
Mul                                           20.154us          0.000us
N5torch8autograd14AccumulateGradE             13.790us          0.000us
N5torch8autograd5CloneE                        4.088us          0.000us
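When the same operation runs many times, the raw event list gets long. A possible way to condense it is key_averages(), which groups events by name. This is a minimal sketch, assuming key_averages() is available on the profiler object in your version; the loop count of 10 is arbitrary.

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(1, 1), requires_grad=True)

with torch.autograd.profiler.profile() as prof:
    for _ in range(10):
        y = x ** 2
        y.backward()

# Average the recorded events per operation name instead of listing each call
averages = prof.key_averages()
print(averages)
```

The timings themselves vary from run to run and from machine to machine, so treat them as relative indicators.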