PyTorch: Setting the Cache Directory and Running Out of Memory

Reposted

mob6454cc79ab13 2024-08-30 16:39:28


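Contents

Methods: How to Learn PyTorch

Step 1: Treat it as an advanced NumPy
Step 2: Study a standard template
Step 3: Use it while reading the docs

Resources: Commonly Used Resources

1. What the Awesome list covers:
2. Related links:

Part 0: Introduction to PyTorch

1. Advantages of PyTorch
2. Training a DNN with PyTorch

Part 1: Data Operations (Tensor)

1.1 Creating a Tensor
1.2 Basic Operations (Arithmetic, Indexing, Reshaping)

1.2.1 Arithmetic
1.2.2 Indexing
1.2.3 Changing Shape

1.3 Broadcasting
1.4 Converting Between Tensor and NumPy

Part 2: Automatic Differentiation (Pay Attention)

2.1 Tensors and Their Derivatives (Tensor)
2.2 Gradients

Part 3: Building a Neural Network in PyTorch

3.1 Defining the Network
3.2 Loss Function
3.3 Updating the Weights

Part 4: Loading Datasets

1. dataset
2. dataloader

Part 5: Running Deep Learning on a GPU
Part 6: Other Issues

1. Using `torch.nn.Linear(a, b)`
2. Matching PyTorch and CUDA versions
3. Converting a tensor to int/float
4. Expanding 1-D to 2-D
5. Flattening with flatten
6. The versatile einsum function
7. random and seed settings
8. Matrix multiplication
9. Using a custom collate_fn in a DataLoader
10. Do not forget eval at prediction time
11. Converting between list and tensor
12. Counting the negative entries in a tensor
13. Basic tensor operations
14. Using permute()
15. Building a mask matrix
16. Building a model
17. Automatic type casting with autocast
18. Saving checkpoints periodically during training
19. contiguous: contiguous storage
20. Checking GPU memory usage and selecting a specific GPU
21. Stacking k tensors
22. Computing a sum of squares with tensors

reference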

Methods: How to Learn PyTorch

(1) See the reference section for a summary of resources. (2) Hung-yi Lee's PyTorch lecture: https://www.bilibili.com/video/BV1Wv411h7kN?p=5&spm_id_from=pageDriver

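Step 1: Treat it as an advanced NumPy

Read the official tutorials, Welcome to PyTorch Tutorials (https://pytorch.org/tutorials/).

Keep clicking next; finishing the first block, "Deep Learning with PyTorch: A 60 Minute Blitz", is enough. In 60 minutes you pick up the two core concepts, Tensor and Variable, and see how automatic differentiation works (since PyTorch 0.4, Variable has been merged into Tensor, so current code only uses Tensor). If you have time, keep clicking next to the end to get a first impression of all the basic concepts.

In short, open an IPython session and play with it as if it were NumPy.

Step 2: Study a standard template

Look at the official examples, pytorch/examples (https://github.com/pytorch/examples).

The MNIST and ImageNet examples are both worth studying. The command-line argument handling is mostly noise and can be skipped; focus on the standard pattern. See also Learning PyTorch with Examples (https://pytorch.org/tutorials/beginner/pytorch_with_examples.html).

The official tutorials cover the same material, so read the two together.

After this you will probably want to start coding. If you want more, yunjey/pytorch-tutorial (https://github.com/yunjey/pytorch-tutorial) has several more introductory examples.

Step 3: Use it while reading the docs

The official documentation, PyTorch documentation (https://pytorch.org/docs/master/index.html),

has some gaps: many key concepts and principles are not explained clearly. As an API reference, however, it is very good. Read through it once to get an impression of what PyTorch can do, then start on your own task and look up the API whenever you need a specific operation.

At this point you are past the introduction, so go ahead and use PyTorch for your own tasks.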

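Resources: Commonly Used Resources

After the introduction, a few resources tend to come up often in day-to-day use:

bharathgs/Awesome-pytorch-list (https://github.com/bharathgs/Awesome-pytorch-list): part of the Awesome series, a collection of PyTorch resources. Look here when you need something: models, interesting applications, more tutorials, paper reproductions, and so on.

1. What the Awesome list covers:

(1) PyTorch and related libraries: a single resource, the official PyTorch website.
(2) NLP and speech processing: currently 26 resources, mainly speech processing, NLP, multi-speaker speech processing, speech synthesis, and machine translation.
(3) Computer vision: currently 14 resources, mainly image augmentation, semantic segmentation, and style transfer.
(4) Probabilistic/generative libraries: currently 7 resources, mainly probabilistic programming, statistical inference, and generative models.
(5) Other libraries: currently 78 resources, covering PyTorch libraries outside the areas above.
(6) Tutorials and examples: currently 53 resources, including official tutorials, many unofficial write-ups by developers, and tutorials in Chinese.
(7) Paper implementations: the largest part, currently 273 resources. It covers most top papers; if you are interested, bookmark it and work through them one by one.

Summary: PyTorch is great, but how to use many specific features is not obvious: loading weights from different models in a custom way, multi-GPU parallelism, per-layer learning rates and weight decay, learning-rate scheduling, and so on all take some digging. Official support is not yet very friendly here; later posts may cover these topics.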

Part 0: Introduction to PyTorch


1. Advantages of PyTorch

PyTorch is a mainstream deep learning framework. Its advantages:

(1) Tensors (similar to NumPy arrays) can be accelerated on the GPU

(2) DNNs are built on top of autograd

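2. Training a DNN with PyTorch

Neural networks are built with torch.nn; the nn package relies on autograd to define models and compute gradients. An nn.Module object contains the network layers and a forward(input) method that computes and returns the output; the loss is then computed from that output.

Training a neural network usually takes the following steps:

Define a neural network, which usually has some trainable parameters
Iterate over a dataset (Dataset)
Feed the input through the network
Compute the loss (this calls the Module's forward() method)
Compute the gradients of the loss with respect to the parameters
Update the parameters, usually with gradient descent: weight = weight - learning_rate × gradient.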

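Part 1: Data Operations (Tensor)

1.1 Creating a Tensor

(1) Create an uninitialized Tensor

import torch

# Create an uninitialized Tensor (its contents are whatever was already in memory)
x = torch.empty(5, 3)
print(x)

#### Output: ####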

tensor([[-7.9905e+25,  8.1556e-43, -7.9905e+25],
        [ 8.1556e-43, -7.9899e+25,  8.1556e-43],
        [-7.9899e+25,  8.1556e-43, -7.9884e+25],
        [ 8.1556e-43, -7.9884e+25,  8.1556e-43],
        [-7.9900e+25,  8.1556e-43, -7.9900e+25]])

(2) Create a randomly initialized Tensor

# Create a Tensor with values drawn uniformly from [0, 1)
x = torch.rand(5, 3)
print(x)

#### Output: ####

tensor([[0.1757, 0.9102, 0.0980],
        [0.0969, 0.6846, 0.5546],
        [0.3665, 0.2245, 0.2967],
        [0.5773, 0.4293, 0.5060],
        [0.0633, 0.2833, 0.2325]])

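For a random ordering, torch.randperm(10) returns a random permutation of the integers 0 to 9. For evenly spaced values in a range, use torch.arange(10, 30, 5) (fixed step) or torch.linspace (fixed number of points).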

torch.linspace(2, 10, steps = 9)
Out[5]: tensor([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

torch.arange(10, 30, 5)
Out[6]: tensor([10, 15, 20, 25])
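A common use of torch.randperm is shuffling data by indexing with the permutation. The following is a minimal sketch; the `data` tensor and the fixed seed are illustrative assumptions, not part of the original notes.

```python
import torch

torch.manual_seed(0)  # fixed seed only so the sketch is repeatable

perm = torch.randperm(10)       # the integers 0..9 in random order, each exactly once
data = torch.arange(100, 110)   # hypothetical dataset of 10 items
shuffled = data[perm]           # indexing with the permutation shuffles the data

print(perm)
print(shuffled)
```

Because the permutation contains every index exactly once, no item is dropped or repeated.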

(3) Create an all-zero Tensor

# Create a Tensor filled with 0
x = torch.zeros(5, 3, dtype = torch.long)
print(x)

#### Output: ####

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

(4) Create a Tensor from data

# Create a Tensor from existing data
x = torch.tensor([5.5, 3])
print(x)

Output:

tensor([5.5000, 3.0000])

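(5) Create a new all-ones Tensor from an existing Tensor

# new_ones returns a new Tensor filled with 1; by default it reuses x's dtype and device
x = x.new_ones(5, 3, dtype = torch.float64)
print(x)

# rand_like returns a random Tensor with the same shape as x (the dtype is overridden here)
x = torch.rand_like(x, dtype = torch.float64)
print(x)

#### Output: ####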

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[0.3330, 0.9622, 0.9146],
        [0.2841, 0.9874, 0.3035],
        [0.2449, 0.2221, 0.1693],
        [0.2697, 0.7510, 0.7994],
        [0.1660, 0.9774, 0.4102]], dtype=torch.float64)

(6) Get the shape of a Tensor

# Get the shape of a Tensor
print(x.size())
print(x.shape)
# Note: the returned torch.Size is a tuple and supports all tuple operations

#### Output: ####
torch.Size([5, 3])
torch.Size([5, 3])

(7) Initialize from an evenly spaced sequence

# linspace: 9 evenly spaced points from 2 to 10
torch.linspace(2, 10, steps = 9)
# tensor([ 2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10.])

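1.2 Basic Operations (Arithmetic, Indexing, Reshaping)

1.2.1 Arithmetic

The same operation can often be written in several ways. Take addition as an example. (1) Form 1:

# The same operation can take several forms
# Form 1: the + operator
y = torch.rand(5, 3)
print(x + y)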

tensor([[0.6024, 1.9602, 0.9764],
        [1.2583, 1.6134, 0.6532],
        [0.6273, 0.4975, 0.4529],
        [1.1975, 0.8352, 1.5810],
        [0.2917, 1.4789, 1.1978]], dtype=torch.float64)

(2) Form 2:

# Form 2: torch.add
print(torch.add(x, y))
# An output Tensor can also be specified
result = torch.empty(5, 3)
torch.add(x, y, out = result)
print(result)

tensor([[0.6024, 1.9602, 0.9764],
        [1.2583, 1.6134, 0.6532],
        [0.6273, 0.4975, 0.4529],
        [1.1975, 0.8352, 1.5810],
        [0.2917, 1.4789, 1.1978]], dtype=torch.float64)
tensor([[0.6024, 1.9602, 0.9764],
        [1.2583, 1.6134, 0.6532],
        [0.6273, 0.4975, 0.4529],
        [1.1975, 0.8352, 1.5810],
        [0.2917, 1.4789, 1.1978]])

(3) Form 3:

# Form 3: in-place addition (methods ending in _ modify the Tensor in place)
y.add_(x)
print(y)

tensor([[0.6024, 1.9602, 0.9764],
        [1.2583, 1.6134, 0.6532],
        [0.6273, 0.4975, 0.4529],
        [1.1975, 0.8352, 1.5810],
        [0.2917, 1.4789, 1.1978]])

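1.2.2 Indexing

NumPy-style indexing can be used to access part of a Tensor. Note that the indexed result shares memory with the original data: modifying one also modifies the other.

# Access part of a Tensor with NumPy-style indexing
# Note: the indexed result shares memory with the original data
y = x[0, :]
y += 1
print(y)
print(x[0, :]) # check whether x changed as well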

tensor([1.3330, 1.9622, 1.9146], dtype=torch.float64)
tensor([1.3330, 1.9622, 1.9146], dtype=torch.float64)

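1.2.3 Changing Shape

view() returns a new tensor that shares memory with the source tensor, so changing one also changes the other. In other words, view only changes the angle from which the tensor is observed.

y = x.view(15)
z = x.view(-1, 5)  # the dimension given as -1 is inferred from the others
print(x.size(), y.size(), z.size())

Output: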

torch.Size([5, 3]) torch.Size([15]) torch.Size([3, 5])

x += 1
print(x)
print(y)

Output:

tensor([[2.3330, 2.9622, 2.9146],
        [1.2841, 1.9874, 1.3035],
        [1.2449, 1.2221, 1.1693],
        [1.2697, 1.7510, 1.7994],
        [1.1660, 1.9774, 1.4102]], dtype=torch.float64)
tensor([2.3330, 2.9622, 2.9146, 1.2841, 1.9874, 1.3035, 1.2449, 1.2221, 1.1693,
        1.2697, 1.7510, 1.7994, 1.1660, 1.9774, 1.4102], dtype=torch.float64)

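reshape() also changes the shape, but it does not guarantee a copy: it returns a view when the memory layout allows it and copies only otherwise, so it should not be relied on to produce an independent tensor. To get a real copy that shares no memory, call clone() first and then view().

x_cp = x.clone().view(15)  # clone() creates an independent copy
x -= 1
print(x)
print(x_cp)

Output: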

tensor([[1.3330, 1.9622, 1.9146],
        [0.2841, 0.9874, 0.3035],
        [0.2449, 0.2221, 0.1693],
        [0.2697, 0.7510, 0.7994],
        [0.1660, 0.9774, 0.4102]], dtype=torch.float64)
tensor([2.3330, 2.9622, 2.9146, 1.2841, 1.9874, 1.3035, 1.2449, 1.2221, 1.1693,
        1.2697, 1.7510, 1.7994, 1.1660, 1.9774, 1.4102], dtype=torch.float64)
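Whether two tensors share memory can be checked directly by comparing their data pointers. The sketch below uses a small contiguous tensor (an assumption chosen for illustration); in that case reshape() happens to return a view, which is exactly why it cannot be relied on for copying.

```python
import torch

x = torch.arange(6.0).reshape(2, 3)

v = x.view(6)           # always shares storage with x
r = x.reshape(6)        # x is contiguous, so reshape returns a view here
c = x.clone().view(6)   # clone() forces an independent copy

shares_v = v.data_ptr() == x.data_ptr()
shares_r = r.data_ptr() == x.data_ptr()
shares_c = c.data_ptr() == x.data_ptr()

x[0, 0] = 100.0         # modify the source after the fact
print(shares_v, shares_r, shares_c)           # True True False
print(v[0].item(), r[0].item(), c[0].item())  # 100.0 100.0 0.0
```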

Another commonly used function is item(), which converts a scalar Tensor into a Python number.

# item() converts a scalar Tensor into a Python number
x = torch.randn(1)
print(x)
print(x.item())

Output:

tensor([0.2603])
0.2603132724761963

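1.3 Broadcasting

When an element-wise operation is applied to two Tensors of different shapes, broadcasting may be triggered: elements are first replicated as needed so that the two Tensors have the same shape, and the operation is then applied element by element. For example:

x = torch.arange(1, 3).view(1, 2)
print(x)
y = torch.arange(1, 4).view(3, 1)
print(y)
print(x + y)

Output: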

tensor([[1, 2]])
tensor([[1],
        [2],
        [3]])
tensor([[2, 3],
        [3, 4],
        [4, 5]])

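1.4 Converting Between Tensor and NumPy

Use numpy() and from_numpy() to convert between a Tensor and a NumPy array. Note one thing: the Tensor and the NumPy array produced by these two functions share the same memory.

a = torch.ones(5)
b = a.numpy()
print(a, b)

Output: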

tensor([1., 1., 1., 1., 1.]) [1. 1. 1. 1. 1.]

a += 1
print(a, b)

Output:

tensor([2., 2., 2., 2., 2.]) [2. 2. 2. 2. 2.]

b += 1
print(a, b)

Output:

tensor([3., 3., 3., 3., 3.]) [3. 3. 3. 3. 3.]

Use from_numpy() to convert a NumPy array into a Tensor:

import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
print(a, b)

Output:

[1. 1. 1. 1. 1.] tensor([1., 1., 1., 1., 1.], dtype=torch.float64)

a += 1
print(a, b)
b += 1
print(a, b)

Output:

[2. 2. 2. 2. 2.] tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
[3. 3. 3. 3. 3.] tensor([3., 3., 3., 3., 3.], dtype=torch.float64)
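When the sharing is not wanted, torch.tensor() copies the data instead. The following short sketch contrasts the two on the same array; the variable names `shared` and `copied` are illustrative.

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)  # shares memory with the NumPy array
copied = torch.tensor(a)      # always copies the data

a += 1  # modify the NumPy array in place

print(shared)  # follows the change to a
print(copied)  # keeps the original values
```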

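Part 2: Automatic Differentiation (Pay Attention)

See also: automatic differentiation of Tensors (Autograd)

Some background on automatic differentiation: the autograd package is at the core of every neural network in PyTorch. We take a brief look at it first and then go on to train our first neural network.

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that backpropagation is defined by how your code runs, and every iteration can be different.

For background on numerical differentiation, numerical integration, and automatic differentiation, see Chapter 4, Section 5 of Xipeng Qiu's "Neural Networks and Deep Learning", available at https://nndl.github.io/

A brief sketch of the principle: the goal is to compute the derivative of a function at a given point, and the chain rule decomposes that computation into a sequence of elementary operations.

[The formula and computation-graph images for this example are missing from the source.]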

2.1 Tensors and Their Derivatives (Tensor)

# requires_grad=True makes autograd track operations on this Tensor
x = torch.ones(2, 2, requires_grad=True)
print(x)
print(x.grad_fn)

Output:

tensor([[1., 1.],
        [1., 1.]], requires_grad=True)
None

# Apply an operation
y = x + 2 # creates an addition node in the graph
print(y)
print(y.grad_fn)

Output:

tensor([[3., 3.],
        [3., 3.]], grad_fn=<AddBackward0>)
<AddBackward0 object at 0x00000246EA421460>

Tensors created directly, like x, are called leaf nodes; the grad_fn of a leaf node is None.

print(x.is_leaf, y.is_leaf)

Output:

True False

# Something slightly more complex
z = y * y * 3
out = z.mean()
print(z, out)

Output:

tensor([[27., 27.],
        [27., 27.]], grad_fn=<MulBackward0>) tensor(27., grad_fn=<MeanBackward0>)

requires_grad_( … ) changes the requires_grad attribute in place.

a = torch.randn(2, 2) # requires_grad defaults to False when omitted
a = ((a * 3)/(a - 1))
print(a.requires_grad) # False
a.requires_grad_(True)
print(a.requires_grad)
b = (a * a).sum()
print(b.grad_fn)

Output:

False
True
<SumBackward0 object at 0x00000246E6851FD0>

2.2 Gradients

Backpropagation: because out holds a single scalar, out.backward() is equivalent to out.backward(torch.tensor(1.)).

out.backward()
print(x.grad)

Output:

tensor([[4.5000, 4.5000],
        [4.5000, 4.5000]])

# Backpropagate once more; note that grad accumulates
out2 = x.sum()
out2.backward()
print(x.grad)

out3 = x.sum()
x.grad.zero_()  # clear the accumulated gradient before the next backward pass
out3.backward()
print(x.grad)

Output:

tensor([[5.5000, 5.5000],
        [5.5000, 5.5000]])
tensor([[1., 1.],
        [1., 1.]])
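The value 4.5 can be checked by hand. With out = (1/4) Σ 3(x_i + 2)², the derivative with respect to each element is ∂out/∂x_i = (6/4)(x_i + 2) = 1.5 × 3 = 4.5 at x_i = 1. The sketch below repeats the computation from this section in a single expression and compares autograd's result with that formula.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = (3 * (x + 2) ** 2).mean()   # same computation as z = y * y * 3; out = z.mean()
out.backward()

# Analytic gradient: d(out)/dx_i = 6 * (x_i + 2) / 4
analytic = 1.5 * (x.detach() + 2)

print(x.grad)
print(torch.allclose(x.grad, analytic))  # True
```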

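Part 3: Building a Neural Network in PyTorch

Consider a simple feed-forward network: it takes the input, feeds it through one layer after another, and finally produces the output. A typical training procedure for a neural network is: (1) define a neural network with some learnable parameters (weights); (2) iterate over a dataset of inputs; (3) process each input through the network; (4) compute the loss (how far the output is from being correct); (5) propagate the gradients back to the network's parameters.

The weights are usually updated with a simple rule: weight = weight - learning_rate * gradient

3.1 Defining the Network

# Define the network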
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3 x 3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation :y =Wx + b
        self.fc1 = nn.Linear(16*6*6, 120) # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        
    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) 
        # For a square window a single number is enough (see torch.nn.MaxPool2d)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        print(num_features)
        return num_features
    
net = Net()
print(net)

Output:

Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

# The learnable parameters of a model are returned by net.parameters()
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's weight

Output:

10
torch.Size([6, 1, 3, 3])

# Try a random 32 x 32 input
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

Output:

576
tensor([[ 0.0496, -0.1179, -0.0271, -0.0818, -0.1386, -0.1017, -0.0374,  0.1208,
          0.0532,  0.0830]], grad_fn=<AddmmBackward>)

# Zero the gradient buffers of all parameters, then backpropagate with random gradients
net.zero_grad()
out.backward(torch.randn(1, 10))

3.2 Loss Function

output = net(input)
target = torch.randn(10)     # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output: (1, 10)
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

Output (the loss value depends on the random input and target):

576
tensor(0.9183, grad_fn=<MseLossBackward>)

The computation graph of the network so far:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

# Following loss.grad_fn backwards walks through the computation graph
print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # first input of the Linear op

Output:

<MseLossBackward object at 0x...>
<AddmmBackward object at 0x...>
<AccumulateGrad object at 0x...>

3.3 Updating the Weights

The simplest update rule used in practice is stochastic gradient descent (SGD): weight = weight - learning_rate * gradient

# The simplest update rule used in practice is stochastic gradient descent (SGD)
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr = 0.01)

# in your training loop
optimizer.zero_grad()  # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()

Part 4: Loading Datasets

1. dataset

The Dataset source code shows that Dataset defines the __add__ method, so dataset objects can be concatenated with the + operator. More details: https://zhuanlan.zhihu.com/p/222772996
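The following is a minimal sketch of concatenating two datasets with + and iterating over the result with a DataLoader. `RangeDataset` is a hypothetical toy dataset written only for this illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RangeDataset(Dataset):
    """Hypothetical toy dataset that yields the integers in [start, end)."""

    def __init__(self, start, end):
        self.values = torch.arange(start, end)

    def __len__(self):
        return len(self.values)

    def __getitem__(self, idx):
        return self.values[idx]

# Dataset.__add__ returns a ConcatDataset holding both operands
combined = RangeDataset(0, 3) + RangeDataset(10, 14)

loader = DataLoader(combined, batch_size=4, shuffle=False)
batches = [batch.tolist() for batch in loader]

print(type(combined).__name__, len(combined))  # ConcatDataset 7
print(batches)                                 # [[0, 1, 2, 10], [11, 12, 13]]
```

The last batch is shorter because 7 is not a multiple of the batch size; passing drop_last=True to DataLoader would discard it.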

2. dataloader

Part 5: Running Deep Learning on a GPU

Mu Li's step-by-step video tutorial: https://www.zhihu.com/zvideo/1363284223420436480

(1) Run dxdiag from cmd to check the machine's graphics hardware.

(2) Download CUDA: https://developer.nvidia.com/cuda-downloads?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exe_local

(3) Verify the installation from the cmd command line. [The screenshot marking the exact command is missing from the source.]

(4) Download the GPU build of PyTorch from the official site: https://pytorch.org/get-started/locally/

Copy the command shown at the bottom of that page into the Anaconda Prompt.

Note: make sure the PyTorch and CUDA versions match; check https://download.pytorch.org/whl/torch_stable.html first. For example, the GPU build of torch for CUDA 11.6 and its companion packages can be installed with the following pip commands:

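# CUDA 11.6 (the +cu116 wheels are served from the PyTorch wheel index)
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116

# The wheel index link has to be appended
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/cu111/torch_stable.html

# Full sequence of commands
conda create -n pytorch python=3.7 -y   
conda activate pytorch
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
python
# then, inside the Python interpreter:
import torch
torch.cuda.is_available()

Some packages also have to be compatible with others; for example, torchtext must match torch, see https://github.com/pytorch/text/. (5) The download needs a fair amount of disk space. Mine stalled and dropped several times, yet a later test still reported a GPU-enabled torch even though I never resumed the download, possibly because it had been installed before.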

import torch
flag = torch.cuda.is_available()
if flag:
    print("CUDA is available")
else:
    print("CUDA is not available")

ngpu= 1
# Decide which device we want to run on
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")
print("Device:", device)
if flag:  # get_device_name raises an error when no CUDA device is present
    print("GPU model:", torch.cuda.get_device_name(0))

Test output:

CUDA is available
Device: cuda:0
GPU model: NVIDIA GeForce MX150

Part 6: Other Issues

1. Using torch.nn.Linear(a, b)

The official PyTorch documentation (https://pytorch.org/docs/master/nn.html#linear-layers) lists the linear layers available in torch.nn. A basic example:

import torch

x = torch.randn(128, 20)  # input of shape (128, 20)
m = torch.nn.Linear(20, 30)  # in_features=20, out_features=30
output = m(x)
print('m.weight.shape:\n ', m.weight.shape)
print('m.bias.shape:\n', m.bias.shape)
print('output.shape:\n', output.shape)

# Equivalent to ans = torch.mm(x, torch.t(m.weight)) + m.bias
ans = torch.mm(x, m.weight.t()) + m.bias   
print('ans.shape:\n', ans.shape)

print(torch.equal(ans, output))

Output:

m.weight.shape:
  torch.Size([30, 20])
m.bias.shape:
 torch.Size([30])
output.shape:
 torch.Size([128, 30])
ans.shape:
 torch.Size([128, 30])
True

Why is m.weight.shape = (30, 20)? The linear transformation is $y = xW^T + b$: a weight of shape (30, 20) is created first and transposed during the actual computation, so that it can be matrix-multiplied with x.

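2. Matching PyTorch and CUDA versions

Running a model on the GPU failed because the torch build and the CUDA version did not match. Uninstalling and reinstalling fixed it (conda did not work at first; the pip command from the official site did): pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116. See the official list for reference: https://pytorch.org/get-started/previous-versions/

Check whether an older PyTorch supports a newer CUDA (newer PyTorch releases can generally work with older CUDA versions). For example, if you need PyTorch 1.7.0, CUDA must be 11.0 or lower. At the time of writing, the officially recommended CUDA versions were 10.2 and 11.3, which support most PyTorch releases. (Most people install PyTorch to match their CUDA; the usual reason to pin PyTorch and choose CUDA to fit is reproducing a baseline.)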

3. Converting a tensor to int/float

import torch

a = [1, 2, 3, 4]
a1 = torch.tensor(a)
# .to(dtype) converts an existing tensor; calling torch.tensor() on a tensor triggers a UserWarning
a_float = a1.to(torch.float32)
a_int64 = a1.to(torch.int64)

print(a1.dtype)
print(a_float.dtype)
print(a_int64.dtype)
#torch.int64
#torch.float32
#torch.int64

# Method 2
my_tensor = torch.randn(2, 4)  # float32 by default
float_tensor = my_tensor.float()
# type(dtype) returns a converted copy; my_tensor itself is unchanged
print(my_tensor.type(torch.float16))
print(my_tensor.type(torch.float32))
print(my_tensor.type(torch.int32))
print(my_tensor.type(torch.long))

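4. Expanding 1-D to 2-D

torch.nn.init.kaiming_normal_ (like most other initializers that need fan-in and fan-out) requires its first argument to have at least two dimensions. When looping over a model's parameters, 1-D tensors such as biases show up, so add a check that expands a 1-D tensor to 2-D first:

def reset_parameters(self, initializer=None):
    for weight in self.parameters():
        if len(weight.shape) < 2:
            # unsqueeze(0) returns a view, so the in-place init also fills the 1-D parameter
            torch.nn.init.kaiming_normal_(weight.unsqueeze(0))
        else:
            torch.nn.init.kaiming_normal_(weight)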

5. Flattening with flatten

input1 = torch.arange(2*3*4*5).view(2, 3, 4, 5)
# input1.shape
torch.flatten(input1, start_dim = 1, end_dim=2).shape
# torch.Size([2, 12, 5])

6. The versatile einsum function

Usage of einsum (Einstein summation convention). Matrix multiplication, for example:

$c_{ik} = \sum_j a_{ij} b_{jk}$

c = np.dot(a, b)                 # conventional
c = np.einsum('ij,jk->ik', a, b) # einsum

Another example, $c_{kl} = \sum_i \sum_j a_{ijk} b_{jkl}$: c = np.einsum('ijk,jkl->kl', a, b)

import torch

# 1. Transposing tensor dimensions
A = torch.randn(3, 4, 5)
B = torch.einsum("ijk->ikj", A)
print(A.shape, "\n", B.shape, "\n", "======")  # (3, 4, 5) ; (3, 5, 4)

# 2. Extracting the diagonal
A = torch.randn(5, 5)
B = torch.einsum("ii->i", A)
print(A.shape, "\n", B.shape, "\n", "======")

# 3. Reducing a dimension by summation
A = torch.randn(4, 5)
B = torch.einsum("ij->i", A)
print(A.shape, "\n", B.shape, "\n", "======")

# 4. Hadamard product (both matrices have the same shape)
A = torch.randn(3, 4)
B = torch.randn(3, 4)
C = torch.einsum("ij, ij->ij", A, B)
print(A.shape, "\n", B.shape, "\n", C.shape, "\n", "======")

# 5. Vector inner product
A = torch.randn(10)
B = torch.randn(10)
#C=torch.dot(A,B)
C = torch.einsum("i,i->",A,B)

# 6. Vector outer product
A = torch.randn(10)
B = torch.randn(5)
#C = torch.outer(A,B)
C = torch.einsum("i,j->ij",A,B)

# 7. Matrix multiplication
A = torch.randn(5,4)
B = torch.randn(4,6)
#C = torch.matmul(A,B)
C = torch.einsum("ik,kj->ij",A,B)

# 8. Tensor contraction
A = torch.randn(3,4,5)
B = torch.randn(4,3,6)
#C = torch.tensordot(A,B,dims=[(0,1),(1,0)])
C = torch.einsum("ijk,jih->kh",A,B)

# 9. Batched matrix multiplication
batch_tensor_1 = torch.arange(2 * 4 * 3).reshape(2, 4, 3)
batch_tensor_2 = torch.arange(2 * 3 * 4).reshape(2, 3, 4) 
torch.bmm(batch_tensor_1, batch_tensor_2)  # [2, 4, 4]
torch.einsum("bij, bjk -> bik", batch_tensor_1, batch_tensor_2) # [2, 4, 4]

7. random and seed settings

The common ways of seeding Python's built-in random module, NumPy's random, TensorFlow, and PyTorch are shown below. Once the seeds are set, running the code again produces the same output. Note that calling torch.randn(5) inside a for loop still gives a different result on every iteration.

import random
import numpy as np
import tensorflow as tf
import torch
import time

seed = 1

random.seed(seed)
np.random.seed(seed)
tf.random.set_seed(seed)
torch.manual_seed(seed)

items = [1,2,3,4,5,6,7,8,9]  # named items to avoid shadowing the built-in list

a = random.sample(items,5)
b = np.random.randn(5)
c = tf.random.normal([5])
d = torch.randn(5)

print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))
print('Python built-in output:',a)
print('*' * 60)
print('numpy output:',b)
print('*' * 60)
print('tensorflow output:',c)
print('*' * 60)
print('pytorch output',d)

2022-12-10 07:50:28
Python built-in output: [3, 2, 9, 1, 4]
************************************************************
numpy output: [ 1.62434536 -0.61175641 -0.52817175 -1.07296862  0.86540763]
************************************************************
tensorflow output: tf.Tensor([-1.1012203   1.5457517   0.383644   -0.87965786 -1.2246722 ], shape=(5,), dtype=float32)
************************************************************
pytorch output tensor([ 0.6614,  0.2669,  0.0617,  0.6213, -0.4519])
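For PyTorch-only projects the seeding calls are often wrapped in one helper. The sketch below is one possible version; the name `set_seed` is an assumption made for illustration, and TensorFlow is left out so the example only depends on NumPy and PyTorch. It also shows the point about loops: successive draws differ, but the whole sequence repeats after reseeding.

```python
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Seed Python, NumPy and PyTorch (CPU and, if present, all GPUs)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(1)
first_run = [torch.randn(2) for _ in range(2)]   # two successive draws

set_seed(1)
second_run = [torch.randn(2) for _ in range(2)]  # same two draws again

print(torch.equal(first_run[0], second_run[0]))  # True: the sequence repeats
print(torch.equal(first_run[0], first_run[1]))   # False: draws within a run differ
```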

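8. Matrix multiplication

torch.mm: multiplies two matrices (vectors not included), e.g. shapes (l,m) and (m,n)
torch.bmm: multiplies batched 3-D tensors, e.g. shapes (b,l,m) and (b,m,n)
torch.mul: element-wise multiplication of two tensors of the same shape, e.g. shapes (l,m) and (l,m)
torch.mv: multiplies a matrix by a vector (matrix first, vector second), e.g. shapes (l,m) and (m), giving a result of shape (l)
torch.matmul: multiplies two tensors whose last two dimensions satisfy the matrix multiplication rule, or a matrix and a vector; it supports broadcasting (missing dimensions are filled in automatically). Examples: (b,l,m) and (b,m,n); (l,m) and (b,m,n); (b,c,l,m) and (b,c,m,n); (l,m) and (m). [It covers what torch.mm, torch.bmm, and torch.mv do.]
@ operator: behaves like torch.matmul
* operator: behaves like torch.mul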
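The shape rules listed above can be checked with a few small random tensors. In this sketch the sizes b, l, m, n are arbitrary illustrative values.

```python
import torch

b, l, m, n = 2, 3, 4, 5
A = torch.randn(l, m)
B = torch.randn(m, n)
v = torch.randn(m)
bA = torch.randn(b, l, m)
bB = torch.randn(b, m, n)

shapes = {
    "mm": tuple(torch.mm(A, B).shape),                 # (l, n)
    "bmm": tuple(torch.bmm(bA, bB).shape),             # (b, l, n)
    "mul": tuple(torch.mul(A, A).shape),               # (l, m), element-wise
    "mv": tuple(torch.mv(A, v).shape),                 # (l,)
    "matmul_broadcast": tuple(torch.matmul(A, bB).shape),  # (l,m) @ (b,m,n) -> (b, l, n)
}

for name, shape in shapes.items():
    print(name, shape)

# The operators are shorthand for matmul and mul
print(torch.allclose(A @ B, torch.matmul(A, B)))
print(torch.equal(A * A, torch.mul(A, A)))
```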

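9. Using a custom collate_fn in a DataLoader

Using a custom collate_fn function

In NLP, for example, every sentence has a different length. Padding every batch to one common length is clearly not the most efficient choice (it wastes memory). The collate_fn parameter of DataLoader exists for this case: a custom function pads the samples in each batch only to the length of the longest sample in that batch, so different batches may have different lengths and iteration during training is more efficient:

import torch
from torch.nn.utils.rnn import pad_sequence #(1)
from torch.utils.data import DataLoader
from pprint import pprint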

# values are token indices but it does not matter - it can be any kind of variable-size data
nlp_data = [
    {'tokenized_input': [1, 4, 5, 9, 3, 2],
     'label':0},
    {'tokenized_input': [1, 7, 3, 14, 48, 7, 23, 154, 2],
     'label':0},
    {'tokenized_input': [1, 30, 67, 117, 21, 15, 2],
     'label':1},
    {'tokenized_input': [1, 17, 2],
     'label':0},
]

def custom_collate(data): #(2)
    inputs = [torch.tensor(d['tokenized_input']) for d in data] #(3)
    labels = [d['label'] for d in data]
    inputs = pad_sequence(inputs, batch_first=True) #(4)
    labels = torch.tensor(labels) #(5)
    return { #(6)
        'tokenized_input': inputs,
        'label': labels
    }
loader = DataLoader(nlp_data, batch_size=2, shuffle=False, collate_fn=custom_collate) #(7)
iter_loader = iter(loader)

batch1 = next(iter_loader)
pprint(batch1)

batch2 = next(iter_loader)
pprint(batch2)

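The code above pads with pad_sequence. A custom collate_fn takes a single argument, which may be a list of dicts or a list of tuples (depending on how the dataset is written).
If the argument data is a list of dicts, the inputs and labels have to be pulled out separately into their own lists; the inputs are then padded to the longest sample in the current batch, and the labels are converted from a list to a tensor.
The result: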

{'label': tensor([0, 0]),
 'tokenized_input': tensor([[  1,   4,   5,   9,   3,   2,   0,   0,   0],
        [  1,   7,   3,  14,  48,   7,  23, 154,   2]])}
{'label': tensor([1, 0]),
 'tokenized_input': tensor([[  1,  30,  67, 117,  21,  15,   2],
        [  1,  17,   2,   0,   0,   0,   0]])}

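10. Do not forget eval at prediction time

model.train() and model.eval() should be called before training and testing respectively. Their effects: (1) model.train() puts BatchNorm and Dropout layers in training mode: BatchNorm uses batch statistics and updates its running estimates, and Dropout is active. (2) model.eval() puts them in evaluation mode: BatchNorm uses its running statistics and Dropout is disabled.

# evaluate model:
model.eval()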

with torch.no_grad():
    ...
    out_data = model(data)
    ...
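The effect of the two modes is easy to observe with a Dropout layer. The following sketch uses a tiny hypothetical model (a Linear layer followed by Dropout); in eval mode two forward passes on the same input agree, while in train mode Dropout randomly zeroes activations.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))  # hypothetical toy model
x = torch.ones(1, 4)

# Evaluation: Dropout is disabled, and no_grad() skips building the autograd graph
model.eval()
with torch.no_grad():
    eval_a = model(x)
    eval_b = model(x)
print(model.training, torch.equal(eval_a, eval_b))  # False True

# Training: Dropout zeroes each activation with probability p and rescales the rest
model.train()
train_out = model(x)
print(model.training)  # True
print(train_out)
```

Note that model.eval() and torch.no_grad() are independent: the first changes layer behaviour, the second only disables gradient tracking.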

11. Converting between list and tensor

import numpy as np
import torch

# list -> Tensor
a = [1, 2, 3]
b = torch.FloatTensor(a)

# Method 2
b = torch.as_tensor(a)

# array -> list (names avoid shadowing the built-in list)
arr = np.array([1, 2], np.float32)
as_list = arr.tolist()

# list -> array
arr = np.array(as_list)

# array -> Tensor
arr = np.array([1, 2], np.float32)
tensor = torch.from_numpy(arr)

# Tensor -> array
arr = tensor.numpy()

# Tensor -> list
as_list = tensor.numpy().tolist()

# list -> Tensor
tensor = torch.Tensor(as_list)

12. Counting the negative entries in a tensor

For example, after squeezing a tensor of shape 128x1 down to 128:

( self.fc(x).squeeze(1)<0 ).sum().item()

13. Basic tensor operations

import torch
x = torch.arange(12)
# tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
x1 = x.reshape(3, 4)  # change the shape
x2 = x.reshape(-1, 4)
x3 = torch.zeros((2, 3, 4))
x4 = torch.ones((2, 3, 4)) # all elements are 1
# standard normal distribution
x5 = torch.randn(3, 4)
x6 = torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
# all of these are element-wise; note that ** is exponentiation
print(x + y, x - y, x * y, x / y, x ** y)

X = torch.arange(12, dtype=torch.float32).reshape((3,4))
Y = torch.tensor([[2.0, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])
# dim=0 concatenates rows (vertically), dim=1 concatenates columns (horizontally)
print(torch.cat((X, Y), dim=0), "\n", torch.cat((X, Y), dim=1))

# element-wise equality test
X == Y

# Broadcasting: the shapes differ (element-wise addition is undefined mathematically),
# so a is copied across columns and b across rows before adding
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
print(a + b)

# Slicing and indexing, much like NumPy
X[-1], X[1:3]
X[1, 2]
X[0:2, :] = 12  # assignment

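14. Using permute()

permute() reorders the dimensions of a tensor; the elements themselves are unchanged.

import torch

x = torch.randn(3, 4, 5)
y = x.permute(1, 2, 0) # dim 0 moves to the end, dim 1 becomes the first, dim 2 becomes the second
print(x.shape) # torch.Size([3, 4, 5])
print(y.shape) # torch.Size([4, 5, 3])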

15. Building a mask matrix

For example, to mark every position where the tensor is greater than 0 with 1:

test_tensor = torch.tensor([1, 2, 0])
test_ans = (test_tensor > 0).type(torch.int32)
test_ans
Out[26]: tensor([1, 1, 0], dtype=torch.int32)

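16. Building a model

nn.Sequential suits quick experiments: the layers are already fixed, so you simply list them and do not have to write both __init__ and forward.
ModuleList and ModuleDict can be understood as a list and a dict of modules. They are convenient when identical layers are repeated many times or when, as with the residual computation in ResNets, the result of the current layer has to be combined with the result of an earlier layer, because the layers can be accessed directly by index or key.
ModuleDict works like ModuleList, except that it makes it easier to name the layers (each value is the corresponding layer).
nn.Sequential, subclassing nn.Module, nn.ModuleList, and nn.ModuleDict are used as follows: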

import torch
import torch.nn as nn

input_size = 784
hidden_size = 128
output_size = 10

# Method 1: nn.Sequential
model1 = nn.Sequential(
    nn.Linear(input_size, hidden_size),
    nn.ReLU(),
    nn.Linear(hidden_size, output_size)
)
# bs = 4
input = torch.rand(4, 784)
ans1 = model1(input)
ans1.shape  # [bs, 10]

# Method 2: the usual approach, subclassing nn.Module
class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MLP, self).__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.layer2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out = self.layer1(x)
        out = self.relu(out)  # or use nn.functional.relu
        out = self.layer2(out)
        return out

input = torch.rand(4, 784)
model2 = MLP(input_size, hidden_size, output_size)
ans2 = model2(input)
ans2.shape  # [bs, 10]

# Method 3: nn.ModuleList
import torch.nn as nn
class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MLP, self).__init__()
        self.layers = nn.ModuleList([
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size)
        ])
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
# bs = 4
input = torch.rand(4, 784)
model3 = MLP(input_size, hidden_size, output_size)
ans3 = model3(input)
ans3.shape  # [bs, 10]

# Method 4: nn.ModuleDict
import torch.nn as nn
class MLP(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MLP, self).__init__()
        self.layers = nn.ModuleDict({
            'layer1': nn.Linear(input_size, hidden_size),
            'relu': nn.ReLU(),
            'layer2': nn.Linear(hidden_size, output_size)
        })
    def forward(self, x):
        for layer in self.layers.values():
            x = layer(x)
        return x
# bs = 4
input = torch.rand(4, 784)
model4 = MLP(input_size, hidden_size, output_size)
ans4 = model4(input)
ans4.shape  # [bs, 10]

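17. Automatic type casting with autocast

autocast was added in PyTorch 1.6. It is a context manager for automatic mixed precision: inside its scope, each operation runs in a precision chosen for that operation (for example, matrix multiplications in float16 and numerically sensitive operations in float32), which speeds up computation and saves GPU memory. It removes the need for manual casts during training and keeps the code simpler.

Deep learning normally computes in 32-bit floating point, which takes more GPU memory; lower-precision values trade some precision for a smaller memory footprint. Most of the computation in the forward and backward passes can therefore run in lower precision, which improves memory efficiency and in turn training speed.

References: https://www.fke6.com/html/87474.html https://pytorch.org/docs/stable/amp.html?highlight=autocast#torch.autocast

# Example
# Import the required libraries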
import torch
from torch.cuda.amp import autocast

# Define a model
class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.linear = torch.nn.Linear(10, 1)

    def forward(self, x):
        with autocast():
            x = self.linear(x)
        return x

# Initialize the data and the model
x = torch.randn(1, 10).cuda()
model = MyModel().cuda()

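# Forward pass
with autocast():
    output = model(x)

# Compute the loss
loss = output.sum()

# Backward pass
loss.backward()

In the code above, everything inside the with autocast(): block runs in mixed precision automatically, with a suitable precision chosen per operation, and the GPU is used for acceleration.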
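The example above requires a CUDA device. As a device-agnostic sketch, the generic torch.autocast context manager (available in more recent PyTorch releases) can be used instead; the choice of bfloat16 on CPU and float16 on GPU here is an assumption made for illustration. When training in float16 on a GPU, a gradient scaler such as torch.cuda.amp.GradScaler is normally added to avoid gradient underflow; it is omitted here to keep the sketch short.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumed precision choice: float16 on GPU, bfloat16 on CPU
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = torch.nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(8, 10, device=device)

# Only the forward pass and the loss go inside the autocast block
with torch.autocast(device_type=device, dtype=amp_dtype):
    output = model(x)
    loss = output.float().pow(2).mean()  # compute the loss in float32

optimizer.zero_grad()
loss.backward()   # backward runs outside the autocast block
optimizer.step()

# The layer output is low precision, but the parameters stay in float32
print(output.dtype, model.weight.dtype)
```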

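18. Saving checkpoints periodically during training

Use torch.load to load the previously saved model parameters and optimizer state from disk
optimizer.load_state_dict() restores the optimizer's parameter groups and state; scheduler.load_state_dict() restores the learning-rate scheduler from the earlier run
Keep saving checkpoints while training continues

import torch
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader
from my_model import MyModel
from my_dataset import MyDataset

# 1. Load the previously saved model parameters and optimizer state
checkpoint = torch.load('checkpoint.pth')
model_state_dict = checkpoint['model_state_dict']
optimizer_state_dict = checkpoint['optimizer_state_dict']
epoch = checkpoint['epoch']

# 2. Set the starting epoch and the maximum epoch
start_epoch = epoch + 1
max_epoch = 100

# 3. Rebuild the model and optimizer, then restore their saved state
learning_rate = 0.001
model = MyModel()
model.load_state_dict(model_state_dict)  # without this the model would restart from random weights
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
optimizer.load_state_dict(optimizer_state_dict)

# 4. Restore the learning-rate scheduler state
scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
scheduler_state_dict = checkpoint['scheduler_state_dict']
if scheduler_state_dict is not None:
    scheduler.load_state_dict(scheduler_state_dict)

# 5. Continue training the model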
dataset = MyDataset()
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)
for epoch in range(start_epoch, max_epoch):
    for inputs, targets in train_loader:
        # Train the model (compute_loss is a user-defined loss function)
        outputs = model(inputs)
        loss = compute_loss(outputs, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Update the learning rate
    scheduler.step()

    # Save the model, optimizer, and scheduler state every 10 epochs
    if epoch % 10 == 0:
        checkpoint = {
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'scheduler_state_dict': scheduler.state_dict(),
            'epoch': epoch
        }
        torch.save(checkpoint, 'checkpoint.pth')

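19. contiguous: contiguous storage

contiguous() returns a tensor that is stored contiguously in memory. Slicing can leave a tensor non-contiguous, which can hurt the performance of later computation and makes view() fail.

# Loss computation for GPT-2 in the transformers library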
if labels is not None:
    # move labels to correct device to enable model parallelism
    labels = labels.to(lm_logits.device)
    # Shift so that tokens < n predict n
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = labels[..., 1:].contiguous()
    # Flatten the tokens
    loss_fct = CrossEntropyLoss()
    loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))
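Why the contiguous() calls matter can be seen on a small transposed tensor. This is a minimal sketch with illustrative values: the transpose shares storage with the original but has different strides, so view() refuses to flatten it until contiguous() has copied the data into row-major order.

```python
import torch

x = torch.arange(6).reshape(2, 3)
t = x.t()                  # transpose: same storage, different strides
print(t.is_contiguous())   # False

try:
    t.view(-1)             # view() needs a compatible memory layout
    view_failed = False
except RuntimeError:
    view_failed = True

flat = t.contiguous().view(-1)  # copy into row-major layout first, then view works
print(view_failed)              # True
print(flat.tolist())            # [0, 3, 1, 4, 2, 5]
```

reshape() would also work here, because it falls back to copying when a view is not possible.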

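20. Checking GPU memory usage and selecting a specific GPU

Use CUDA_VISIBLE_DEVICES to set which GPUs are visible to CUDA. There are generally two ways:

(1) Set it directly in the code

import os
os.environ['CUDA_VISIBLE_DEVICES'] = gpu_ids

(2) Set it on the command line when running the code

CUDA_VISIBLE_DEVICES=gpu_ids python3 train.py

When the code is run through an sh script, there are three more ways:

(3) Set it on the command line when running the script:

CUDA_VISIBLE_DEVICES=gpu_ids sh run.sh

(4) Set it inside the sh script:

source bashrc
export CUDA_VISIBLE_DEVICES=gpu_ids && python3 train.py

(5) Set it inside the sh script:

source bashrc
CUDA_VISIBLE_DEVICES=gpu_ids python3 train.py

If several of these settings are used at once, for example

source bashrc
export CUDA_VISIBLE_DEVICES=gpu_id1 && CUDA_VISIBLE_DEVICES=gpu_id2 python3 train.py

the higher-priority setting overrides the lower-priority one. The priority order is: without an sh script, (1) > (2); with an sh script, (1) > (5) > (4) > (3).

My personal advice for training runs is to pick exactly one of (2), (3), (4), (5) to select the GPUs and not to set it in several places, to avoid confusion. Method (1) has the highest priority but requires changing the source code, so it is not recommended.

Watch the GPUs, refreshing every second: watch -n 1 -d nvidia-smi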
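Memory usage can also be queried from inside Python. The helper below is a sketch (the name `gpu_memory_report` and the dictionary layout are assumptions for illustration) built on torch.cuda.memory_allocated and torch.cuda.memory_reserved; it only reports memory managed by the current PyTorch process, whereas nvidia-smi shows usage by all processes.

```python
import torch

def gpu_memory_report():
    """Return one dict per visible GPU; an empty list when CUDA is unavailable."""
    report = []
    if not torch.cuda.is_available():
        return report
    mib = 1024 ** 2
    for idx in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(idx)
        report.append({
            "index": idx,
            "name": props.name,
            "total_mib": props.total_memory / mib,
            # memory occupied by live tensors of this process
            "allocated_mib": torch.cuda.memory_allocated(idx) / mib,
            # memory held by PyTorch's caching allocator (>= allocated)
            "reserved_mib": torch.cuda.memory_reserved(idx) / mib,
        })
    return report

for entry in gpu_memory_report():
    print(entry)
```

The device indices here are relative to CUDA_VISIBLE_DEVICES: index 0 is the first GPU that the variable makes visible.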

21. Stacking k tensors

Goal: a Python list holds 3 tensors, each of shape torch.Size([768]). How can these three tensors be assembled into one tensor of shape 3x768?

import torch

# Suppose the list holds these three tensors
tensor_list = [
    torch.randn(768),
    torch.randn(768),
    torch.randn(768)
]

# torch.stack joins the tensors in the list along a new dimension
stacked_tensor = torch.stack(tensor_list)

# Print the shape of the new tensor: torch.Size([3, 768])
print(stacked_tensor.size())
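torch.stack is easy to confuse with torch.cat. The sketch below, using small illustrative tensors, shows the difference: stack creates a new dimension (at the position given by dim), while cat joins along an existing one.

```python
import torch

tensors = [torch.full((4,), float(i)) for i in range(3)]  # three tensors of shape (4,)

stacked = torch.stack(tensors)              # new leading dimension: (3, 4)
stacked_last = torch.stack(tensors, dim=1)  # new trailing dimension: (4, 3)
concatenated = torch.cat(tensors)           # joined along the existing dimension: (12,)

print(stacked.shape, stacked_last.shape, concatenated.shape)
```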

22. Computing a sum of squares with tensors

In k-means clustering, if there are $k$ clusters, the center of the $i$-th cluster is $c_i$, and the $j$-th data point is $x_j$ and belongs to cluster $i$, then the SSE is: $SSE = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - c_i \rVert^2$

where:

$k$ is the number of clusters
$C_i$ is the $i$-th cluster
$x_j$ is the $j$-th data point
$c_i$ is the center of the $i$-th cluster
$\lVert x_j - c_i \rVert$ is the distance from the $j$-th data point to the center of the $i$-th cluster

In the code below, cluster_samples_emb has shape (265, 768) and cluster_center_emb has shape (768,). Following the SSE formula above, the center is subtracted from the samples and the squared differences are summed:

centers = clustering.cluster_centers_
labels = clustering.labels_
# Compute the SSE metric
sse = 0
k = args.kmeans_num_clusters
for i in range(k):
    # cluster_samples = high_dim_vectors[labels == i]
    cluster_samples_emb = high_dim_vectors[labels == i].numpy()  # previously a torch.tensor
    cluster_center_emb = centers[i]
    cluster_sse = np.sum((cluster_samples_emb - cluster_center_emb) ** 2)
    sse += cluster_sse
print("Clustering metric SSE:", sse, "\n")
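The same metric can be computed without the loop and without leaving PyTorch. This is a sketch under the assumption that the points, labels, and centers are all tensors; the function name `kmeans_sse` and the toy data are illustrative. Indexing the centers with the labels gives each point its own cluster center, after which the formula is a single subtraction, square, and sum.

```python
import torch

def kmeans_sse(points, labels, centers):
    """Sum of squared distances from each point to its assigned cluster center."""
    assigned = centers[labels]               # (N, D): the center of each point's cluster
    return ((points - assigned) ** 2).sum()

# Toy data: two clusters in 2-D
points = torch.tensor([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
labels = torch.tensor([0, 0, 1, 1])
centers = torch.tensor([[1.0, 0.0], [10.0, 11.0]])

sse = kmeans_sse(points, labels, centers)
print(sse.item())  # 4.0: every point is at distance 1 from its center
```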

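reference

1) PyTorch documentation in Chinese: https://pytorch-cn.readthedocs.io/zh/latest/
2) PyTorch documentation in English: https://pytorch.org/docs/stable/index.html
3) Notes on the official PyTorch tutorials
4) For learning GNNs, the PyTorch Geometric documentation: https://pytorch-geometric.readthedocs.io/en/latest/index.html
5) Xiaotudui's PyTorch videos on Bilibili: https://www.bilibili.com/video/BV1hE411t7RN
6) Introduction to the official PyTorch tutorials
7) Datawhale's PyTorch basics tutorial
8) "Deep Learning Framework PyTorch: Getting Started and Practice" by Chen Yun
9) https://www.zhihu.com/question/55720139/answer/294449487
10) Common multiplication operations in PyTorch and the related operators (@, *)
11) Why model.eval() is needed when testing in PyTorch
12) Selecting a GPU in PyTorch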
