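This part covers how to save and load models with PyTorch. We often need to load a previously trained model, or to continue training a model on new data, so this step matters in practice.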

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms

import helper
import fc_model

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
# Download and load the training data
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Download and load the test data
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)


image, label = next(iter(trainloader))

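I moved the model architecture and training code from the previous part into the file fc_model. By importing this module, we can easily create a fully connected network with fc_model.Network and train it with fc_model.train. I will use the trained model to demonstrate saving and loading.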
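The source of fc_model is not shown here, so the following is only a sketch of what such a module's Network class might look like. It assumes the layer names that appear in the printed model (hidden_layers, output, dropout) and a log-softmax output to pair with nn.NLLLoss; the drop_p argument and the forward pass are assumptions, not the actual file contents.

```python
import torch
from torch import nn
import torch.nn.functional as F


class Network(nn.Module):
    """Hypothetical stand-in for fc_model.Network(input_size, output_size, hidden_layers)."""

    def __init__(self, input_size, output_size, hidden_layers, drop_p=0.5):
        super().__init__()
        # Chain the layer sizes: input -> hidden_1 -> ... -> hidden_n
        sizes = [input_size] + list(hidden_layers)
        self.hidden_layers = nn.ModuleList(
            [nn.Linear(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        )
        self.output = nn.Linear(sizes[-1], output_size)
        self.dropout = nn.Dropout(p=drop_p)  # assumed dropout probability

    def forward(self, x):
        for layer in self.hidden_layers:
            x = self.dropout(F.relu(layer(x)))
        # Log-probabilities, the input that nn.NLLLoss expects
        return F.log_softmax(self.output(x), dim=1)


model = Network(784, 10, [512, 256, 128])
batch = torch.randn(64, 784)  # one fake batch of flattened 28x28 images
log_ps = model(batch)
print(log_ps.shape)
```

Because the layers live in an nn.ModuleList, their parameters are registered under names such as hidden_layers.0.weight, which is the naming the state dict uses.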

# Create the network, define the criterion and optimizer

model = fc_model.Network(784, 10, [512, 256, 128])
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
fc_model.train(model, trainloader, testloader, criterion, optimizer, epochs=2)
Epoch: 1/2..  Training Loss: 1.703..  Test Loss: 0.997..  Test Accuracy: 0.659
Epoch: 1/2.. Training Loss: 1.060.. Test Loss: 0.738.. Test Accuracy: 0.733
Epoch: 2/2.. Training Loss: 0.528.. Test Loss: 0.445.. Test Accuracy: 0.840
Epoch: 2/2.. Training Loss: 0.502.. Test Loss: 0.465.. Test Accuracy: 0.829
Epoch: 2/2.. Training Loss: 0.540.. Test Loss: 0.439.. Test Accuracy: 0.837



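The parameters of a PyTorch network are stored in the model's state_dict. The state dict contains the weight and bias tensors for each layer; layers without learnable parameters, such as dropout, do not appear in it.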

print("Our model: \n\n", model, '\n')
print("The state dict keys: \n\n", model.state_dict().keys())
Our model: 

 Network(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): Linear(in_features=512, out_features=256, bias=True)
    (2): Linear(in_features=256, out_features=128, bias=True)
  )
  (output): Linear(in_features=128, out_features=10, bias=True)
  (dropout): Dropout(p=0.5, inplace=False)
)

The state dict keys:

odict_keys(['hidden_layers.0.weight', 'hidden_layers.0.bias', 'hidden_layers.1.weight', 'hidden_layers.1.bias', 'hidden_layers.2.weight', 'hidden_layers.2.bias', 'output.weight', 'output.bias'])
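To see what sits behind those keys, it can help to print the shape of each tensor. The sketch below uses a small nn.Sequential model as a hypothetical stand-in (it is not the tutorial's network) and also counts the total number of stored values.

```python
from torch import nn

# Hypothetical two-layer model, used only to illustrate state_dict contents
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Map each parameter name to the shape of its tensor
shapes = {name: tuple(tensor.shape) for name, tensor in model.state_dict().items()}
for name, shape in shapes.items():
    print(f"{name:10s} {shape}")

# Total number of values stored in the state dict
n_params = sum(tensor.numel() for tensor in model.state_dict().values())
print("total parameters:", n_params)
```

Note that a Linear weight has shape (out_features, in_features), which is why the error messages for mismatched checkpoints list sizes such as [512, 784].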

The simplest approach is to save the state dict with torch.save. For example, we can save it to the file 'checkpoint.pth'.

torch.save(model.state_dict(), 'checkpoint.pth')

Then load the state dict with torch.load.

state_dict = torch.load('checkpoint.pth')
print(state_dict.keys())
odict_keys(['hidden_layers.0.weight', 'hidden_layers.0.bias', 'hidden_layers.1.weight', 'hidden_layers.1.bias', 'hidden_layers.2.weight', 'hidden_layers.2.bias', 'output.weight', 'output.bias'])

To load the state dict into the network, call model.load_state_dict(state_dict).

model.load_state_dict(state_dict)
<All keys matched successfully>
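A quick way to convince yourself that the round trip preserves the weights is to save, reload into a freshly initialized model, and compare tensors. This is a minimal sketch with a hypothetical nn.Linear model and a temporary file; map_location="cpu" is an optional argument of torch.load that is useful when the checkpoint may have been saved on a GPU.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)  # hypothetical small model standing in for the trained network

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pth")
    torch.save(model.state_dict(), path)
    # map_location="cpu" remaps any GPU tensors in the file onto the CPU
    state_dict = torch.load(path, map_location="cpu")

restored = nn.Linear(4, 2)  # same architecture, different random weights
result = restored.load_state_dict(state_dict)
print(result)

# Compare every tensor in the original and restored state dicts
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print("weights identical:", same)
```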


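Loading a state dict only works when the model architecture exactly matches the one that produced the checkpoint. A model with different layer sizes fails:

# Try this
model = fc_model.Network(784, 10, [400, 200, 100])
# This will throw an error because the tensor sizes are wrong!
model.load_state_dict(state_dict)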

RuntimeError Traceback (most recent call last)

<ipython-input-13-d859c59ebec0> in <module>
2 model = fc_model.Network(784, 10, [400, 200, 100])
3 # This will throw an error because the tensor sizes are wrong!
----> 4 model.load_state_dict(state_dict)

~/anaconda3/envs/tf/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
845 if len(error_msgs) > 0:
846 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 847 self.__class__.__name__, "\n\t".join(error_msgs)))
848 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for Network:
size mismatch for hidden_layers.0.weight: copying a param with shape torch.Size([512, 784]) from checkpoint, the shape in current model is torch.Size([400, 784]).
size mismatch for hidden_layers.0.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([400]).
size mismatch for hidden_layers.1.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([200, 400]).
size mismatch for hidden_layers.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([200]).
size mismatch for hidden_layers.2.weight: copying a param with shape torch.Size([128, 256]) from checkpoint, the shape in current model is torch.Size([100, 200]).
size mismatch for hidden_layers.2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([100]).
size mismatch for output.weight: copying a param with shape torch.Size([10, 128]) from checkpoint, the shape in current model is torch.Size([10, 100]).
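When a checkpoint might not match the model, one option is to compare shapes before loading, or to catch the RuntimeError. The helper shape_mismatches below is a hypothetical name introduced for illustration, and the nn.Linear layers stand in for the full network.

```python
from torch import nn


def shape_mismatches(model, state_dict):
    """Return {name: (checkpoint_shape, model_shape)} for keys whose shapes differ."""
    current = model.state_dict()
    return {
        name: (tuple(tensor.shape), tuple(current[name].shape))
        for name, tensor in state_dict.items()
        if name in current and tensor.shape != current[name].shape
    }


saved = nn.Linear(784, 512).state_dict()  # plays the role of the checkpoint
other = nn.Linear(784, 400)               # model with a different layer size

mismatches = shape_mismatches(other, saved)
for name, (ckpt_shape, model_shape) in mismatches.items():
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")

# load_state_dict raises RuntimeError on a size mismatch
try:
    other.load_state_dict(saved)
    load_failed = False
except RuntimeError:
    load_failed = True
print("load failed:", load_failed)
```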


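The model therefore has to be rebuilt exactly as it was when it was saved, which means storing the architecture in the checkpoint alongside the state dict:

checkpoint = {'input_size': 784,
              'output_size': 10,
              'hidden_layers': [each.out_features for each in model.hidden_layers],
              'state_dict': model.state_dict()}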

torch.save(checkpoint, 'checkpoint.pth')


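def load_checkpoint(filepath):
    # Rebuild the network from the stored architecture, then load its weights
    checkpoint = torch.load(filepath)
    model = fc_model.Network(checkpoint['input_size'],
                             checkpoint['output_size'],
                             checkpoint['hidden_layers'])
    model.load_state_dict(checkpoint['state_dict'])
    return model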

model = load_checkpoint('checkpoint.pth')
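To continue training rather than only run inference, the optimizer state usually needs to be saved as well. The following is a minimal sketch under assumptions: the "epoch" and "optimizer" keys are hypothetical additions to the checkpoint dictionary, and nn.Linear stands in for the real network.

```python
import os
import tempfile

import torch
from torch import nn, optim

model = nn.Linear(4, 2)  # hypothetical stand-in for the trained network
optimizer = optim.Adam(model.parameters(), lr=0.001)

# One dummy training step so the optimizer has internal state worth saving
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()

checkpoint = {
    "epoch": 2,                           # assumed bookkeeping field
    "state_dict": model.state_dict(),
    "optimizer": optimizer.state_dict(),  # assumed key name
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "resume.pth")
    torch.save(checkpoint, path)
    loaded = torch.load(path, map_location="cpu")

# Rebuild model and optimizer, then restore both states
new_model = nn.Linear(4, 2)
new_optimizer = optim.Adam(new_model.parameters(), lr=0.001)
new_model.load_state_dict(loaded["state_dict"])
new_optimizer.load_state_dict(loaded["optimizer"])
start_epoch = loaded["epoch"]
print("resuming after epoch", start_epoch)

weights_match = torch.equal(model.weight, new_model.weight)
```

If the reloaded model is used only for prediction, calling model.eval() first switches dropout off so that outputs are deterministic.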