PyTorch 08: Transfer Learning in PyTorch (densenet121, resnet, alexnet, etc.)

Original post by luteresa, 2022-09-14 21:14:33. Blog category: PyTorch.


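© Copyright belongs to the author. This is an original work by luteresa on the 51CTO blog; contact the author for permission to republish, otherwise legal action may be taken.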

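Now we will learn how to use pretrained networks to solve challenging computer vision problems. You will use networks from torchvision that were trained on ImageNet.

ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It is used to train deep neural networks built from convolutional layers. I won't go into the details of convolutional networks here, but the original post points to a video that introduces them.

Once trained, these models work remarkably well as feature detectors on images they were never trained on. Using a pretrained network on images that are not in its training set is called transfer learning. We will use transfer learning to train a network that classifies cat and dog photos with high accuracy.

With torchvision.models you can download these pretrained networks and use them in your own applications. We import models now.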

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt

import torch
from torch import nn
from torch import optim
import torch.nn.functional as F
from torchvision import datasets, transforms, models

import helper  # assumed: the course-provided helper module that supplies the imshow used below

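Most pretrained models require 224x224 input images. We also need to normalize the images the same way they were normalized when the models were trained. Each color channel was normalized separately, with means [0.485, 0.456, 0.406] and standard deviations [0.229, 0.224, 0.225].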
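To make the normalization concrete, here is a minimal sketch of what it computes per channel, (x - mean) / std, written with plain tensor operations. The helper names normalize and denormalize are hypothetical and chosen for illustration, and the random tensor stands in for a real photo.

```python
import torch

# ImageNet channel statistics quoted above, shaped (3, 1, 1) to broadcast over H and W
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def normalize(img):
    """Per-channel (x - mean) / std for a (3, H, W) tensor with values in [0, 1]."""
    return (img - MEAN) / STD

def denormalize(img):
    """Invert normalize(), for example before displaying an image."""
    return img * STD + MEAN

torch.manual_seed(0)
image = torch.rand(3, 224, 224)      # stand-in for the output of ToTensor()
normed = normalize(image)
restored = denormalize(normed)

# Normalized values leave the [0, 1] range; denormalizing recovers the input
print(normed.min().item() < 0, normed.max().item() > 1)
print(torch.allclose(restored, image, atol=1e-6))
```

Because normalized values fall outside [0, 1], plotting them directly makes matplotlib clip them, so it is usually worth undoing the normalization before display.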

data_dir = 'Cat_Dog_data'

# TODO: Define transforms for the training data and testing data
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize([0.485, 0.456, 0.406],
                                                           [0.229, 0.224, 0.225])])

# Pass transforms in here, then run the next cell to see how the transforms look
train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)
test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)

data_iter = iter(testloader)
images, labels = next(data_iter)
fig, axes = plt.subplots(figsize=(10,4), ncols=4)
for ii in range(4):
    ax = axes[ii]
    helper.imshow(images[ii], ax=ax, normalize=False)

Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).


DenseNet network

We can load a model such as DenseNet. Let's print the model's architecture to see what is going on under the hood.

model = models.densenet121(pretrained=True)
#model

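The model consists of two parts: the features and the classifier. The features part is a stack of convolutional layers that acts as a feature detector, and its output is fed to the classifier. The classifier is a single fully connected layer, (classifier): Linear(in_features=1024, out_features=1000). This layer was trained on ImageNet, so it won't work for our problem, which means we need to replace the classifier. The features, however, work perfectly well as they are. You can think of the pretrained network as a very good feature detector whose output serves as the input to a simple feed-forward classifier.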

# Freeze parameters so we don't backprop through them
for param in model.parameters():
    param.requires_grad = False

from collections import OrderedDict
classifier = nn.Sequential(OrderedDict([
                          ('fc1', nn.Linear(1024, 500)),
                          ('relu', nn.ReLU()),
                          ('fc2', nn.Linear(500, 2)),
                          ('output', nn.LogSoftmax(dim=1))
                          ]))
    
model.classifier = classifier

model

DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (denseblock1): _DenseBlock(
      (denselayer1): _DenseLayer(
        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
     ...
     ...
     ...
      (denselayer16): _DenseLayer(
        (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
    )
    (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (classifier): Sequential(
    (fc1): Linear(in_features=1024, out_features=500, bias=True)
    (relu): ReLU()
    (fc2): Linear(in_features=500, out_features=2, bias=True)
    (output): LogSoftmax()
  )
)
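To check that freezing and then swapping the classifier leaves only the new layers trainable, a small sketch like the following can help. TinyNet is a hypothetical stand-in with the same features/classifier split as DenseNet, used so the example runs without downloading weights; the layer sizes are made up.

```python
import torch
from torch import nn

class TinyNet(nn.Module):
    """Hypothetical stand-in with the same features/classifier split as DenseNet."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.classifier = nn.Linear(8, 1000)

    def forward(self, x):
        return self.classifier(self.features(x))

model = TinyNet()

# Freeze every parameter that exists right now
for param in model.parameters():
    param.requires_grad = False

# Layers created afterwards have requires_grad=True by default
model.classifier = nn.Sequential(
    nn.Linear(8, 4),
    nn.ReLU(),
    nn.Linear(4, 2),
    nn.LogSoftmax(dim=1),
)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
print("trainable:", trainable)
print("frozen:", frozen)
```

The order matters: freezing after replacing the classifier would freeze the new layers as well, and the optimizer would have nothing to update.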

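With the model built, we need to train the classifier. The problem is that we are now using a very deep neural network. If you train it on a CPU as usual, it will take a long time, so we will use the GPU for the computation instead. On a GPU, linear algebra operations run in parallel, which can speed up training by around 100x. It is also possible to train on multiple GPUs to shorten training time further.

Like other deep learning frameworks, PyTorch uses CUDA to run the forward and backward passes efficiently on the GPU. In PyTorch, you move the model parameters and other tensors into GPU memory with model.to('cuda'). You can move them back from the GPU to the CPU with model.to('cpu'), for example when you need to work on the network output outside of PyTorch. To show the difference in speed, I will run a forward and backward pass both with and without the GPU.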

import time

# 'cuda' is swapped for 'cpu' here; use ['cpu', 'cuda'] on a machine with a GPU
for device in ['cpu', 'cpu']:

    criterion = nn.NLLLoss()
    # Only train the classifier parameters, feature parameters are frozen
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

    model.to(device)

    for ii, (inputs, labels) in enumerate(trainloader):

        # Move input and label tensors to the current device
        inputs, labels = inputs.to(device), labels.to(device)

        # Note: start is reset on every batch, so the figure printed below
        # covers only the last batch and is a rough indication at best
        start = time.time()

        optimizer.zero_grad()  # clear gradients left over from the previous batch
        outputs = model.forward(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if ii==3:
            break

    print(f"Device = {device}; Time per batch: {(time.time() - start)/3:.3f} seconds")

Device = cpu; Time per batch: 1.038 seconds
Device = cpu; Time per batch: 1.030 seconds

You can first check whether a GPU is available and use CUDA automatically when it is:

# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

...

# then whenever you get a new Tensor or Module
# this won't copy if they are already on the desired device
input = data.to(device)
model = MyModule(...).to(device)
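The pattern above can be tried end to end with a runnable sketch. The nn.Linear layer and the random batch are placeholders standing in for MyModule and real data; the code runs unchanged on a machine with or without a GPU.

```python
import torch
from torch import nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)   # parameters now live on `device`
data = torch.rand(3, 4)              # new tensors are created on the CPU by default
inputs = data.to(device)             # returns the same tensor if it is already there

outputs = model(inputs)
print(outputs.device.type == device.type)

# Bring results back to the CPU before using them outside PyTorch
result = outputs.detach().to("cpu").tolist()
print(len(result), len(result[0]))
```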

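Now it is up to you to finish training the model. The process is much the same as before, but the model is now far more powerful, so you should easily reach over 95% accuracy.

**Exercise:** Train a pretrained model to classify the cat and dog images. You can keep using DenseNet or try ResNet; both are good choices. Remember that you only need to train the classifier, and the feature parameters should stay frozen.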

## TODO: Use a pretrained model to classify the cat and dog images

# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model2 = models.densenet121(pretrained=True)

for param in model2.parameters():
    param.requires_grad = False
    
model2.classifier = nn.Sequential(nn.Linear(1024,256),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(256,2),
                                 nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()

optimizer = optim.Adam(model2.classifier.parameters(),lr=0.003)

model2.to(device)

DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    ...
    ...
    ...
      (denselayer16): _DenseLayer(
        (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
    )
    (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (classifier): Sequential(
    (0): Linear(in_features=1024, out_features=256, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.2, inplace=False)
    (3): Linear(in_features=256, out_features=2, bias=True)
    (4): LogSoftmax()
  )
)

epochs = 1
steps = 0
running_loss = 0
print_every = 5

for epoch in range(epochs):
    for inputs,labels in trainloader:
        steps += 1
        inputs,labels = inputs.to(device),labels.to(device)
        
        optimizer.zero_grad()
        logps = model2.forward(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        
        if steps%print_every == 0:
            test_loss = 0
            accuracy = 0
            model2.eval()
            with torch.no_grad():
                for inputs,labels in testloader:
                    inputs,labels = inputs.to(device),labels.to(device)
                    logps = model2.forward(inputs)
                    batch_loss = criterion(logps, labels)
                    
                    test_loss += batch_loss.item()
                    
                    # Calculate accuracy
                    ps = torch.exp(logps)
                    top_p, top_class = ps.topk(1,dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
            
            print(f"Epoch {epoch+1}/{epochs}.. "
                  f"step: {steps}..."
                  f"Train loss: {running_loss/print_every:.3f}.. "
                  f"Test loss: {test_loss/len(testloader):.3f}.. "
                  f"Test accuracy: {accuracy/len(testloader):.3f}")
            
            if accuracy/len(testloader) > 0.98:
                 break
            running_loss = 0
            model2.train()

Epoch 1/1.. step: 5...Train loss: 0.857.. Test loss: 0.748.. Test accuracy: 0.568
...
...
...
Epoch 1/1.. step: 50...Train loss: 0.159.. Test loss: 0.052.. Test accuracy: 0.982
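The accuracy computation inside the validation loop can be checked by hand on a tiny batch. The probabilities and labels below are made up for illustration; three of the four predictions match, so the result is 0.75.

```python
import torch

# Log-probabilities for 4 images and 2 classes (made-up numbers)
logps = torch.log(torch.tensor([[0.9, 0.1],
                                [0.2, 0.8],
                                [0.6, 0.4],
                                [0.3, 0.7]]))
labels = torch.tensor([0, 1, 1, 1])

ps = torch.exp(logps)                                  # back to probabilities
top_p, top_class = ps.topk(1, dim=1)                   # both have shape (4, 1)
equals = top_class == labels.view(*top_class.shape)    # compare (4, 1) with (4, 1)
accuracy = torch.mean(equals.type(torch.FloatTensor)).item()
print(accuracy)  # 0.75
```

The view call matters: comparing the (4, 1) predictions against the (4,) labels directly would broadcast to a (4, 4) matrix and give a meaningless average.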

def test_model(my_model,my_device, my_epochs, my_trainloader, my_testloader):
    steps = 0
    running_loss = 0
    print_every = 5
    
    my_criterion = nn.NLLLoss()
    my_optimizer = optim.Adam(my_model.classifier.parameters(),lr=0.003)

    my_model.to(my_device)

    for epoch in range(my_epochs):
        for inputs,labels in my_trainloader:
            steps += 1
            inputs,labels = inputs.to(my_device),labels.to(my_device)

            my_optimizer.zero_grad()
            logps = my_model.forward(inputs)
            loss = my_criterion(logps, labels)
            loss.backward()
            my_optimizer.step()

            running_loss += loss.item()

            if steps%print_every == 0:
                test_loss = 0
                accuracy = 0
                my_model.eval()
                with torch.no_grad():
                    for inputs,labels in my_testloader:
                        inputs,labels = inputs.to(my_device),labels.to(my_device)
                        logps = my_model.forward(inputs)
                        batch_loss = my_criterion(logps, labels)

                        test_loss += batch_loss.item()

                        # Calculate accuracy
                        ps = torch.exp(logps)
                        top_p, top_class = ps.topk(1,dim=1)
                        equals = top_class == labels.view(*top_class.shape)
                        accuracy += torch.mean(equals.type(torch.FloatTensor)).item()

                print(f"Epoch {epoch+1}/{my_epochs}.. "
                      f"step: {steps}..."
                      f"Train loss: {running_loss/print_every:.3f}.. "
                      f"Test loss: {test_loss/len(my_testloader):.3f}.. "
                      f"Test accuracy: {accuracy/len(my_testloader):.3f}")
                running_loss = 0
                if accuracy/len(my_testloader) > 0.92:
                    break
                my_model.train()

import time
# at beginning of the script
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model_type = models.densenet121(pretrained=True)

DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  ...
  ...
  ...
      (denselayer16): _DenseLayer(
        (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
    )
    (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (classifier): Linear(in_features=1024, out_features=1000, bias=True)
)

for param in model_type.parameters():
    param.requires_grad = False
    
model_type.classifier = nn.Sequential(nn.Linear(1024,256),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(256,2),
                                 nn.LogSoftmax(dim=1))

DenseNet(
  (features): Sequential(
    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
 ...
 ...
 ...
      (denselayer16): _DenseLayer(
        (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu1): ReLU(inplace=True)
        (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu2): ReLU(inplace=True)
        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      )
    )
    (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (classifier): Sequential(
    (0): Linear(in_features=1024, out_features=256, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.2, inplace=False)
    (3): Linear(in_features=256, out_features=2, bias=True)
    (4): LogSoftmax()
  )
)

EPOCHS = 1
start = time.time()
test_model(model_type,device,EPOCHS,trainloader,testloader)
print(f"Device = {device}; model_type='densenet121',Total time: {(time.time() - start):.3f} seconds")

Epoch 1/1.. step: 5...Train loss: 0.727.. Test loss: 0.233.. Test accuracy: 0.957
Epoch 1/1.. step: 10...Train loss: 0.370.. Test loss: 0.175.. Test accuracy: 0.940
Epoch 1/1.. step: 15...Train loss: 0.305.. Test loss: 0.109.. Test accuracy: 0.962
Epoch 1/1.. step: 20...Train loss: 0.214.. Test loss: 0.090.. Test accuracy: 0.970
Epoch 1/1.. step: 25...Train loss: 0.192.. Test loss: 0.145.. Test accuracy: 0.943
Epoch 1/1.. step: 30...Train loss: 0.235.. Test loss: 0.073.. Test accuracy: 0.973
Device = cpu; model_type='densenet121',Total time: 1363.256 seconds

AlexNet network

model_type = models.alexnet(pretrained=True)

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

for param in model_type.parameters():
    param.requires_grad = False
    
model_type.classifier = nn.Sequential(nn.Linear(9216,4096),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(4096,256),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(256,2),
                                 nn.LogSoftmax(dim=1))

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Linear(in_features=9216, out_features=4096, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.2, inplace=False)
    (3): Linear(in_features=4096, out_features=256, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.2, inplace=False)
    (6): Linear(in_features=256, out_features=2, bias=True)
    (7): LogSoftmax()
  )
)
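The 9216 input features of the first Linear layer come from the 256 channels of the last convolutional layer times the 6x6 output of avgpool. A small sketch of that arithmetic follows; flattened_size is a hypothetical helper, and the 13x13 dummy input is arbitrary because adaptive pooling fixes the output size. The same calculation with 512 channels and a 7x7 pool gives the 25088 used by VGG16.

```python
import torch
from torch import nn

def flattened_size(channels, pool_size):
    """Features per image after AdaptiveAvgPool2d(pool_size) and flattening."""
    pool = nn.AdaptiveAvgPool2d(pool_size)
    dummy = torch.zeros(1, channels, 13, 13)   # spatial size is arbitrary here
    return torch.flatten(pool(dummy), 1).shape[1]

alexnet_in = flattened_size(256, (6, 6))   # 256 * 6 * 6
vgg16_in = flattened_size(512, (7, 7))     # 512 * 7 * 7
print(alexnet_in, vgg16_in)  # 9216 25088
```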

EPOCHS = 1

start = time.time()
test_model(model_type,device,EPOCHS,trainloader,testloader)
print(f"Device = {device}; model_type='alexnet',Total time: {(time.time() - start):.3f} seconds")

Epoch 1/1.. step: 5...Train loss: 24.561.. Test loss: 1.131.. Test accuracy: 0.490
Epoch 1/1.. step: 10...Train loss: 0.979.. Test loss: 0.261.. Test accuracy: 0.896
Epoch 1/1.. step: 15...Train loss: 0.499.. Test loss: 0.432.. Test accuracy: 0.791
Epoch 1/1.. step: 20...Train loss: 0.503.. Test loss: 0.277.. Test accuracy: 0.896
Epoch 1/1.. step: 25...Train loss: 0.395.. Test loss: 0.188.. Test accuracy: 0.922
Device = cpu; model_type='alexnet',Total time: 157.419 seconds

VGG16 network

model_type = models.vgg16(pretrained=True)

Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /home/leon/.cache/torch/checkpoints/vgg16-397923af.pth
100.0%

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
   ...
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

for param in model_type.parameters():
    param.requires_grad = False
    
model_type.classifier = nn.Sequential(nn.Linear(25088,4096),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(4096,256),
                                 nn.ReLU(),
                                 nn.Dropout(0.2),
                                 nn.Linear(256,2),
                                 nn.LogSoftmax(dim=1))

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU()
    (2): Dropout(p=0.2, inplace=False)
    (3): Linear(in_features=4096, out_features=256, bias=True)
    (4): ReLU()
    (5): Dropout(p=0.2, inplace=False)
    (6): Linear(in_features=256, out_features=2, bias=True)
    (7): LogSoftmax()
  )
)

EPOCHS = 1

start = time.time()
test_model(model_type,device,EPOCHS,trainloader,testloader)
print(f"Device = {device}; model_type='vgg16',Total time: {(time.time() - start):.3f} seconds")

Epoch 1/1.. step: 5...Train loss: 45.882.. Test loss: 3.707.. Test accuracy: 0.518
Epoch 1/1.. step: 10...Train loss: 1.980.. Test loss: 0.121.. Test accuracy: 0.957
Device = cpu; model_type='vgg16',Total time: 672.757 seconds

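Watch those shapes

You need to check that the tensors going through your model and the rest of your code have the correct shapes. Use the .shape attribute while debugging and developing.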
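One simple way to do this is to push a dummy batch through the layers one at a time and print the shape after each step. The sketch below uses the DenseNet classifier head from this post; the batch of 64 random feature vectors is a stand-in for real features.

```python
import torch
from torch import nn

classifier = nn.Sequential(
    nn.Linear(1024, 256),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(256, 2),
    nn.LogSoftmax(dim=1),
)

x = torch.rand(64, 1024)        # a batch of 64 feature vectors
print("input:", tuple(x.shape))

shapes = []
for layer in classifier:
    x = layer(x)
    shapes.append(tuple(x.shape))
    print(f"{layer.__class__.__name__}: {tuple(x.shape)}")
```

A shape mismatch between two neighboring layers shows up here as an error at the exact layer that causes it.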

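If the network isn't training well, check the following:

Clear the gradients in the training loop with optimizer.zero_grad(). If you run a validation loop, set the network to evaluation mode with model.eval(), then switch it back to training mode with model.train().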
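Both points can be seen on a toy example. The sketch below uses a single made-up weight to show gradients accumulating until they are cleared, and a standalone Dropout layer to show the difference between training and evaluation mode.

```python
import torch
from torch import nn

# 1) Gradients accumulate across backward() calls until they are cleared
w = torch.tensor([1.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(3 * w).sum().backward()
first = w.grad.item()            # 3.0
(3 * w).sum().backward()
accumulated = w.grad.item()      # 6.0, the two gradients were summed
optimizer.zero_grad()
(3 * w).sum().backward()
after_reset = w.grad.item()      # 3.0 again

# 2) Dropout is active in train mode and does nothing in eval mode
drop = nn.Dropout(0.5)
x = torch.ones(1000)
drop.train()
zeros_in_train = (drop(x) == 0).sum().item()
drop.eval()
same_in_eval = torch.equal(drop(x), x)

print(first, accumulated, after_reset, zeros_in_train > 0, same_in_eval)
```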

CUDA errors
Sometimes you will see this error:

RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #1 'mat1'

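The second type is torch.cuda.FloatTensor, which means it is a tensor that has been moved to the GPU. The operation expected a tensor of type torch.FloatTensor, without the .cuda, so that tensor should be on the CPU. PyTorch can only operate on tensors that are on the same device, so both must be on the CPU or both on the GPU. If you are running your network on the GPU, make sure to move the model and all necessary tensors to the GPU with .to(device), where device is either "cuda" or "cpu".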
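When this error appears, it can help to compare devices explicitly before the forward pass. The helper check_same_device below is hypothetical, not part of PyTorch; it is a minimal sketch that reads the model's device from its first parameter and compares each tensor against it.

```python
import torch
from torch import nn

def check_same_device(model, *tensors):
    """Raise a readable error if any tensor is not on the model's device."""
    model_device = next(model.parameters()).device
    for i, t in enumerate(tensors):
        if t.device != model_device:
            raise RuntimeError(
                f"tensor #{i} is on {t.device}, but the model is on {model_device}"
            )
    return model_device

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(4, 2).to(device)
inputs = torch.rand(3, 4).to(device)
labels = torch.tensor([0, 1, 1]).to(device)

print(check_same_device(model, inputs, labels))
```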