I. My Environment
● Language: Python 3.8
● IDE: PyCharm
● Deep learning framework: PyTorch
II. Theory
1. The problem ResNet solves is the "degradation" of deep neural networks.
(1) "Degradation" refers to the situation where stacking more layers onto a network makes its performance drop rapidly.
(2) Since performance drops on the training set, overfitting can be ruled out. Batch normalization has also largely solved the vanishing/exploding gradient problem of plain networks (gradients smaller than 1 shrink toward 0 when multiplied across layers; gradients larger than 1 blow up).
(3) The solution space of a shallow network is contained in that of a deeper one. The deeper network always has a solution at least as good as the shallow network's: simply set the added layers to the identity mapping and copy the shallow network's weights unchanged into the remaining layers, which reproduces the shallow network's performance exactly.
(4) Degradation is therefore mainly an optimization problem. It shows that structurally similar models can differ greatly in how hard they are to optimize, that the difficulty does not grow linearly, and that deeper models are harder to optimize.
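Point (3) can be checked directly. A minimal sketch (assuming PyTorch; the 4-unit linear layer stands in for a trained shallow network) showing that appending a layer initialized to the identity mapping leaves the outputs unchanged:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
shallow = nn.Linear(4, 4)  # stand-in for a trained shallow network
x = torch.randn(1, 4)

# "Deepen" the network by appending a layer set to the identity mapping
extra = nn.Linear(4, 4)
with torch.no_grad():
    extra.weight.copy_(torch.eye(4))
    extra.bias.zero_()
deep = nn.Sequential(shallow, extra)

# The deeper network reproduces the shallow one's output exactly
print(torch.allclose(shallow(x), deep(x)))  # True
```

So the deeper model's solution space does contain the shallow model's solution; the difficulty is getting the optimizer to find it.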
2. Two approaches to the degradation problem
(1) Adjust the solver, e.g. better initialization or a better gradient-descent algorithm.
(2) Adjust the model structure to make it easier to optimize; changing the structure effectively reshapes the error surface.
3. Structure of the residual block
4. The two residual blocks used in ResNet
(1) The left-hand residual structure is called BasicBlock.
(2) The right-hand residual structure is called Bottleneck.
(a) The first layer's 1×1 convolution reduces the feature-map depth from 256 down to 64;
the third layer's 1×1 convolution raises it back from 64 to 256.
Reducing the feature-map depth is mainly done to cut the number of parameters.
With a BasicBlock, the parameter count would be: 3×3×256×256×2 = 1,179,648.
With a Bottleneck, it is: 1×1×256×64 + 3×3×64×64 + 1×1×64×256 = 69,632.
(b) Reducing and then restoring the depth makes the main branch's output feature map the same shape as the shortcut branch's, so the two can be added.
CNN parameter count = kernel size × kernel depth × number of kernel groups = kernel size × input feature-map depth × output feature-map depth.
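The two counts above can be reproduced from this formula. A quick sketch (bias and BN parameters are ignored, as in the text):

```python
def conv_params(k, c_in, c_out):
    """Parameters of one conv layer (no bias): k*k * c_in * c_out."""
    return k * k * c_in * c_out

# BasicBlock at 256 channels: two 3x3 convolutions
basic = 2 * conv_params(3, 256, 256)

# Bottleneck: 1x1 reduce (256->64), 3x3 at 64, 1x1 restore (64->256)
bottleneck = (conv_params(1, 256, 64)
              + conv_params(3, 64, 64)
              + conv_params(1, 64, 256))

print(basic, bottleneck)  # 1179648 69632
```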
5. The shortcut used when downsampling
The dashed shortcuts apply a 1×1 convolution to match dimensions: the feature map is downsampled spatially, and its depth is adjusted to the channel count required by the next residual stage.
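The effect of such a 1×1 shortcut convolution can be sketched in a few lines (assuming PyTorch; the 256→512, 56×56→28×28 sizes correspond to the shortcut at the start of the third stage of the ResNet50 implemented below):

```python
import torch
import torch.nn as nn

# 1x1 conv with stride 2: halves the spatial size and adjusts the depth,
# so the shortcut output matches the main branch and the two can be added
downsample = nn.Conv2d(256, 512, kernel_size=1, stride=2, bias=False)

x = torch.randn(1, 256, 56, 56)
print(downsample(x).shape)  # torch.Size([1, 512, 28, 28])
```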
6. How ResNet mitigates vanishing gradients
When ResNet updates the parameters of a given node, the identity h(x) = F(x) + x means the chain rule produces factors of the form (∂F/∂x + 1) rather than ∂F/∂x alone. However small ∂F/∂x becomes, the constant 1 is always present; the skip connections effectively turn the pure product of the chain rule into a sum of terms, so the gradient reaching that node's parameters does not vanish.
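A toy sketch of this effect (the five scalar "layers" and the 1e-3 weight are illustrative assumptions, not ResNet's actual computation): with plain stacking the gradient is a product of small factors, while with skip connections each factor becomes (w + 1):

```python
import torch

w = torch.tensor(1e-3)  # tiny per-layer derivative dF/dx

# Plain network: y = w*x applied 5 times -> gradient w**5 vanishes
plain_x = torch.tensor(1.0, requires_grad=True)
y = plain_x
for _ in range(5):
    y = w * y
y.backward()
print(plain_x.grad.item())  # ~1e-15

# Residual network: y = w*y + y -> each factor in the chain becomes (w + 1)
res_x = torch.tensor(1.0, requires_grad=True)
y = res_x
for _ in range(5):
    y = w * y + y
y.backward()
print(res_x.grad.item())  # ~1.005
```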
7. How ResNet solves the degradation problem
Suppose a given layer is redundant. Without ResNet, we would want that layer to learn parameters satisfying h(x) = x: the input is x, and after passing through the redundant layer the output is still x. But learning the parameters of the identity mapping h(x) = x directly turns out to be difficult. ResNet avoids learning those identity-mapping parameters by using the structure shown above, h(x) = F(x) + x, where F(x) is called the residual. For the redundant layer to act as an identity mapping, it now only needs to learn F(x) = 0, which is easier than learning h(x) = x: since layer parameters are generally initialized near 0, updating the redundant layer toward F(x) = 0 converges much faster than updating it toward h(x) = x.
As an example, suppose the layer applies only a linear transform, with no bias and no activation, as in the figure. Because randomly initialized weights are close to 0, the layer's output for input [2, 1] is roughly [0.6, 0.6], which is clearly much closer to [0, 0] than to [2, 1]; compared with learning h(x) = x, the model reaches F(x) = 0 faster. Moreover, ReLU clamps negative activations to 0, filtering out the negative part of the linear transform, which pushes F(x) toward 0 faster still. So once the network itself determines which layers are redundant, ResNet largely sidesteps the identity-mapping problem: the redundant layer's parameters are updated by learning the residual F(x) = 0 rather than by learning h(x) = x, letting that layer pass its input through unchanged.
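The example above can be sketched directly (a toy illustration; the 2-unit layer, the 0.01 init scale, and the seed are assumptions for the demo, not values from the text):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(2, 2, bias=False)
nn.init.normal_(layer.weight, std=0.01)  # small initialization near 0

x = torch.tensor([2.0, 1.0])
out = layer(x)

# The freshly initialized layer's output is already much closer to
# F(x) = [0, 0] than to the identity target h(x) = x = [2, 1]
print(out)
print(out.norm().item() < (out - x).norm().item())
```

The assertion holds for essentially any small initialization: the layer starts near F(x) = 0 for free, while reaching h(x) = x would require substantial weight updates.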
8. Summary
Residual networks are easier to optimize and can gain accuracy from considerably increased depth.
III. Code Implementation
1. main.py
# -*- coding: utf-8 -*-
import copy
import os
import time
import warnings
import torch.utils.data
from matplotlib import pyplot as plt
from torchsummary import torchsummary
from torchvision import transforms, datasets
from ResNet50 import *
# Part 1: preparation
# 1. Set up the device (GPU if available)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print("Using {} device!".format(device))
# 2. Load the data
train_transforms = transforms.Compose([
    transforms.Resize([224, 224]),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
# Build the dataset from an image folder
total_data = datasets.ImageFolder('./data/', train_transforms)
print('Total number of images: {}'.format(len(total_data)))
# Class names
class_names = total_data.classes
print("Dataset classes: {}".format(class_names))
# Split into training and validation sets (80/20)
train_size = int(len(total_data) * 0.8)
validate_size = len(total_data) - train_size
train_data, validate_data = torch.utils.data.random_split(total_data, [train_size, validate_size])
batch_size = 32
train_dl = torch.utils.data.DataLoader(train_data, batch_size=batch_size, shuffle=True)
validate_dl = torch.utils.data.DataLoader(validate_data, batch_size=batch_size, shuffle=False)
# Visualize a sample batch
for imgs, labels in train_dl:
    print("Batch images shape: ", imgs.shape)
    print("Batch labels shape:", labels.shape)
    # plt.figure('Data Visualization', figsize=(10, 5))
    # for i, imgs in enumerate(imgs[:8]):
    #     # Reorder dimensions: [3, 224, 224] -> [224, 224, 3]
    #     npimg = imgs.numpy().transpose((1, 2, 0))
    #     # Split the figure into a 2x4 grid and draw subplot i+1
    #     plt.subplot(2, 4, i + 1)
    #     plt.imshow(npimg)  # cmap=plt.cm.binary
    # plt.show()
    break
# Part 2: model structure (ResNet50.py)
model = ResNet50().to(device)
torchsummary.summary(model, (3, 224, 224))
# torchinfo.summary(model)
print(model)
# Part 3: training
epochs = 10
learn_rate = 1e-7  # initial learning rate
loss_fn = nn.CrossEntropyLoss()  # loss function
optimizer = torch.optim.Adam(model.parameters(), lr=learn_rate)
train_loss = []
train_acc = []
validate_loss = []
validate_acc = []
epoch_best_acc = 0
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # size of the training set
    num_batches = len(dataloader)  # number of batches
    train_loss, train_acc = 0, 0
    for X, y in dataloader:
        X, y = X.to(device), y.to(device)
        pred = model(X)  # network output
        loss = loss_fn(pred, y)  # loss between prediction and ground truth
        # Backpropagation
        optimizer.zero_grad()  # reset gradients
        loss.backward()  # backpropagate
        optimizer.step()  # update parameters
        # Accumulate accuracy and loss
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()
    train_acc /= size
    train_loss /= num_batches
    return train_acc, train_loss
def validate(dataloader, model, loss_fn):
    size = len(dataloader.dataset)  # size of the validation set
    num_batches = len(dataloader)  # number of batches
    test_loss, test_acc = 0, 0
    # No gradient tracking during evaluation, which saves memory
    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)
            # Compute the loss
            target_pred = model(imgs)
            loss = loss_fn(target_pred, target)
            test_loss += loss.item()
            test_acc += (target_pred.argmax(1) == target).type(torch.float).sum().item()
    test_acc /= size
    test_loss /= num_batches
    return test_acc, test_loss
# Load a previously trained model if one exists
pre_model_dir = './model/resnet50_pre_model.pkl'
if os.path.exists(pre_model_dir):
    model.load_state_dict(torch.load(pre_model_dir, map_location=torch.device('cpu')))  # load pretrained weights
print("Start training ......")
best_model = None
for epoch in range(epochs):
    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, optimizer)
    model.eval()
    epoch_validate_acc, epoch_validate_loss = validate(validate_dl, model, loss_fn)
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    validate_acc.append(epoch_validate_acc)
    validate_loss.append(epoch_validate_loss)
    # Current learning rate
    lr = optimizer.state_dict()['param_groups'][0]['lr']
    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}, Lr:{:.2E}')
    print(time.strftime('[%Y-%m-%d %H:%M:%S]'),
          template.format(epoch + 1, epoch_train_acc * 100, epoch_train_loss, epoch_validate_acc * 100,
                          epoch_validate_loss, lr))
    # Save the best model
    if epoch_best_acc < epoch_validate_acc:
        epoch_best_acc = epoch_validate_acc
        best_model = copy.deepcopy(model)
        print('acc = {:.1f}%, saving model to best.pkl'.format(epoch_best_acc * 100))
        torch.save(best_model.state_dict(), './model/best.pkl')
print('Done\n')
# Part 4: evaluation
range_epochs = range(epochs)
# Suppress warnings
warnings.filterwarnings("ignore")
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can display Chinese labels
plt.rcParams['axes.unicode_minus'] = False  # display minus signs correctly
plt.rcParams['figure.dpi'] = 100  # figure resolution
plt.figure(figsize=(12, 3))
plt.subplot(1, 2, 1)
plt.plot(range_epochs, train_acc, label='Training Accuracy')
plt.plot(range_epochs, validate_acc, label='Validate Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(range_epochs, train_loss, label='Training Loss')
plt.plot(range_epochs, validate_loss, label='Validate Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
2. ResNet50.py
# -*- coding: utf-8 -*-
import torch
import torch.nn as nn
class IdentityBlock(nn.Module):
    def __init__(self, in_channel, kernel_size, filters):
        super(IdentityBlock, self).__init__()
        filters1, filters2, filters3 = filters
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channel, filters1, 1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(filters1),
            nn.ReLU(True)
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(filters1, filters2, kernel_size, stride=1, padding=autopad(kernel_size), bias=False),
            nn.BatchNorm2d(filters2),
            nn.ReLU(True)
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(filters2, filters3, 1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(filters3)
        )
        self.relu = nn.ReLU(True)

    def forward(self, x):
        x1 = self.conv1(x)
        x1 = self.conv2(x1)
        x1 = self.conv3(x1)
        x = self.relu(x1 + x)  # add the identity shortcut, then apply ReLU
        return x
class ConvBlock(nn.Module):
    def __init__(self, in_channel, kernel_size, filters, stride=2):
        super(ConvBlock, self).__init__()
        filters1, filters2, filters3 = filters
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_channel, filters1, 1, stride=stride, padding=0, bias=False),
            nn.BatchNorm2d(filters1),
            nn.ReLU(True)
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(filters1, filters2, kernel_size, stride=1, padding=autopad(kernel_size), bias=False),
            nn.BatchNorm2d(filters2),
            nn.ReLU(True)
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(filters2, filters3, 1, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(filters3)
        )
        # Shortcut branch: a 1x1 conv matches the main branch's output shape
        self.conv4 = nn.Sequential(
            nn.Conv2d(in_channel, filters3, 1, stride=stride, padding=0, bias=False),
            nn.BatchNorm2d(filters3)
        )
        self.relu = nn.ReLU(True)

    def forward(self, x):
        x1 = self.conv1(x)
        x1 = self.conv2(x1)
        x1 = self.conv3(x1)
        x2 = self.conv4(x)
        x = self.relu(x1 + x2)  # add the two branches, then apply ReLU
        return x
class ResNet50(nn.Module):
    def __init__(self):
        super(ResNet50, self).__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False, padding_mode='zeros'),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=0)
        )
        self.conv2 = nn.Sequential(
            ConvBlock(64, 3, [64, 64, 256], stride=1),
            IdentityBlock(256, 3, [64, 64, 256]),
            IdentityBlock(256, 3, [64, 64, 256])
        )
        self.conv3 = nn.Sequential(
            ConvBlock(256, 3, [128, 128, 512]),
            IdentityBlock(512, 3, [128, 128, 512]),
            IdentityBlock(512, 3, [128, 128, 512]),
            IdentityBlock(512, 3, [128, 128, 512])
        )
        self.conv4 = nn.Sequential(
            ConvBlock(512, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024]),
            IdentityBlock(1024, 3, [256, 256, 1024])
        )
        self.conv5 = nn.Sequential(
            ConvBlock(1024, 3, [512, 512, 2048]),
            IdentityBlock(2048, 3, [512, 512, 2048]),
            IdentityBlock(2048, 3, [512, 512, 2048])
        )
        self.pool = nn.AvgPool2d(kernel_size=7, stride=7, padding=0)
        self.fc = nn.Linear(2048, 4)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.conv5(x)
        x = self.pool(x)
        x = torch.flatten(x, start_dim=1)
        x = self.fc(x)
        return x


def autopad(k, p=None):  # kernel, padding
    # Pad to 'same'
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p
3. predict.py
# -*- coding: utf-8 -*-
import torch
import torchvision
from PIL import Image
from ResNet50 import ResNet50
def predict(model, img_path, classeNames):
    img = Image.open(img_path)
    train_transforms = torchvision.transforms.Compose([
        torchvision.transforms.Resize([224, 224]),  # resize input images to a uniform size
        torchvision.transforms.ToTensor(),  # convert PIL Image / numpy.ndarray to a tensor scaled to [0, 1]
        torchvision.transforms.Normalize(  # standardize so the model converges more easily
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225])  # mean and std estimated by sampling from the dataset
    ])
    img = train_transforms(img)
    img = torch.reshape(img, (1, 3, 224, 224))
    img = img.to(next(model.parameters()).device)  # move the input to the model's device
    output = model(img)
    _, indices = torch.max(output, 1)
    percentage = torch.nn.functional.softmax(output, dim=1)[0] * 100
    perc = percentage[int(indices)].item()
    result = classeNames[indices]
    print('predicted:', result, perc)
if __name__ == '__main__':
    classeNames = ['Bananaquit', 'Black Throated Bushtiti', 'Black skimmer', 'Cockatoo']
    num_classes = len(classeNames)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("Using {} device\n".format(device))
    model = ResNet50().to(device)
    model.load_state_dict(torch.load('./model/best.pkl', map_location=torch.device('cpu')))
    model.eval()
    img_path = './data/Bananaquit/001.jpg'
    predict(model, img_path, classeNames)
IV. Problems Encountered
Without loading a pretrained model, training had converged by epoch 50 but the results were poor, with validation accuracy around 85%. After saving the best model from that run and loading its weights for a second round of training, accuracy exceeded 95% by epoch 10.