💥1 Overview
Manual classification varies widely in speed and consistency. When there are many image categories and many images, sorting them by hand costs labor and time, so it makes sense to replace manual work with the processing speed and stability of a computer. Image classification is a fundamental task in computer vision [1]. Combined with deep learning, image classification proceeds in three steps: image preprocessing, feature extraction, and classification. First, one or more preprocessing methods are applied to the image; then the relevant features are extracted with corresponding algorithms; finally, after a series of transformations of the feature vectors, the classifier performs binary or multi-class classification and outputs a result.
(1) Image preprocessing [2] plays a very important role, because image quality affects the tasks the downstream model must complete, such as classification, recognition, or segmentation. Preprocessing mainly consists of three parts:
1. Grayscale conversion. When an RGB image is converted to grayscale, the resulting pixel value is called the gray value (also the intensity or brightness value) and lies in the range [0, 255]. In the RGB color space, a pixel with R = G = B = 0 is black, one with R = G = B = 255 is white, and intermediate equal values (for example, R = G = B = 100) are shades of gray; whenever R, G, and B are equal, the image falls somewhere on the black-gray-white scale. Common grayscale methods include the maximum-value method, the average method, and the weighted-average method.
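As a small illustration, the weighted-average method computes a luminance-weighted sum of the three channels. A minimal Python sketch (the 0.299/0.587/0.114 weights follow the common ITU-R BT.601 convention, which the text above does not specify; the function name and sample pixels are illustrative):

```python
def rgb_to_gray(pixels):
    """Convert a list of (R, G, B) tuples in [0, 255] to gray values
    using the weighted-average method (ITU-R BT.601 weights)."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in pixels]

# equal R, G, B values map to the same gray value, as described above
print(rgb_to_gray([(0, 0, 0), (100, 100, 100), (255, 255, 255)]))  # [0, 100, 255]
```

Because the three weights sum to 1, a pixel with equal channels keeps its value unchanged, matching the black-gray-white transition described above.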
2. Geometric transformation, also called spatial transformation, applies operations such as translation, mirroring, and transposition to an image. When samples are insufficient, these operations augment the dataset, which improves the accuracy of the trained classification model and reduces error. Image interpolation methods are then applied. They fall into two families: linear interpolation (nearest-neighbor, bilinear, and bicubic) and nonlinear interpolation, which divides into wavelet-coefficient-based and edge-information-based methods; the edge-based methods split into explicit and implicit approaches, where the implicit approaches include NEDI, LMMSE, SAI, and CGI.
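Of the linear methods listed above, bilinear interpolation is a representative example: the value at a fractional coordinate is a distance-weighted average of the four surrounding pixels. A minimal Python sketch (the function name and the tiny 2x2 test image are illustrative, not from the original):

```python
def bilinear(img, x, y):
    """Sample a 2-D grayscale image (list of rows) at fractional (x, y)."""
    x0, y0 = int(x), int(y)
    x1 = min(x0 + 1, len(img[0]) - 1)
    y1 = min(y0 + 1, len(img) - 1)
    dx, dy = x - x0, y - y0
    top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx       # blend along x, upper row
    bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx    # blend along x, lower row
    return top * (1 - dy) + bottom * dy                   # blend along y

img = [[0, 100],
       [100, 200]]
print(bilinear(img, 0.5, 0.5))  # 100.0, the average of all four neighbors
```

Nearest-neighbor interpolation simply picks the closest of the four pixels instead of blending them, which is faster but blockier; bicubic uses a 4x4 neighborhood for smoother results.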
📚2 Results
Partial code:
function [gradients,loss] = modelGradients(dlnet,dlX,Y)
    % a convolutional forward pass is done with the function forward
    dlYPred = forward(dlnet,dlX);
    % the result is normalized with the function softmax, as a softmax
    % layer would do
    dlYPred = softmax(dlYPred);
    % cross-entropy loss is calculated with the function crossentropy;
    % if you would like the network to solve a regression problem, squared
    % loss is used in many cases
    loss = crossentropy(dlYPred,Y);
    % the gradient is calculated with the function dlgradient
    gradients = dlgradient(loss,dlnet.Learnables);
end
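The forward, softmax, and crossentropy steps in modelGradients can be mirrored in plain Python to show what the loss measures (the logits below are made-up numbers, not outputs of the network above):

```python
import math

def softmax(logits):
    """Normalize a list of logits into probabilities, as the softmax call above does."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_onehot):
    """Cross-entropy loss between predicted probabilities and a one-hot label."""
    return -sum(t * math.log(p) for p, t in zip(probs, target_onehot) if t)

probs = softmax([2.0, 1.0, 0.1])          # hypothetical logits for 3 classes
loss = cross_entropy(probs, [1, 0, 0])    # small when the true class gets high probability
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1, which is exactly what gradient descent on this loss pushes the network to do.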
% This function resizes the images to the proper size for the pre-trained
% network
function Iout = readAndPreproc(inFilename,imgSize)
    % read the target image
    I = imread(inFilename);
    % replicate a grayscale image into three identical channels
    if size(I,3) == 1
        I = cat(3,I,I,I);
    end
    % resize to the input size of the pre-trained model
    Iout = imresize(I,imgSize(1:2));
end
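The cat(3,I,I,I) step above turns a single-channel image into three identical channels, so a grayscale input matches the 3-channel shape the pre-trained network expects. A Python sketch of the same idea (the tiny test image is illustrative):

```python
def gray_to_rgb(img):
    """Replicate a 2-D grayscale image into three identical channels,
    analogous to cat(3, I, I, I) in the MATLAB function above."""
    return [[(v, v, v) for v in row] for row in img]

print(gray_to_rgb([[0, 255]]))  # [[(0, 0, 0), (255, 255, 255)]]
```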
function [XTrainX,YY,idxS] = MixUpPreProc(XTrain,YTrain,numMixUp)
    % first, the composition ratio for each image is randomly determined;
    % in the explanation above, this fixes the values of alpha, beta and gamma
    lambda = rand([numel(YTrain),numMixUp]);
    lambda = lambda./sum(lambda,2); % the lambda values for each image must sum to 1
    lambda = reshape(lambda,[1 1 1 numel(YTrain) numMixUp]);
    idxS = []; XTrainK = []; YTrainK = []; XTrainX = zeros(size(XTrain));
    numClasses = numel(countcats(YTrain));
    classes = categories(YTrain);
    % after this loop, idxS is an array of size
    % 1 x (number of training images) x numMixUp;
    % numMixUp is 2 in many cases, but you can specify it as you want.
    % idxS(1,N,1:end) holds the indices of the training images mixed into
    % sample N; this means images of the same class can be mixed together.
    % The images are mixed with the weights in lambda, and the variable
    % XTrainX holds the mixed-up images.
    for k = 1:numMixUp
        idxK = randperm(numel(YTrain));
        idxS = cat(3,idxS,idxK);
        XTrainK = cat(5,XTrainK,double(XTrain(:,:,:,idxK)));
        YTrainK = cat(2,YTrainK,YTrain(idxK)); % YTrainK: (miniBatchSize) x (numMixUp)
        XTrainX = XTrainX + double(XTrain(:,:,:,idxK)).*lambda(1,1,1,:,k);
    end
    % next, the vectors corresponding to the label information are built.
    % If the classes in the task are dog, cat and bird, and one image is
    % synthesized from 50 % dog and 50 % bird,
    % the label for the synthesized image should be [0.5 0 0.5].
    % However, in the loop above the weights and the images to pick were
    % collected randomly, so the labels are prepared as follows:
    lambda = squeeze(lambda);
    Y = zeros(numClasses,numel(YTrain),numMixUp,'single');
    for j = 1:numMixUp
        lambdaJ = lambda(:,j);
        for c = 1:numClasses
            idxC = YTrain(idxS(1,:,j)) == classes(c);
            Y(c,idxC,j) = lambdaJ(idxC);
        end
    end
    YY = sum(Y,3);
end
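The dog/cat/bird example from the comments above reduces, for two images, to blending both the pixels and the one-hot labels with the same weight, so a 50/50 dog-and-bird mix gets the label [0.5, 0, 0.5]. A minimal Python sketch (the toy images, class order, and function name are illustrative, not from the MATLAB code):

```python
def mixup(x1, y1, x2, y2, lam):
    """Blend two flattened images and their one-hot labels with weight lam,
    the two-image special case of the mix-up preprocessing above."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

# classes: [dog, cat, bird]; mix a "dog" image with a "bird" image, 50/50
x, y = mixup([10, 20], [1, 0, 0], [30, 40], [0, 0, 1], lam=0.5)
print(y)  # [0.5, 0.0, 0.5]
```

In the MATLAB function, numMixUp images are blended instead of two, and the per-image weights are drawn randomly and normalized to sum to 1, but the pixel and label mixing follows the same rule.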
🎉3 References
Part of the theory comes from online sources; in case of infringement, please contact us for removal.
[1] 张雪晴. 基于CNN的图像分类 [J]. 电子技术与软件工程, 2022(07): 182-185.
[2] 何明智, 朱华生, 李永健, 唐树银, 孙占鑫. 基于融合CNN和Transformer的图像分类模型 [J]. 南昌工程学院学报, 2022, 41(04): 52-57+78.