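My pattern recognition course asked us to implement road sign recognition. The training set contained only 5 samples and the test set 50. I had heard that HOG features plus feature matching already give good results, so that is the approach I used.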
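python-opencv provides a class, cv2.HOGDescriptor, that extracts the HOG features of an image directly. The input has to be an 8-bit image, either single-channel or 3-channel. In my test, the 3-channel and single-channel versions of an image gave the same result. This is not guaranteed in general: for a color image, OpenCV computes the gradient of every channel and keeps, at each pixel, the channel with the largest gradient magnitude.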
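There is very little documentation about this class online, and it took me a lot of searching to piece some of it together. First, here is how to create a hog object directly with the constructor. Skim the parameters: usually only the first few are set, and the rest can stay at their defaults (the OpenCV textbook uses the defaults throughout).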
hog = cv2.HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins, derivAperture, winSigma,
                        histogramNormType, L2HysThreshold, gammaCorrection, nlevels)
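The commonly used parameters are these five: winSize, blockSize, blockStride, cellSize and nbins. They are the window size, the block size, the block stride and the cell size (all in pixels, given as (width, height)), and the number of orientation bins in each histogram.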
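For the underlying concepts, it's best to read a HOG tutorial. In practice you simply choose these parameters yourself. The defaults also work, but the results may be poor, because this descriptor is quite sensitive to its parameters, so try several parameter sets. Note that OpenCV requires blockSize to be a multiple of cellSize and (winSize - blockSize) to be a multiple of blockStride.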
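As a quick sanity check on a parameter set, you can work out the length of the descriptor for a single window by hand. The sketch below is a hypothetical helper, not part of OpenCV. It checks the two divisibility constraints above and then multiplies blocks per window × cells per block × bins. Its result should agree with hog.getDescriptorSize() for the same parameters.

```python
def hog_window_length(win_size, block_size, block_stride, cell_size, nbins):
    """Length of the HOG vector for ONE window (hypothetical helper, not an OpenCV API).

    All sizes are (width, height) tuples in pixels, as in cv2.HOGDescriptor.
    """
    for win, blk, stride, cell in zip(win_size, block_size, block_stride, cell_size):
        if blk % cell != 0:
            raise ValueError("blockSize must be a multiple of cellSize")
        if (win - blk) % stride != 0:
            raise ValueError("winSize - blockSize must be a multiple of blockStride")
    # number of block positions inside the window, along x and along y
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # number of cells inside one block
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    return blocks_x * blocks_y * cells_per_block * nbins


# Parameters used in this post: 9 x 9 block positions, 4 x 4 cells per block, 9 bins
print(hog_window_length((128, 128), (64, 64), (8, 8), (16, 16), 9))  # 11664
# OpenCV's default constructor: 64x128 window, 16x16 blocks, 8x8 stride and cells
print(hog_window_length((64, 128), (16, 16), (8, 8), (8, 8), 9))     # 3780
```

The parameters in this post give a fairly long vector (11664 values per window). Larger cells or fewer block positions shrink it quickly.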
Here is how I used it:
import numpy as np
import cv2

img = cv2.imread(test)  # test: path of the image file
# Set the parameters here
winSize = (128,128)
blockSize = (64,64)
blockStride = (8,8)
cellSize = (16,16)
nbins = 9
# Create the hog object with these parameters; the remaining ones keep their defaults
hog = cv2.HOGDescriptor(winSize,blockSize,blockStride,cellSize,nbins)
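One of the parameters left at its default above is the block normalization, histogramNormType, which defaults to L2-Hys with L2HysThreshold = 0.2. Here is a minimal NumPy sketch of the usual L2-Hys scheme: L2-normalize the block vector, clip every entry at the threshold, then L2-normalize again. The function name and the eps value are my own assumptions. OpenCV's internal epsilon terms differ slightly, so the values will not match it bit for bit.

```python
import numpy as np

def l2hys(v, threshold=0.2, eps=1e-6):
    """L2-Hys block normalization (sketch): L2-normalize, clip at threshold, L2-normalize again."""
    v = np.asarray(v, dtype=np.float64)
    v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)
    v = np.minimum(v, threshold)  # HOG histograms are non-negative, so only clip from above
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)


print(l2hys([3.0, 4.0]))  # both entries end up equal (about 0.7071): each was clipped to 0.2
```

The clipping step limits the influence of a few very strong gradients, for example a single high-contrast edge, on the whole block.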
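Once the HOG descriptor object hog is defined, it can compute the HOG features of an image. The class is so well encapsulated that using it is almost effortless. The member function compute returns the HOG descriptor of an image as one concatenated feature vector of length n: the histograms of all blocks in all windows, joined end to end. The value of n depends on the parameters, the window stride and the image size. The result is a float32 NumPy array of shape (n, 1) or (n,), depending on the OpenCV version, so it is easy to process with NumPy. Usage is as follows.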
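compute has three commonly used arguments. The first, which is required, is the image: a NumPy array as read by OpenCV (as noted above, 3-channel BGR and single-channel grayscale both work). The second, winStride, is the step by which the detection window slides, and it affects n. The third, padding, is the number of pixels added around the image to handle the borders.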
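To see how winStride and padding determine n, here is a rough count. It assumes (from my reading of OpenCV's behavior, so treat it as an approximation) three things: compute pads the image on every side by padding, it slides the window over the padded image in steps of winStride, and it concatenates the per-window descriptors. The helper name is hypothetical. OpenCV may also round the padding up internally, but that changes nothing for the values used in this post.

```python
def hog_total_length(img_size, win_size, win_stride, padding, window_len):
    """Approximate length of hog.compute(img, winStride, padding) (hypothetical helper).

    img_size, win_size, win_stride and padding are (width, height) tuples in pixels;
    window_len is the descriptor length of one window (hog.getDescriptorSize()).
    """
    n_windows = 1
    for img_dim, win_dim, stride, pad in zip(img_size, win_size, win_stride, padding):
        padded = img_dim + 2 * pad                     # padding is added on both sides
        n_windows *= (padded - win_dim) // stride + 1  # window positions along this axis
    return n_windows * window_len


# A 128x128 image with winStride=(8,8), padding=(8,8): 3 x 3 windows of 11664 values
print(hog_total_length((128, 128), (128, 128), (8, 8), (8, 8), 11664))  # 104976
# The same image without padding: exactly one window
print(hog_total_length((128, 128), (128, 128), (8, 8), (0, 0), 11664))  # 11664
```

Because n grows with the image size, descriptors of differently sized images cannot be compared element by element. Resizing every image to the same size first keeps them the same length. Resizing to exactly winSize with padding (0, 0) gives a single window.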
Then call compute to get the HOG descriptor:
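winStride = (8,8)  # step of the sliding detection window
padding = (8,8)    # pixels added around the image border
test_hog = hog.compute(img, winStride, padding).reshape((-1,))  # flatten to a 1-D vector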
This gives the HOG descriptor as a 1-D NumPy array of length n. That is all it takes to extract HOG descriptors with python-opencv; what you do with them afterwards is up to you.
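For the road sign problem, I extracted the HOG descriptor of every image and computed the inner products between them. The larger the inner product, the more similar the two images. For this to work, all images must first be resized to the same size so that their descriptors have the same length.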
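The matching step itself is a nearest-template search. The sketch below (function names are hypothetical) picks the training descriptor with the largest inner product, as described above. For comparison, it also picks the one with the largest cosine similarity, which is a swapped-in alternative rather than what this post used. Cosine similarity divides out the vector lengths, so a template cannot win merely because its descriptor has a larger norm. Whether that helps on a given data set has to be checked empirically.

```python
import numpy as np

def match_inner(test_vec, train_vecs):
    """1-based index of the training descriptor with the largest inner product."""
    scores = [np.dot(test_vec, t) for t in train_vecs]
    return int(np.argmax(scores)) + 1

def match_cosine(test_vec, train_vecs, eps=1e-12):
    """1-based index of the training descriptor with the largest cosine similarity."""
    t = test_vec / (np.linalg.norm(test_vec) + eps)
    scores = [np.dot(t, v / (np.linalg.norm(v) + eps)) for v in train_vecs]
    return int(np.argmax(scores)) + 1


# Toy 3-D "descriptors": the third template has a much larger norm
train_vecs = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]), np.array([0.0, 10.0, 10.0])]
test_vec = np.array([0.1, 1.0, 0.0])
print(match_inner(test_vec, train_vecs))   # 3
print(match_cosine(test_vec, train_vecs))  # 2
```

Both functions expect 1-D vectors of equal length, such as the output of hog.compute(...).reshape((-1,)) on images resized to a common size.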
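Below is my own Python implementation of HOG feature extraction. The main things to think through are the overall procedure and the gradient computation; the rest is straightforward.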
[Figure: HOG extraction pipeline]
[Figure: gradient computation]
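The per-pixel gradient loops in the code that follows can also be written with NumPy slicing. The sketch below (the function name is hypothetical) takes the centered [-1, 0, 1] difference in x (along columns) and in y (along rows) and computes the magnitude. It then folds the orientation into [0, π), so that opposite directions share one angle (unsigned gradients). Entries where a difference is undefined at the image border are left at 0.

```python
import numpy as np

def gradient_unsigned(img):
    """Centered-difference gradient magnitude and unsigned orientation in [0, pi)."""
    img = np.asarray(img, dtype=np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # I(x+1, y) - I(x-1, y), along columns
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # I(x, y+1) - I(x, y-1), along rows
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # opposite directions share one angle
    ang[ang >= np.pi] = 0.0                  # rounding can produce exactly pi
    return mag, ang


ramp = np.tile(np.arange(5.0), (5, 1))  # brightness grows from left to right
mag, ang = gradient_unsigned(ramp)
print(mag[2, 2], ang[2, 2])             # 2.0 0.0
mag, ang = gradient_unsigned(ramp.T)    # brightness grows from top to bottom
print(mag[2, 2], ang[2, 2])             # 2.0 1.5707963267948966
```

Using arctan2 instead of atan(gy/gx) avoids the special cases for gx == 0, because arctan2 handles every quadrant and the zero vector by itself.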
[Figure: normalization]
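For comparison with the block step in the code below, which sums the four cell histograms of non-overlapping 2 × 2 blocks: in the standard HOG layout, which cv2.HOGDescriptor follows, each block's cell histograms are concatenated, and blocks overlap, sliding by one cell. The sketch below builds that layout from the magnitude and orientation arrays. It uses hard binning and plain L2 normalization per block. OpenCV additionally interpolates votes between neighboring bins and cells, which this omits. Function names and defaults are illustrative assumptions.

```python
import numpy as np

def cell_histograms(mag, ang, cell=16, nbins=9):
    """Magnitude-weighted orientation histogram for each non-overlapping cell."""
    h, w = mag.shape[0] // cell, mag.shape[1] // cell
    bin_width = np.pi / nbins
    bins = np.minimum((ang // bin_width).astype(int), nbins - 1)  # hard binning
    hist = np.zeros((h, w, nbins))
    for m in range(h):
        for n in range(w):
            rows = slice(m * cell, (m + 1) * cell)
            cols = slice(n * cell, (n + 1) * cell)
            hist[m, n] = np.bincount(bins[rows, cols].ravel(),
                                     weights=mag[rows, cols].ravel(),
                                     minlength=nbins)
    return hist

def block_descriptor(hist, eps=1e-6):
    """2x2-cell blocks sliding by one cell; the 4 histograms are concatenated, then L2-normalized."""
    h, w, nbins = hist.shape
    blocks = []
    for p in range(h - 1):
        for q in range(w - 1):
            v = hist[p:p + 2, q:q + 2].ravel()  # 4 * nbins values
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps))
    return np.concatenate(blocks)


# 32x32 pixels of purely horizontal gradients -> 2x2 cells -> one block of 36 values
mag = np.ones((32, 32))
ang = np.zeros((32, 32))
desc = block_descriptor(cell_histograms(mag, ang))
print(desc.shape)  # (36,)
```

For the 194 × 207 grayscale image used in getHOG_1dims, 16 × 16 cells give 12 × 12 cells, and this layout then yields 11 × 11 blocks of 36 values (4356 in total). The summed, non-overlapping version yields 36 blocks of 9 values (324 in total). Concatenating keeps the spatial arrangement of the cells inside each block, which summing discards.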
Here is the code that extracts single-channel and three-channel HOG features and performs the road sign recognition:
import numpy as np
import cv2
import os
import math
# HOG from a grayscale image
def getHOG_1dims(pic_name):
    img = cv2.imread(pic_name, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(pic_name)
    img = img / 255
    # dsize is (width, height): the result has 194 rows and 207 columns
    img = cv2.resize(img, (207, 194))
    # g_img[i, j, 0] = gradient magnitude, g_img[i, j, 1] = unsigned orientation in [0, pi)
    g_img = np.zeros((img.shape[0], img.shape[1], 2))
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            gx = img[i, j + 1] - img[i, j - 1]   # centered difference along columns (x)
            gy = img[i + 1, j] - img[i - 1, j]   # centered difference along rows (y)
            g_img[i, j, 0] = math.hypot(gx, gy)
            g_img[i, j, 1] = math.atan2(gy, gx) % math.pi
    # cells of 16 x 16 pixels
    h_size, w_size = 16, 16
    h = img.shape[0] // h_size   # number of cells vertically
    w = img.shape[1] // w_size   # number of cells horizontally
    cell = np.zeros((h, w, 9))
    for m in range(h):
        for n in range(w):
            cell_n = np.zeros(9)   # fresh 9-bin histogram for every cell
            for i in range(h_size * m, h_size * (m + 1)):
                for j in range(w_size * n, w_size * (n + 1)):
                    # min() guards against an angle that rounds up to exactly pi
                    b = min(int(g_img[i, j, 1] // (math.pi / 9)), 8)
                    cell_n[b] += g_img[i, j, 0]
            cell[m, n] = cell_n
    # non-overlapping blocks of 2 x 2 cells; the four cell histograms are summed
    block = np.zeros((h // 2, w // 2, 9))
    for p in range(h // 2):
        for q in range(w // 2):
            block[p, q] = cell[2 * p:2 * p + 2, 2 * q:2 * q + 2].sum(axis=(0, 1))
    # L2-normalize each block
    block_norm = np.zeros((h // 2, w // 2, 9))
    for i in range(h // 2):
        for j in range(w // 2):
            length = (np.linalg.norm(block[i, j]) ** 2 + 0.000001) ** 0.5
            block_norm[i, j] = block[i, j] / length
    # one row per block: shape (36, 9)
    return block_norm.reshape(-1, 9)
# HOG from a color (BGR) image: each channel gets its own 9-bin histogram
def getHOG_3dims(pic_name):
    img = cv2.imread(pic_name)
    if img is None:
        raise FileNotFoundError(pic_name)
    img = cv2.resize(img, (192, 192))
    # "gamma normalization" with gamma = 1, i.e. just scale the pixel values to [0, 1]
    img = img / 255
    # gradients with the first-order centered difference, for each channel k (B, G, R)
    # g_img[i, j, k, 0] = magnitude, g_img[i, j, k, 1] = unsigned orientation in [0, pi)
    g_img = np.zeros((img.shape[0], img.shape[1], 3, 2))
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            for k in range(3):
                gx = img[i, j + 1, k] - img[i, j - 1, k]   # along columns (x)
                gy = img[i + 1, j, k] - img[i - 1, j, k]   # along rows (y)
                g_img[i, j, k, 0] = math.hypot(gx, gy)
                g_img[i, j, k, 1] = math.atan2(gy, gx) % math.pi
    # cell histograms: each cell is 8 x 8 pixels, each block 16 x 16 pixels, i.e. 2 x 2 cells
    h, w = 24, 24                # number of cells vertically / horizontally
    h_size = img.shape[0] // h   # cell height in pixels
    w_size = img.shape[1] // w   # cell width in pixels
    cell = np.zeros((h, w, 27))
    for m in range(h):
        for n in range(w):
            cell_n = np.zeros((3, 9))   # fresh histograms for every cell
            for i in range(h_size * m, h_size * (m + 1)):
                for j in range(w_size * n, w_size * (n + 1)):
                    for k in range(3):
                        b = min(int(g_img[i, j, k, 1] // (math.pi / 9)), 8)
                        cell_n[k, b] += g_img[i, j, k, 0]
            cell[m, n] = cell_n.reshape(27)
    # non-overlapping 2 x 2-cell blocks; the four cell histograms are summed
    block = np.zeros((h // 2, w // 2, 27))
    for p in range(h // 2):
        for q in range(w // 2):
            block[p, q] = cell[2 * p:2 * p + 2, 2 * q:2 * q + 2].sum(axis=(0, 1))
    # L2-normalize each block
    block_norm = np.zeros((h // 2, w // 2, 27))
    for i in range(h // 2):
        for j in range(w // 2):
            length = (np.linalg.norm(block[i, j]) ** 2 + 0.000001) ** 0.5
            block_norm[i, j] = block[i, j] / length
    # one row per block: shape (144, 27)
    return block_norm.reshape(-1, 27)
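# Classify a test image: return the 1-based index of the training image whose
# HOG descriptor has the largest inner product with the test image's descriptor
def judge(test):
    global train
    test_hog = getHOG_1dims(test)
    temp = 0
    result = 0
    for i in range(len(train)):
        # sum of the per-block inner products, i.e. the trace of np.dot(test_hog, train[i].T)
        num_sum = np.sum(test_hog * train[i])
        if num_sum > temp:
            temp = num_sum
            result = i + 1
    return result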
if __name__ == '__main__':
    # HOG descriptors of the 5 training images, one per class
    train_1 = getHOG_1dims('xxx/1.jpg')
    train_2 = getHOG_1dims('/xxx/2.jpg')
    train_3 = getHOG_1dims('/xxx/3.jpg')
    train_4 = getHOG_1dims('/xxx/4.jpg')
    train_5 = getHOG_1dims('/xxx/5.jpg')
    train = [train_1, train_2, train_3, train_4, train_5]
    path = '/xxx/test/'
    path_list = os.listdir(path)
    # test files are named by number (e.g. 1.jpg); sort them numerically
    path_list.sort(key=lambda x: int(x[:-4]))
    count = 0
    for filename in path_list:
        result_1 = judge(path + filename)
        print(result_1)
        # ground truth: images 1-10 are class 1, 11-20 class 2, and so on
        if (int(filename[:-4]) - 1) // 10 + 1 == result_1:
            count += 1
    print("accuracy is: " + str(count / len(path_list)))
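getHOG_3dims above keeps a separate histogram for each of the B, G and R channels, so its cells have 27 values. cv2.HOGDescriptor handles color input differently: at every pixel it keeps only the gradient of the channel with the largest magnitude, so the descriptor has the same length as for a grayscale image. Below is a NumPy sketch of that selection step. It assumes an H × W × 3 float image, the function name is hypothetical, and OpenCV's exact gradient filtering may differ in detail.

```python
import numpy as np

def max_channel_gradient(img):
    """Per pixel, keep (gx, gy) from the color channel with the largest gradient magnitude."""
    img = np.asarray(img, dtype=np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1, :] = img[:, 2:, :] - img[:, :-2, :]  # centered difference along columns
    gy[1:-1, :, :] = img[2:, :, :] - img[:-2, :, :]  # centered difference along rows
    mag = np.hypot(gx, gy)                           # H x W x 3
    best = np.argmax(mag, axis=2)[..., None]         # index of the strongest channel
    gx_best = np.take_along_axis(gx, best, axis=2)[..., 0]
    gy_best = np.take_along_axis(gy, best, axis=2)[..., 0]
    mag_best = np.take_along_axis(mag, best, axis=2)[..., 0]
    return mag_best, gx_best, gy_best


# Channel 0 varies weakly along x, channel 2 varies strongly along y
img = np.zeros((5, 5, 3))
img[..., 0] = np.arange(5)[None, :] * 0.1
img[..., 2] = np.arange(5)[:, None] * 1.0
mag, gx, gy = max_channel_gradient(img)
print(mag[2, 2], gx[2, 2], gy[2, 2])  # 2.0 0.0 2.0
```

The selected (gx, gy) can then go through the same orientation binning as a grayscale image. This also explains why a 3-channel image whose channels are identical produces the same descriptor as its grayscale version.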