Machine Learning Support Vector Machine Answers: SVM Application Examples

Reposted

clghxq 2024-08-06 10:56:20

Tags: machine learning, support vector machine, kernel functions, data. Category: machine learning, artificial intelligence

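A simple application of SVM

SVM: support vector machines
An SVM looks for the hyperplane that separates the points of two classes, and the larger the margin around that hyperplane, the better.
The points that lie on the two hyperplanes bounding the margin are called "support vectors".
The example below uses three points: (1,1), (2,0), (2,3).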

from sklearn import svm

X = [[2, 0], [1, 1], [2, 3]]  # the three training points
y = [0, 0, 1]  # class label of each point
clf = svm.SVC(kernel='linear')  # use a linear kernel
clf.fit(X, y)

print(clf)
print(clf.support_vectors_)  # [1,1] and [2,3]: the points found to be the support vectors
print(clf.support_)  # [1 2]: indices of the support vectors in X
print(clf.n_support_)  # [1 1]: one support vector was found in each of the two classes

# predict expects a 2-D list: one inner list per sample
print(clf.predict([[2, 0]]))
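
To make the idea of the margin concrete, here is a minimal sketch that reuses the same three points and reads the hyperplane back from the fitted model. It relies on the fact that, for a linear kernel, `coef_` and `intercept_` hold the normal vector w and the offset b, so the margin width is 2 / ||w||. The variable names `margin_width` and `sv_scores` are only illustrative.

```python
import numpy as np
from sklearn import svm

# Same three points and labels as in the example above.
X = np.array([[2, 0], [1, 1], [2, 3]])
y = np.array([0, 0, 1])

clf = svm.SVC(kernel="linear").fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane

# Distance between the two hyperplanes that bound the margin.
margin_width = 2.0 / np.linalg.norm(w)

# Support vectors sit on the margin, so their decision values
# should be close to -1 and +1.
sv_scores = clf.decision_function(clf.support_vectors_)

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("decision values at the support vectors =", sv_scores)
```

For this tiny data set the margin width should come out close to the distance between (1,1) and (2,3), since those two points are the support vectors.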

Face recognition with SVM

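The linearly inseparable case: the complexity of an SVM is determined by its support vectors.
The support vectors are the only points that actually shape the model.
The fewer support vectors there are, the more easily the model generalizes; the more support vectors, the harder it is to generalize.
SVM can also handle data that is not linearly separable.
It maps the linearly inseparable data into a higher-dimensional space where the data becomes linearly separable, and then finds a linear hyperplane in that space.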
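
As a small illustration of this mapping idea, the sketch below uses a made-up 1-D data set (not from the original post) in which one class sits in the middle and the other on both sides. No single threshold separates it, but after adding x squared as a second feature a linear SVM separates it perfectly. The arrays and the choice of the map x -> (x, x^2) are assumptions made purely for the demonstration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 1-D data: class 0 in the middle, class 1 on both sides.
x = np.array([-3.0, -2.5, -2.0, -0.5, 0.0, 0.5, 2.0, 2.5, 3.0])
y = np.array([1, 1, 1, 0, 0, 0, 1, 1, 1])

# In the original 1-D space a linear SVM can only place one threshold.
X_1d = x.reshape(-1, 1)
acc_1d = SVC(kernel="linear").fit(X_1d, y).score(X_1d, y)

# Map each point to 2-D: (x, x^2). The classes now differ in the x^2 axis.
X_2d = np.column_stack([x, x ** 2])
acc_2d = SVC(kernel="linear").fit(X_2d, y).score(X_2d, y)

print("training accuracy in 1-D:", acc_1d)
print("training accuracy after mapping to 2-D:", acc_2d)
```
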
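Commonly used kernel functions reduce the amount of computation this requires.
The training vectors are transformed into the high-dimensional space by a nonlinear mapping function. Computing inner products in that space directly is very expensive, so a
kernel function is used in place of the inner product of the mapped vectors.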
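
The following sketch checks this replacement numerically for one simple case. It assumes the degree-2 polynomial kernel k(x, z) = (x . z)^2 on 2-D inputs, whose explicit feature map is phi(x) = (x1^2, sqrt(2) x1 x2, x2^2). The helper names `phi` and `poly2_kernel` are made up for the illustration; the post itself uses the RBF kernel, whose feature space is infinite-dimensional and cannot be written out this way.

```python
import numpy as np

def phi(v):
    """Explicit feature map for the kernel (x . z)^2 on 2-D inputs."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly2_kernel(a, b):
    """Same quantity computed in the original 2-D space."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))   # inner product after mapping to 3-D
via_kernel = poly2_kernel(x, z)     # no mapping needed

print("inner product in feature space:", explicit)
print("kernel value in input space:   ", via_kernel)
```

Both numbers agree, but the kernel version never builds the higher-dimensional vectors, which is where the saving comes from.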
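Kernel functions, then, mainly address the computational cost. SVM itself only solves two-class problems, so how can it handle multiclass
problems? First split the data into one class versus all the remaining classes, then split those remaining classes into two groups again, and so on. In this way SVM can solve
multiclass classification problems.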
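
A closely related standard scheme is one-vs-rest, where one binary SVM is trained per class against all the others. It is not exactly the repeated splitting described above, but it shows the same idea of reducing a multiclass problem to binary ones. The sketch assumes scikit-learn's `OneVsRestClassifier` wrapper and the bundled iris data set, neither of which appears in the original post. Note that `SVC` on its own already accepts multiclass labels and handles them internally with a one-vs-one scheme.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Three-class toy data set bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# One binary linear SVM per class: "this class" versus "all the rest".
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

n_binary_models = len(ovr.estimators_)
preds = ovr.predict(X[:5])

print("number of binary SVMs trained:", n_binary_models)
print("predictions for the first five samples:", preds)
```
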
Using a support vector machine to solve the face recognition problem:

from __future__ import print_function

from time import time  # used to time each step
import logging
import matplotlib.pyplot as plt  # plotting tool, used to draw the predicted faces

from sklearn.model_selection import train_test_split
from sklearn.datasets import  fetch_lfw_people
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC

print(__doc__)  # prints the module docstring (None if the script has none)
# Log progress information while the program runs.
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')

lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)  # load the data set (downloaded on first use)
n_samples, h, w = lfw_people.images.shape  # image array: 1288 sample images, with h = 50 and w = 37

X = lfw_people.data  # feature matrix of shape (1288, 1850): one flattened image per row
n_features = X.shape[1]  # number of columns: 1850 features (50 * 37 pixels) per sample

y = lfw_people.target  # identity label of each image, e.g. array([5, 6, 3, ..., 5, 3, 5], dtype=int64)
target_names = lfw_people.target_names  # names of the people in the data set
n_classes = target_names.shape[0]  # 7: seven people in total to recognize

print("total dataset size:")
print("n_sample:%d"% n_samples)
print("n_features:%d"% n_features)
print("n_classes:%d"% n_classes)

# 75% of the data is used for training and 25% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)  # split into training and test sets

n_components = 150  # number of principal components (eigenfaces) to keep

print("Extracting the top %d eigenfaces from %d faces" % (n_components, X_train.shape[0]))
t0 = time()  # start time; PCA reduces the high-dimensional feature vectors to low-dimensional ones
pca = PCA(n_components=n_components, whiten=True).fit(X_train)  # fit the dimensionality reduction on the training set only
print("done in %0.3fs" % (time() - t0))

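eigenfaces = pca.components_.reshape((n_components, h, w))  # principal components reshaped into images: (150, 50, 37)
print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
# Use transform, not fit_transform: the PCA fitted on the training set must be reused,
# otherwise the test set is projected onto a different basis than the one the classifier was trained on.
X_train_pca = pca.transform(X_train)  # (966, 150): training features reduced to 150 dimensions
X_test_pca = pca.transform(X_test)  # (322, 150): 322 test images, 150 dimensions each
print("done in %0.3fs" % (time() - t0))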

print("Fitting the classifier to the training set")
t0 = time()
# Candidate values of C and gamma for the RBF kernel.
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],  # C: penalty applied to misclassified training points
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }  # gamma: how far the influence of a single training point reaches

# GridSearchCV trains the SVM with each of the 5 * 6 = 30 (C, gamma) combinations
# and keeps the combination with the best cross-validated accuracy.
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)  # set up the SVM model search
clf = clf.fit(X_train_pca, y_train)  # fit the SVM on the training data and training labels
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)  # the best model found is stored in clf
print("Predicting people's name on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)  # predict the labels of the test images
print("done in %0.3fs" % (time()-t0))

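print(classification_report(y_test, y_pred, target_names=target_names))  # precision, recall and F1 per person, from predicted vs. true labels
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))  # confusion matrix: the more counts on the diagonal, the more accurate the predictions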

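# Plot a gallery of images; titles holds the caption (predicted and true label) of each image, h and w are the image height and width
def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))  # create the figure that serves as the canvas; figsize sets its size
    # hspace is the vertical spacing between subplots
    plt.subplots_adjust(bottom=0, left=0.01, right=0.99, hspace=0.35)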
    for i in range(n_row * n_col):
        plt.subplot(n_row,n_col,i+1)
        plt.imshow(images[i].reshape((h,w)),cmap=plt.cm.gray)
        plt.title(titles[i],size=12)
        plt.xticks(())
        plt.yticks(())

# Build the title of test image i from the predicted label and the true label
def title(y_pred, y_test, target_names, i):
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s \n true: %s' % (pred_name, true_name)  # predicted surname and true surname

prediction_titles = [title(y_pred,y_test,target_names,i) for i in range(y_pred.shape[0])]

plot_gallery(X_test,prediction_titles,h,w)

eigenfaces_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces,eigenfaces_titles,h,w)

plt.show()
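
One way to keep the PCA step and the grid search consistent is to wrap both in a scikit-learn `Pipeline`, so that PCA is fitted only on the data the classifier is trained on and is reapplied unchanged to held-out data. The sketch below is not the original post's script: to stay small and avoid the LFW download it assumes the bundled digits data set, 20 components, a reduced parameter grid and 3-fold cross-validation, all chosen only for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Small bundled data set standing in for the face images (assumption).
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# PCA followed by an RBF SVM, treated as a single estimator.
pipe = Pipeline([
    ("pca", PCA(n_components=20, whiten=True, random_state=0)),
    ("svc", SVC(kernel="rbf", class_weight="balanced")),
])

# Parameters of a pipeline step are addressed as "<step>__<param>".
param_grid = {"svc__C": [1, 10, 100], "svc__gamma": [0.01, 0.1]}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)   # PCA is refitted inside each training fold

test_accuracy = search.score(X_test, y_test)  # PCA is only applied, not refitted

print("best parameters:", search.best_params_)
print("test accuracy:", test_accuracy)
```

With this layout there is no separate `X_test_pca` to compute by hand, which removes the opportunity to refit PCA on the test set by mistake.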