Python Classification in Practice: A Python Classification Problem

Reposted

数据侠客行 2023-11-21 21:21:21


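【Lab Objectives】

1. Understand the ideas and algorithms behind common machine-learning classification models, including Fisher's linear discriminant, KNN, naive Bayes, logistic regression, and decision trees;
2. Implement classification problems in Python, including model evaluation metrics, timing, and model saving.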

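【Lab Requirements】

Understand the details of classification in Python, such as evaluation metrics;
Master the Python programming operations for classification problems covered in this chapter.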

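【Lab Procedure】 (required steps, plots, code comments, and data analysis)

Lab steps
1. Read in the data
2. Preprocess the data
3. Introduce the data analysis method
4. Implement the data analysis method in code, with comments
5. Plot the important results
6. Explain the results where necessary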

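【Lab Problem】

Use a logistic regression classifier to solve the problem below. Report model evaluation metrics, time the run, and save the model.

To guard against counterfeit goods and tariff evasion, a customs office has laboratory test results for 124 wine samples from 2 places of origin. Six wine samples of unknown class now need their origin determined. How should a testing strategy be designed to identify the origin of these 6 unknown samples? Explore the problem with the classification algorithms covered, including model evaluation, timing, and model saving. The data are in data_wine_new.csv.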
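The timing requirement can be met with the standard library alone. The following is a minimal sketch, not the code used in this post: the helper name `timed` and the dummy workload are hypothetical, and `time.perf_counter` is chosen because it is a monotonic clock intended for measuring short durations.

```python
import time

def timed(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def dummy_workload(n):
    # Hypothetical stand-in for a model-fitting call
    return sum(i * i for i in range(n))

result, seconds = timed(dummy_workload, 100_000)
print("result:", result)
print("elapsed seconds:", round(seconds, 4))
```

Wrapping only the training call in such a helper would measure the fit itself, rather than also counting package imports and data loading.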


Class labels 0 and 1 correspond to origins 1 and 2, respectively.

Logistic regression model
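For orientation, a fitted binary logistic regression scores a sample as z = w·x + b and converts that score into a class-1 probability with the sigmoid function 1 / (1 + e^(-z)); the predicted label is 1 when the probability exceeds 0.5. The sketch below illustrates this with the standard library only. The weights, intercept, and sample are made-up illustrative values, not taken from the wine data.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba_one(weights, intercept, x):
    """Probability of class 1 for one sample under a logistic model."""
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

def predict_label(weights, intercept, x, threshold=0.5):
    """Return 1 if the class-1 probability exceeds the threshold, else 0."""
    return 1 if predict_proba_one(weights, intercept, x) > threshold else 0

# Hypothetical two-feature model and sample (illustrative values only)
weights = [0.8, -0.9]
intercept = 0.3
sample = [1.0, 2.0]

p = predict_proba_one(weights, intercept, sample)
print("P(class 1) =", round(p, 4))
print("label =", predict_label(weights, intercept, sample))
```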

Code

# -*- coding: utf-8 -*-
'''step1 import packages'''
import time
starttime = time.time()
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
# Logistic regression
# (older scikit-learn: from sklearn.linear_model.logistic import LogisticRegression)
from sklearn.linear_model import LogisticRegression
# Evaluation metrics
from sklearn import metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix, classification_report

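'''step2 load the data'''
data = pd.read_csv('data_wine_new.csv')

'''step3 preprocess the data'''
# Separate the labelled rows (used for training and testing)
# from the rows to be classified (the last 6 rows)
data_used = data.iloc[:124, :]
# Python counts from 0, so the line above takes rows 0 through 123,
# i.e. 124 rows in total.

data_unused = data.iloc[124:, :]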


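'''step4 split the dataset'''
# Split the labelled data into
# a training set (75%)
# and a test set (25%)

# Separate the class column from the feature columns
# so they can be passed to the split function below
y_data = data_used.iloc[:, 0]
x_data = data_used.iloc[:, 1:]

# Split the data with scikit-learn's helper
x_train, x_test, y_train, y_test = train_test_split(
        x_data, y_data, test_size=0.25, random_state=0)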

'''step5 modelling (train, test, evaluate)'''
# step5.1 train the model
model_LR = LogisticRegression()
model_LR.fit(x_train, y_train)

# Model coefficients
print('Model coefficients:\n', np.round(model_LR.coef_, 4))
print('Model intercept:\n', np.round(model_LR.intercept_, 4))

# step5.2 test the model
y_pred = model_LR.predict(x_test)

# step5.3 evaluate the model (accuracy)
# y_test holds the true test-set labels,
# y_pred holds the labels predicted by the model;
# comparing the two gives the accuracy
acc_test = accuracy_score(y_test, y_pred)
print('Test accuracy:', np.round(acc_test, 2))

'''extended model evaluation'''
# Use distinct variable names so the imported metric functions are not overwritten
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
report = classification_report(y_test, y_pred, labels=[0, 1],
                               target_names=["Origin 1", "Origin 2"])
report_tokens = report.split()   # whitespace-separated tokens of the report

# Compute the AUC from the predicted scores for class 1
if hasattr(model_LR, 'predict_proba'):
    y_score = model_LR.predict_proba(x_test)
    y_score = y_score[:, 1]
else:
    y_score = model_LR.decision_function(x_test)
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_score)
auc = metrics.auc(fpr, tpr)

endtime = time.time()
testtime = endtime - starttime

print('\n confusion_matrix = \n', cm)
print('\n classification_report =\n', report)
print('\n accuracy_score= ', acc)
print('\n precision= ', metrics.precision_score(y_test, y_pred))
print('\n recall= ', metrics.recall_score(y_test, y_pred))
print('\n f1_Score= ', metrics.f1_score(y_test, y_pred))
print('\n AUC is:', auc)
print("\n Time consuming is", np.round(testtime, 3))

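'''step6 save the model'''
import joblib
joblib.dump(model_LR, 'model_LR.m')   # save the model


'''step7 predict'''
# The model is trained and the test results are satisfactory,
# so classify the samples of unknown class
x_unused = data_unused.iloc[:, 1:]
pred_unused = model_LR.predict(x_unused)
print('Predicted classes of the unknown samples:', pred_unused)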
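A saved model is only useful if it can be loaded again. The sketch below, which assumes scikit-learn and joblib are installed, fits a logistic regression on synthetic data from `make_classification` (a stand-in for the wine data), saves it to a temporary file, reloads it with `joblib.load`, and checks that the reloaded model predicts the same labels. The file name and the dataset are illustrative assumptions.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data standing in for the wine samples (assumption)
X, y = make_classification(n_samples=120, n_features=5, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_LR_demo.joblib")  # hypothetical file name
    joblib.dump(model, path)        # save the fitted model
    restored = joblib.load(path)    # load it back from disk

same = np.array_equal(model.predict(X), restored.predict(X))
print("reloaded model gives identical predictions:", same)
```

Since joblib files are pickle-based, they should only be loaded from trusted sources, and preferably with the same scikit-learn version that saved them.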

Results

Model coefficients:
 [[ 0.8206 -0.8799 -0.7472  0.4186  0.0166  0.0518 -0.551  -0.0446  0.4337
  -0.9608  0.2149 -0.4388 -0.0148]]
Model intercept:
 [0.3276]
Test accuracy: 1.0

confusion_matrix =
 [[16  0]
 [ 0 15]]

classification_report =
              precision    recall  f1-score   support

    Origin 1       1.00      1.00      1.00        16
    Origin 2       1.00      1.00      1.00        15

    accuracy                           1.00        31
   macro avg       1.00      1.00      1.00        31
weighted avg       1.00      1.00      1.00        31

accuracy_score= 1.0
precision= 1.0
recall= 1.0
f1_Score= 1.0
AUC is: 1.0
Time consuming is 0.033
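The reported metrics can be reproduced by hand from the confusion matrix. The sketch below uses the matrix printed above, with rows as true classes and columns as predicted classes (scikit-learn's convention) and class 1 treated as the positive class, which is the default for `precision_score`, `recall_score`, and `f1_score`. The variable names are illustrative.

```python
# Confusion matrix from the results: rows = true class, columns = predicted class
cm = [[16, 0],
      [0, 15]]

tn, fp = cm[0]   # true class 0: predicted 0, predicted 1
fn, tp = cm[1]   # true class 1: predicted 0, predicted 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("accuracy :", accuracy)
print("precision:", precision)
print("recall   :", recall)
print("f1       :", f1)
```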

Predicted classes of the unknown samples: [0. 0. 1. 0. 0. 0.]
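Because labels 0 and 1 stand for origins 1 and 2, the predicted array can be translated into readable origin names. A small sketch, in which the dictionary and the display strings are illustrative choices:

```python
# Predicted labels for the six unknown samples, as printed in the results
pred_unused = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

# Label 0 -> origin 1, label 1 -> origin 2 (mapping stated in the article)
label_to_origin = {0: "Origin 1", 1: "Origin 2"}

origins = [label_to_origin[int(label)] for label in pred_unused]
for i, origin in enumerate(origins, start=1):
    print(f"unknown sample {i}: {origin}")
```

Under this mapping, the third unknown sample is assigned to origin 2 and the other five to origin 1.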
