R mixed-model datasets: building a confusion matrix in R

Reposted

mob64ca13fe9c58 2023-08-21 11:03:00


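Classification models are usually evaluated with one of the following tools: the confusion matrix, the gain chart, the lift chart, the KS chart, and the receiver operating characteristic (ROC) curve. The series "Evaluating Classification Models and Implementing the Evaluation in R" covers them one by one.

This post covers the most basic of them, the confusion matrix.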

I. Introduction to the Confusion Matrix

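A confusion matrix compares the predicted classes with the actual targets and summarizes the result in an N×N contingency table, where N is the number of classes.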
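The N×N case can be sketched in base R as follows. The class names and the two label vectors are made up for illustration; only table() and diag() are used.

```r
# Hypothetical 3-class example: both label vectors are invented.
classes   <- c("a", "b", "c")
actual    <- factor(c("a", "a", "b", "b", "c", "c", "c", "a"), levels = classes)
predicted <- factor(c("a", "b", "b", "b", "c", "a", "c", "a"), levels = classes)

# Rows are predictions, columns are actual classes.
cm <- table(Predicted = predicted, Actual = actual)
print(cm)

# Correct predictions sit on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Declaring the factor levels up front keeps the table square even when a class never shows up among the predictions.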

Taking binary classification as an example:

Confusion Matrix

                        Target
                        Positive                 Negative
Model    Positive       True Positives (TP)      False Positives (FP)
Model    Negative       False Negatives (FN)     True Negatives (TN)
Total                   Positive Samples (P)     Negative Samples (N)

The following metrics can be computed from the table above:

Accuracy = (TP+TN)/(P+N)
Error Rate = 1 - Accuracy = (FP+FN)/(P+N)
False Positive Rate = Fallout = FP/N
True Positive Rate = Recall = Sensitivity = TP/P
False Negative Rate = Miss = FN/P
True Negative Rate = Specificity = TN/N
Positive Predictive Value = Precision = TP/(TP+FP)
Negative Predictive Value = TN/(TN+FN)
Prediction-conditioned Fallout = FP/(TP+FP)
Prediction-conditioned Miss = FN/(TN+FN)
Rate of Positive Predictions = Detection Prevalence = (TP+FP)/(P+N)
Rate of Negative Predictions = (TN+FN)/(P+N)
Prevalence = (TP+FN)/(P+N)
Detection Rate = TP/(P+N)
Balanced Accuracy = (Sensitivity+Specificity)/2
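As a rough sketch, most of the formulas above can be wrapped in one small function. binary_metrics is a hypothetical helper, not part of any package, and the counts passed to it are invented.

```r
# Hypothetical helper: derives the metrics listed above from the four
# cell counts of a binary confusion matrix.
binary_metrics <- function(tp, fp, fn, tn) {
  p <- tp + fn  # actual positives (P)
  n <- fp + tn  # actual negatives (N)
  sensitivity <- tp / p
  specificity <- tn / n
  c(
    accuracy             = (tp + tn) / (p + n),
    error_rate           = (fp + fn) / (p + n),
    fallout              = fp / n,
    sensitivity          = sensitivity,
    miss                 = fn / p,
    specificity          = specificity,
    precision            = tp / (tp + fp),
    npv                  = tn / (tn + fn),
    prevalence           = p / (p + n),
    detection_rate       = tp / (p + n),
    detection_prevalence = (tp + fp) / (p + n),
    balanced_accuracy    = (sensitivity + specificity) / 2
  )
}

# Made-up counts, for illustration only.
m <- binary_metrics(tp = 40, fp = 10, fn = 5, tn = 45)
print(round(m, 3))
```

Several of these metrics are complements of one another (for example Sensitivity + Miss = 1), which is why the list feels longer than the information it carries.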

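Does it feel as if these metrics have used up every possible numerator/denominator combination? Don't worry: it is enough to remember that the higher TP and TN are, the better.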

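II. Computing the Confusion Matrix in R

This example uses the ROCR.simple dataset from the ROCR package, where predictions holds the predicted scores and labels holds the true values.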

# library() stops with an error if the package is missing; require() only warns
library(ROCR)
data(ROCR.simple)
str(ROCR.simple)

## List of 2
##  $ predictions: num [1:200] 0.613 0.364 0.432 0.14 0.385 ...
##  $ labels     : num [1:200] 1 1 0 0 0 1 1 1 1 0 ...

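1 Computing directly with table()

Once a threshold has been chosen, the contingency table can be computed directly with the table function, and the metrics follow from the formulas above. Suppose every observation with predictions > 0.5 is predicted as 1 and all others as 0.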

# Rows of the table are predicted classes, columns are the true labels
pred.class <- as.integer(ROCR.simple$predictions > 0.5)
print(cft <- table(pred.class, ROCR.simple$labels))

##           
## pred.class  0  1
##          0 91 14
##          1 16 79
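The 0.5 cut-off is a choice, and the four counts shift as it moves. The sketch below shows this on synthetic scores (generated here with rbeta, not ROCR.simple); counts_at is a hypothetical helper.

```r
# Synthetic scores and labels, assumed for illustration only.
set.seed(1)
labels <- rep(c(0, 1), each = 50)
scores <- c(rbeta(50, 2, 4), rbeta(50, 4, 2))  # positives tend to score higher

# Hypothetical helper: confusion-matrix counts at a single threshold.
counts_at <- function(threshold) {
  pred <- factor(as.integer(scores > threshold), levels = c(0, 1))
  cft  <- table(pred, factor(labels, levels = c(0, 1)))
  c(threshold = threshold,
    tp = cft["1", "1"], fp = cft["1", "0"],
    fn = cft["0", "1"], tn = cft["0", "0"])
}

# One row per threshold.
by_threshold <- t(sapply(c(0.3, 0.5, 0.7), counts_at))
print(by_threshold)
```

Raising the threshold can only move observations from predicted 1 to predicted 0, so TP and FP never increase while TN and FN never decrease.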

The metrics can then be computed from the cells of the confusion matrix:

tp <- cft[2, 2]  # predicted 1, actual 1
tn <- cft[1, 1]  # predicted 0, actual 0
fp <- cft[2, 1]  # predicted 1, actual 0
fn <- cft[1, 2]  # predicted 0, actual 1
print(accuracy <- (tp + tn)/(tp + tn + fp + fn))

## [1] 0.85

print(sensitivity <- tp/(tp + fn))

## [1] 0.8495

print(specificity <- tn/(tn + fp))

## [1] 0.8505
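Positional indexing such as cft[2, 2] assumes that both classes occur among the predictions. A minimal sketch of a more defensive variant, using made-up vectors in which no score passes the threshold:

```r
# Illustrative vectors (assumed): no score exceeds the 0.5 threshold.
scores <- c(0.10, 0.35, 0.20, 0.45)
labels <- c(0, 1, 0, 1)
pred.class <- as.integer(scores > 0.5)

# Without fixed levels the table has a single row, so cft[2, 2] would fail.
narrow <- table(pred.class, labels)
print(dim(narrow))

# Fixing the levels keeps the table 2 x 2 and allows indexing by name.
cft <- table(factor(pred.class, levels = c(0, 1)),
             factor(labels, levels = c(0, 1)))
tp <- cft["1", "1"]
fn <- cft["0", "1"]
print(sensitivity <- tp / (tp + fn))
```

Indexing by class name also avoids mixing up rows and columns when the level order changes.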

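2 Computing with confusionMatrix()

To avoid the manual arithmetic, use the confusionMatrix function from the caret package. It accepts either a confusion matrix or the two raw columns of predictions and targets, and computes the values above. As long as the positive class is specified, it gives the same results as before.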

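library(caret)
# From the contingency table built earlier
confusionMatrix(cft, positive = "1")

# From the raw vectors; recent caret versions require factors with identical levels
confusionMatrix(factor(pred.class, levels = c(0, 1)),
                factor(ROCR.simple$labels, levels = c(0, 1)),
                positive = "1")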

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 91 14
##          1 16 79
##                                         
##                Accuracy : 0.85          
##                  95% CI : (0.793, 0.896)
##     No Information Rate : 0.535         
##     P-Value [Acc > NIR] : <2e-16        
##                                         
##                   Kappa : 0.699         
##  Mcnemar's Test P-Value : 0.855         
##                                         
##             Sensitivity : 0.849         
##             Specificity : 0.850         
##          Pos Pred Value : 0.832         
##          Neg Pred Value : 0.867         
##              Prevalence : 0.465         
##          Detection Rate : 0.395         
##    Detection Prevalence : 0.475         
##       Balanced Accuracy : 0.850         
##                                         
##        'Positive' Class : 1             
##
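Individual statistics can also be pulled out of the returned object instead of being read off the printout. The sketch below rebuilds the table from the counts shown above, so it does not need ROCR, and it skips the caret call if the package is not installed.

```r
# Rebuild the 2 x 2 table from the counts shown above
# (rows = prediction, columns = reference).
cft <- as.table(matrix(c(91, 16, 14, 79), nrow = 2,
                       dimnames = list(Prediction = c("0", "1"),
                                       Reference  = c("0", "1"))))

if (requireNamespace("caret", quietly = TRUE)) {
  cm <- caret::confusionMatrix(cft, positive = "1")
  # overall and byClass are named numeric vectors on the returned object.
  print(cm$overall["Accuracy"])
  print(cm$byClass[c("Sensitivity", "Specificity")])
}
```

Working with the named vectors makes it straightforward to collect the same statistic across several models or thresholds.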