Implementing Logistic Regression in R

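R makes it easy to fit a logistic regression model. The function to call is glm(), and the fitting process is not much different from the one used for linear regression. In this post, I will fit a binary logistic regression model and explain each step.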

``training.data.raw <- read.csv('train.csv', header = TRUE, na.strings = c(""))``

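``````
# Number of missing values in each column
sapply(training.data.raw, function(x) sum(is.na(x)))

PassengerId    Survived      Pclass        Name         Sex
          0           0           0           0           0
        Age       SibSp       Parch      Ticket        Fare
        177           0           0           0           0
      Cabin    Embarked
        687           2

# Number of distinct values in each column
sapply(training.data.raw, function(x) length(unique(x)))

PassengerId    Survived      Pclass        Name         Sex
        891           2           3         891           2
        Age       SibSp       Parch      Ticket        Fare
         89           7           7         681         248
      Cabin    Embarked
        148           4
``````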

``data <- subset(training.data.raw, select = c(2, 3, 5, 6, 7, 8, 10, 12))``

``````
# Replace missing ages with the mean of the observed ages
data$Age[is.na(data$Age)] <- mean(data$Age, na.rm = TRUE)
``````
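As a small self-contained illustration of the imputation step above, the sketch below applies the same idiom to a made-up vector of ages. The values are hypothetical and are not taken from the Titanic data.

```r
# Hypothetical ages with two missing entries (illustrative values only)
age <- c(22, 38, NA, 35, NA, 54)

# Mean of the observed values; na.rm = TRUE skips the NAs
observed_mean <- mean(age, na.rm = TRUE)

# Overwrite only the missing positions with that mean
age[is.na(age)] <- observed_mean

print(age)
```

Mean imputation is simple, but it shrinks the variance of the imputed column, so it is best treated as a quick first pass.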

``````
# `train` is the training subset of `data`
model <- glm(Survived ~ ., family = binomial(link = 'logit'), data = train)
``````
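The call above relies on a `train` data frame. A minimal sketch of one way to build a train/test split and fit the model is shown here, using simulated data in place of the Titanic file. The column values, the coefficients used to simulate the outcome, and the 800-row cutoff are all assumptions for illustration.

```r
set.seed(1)

# Simulated stand-in for the cleaned data set (not the real Titanic data)
n <- 1000
data <- data.frame(
  Pclass = sample(1:3, n, replace = TRUE),
  Sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  Age    = round(runif(n, 1, 80))
)

# Generate a binary outcome from an assumed logistic relationship
eta <- 3 - 1 * data$Pclass - 2.5 * (data$Sex == "male") - 0.03 * data$Age
data$Survived <- rbinom(n, size = 1, prob = plogis(eta))

# Keep the first 800 rows for fitting and hold out the rest for testing
train <- data[1:800, ]
test  <- data[801:n, ]

# Fit the binary logistic regression on the training rows only
model <- glm(Survived ~ ., family = binomial(link = "logit"), data = train)
print(coef(model))
```

Splitting by row position is only reasonable when the rows are in no meaningful order; otherwise a random sample of row indices is the safer choice.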

Calling `summary(model)` prints the deviance residuals and the estimated coefficients:

``````
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.6064  -0.5954  -0.4254   0.6220   2.4165

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  5.137627   0.594998   8.635  < 2e-16 ***
Pclass      -1.087156   0.151168  -7.192 6.40e-13 ***
Sexmale     -2.756819   0.212026 -13.002  < 2e-16 ***
Age         -0.037267   0.008195  -4.547 5.43e-06 ***
SibSp       -0.292920   0.114642  -2.555   0.0106 *
Parch       -0.116576   0.128127  -0.910   0.3629
Fare         0.001528   0.002353   0.649   0.5160
EmbarkedQ   -0.002656   0.400882  -0.007   0.9947
EmbarkedS   -0.318786   0.252960  -1.260   0.2076
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
``````
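The estimates are on the log-odds scale, so exponentiating them gives odds ratios. As a hedged worked example, the sketch below uses three estimates copied from the table above. With a fitted model object, `exp(coef(model))` would do the same for every term.

```r
# Estimates copied from the summary output above (log-odds scale)
estimates <- c(Pclass = -1.087156, Sexmale = -2.756819, Age = -0.037267)

# exp() converts a log-odds coefficient into an odds ratio
odds_ratios <- exp(estimates)
print(round(odds_ratios, 4))
```

Holding the other variables fixed, being male multiplies the odds of survival by roughly 0.06, and each additional year of age multiplies them by roughly 0.96.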

Running `anova(model, test = "Chisq")` on the fitted model gives the analysis of deviance table:

``````
Analysis of Deviance Table
Model: binomial, link: logit
Response: Survived
Terms added sequentially (first to last)

         Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
NULL                       799    1065.39
Pclass    1   83.607       798     981.79 < 2.2e-16 ***
Sex       1  240.014       797     741.77 < 2.2e-16 ***
Age       1   17.495       796     724.28 2.881e-05 ***
SibSp     1   10.842       795     713.43  0.000992 ***
Parch     1    0.863       794     712.57  0.352873
Fare      1    0.994       793     711.58  0.318717
Embarked  2    2.187       791     709.39  0.334990
``````
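Each Deviance entry is the drop in residual deviance when that term is added, and the p-value comes from a chi-squared distribution with Df degrees of freedom. The sketch below checks this with numbers copied from the table. Because the printed values are rounded, the match is approximate.

```r
# Residual deviances copied from the table above
null_dev   <- 1065.39   # intercept-only model
pclass_dev <- 981.79    # after adding Pclass

# Drop in deviance attributable to Pclass (the table reports 83.607)
drop <- null_dev - pclass_dev
print(drop)

# Upper-tail chi-squared probability with 1 degree of freedom
p_value <- pchisq(drop, df = 1, lower.tail = FALSE)
print(p_value)

# Same check for Parch: deviance 0.863 on 1 df (the table reports 0.352873)
p_parch <- pchisq(0.863, df = 1, lower.tail = FALSE)
print(p_parch)
```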

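``````
# Predicted survival probabilities for the held-out `test` set
fitted.results <- predict(model, newdata = test, type = 'response')
fitted.results <- ifelse(fitted.results > 0.5, 1, 0)
misClasificError <- mean(fitted.results != test$Survived)
``````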
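To make the thresholding step concrete, here is a self-contained sketch with made-up predicted probabilities and true labels. The values are hypothetical, and `probs`, `actual`, `predicted`, and `mis_error` are illustrative names rather than part of the original code.

```r
# Hypothetical predicted probabilities and true outcomes (illustrative only)
probs  <- c(0.91, 0.12, 0.55, 0.48, 0.07, 0.80)
actual <- c(1,    0,    0,    1,    0,    1)

# Classify as 1 when the predicted probability exceeds 0.5
predicted <- ifelse(probs > 0.5, 1, 0)

# Misclassification error is the share of wrong predictions
mis_error <- mean(predicted != actual)
print(paste("Accuracy", 1 - mis_error))
```

The 0.5 cutoff is a choice rather than a requirement: raising or lowering it trades false positives against false negatives.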

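The ROC is a curve generated by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings, while the AUC is the area under the ROC curve. As a rule of thumb, a model with good predictive ability should have an AUC close to 1; an AUC of 0.5 is what random guessing would achieve.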
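As an illustration of the definition above, the sketch below computes the AUC in base R using the rank-based (Mann-Whitney) identity rather than a plotting package. The function name `auc_rank` and the toy data are assumptions for illustration.

```r
# AUC via the Mann-Whitney rank identity: the probability that a randomly
# chosen positive receives a higher score than a randomly chosen negative
auc_rank <- function(scores, labels) {
  r  <- rank(scores)            # average ranks are used for tied scores
  n1 <- sum(labels == 1)        # number of positives
  n0 <- sum(labels == 0)        # number of negatives
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy scores and labels (hypothetical values)
scores <- c(0.1, 0.4, 0.35, 0.8)
labels <- c(0,   0,   1,    1)
auc <- auc_rank(scores, labels)
print(auc)
```

This should be called with the predicted probabilities, not the 0/1 classes, since the AUC summarizes performance across all thresholds.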