Logistic Regression Case Study

  • Problem Statement
  • (1) Imports and Loading the Dataset
  • (2) Building the Sigmoid Function
  • (3) Building the Loss Function
  • (4) Building Gradient Descent
  • (5) Training and Inspecting the Loss
  • (6) Plotting All Data Points
  • (7) Computing the Accuracy


Introduction: In earlier posts of this series, each part of this pipeline was covered separately, including building the logistic regression function, computing the logistic regression loss function, computing gradient descent, and applying regularization to prevent overfitting. The links to those posts are listed below, so you can look things up whenever you run into trouble:

【机器学习】P5 Sigmoid Function, Logistic Regression, and Decision Boundaries:
【机器学习】P6 The Loss Function and Gradient Descent for Logistic Regression:
【机器学习】P7 Implementing Linear and Logistic Regression with Scikit Learn:
【机器学习】P8 Overfitting and Underfitting, Regularization, and the Regularized Loss Function and Gradient Descent:


Main Text

Problem Statement

Build a logistic regression model that predicts whether a student is admitted to university, based on the student's scores on two entrance exams.

(1) Imports and Loading the Dataset

Please copy the dataset in full; missing entries will make it unusable.

import math
import numpy as np
import matplotlib.pyplot as plt

# X_train holds the two exam scores of each student

X_train = np.array([[34.62365962 ,78.02469282], [30.28671077 ,43.89499752], [35.84740877 ,72.90219803], [60.18259939 ,86.3085521 ], [79.03273605 ,75.34437644], [45.08327748 ,56.31637178], [61.10666454 ,96.51142588], [75.02474557 ,46.55401354], [76.0987867  ,87.42056972], [84.43281996 ,43.53339331], [95.86155507 ,38.22527806], [75.01365839 ,30.60326323], [82.30705337 ,76.4819633 ], [69.36458876 ,97.71869196], [39.53833914 ,76.03681085], [53.97105215 ,89.20735014], [69.07014406 ,52.74046973], [67.94685548 ,46.67857411], [70.66150955 ,92.92713789], [76.97878373 ,47.57596365], [67.37202755 ,42.83843832], [89.67677575 ,65.79936593], [50.53478829 ,48.85581153], [34.21206098 ,44.2095286 ], [77.92409145 ,68.97235999], [62.27101367 ,69.95445795], [80.19018075 ,44.82162893], [93.1143888  ,38.80067034], [61.83020602 ,50.25610789], [38.7858038  ,64.99568096], [61.37928945 ,72.80788731], [85.40451939 ,57.05198398], [52.10797973 ,63.12762377], [52.04540477 ,69.43286012], [40.23689374 ,71.16774802], [54.63510555 ,52.21388588], [33.91550011 ,98.86943574], [64.17698887 ,80.90806059], [74.78925296 ,41.57341523], [34.18364003 ,75.23772034], [83.90239366 ,56.30804622], [51.54772027 ,46.85629026], [94.44336777 ,65.56892161], [82.36875376 ,40.61825516], [51.04775177 ,45.82270146], [62.22267576 ,52.06099195], [77.19303493 ,70.4582], [97.77159928 ,86.72782233], [62.0730638  ,96.76882412], [91.5649745  ,88.69629255], [79.94481794 ,74.16311935], [99.27252693 ,60.999031  ], [90.54671411 ,43.39060181], [34.52451385 ,60.39634246], [50.28649612 ,49.80453881], [49.58667722 ,59.80895099], [97.64563396 ,68.86157272], [32.57720017 ,95.59854761], [74.24869137 ,69.82457123], [71.79646206 ,78.45356225], [75.39561147 ,85.75993667], [35.28611282 ,47.02051395], [56.2538175  ,39.26147251], [30.05882245 ,49.59297387], [44.66826172 ,66.45008615], [66.56089447 ,41.09209808], [40.45755098 ,97.53518549], [49.07256322 ,51.88321182], [80.27957401 ,92.11606081], [66.74671857 ,60.99139403], [32.72283304 ,43.30717306], [64.03932042 ,78.03168802], [72.34649423 ,96.22759297], [60.45788574 ,73.0949981 ], [58.84095622 ,75.85844831], [99.8278578  ,72.36925193], [47.26426911 ,88.475865  ], [50.4581598  ,75.80985953], [60.45555629 ,42.50840944], [82.22666158 ,42.71987854], [88.91389642 ,69.8037889 ], [94.83450672 ,45.6943068 ], [67.31925747 ,66.58935318], [57.23870632 ,59.51428198], [80.366756   ,90.9601479 ], [68.46852179 ,85.5943071 ], [42.07545454 ,78.844786  ], [75.47770201 ,90.424539  ], [78.63542435 ,96.64742717], [52.34800399 ,60.76950526], [94.09433113 ,77.15910509], [90.44855097 ,87.50879176], [55.48216114 ,35.57070347], [74.49269242 ,84.84513685], [89.84580671 ,45.35828361], [83.48916274 ,48.3802858 ], [42.26170081 ,87.10385094], [99.31500881 ,68.77540947], [55.34001756 ,64.93193801], [74.775893   ,89.5298129 ]])

# y_train holds the labels, recording whether each student was admitted
y_train = np.array([0. ,0. ,0. ,1. ,1. ,0. ,1. ,1. ,1. ,1. ,0. ,0. ,1. ,1. ,0. ,1. ,1. ,0. ,1. ,1. ,0. ,1. ,0. ,0. ,1. ,1. ,1. ,0. ,0. ,0. ,1. ,1. ,0. ,1. ,0. ,0. ,0. ,1. ,0. ,0. ,1. ,0. ,1. ,0. ,0. ,0. ,1. ,1. ,1. ,1. ,1. ,1. ,1. ,0. ,0. ,0. ,1. ,0. ,1. ,1. ,1. ,0. ,0. ,0. ,0. ,0. ,1. ,0. ,1. ,1. ,0. ,1. ,1. ,1. ,1. ,1. ,1. ,1. ,0. ,0. ,1. ,1. ,1. ,1. ,1. ,1. ,0. ,1. ,1. ,0. ,1. ,1. ,0. ,1. ,1. ,1. ,1. ,1. ,1. ,1.])
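
As a quick sanity check (my own addition, not part of the original walkthrough), you can confirm the dataset was copied intact by inspecting the array shapes:

# verify the dataset was copied in full: 100 students, 2 exam scores each
print('X_train shape:', X_train.shape)   # expected: (100, 2)
print('y_train shape:', y_train.shape)   # expected: (100,)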

(2) Building the Sigmoid Function

The goal of logistic regression is to predict the probability of a binary outcome, such as the probability of the positive versus the negative class, rather than a continuous value. This probability is produced by composing a linear function with a sigmoid function:

The linear function is:
$$z = \vec{w} \cdot \vec{x} + b$$

The sigmoid function is:
$$g(z) = \frac{1}{1 + e^{-z}}$$

def sigmoid(z):
    # squash any real input into the open interval (0, 1)
    g = 1 / (1 + np.exp(-z))
    return g
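
A quick way to confirm the implementation behaves as expected (my own check, not from the original post): sigmoid(0) should be exactly 0.5, and large positive or negative inputs should saturate toward 1 and 0:

# sigmoid applies elementwise to NumPy arrays via broadcasting
print(sigmoid(np.array([-10., 0., 10.])))
# approximately [4.54e-05  0.5  0.9999546]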

(3) Building the Loss Function

Logistic regression is a supervised learning algorithm for binary classification; its loss function is the cross-entropy loss.

The loss for a single training example is:
$$\mathrm{loss}\left(f_{\vec{w},b}(\vec{x}^{(i)}),\, y^{(i)}\right) = -y^{(i)} \log\left(f_{\vec{w},b}(\vec{x}^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - f_{\vec{w},b}(\vec{x}^{(i)})\right)$$

The cost function $J(\vec{w}, b)$ is the average loss over all $m$ examples, where $f_{\vec{w},b}(\vec{x}) = g(\vec{w} \cdot \vec{x} + b)$:
$$J(\vec{w}, b) = \frac{1}{m} \sum_{i=0}^{m-1} \mathrm{loss}\left(f_{\vec{w},b}(\vec{x}^{(i)}),\, y^{(i)}\right)$$

def compute_cost(X, y, w, b):
    # average cross-entropy loss over all m training examples
    m, n = X.shape
    cost = 0.
    for i in range(m):
        f_wb_i = sigmoid(np.dot(w, X[i]) + b)   # model output for example i
        cost += -y[i] * np.log(f_wb_i) - (1 - y[i]) * np.log(1 - f_wb_i)
    total_loss = cost / m
    return total_loss
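
For reference, here is a minimal vectorized sketch of the same computation (my own variant, not part of the original post); it assumes X has shape (m, n) and w has shape (n,), and produces the same value as the loop above:

def compute_cost_vectorized(X, y, w, b):
    # predictions for all m examples at once, shape (m,)
    f = sigmoid(X @ w + b)
    # mean cross-entropy over the dataset
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))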

(4) Building Gradient Descent

The overall gradient descent update, repeated until convergence, is:
$$w_j := w_j - \alpha \frac{\partial J(\vec{w},b)}{\partial w_j}, \qquad b := b - \alpha \frac{\partial J(\vec{w},b)}{\partial b}$$

The key part, the gradient, expands to:
$$\frac{\partial J(\vec{w},b)}{\partial w_j} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right) x_j^{(i)}, \qquad \frac{\partial J(\vec{w},b)}{\partial b} = \frac{1}{m} \sum_{i=0}^{m-1} \left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)$$

Implementation of the key part, compute_gradient:

def compute_gradient(X, y, w, b):
    m, n = X.shape
    dj_dw = np.zeros(w.shape)
    dj_db = 0.

    for i in range(m):
        f_wb = sigmoid(np.dot(w, X[i]) + b)
        err = f_wb - y[i]       # prediction error for example i

        dj_db += err
        dj_dw += err * X[i]

    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_db, dj_dw
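
A standard way to build confidence in an analytic gradient is a finite-difference check. The sketch below is my own addition (it assumes the compute_cost and compute_gradient defined above); it compares the analytic ∂J/∂b against a central-difference estimate, and the two numbers should agree to several decimal places:

def check_db(X, y, w, b, eps=1e-5):
    # central-difference estimate of dJ/db
    numeric = (compute_cost(X, y, w, b + eps) - compute_cost(X, y, w, b - eps)) / (2 * eps)
    analytic, _ = compute_gradient(X, y, w, b)
    print(f'numeric: {numeric:.8f}, analytic: {analytic:.8f}')

# e.g. check_db(X_train, y_train, np.zeros(2), 0.)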

Implementation of the overall loop, gradient_descent:

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):

    m = len(X)

    J_history = []      # cost recorded at each iteration
    w_history = []      # snapshots of w at the printed checkpoints

    for i in range(num_iters):
        dj_db, dj_dw = gradient_function(X, y, w_in, b_in)

        w_in = w_in - alpha * dj_dw     # alpha is the learning rate
        b_in = b_in - alpha * dj_db

        if i < 100000:      # cap the history length to avoid exhausting memory
            cost = cost_function(X, y, w_in, b_in)
            J_history.append(cost)

        # print the cost 10 times over the run, and at the final iteration
        if i % math.ceil(num_iters / 10) == 0 or i == (num_iters - 1):
            w_history.append(w_in)
            print(f"Iteration {i:4}: Cost {float(J_history[-1]):8.2f}   ")

    return w_in, b_in, J_history, w_history

(5) Training and Inspecting the Loss

np.random.seed(1)
initial_w = 0.01 * (np.random.rand(2) - 0.5)    # small random initial weights
initial_b = -8

# set the number of iterations and the learning rate
iterations = 10000
alpha = 0.001

w, b, J_history, _ = gradient_descent(X_train, y_train, initial_w, initial_b,
                                      compute_cost, compute_gradient, alpha, iterations)
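
Beyond the printed values, the loss curve itself is worth a look (a small sketch I've added; it reuses the J_history returned above):

# plot the cost over iterations; it should decrease steadily for a suitable alpha
plt.plot(J_history)
plt.xlabel('Iteration')
plt.ylabel('Cost J')
plt.show()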

The code above models the two exam scores and admission outcomes of 100 students, using gradient descent to drive down the loss. We now display the data points with plt:

(6) Plotting All Data Points

# green crosses mark admitted students; red dots mark rejected ones
admitted = y_train == 1.
plt.scatter(X_train[admitted, 0], X_train[admitted, 1], marker="x", c='g', label='Admitted')
plt.scatter(X_train[~admitted, 0], X_train[~admitted, 1], marker=".", c='r', label='Not admitted')
plt.ylabel('Exam 2 score')
plt.xlabel('Exam 1 score')

plt.legend(loc="upper right")
plt.show()
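
To see what the model actually learned, you can also overlay its decision boundary, the line where $\vec{w} \cdot \vec{x} + b = 0$. This sketch is my own addition; it assumes w and b are the values returned by the training step and that w[1] is nonzero, and the lines should go just before plt.show() above:

# solve w[0]*x1 + w[1]*x2 + b = 0 for x2 to get the boundary line
x1 = np.linspace(30, 100, 100)       # roughly the range of Exam 1 scores in this dataset
x2 = -(b + w[0] * x1) / w[1]
plt.plot(x1, x2, c='b', label='Decision boundary')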

(7) Computing the Accuracy

If $f_{\vec{w},b}(\vec{x}) \geq 0.5$, predict $y = 1$;

If $f_{\vec{w},b}(\vec{x}) < 0.5$, predict $y = 0$.

The predict method makes predictions with the trained model:

# make predictions with the trained model
def predict(X, w, b):

    m, n = X.shape
    p = np.zeros(m)

    for i in range(m):

        f_wb = sigmoid(np.dot(w, X[i]) + b)
        p[i] = f_wb >= 0.5      # predict 1 when the estimated probability is at least 0.5

    return p

Compute the accuracy from the predicted and true labels:

# Accuracy
# quick shape test of predict on small random inputs
np.random.seed(1)
tmp_w = np.random.randn(2)
tmp_b = 0.3
tmp_X = np.random.randn(4, 2) - 0.5

tmp_p = predict(tmp_X, tmp_w, tmp_b)
print(f'Output of predict: shape {tmp_p.shape}, value {tmp_p}')

# training accuracy: percentage of predictions matching the true labels
p = predict(X_train, w, b)
print('Train Accuracy: %f' % (np.mean(p == y_train) * 100))
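
As a cross-check (my own addition, in the spirit of the P7 Scikit Learn post), you can compare against scikit-learn's LogisticRegression. The accuracy may differ slightly because scikit-learn applies L2 regularization by default:

from sklearn.linear_model import LogisticRegression

# fit scikit-learn's logistic regression on the same data and report training accuracy
clf = LogisticRegression()
clf.fit(X_train, y_train)
print('sklearn Train Accuracy: %f' % (clf.score(X_train, y_train) * 100))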
