Python LogisticRegression: array results and logistic regression parameters

Reposted

davisl 2024-03-19 01:33:22

Tags: iteration, Newton's method, loss function. Category: cloud native, cloud computing

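Preface:

This article collects what I learned while implementing parameter estimation for a logistic regression model in Python. It is heavy on practical detail and light on theory; for the underlying theory, please see the Wikipedia pages below.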

batch gradient descent: https://en.wikipedia.org/wiki/Gradient_descent

logistic regression: https://en.wikipedia.org/wiki/Logistic_regression

Newton's method: https://en.wikipedia.org/wiki/Newton%27s_method_in_optimization

1. Introduction to logistic regression

Logistic regression predicts a binary outcome for the input data. Its classification function is the sigmoid function:

$$g(z) = \frac{1}{1 + e^{-z}}$$

The model output is:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
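
The sigmoid function and the model output can be sketched in a few lines of NumPy. This is only an illustration: the name `predict_proba` and the toy data are assumptions, not part of the original article, and the plain formula may emit overflow warnings for very large negative inputs.

```python
import numpy as np


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))


def predict_proba(X, theta):
    """Model output h_theta(x) = g(theta^T x) for every row x of X."""
    return sigmoid(X @ theta)


# Hypothetical toy data: two samples, two attributes each.
X = np.array([[1.0, 2.0], [1.0, -1.0]])
theta = np.zeros(2)

# With theta = 0 every sample gets probability 0.5.
print(predict_proba(X, theta))
```

A sample is then typically assigned to class 1 when its output is at least 0.5 and to class 0 otherwise.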

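The likelihood function that is maximized to find the optimal parameters can be written as:

$$L(\theta) = \prod_{i=1}^{n} h_\theta(x^{(i)})^{y^{(i)}} \left(1 - h_\theta(x^{(i)})\right)^{1 - y^{(i)}}$$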

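The loss function is:

$$J(\theta) = -\frac{1}{n} \log L(\theta) + \frac{\lambda}{2} \sum_{j=1}^{m} \theta_j^2$$

$$J(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2} \sum_{j=1}^{m} \theta_j^2$$

The second term is the regularization term, which penalizes overly large parameters.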

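It can also be written in vector form:

$$J(\theta) = -\frac{1}{n} \left[ y^T \log g(X\theta) + (1 - y)^T \log\left(1 - g(X\theta)\right) \right] + \frac{\lambda}{2} \theta^T \theta$$

where:

$$g(X\theta) = \left( h_\theta(x^{(1)}), \ldots, h_\theta(x^{(n)}) \right)^T, \qquad y = \left( y^{(1)}, \ldots, y^{(n)} \right)^T$$

and g and log are applied elementwise.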

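Notes:

There are n samples and m attributes in total.

x denotes one sample, and $x_j^{(i)}$ denotes the j-th attribute of the i-th sample;

x and θ are both column vectors;

X is the collection of samples x, with each sample x stored as one row of X, so X is an n×m matrix.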
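
With this notation, the vector form of the loss translates almost directly into NumPy. The sketch below assumes the regularizer is (λ/2)·θᵀθ; the function name `loss`, the `eps` clipping and the toy data are illustrative choices rather than anything from the original article.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(theta, X, y, reg_lambda=0.0, eps=1e-12):
    """J(theta) = -(1/n) [y^T log h + (1-y)^T log(1-h)] + (lambda/2) theta^T theta."""
    n = X.shape[0]
    h = sigmoid(X @ theta)
    # Clip h away from 0 and 1 so that log() stays finite (an added safeguard).
    h = np.clip(h, eps, 1.0 - eps)
    data_term = -(y @ np.log(h) + (1.0 - y) @ np.log(1.0 - h)) / n
    return data_term + 0.5 * reg_lambda * (theta @ theta)


# Hypothetical toy data: X is n x m with one sample per row, y holds 0/1 labels.
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0]])
y = np.array([1.0, 0.0, 1.0])

# At theta = 0 every h equals 0.5, so the unregularized loss is log(2).
print(loss(np.zeros(2), X, y), np.log(2.0))
```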

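2. Gradient descent

Gradient descent is a method for finding a local minimum of a function. From the current point it moves a fixed step in the direction of steepest descent (the opposite of the gradient at that point) and repeats this search iteratively. Because the logistic regression loss is convex, the local minimum found this way is also the global minimum, which gives the parameters θ.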

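Ignoring regularization for now, taking the partial derivative of the loss function J(θ) gives:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$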

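The gradient descent iteration is then:

Repeat{

$$\theta_j := \theta_j - \frac{\alpha}{n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad (j = 1, \ldots, m)$$

}

where α is the step size.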

The update can also be written in vector form:

$$\theta := \theta - \frac{\alpha}{n} X^T \left( g(X\theta) - y \right)$$

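With the regularization term added (and the step size absorbed into λ, as the code in section 4 does), it becomes:

$$\theta := (1 - \lambda)\,\theta - \frac{\alpha}{n} X^T \left( g(X\theta) - y \right)$$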

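The procedure is as follows:

1) Initialize θ; the initial value can be random or the zero vector;

2) Update θ so that J(θ) decreases along the direction of gradient descent;

3) Stop when J(θ) has decreased to a sufficiently small value.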
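
The three steps above can be sketched as follows, without regularization. As an assumption for illustration, the loop stops when the change in J(θ) between iterations falls below a tolerance `tol`, which is one common way to decide that J(θ) is "small enough"; the synthetic data and all names here are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(theta, X, y):
    # Unregularized loss; h is clipped so that log() stays finite.
    h = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))


def gradient_descent(X, y, step_size=0.5, tol=1e-8, max_iter=10000):
    n, m = X.shape
    theta = np.zeros(m)                      # step 1: start from the zero vector
    prev = loss(theta, X, y)
    for it in range(max_iter):
        grad = X.T @ (sigmoid(X @ theta) - y) / n
        theta = theta - step_size * grad     # step 2: move against the gradient
        cur = loss(theta, X, y)
        if abs(prev - cur) < tol:            # step 3: stop once J barely changes
            break
        prev = cur
    return theta, it + 1


# Synthetic data drawn from a known parameter vector (illustration only).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
true_theta = np.array([-0.5, 2.0])
y = (rng.random(200) < sigmoid(X @ true_theta)).astype(float)

theta, iters = gradient_descent(X, y)
print(theta, iters)
```

The estimate will not match `true_theta` exactly, since the labels are sampled with noise from a finite data set.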

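3. Newton's method

Newton's method, as an optimization algorithm, is an iterative method commonly used to find a maximum or minimum of an objective function. When the second derivative (the Hessian in several dimensions) is positive definite, each iteration moves in a direction along which the function value decreases. At each iteration, it takes the second-order Taylor expansion of f(x) around the current estimate of the minimizer and uses it to find the next estimate.

Take the one-dimensional case as an example, where the objective function is f(x).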

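Let $x_k$ be the current estimate of the minimizer. The second-order Taylor expansion of f(x) at $x_k$ is:

$$f(x) \approx f(x_k) + f'(x_k)(x - x_k) + \frac{1}{2} f''(x_k)(x - x_k)^2$$

We want the minimizer, so we differentiate the expression above with respect to x and set the result to 0:

$$f'(x_k) + f''(x_k)(x - x_k) = 0$$

That is:

$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}$$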

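This produces a sequence {x_k} that approximates the minimizer of f(x). Under suitable conditions, the sequence converges to a minimizer of f(x).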
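
As a small worked example of the one-dimensional update, the sketch below minimizes f(x) = eˣ − 2x, whose minimizer is x = ln 2. The function `newton_1d` and the choice of f are illustrative assumptions, not part of the original article.

```python
import math


def newton_1d(df, d2f, x0, tol=1e-12, max_iter=50):
    """Iterate x_{k+1} = x_k - f'(x_k) / f''(x_k) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x = x - step
        if abs(step) < tol:
            break
    return x


# f(x) = exp(x) - 2x, so f'(x) = exp(x) - 2 and f''(x) = exp(x) > 0.
x_min = newton_1d(lambda x: math.exp(x) - 2.0, lambda x: math.exp(x), x0=0.0)

# The two printed values should agree closely.
print(x_min, math.log(2.0))
```

Because f'' is positive everywhere here, each step moves toward the unique minimizer; for a general f the iteration may head toward a maximum or a saddle point instead.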

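This is the basic update formula of Newton's method. In logistic regression it is used to update the parameter θ iteratively:

$$\theta := \theta - H^{-1} \nabla_\theta J(\theta)$$

In the multidimensional case,

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_m} \right)^T$$

is the gradient vector of f, and

$$H = \left[ \frac{\partial^2 f}{\partial x_i \, \partial x_j} \right]_{i,j = 1}^{m}$$

is the Hessian matrix of f.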

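Ignoring regularization for now, the partial derivative of the loss function J(θ) is:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$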

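From this, the second derivative follows:

$$\frac{\partial^2 J(\theta)}{\partial \theta_j \, \partial \theta_k} = \frac{1}{n} \sum_{i=1}^{n} h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right) x_j^{(i)} x_k^{(i)}$$

In matrix form, $H = \frac{1}{n} X^T A X$, where A is the n×n diagonal matrix with entries $A_{ii} = h_\theta(x^{(i)}) \left( 1 - h_\theta(x^{(i)}) \right)$.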
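
One way to sanity-check the gradient and Hessian of the unregularized loss is to compare the analytic Hessian, (1/n)·XᵀAX with A = diag(h(1 − h)), against central finite differences of the gradient. The sketch below does that on random data; the function names and the data are assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gradient(theta, X, y):
    """Gradient of the unregularized loss: (1/n) X^T (g(X theta) - y)."""
    n = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / n


def hessian(theta, X):
    """Hessian of the unregularized loss: (1/n) X^T A X, A = diag(h * (1 - h))."""
    n = X.shape[0]
    h = sigmoid(X @ theta)
    a = h * (1.0 - h)
    # (X.T * a) scales column i of X^T by a[i], i.e. it equals X^T A
    # without building the n x n matrix A explicitly.
    return (X.T * a) @ X / n


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
theta = rng.normal(size=3)

# Central finite differences of the gradient, one coordinate direction per column.
eps = 1e-6
H_fd = np.column_stack([
    (gradient(theta + eps * e, X, y) - gradient(theta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(3)
])

# The largest entrywise difference should be tiny.
print(np.max(np.abs(H_fd - hessian(theta, X))))
```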

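Taking the regularization term into account as a shrinkage term λθ added to the Newton step (this is what the code in section 4 does), the complete Newton iteration is:

Repeat{

$$\theta := \theta - H^{-1} \nabla_\theta J(\theta) - \lambda \theta$$

}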

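The procedure is as follows:

1) Initialize θ; the initial value can be random or the zero vector;

2) Iterate

$$\theta := \theta - H^{-1} \nabla_\theta J(\theta) - \lambda \theta$$

to update θ so that J(θ) keeps approaching its minimum;

3) Stop when J(θ) has decreased to a sufficiently small value.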

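4. Python implementation

import math
import numpy as np


def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + exp(-z)), applied elementwise
    return 1.0 / (1.0 + np.exp(-z))


# Batch gradient descent
# reg_lambda is the per-iteration shrinkage applied to w (step size already absorbed)
def batch_gradient_descent(X, y, reg_lambda=math.exp(-8), step_size=0.5, max_iter_count=10000):
    n, m = X.shape
    w = np.zeros(m)
    for _ in range(max_iter_count):
        h = sigmoid(X.dot(w))  # predicted probabilities for all n samples
        # shrink w, then step against the average gradient X^T (h - y) / n
        w = (1 - reg_lambda) * w - step_size / n * X.T.dot(h - y)
    return w


# Newton's method
def newton_method(X, y, reg_lambda=math.exp(-1), max_iter_count=100):
    m = X.shape[1]
    w = np.zeros(m)
    for _ in range(max_iter_count):
        h = sigmoid(X.dot(w))
        # gradient and Hessian both omit the 1/n factor, which cancels in H^{-1} g
        gradient = X.T.dot(h - y)
        # diagonal of A: h * (1 - h), plus a small constant that guards against
        # a singular Hessian when h saturates at 0 or 1
        a = h * (1 - h) + 0.0001
        hessian = (X.T * a).dot(X)  # X^T A X without building the n-by-n matrix A
        # Newton step H^{-1} g plus the shrinkage term reg_lambda * w
        delta_theta = np.linalg.solve(hessian, gradient) + reg_lambda * w
        # Newton's method parameter update
        w = w - delta_theta
    return w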
