Study Companion for Statistical Learning Methods (《统计学习方法》): Appendix B, Newton's Method and Quasi-Newton Methods

B.1 Newton's Method

[Background] Gradients

[Background] A Brief Introduction to "Positive-Definite" and "Positive-Semidefinite" Matrices, by Xinyu Chen on Zhihu

Second-Order Taylor Expansion and the Hessian Matrix

If a univariate function $f(x)$ has a continuous second derivative, its second-order Taylor expansion at the point $x_0$ is (omitting the higher-order infinitesimal):

$$f(x) = f(x_0) + f'(x_0) \Delta x + \frac{1}{2} f''(x_0) (\Delta x)^2$$

where $\Delta x = x - x_0$.
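
As a quick numerical sanity check (a minimal sketch, not from the book), compare $f(x) = e^x$ with its second-order expansion at $x_0 = 0$, namely $1 + \Delta x + \frac{1}{2} (\Delta x)^2$; the truncation error shrinks like $(\Delta x)^3$:

import math

# second-order Taylor expansion of f(x) = e^x at x0 = 0 is 1 + dx + dx^2 / 2
for dx in (0.5, 0.1, 0.01):
    exact = math.exp(dx)
    approx = 1 + dx + 0.5 * dx ** 2
    print(dx, exact - approx)  # the truncation error is roughly dx^3 / 6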

If a bivariate function $f(x^{(1)}, x^{(2)})$ has continuous second-order partial derivatives, its second-order Taylor expansion at the point $(x_0^{(1)}, x_0^{(2)})$ is (omitting the higher-order infinitesimal):

$$f(x^{(1)}, x^{(2)}) = f(x_0^{(1)}, x_0^{(2)}) + \frac{\partial f}{\partial x^{(1)}} \Delta x^{(1)} + \frac{\partial f}{\partial x^{(2)}} \Delta x^{(2)} + \frac{1}{2} \left[ \frac{\partial^2 f}{\partial (x^{(1)})^2} (\Delta x^{(1)})^2 + 2 \frac{\partial^2 f}{\partial x^{(1)} \partial x^{(2)}} \Delta x^{(1)} \Delta x^{(2)} + \frac{\partial^2 f}{\partial (x^{(2)})^2} (\Delta x^{(2)})^2 \right]$$

where $\Delta x^{(1)} = x^{(1)} - x_0^{(1)}$, $\Delta x^{(2)} = x^{(2)} - x_0^{(2)}$, and all partial derivatives are evaluated at $(x_0^{(1)}, x_0^{(2)})$. Writing this expansion in matrix form gives

$$f(X) = f(X_0) + \begin{bmatrix} \frac{\partial f}{\partial x^{(1)}} & \frac{\partial f}{\partial x^{(2)}} \end{bmatrix} \begin{bmatrix} \Delta x^{(1)} \\ \Delta x^{(2)} \end{bmatrix} + \frac{1}{2} \begin{bmatrix} \Delta x^{(1)} & \Delta x^{(2)} \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f}{\partial (x^{(1)})^2} & \frac{\partial^2 f}{\partial x^{(1)} \partial x^{(2)}} \\ \frac{\partial^2 f}{\partial x^{(2)} \partial x^{(1)}} & \frac{\partial^2 f}{\partial (x^{(2)})^2} \end{bmatrix} \begin{bmatrix} \Delta x^{(1)} \\ \Delta x^{(2)} \end{bmatrix}$$

where $\begin{bmatrix} \frac{\partial f}{\partial x^{(1)}} & \frac{\partial f}{\partial x^{(2)}} \end{bmatrix}$ is the transpose of the gradient vector of $f(X)$ at the point $X_0$, and $\begin{bmatrix} \frac{\partial^2 f}{\partial (x^{(1)})^2} & \frac{\partial^2 f}{\partial x^{(1)} \partial x^{(2)}} \\ \frac{\partial^2 f}{\partial x^{(2)} \partial x^{(1)}} & \frac{\partial^2 f}{\partial (x^{(2)})^2} \end{bmatrix}$ is the Hessian matrix of $f(X)$ evaluated at $X_0$, written $H(X_0)$. The expansion can therefore be written as:

$$f(X) = f(X_0) + \nabla f(X_0)^T \Delta X + \frac{1}{2} \Delta X^T H(X_0) \Delta X$$

This result extends to functions of three or more variables; the Hessian matrix of an $n$-variable function is:

$$H(f) = \begin{bmatrix} \frac{\partial^2 f}{\partial (x^{(1)})^2} & \frac{\partial^2 f}{\partial x^{(1)} \partial x^{(2)}} & \cdots & \frac{\partial^2 f}{\partial x^{(1)} \partial x^{(n)}} \\ \frac{\partial^2 f}{\partial x^{(2)} \partial x^{(1)}} & \frac{\partial^2 f}{\partial (x^{(2)})^2} & \cdots & \frac{\partial^2 f}{\partial x^{(2)} \partial x^{(n)}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x^{(n)} \partial x^{(1)}} & \frac{\partial^2 f}{\partial x^{(n)} \partial x^{(2)}} & \cdots & \frac{\partial^2 f}{\partial (x^{(n)})^2} \end{bmatrix}$$

Computing the Hessian Matrix (Python + scipy implementation)

[Theorem] (from Tongji University's Advanced Mathematics (《高等数学》), 7th edition, Vol. 2, p. 70)

If the two second-order mixed partial derivatives $\frac{\partial^2 z}{\partial x \partial y}$ and $\frac{\partial^2 z}{\partial y \partial x}$ of a function $z = f(x, y)$ are both continuous in a region $D$, then they are equal everywhere in that region.
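
As a cross-check of both the theorem and the numerical routine below (a hedged sketch using sympy, which is not part of the repository code), we can compute the Hessian of the function used in the tests symbolically:

import sympy

x1, x2 = sympy.symbols('x1 x2')
f = x1 ** 3 * x2 ** 2 - 3 * x1 * x2 ** 3 - x1 * x2 + 1

H = sympy.hessian(f, (x1, x2))  # symbolic Hessian matrix
print(H[0, 1] == H[1, 0])       # True: the mixed partials coincide
print(H.subs({x1: 1, x2: 1}))   # Matrix([[6, -4], [-4, -16]]), matching the test below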

[Source] code.newton_method.get_hessian

# https://github.com/ChangxingJiang/Data-Mining-HandBook/blob/master/code/newton_method/_get_hessian.py

from scipy.misc import derivative  # note: scipy.misc.derivative was removed in newer SciPy; this code targets older versions

def get_hessian(func, x0, dx=1e-6):
    """计算n元函数在某点的黑塞矩阵

    :param func: [function] n元函数
    :param x0: [list/tuple] 目标点的自变量坐标
    :param dx: [int/float] 计算时x的增量
    :return: [list] 黑塞矩阵
    """
    n_features = len(x0)
    ans = [[0] * n_features for _ in range(n_features)]
    for i in range(n_features):
        def f1(xi, x1):
            x2 = list(x1)
            x2[i] = xi
            return func(x2)

        for j in range(n_features):
            # slope in the x[i] direction when x[j] = xj
            def f2(xj):
                x1 = list(x0)
                x1[j] = xj
                res = derivative(f1, x0=x1[i], dx=dx, args=(x1,))
                return res

            ans[i][j] = derivative(f2, x0[j], dx=dx)

    return ans

[Source] tests

>>> from code.newton_method import get_hessian
>>> get_hessian(lambda x: (x[0] ** 3) * (x[1] ** 2) - 3 * x[0] * (x[1] ** 3) - x[0] * x[1] + 1, [0, 2])
[[0, -37], [-37, 0]]
>>> get_hessian(lambda x: (x[0] ** 3) * (x[1] ** 2) - 3 * x[0] * (x[1] ** 3) - x[0] * x[1] + 1, [1, 1])
[[6, -4], [-4, -16]]

Computing a Matrix Inverse (Python + numpy implementation)

[Source] tests

>>> import numpy as np
>>> mat = np.matrix([[6, -4], [-4, -16]])
>>> print(mat ** -1)
[[ 0.14285714 -0.03571429]
 [-0.03571429 -0.05357143]]
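
As an aside (not from the original repository), `np.matrix` is a legacy class; the same inverse can be computed with `np.linalg.inv` on a plain array, which is the form numpy now recommends:

>>> arr = np.array([[6, -4], [-4, -16]])
>>> np.linalg.inv(arr)
array([[ 0.14285714, -0.03571429],
       [-0.03571429, -0.05357143]])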

Newton's Method (Python + numpy + scipy implementation)

[Source] code.newton_method.newton_method

# https://github.com/ChangxingJiang/Data-Mining-HandBook/blob/master/code/newton_method/_newton_method.py

import numpy as np
from ._get_hessian import get_hessian  # code.newton_method.get_hessian
from ..gradient_descent import partial_derivative  # code.gradient_descent.partial_derivative

def newton_method(func, n_features, epsilon=1e-6, maximum=1000):
    """牛顿法

    :param func: [function] n元目标函数
    :param n_features: [int] 目标函数元数
    :param epsilon: [int/float] 学习精度
    :param maximum: [int] 最大学习次数
    :return: [list] 结果点坐标
    """
    x0 = [0] * n_features  # take the initial point x0

    for _ in range(maximum):
        # compute the gradient g_k
        nabla = partial_derivative(func, x0)
        gk = np.matrix([nabla])

        # stop when the norm of the gradient is below the required precision
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x0

        # compute the Hessian matrix
        hessian = np.matrix(get_hessian(func, x0))

        # compute the Newton step p_k = -H^{-1} g_k
        pk = - (hessian ** -1) * gk.T

        # move to the next iterate
        for j in range(n_features):
            x0[j] += float(pk[j][0])

[Source] tests

>>> from code.newton_method import newton_method
>>> newton_method(lambda x: x[0] ** 2, 1, epsilon=1e-6)
[0]
>>> newton_method(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2, 2, epsilon=1e-6)
[-2.998150057576512, -3.997780069092481]
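
As a hedged cross-check (not part of the repository), the same minimum is found by `scipy.optimize.minimize` with a Newton-type method; the gradient passed as `jac` is computed by hand for this quadratic:

>>> from scipy.optimize import minimize
>>> res = minimize(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2,
...                x0=[0, 0], jac=lambda x: [x[0] + 3, x[1] + 4],
...                method='Newton-CG')
>>> res.x.round(6)
array([-3., -4.])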

B.2 The Idea of Quasi-Newton Methods

[Note] In passing from (B.14) to (B.15), the approximation arises because the higher-order remainder of the second-order Taylor expansion is dropped.

[Question] Why may the quasi-Newton condition (B.16) involve $G_{k+1}$ rather than $G_k$?

Suppose that after $k+1$ iterations we have obtained $x^{(k+1)}$. Expanding the objective function $f(x)$ in a second-order Taylor series at $x^{(k+1)}$ gives (analogous to (B.2)):

$$f(x) = f(x^{(k+1)}) + g_{k+1}^T (x - x^{(k+1)}) + \frac{1}{2} (x - x^{(k+1)})^T H(x^{(k+1)}) (x - x^{(k+1)})$$

By the definition of the second derivative, taking the gradient of both sides then gives (analogous to (B.6)):

$$\nabla f(x) = g_{k+1} + H_{k+1} (x - x^{(k+1)})$$

Setting $x = x^{(k)}$ in this expression ($x^{(k)}$ also lies in a neighborhood of $x^{(k+1)}$) yields (analogous to (B.11)):

$$g_k - g_{k+1} = H_{k+1} (x^{(k)} - x^{(k+1)})$$

y k = g k + 1 − g k y_k=g_{k+1} - g_k yk=gk+1gk δ k = x ( k + 1 ) − x ( k ) \delta_k = x^{(k+1)}-x^{(k)} δk=x(k+1)x(k),则(类似于 B.13)

$$H_{k+1}^{-1} y_k = \delta_k$$

Because the Taylor expansion is taken at $x^{(k+1)}$, the Hessian that appears in the condition is $H_{k+1}$; it is therefore the new approximation $G_{k+1}$ of $H_{k+1}^{-1}$ that must satisfy the quasi-Newton condition, not $G_k$.
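
For a quadratic objective the relation is exact rather than approximate, which is easy to verify numerically (an illustrative sketch, not from the book): if $f(x) = \frac{1}{2} x^T A x$, then $\nabla f(x) = Ax$, so $y_k = A \delta_k$ for any pair of iterates.

import numpy as np

# f(x) = 0.5 * x^T A x  =>  grad f(x) = A x, hence y_k = A delta_k exactly
A = np.array([[2.0, 0.5], [0.5, 1.0]])  # symmetric positive-definite Hessian
xk = np.array([1.0, -2.0])
xk1 = np.array([0.25, 0.5])

delta = xk1 - xk
y = A @ xk1 - A @ xk                             # y_k = g_{k+1} - g_k
print(np.allclose(np.linalg.inv(A) @ y, delta))  # True: H^{-1} y_k = delta_k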

B.3 The DFP Algorithm

[Note] For the symmetric positive-definite initial matrix $G_0$, one may simply take the identity matrix $I$.

[Question] Why do the DFP and BFGS algorithms need a one-dimensional line search?

When the objective function is quadratic, its second-order Taylor expansion is identical to the objective itself, so from any initial point a single iteration reaches the minimum; in this sense Newton's method has quadratic convergence. For non-quadratic objectives, if the function is strongly quadratic in character, or the iterate has already entered a neighborhood of the minimum, convergence is also fast.

However, since the iteration formula contains no step-size factor but takes a fixed-length step, the function value may sometimes increase for non-quadratic objectives: a steady decrease is not guaranteed, and the iteration may even fail to converge. The "damped Newton method" was therefore proposed: each iteration still moves along the direction $p_k$, but performs a one-dimensional line search along it for the optimal step size $\lambda_k$.
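
The effect is easy to reproduce (a minimal sketch, not from the book): for $f(x) = \sqrt{1 + x^2}$ the pure Newton step is $x - f'(x)/f''(x) = -x^3$, which diverges from $x = 2$, while damping the step with a simple halving line search restores convergence.

import math

def fp(x):   # f'(x) for f(x) = sqrt(1 + x^2)
    return x / math.sqrt(1 + x ** 2)

def fpp(x):  # f''(x) = (1 + x^2)^(-3/2)
    return (1 + x ** 2) ** -1.5

# pure Newton from x = 2: the fixed step overshoots (2 -> -8 -> 512 -> ...)
x = 2.0
for _ in range(4):
    x = x - fp(x) / fpp(x)
print(x)  # the iterates grow without bound

# damped Newton: keep the direction, halve the step until f decreases
x = 2.0
for _ in range(20):
    p = -fp(x) / fpp(x)
    lam = 1.0
    while math.sqrt(1 + (x + lam * p) ** 2) > math.sqrt(1 + x ** 2):
        lam /= 2
    x += lam * p
print(round(x, 6))  # 0.0, the minimum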

The DFP Algorithm (Python implementation)

[Source] code.newton_method.dfp_algorithm

# https://github.com/ChangxingJiang/Data-Mining-HandBook/blob/master/code/newton_method/_dfp_algorithm.py

import numpy as np
from ..gradient_descent import golden_section_for_line_search  # code.gradient_descent.golden_section_for_line_search
from ..gradient_descent import partial_derivative  # code.gradient_descent.partial_derivative

def dfp_algorithm(func, n_features, epsilon=1e-6, distance=3, maximum=1000):
    """DFP算法

    :param func: [function] n元目标函数
    :param n_features: [int] 目标函数元数
    :param epsilon: [int/float] 学习精度
    :param distance: [int/float] 每次一维搜索的长度范围(distance倍梯度的模)
    :param maximum: [int] 最大学习次数
    :return: [list] 结果点坐标
    """
    G0 = np.identity(n_features)  # construct the initial matrix G0 (identity matrix)
    x0 = [0] * n_features  # take the initial point x0

    for _ in range(maximum):
        # compute the gradient g_k
        nabla = partial_derivative(func, x0)
        g0 = np.matrix([nabla]).T  # g0 = g_k

        # stop when the norm of the gradient is below the required precision
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x0

        # compute p_k
        pk = - G0 * g0

        # one-dimensional line search for lambda_k
        def f(x):
            """One-dimensional function along the direction p_k"""
            x2 = [x0[j] + x * float(pk[j][0]) for j in range(n_features)]
            return func(x2)

        lk = golden_section_for_line_search(f, 0, distance, epsilon=1e-6)  # lk = lambda_k

        # update the current point
        x1 = [x0[j] + lk * float(pk[j][0]) for j in range(n_features)]

        # compute g_{k+1}; stop if its norm is below the required precision
        nabla = partial_derivative(func, x1)
        g1 = np.matrix([nabla]).T  # g1 = g_{k+1}

        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x1

        # compute G_{k+1} with the DFP update
        yk = g1 - g0
        dk = np.matrix([[lk * float(pk[j][0]) for j in range(n_features)]]).T

        G1 = G0 + (dk * dk.T) / (dk.T * yk) - (G0 * yk * yk.T * G0) / (yk.T * G0 * yk)

        G0 = G1
        x0 = x1

[Source] tests

>>> from code.newton_method import dfp_algorithm
>>> dfp_algorithm(lambda x: x[0] ** 2, 1, epsilon=1e-6)
[0]
>>> dfp_algorithm(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2, 2, epsilon=1e-6)
[-3.0000000003324554, -3.9999999998511546]
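
The update formula can be sanity-checked directly (an illustrative sketch, not repository code): whatever $y_k$ and $\delta_k$ are, the updated matrix must satisfy the quasi-Newton condition $G_{k+1} y_k = \delta_k$.

import numpy as np

rng = np.random.default_rng(42)
G0 = np.eye(3)                   # current approximation of the inverse Hessian
yk = rng.standard_normal((3, 1))
dk = rng.standard_normal((3, 1))

# DFP update: G1 = G0 + dk dk^T / (dk^T yk) - G0 yk yk^T G0 / (yk^T G0 yk)
G1 = G0 + (dk @ dk.T) / (dk.T @ yk) - (G0 @ yk @ yk.T @ G0) / (yk.T @ G0 @ yk)
print(np.allclose(G1 @ yk, dk))  # True: the quasi-Newton condition holds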

B.4 The BFGS Algorithm

[Note] For the symmetric positive-definite initial matrix $B_0$, one may simply take the identity matrix $I$.

The BFGS Algorithm (Python implementation)

[Source] code.newton_method.bfgs_algorithm

# https://github.com/ChangxingJiang/Data-Mining-HandBook/blob/master/code/newton_method/_bfgs_algorithm.py

import numpy as np
from ..gradient_descent import golden_section_for_line_search  # code.gradient_descent.golden_section_for_line_search
from ..gradient_descent import partial_derivative  # code.gradient_descent.partial_derivative

def bfgs_algorithm(func, n_features, epsilon=1e-6, distance=3, maximum=1000):
    """BFGS算法

    :param func: [function] n元目标函数
    :param n_features: [int] 目标函数元数
    :param epsilon: [int/float] 学习精度
    :param distance: [int/float] 每次一维搜索的长度范围(distance倍梯度的模)
    :param maximum: [int] 最大学习次数
    :return: [list] 结果点坐标
    """
    B0 = np.identity(n_features)  # construct the initial matrix B0 (identity matrix)
    x0 = [0] * n_features  # take the initial point x0

    for k in range(maximum):
        # compute the gradient g_k
        nabla = partial_derivative(func, x0)
        g0 = np.matrix([nabla]).T  # g0 = g_k

        # stop when the norm of the gradient is below the required precision
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x0

        # compute p_k from B_k p_k = -g_k
        if k == 0:
            pk = - B0 * g0  # B0 is the identity matrix, so no inverse is needed
        else:
            pk = - (B0 ** -1) * g0

        # one-dimensional line search for lambda_k
        def f(x):
            """One-dimensional function along the direction p_k"""
            x2 = [x0[j] + x * float(pk[j][0]) for j in range(n_features)]
            return func(x2)

        lk = golden_section_for_line_search(f, 0, distance, epsilon=1e-6)  # lk = lambda_k

        # update the current point
        x1 = [x0[j] + lk * float(pk[j][0]) for j in range(n_features)]

        # compute g_{k+1}; stop if its norm is below the required precision
        nabla = partial_derivative(func, x1)
        g1 = np.matrix([nabla]).T  # g1 = g_{k+1}

        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x1

        # compute B_{k+1} with the BFGS update
        yk = g1 - g0
        dk = np.matrix([[lk * float(pk[j][0]) for j in range(n_features)]]).T

        B1 = B0 + (yk * yk.T) / (yk.T * dk) - (B0 * dk * dk.T * B0) / (dk.T * B0 * dk)

        B0 = B1
        x0 = x1

[Source] tests

>>> from code.newton_method import bfgs_algorithm
>>> bfgs_algorithm(lambda x: x[0] ** 2, 1, epsilon=1e-6)
[0]
>>> bfgs_algorithm(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2, 2, epsilon=1e-6)
[-3.0000000003324554, -3.9999999998511546]
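
For reference (an optional cross-check, not in the repository), scipy's built-in BFGS implementation reaches the same minimum:

>>> import numpy as np
>>> from scipy.optimize import minimize
>>> res = minimize(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2,
...                x0=[0, 0], jac=lambda x: [x[0] + 3, x[1] + 4],
...                method='BFGS')
>>> bool(np.allclose(res.x, [-3.0, -4.0], atol=1e-4))
True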

The BFGS Algorithm with the Sherman-Morrison Formula (Python implementation)

[Source] code.newton_method.bfgs_algorithm_with_sherman_morrison

# https://github.com/ChangxingJiang/Data-Mining-HandBook/blob/master/code/newton_method/_bfgs_algorithm_with_sherman_morrison.py

import numpy as np
from ..gradient_descent import golden_section_for_line_search  # code.gradient_descent.golden_section_for_line_search
from ..gradient_descent import partial_derivative  # code.gradient_descent.partial_derivative

def bfgs_algorithm_with_sherman_morrison(func, n_features, epsilon=1e-6, distance=3, maximum=1000):
    """BFGS算法(Sherman-Morrison公式)

    :param func: [function] n元目标函数
    :param n_features: [int] 目标函数元数
    :param epsilon: [int/float] 学习精度
    :param distance: [int/float] 每次一维搜索的长度范围(distance倍梯度的模)
    :param maximum: [int] 最大学习次数
    :return: [list] 结果点坐标
    """
    I = np.identity(n_features)
    D0 = np.identity(n_features)  # construct the initial matrix D0 (identity matrix)
    x0 = [0] * n_features  # take the initial point x0

    for _ in range(maximum):
        # compute the gradient g_k
        nabla = partial_derivative(func, x0)
        g0 = np.matrix([nabla]).T  # g0 = g_k

        # stop when the norm of the gradient is below the required precision
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x0

        # compute p_k
        pk = - D0 * g0

        # one-dimensional line search for lambda_k
        def f(x):
            """One-dimensional function along the direction p_k"""
            x2 = [x0[j] + x * float(pk[j][0]) for j in range(n_features)]
            return func(x2)

        lk = golden_section_for_line_search(f, 0, distance, epsilon=1e-6)  # lk = lambda_k

        # update the current point
        x1 = [x0[j] + lk * float(pk[j][0]) for j in range(n_features)]

        # compute g_{k+1}; stop if its norm is below the required precision
        nabla = partial_derivative(func, x1)
        g1 = np.matrix([nabla]).T  # g1 = g_{k+1}

        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x1

        # compute D_{k+1} via the Sherman-Morrison form of the BFGS update
        yk = g1 - g0
        dk = np.matrix([[lk * float(pk[j][0]) for j in range(n_features)]]).T

        D1 = (I - (dk * yk.T) / (yk.T * dk)) * D0 * (I - (yk * dk.T) / (yk.T * dk)) + (dk * dk.T) / (yk.T * dk)

        D0 = D1
        x0 = x1

[Source] tests

>>> from code.newton_method import bfgs_algorithm_with_sherman_morrison
>>> bfgs_algorithm_with_sherman_morrison(lambda x: x[0] ** 2, 1, epsilon=1e-6)
[0]
>>> bfgs_algorithm_with_sherman_morrison(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2, 2, epsilon=1e-6)
[-3.0000000000105342, -4.000000000014043]
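
The Sherman-Morrison formula itself states that for an invertible matrix $A$ and vectors $u, v$ with $1 + v^T A^{-1} u \neq 0$,

$$(A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}$$

A quick numerical check (illustrative only, not repository code):

import numpy as np

A = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 0.5], [0.0, 0.5, 1.0]])
u = np.array([[1.0], [0.0], [2.0]])
v = np.array([[0.5], [1.0], [0.0]])

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + u @ v.T)                             # direct inverse
rhs = Ainv - (Ainv @ u @ v.T @ Ainv) / (1 + v.T @ Ainv @ u)  # Sherman-Morrison
print(np.allclose(lhs, rhs))  # True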

B.5 The Broyden Class of Algorithms
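
The Broyden class is the convex combination $G_{k+1} = \alpha G^{\mathrm{DFP}} + (1 - \alpha) G^{\mathrm{BFGS}}$ with $0 \le \alpha \le 1$. Since both the DFP and BFGS updates satisfy the quasi-Newton condition $G_{k+1} y_k = \delta_k$, every member of the class does too:

$$\left( \alpha G^{\mathrm{DFP}} + (1 - \alpha) G^{\mathrm{BFGS}} \right) y_k = \alpha \delta_k + (1 - \alpha) \delta_k = \delta_k$$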

[Source] code.newton_method.broyden_algorithm

# https://github.com/ChangxingJiang/Data-Mining-HandBook/blob/master/code/newton_method/_broyden_algorithm.py

import numpy as np
from ..gradient_descent import golden_section_for_line_search  # code.gradient_descent.golden_section_for_line_search
from ..gradient_descent import partial_derivative  # code.gradient_descent.partial_derivative

def broyden_algorithm(func, n_features, alpha=0.5, epsilon=1e-6, distance=3, maximum=1000):
    """Broyden算法

    :param func: [function] n元目标函数
    :param n_features: [int] 目标函数元数
    :param alpha: [float] Broyden算法的alpha参数
    :param epsilon: [int/float] 学习精度
    :param distance: [int/float] 每次一维搜索的长度范围(distance倍梯度的模)
    :param maximum: [int] 最大学习次数
    :return: [list] 结果点坐标
    """
    G0 = np.identity(n_features)  # construct the initial matrix G0 (identity matrix)
    I = np.identity(n_features)
    x0 = [0] * n_features  # take the initial point x0

    for _ in range(maximum):
        # compute the gradient g_k
        nabla = partial_derivative(func, x0)
        g0 = np.matrix([nabla]).T  # g0 = g_k

        # stop when the norm of the gradient is below the required precision
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x0

        # compute p_k
        pk = - G0 * g0

        # one-dimensional line search for lambda_k
        def f(x):
            """One-dimensional function along the direction p_k"""
            x2 = [x0[j] + x * float(pk[j][0]) for j in range(n_features)]
            return func(x2)

        lk = golden_section_for_line_search(f, 0, distance, epsilon=1e-6)  # lk = lambda_k

        # update the current point
        x1 = [x0[j] + lk * float(pk[j][0]) for j in range(n_features)]

        # compute g_{k+1}; stop if its norm is below the required precision
        nabla = partial_derivative(func, x1)
        g1 = np.matrix([nabla]).T  # g1 = g_{k+1}

        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x1

        # compute G_{k+1} as a convex combination of the DFP and BFGS updates
        yk = g1 - g0
        dk = np.matrix([[lk * float(pk[j][0]) for j in range(n_features)]]).T

        G_DFP = G0 + (dk * dk.T) / (dk.T * yk) - (G0 * yk * yk.T * G0) / (yk.T * G0 * yk)
        G_BFGS = (I - (dk * yk.T) / (yk.T * dk)) * G0 * (I - (yk * dk.T) / (yk.T * dk)) + (dk * dk.T) / (yk.T * dk)

        G0 = alpha * G_DFP + (1 - alpha) * G_BFGS
        x0 = x1

[Source] tests

>>> from code.newton_method import broyden_algorithm
>>> broyden_algorithm(lambda x: x[0] ** 2, 1, epsilon=1e-6)
[0]
>>> broyden_algorithm(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2, 2, epsilon=1e-6)
[-3.0000000003324554, -3.9999999998511546]