B.1 Newton's Method
[Background] Gradient
[Background] A Brief Look at "Positive Definite Matrices" and "Positive Semi-Definite Matrices" - an article by Xinyu Chen - Zhihu
Second-Order Taylor Expansion and the Hessian Matrix
If a function of one variable $f(x)$ has a continuous second derivative, then the second-order Taylor expansion of $f(x)$ at a point $x_0$ is as follows (omitting the higher-order infinitesimal):

$$f(x) = f(x_0) + f'(x_0) \Delta x + \frac{1}{2} f''(x_0) (\Delta x)^2$$

where $\Delta x = x - x_0$.
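As a quick numerical illustration (a sketch written for this note; the function and the evaluation point are arbitrary choices), the expansion of $f(x) = e^x$ at $x_0 = 0$ reproduces the function to within $O((\Delta x)^3)$:

```python
import math

def taylor2(f, df, d2f, x0, x):
    """Second-order Taylor approximation of f at x0, evaluated at x."""
    dx = x - x0
    return f(x0) + df(x0) * dx + 0.5 * d2f(x0) * dx ** 2

# for f(x) = e^x all derivatives equal e^x, so the expansion at 0 is 1 + x + x^2/2
print(taylor2(math.exp, math.exp, math.exp, 0.0, 0.1))  # 1.105
print(math.exp(0.1))                                    # 1.1051709180756477
```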
If a function of two variables $f(x^{(1)}, x^{(2)})$ has continuous second-order partial derivatives, then the second-order Taylor expansion of $f(x^{(1)}, x^{(2)})$ at a point $(x_0^{(1)}, x_0^{(2)})$ is as follows (omitting the higher-order infinitesimal):

$$f(x^{(1)}, x^{(2)}) = f(x_0^{(1)}, x_0^{(2)}) + \frac{\partial f}{\partial x^{(1)}} \Delta x^{(1)} + \frac{\partial f}{\partial x^{(2)}} \Delta x^{(2)} + \frac{1}{2} \left[ \frac{\partial^2 f}{\partial (x^{(1)})^2} (\Delta x^{(1)})^2 + 2 \frac{\partial^2 f}{\partial x^{(1)} \partial x^{(2)}} \Delta x^{(1)} \Delta x^{(2)} + \frac{\partial^2 f}{\partial (x^{(2)})^2} (\Delta x^{(2)})^2 \right]$$

where $\Delta x^{(1)} = x^{(1)} - x_0^{(1)}$, $\Delta x^{(2)} = x^{(2)} - x_0^{(2)}$, and all derivatives are evaluated at $(x_0^{(1)}, x_0^{(2)})$. Writing the expansion in matrix form, we have

$$f(X) = f(X_0) + \begin{bmatrix} \frac{\partial f}{\partial x^{(1)}} & \frac{\partial f}{\partial x^{(2)}} \end{bmatrix} \begin{bmatrix} \Delta x^{(1)} \\ \Delta x^{(2)} \end{bmatrix} + \frac{1}{2} \begin{bmatrix} \Delta x^{(1)} & \Delta x^{(2)} \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f}{\partial (x^{(1)})^2} & \frac{\partial^2 f}{\partial x^{(1)} \partial x^{(2)}} \\ \frac{\partial^2 f}{\partial x^{(2)} \partial x^{(1)}} & \frac{\partial^2 f}{\partial (x^{(2)})^2} \end{bmatrix} \begin{bmatrix} \Delta x^{(1)} \\ \Delta x^{(2)} \end{bmatrix}$$

where $\begin{bmatrix} \frac{\partial f}{\partial x^{(1)}} & \frac{\partial f}{\partial x^{(2)}} \end{bmatrix}$ is the transpose of the gradient vector of $f(X)$ at the point $X_0$, and $\begin{bmatrix} \frac{\partial^2 f}{\partial (x^{(1)})^2} & \frac{\partial^2 f}{\partial x^{(1)} \partial x^{(2)}} \\ \frac{\partial^2 f}{\partial x^{(2)} \partial x^{(1)}} & \frac{\partial^2 f}{\partial (x^{(2)})^2} \end{bmatrix}$ is the value of the Hessian matrix of $f(x)$ at the point $X_0$, written $H(X_0)$. The expansion can therefore be written as:
$$f(X) = f(X_0) + \nabla f(X_0)^T \Delta X + \frac{1}{2} \Delta X^T H(X_0) \Delta X$$
These results generalize to functions of three or more variables; the Hessian matrix of an $n$-variable function is:
$$H(f) = \begin{bmatrix} \frac{\partial^2 f}{\partial (x^{(1)})^2} & \frac{\partial^2 f}{\partial x^{(1)} \partial x^{(2)}} & \cdots & \frac{\partial^2 f}{\partial x^{(1)} \partial x^{(n)}} \\ \frac{\partial^2 f}{\partial x^{(2)} \partial x^{(1)}} & \frac{\partial^2 f}{\partial (x^{(2)})^2} & \cdots & \frac{\partial^2 f}{\partial x^{(2)} \partial x^{(n)}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x^{(n)} \partial x^{(1)}} & \frac{\partial^2 f}{\partial x^{(n)} \partial x^{(2)}} & \cdots & \frac{\partial^2 f}{\partial (x^{(n)})^2} \end{bmatrix}$$
Computing the Hessian Matrix (Python + scipy implementation)
[Theorem] (from Tongji University, *Advanced Mathematics*, 7th edition, vol. 2, p. 70)
If the two second-order mixed partial derivatives $\frac{\partial^2 z}{\partial x \partial y}$ and $\frac{\partial^2 z}{\partial y \partial x}$ of a function $z = f(x, y)$ are continuous in a region $D$, then they are equal everywhere in that region.
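The following minimal check (using sympy, which is not otherwise used in this repository) confirms the theorem for the test function that appears below:

```python
import sympy

x, y = sympy.symbols('x y')
f = x ** 3 * y ** 2 - 3 * x * y ** 3 - x * y + 1

# the two mixed partials coincide, so the Hessian of f is symmetric
assert sympy.simplify(sympy.diff(f, x, y) - sympy.diff(f, y, x)) == 0
print(sympy.diff(f, x, y))  # 6*x**2*y - 9*y**2 - 1
```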
[Source] code.newton_method.get_hessian
```python
# https://github.com/ChangxingJiang/Data-Mining-HandBook/blob/master/code/newton_method/_get_hessian.py

from scipy.misc import derivative


def get_hessian(func, x0, dx=1e-6):
    """Compute the Hessian matrix of an n-variable function at a given point

    :param func: [function] function of n variables
    :param x0: [list/tuple] coordinates of the target point
    :param dx: [int/float] increment of x used for differentiation
    :return: [list] Hessian matrix
    """
    n_features = len(x0)
    ans = [[0] * n_features for _ in range(n_features)]
    for i in range(n_features):
        def f1(xi, x1):
            x2 = list(x1)
            x2[i] = xi
            return func(x2)

        for j in range(n_features):
            # slope in the x[i] direction when x[j] = xj
            def f2(xj):
                x1 = list(x0)
                x1[j] = xj
                res = derivative(f1, x0=x1[i], dx=dx, args=(x1,))
                return res

            ans[i][j] = derivative(f2, x0[j], dx=dx)
    return ans
```
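Note that `scipy.misc.derivative` has been deprecated and removed in recent SciPy releases. On a newer SciPy, a minimal drop-in substitute (a central-difference sketch written for this note, covering only the arguments used above) is:

```python
def derivative(func, x0, dx=1e-6, args=()):
    """Central-difference approximation of the first derivative of func at x0,
    mirroring the (func, x0, dx, args) subset of scipy.misc.derivative."""
    return (func(x0 + dx, *args) - func(x0 - dx, *args)) / (2 * dx)
```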
[Test]
```python
>>> from code.newton_method import get_hessian
>>> get_hessian(lambda x: (x[0] ** 3) * (x[1] ** 2) - 3 * x[0] * (x[1] ** 3) - x[0] * x[1] + 1, [0, 2])
[[0, -37], [-37, 0]]
>>> get_hessian(lambda x: (x[0] ** 3) * (x[1] ** 2) - 3 * x[0] * (x[1] ** 3) - x[0] * x[1] + 1, [1, 1])
[[6, -4], [-4, -16]]
```
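These values agree with the analytic Hessian of $f(x, y) = x^3 y^2 - 3xy^3 - xy + 1$,

$$H(f) = \begin{bmatrix} 6xy^2 & 6x^2 y - 9y^2 - 1 \\ 6x^2 y - 9y^2 - 1 & 2x^3 - 18xy \end{bmatrix}$$

which evaluates to $\begin{bmatrix} 0 & -37 \\ -37 & 0 \end{bmatrix}$ at $(0, 2)$ and to $\begin{bmatrix} 6 & -4 \\ -4 & -16 \end{bmatrix}$ at $(1, 1)$.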
Computing the Inverse Matrix (Python + numpy implementation)
[Test]
```python
>>> import numpy as np
>>> mat = np.matrix([[6, -4], [-4, -16]])
>>> mat ** -1
matrix([[ 0.14285714, -0.03571429],
        [-0.03571429, -0.05357143]])
```
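`np.matrix` is discouraged in current NumPy in favor of plain arrays; the same inverse with `numpy.linalg.inv` (the equivalent array-based call) is:

```python
>>> np.linalg.inv(np.array([[6, -4], [-4, -16]]))
array([[ 0.14285714, -0.03571429],
       [-0.03571429, -0.05357143]])
```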
Newton's Method (Python + numpy + scipy implementation)
[Source] code.newton_method.newton_method
```python
# https://github.com/ChangxingJiang/Data-Mining-HandBook/blob/master/code/newton_method/_newton_method.py

import numpy as np
from ._get_hessian import get_hessian  # code.newton_method.get_hessian
from ..gradient_descent import partial_derivative  # code.gradient_descent.partial_derivative


def newton_method(func, n_features, epsilon=1e-6, maximum=1000):
    """Newton's method

    :param func: [function] objective function of n variables
    :param n_features: [int] number of variables of the objective function
    :param epsilon: [int/float] precision tolerance
    :param maximum: [int] maximum number of iterations
    :return: [list] coordinates of the resulting point
    """
    x0 = [0] * n_features  # choose the initial point x0
    for _ in range(maximum):
        # compute the gradient gk
        nabla = partial_derivative(func, x0)
        gk = np.matrix([nabla])

        # stop iterating when the norm of the gradient is below the tolerance
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x0

        # compute the Hessian matrix
        hessian = np.matrix(get_hessian(func, x0))

        # compute the step pk
        pk = - (hessian ** -1) * gk.T

        # iterate
        for j in range(n_features):
            x0[j] += float(pk[j][0])
```
[Test]
```python
>>> from code.newton_method import newton_method
>>> newton_method(lambda x: x[0] ** 2, 1, epsilon=1e-6)
[0]
>>> newton_method(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2, 2, epsilon=1e-6)
[-2.998150057576512, -3.997780069092481]
```
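[Note] The second test converges essentially in one step: for the quadratic $f(x) = \frac{1}{2}\left[(x^{(1)}+3)^2 + (x^{(2)}+4)^2\right]$ the Hessian is the identity matrix, so the first Newton update from the origin is

$$x_1 = x_0 - H^{-1} \nabla f(x_0) = \begin{bmatrix} 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} -3 \\ -4 \end{bmatrix}$$

and the small residual in the printed result comes only from the numerical differentiation used for the gradient and Hessian.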
B.2 The Idea of the Quasi-Newton Method
[Note] In the derivation from B.14 to B.15, the approximation arises because the second-order term of the Taylor expansion is dropped.
[Question] Why can the quasi-Newton condition B.16 use $G_{k+1}$ rather than $G_k$?
Suppose that after $k+1$ iterations we have obtained $x^{(k+1)}$. Expanding the objective function $f(x)$ to second order at $x^{(k+1)}$ gives (analogous to B.2):

$$f(x) = f(x^{(k+1)}) + g_{k+1}^T (x - x^{(k+1)}) + \frac{1}{2} (x - x^{(k+1)})^T H(x^{(k+1)}) (x - x^{(k+1)})$$

Differentiating with respect to $x$, we have (analogous to B.6)

$$\nabla f(x) = g_{k+1} + H_{k+1} (x - x^{(k+1)})$$

Setting $x = x^{(k)}$ in this equation ($x^{(k)}$ also lies in a neighborhood of $x^{(k+1)}$) yields (analogous to B.11)

$$g_k - g_{k+1} = H_{k+1} (x^{(k)} - x^{(k+1)})$$

Writing $y_k = g_{k+1} - g_k$ and $\delta_k = x^{(k+1)} - x^{(k)}$, we obtain (analogous to B.13)

$$H_{k+1}^{-1} y_k = \delta_k$$
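For a quadratic objective the Hessian is constant, so this condition holds exactly rather than approximately. A minimal numerical check (written for this note) of $H^{-1} y_k = \delta_k$ for $f(x) = \frac{1}{2} x^T A x$:

```python
import numpy as np

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # constant Hessian of f(x) = 0.5 x^T A x
grad = lambda x: A @ x                  # gradient of the quadratic

xk = np.array([1.0, 2.0])               # x^{(k)}, chosen arbitrarily
xk1 = np.array([0.5, -1.0])             # x^{(k+1)}, chosen arbitrarily
yk = grad(xk1) - grad(xk)               # y_k = g_{k+1} - g_k
dk = xk1 - xk                           # delta_k = x^{(k+1)} - x^{(k)}

print(np.allclose(np.linalg.solve(A, yk), dk))  # True: H^{-1} y_k = delta_k
```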
B.3 The DFP Algorithm
[Note] For the positive definite, symmetric initial matrix $G_0$, we may simply take the identity matrix $I$.
[Question] Why do algorithms such as DFP and BFGS need a one-dimensional search?
When the objective function is quadratic, the second-order Taylor expansion agrees with the objective function exactly, so from any initial point a single iteration reaches the minimum; Newton's method is therefore an algorithm with quadratic convergence. For non-quadratic functions, if the function behaves nearly quadratically, or if the iterate has already entered a neighborhood of the minimum, convergence is still very fast.

However, the iteration formula contains no step-size factor: each iteration takes a fixed-length step. For a non-quadratic objective this can sometimes increase the function value, so a steady decrease is not guaranteed, and the iteration may even fail to converge. This motivates the "damped Newton method": each iteration still moves along the direction $p_k$, but performs a one-dimensional search along that direction for the best step-size factor $\lambda_k$, as the sketch below shows.
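A minimal sketch of the damped Newton idea (written for this note with analytic derivatives; a simple backtracking search stands in for the golden-section search used by the repository code):

```python
import numpy as np

def damped_newton(func, grad, hess, x0, epsilon=1e-6, maximum=1000):
    """Newton direction p_k combined with a backtracking search for lambda_k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maximum):
        g = grad(x)
        if np.linalg.norm(g) < epsilon:
            return x
        pk = -np.linalg.solve(hess(x), g)  # Newton direction
        lam = 1.0
        # halve the step until the function value actually decreases
        while func(x + lam * pk) >= func(x) and lam > 1e-10:
            lam *= 0.5
        x = x + lam * pk
    return x

# a non-quadratic objective: a fixed unit step is not guaranteed to descend,
# but the line search keeps every accepted step a descent step
func = lambda x: x[0] ** 4 + x[1] ** 4
grad = lambda x: np.array([4 * x[0] ** 3, 4 * x[1] ** 3])
hess = lambda x: np.diag([12 * x[0] ** 2, 12 * x[1] ** 2])

print(damped_newton(func, grad, hess, [1.0, -2.0]))  # close to [0, 0]
```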
The DFP Algorithm (Python implementation)
[Source] code.newton_method.dfp_algorithm
```python
# https://github.com/ChangxingJiang/Data-Mining-HandBook/blob/master/code/newton_method/_dfp_algorithm.py

import numpy as np
from ..gradient_descent import golden_section_for_line_search  # code.gradient_descent.golden_section_for_line_search
from ..gradient_descent import partial_derivative  # code.gradient_descent.partial_derivative


def dfp_algorithm(func, n_features, epsilon=1e-6, distance=3, maximum=1000):
    """DFP algorithm

    :param func: [function] objective function of n variables
    :param n_features: [int] number of variables of the objective function
    :param epsilon: [int/float] precision tolerance
    :param distance: [int/float] range of each one-dimensional search (distance times the norm of the gradient)
    :param maximum: [int] maximum number of iterations
    :return: [list] coordinates of the resulting point
    """
    G0 = np.identity(n_features)  # construct the initial matrix G0 (identity matrix)
    x0 = [0] * n_features  # choose the initial point x0
    for _ in range(maximum):
        # compute the gradient gk
        nabla = partial_derivative(func, x0)
        g0 = np.matrix([nabla]).T  # g0 = g_k

        # stop iterating when the norm of the gradient is below the tolerance
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x0

        # compute pk
        pk = - G0 * g0

        # one-dimensional search for lambda_k
        def f(x):
            """one-dimensional function along the direction pk"""
            x2 = [x0[j] + x * float(pk[j][0]) for j in range(n_features)]
            return func(x2)

        lk = golden_section_for_line_search(f, 0, distance, epsilon=1e-6)  # lk = lambda_k

        # update the current point
        x1 = [x0[j] + lk * float(pk[j][0]) for j in range(n_features)]

        # compute g_{k+1}; stop iterating if its norm is below the tolerance
        nabla = partial_derivative(func, x1)
        g1 = np.matrix([nabla]).T  # g1 = g_{k+1}

        # stop iterating when the norm of the gradient is below the tolerance
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x1

        # compute G_{k+1} (the DFP update)
        yk = g1 - g0
        dk = np.matrix([[lk * float(pk[j][0]) for j in range(n_features)]]).T
        G1 = G0 + (dk * dk.T) / (dk.T * yk) - (G0 * yk * yk.T * G0) / (yk.T * G0 * yk)

        G0 = G1
        x0 = x1
```
[Test]
```python
>>> from code.newton_method import dfp_algorithm
>>> dfp_algorithm(lambda x: x[0] ** 2, 1, epsilon=1e-6)
[0]
>>> dfp_algorithm(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2, 2, epsilon=1e-6)
[-3.0000000003324554, -3.9999999998511546]
```
B.4 The BFGS Algorithm
[Note] For the positive definite, symmetric initial matrix $B_0$, we may simply take the identity matrix $I$.
The BFGS Algorithm (Python implementation)
[Source] code.newton_method.bfgs_algorithm
```python
# https://github.com/ChangxingJiang/Data-Mining-HandBook/blob/master/code/newton_method/_bfgs_algorithm.py

import numpy as np
from ..gradient_descent import golden_section_for_line_search  # code.gradient_descent.golden_section_for_line_search
from ..gradient_descent import partial_derivative  # code.gradient_descent.partial_derivative


def bfgs_algorithm(func, n_features, epsilon=1e-6, distance=3, maximum=1000):
    """BFGS algorithm

    :param func: [function] objective function of n variables
    :param n_features: [int] number of variables of the objective function
    :param epsilon: [int/float] precision tolerance
    :param distance: [int/float] range of each one-dimensional search (distance times the norm of the gradient)
    :param maximum: [int] maximum number of iterations
    :return: [list] coordinates of the resulting point
    """
    B0 = np.identity(n_features)  # construct the initial matrix B0 (identity matrix)
    x0 = [0] * n_features  # choose the initial point x0
    for k in range(maximum):
        # compute the gradient gk
        nabla = partial_derivative(func, x0)
        g0 = np.matrix([nabla]).T  # g0 = g_k

        # stop iterating when the norm of the gradient is below the tolerance
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x0

        # compute pk
        if k == 0:
            pk = - B0 * g0  # on the first step B0 is the identity matrix, so B0^{-1} = B0
        else:
            pk = - (B0 ** -1) * g0

        # one-dimensional search for lambda_k
        def f(x):
            """one-dimensional function along the direction pk"""
            x2 = [x0[j] + x * float(pk[j][0]) for j in range(n_features)]
            return func(x2)

        lk = golden_section_for_line_search(f, 0, distance, epsilon=1e-6)  # lk = lambda_k

        # update the current point
        x1 = [x0[j] + lk * float(pk[j][0]) for j in range(n_features)]

        # compute g_{k+1}; stop iterating if its norm is below the tolerance
        nabla = partial_derivative(func, x1)
        g1 = np.matrix([nabla]).T  # g1 = g_{k+1}

        # stop iterating when the norm of the gradient is below the tolerance
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x1

        # compute B_{k+1} (the BFGS update)
        yk = g1 - g0
        dk = np.matrix([[lk * float(pk[j][0]) for j in range(n_features)]]).T
        B1 = B0 + (yk * yk.T) / (yk.T * dk) - (B0 * dk * dk.T * B0) / (dk.T * B0 * dk)

        B0 = B1
        x0 = x1
```
[Test]
```python
>>> from code.newton_method import bfgs_algorithm
>>> bfgs_algorithm(lambda x: x[0] ** 2, 1, epsilon=1e-6)
[0]
>>> bfgs_algorithm(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2, 2, epsilon=1e-6)
[-3.0000000003324554, -3.9999999998511546]
```
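As a cross-check, scipy's reference BFGS implementation reaches the same minimizer (a usage sketch; its stopping rule differs, so the trailing digits need not match):

```python
>>> from scipy.optimize import minimize
>>> res = minimize(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2, x0=[0, 0], method='BFGS')
>>> res.x  # approximately [-3, -4]
```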
The BFGS Algorithm (Sherman-Morrison Formula) (Python implementation)
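The name refers to the Sherman-Morrison formula: for an invertible matrix $A$ and vectors $u, v$ with $1 + v^T A^{-1} u \neq 0$,

$$(A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}$$

Applying it twice to the rank-two BFGS update of $B_k$ turns it into a direct update of the inverse $D_k = B_k^{-1}$, which is what the code below implements.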
[Source] code.newton_method.bfgs_algorithm_with_sherman_morrison
```python
# https://github.com/ChangxingJiang/Data-Mining-HandBook/blob/master/code/newton_method/_bfgs_algorithm_with_sherman_morrison.py

import numpy as np
from ..gradient_descent import golden_section_for_line_search  # code.gradient_descent.golden_section_for_line_search
from ..gradient_descent import partial_derivative  # code.gradient_descent.partial_derivative


def bfgs_algorithm_with_sherman_morrison(func, n_features, epsilon=1e-6, distance=3, maximum=1000):
    """BFGS algorithm (Sherman-Morrison formula)

    :param func: [function] objective function of n variables
    :param n_features: [int] number of variables of the objective function
    :param epsilon: [int/float] precision tolerance
    :param distance: [int/float] range of each one-dimensional search (distance times the norm of the gradient)
    :param maximum: [int] maximum number of iterations
    :return: [list] coordinates of the resulting point
    """
    I = np.identity(n_features)
    D0 = np.identity(n_features)  # construct the initial matrix D0 (identity matrix)
    x0 = [0] * n_features  # choose the initial point x0
    for _ in range(maximum):
        # compute the gradient gk
        nabla = partial_derivative(func, x0)
        g0 = np.matrix([nabla]).T  # g0 = g_k

        # stop iterating when the norm of the gradient is below the tolerance
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x0

        # compute pk
        pk = - D0 * g0

        # one-dimensional search for lambda_k
        def f(x):
            """one-dimensional function along the direction pk"""
            x2 = [x0[j] + x * float(pk[j][0]) for j in range(n_features)]
            return func(x2)

        lk = golden_section_for_line_search(f, 0, distance, epsilon=1e-6)  # lk = lambda_k

        # update the current point
        x1 = [x0[j] + lk * float(pk[j][0]) for j in range(n_features)]

        # compute g_{k+1}; stop iterating if its norm is below the tolerance
        nabla = partial_derivative(func, x1)
        g1 = np.matrix([nabla]).T  # g1 = g_{k+1}

        # stop iterating when the norm of the gradient is below the tolerance
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x1

        # compute D_{k+1} (the inverse-form BFGS update)
        yk = g1 - g0
        dk = np.matrix([[lk * float(pk[j][0]) for j in range(n_features)]]).T
        D1 = (I - (dk * yk.T) / (yk.T * dk)) * D0 * (I - (yk * dk.T) / (yk.T * dk)) + (dk * dk.T) / (yk.T * dk)

        D0 = D1
        x0 = x1
```
[Test]
```python
>>> from code.newton_method import bfgs_algorithm_with_sherman_morrison
>>> bfgs_algorithm_with_sherman_morrison(lambda x: x[0] ** 2, 1, epsilon=1e-6)
[0]
>>> bfgs_algorithm_with_sherman_morrison(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2, 2, epsilon=1e-6)
[-3.0000000000105342, -4.000000000014043]
```
B.5 The Broyden Class of Algorithms
[Source] code.newton_method.broyden_algorithm
```python
# https://github.com/ChangxingJiang/Data-Mining-HandBook/blob/master/code/newton_method/_broyden_algorithm.py

import numpy as np
from ..gradient_descent import golden_section_for_line_search  # code.gradient_descent.golden_section_for_line_search
from ..gradient_descent import partial_derivative  # code.gradient_descent.partial_derivative


def broyden_algorithm(func, n_features, alpha=0.5, epsilon=1e-6, distance=3, maximum=1000):
    """Broyden class algorithm

    :param func: [function] objective function of n variables
    :param n_features: [int] number of variables of the objective function
    :param alpha: [float] the alpha parameter of the Broyden class
    :param epsilon: [int/float] precision tolerance
    :param distance: [int/float] range of each one-dimensional search (distance times the norm of the gradient)
    :param maximum: [int] maximum number of iterations
    :return: [list] coordinates of the resulting point
    """
    G0 = np.identity(n_features)  # construct the initial matrix G0 (identity matrix)
    I = np.identity(n_features)
    x0 = [0] * n_features  # choose the initial point x0
    for _ in range(maximum):
        # compute the gradient gk
        nabla = partial_derivative(func, x0)
        g0 = np.matrix([nabla]).T  # g0 = g_k

        # stop iterating when the norm of the gradient is below the tolerance
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x0

        # compute pk
        pk = - G0 * g0

        # one-dimensional search for lambda_k
        def f(x):
            """one-dimensional function along the direction pk"""
            x2 = [x0[j] + x * float(pk[j][0]) for j in range(n_features)]
            return func(x2)

        lk = golden_section_for_line_search(f, 0, distance, epsilon=1e-6)  # lk = lambda_k

        # update the current point
        x1 = [x0[j] + lk * float(pk[j][0]) for j in range(n_features)]

        # compute g_{k+1}; stop iterating if its norm is below the tolerance
        nabla = partial_derivative(func, x1)
        g1 = np.matrix([nabla]).T  # g1 = g_{k+1}

        # stop iterating when the norm of the gradient is below the tolerance
        if pow(sum([nabla[i] ** 2 for i in range(n_features)]), 0.5) < epsilon:
            return x1

        # compute G_{k+1} as a convex combination of the DFP and BFGS updates
        yk = g1 - g0
        dk = np.matrix([[lk * float(pk[j][0]) for j in range(n_features)]]).T
        G_DFP = G0 + (dk * dk.T) / (dk.T * yk) - (G0 * yk * yk.T * G0) / (yk.T * G0 * yk)
        G_BFGS = (I - (dk * yk.T) / (yk.T * dk)) * G0 * (I - (yk * dk.T) / (yk.T * dk)) + (dk * dk.T) / (yk.T * dk)
        G0 = alpha * G_DFP + (1 - alpha) * G_BFGS

        x0 = x1
```
[Test]
```python
>>> from code.newton_method import broyden_algorithm
>>> broyden_algorithm(lambda x: x[0] ** 2, 1, epsilon=1e-6)
[0]
>>> broyden_algorithm(lambda x: ((x[0] + 3) ** 2 + (x[1] + 4) ** 2) / 2, 2, epsilon=1e-6)
[-3.0000000003324554, -3.9999999998511546]
```
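[Note] Both the DFP and BFGS updates satisfy the quasi-Newton condition $G_{k+1} y_k = \delta_k$, and the condition is linear in $G_{k+1}$, so every convex combination

$$G_{k+1} = \alpha\, G^{\mathrm{DFP}} + (1 - \alpha)\, G^{\mathrm{BFGS}}, \qquad 0 \le \alpha \le 1$$

also satisfies it; this one-parameter family of updates is exactly the Broyden class implemented above.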