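Before deriving AdaBoost, we need two ingredients: the additive model and the exponential loss function.
Additive model
$$f(x) = \sum_{m=1}^{M}\alpha_m G_m(x)$$
The additive model is a weighted sum of base classifiers. Each round trains one classifier $G_m(x)$ and, based on that classifier's error, assigns it a weight $\alpha_m$.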
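Exponential loss function
$$L(y, f(x)) = \exp[-y f(x)]$$
For a binary classifier with labels $y \in \{-1, +1\}$, the exponent $-y f(x)$ is negative when the sample is classified correctly (so the loss is below 1) and positive when it is misclassified (so the loss is above 1), which is the behavior we expect from a loss function.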
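To make the sign argument concrete, here is a minimal sketch of the exponential loss evaluated on one correctly classified and one misclassified sample. The function name `exp_loss` and the score values are illustrative assumptions, not part of the original text.

```python
import math

def exp_loss(y, fx):
    """Exponential loss for a label y in {-1, +1} and a real-valued score fx."""
    return math.exp(-y * fx)

# Label and score share a sign: the exponent is negative, so the loss is below 1.
correct = exp_loss(+1, 0.8)
# Label and score have opposite signs: the exponent is positive, so the loss is above 1.
wrong = exp_loss(+1, -0.8)

print(correct, wrong)
```

The further the score lies on the wrong side of zero, the faster the loss grows, which is what makes the later sample weights concentrate on hard examples.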
From the definition of the additive model:
$$f(x) = \sum_{m=1}^{M}\alpha_m G_m(x)$$
we obtain the iterative formula:
$$f_m(x) = f_{m-1}(x) + \alpha_m G_m(x)$$
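The equivalence between the summed form and the iterative form can be checked with a small sketch. The `stump` helper, the thresholds, and the weights below are hypothetical values chosen only for illustration, and $f_0(x) = 0$ is assumed as the starting point.

```python
def stump(threshold, sign=1):
    """Hypothetical base classifier: returns sign if x > threshold, else -sign."""
    return lambda x: sign if x > threshold else -sign

# Illustrative classifiers G_1..G_3 and weights alpha_1..alpha_3 (assumed values).
classifiers = [stump(0.5), stump(1.5, -1), stump(2.5)]
alphas = [0.4, 0.7, 0.2]

def f_direct(x):
    """f(x) = sum_m alpha_m * G_m(x), computed in one pass."""
    return sum(a * g(x) for a, g in zip(alphas, classifiers))

def f_iterative(x):
    """Same value built round by round: f_m = f_{m-1} + alpha_m * G_m."""
    f = 0.0  # f_0(x) = 0
    for a, g in zip(alphas, classifiers):
        f = f + a * g(x)
    return f

print(f_direct(1.0), f_iterative(1.0))
```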
At this point $f_{m-1}(x)$ is treated as known; each round solves for the classifier $G_m(x)$ and its weight $\alpha_m$. Substituting the iterative formula into the loss function and summing over the $N$ training samples gives:
$$L(y, f(x)) = \sum_{i=1}^{N}\exp[-y_i(f_{m-1}(x_i) + \alpha_m G_m(x_i))]$$
Splitting the exponential of a sum into a product of exponentials gives:
$$L(y, f(x)) = \sum_{i=1}^{N}\exp[-y_i f_{m-1}(x_i)]\,\exp[-y_i\alpha_m G_m(x_i)]$$
Since $y_i$ and $f_{m-1}(x_i)$ are known, we can define $\overline w_{mi} = \exp[-y_i f_{m-1}(x_i)]$, so that
$$L(y, f(x)) = \sum_{i=1}^{N}\overline w_{mi}\exp(-y_i\alpha_m G_m(x_i))$$
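As a quick numerical check of this factorization, the sketch below computes the loss both ways on a handful of made-up values. The labels, previous scores `f_prev`, predictions `g`, and `alpha` are all assumptions for illustration.

```python
import math

y = [1, -1, 1, -1]                # labels y_i
f_prev = [0.3, -0.2, -0.5, 0.1]   # assumed values of f_{m-1}(x_i)
g = [1, -1, 1, 1]                 # assumed predictions G_m(x_i)
alpha = 0.6                       # assumed alpha_m

# Sample weights: w_bar_i = exp(-y_i * f_{m-1}(x_i)).
w_bar = [math.exp(-yi * fi) for yi, fi in zip(y, f_prev)]

# Loss written with the full score f_{m-1} + alpha * G_m.
loss_direct = sum(
    math.exp(-yi * (fi + alpha * gi)) for yi, fi, gi in zip(y, f_prev, g)
)
# Loss written as a weighted sum over exp(-y_i * alpha * G_m(x_i)).
loss_weighted = sum(
    w * math.exp(-yi * alpha * gi) for w, yi, gi in zip(w_bar, y, g)
)

print(loss_direct, loss_weighted)
```

Samples that the previous rounds got wrong have $y_i f_{m-1}(x_i) < 0$ and therefore a weight above 1.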
The classifier $G_m(x)$ and its weight $\alpha_m$ can therefore be written as:
$$(\alpha_m, G_m(x)) = \arg\min_{\alpha, G}\sum_{i=1}^{N}\overline w_{mi}\exp(-y_i\alpha G(x_i))$$
First solve for $G_m(x)$. In the expression above, the weight $\alpha_m$ can be treated as a fixed positive number, so $G_m(x)$ is the classifier with the smallest weighted misclassification error:
$$G_m^*(x) = \arg\min_{G}\sum_{i=1}^{N}\overline w_{mi} I(y_i \neq G(x_i))$$
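One common way to carry out this minimization is an exhaustive search over decision stumps. The sketch below assumes one-dimensional inputs and stumps of the form "predict `sign` if `x > threshold`"; both the stump family and the toy data are illustrative assumptions rather than something the derivation prescribes.

```python
def weighted_error(pred, y, w_bar):
    """Sum of the weights of the misclassified samples."""
    return sum(w for p, yi, w in zip(pred, y, w_bar) if p != yi)

def best_stump(xs, y, w_bar):
    """Try every threshold and both polarities; return (error, threshold, sign)."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            pred = [sign if x > t else -sign for x in xs]
            err = weighted_error(pred, y, w_bar)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

# Toy data (assumed): separable at x > 1.
xs = [0.0, 1.0, 2.0, 3.0]
y = [-1, -1, 1, 1]
w_bar = [1.0, 1.0, 1.0, 1.0]

err, t, sign = best_stump(xs, y, w_bar)
print(err, t, sign)
```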
Note: it is important to understand why the two expressions above are equivalent. Once $G_m^*(x)$ is found, solve for $\alpha_m^*$. Returning to the loss function and writing $\alpha$ for $\alpha_m$, note that $y_i G_m(x_i)$ equals $+1$ for a correctly classified sample and $-1$ for a misclassified one, so:
$$\begin{aligned} L(y, f(x)) &= \sum_{i=1}^{N}\overline w_{mi}\exp(-y_i\alpha G_m(x_i)) \\ &= \sum_{y_i = G_m(x_i)}\overline w_{mi} e^{-\alpha} + \sum_{y_i \neq G_m(x_i)}\overline w_{mi} e^{\alpha} \\ &= \sum_{y_i = G_m(x_i)}\overline w_{mi} e^{-\alpha} + \sum_{y_i \neq G_m(x_i)}\overline w_{mi} e^{-\alpha} - \sum_{y_i \neq G_m(x_i)}\overline w_{mi} e^{-\alpha} + \sum_{y_i \neq G_m(x_i)}\overline w_{mi} e^{\alpha} \\ &= e^{-\alpha}\sum_{i=1}^{N}\overline w_{mi} + (e^{\alpha} - e^{-\alpha})\sum_{y_i \neq G_m(x_i)}\overline w_{mi} \\ &= e^{-\alpha}\sum_{i=1}^{N}\overline w_{mi} + (e^{\alpha} - e^{-\alpha})\sum_{i=1}^{N}\overline w_{mi} I(y_i \neq G_m(x_i)) \end{aligned}$$
For any fixed $\alpha > 0$, the first term does not depend on the classifier and the coefficient $e^{\alpha} - e^{-\alpha}$ is positive, which is why minimizing the loss over $G$ is the same as minimizing the weighted misclassification error.
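The last line of this decomposition can be verified numerically. The labels, predictions, weights, and `alpha` in the sketch below are made-up values used only to compare the two sides.

```python
import math

y = [1, 1, -1, -1, 1]             # labels y_i
g = [1, -1, -1, 1, 1]             # assumed predictions G_m(x_i)
w_bar = [0.5, 1.2, 0.8, 0.3, 1.0] # assumed weights w_bar_mi
alpha = 0.7                       # assumed alpha

# Left-hand side: the weighted exponential loss.
lhs = sum(w * math.exp(-yi * alpha * gi) for w, yi, gi in zip(w_bar, y, g))

# Right-hand side: e^{-a} * (sum of all weights) + (e^{a} - e^{-a}) * (misclassified weight).
miss = sum(w for w, yi, gi in zip(w_bar, y, g) if yi != gi)
rhs = math.exp(-alpha) * sum(w_bar) + (math.exp(alpha) - math.exp(-alpha)) * miss

print(lhs, rhs)
```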
Differentiating the expression above with respect to $\alpha$ and setting the derivative to $0$ gives:
$$-e^{-\alpha}\sum_{i=1}^{N}\overline w_{mi} + (e^{\alpha} + e^{-\alpha})\sum_{i=1}^{N}\overline w_{mi} I(y_i \neq G_m(x_i)) = 0$$
Dividing both sides by $\sum_{i=1}^{N}\overline w_{mi}$ gives:
$$-e^{-\alpha} + (e^{\alpha} + e^{-\alpha})\cfrac{\sum_{i=1}^{N}\overline w_{mi} I(y_i \neq G_m(x_i))}{\sum_{i=1}^{N}\overline w_{mi}} = 0$$
Let $\cfrac{\sum_{i=1}^{N}\overline w_{mi} I(y_i \neq G_m(x_i))}{\sum_{i=1}^{N}\overline w_{mi}} = e_m$. Then:
$$\begin{aligned} -e^{-\alpha} + (e^{\alpha} + e^{-\alpha})e_m &= 0 \\ (e^{\alpha} + e^{-\alpha})e_m &= e^{-\alpha} \\ (e^{2\alpha} + 1)e_m &= 1 \\ e^{2\alpha} + 1 &= \cfrac{1}{e_m} \\ e^{2\alpha} &= \cfrac{1}{e_m} - 1 = \cfrac{1 - e_m}{e_m} \\ 2\alpha &= \log\cfrac{1 - e_m}{e_m} \\ \alpha &= \cfrac{1}{2}\log\cfrac{1 - e_m}{e_m} \end{aligned}$$
This gives the weight that minimizes the loss function:
$$\alpha_m^* = \cfrac{1}{2}\log\cfrac{1 - e_m}{e_m}$$
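A short sketch can confirm that this closed form really is the minimizer. It evaluates the loss divided by the total weight, $e^{-\alpha} + (e^{\alpha} - e^{-\alpha})e_m$, at $\alpha_m^*$ and at nearby points. The function names and the value `e_m = 0.3` are assumptions for illustration.

```python
import math

def alpha_star(e_m):
    """Closed-form minimizer 0.5 * log((1 - e_m) / e_m); requires 0 < e_m < 1."""
    return 0.5 * math.log((1.0 - e_m) / e_m)

def normalized_loss(alpha, e_m):
    """Loss divided by the sum of weights: e^{-a} + (e^{a} - e^{-a}) * e_m."""
    return math.exp(-alpha) + (math.exp(alpha) - math.exp(-alpha)) * e_m

e_m = 0.3  # assumed weighted error rate
a = alpha_star(e_m)

# Compare the loss at the closed-form alpha with the loss slightly to either side.
at_min = normalized_loss(a, e_m)
left = normalized_loss(a - 0.01, e_m)
right = normalized_loss(a + 0.01, e_m)

print(a, at_min, left, right)
```

Note that $\alpha_m^*$ is positive exactly when $e_m < 1/2$, that is, when the classifier does better than chance on the weighted data, and it is zero at $e_m = 1/2$.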
Here
$$e_m = \cfrac{\sum_{i=1}^{N}\overline w_{mi} I(y_i \neq G_m(x_i))}{\sum_{i=1}^{N}\overline w_{mi}} = \sum_{i=1}^{N} w_{mi} I(y_i \neq G_m(x_i))$$
where $w_{mi} = \overline w_{mi} / \sum_{j=1}^{N}\overline w_{mj}$ is the normalized weight, so $e_m$ is the weighted error rate of $G_m$.
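The two forms of $e_m$ can be compared directly. In the sketch below the labels, predictions, and unnormalized weights are made-up values; `w` holds the weights after dividing by their sum.

```python
y = [1, 1, -1, -1]           # labels y_i
g = [1, -1, -1, 1]           # assumed predictions G_m(x_i)
w_bar = [0.5, 1.5, 1.0, 1.0] # assumed unnormalized weights

total = sum(w_bar)

# Ratio form: misclassified weight divided by total weight.
e_m_ratio = sum(wi for wi, yi, gi in zip(w_bar, y, g) if yi != gi) / total

# Normalized form: first rescale the weights to sum to 1, then add up the misclassified ones.
w = [wi / total for wi in w_bar]
e_m_normalized = sum(wi for wi, yi, gi in zip(w, y, g) if yi != gi)

print(e_m_ratio, e_m_normalized)
```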
Weight update formula
How are the sample weights updated? We already know that
$$f_m(x) = f_{m-1}(x) + \alpha_m G_m(x)$$
and we defined earlier:
$$\overline w_{mi} = \exp[-y_i f_{m-1}(x_i)]$$
It follows that:
$$\begin{aligned} \overline w_{m+1,i} &= \exp[-y_i f_m(x_i)] \\ &= \exp[-y_i[f_{m-1}(x_i) + \alpha_m G_m(x_i)]] \\ &= \exp[-y_i f_{m-1}(x_i)]\exp[-y_i\alpha_m G_m(x_i)] \\ &= \overline w_{m,i}\exp[-y_i\alpha_m G_m(x_i)] \end{aligned}$$
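Putting the three results together (weighted-error minimization for $G_m$, the closed form for $\alpha_m$, and the multiplicative weight update), a minimal training loop might look like the sketch below. It is only an illustration under stated assumptions: one-dimensional inputs, threshold stumps as base classifiers, unnormalized weights starting at 1 (since $f_0 = 0$), and a toy dataset. The names `fit_stump`, `adaboost`, and `predict` are hypothetical.

```python
import math

def stump_predict(x, threshold, sign):
    """Assumed base classifier: sign if x > threshold, else -sign."""
    return sign if x > threshold else -sign

def fit_stump(xs, ys, w_bar):
    """Return (weighted error, threshold, sign) of the best stump."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, yi, w in zip(xs, ys, w_bar)
                      if stump_predict(x, t, sign) != yi)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, rounds):
    w_bar = [1.0] * len(xs)  # f_0 = 0, so each weight starts at exp(0) = 1
    model = []
    for _ in range(rounds):
        err, t, sign = fit_stump(xs, ys, w_bar)
        e_m = err / sum(w_bar)
        if e_m <= 0.0 or e_m >= 0.5:
            break  # alpha would be infinite or non-positive; stop early
        alpha = 0.5 * math.log((1.0 - e_m) / e_m)
        model.append((alpha, t, sign))
        # Weight update: w_bar_{m+1,i} = w_bar_{m,i} * exp(-y_i * alpha_m * G_m(x_i)).
        w_bar = [w * math.exp(-yi * alpha * stump_predict(x, t, sign))
                 for x, yi, w in zip(xs, ys, w_bar)]
    return model, w_bar

def score(model, x):
    """f(x) = sum_m alpha_m * G_m(x)."""
    return sum(a * stump_predict(x, t, s) for a, t, s in model)

def predict(model, x):
    return 1 if score(model, x) > 0 else -1

# Toy dataset (assumed): no single stump separates it.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

model, w_bar = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

Because the update multiplies a misclassified sample's weight by $e^{\alpha_m}$ and a correctly classified one by $e^{-\alpha_m}$, each new stump is pushed toward the samples the current ensemble still gets wrong.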