Proximal gradient (PG), Nesterov-accelerated proximal gradient (APG), and ADMM are classic optimization algorithms. This post first explains and derives each algorithm, then implements them in code.
Contents
- 1、The Lasso problem
- 2、The PG algorithm
- 3、The APG algorithm
- 4、The ADMM algorithm
- 5、Code
- 6、Results
- 6.1 Implementation details
- 6.2 Optimization results
- 6.3 Summary and discussion
1、The Lasso problem
The Lasso problem is

$$\min_{x}\; \frac{1}{2}\|Ax - b\|_2^2 + \lambda\|x\|_1.$$

Let $h(x) = \frac{1}{2}\|Ax - b\|_2^2$ and $g(x) = \lambda\|x\|_1$. Then $h$ is smooth, while $g$ is nonsmooth at $x = 0$, and both are convex functions. The proximal operator of $g$ with step size $t > 0$ is easy to compute in closed form, elementwise:

$$\operatorname{prox}_{tg}(x)_i = \begin{cases} x_i - \lambda t, & x_i > \lambda t,\\ 0, & |x_i| \le \lambda t,\\ x_i + \lambda t, & x_i < -\lambda t, \end{cases}$$

equivalently $\operatorname{prox}_{tg}(x)_i = \operatorname{sign}(x_i)\,\max(|x_i| - \lambda t,\ 0)$, the soft-thresholding operator.
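This soft-thresholding operator is the core building block of all three implementations below (the posted code inlines it via `np.concatenate`/`np.max`). A minimal NumPy sketch, with the helper name `soft_threshold` my own:

```python
import numpy as np

def soft_threshold(x, tau):
    # Elementwise soft-thresholding: sign(x) * max(|x| - tau, 0)
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)
```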
2、The PG algorithm

The proximal gradient (PG) algorithm:

Initialize: $x^{0}$; fixed step size $t = 1/L$, where $L = \lambda_{\max}(A^{\top}A)$.
Iterate:
for $k = 1, 2, \dots$
if $|f(x^{k-1}) - f(x^{k})| \le \epsilon$, then stop.
otherwise, $x^{k} = \operatorname{prox}_{tg}\big(x^{k-1} - t\,\nabla h(x^{k-1})\big)$, with $\nabla h(x) = A^{\top}(Ax - b)$.
end
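The update has a useful interpretation, worth making explicit: it minimizes $g$ plus a quadratic model of $h$ built at the current point,

$$x^{k} = \operatorname{prox}_{tg}\big(x^{k-1} - t\,\nabla h(x^{k-1})\big) = \arg\min_{x}\; g(x) + \frac{1}{2t}\big\|x - \big(x^{k-1} - t\,\nabla h(x^{k-1})\big)\big\|_2^2,$$

which for $g(x) = \lambda\|x\|_1$ reduces exactly to the soft-thresholding step of Section 1 with threshold $\lambda t$.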
3、The APG algorithm

The accelerated proximal gradient (APG, Nesterov-accelerated) algorithm:

Initialize: $x^{0} = x^{-1}$; fixed step size $t = 1/L$, where $L = \lambda_{\max}(A^{\top}A)$.
Iterate:
for $k = 1, 2, \dots$
if $|f(x^{k-1}) - f(x^{k})| \le \epsilon$, then stop.
otherwise,
$$\hat{x}^{k} = x^{k-1} + \frac{k-2}{k+1}\big(x^{k-1} - x^{k-2}\big), \qquad x^{k} = \operatorname{prox}_{tg}\big(\hat{x}^{k} - t\,\nabla h(\hat{x}^{k})\big).$$
end
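The only change relative to PG is the extrapolation (momentum) step before the prox-gradient update. A minimal sketch of one APG iteration, reusing the hypothetical `soft_threshold` helper from Section 1:

```python
import numpy as np

def soft_threshold(x, tau):
    # Elementwise soft-thresholding, as sketched in Section 1
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def apg_step(x_prev2, x_prev1, k, A, b, lamb, t):
    # Extrapolate along the previous direction (Nesterov momentum) ...
    x_hat = x_prev1 + (k - 2) / (k + 1) * (x_prev1 - x_prev2)
    # ... then take an ordinary prox-gradient step from the extrapolated point
    grad = A.T @ (A @ x_hat - b)
    return soft_threshold(x_hat - t * grad, lamb * t)
```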
4、The ADMM algorithm

The alternating direction method of multipliers (ADMM). Introduce a copy variable $z$ and rewrite the Lasso problem in the equivalent form:

$$\min_{x,\,z}\; \frac{1}{2}\|Ax - b\|_2^2 + \lambda\|z\|_1 \quad \text{s.t.}\quad x - z = 0.$$

With scaled dual variable $w$ and penalty parameter $\rho > 0$ (the implementation below takes $\rho = \lambda_{\max}(A^{\top}A)$), we get the updates:

Repeat:
for $k = 1, 2, 3, \dots$
$$x^{k+1} = \big(A^{\top}A + \rho I\big)^{-1}\big(A^{\top}b + \rho(z^{k} - w^{k})\big),$$
$$z^{k+1} = S_{\lambda/\rho}\big(x^{k+1} + w^{k}\big) \quad \text{(soft thresholding)},$$
$$w^{k+1} = w^{k} + x^{k+1} - z^{k+1}.$$
end
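For reference (this derivation is standard and was only sketched in the original), the updates come from alternately minimizing the scaled-form augmented Lagrangian:

$$L_{\rho}(x, z, w) = \frac{1}{2}\|Ax - b\|_2^2 + \lambda\|z\|_1 + \frac{\rho}{2}\|x - z + w\|_2^2 - \frac{\rho}{2}\|w\|_2^2.$$

Setting $\nabla_x L_{\rho} = 0$ gives $A^{\top}(Ax - b) + \rho(x - z + w) = 0$, which is exactly the linear solve in the $x$-update; minimizing over $z$ is the proximal step of $\lambda\|\cdot\|_1$ with parameter $1/\rho$, i.e. soft thresholding at level $\lambda/\rho$.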
5、Code
```python
import numpy as np

def lasso_value(matrixA, vectorb, lamb, x):
    # Objective tracker. Note: this returns only the residual term
    # ||Ax - b||^2 as a 1x1 array; the lamb * ||x||_1 term is not added,
    # so the value is used purely for progress printing and the stopping test.
    residual = np.dot(matrixA, x) - vectorb
    return np.dot(residual.T, residual)

def proximalGradiant(x_k, matrixA, vectorb, lamb, maxiter, epsilon):
    value_list = []
    # Fixed step size 1/l, where l is the largest eigenvalue of A^T A
    eigenvalue, featurevector = np.linalg.eig(np.dot(matrixA.T, matrixA))
    l = np.max(eigenvalue)
    for i in range(maxiter):
        value_old = lasso_value(matrixA, vectorb, lamb, x_k)
        # Gradient step on the smooth part: nabla h(x) = A^T (Ax - b)
        nabla_h = np.dot(matrixA.T, (np.dot(matrixA, x_k) - vectorb))
        temp_x = x_k - nabla_h / l
        # Proximal step: elementwise soft thresholding at level lamb / l
        temp = np.zeros(temp_x.shape)
        temp = np.concatenate((np.abs(temp_x) - lamb / l, temp), axis=1)
        x_k = np.sign(temp_x) * np.max(temp, axis=1).reshape(temp_x.shape)
        value_new = lasso_value(matrixA, vectorb, lamb, x_k)
        print("the %dth iter 's value= %f" % (i, value_new[0][0]))
        value_list.append(value_new)
        # Stop when the objective change between iterations is small enough
        if abs(value_old[0][0] - value_new[0][0]) <= epsilon:
            return x_k, value_new, value_list
    return x_k, value_new, value_list

# proximalGradiant(x_k, matrixA, vectorb, lamb, 100000, 0.01)
```
```python
def APG(x_k, matrixA, vectorb, lamb, maxiter, epsilon):
    x_kminus1 = x_k  # previous iterate, needed for the momentum term
    eigenvalue, featurevector = np.linalg.eig(np.dot(matrixA.T, matrixA))
    l = np.max(eigenvalue)
    value_list = []
    for i in range(1, maxiter + 1):
        value_old = lasso_value(matrixA, vectorb, lamb, x_k)
        # Nesterov extrapolation: x_hat = x_k + (k-2)/(k+1) * (x_k - x_{k-1})
        x_k_hat = x_k + (i - 2) / (i + 1) * (x_k - x_kminus1)
        x_kminus1 = x_k
        # Prox-gradient step taken from the extrapolated point
        nabla_h = np.dot(matrixA.T, (np.dot(matrixA, x_k_hat) - vectorb))
        temp_x = x_k_hat - nabla_h / l
        temp = np.zeros(temp_x.shape)
        temp = np.concatenate((np.abs(temp_x) - lamb / l, temp), axis=1)
        x_k = np.sign(temp_x) * np.max(temp, axis=1).reshape(temp_x.shape)
        value_new = lasso_value(matrixA, vectorb, lamb, x_k)
        print("the %dth iter 's value= %f" % (i, value_new[0][0]))
        value_list.append(value_new)
        if abs(value_old[0][0] - value_new[0][0]) <= epsilon:
            return x_k, value_new, value_list
    return x_k, value_new, value_list

# APG(x_k, matrixA, vectorb, lamb, 10000, 0.01)
```
```python
def ADMM(x_k, matrixA, vectorb, lamb, maxiter, epsilon):
    value_list = []
    eigenvalue, featurevector = np.linalg.eig(np.dot(matrixA.T, matrixA))
    l = np.max(eigenvalue)  # penalty parameter rho, set to the largest eigenvalue of A^T A
    z_k = x_k                                   # copy variable z
    w_k = np.random.rand(matrixA.shape[1], 1)   # scaled dual variable w
    rho_i = np.identity(matrixA.shape[1]) * l   # rho * I, constant across iterations
    for i in range(1, maxiter + 1):
        value_old = lasso_value(matrixA, vectorb, lamb, x_k)
        # x-update: solve (A^T A + rho I) x = A^T b + rho (z - w).
        # Note: the inverse is recomputed every iteration here, which
        # dominates the running time (it could be computed once up front).
        x_k = np.dot(np.linalg.inv(np.dot(matrixA.T, matrixA) + rho_i),
                     (np.dot(matrixA.T, vectorb) + l * (z_k - w_k)))
        # z-update: soft thresholding of x + w at level lamb / rho
        temp = np.zeros(z_k.shape)
        temp = np.concatenate((np.abs(x_k + w_k) - lamb / l, temp), axis=1)
        z_k = np.sign(x_k + w_k) * np.max(temp, axis=1).reshape(z_k.shape)
        # w-update: dual ascent on the constraint x - z = 0
        w_k = w_k + x_k - z_k
        value_new = lasso_value(matrixA, vectorb, lamb, x_k)
        print("the %dth iter 's value= %f" % (i, value_new[0][0]))
        value_list.append(value_new)
        if abs(value_old[0][0] - value_new[0][0]) <= epsilon:
            return x_k, value_new, value_list
    return x_k, value_new, value_list

# ADMM(x_k, matrixA, vectorb, lamb, 10000, 0.01)
```
```python
import time
import pickle

def test():
    start = time.time()
    apg_x_k, apg_value_new, apg_value_list = APG(x_k, matrixA, vectorb, lamb, 10000, 0.01)
    end = time.time()
    print("APG:%f" % (end - start))
    with open('params.pkl', 'wb') as f:
        pickle.dump((apg_x_k, apg_value_new, apg_value_list), f)

    start = time.time()
    pg_x_k, pg_value_new, pg_value_list = proximalGradiant(x_k, matrixA, vectorb, lamb, 100000, 0.01)
    end = time.time()
    print("PG:%f" % (end - start))
    with open('params.pkl', 'wb') as f:
        pickle.dump((apg_x_k, apg_value_new, apg_value_list,
                     pg_x_k, pg_value_new, pg_value_list), f)

    start = time.time()
    admm_x_k, admm_value_new, admm_value_list = ADMM(x_k, matrixA, vectorb, lamb, 100000, 0.01)
    end = time.time()
    print("ADMM:%f" % (end - start))
    with open("params.pkl", "wb") as f:
        pickle.dump((pg_x_k, pg_value_new, pg_value_list,
                     apg_x_k, apg_value_new, apg_value_list,
                     admm_x_k, admm_value_new, admm_value_list), f)

test()
```

Here `matrixA` and `vectorb` are randomly generated; readers can generate their own.
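Since the post leaves data generation to the reader, below is a minimal sketch of one way to set up the globals the functions expect; the problem sizes, the seed, and the value of `lamb` are my own assumptions, not from the original:

```python
import numpy as np

np.random.seed(0)                  # fixed seed for reproducibility (assumption)
m, n = 500, 1000                   # problem dimensions chosen arbitrarily (assumption)
matrixA = np.random.randn(m, n)    # random design matrix A
vectorb = np.random.randn(m, 1)    # random right-hand side b
lamb = 1.0                         # l1 regularization weight (assumption)
x_k = np.zeros((n, 1))             # initial point for all three solvers
```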
6、Results
6.1 Implementation details
The optimal point, the final objective value, and the per-iteration objective values produced by each run are stored in the file params.pkl. For a controlled comparison, all three algorithms use a fixed step size. The loop terminates when the change in objective value between consecutive iterations falls below 0.01.
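Concretely, with $f$ denoting the tracked objective value, the stopping test is

$$\big|f(x^{k-1}) - f(x^{k})\big| \le \epsilon = 0.01.$$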
6.2 Optimization results

| Algorithm | Time (s) | Iterations | Final objective value |
| --- | --- | --- | --- |
| PG | 363.1331 | 12209 | 50.795972 |
| APG | 16.4934 | 501 | 68.5204 |
| ADMM | 6074.3267 | 12212 | 50.7995 |

For a clearer plot, the first 10 ADMM iterations are omitted, for the reason explained below. ADMM is also plotted separately, because its curve is otherwise too similar to PG's: the two essentially overlap and are hard to tell apart.
ADMM's objective does not decrease on the first iteration; on the contrary, it grows considerably. Depending on the initial point, ADMM's first update can step straight past the optimum; simply put, the first step is too large.
```
the 1th iter 's value= 608801.673736
the 2th iter 's value= 281612362.361569
the 3th iter 's value= 70603992.884212
the 4th iter 's value= 17851002.334822
the 5th iter 's value= 4661976.098038
the 6th iter 's value= 1363870.661902
the 7th iter 's value= 538518.606626
the 8th iter 's value= 331344.502170
the 9th iter 's value= 278722.487831
the 10th iter 's value= 264742.589034
```
6.3 Summary and discussion
In terms of iteration count, APG needs far fewer iterations than the other two algorithms, while PG and ADMM are similar; this is because APG uses Nesterov acceleration, which improves the convergence rate from $O(1/k)$ to $O(1/k^2)$. In terms of running time, APG is likewise the fastest and ADMM is by far the slowest: besides the iteration counts, each ADMM iteration computes a matrix inverse, and that step has high time complexity. The final objective values are broadly similar, but APG ends up slightly worse than the other two. Intuitively, the momentum term makes each APG step carry a little further along its current direction, so it also overshoots slightly once it reaches the neighborhood of the optimum.
















