How mini-batch gradient descent updates the weights: mini-batch gradient descent in Python

Reposted

mob64ca13fc220d 2024-04-17 17:43:13

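Article tags: mini-batch gradient descent weight updates, gradient descent, iteration, initialization, gradient descent method. Article category: Deep Learning, Artificial Intelligence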

BGD: Batch Gradient Descent

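First, a shout-out to the references I used. Honestly, I spent an entire day on this and still haven't fully worked it out (silly me), so let me summarize what I have so far.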

Without further ado, here's the code.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_data(population,revenue):

    plt.xlabel('Population of City in 10,000s',color='black',fontsize=20)
    plt.ylabel('Profit in $10,000s',color='black',fontsize=20)
    plt.title('plot_data',fontsize=20,color="purple")
    plt.grid(True)
    plt.scatter(population,revenue)
    plt.show()

def compute_cost(X, Y, theta):  # computes the value of the loss function J(theta)
    m = Y.size
    cost = 0
    for i in range(0,m):
        x = X[i,1]
        y = Y[i]
        cost += (y-(theta[1]*x+theta[0])) ** 2
    cost = cost/(2*float(m))
    return cost

def gradient_descent(X, Y, theta, alpha, num_iters):
    m = Y.size
    J_history = np.zeros(num_iters)

    for i in range(num_iters):
        # BGD uses ALL samples in every iteration: compute the error h(x) - y for
        # the whole batch once, then update every theta[j] simultaneously from it
        error = np.dot(X, theta) - Y
        theta = theta - alpha * np.dot(X.T, error) / m
        J_history[i] = compute_cost(X, Y, theta)
    return theta, J_history




df_data = np.loadtxt('street&profits.txt', delimiter=',', usecols=(0, 1))  # usecols: only read these columns
X = df_data[:,0]
y = df_data[:,1]
m = X.size  # number of samples


plot_data(X,y)

# ===================== Part 2: Gradient descent =====================
print('Running Gradient Descent...')

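X = np.c_[np.ones(m), X]  # prepend a column of ones so theta0 (the intercept b) is always multiplied by 1
# np.c_ stacks arrays side by side as columns; np.r_ concatenates along rows (top to bottom).
# Both inputs here are 1-D, and np.c_ treats them as column vectors; check with X.shape -> (m, 2)
theta = np.zeros(2)  # initial parameters: theta[0] is b, theta[1] is w; the model is y = theta0 + theta1*x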



iterations = 1500
alpha = 0.01  #learning rate

print('Initial cost : ' + str(compute_cost(X, y, theta)) + ' (This value should be about 32.07)')

theta, J_history = gradient_descent(X, y, theta, alpha, iterations)

print('Theta found by gradient descent: ' + str(theta.reshape(2)))
# Plot the linear fit
plt.figure(0)
line1, = plt.plot(X[:, 1], np.dot(X, theta), label='Linear Regression',color='red')
plt.legend(handles=[line1])
plot_data(df_data[:,0], df_data[:,1])

# Predict values for population sizes of 35,000 and 70,000
predict1 = np.dot(np.array([1, 3.5]), theta)
print('For population = 35,000, we predict a profit of {:0.3f} (This value should be about 4519.77)'.format(predict1*10000))
predict2 = np.dot(np.array([1, 7]), theta)
print('For population = 70,000, we predict a profit of {:0.3f} (This value should be about 45342.45)'.format(predict2*10000))
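To see what the `np.c_` line in the code above does, here is a minimal sketch on a hypothetical three-element array (not the article's data file). It checks the shapes described in the comments and shows that, once the column of ones is in place, a single matrix product evaluates theta0 + theta1 * x for every row. The values of `x` and `theta` are made up for illustration.

```python
import numpy as np

x = np.array([5.0, 6.0, 7.0])   # hypothetical feature values (e.g. population)

# np.c_ treats each 1-D array as a column and stacks the columns side by side
X = np.c_[np.ones(x.size), x]
print(X.shape)                  # (3, 2): column 0 is all ones, column 1 is x

# np.r_ concatenates along the first axis, so two 1-D arrays become one longer 1-D array
stacked = np.r_[np.ones(x.size), x]
print(stacked.shape)            # (6,)

# Thanks to the ones column, one matrix product gives theta0 + theta1 * x for every row
theta = np.array([-4.0, 1.2])   # hypothetical parameters
pred = X @ theta
print(pred)                     # approximately [2.0, 3.2, 4.4]
```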


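This example predicts how much profit a street can generate as its population grows.

Step 1: plot the data to see how it is distributed.

Step 2: gradient descent, the key part!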

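$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

$$h_\theta(x) = \theta_0 + \theta_1 x$$

$$\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}, \qquad x_0^{(i)} = 1$$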

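You need to understand these three formulas. The first is the cost (loss) function J(θ), the quantity gradient descent tries to minimize.

The second is your model: θ0 is b and θ1 is w.

The third updates the θ values; every θj is updated at the same time, from the same error values.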
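As a sanity check on the three formulas, here is a small worked example on a hypothetical three-point data set lying on y = 2x (not the street/profit file). The helper names `h`, `J` and `update` are made up for illustration; `update` applies the third formula to both parameters at once. By hand: J(0) = (4 + 16 + 36) / 6 ≈ 9.333, the gradient is [-4, -28/3], so one step with α = 0.1 gives θ = [0.4, 0.933].

```python
import numpy as np

# Tiny hypothetical data set: three points on the line y = 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
m = y.size
X = np.c_[np.ones(m), x]            # column of ones multiplies theta0


def h(theta, X):
    """Formula 2 (the model): h(x) = theta0 + theta1 * x, for every row of X."""
    return X @ theta


def J(theta, X, y):
    """Formula 1 (the cost): J = 1/(2m) * sum((h(x) - y)^2)."""
    return np.sum((h(theta, X) - y) ** 2) / (2 * y.size)


def update(theta, X, y, alpha):
    """Formula 3: one gradient step applied to every theta_j simultaneously."""
    grad = X.T @ (h(theta, X) - y) / y.size
    return theta - alpha * grad


theta = np.zeros(2)
cost_before = J(theta, X, y)        # 56 / 6, about 9.333
theta = update(theta, X, y, alpha=0.1)
cost_after = J(theta, X, y)         # smaller than cost_before
print(cost_before, theta, cost_after)
```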

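The defining feature of BGD is that every iteration uses all the samples, so (h(x) - y) * x has to be computed for the whole data set in one go, which is easiest with matrix operations.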
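Why compute the whole batch at once? A minimal sketch on hypothetical toy data (y = 2x, three points): if the error is recomputed inside a loop over j, as a naive per-parameter loop does, θ1 gets updated with an error that already uses the new θ0, so the two parameters are no longer updated simultaneously as the third formula requires.

```python
import numpy as np

# Hypothetical toy data on the line y = 2x
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
m = y.size
X = np.c_[np.ones(m), x]
alpha = 0.1

# Batch update: compute the error h(x) - y for all m samples once,
# then update both parameters from that same error vector
theta_vec = np.zeros(2)
error = X @ theta_vec - y
theta_vec = theta_vec - alpha * (X.T @ error) / m

# Naive per-parameter loop that recomputes the error inside the loop:
# when j == 1, the error already uses the freshly updated theta[0]
theta_seq = np.zeros(2)
for j in range(2):
    theta_seq[j] -= alpha * np.sum((X @ theta_seq - y) * X[:, j]) / m

print(theta_vec)   # theta1 = 0.1 * 28 / 3, about 0.933
print(theta_seq)   # theta1 = 0.1 * 25.6 / 3, about 0.853: a different step
```

In the vectorized form, row j of `X.T` holds the x_j values of all samples, so `X.T @ error` is exactly the sum in the third formula for every j at once.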

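A slightly easier-to-follow version

The code above uses matrix operations so that every iteration takes all the samples into account. The method below is just a different way of writing the same thing, based on the same idea, but I think it is much easier to understand, so I'm recording it here as well.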

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# y = b + wx
df_data = np.loadtxt('street&profits.txt', delimiter=',', usecols=(0, 1))  # usecols: only read these columns
X = df_data[:,0]
Y = df_data[:,1]
m = Y.size  # number of samples

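alpha = 0.01  # learning rate (the same value as the matrix version above)
# initialize the parameters
b = 0
w = 0

# initialize the errors
error_pre = 0
error_post = 0
# stop iterating once two consecutive errors differ by less than this
threshold = 1e-6

while True:
    # the partial derivatives must be reset on every iteration;
    # otherwise they keep accumulating gradients from earlier iterations
    diff0 = 0
    diff1 = 0
    for i in range(m):
        diff0 += w*X[i]+b-Y[i]
        diff1 += (w*X[i]+b-Y[i])*X[i]
    b = b-alpha/m*diff0
    w = w-alpha/m*diff1

    # recompute the cost J(b, w) from scratch for the new parameters
    error_post = 0
    for j in range(m):
        error_post += (Y[j]-(b+w*X[j])) ** 2 / (2*m)
    if abs(error_post-error_pre) < threshold:
        break
    error_pre = error_post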


plt.plot(X,[ w*x+b for x in X])
plt.plot(X,Y,'bo')
print(w,b)
plt.show()
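To check that this scalar loop and the matrix version really compute the same update, here is a sketch that runs both side by side on hypothetical data lying on y = 1 + 2x (not the street/profit file). The function names `bgd_loop` and `bgd_matrix` are illustrative only; the loop resets `diff0` and `diff1` at the start of every iteration.

```python
import numpy as np


def bgd_loop(x, y, alpha, iters):
    """Scalar version: accumulate diff0 and diff1 over all samples, then update b and w."""
    m = y.size
    b, w = 0.0, 0.0
    for _ in range(iters):
        diff0 = 0.0   # reset at the start of every iteration
        diff1 = 0.0
        for i in range(m):
            residual = w * x[i] + b - y[i]
            diff0 += residual
            diff1 += residual * x[i]
        b = b - alpha / m * diff0
        w = w - alpha / m * diff1
    return b, w


def bgd_matrix(x, y, alpha, iters):
    """Matrix version: the same two sums written as X.T @ (X @ theta - y)."""
    m = y.size
    X = np.c_[np.ones(m), x]
    theta = np.zeros(2)
    for _ in range(iters):
        theta = theta - alpha * (X.T @ (X @ theta - y)) / m
    return theta[0], theta[1]


# Hypothetical data on the line y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

b_loop, w_loop = bgd_loop(x, y, alpha=0.05, iters=1000)
b_mat, w_mat = bgd_matrix(x, y, alpha=0.05, iters=1000)
print(b_loop, w_loop)   # both pairs agree up to floating-point rounding
print(b_mat, w_mat)     # and approach (b, w) = (1, 2)
```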

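I think most people will find this version easier to follow. Here the number of iterations is decided by the difference between consecutive errors: the loop exits once that difference drops below the threshold. The threshold is hard to tune, though. If it is too small, the loop may keep running because it rarely gets to exit, or the difference may simply never become that small. Gradient descent does not guarantee that the loss decreases on every iteration: a learning rate that is too large, or features on a poorly chosen scale, can make the loss grow instead. The result plot is below.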

[Figure: result plot, the fitted line w*x + b drawn over the data points]
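The points above about the stopping threshold, the learning rate and feature scale can be seen on a small hypothetical data set (y = 1.2x - 4 with x in [5, 10]; not the article's data). The sketch below assumes a helper named `run_bgd` (made up for illustration) that stops on a tolerance *or* a hard iteration cap, so it can never loop forever. It runs three cases: a small learning rate on the raw feature, a learning rate that is too large for that scale, and the same problem after standardizing x.

```python
import numpy as np


def run_bgd(X, y, alpha, tol=1e-12, max_iters=1000):
    """Batch gradient descent for linear regression with two stopping rules:
    |J(t) - J(t-1)| < tol, or a hard cap of max_iters iterations."""
    m = y.size
    theta = np.zeros(X.shape[1])
    costs = []
    prev_cost = None
    for _ in range(max_iters):
        error = X @ theta - y                      # h(x) - y for all m samples
        theta = theta - alpha * (X.T @ error) / m  # simultaneous update
        cost = np.sum((X @ theta - y) ** 2) / (2 * m)
        costs.append(cost)
        if prev_cost is not None and abs(cost - prev_cost) < tol:
            break
        prev_cost = cost
    return theta, costs


# Hypothetical noise-free data: y = 1.2x - 4 with x in [5, 10]
x = np.linspace(5.0, 10.0, 20)
y = 1.2 * x - 4.0
X_raw = np.c_[np.ones_like(x), x]

# 1) Raw feature, small learning rate: the cost goes down (slowly)
_, costs_small = run_bgd(X_raw, y, alpha=0.01, max_iters=200)

# 2) Raw feature, learning rate too large for this scale: the cost blows up
_, costs_big = run_bgd(X_raw, y, alpha=0.05, max_iters=50)

# 3) Standardized feature: a larger learning rate converges quickly
z = (x - x.mean()) / x.std()
X_std = np.c_[np.ones_like(z), z]
theta_std, costs_std = run_bgd(X_std, y, alpha=0.1, max_iters=1000)

print(costs_small[0], costs_small[-1])   # decreasing
print(costs_big[0], costs_big[-1])       # increasing
print(len(costs_std), costs_std[-1])     # stops well before max_iters, cost near 0
```

The iteration cap is the design choice that matters here: the tolerance test alone never fires when the cost diverges, so a cap keeps case 2 from running forever. Note also that after standardizing, θ is expressed in terms of z, so predictions for a raw x must first apply the same `(x - mean) / std` transform.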