Linear Regression in Python
In Machine Learning, Linear Regression is one of the first algorithms you are likely to come across. It belongs to the family of supervised learning algorithms: we are given a number of X observations, each paired with a corresponding Y target value, and we try to model the relationship between the input and output features.
Such problems fall into two categories: Regression and Classification. In this post I will cover, in a few simple steps, how to approach and implement Linear Regression in Python, where we try to predict samples with a continuous output.
Basic Function
Linear regression is called linear because the simplest model involves a linear combination of the input variables, which can be described by a polynomial function.
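In its simplest form, with a single input feature (the case used throughout this post), this is just the equation of a straight line, where w is the slope (weight) and b the intercept (bias):

y(x) = w * x + b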
That said, Linear Regression comes with a strong assumption about the dependent variables, which is what pushes us towards more complex models, such as Neural Networks, for other kinds of problems.
Implementation
In this example, a simple model learns to draw a straight line that fits the data. Learning is an iterative process: in every learning cycle the model is assessed, the parameters used for training are optimised, and the residual error of each prediction is reduced.
Dataset
Construct a toy dataset with the numpy library, and plot a random line.
Create data + Plot a line
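A minimal sketch of what this step might look like; the slope, intercept, noise level, and number of points below are arbitrary choices, not the article's original values:

import numpy as np
import matplotlib.pyplot as plt

# Toy dataset: points scattered around a straight line
np.random.seed(42)
x = np.linspace(0, 10, 50)
y = 3.2 * x + 5 + np.random.normal(0, 4, size=x.shape)

# A randomly initialised line, before any training
w = np.random.random()
b = np.random.random()

plt.scatter(x, y, label="data")
plt.plot(x, w * x + b, color="purple", label="random line")
plt.legend()
plt.show()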
In order to fit the line to the data, we need to measure the residual error between the line and the dataset; in simple terms, the distance from the purple line to the actual data points.
Loss Function
To quantify the distance, we need a performance metric, in other words a Loss/Cost function. This metric also measures how well (or badly) our model is learning the data. Since our problem is linear regression, meaning the values we are trying to predict are continuous, we are going to use Mean Squared Error (MSE). Of course, other loss functions can be used in a linear regression problem, such as Mean Absolute Error (MAE) or Huber Loss, but for a toy example we can keep it simple.
Mean Squared Error (MSE)
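A minimal sketch of MSE, assuming numpy arrays of targets y and predictions y_hat; the value printed below comes from the article's original data, not from this sketch:

# Mean Squared Error between targets and predictions
def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

print(f"MSE: {mse(y, w * x + b):.4f}")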
MSE: 685.9313
Calculating the loss function is the first step in keeping track of the model's performance. Now our goal is to minimise it, somehow.
Looking more closely at the Loss function, we can see that it depends on w and b. In order to observe their impact on the training process, we will plot the MSE while keeping one of the two values constant, doing this in turn for both w and b.
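As a sketch, one way to produce such a plot is to sweep w over a range of values while b stays fixed, recording the loss for each (this reuses the toy data and the mse helper from the sketches above; the sweep range is an arbitrary choice):

# Loss as a function of w, with b held constant
w_range = np.linspace(-5, 10, 200)
losses = [mse(y, w_i * x + b) for w_i in w_range]

plt.plot(w_range, losses)
plt.xlabel("w")
plt.ylabel("MSE")
plt.show()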
w performance — Keep b constant
W and b Loss — Initial values vs Best
The above figures depict how the loss changes between the initial values and the minimum (best) values of b and w.
The minimum was found simply by calculating the loss over a hardcoded range of values. However, we need a smarter way to navigate from the initial position to the best position (the lowest minimum), optimising b and w simultaneously.
Gradient Descent
We can tackle this problem using the Gradient Descent algorithm. Gradient descent computes the gradient with respect to each of the coefficients b and w, which is actually the slope at the current position. Given the slope, we know which direction to follow in order to reduce (minimise) the cost function.
Partial derivatives
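For MSE over N samples, with predictions ŷᵢ = w·xᵢ + b, the partial derivatives with respect to w and b are the same expressions implemented in compute_derivatives further below:

∂MSE/∂w = -(2/N) · Σ xᵢ (yᵢ - ŷᵢ)
∂MSE/∂b = -(2/N) · Σ (yᵢ - ŷᵢ)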
At each step, the weight vector is moved in the direction of the greatest rate of decrease of the error function [1]. In other words, we update the previous values of w and b with new ones, following a defined strategy.
Update
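In sketch form, the update strategy moves each parameter a small step against its gradient, where a is the learning rate introduced just below:

w_new = w - a · ∂MSE/∂w
b_new = b - a · ∂MSE/∂b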
Here, a hyperparameter a, the learning rate, is introduced; it is a factor that defines how large or small the update towards minimising the error will be [2]. With a very small learning rate the model may need a huge number of epochs to converge, while with a very high learning rate it may constantly overshoot the minimum and never converge at all.
After updating with the new values, we calculate the MSE again once a new prediction has been made. These steps are part of an iterative process that continues until the loss function stops decreasing, hopefully converging to a (local) minimum. Each learning cycle is called an epoch, and the overall process is referred to as training.
# Make a new prediction
def predict(x):
    return w * x + b
Intuitive plot for Learning Rate
Gradient Descent is usually pictured as descending a loss surface on which we always seek the lowest minimum.
http://www.bdhammel.com/learning-rates/
Now that we have drilled down into the problem, we can develop the algorithm that optimises using the partial derivatives with respect to w and b.
Compute partial derivatives
The optimisation/training process ends when the MSE eventually stops decreasing.
Learning Process:
Initialise w, b with random values
For a range of epochs:
- Predict a new line
- Compute partial derivatives (slope)
- Update w_new, b_new
Evaluate with Loss Function (MSE)
Initialise with random values
# Random initialisation
w = np.random.random()
b = np.random.random()

# Compute derivatives
def compute_derivatives(x, y):
    dw = 0
    db = 0
    N = len(x)
    for i in range(N):
        x_i = x[i]
        y_i = y[i]
        y_hat = predict(x_i)
        dw += -(2/N) * x_i * (y_i - y_hat)
        db += -(2/N) * (y_i - y_hat)
    return dw, db

# Update with new values
def update(x, y, a=0.0002):
    dw, db = compute_derivatives(x, y)
    # Update previous w, b
    new_w = w - (a * dw)
    new_b = b - (a * db)
    return new_w, new_b
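Putting these pieces together, a minimal sketch of the training loop might look like this (the epoch count and logging interval are arbitrary choices):

epochs = 1000
for epoch in range(epochs):
    w, b = update(x, y)                     # one gradient-descent step
    loss = np.mean((y - predict(x)) ** 2)   # evaluate with the loss function (MSE)
    if epoch % 100 == 0:
        print(f"epoch {epoch}: MSE = {loss:.4f}")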
Now that we have formed the training algorithm, let's put it all into a Linear_Regression class to fully automate the process.
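A minimal sketch of what such a class might look like; the method names, defaults, and the vectorised gradient computation here are my own choices, and the version in the author's repository may differ:

import numpy as np

class Linear_Regression:
    def __init__(self, learning_rate=0.0002, epochs=1000):
        self.a = learning_rate
        self.epochs = epochs
        self.w = np.random.random()
        self.b = np.random.random()

    def predict(self, x):
        # Straight-line model: y_hat = w * x + b
        return self.w * x + self.b

    def fit(self, x, y):
        N = len(x)
        for _ in range(self.epochs):
            y_hat = self.predict(x)
            # Partial derivatives of MSE with respect to w and b
            dw = -(2 / N) * np.sum(x * (y - y_hat))
            db = -(2 / N) * np.sum(y - y_hat)
            # Gradient-descent update
            self.w -= self.a * dw
            self.b -= self.a * db
        return self

# Usage: model = Linear_Regression().fit(x, y); model.predict(x)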
And that’s it. The full implementation for this article can be found in my github repository. In later posts the problem will be approached with Tensorflow’s API, along with a Classification implementation. Feel free to comment on any oversights.
Many thanks to Thanos Tagaris [3] for the amazing repository and work.
[1] Christopher Bishop, Pattern Recognition and Machine Learning, Springer 2007
[2] https://machinelearningmastery.com/linear-regression-for-machine-learning/
[3] https://github.com/djib2011