Java Multiple Linear Regression Code: Multiple Linear Regression Data Processing

Reposted

mob6454cc6ba5a5 2023-08-02 12:58:16


1. The multiple linear regression algorithm

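The simple linear regression covered earlier handles data in which each sample has a single attribute paired with one target value. Each sample is therefore a point in two-dimensional space, and the fitted model is a straight line in that plane, described by a regression equation. When each sample has more than one attribute (N >= 2), the problem becomes multiple linear regression: the goal is to find the regression surface (a hyperplane over the N-dimensional attribute space) that best fits the data.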

2. Mathematical background

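As noted above, simple linear regression works with two-dimensional data points (x, y). In multiple linear regression each sample is a vector **x** = (x1, x2, x3, ..., xn) with n attributes. With m samples, the dataset is an (m * n) matrix in which each row is one sample.
The regression equation we are looking for is y_hat = θ0 + θ1 * x1 + θ2 * x2 + θ3 * x3 + ... + θn * xn, where **x** = (x1, x2, x3, ..., xn) is the sample to predict. The problem therefore reduces to finding the parameter vector **θ** = (θ0, θ1, θ2, θ3, ..., θn) that makes the regression equation optimal. We judge how good a fit is by its total squared residual over the m training samples, that is

$$\sum_{i=1}^{m} \left( y^{(i)} - \hat{y}^{(i)} \right)^2$$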
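
As a minimal sketch of this criterion, the total squared residual for a candidate parameter vector can be computed with NumPy. The data, the two candidate vectors, and the helper name `total_squared_residual` are made up for illustration and are not part of the original post.

```python
import numpy as np

def total_squared_residual(X, y, theta):
    """Sum of squared residuals for y_hat = theta[0] + X @ theta[1:]."""
    y_hat = theta[0] + X @ theta[1:]
    residual = y - y_hat
    return float(residual @ residual)

# Hypothetical data: m = 3 samples, n = 2 attributes,
# generated exactly from y = 1 + 2*x1 + 3*x2
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0]])
y = np.array([9.0, 5.0, 10.0])

good_theta = np.array([1.0, 2.0, 3.0])  # the parameters that generated y
bad_theta = np.array([0.0, 2.0, 3.0])   # intercept off by one

print(total_squared_residual(X, y, good_theta))  # 0.0
print(total_squared_residual(X, y, bad_theta))   # 3.0
```

The smaller the value, the better the candidate fits, which is why the parameter vector is chosen to minimize it.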

Substituting y_hat = θ0 + θ1 * x1 + θ2 * x2 + θ3 * x3 + ... + θn * xn = X_b · θ, the goal is to minimize:

$$(y - X_b \theta)^{T} (y - X_b \theta)$$

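Here X_b is the attribute matrix x with one extra column consisting entirely of ones. We assume a default attribute x0 so that θ0 appears in the matrix product, which means x0 is 1 for every sample.

$$X_b = \begin{pmatrix} 1 & x_1^{(1)} & x_2^{(1)} & \cdots & x_n^{(1)} \\ 1 & x_1^{(2)} & x_2^{(2)} & \cdots & x_n^{(2)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(m)} & x_2^{(m)} & \cdots & x_n^{(m)} \end{pmatrix}$$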
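
A small sketch of how X_b can be built, and of why the column of ones produces the θ0 term, might look like this. The sample matrix and the parameter values are arbitrary and chosen only for illustration.

```python
import numpy as np

# Hypothetical attribute matrix: m = 3 samples, n = 2 attributes
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0]])
theta = np.array([1.0, 2.0, 3.0])  # (θ0, θ1, θ2), chosen arbitrarily

# Prepend the x0 column of ones, giving an (m, n + 1) matrix
X_b = np.hstack([np.ones((len(X), 1)), X])

# One matrix product yields θ0 + θ1*x1 + θ2*x2 for every row
y_hat = X_b @ theta

# The same predictions written out with the intercept handled separately
y_hat_explicit = theta[0] + X @ theta[1:]

print(X_b)
print(y_hat)           # [ 9.  5. 10.]
print(y_hat_explicit)  # [ 9.  5. 10.]
```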

Finding the extremum with respect to the vector θ then gives (I have not yet worked through the full derivation)

$$\theta = (X_b^{T} X_b)^{-1} X_b^{T} y$$
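
For reference, the standard derivation goes roughly as follows. This is a sketch that assumes X_b^T X_b is invertible; the symbol J for the objective is introduced here only for convenience.

```latex
\begin{aligned}
J(\theta) &= (y - X_b\theta)^{T}(y - X_b\theta) \\
          &= y^{T}y - 2\,\theta^{T}X_b^{T}y + \theta^{T}X_b^{T}X_b\,\theta \\[4pt]
\nabla_{\theta} J(\theta) &= -2\,X_b^{T}y + 2\,X_b^{T}X_b\,\theta = 0 \\[4pt]
X_b^{T}X_b\,\theta &= X_b^{T}y \\[4pt]
\theta &= (X_b^{T}X_b)^{-1}X_b^{T}y
\end{aligned}
```

Because J is a convex quadratic in θ, the point where the gradient vanishes is a minimum.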

Here is a rough sketch, haha (as a reminder for myself):

[Figure: hand-drawn sketch]

3. Multiple linear regression (LinearRegression) code:

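import numpy as np
from SLR import metric

# Multiple linear regression model class
class LinearRegression:

    # interception_ is the intercept, i.e. θ0, the counterpart of b in y = a * x + b
    # coef_ holds the coefficients (θ1, θ2, ..., θn)
    # theta_ is the full parameter vector θ = (θ0, θ1, ..., θn)
    def __init__(self):
        self.coef_ = None
        self.interception_ = None
        self.theta_ = None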

    def fit_normal(self, x_train, y_train):

        assert x_train.shape[0] == y_train.shape[0], "x_train and y_train must have the same number of rows."

        # Prepend a column of ones to the attribute matrix so that the product yields the θ0 term
        X_b = np.hstack([np.ones((len(x_train), 1)), x_train])

        # Apply the normal equation: θ = (X_b^T X_b)^-1 X_b^T y
        self.theta_ = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)
        self.coef_ = self.theta_[1:]
        self.interception_ = self.theta_[0]

        return self

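    def predict_normal(self, x_predict):

        # Check that the model is trained first; otherwise len(self.coef_) below would fail on None
        assert self.theta_ is not None, "The model needs to be trained first."

        assert x_predict.shape[1] == len(self.coef_), "x_predict must have the same number of attributes as the training data."

        # y_hat = X_b · θ. x_predict may have many rows and lacks the constant column,
        # so prepend a column of ones before taking the product with θ
        X_b = np.hstack([np.ones((len(x_predict), 1)), x_predict])
        y_predict = X_b.dot(self.theta_)

        return y_predict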

    def r2_score(self, y_true, y_predict):

        assert y_true.shape[0] == y_predict.shape[0], "The shape of y_true is not equal to the shape of y_predict."

        # R^2 = 1 - MSE(y_true, y_predict) / Var(y_true)
        return 1 - (metric.mean_squared_error(y_true, y_predict) / np.var(y_true))
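
As a rough usage sketch of the normal-equation fit, the snippet below applies the same formula to synthetic data without depending on the `SLR` module. The data, the random seed, and the variable names are assumptions for illustration. It also solves the same problem with `np.linalg.lstsq`, which the class above does not use; it is shown only as an alternative that avoids forming the explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 4 + 3*x1 - 2*x2 plus a little noise
x_train = rng.normal(size=(100, 2))
true_theta = np.array([4.0, 3.0, -2.0])
noise = rng.normal(scale=0.01, size=100)
y_train = true_theta[0] + x_train @ true_theta[1:] + noise

# Same preprocessing as fit_normal: prepend the column of ones
X_b = np.hstack([np.ones((len(x_train), 1)), x_train])

# Normal equation, as used in fit_normal
theta_normal = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y_train

# Least-squares solver: same minimizer, no explicit matrix inverse
theta_lstsq, *_ = np.linalg.lstsq(X_b, y_train, rcond=None)

print("normal equation:", theta_normal)
print("lstsq:          ", theta_lstsq)
```

Both results should be close to (4, 3, -2). When X_b^T X_b is singular or badly conditioned, for example with duplicated or strongly correlated attributes, the explicit inverse can fail or lose accuracy, and a least-squares solver is usually the safer choice.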

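4. Several metrics for evaluating machine learning models
(1). MSE: mean_squared_error (mean squared error)
(2). RMSE: root_mean_squared_error (root mean squared error)
(3). MAE: mean_absolute_error (mean absolute error)
(4). R2: R_squared (coefficient of determination)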
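
Written out, the four metrics are usually defined as follows, where m is the number of samples and \bar{y} is the mean of the true values. This is a reference sketch; the second form of R^2 is the one that matches the MSE-over-variance computation used in the code.

```latex
\begin{aligned}
\mathrm{MSE}  &= \frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^{2} \\[4pt]
\mathrm{RMSE} &= \sqrt{\mathrm{MSE}} \\[4pt]
\mathrm{MAE}  &= \frac{1}{m}\sum_{i=1}^{m}\left|\,y^{(i)} - \hat{y}^{(i)}\right| \\[4pt]
R^{2} &= 1 - \frac{\sum_{i=1}^{m}\left(y^{(i)} - \hat{y}^{(i)}\right)^{2}}{\sum_{i=1}^{m}\left(y^{(i)} - \bar{y}\right)^{2}}
       = 1 - \frac{\mathrm{MSE}}{\mathrm{Var}(y)}
\end{aligned}
```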

import numpy as np

def mean_squared_error(y_true, y_predict):

    assert y_true.shape == y_predict.shape, "The shape of y_true is not equal to the shape of y_predict."

    # Divide by the sample count; y_true.shape is a tuple and would make the result an array
    return ((y_true - y_predict).dot(y_true - y_predict)) / len(y_true)

def root_mean_squared_error(y_true , y_predict):

    assert y_true.shape == y_predict.shape , "The shape of y_true is not equal to the shape of y_predict."

    return np.sqrt(mean_squared_error(y_true , y_predict))

def mean_absolute_error(y_true, y_predict):

    assert y_true.shape == y_predict.shape, "The shape of y_true is not equal to the shape of y_predict."

    # Divide by the sample count; y_true.shape is a tuple and would make the result an array
    return np.sum(np.abs(y_true - y_predict)) / len(y_true)
    
def r2_score(y_true, y_predict):

    assert y_true.shape == y_predict.shape, "The shape of y_true is not equal to the shape of y_predict."

    # R^2 = 1 - MSE / Var(y_true)
    return 1 - (mean_squared_error(y_true, y_predict) / np.var(y_true))
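
To make the four metrics concrete, here is a small worked example on four made-up values, computed directly with NumPy so that it runs on its own. The numbers are illustrative only.

```python
import numpy as np

# Hypothetical true values and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_predict = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_true - y_predict                  # [0.5, -0.5, 0.0, -1.0]

mse = errors.dot(errors) / len(y_true)       # 1.5 / 4 = 0.375
rmse = np.sqrt(mse)                          # square root of 0.375
mae = np.sum(np.abs(errors)) / len(y_true)   # 2.0 / 4 = 0.5
r2 = 1 - mse / np.var(y_true)                # 1 - 0.375 / 7.296875, about 0.9486

print("MSE: ", mse)
print("RMSE:", rmse)
print("MAE: ", mae)
print("R2:  ", r2)
```

MSE, RMSE, and MAE depend on the scale of the target, whereas R2 is dimensionless: 1 means a perfect fit, and 0 means the model does no better than always predicting the mean.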
