Python sklearn nonlinear regression: sklearn multivariate nonlinear regression

Reposted

mob64ca14017c37 2024-08-08 16:48:01


Contents

1. Polynomial Regression
2. Sklearn Implementation
References

1. Polynomial Regression

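A linear model can also be used to fit nonlinear data. A simple way to do this is to add powers of each feature as new features and then train a linear model on this extended feature set. This technique is called polynomial regression.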
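To make the idea concrete before bringing in sklearn, here is a minimal NumPy sketch (not from the original post) that builds the power features by hand and solves an ordinary least-squares problem with `np.linalg.lstsq`. The noise-free data and the variable names are illustrative assumptions; because the model is linear in its coefficients, the fit recovers the quadratic exactly.

```python
import numpy as np

# Illustrative, noise-free quadratic data: y = 0.5 x^2 + x + 2
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=50)
y = 0.5 * x**2 + x + 2

# Design matrix with one column per power of x: x^0, x^1, x^2
A = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares on the expanded features (still a linear problem)
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # approximately [2.0, 1.0, 0.5] (intercept, x, x^2)
```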

The regression model

$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i \tag{1}$$

is called a second-order (or quadratic) polynomial model in one variable, where $i = 1, 2, \dots, n$.

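To reflect the power of the variable that each regression coefficient belongs to, the coefficients of a polynomial regression model are usually written as in the following model:

$$y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \varepsilon_i \tag{2}$$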

The regression function of model (2), $E(y) = \beta_0 + \beta_1 x + \beta_{11} x^2$, is a parabola and is usually called a quadratic regression function. The regression coefficient $\beta_1$ is called the linear effect coefficient, and $\beta_{11}$ the quadratic effect coefficient.

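Correspondingly, the regression model

$$y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i \tag{3}$$

is called a cubic polynomial model in one variable.^[1]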
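Models (1) to (3) are nonlinear in $x$ but linear in the unknown coefficients, which is why ordinary least squares still applies. As a general sketch in standard least-squares notation (not taken from the original post), a degree-$k$ model for $n$ observations can be written in matrix form; here the coefficient of $x^j$ is written $\beta_j$ instead of the repeated-index form, so for $k = 2$ the term $\beta_2 x_i^2$ plays the role of $\beta_{11} x_i^2$ in model (2):

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_k x_i^k + \varepsilon_i,
      \qquad i = 1, 2, \dots, n, \\
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad
\mathbf{X} =
\begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^k \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^k
\end{pmatrix}, \\
\hat{\boldsymbol{\beta}} &= (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}
  \quad \text{(when } \mathbf{X}^\top \mathbf{X} \text{ is invertible).}
\end{aligned}
```

Each power of $x$ is simply one more column of the design matrix, which is exactly what `PolynomialFeatures` builds in the next section.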

2. Sklearn Implementation

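For nonlinear data, we use sklearn.preprocessing.PolynomialFeatures to expand the original features with polynomial terms. The resulting problem is linear in the new features, so the regression can then be carried out with the same approach as in the earlier post "Supervised Learning | Linear Regression: Principles of Multiple Linear Regression and Sklearn Implementation".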

```python
PolynomialFeatures(degree=2, interaction_only=False, include_bias=True, order='C')
```

Parameters:

degree: integer

The degree of the polynomial features. Default = 2.

interaction_only: boolean, default = False

If true, only interaction features are produced: features that are products of at most degree distinct input features (so not x[1] ** 2, x[0] * x[2] ** 3, etc.).

include_bias: boolean

If True (default), then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones - acts as an intercept term in a linear model).

order: str in {'C', 'F'}, default 'C'

Order of output array in the dense case. 'F' order is faster to compute, but may slow down subsequent estimators.
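To see what the main parameters do, here is a quick sketch on a single made-up sample with two features, x0 = 2 and x1 = 3 (values chosen only for illustration). It compares the default output with `interaction_only=True` and `include_bias=False`:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features: x0 = 2, x1 = 3
X_demo = np.array([[2.0, 3.0]])

# Defaults (degree=2, include_bias=True): [1, x0, x1, x0^2, x0*x1, x1^2]
full = PolynomialFeatures(degree=2).fit_transform(X_demo)

# interaction_only=True drops pure powers such as x0^2 and x1^2
inter = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X_demo)

# include_bias=False drops the leading column of ones
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_demo)

print(full)     # [[1. 2. 3. 4. 6. 9.]]
print(inter)    # [[1. 2. 3. 6.]]
print(no_bias)  # [[2. 3. 4. 6. 9.]]
```

`include_bias=False` is the usual choice when the next step is `LinearRegression`, which fits its own intercept.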

Attributes:

powers_: array, shape (n_output_features, n_input_features)

powers_[i, j] is the exponent of the jth input in the ith output.

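n_input_features_: int (deprecated in scikit-learn 1.0 and removed in 1.2; newer versions expose this value as n_features_in_)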

The total number of input features.

n_output_features_: int

The total number of polynomial output features. The number of output features is computed by iterating over all suitably sized combinations of input features.
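These attributes can be inspected after fitting. A minimal sketch, assuming scikit-learn 1.0 or later (where `n_features_in_` replaces `n_input_features_`), on arbitrary two-feature input:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Arbitrary two-feature input; only the number of columns matters for the attributes
poly = PolynomialFeatures(degree=2)  # include_bias=True by default
poly.fit(np.array([[2.0, 3.0], [1.0, 0.5]]))

# Row i of powers_ holds the exponents of (x0, x1) in output column i
print(poly.powers_)
# [[0 0]
#  [1 0]
#  [0 1]
#  [2 0]
#  [1 1]
#  [0 2]]
print(poly.n_output_features_)  # 6
print(poly.n_features_in_)      # 2
```

The first row `[0 0]` is the bias column of ones; the remaining rows correspond to x0, x1, x0^2, x0*x1 and x1^2.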

First, generate some nonlinear data from a quadratic function (with random noise added).

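```python
import numpy as np
import matplotlib.pyplot as plt

# Fix the random seed so the results are reproducible
np.random.seed(42)

# m samples of a single feature, uniformly distributed in [-3, 3)
m = 100
X = 6 * np.random.rand(m, 1) - 3
# Quadratic target y = 0.5 x^2 + x + 2 plus standard Gaussian noise
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)

plt.plot(X, y, "b.")
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.axis([-3, 3, 0, 10])
plt.show()
```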

Figure 1: Generated nonlinear dataset with noise

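Clearly, a straight line will never fit this data well. So we use the PolynomialFeatures class to transform the training data, adding the square of each feature (a second-degree polynomial term) as a new feature (in this example there is only one feature):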

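```python
from sklearn.preprocessing import PolynomialFeatures

# degree=2 adds x^2; include_bias=False because LinearRegression fits its own intercept
poly_features = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly_features.fit_transform(X)
X[0]
```

```
array([-0.75275929])
```

```python
X_poly[0]
```

```
array([-0.75275929,  0.56664654])
```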

X_poly now contains the original feature X plus the square of this feature. Now fit a LinearRegression model to this extended feature set.

```python
from sklearn.linear_model import LinearRegression

# Ordinary least squares on the expanded features [x, x^2]
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)
lin_reg.intercept_, lin_reg.coef_
```

```
(array([1.78134581]), array([[0.93366893, 0.56456263]]))
```

Not bad: the model estimates $\hat{y} = 0.56x_1^2 + 0.93x_1 + 1.78$, whereas the original function was actually $y = 0.5x_1^2 + 1.0x_1 + 2.0 + \text{Gaussian noise}$.
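As a more compact alternative (a sketch, not the original post's method), `sklearn.pipeline.make_pipeline` can chain `PolynomialFeatures` and `LinearRegression` into one estimator, so new inputs are expanded automatically at prediction time. The step name `"linearregression"` is the lowercase class name that `make_pipeline` assigns; the data is regenerated so the block runs on its own.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Regenerate the same synthetic data as above
np.random.seed(42)
m = 100
X = 6 * np.random.rand(m, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)

# Feature expansion and linear model chained into one estimator
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(X, y)

# make_pipeline names each step after its class, in lowercase
lin = model.named_steps["linearregression"]
print(lin.intercept_, lin.coef_)  # should match the two-step fit above

# Raw x values go straight in; the pipeline adds x^2 internally
y_grid = model.predict(np.linspace(-3, 3, 5).reshape(-1, 1))
```

The design benefit is that the same transformation is guaranteed to be applied at fit and predict time, which removes the need to call `poly_features.transform` by hand.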

Note that when there are multiple features, polynomial regression can capture interactions between features, such as a product term $ab$, which a plain linear regression model cannot do. This is because PolynomialFeatures adds all combinations of features up to the given degree (when interaction_only=False). For example, with two features $a$ and $b$ and degree=3, PolynomialFeatures adds not only $a^2$, $a^3$, $b^2$ and $b^3$, but also the combinations $ab$, $a^2b$ and $ab^2$.^[2]

```python
# 100 evenly spaced points on [-3, 3] for drawing the fitted curve
X_new = np.linspace(-3, 3, 100).reshape(100, 1)
# Reuse the already-fitted transformer: transform, not fit_transform
X_new_poly = poly_features.transform(X_new)
y_new = lin_reg.predict(X_new_poly)

plt.plot(X, y, "b.")
plt.plot(X_new, y_new, "r-", linewidth=2, label="Predictions")
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.legend(loc="upper left", fontsize=14)
plt.axis([-3, 3, 0, 10])
plt.show()
```

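To put a number on the earlier remark that a straight line cannot fit this data, one possible check (not part of the original post) compares the training mean squared error of a degree-1 and a degree-2 model using `sklearn.metrics.mean_squared_error`. The data is regenerated with the same seed so the block is self-contained.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Regenerate the same synthetic data as above
np.random.seed(42)
m = 100
X = 6 * np.random.rand(m, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1)

# Training-set MSE for a straight line (degree 1) and a parabola (degree 2)
mse = {}
for degree in (1, 2):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(X, y)
    mse[degree] = mean_squared_error(y, model.predict(X))

print(mse)
```

Training error can only decrease as the degree grows, because each higher-degree model contains the lower-degree one, so it is not a good criterion for choosing the degree; held-out data (for example via `train_test_split` or cross-validation) is the usual tool for that.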