Data preprocessing:

Reading the data:

import pandas as pd
data = pd.read_csv(r'C:\Users\Administrator\Desktop\insurance.csv', encoding='utf-8')
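The path above points at a file on the author's desktop. The snippet below is a minimal, self-contained sketch that reads a few rows from an in-memory string through io.StringIO instead. The column names and values are invented for illustration only. Checking the shape and the first rows right after loading is a cheap way to confirm that the file was parsed as expected.

```python
import io

import pandas as pd

# Hypothetical sample standing in for insurance.csv; columns and values are invented
csv_text = """age,sex,charges
19,female,16884.92
33,male,4449.46
45,male,8240.59
61,female,29141.36
"""

# read_csv accepts any file-like object, so a StringIO behaves like a file on disk
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)   # (4, 3): four rows, three columns
print(data.head())  # eyeball the first rows to confirm the parse
```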

Filtering the data:

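# Remove noise (outliers): keep only rows whose charges fall under an age-dependent cap
data_1 = data.query('age<=40 & charges<=10000')           # age 40 or under, and charges at most 10000
data_2 = data.query('age>40 & age<=50 & charges<=12000')  # age over 40 up to 50, and charges at most 12000
data_3 = data.query('age>50 & charges<=17500')            # age over 50, and charges at most 17500
new_data = pd.concat([data_1, data_2, data_3], axis=0)    # combine: axis=0 stacks rows (aligned on column names), axis=1 joins side by side (aligned on the row index)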
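To sanity-check the age bands, here is a small sketch on invented rows. It runs the same three query() calls and compares the result with a single boolean mask. Inside query(), the & operator has the precedence of `and`, so bare comparisons work there. With plain pandas indexing, each comparison needs its own parentheses. Note that concat keeps the original index labels but orders the rows band by band, so .sort_index() restores the original row order.

```python
import pandas as pd

# Invented rows; only 'age' and 'charges' matter here
data = pd.DataFrame({
    'age':     [25, 35, 42, 48, 55, 63],
    'charges': [8000, 15000, 11000, 13000, 16000, 30000],
})

# Same age bands and charge caps as the query() calls above
data_1 = data.query('age<=40 & charges<=10000')
data_2 = data.query('age>40 & age<=50 & charges<=12000')
data_3 = data.query('age>50 & charges<=17500')
new_data = pd.concat([data_1, data_2, data_3], axis=0)

# Equivalent single boolean mask; parentheses are required because & binds tighter than <=
mask = (
    ((data['age'] <= 40) & (data['charges'] <= 10000))
    | ((data['age'] > 40) & (data['age'] <= 50) & (data['charges'] <= 12000))
    | ((data['age'] > 50) & (data['charges'] <= 17500))
)
masked = data[mask]

print(new_data.sort_index())  # rows 0, 2 and 4 survive the filter
```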

 

Filtering rows by content:

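x_1 = data[data[4] == 'Iris-setosa'].values       # select every row whose column labelled 4 (the 5th column, the class name) equals 'Iris-setosa', as a NumPy array
x_2 = data[data[4] == 'Iris-versicolor'].values
x_3 = data[data[4] == 'Iris-virginica'].values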
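The integer column label 4 exists only when the file is read without a header row. The sketch below assumes the classic iris.data layout (four measurements followed by the class name, no header) and uses a handful of sample rows. It also shows two variations:

* Selecting only the numeric columns with .loc, so the result is a float array rather than an object array.
* Splitting every class at once with groupby().

```python
import io

import pandas as pd

# Sample rows in the iris.data layout: four measurements, then the class name, no header
csv_text = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
4.9,3.0,1.4,0.2,Iris-setosa
"""
# header=None makes pandas label the columns 0..4, which is why data[4] works
data = pd.read_csv(io.StringIO(csv_text), header=None)

# As in the text: whole rows, including the class name, so the array has object dtype
x_1 = data[data[4] == 'Iris-setosa'].values
print(x_1.shape)  # (2, 5)

# Only the measurements: loc label slicing 0:3 is inclusive, giving columns 0, 1, 2, 3
feats = data.loc[data[4] == 'Iris-setosa', 0:3].values
print(feats.shape)  # (2, 4)

# Split all classes in one pass instead of one filter per class
groups = {name: g.values for name, g in data.groupby(4)}
print(sorted(groups))  # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
```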

 

Selecting specific columns:

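# Option 1: the filtered data, with only the first column (age) as the feature
X = new_data.iloc[:, 0:1].values            # the slice 0:1 keeps X two-dimensional
y = new_data['charges'].values

# Option 2: the unfiltered data, with every column except charges as features
y = data['charges'].values
data_1 = data.drop(['charges'], axis=1)     # drop the charges column; axis=0 refers to rows, axis=1 to columns
X = data_1.values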
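The difference between iloc[:, 0:1] and iloc[:, 0] is easy to miss:

* A slice keeps a two-dimensional (n, 1) array, which is the shape scikit-learn estimators expect for X.
* A single position returns a one-dimensional (n,) array.

Here is a small sketch with invented values. The bmi column is a hypothetical extra feature.

```python
import pandas as pd

# Invented rows: age and charges as in the text, plus a hypothetical bmi column
new_data = pd.DataFrame({
    'age': [25, 42, 55],
    'bmi': [22.0, 30.5, 27.1],
    'charges': [8000.0, 11000.0, 16000.0],
})

X_slice = new_data.iloc[:, 0:1].values   # slice keeps 2D: shape (3, 1)
X_single = new_data.iloc[:, 0].values    # single position drops to 1D: shape (3,)
y = new_data['charges'].values           # target as a 1D array: shape (3,)

# drop(columns=...) is equivalent to drop([...], axis=1)
features = new_data.drop(columns=['charges'])
X_all = features.values                  # shape (3, 2): age and bmi

print(X_slice.shape, X_single.shape, y.shape, X_all.shape)
```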

 

Feature scaling:

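import numpy as np
from sklearn.preprocessing import StandardScaler            # feature scaling
# x_train, x_test and y_train are assumed to come from an earlier train/test split of X and y
sc_x = StandardScaler()                        # standardize: zero mean, unit variance
x_train = sc_x.fit_transform(x_train)          # fit on the training set, then transform it
x_test = sc_x.transform(x_test)                # reuse the training mean/std on the test set
sc_y = StandardScaler()
y_train = np.ravel(sc_y.fit_transform(y_train.reshape(-1, 1)))    # the scaler needs 2D input: reshape y to one column, then flatten back
# ... model section: build `regressor` and fit it on x_train, y_train ...
y_pred = regressor.predict(x_test)             # predict (in scaled units)
y_pred = np.ravel(sc_y.inverse_transform(y_pred.reshape(-1, 1)))  # map predictions back to the original charges scale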
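The sketch below runs the whole scaling round trip end to end on synthetic data. It assumes a single feature with a roughly linear relationship to the target. LinearRegression is only a stand-in for whatever model goes in the model section, which the notes leave unspecified. Likewise, train_test_split with test_size=0.2 is an assumption about how x_train and x_test were produced. The sketch also checks that StandardScaler applies z = (x - mean) / std using the population standard deviation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for age -> charges; the slope, intercept and noise level are invented
X = rng.uniform(18, 64, size=(100, 1))
y = 250 * X[:, 0] + 1000 + rng.normal(0, 500, size=100)

# Assumed split; the notes do not show how x_train/x_test were made
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

sc_x = StandardScaler()
x_train_s = sc_x.fit_transform(x_train)   # learn mean/std from the training data only
x_test_s = sc_x.transform(x_test)         # apply the training statistics to the test set

# StandardScaler computes z = (x - mean) / std with the population std (ddof=0)
manual = (x_train - x_train.mean(axis=0)) / x_train.std(axis=0)
print(np.allclose(manual, x_train_s))     # True

sc_y = StandardScaler()
y_train_s = np.ravel(sc_y.fit_transform(y_train.reshape(-1, 1)))

regressor = LinearRegression()            # stand-in model, not necessarily the author's choice
regressor.fit(x_train_s, y_train_s)

y_pred_s = regressor.predict(x_test_s)    # predictions in scaled units
y_pred = np.ravel(sc_y.inverse_transform(y_pred_s.reshape(-1, 1)))  # back to original units

mae = np.abs(y_pred - y_test).mean()      # mean absolute error in the original units
print(mae)
```

Fitting the scalers on the training split only, and then reusing them on the test split, keeps test-set statistics from leaking into training. inverse_transform needs the same 2D (n, 1) shape that fit_transform was given, which is why y_pred is reshaped before it.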