How to Set the temperature Parameter of a Large Model: Methods for Tuning Model Parameters

Reposted

小蝌蚪 2024-04-26 11:44:22

Tags: machine learning, bootstrap, cross-validation modeling. Categories: machine learning, artificial intelligence

Note: this article is mainly for my own reference. If you spot a problem, feel free to leave a comment.

A model's default parameters are not always the best ones. To search for better ones, we use RandomizedSearchCV and GridSearchCV here.

1 RandomizedSearchCV

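RandomizedSearchCV repeatedly draws a random combination of parameters from the candidate sets, builds a model with it, and reports the cross-validated score. Enumerating every combination is very expensive: if a model has 5 parameters to tune and each has 10 candidate values, there are 10^5 = 100,000 combinations, and a single model fit may already take hours. Visiting all of them is rarely feasible, so random sampling becomes a practical strategy that usually gets us a reasonably good parameter combination.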
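To make the sampling idea concrete, here is a minimal sketch using scikit-learn's `ParameterSampler`. The parameter names `p0` to `p4` and their values are placeholders invented for this illustration; they do not belong to any real model.

```python
from sklearn.model_selection import ParameterSampler

# Hypothetical search space: 5 parameters, 10 candidate values each.
space = {f"p{i}": list(range(10)) for i in range(5)}

# Draw 100 random combinations instead of visiting all 10**5 of them.
sampled = list(ParameterSampler(space, n_iter=100, random_state=42))

print(len(sampled))   # number of combinations drawn
print(sampled[0])     # one randomly chosen combination
```

Each element of `sampled` is a dictionary with one value per parameter, which is the same kind of object RandomizedSearchCV tries internally.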

1.1 Setting up the parameters for random search

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Number of trees
n_estimators = list(range(200, 2000, 200))

# How many features to consider at each split
# (the original used "auto", which was removed in scikit-learn 1.3;
#  for a regressor, 1.0 is the equivalent setting)
max_features = [1.0, "sqrt"]

# Maximum depth of a tree (None means no limit)
max_depth = list(range(10, 20, 2))
max_depth.append(None)

# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10]

# Minimum number of samples in a leaf; no split may leave a child with fewer than this
min_samples_leaf = [1, 2, 4]

# Whether to draw bootstrap samples when building each tree
bootstrap = [True, False]

# Random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}

1.2 Running the search

from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()

# Try 100 random combinations, each scored with 3-fold cross-validation
model_random = RandomizedSearchCV(estimator=model, param_distributions=random_grid,
                                  n_iter=100, scoring='neg_mean_absolute_error',
                                  cv=3, verbose=2, random_state=42, n_jobs=-1)

# train_features and train_labels are the training data prepared beforehand
model_random.fit(train_features, train_labels)

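Explanation of the RandomizedSearchCV parameters

- estimator: RandomizedSearchCV is a general-purpose tool, not one designed specifically for random forests, so we have to tell it which model to tune.
- param_distributions: the candidate space for the parameters; we defined it earlier as a dictionary.
- n_iter: the number of random parameter combinations to try. Setting it to 100 means 100 combinations are sampled and the best of them is kept.
- scoring: the evaluation metric used to pick the best combination.
- cv: the number of cross-validation folds.
- verbose: how much progress information is printed.
- random_state: the random seed. Fixing it keeps results consistent between runs and removes the influence of randomness.
- n_jobs: the number of jobs to run in parallel. -1 uses all CPU cores, which may make the machine sluggish.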
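The parameters above can be seen in action in a small, self-contained sketch. Since the article does not show how `train_features` and `train_labels` were prepared, this example substitutes synthetic data from `make_regression`; the names `X`, `y` and `small_grid`, and the deliberately tiny search space, are assumptions made so that it runs in a few seconds.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for train_features / train_labels (assumption).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Deliberately small search space so the example runs quickly.
small_grid = {
    "n_estimators": [10, 20, 30],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions=small_grid,
    n_iter=5,                            # try 5 random combinations
    scoring="neg_mean_absolute_error",   # higher (closer to 0) is better
    cv=3,                                # 3-fold cross-validation
    random_state=42,
)
search.fit(X, y)

print(search.best_params_)                 # best combination found
print(-search.best_score_)                 # its mean cross-validated MAE
print(len(search.cv_results_["params"]))   # one entry per sampled combination
```

Note that the score is the negated mean absolute error, so it is flipped back to a positive MAE before printing.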

Even with n_jobs set to -1 the program still runs slowly: we build models for 100 parameter combinations, each with 3-fold cross-validation, which amounts to 300 fits.
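For a sense of scale, the arithmetic can be written out as below. The per-parameter counts are read off the candidate lists in section 1.1; the variable names are only illustrative.

```python
import math

# Number of candidate values per parameter, taken from the lists in section 1.1.
n_candidates = {
    "n_estimators": 9,       # 200, 400, ..., 1800
    "max_features": 2,
    "max_depth": 6,          # 10, 12, 14, 16, 18, None
    "min_samples_split": 3,
    "min_samples_leaf": 3,
    "bootstrap": 2,
}
cv = 3
n_iter = 100

total_combinations = math.prod(n_candidates.values())
exhaustive_fits = total_combinations * cv   # fits a full grid search would need
random_fits = n_iter * cv                   # fits the random search performs

print(total_combinations, exhaustive_fits, random_fits)  # 1944 5832 300
```

Both fit counts leave out the single final refit on the whole training set that scikit-learn performs by default (`refit=True`).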

model_random.best_params_

Output:
{'n_estimators': 1400,
 'min_samples_split': 5,
 'min_samples_leaf': 4,
 'max_features': 'auto',
 'max_depth': 10,
 'bootstrap': True}

2 GridSearchCV

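GridSearchCV performs a grid search. Put plainly, it walks through the candidates one by one: however many combinations there are, as counted earlier, every one of them gets evaluated.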
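What "every combination" means can be sketched with scikit-learn's `ParameterGrid`, which produces the same kind of enumeration a grid search works through. The two-parameter `tiny_grid` below is made up for the illustration.

```python
from sklearn.model_selection import ParameterGrid

# Tiny illustrative grid (values chosen only for the example).
tiny_grid = {"max_depth": [8, 9], "min_samples_leaf": [3, 4, 5]}

# ParameterGrid yields every combination: 2 * 3 = 6 in total.
for params in ParameterGrid(tiny_grid):
    print(params)

print(len(ParameterGrid(tiny_grid)))  # 6
```

By the same rule, the `param_grid` used in this section has 5 × 5 × 4 × 1 × 4 × 1 = 400 combinations, which means 1,200 fits with 3-fold cross-validation.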

from sklearn.model_selection import GridSearchCV

# Fine-tune within a narrow range around the best parameters found by RandomizedSearchCV
# (the original used 'auto' for max_features, which was removed in scikit-learn 1.3;
#  for a regressor, 1.0 is the equivalent setting)
param_grid = {'n_estimators': [800, 900, 1000, 1100, 1200],
              'min_samples_split': [2, 3, 4, 5, 6],
              'min_samples_leaf': [3, 4, 5, 6],
              'max_features': [1.0],
              'max_depth': [8, 9, 10, 11],
              'bootstrap': [True]}

model2 = RandomForestRegressor()

grid_search = GridSearchCV(estimator=model2, param_grid=param_grid,
                           scoring='neg_mean_absolute_error', cv=3,
                           n_jobs=-1, verbose=2)

grid_search.fit(train_features, train_labels)

grid_search.best_params_

Output:

{'bootstrap': True,
 'max_depth': 11,
 'max_features': 'auto',
 'min_samples_leaf': 6,
 'min_samples_split': 2,
 'n_estimators': 800}
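The fine-tuning grid in this section was written by hand. One possible way to build such a grid programmatically is sketched below. The helper `around` and the values in `best` are hypothetical, introduced only for this example; they are not part of scikit-learn or of the original code.

```python
def around(value, step, n=2, minimum=1):
    """Return candidate integers centered on `value`, spaced by `step`."""
    candidates = [value + k * step for k in range(-n, n + 1)]
    # Drop candidates below the smallest value the parameter allows.
    return [c for c in candidates if c >= minimum]

# Best parameters as they might come from a random search (illustrative values).
best = {"n_estimators": 1000, "min_samples_split": 4, "max_depth": 10}

param_grid = {
    "n_estimators": around(best["n_estimators"], step=100),
    "min_samples_split": around(best["min_samples_split"], step=1, minimum=2),
    "max_depth": around(best["max_depth"], step=1),
}
print(param_grid)
```

The `minimum` argument matters for parameters such as `min_samples_split`, which must be at least 2 when given as an integer.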
