1.13.1. Removing features with low variance
Remove features whose variance falls below a threshold.
    from sklearn.feature_selection import VarianceThreshold
    X = [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]]
    sel = VarianceThreshold(threshold=(.8 * (1 - .8)))
    sel.fit_transform(X)
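The threshold in the snippet above comes from the variance of a Bernoulli variable, Var[X] = p(1 - p): with p = 0.8 it drops boolean features that take the same value in more than 80% of the samples. A minimal sketch that exposes the per-column variances for the same X; the names `threshold` and `X_reduced` are illustrative, not part of the original notes.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 1, 1]])

# Variance of a Bernoulli feature is p * (1 - p); here p = 0.8.
threshold = 0.8 * (1 - 0.8)

sel = VarianceThreshold(threshold=threshold)
X_reduced = sel.fit_transform(X)

print(sel.variances_)      # population variance of each column
print(sel.get_support())   # boolean mask of the columns that were kept
print(X_reduced.shape)     # shape after dropping low-variance columns
```

The first column is 1 in only one of six samples, so its variance (1/6 * 5/6, about 0.14) is below the threshold of 0.16 and it is the one removed.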
1.13.2. Univariate feature selection
Select features using univariate statistical tests.
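A minimal sketch of univariate selection with `SelectKBest`. The iris dataset, the `f_classif` scoring function, and `k=2` are assumptions chosen for illustration; the notes do not specify any of them.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Assumption: iris is used only as a small stand-in dataset.
X, y = load_iris(return_X_y=True)

# Score each feature independently with an ANOVA F-test and keep the top 2.
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

print(X.shape, X_new.shape)
print(selector.scores_)                     # one F-score per original feature
print(selector.get_support(indices=True))   # indices of the retained features
```

Each feature is scored on its own, so interactions between features are not taken into account.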
1.13.3. Recursive feature elimination
Recursively eliminate the least important features.
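A minimal sketch of recursive feature elimination with `RFE`. The synthetic data from `make_classification`, the `LogisticRegression` estimator, and the target of 3 features are assumptions made for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Assumption: synthetic data with 10 features, 3 of them informative.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Fit the estimator, drop the weakest feature (step=1), and repeat
# until only n_features_to_select features remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=3, step=1)
rfe.fit(X, y)

print(rfe.support_)   # boolean mask of the selected features
print(rfe.ranking_)   # 1 = selected; larger values were eliminated earlier

X_reduced = rfe.transform(X)
```

The estimator must expose `coef_` or `feature_importances_` so that RFE can rank features at each round.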
1.13.4. Feature selection using SelectFromModel
Filter features using SelectFromModel.
1.13.4.1. L1-based feature selection
Feature selection based on L1 regularization.
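An L1 penalty drives some coefficients to exactly zero, and `SelectFromModel` keeps only the features with non-zero coefficients. A minimal sketch, assuming the iris dataset and `C=0.01` purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

# Assumption: iris is used only as a small stand-in dataset.
X, y = load_iris(return_X_y=True)

# penalty="l1" requires dual=False in LinearSVC; a smaller C means
# stronger regularization and therefore fewer surviving features.
lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)

# prefit=True reuses the already fitted model instead of refitting it.
model = SelectFromModel(lsvc, prefit=True)
X_new = model.transform(X)

print(X.shape, X_new.shape)
```

Tuning `C` controls the sparsity of the solution and hence how many features are retained.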
1.13.4.2. Tree-based feature selection
Feature selection based on trees.
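Tree ensembles expose impurity-based `feature_importances_`, which `SelectFromModel` can threshold. A minimal sketch, assuming the iris dataset and an `ExtraTreesClassifier` with 50 trees for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

# Assumption: iris is used only as a small stand-in dataset.
X, y = load_iris(return_X_y=True)

forest = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)
print(forest.feature_importances_)   # one importance per feature, summing to 1

# For estimators without an L1 penalty, the default threshold is the mean importance.
model = SelectFromModel(forest, prefit=True)
X_new = model.transform(X)

print(X.shape, X_new.shape)
```

Passing an explicit `threshold` (for example `"median"`) to `SelectFromModel` changes how aggressively features are dropped.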
1.13.5. Feature selection as part of a pipeline
Perform feature selection as a preprocessing step inside a pipeline.
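    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    clf = Pipeline([
        # penalty="l1" requires dual=False in LinearSVC
        ('feature_selection', SelectFromModel(LinearSVC(penalty="l1", dual=False))),
        ('classification', RandomForestClassifier()),
    ])
    clf.fit(X, y)  # X, y: training features and labels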
https://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection