Evaluating Clustering Models

Silhouette Coefficient
Clustering evaluation with the Silhouette Coefficient (reference):
https://www.jianshu.com/p/6352d9d468f8

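For each sample i, let a_i be the mean distance from i to the other samples in its own cluster, and b_i the smallest mean distance from i to the samples of any other cluster. Then s_i = (b_i - a_i) / max(a_i, b_i), which always lies between -1 and 1.

If s_i is close to 1, sample i is well clustered.
If s_i is close to -1, sample i would be better assigned to another cluster.
If s_i is approximately 0, sample i lies on the boundary between two clusters.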

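silhouette_score returns the mean silhouette coefficient over all samples, a single number between -1 and 1; higher values indicate more compact, better-separated clusters.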
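A small check of that statement, as a sketch: `silhouette_samples` returns one s_i per sample, and `silhouette_score` (with its default `sample_size=None`) equals their mean. The dataset parameters below are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Illustrative data: 300 points in 3 blobs
features, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)

per_sample = silhouette_samples(features, labels)  # one s_i per sample
mean_score = silhouette_score(features, labels)    # a single averaged number

print(per_sample.shape)                           # (300,)
print(np.isclose(mean_score, per_sample.mean()))  # True
```

Passing `sample_size` to `silhouette_score` makes it average over a random subset instead, which is cheaper on large data but no longer equals this full mean.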

# Evaluate a clustering model
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# generate feature matrix: 1000 samples, 10 features, 2 well-separated blobs
features, _ = make_blobs(n_samples=1000,
                         n_features=10,
                         centers=2,
                         cluster_std=0.5,
                         shuffle=True,
                         random_state=1)

# cluster the data with k-means to predict class labels
# (n_init=10 is the pre-1.4 scikit-learn default; setting it explicitly avoids a FutureWarning in 1.2-1.3)
model = KMeans(n_clusters=2, n_init=10, random_state=1).fit(features)

# get predicted class labels
target_predicted = model.labels_

# evaluate the model: mean silhouette coefficient over all samples
print(silhouette_score(features, target_predicted))
0.8916265564072142
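Because the mean silhouette coefficient needs no ground-truth labels, a common use is comparing different numbers of clusters. The sketch below assumes three blobs at fixed, illustrative centers and simply picks the k with the highest score; this is one heuristic among several, not a guaranteed way to find the "true" k.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated blobs at fixed centers (illustrative assumption)
features, _ = make_blobs(n_samples=600, centers=[[0, 0], [10, 0], [0, 10]],
                         cluster_std=0.5, random_state=1)

scores = {}
for k in range(2, 7):  # the silhouette is undefined for a single cluster
    labels = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(features)
    scores[k] = silhouette_score(features, labels)

best_k = max(scores, key=scores.get)  # k with the highest mean silhouette
for k, s in scores.items():
    print(f"k={k}: {s:.3f}")
print("best k:", best_k)
```

With fewer clusters than blobs, merged blobs inflate a_i; with more, split blobs shrink b_i. Both lower the mean score, so the peak lands at the blob count when the blobs are this clearly separated.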