Java 类名:com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp

Python 类名:EvalClusterBatchOp

功能介绍

聚类评估是对聚类算法的预测结果进行效果评估,支持下列评估指标。

ALINK(四十一):模型评估(六)聚类评估 (EvalClusterBatchOp)_类名

 

 ALINK(四十一):模型评估(六)聚类评估 (EvalClusterBatchOp)_lua_02

 

 ALINK(四十一):模型评估(六)聚类评估 (EvalClusterBatchOp)_java_03

 

 

参数说明

名称

中文名称

描述

类型

是否必须?

默认值

predictionCol

预测结果列名

预测结果列名

String

 

labelCol

标签列名

输入表中的标签列名

String

 

null

vectorCol

向量列名

输入表中的向量列名

String

 

null

distanceType

距离度量方式

距离类型

String

 

"EUCLIDEAN"

代码示例

Python 代码

from pyalink.alink import *
import pandas as pd
useLocalEnv(1)
df = pd.DataFrame([
    [0, "0 0 0"],
    [0, "0.1,0.1,0.1"],
    [0, "0.2,0.2,0.2"],
    [1, "9 9 9"],
    [1, "9.1 9.1 9.1"],
    [1, "9.2 9.2 9.2"]
])
inOp = BatchOperator.fromDataframe(df, schemaStr='id int, vec string')
metrics = EvalClusterBatchOp().setVectorCol("vec").setPredictionCol("id").linkFrom(inOp).collectMetrics()
print("Total Samples Number:", metrics.getCount())
print("Cluster Number:", metrics.getK())
print("Cluster Array:", metrics.getClusterArray())
print("Cluster Count Array:", metrics.getCountArray())
print("CP:", metrics.getCp())
print("DB:", metrics.getDb())
print("SP:", metrics.getSp())
print("SSB:", metrics.getSsb())
print("SSW:", metrics.getSsw())
print("CH:", metrics.getVrc())

Java 代码

import org.apache.flink.types.Row;
import com.alibaba.alink.operator.batch.BatchOperator;
import com.alibaba.alink.operator.batch.evaluation.EvalClusterBatchOp;
import com.alibaba.alink.operator.batch.source.MemSourceBatchOp;
import com.alibaba.alink.operator.common.evaluation.ClusterMetrics;
import org.junit.Test;
import java.util.Arrays;
import java.util.List;
public class EvalClusterBatchOpTest {
  @Test
  public void testEvalClusterBatchOp() throws Exception {
    List <Row> df = Arrays.asList(
      Row.of(0, "0 0 0"),
      Row.of(0, "0.1,0.1,0.1"),
      Row.of(0, "0.2,0.2,0.2"),
      Row.of(1, "9 9 9"),
      Row.of(1, "9.1 9.1 9.1"),
      Row.of(1, "9.2 9.2 9.2")
    );
    BatchOperator <?> inOp = new MemSourceBatchOp(df, "id int, vec string");
    ClusterMetrics metrics = new EvalClusterBatchOp().setVectorCol("vec").setPredictionCol("id").linkFrom(inOp)
      .collectMetrics();
    System.out.println("Total Samples Number:" + metrics.getCount());
    System.out.println("Cluster Number:" + metrics.getK());
    System.out.println("Cluster Array:" + Arrays.toString(metrics.getClusterArray()));
    System.out.println("Cluster Count Array:" + Arrays.toString(metrics.getCountArray()));
    System.out.println("CP:" + metrics.getCp());
    System.out.println("DB:" + metrics.getDb());
    System.out.println("SP:" + metrics.getSp());
    System.out.println("SSB:" + metrics.getSsb());
    System.out.println("SSW:" + metrics.getSsw());
    System.out.println("CH:" + metrics.getVrc());
  }
}

运行结果

Total Samples Number: 6
Cluster Number: 2
Cluster Array: ['0', '1']
Cluster Count Array: [3.0, 3.0]
CP: 0.11547005383792497
DB: 0.014814814814814791
SP: 15.588457268119896
SSB: 364.5
SSW: 0.1199999999999996
CH: 12150.000000000042