Elasticsearch in Practice (4): Implementing Elasticsearch Metric Aggregations and Drill-Down Analysis as an Open API with Spring Boot

Author: 秃了也弱了, 2023-09-30 19:12:05

© Copyright belongs to the author. This is an original work by 秃了也弱了 on the 51CTO blog; contact the author for permission before reposting.

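Table of Contents

Series index
I. Metric Aggregations and Their Categories

1. What Is a Metric Aggregation?
2. Metric Aggregations Fall into Two Classes: Single-Value and Multi-Value
3. Overview

II. Single-Value Analysis API Design

1. Avg (Average)

(1) avg aggregation over all documents (DSL)
(2) Aggregating filtered documents
(3) Computing the average with a script
(4) Summary

2. Max (Maximum)

(1) All documents
(2) Filtered documents

3. Min (Minimum)

(1) All documents
(2) Filtered documents

4. Sum (Total)

(1) Sum over all documents

5. Cardinality (Distinct Values)

(1) All documents
(2) Filtered documents

III. Multi-Value Analysis API Design

1. Stats Aggregation

(1) All documents
(2) Filtered documents

2. Extended Stats

(1) All documents
(2) Filtered documents

3. Percentiles

(1) All documents
(2) Filtered documents

4. Percentile Ranks

(1) All documents
(2) Filtered documents

IV. Java API Implementation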

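I. Metric Aggregations and Their Categories

1. What Is a Metric Aggregation?

Aggregation is a core database feature: it computes summary values over the data set returned by a query,
for example the maximum, minimum, sum, or average of a field (or of a computed expression).
As both a search engine and a data store, ES offers equally powerful aggregation capabilities.
Aggregations that compute indicators such as the maximum, minimum, sum, and average of a data set are called metric aggregations in ES.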

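2. Metric Aggregations Fall into Two Classes: Single-Value and Multi-Value

1. Single-value analysis returns exactly one result:
min, max, avg, sum, cardinality (cardinality counts the distinct values of a field, similar to COUNT(DISTINCT ...) in MySQL)
2. Multi-value analysis returns several results:
stats, extended_stats, percentiles, percentile_ranks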
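
To make the single-value versus multi-value distinction concrete, here is a minimal plain-Java sketch that runs in memory, without Elasticsearch. The class name MetricKinds and the sample prices are made up for illustration; it only mirrors the shape of the results (one number for avg, several numbers for stats), not how ES computes them.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;

// Hypothetical illustration class; not part of the article's Open API.
final class MetricKinds {

    // Single-value metric: exactly one number comes back, like the ES "avg" aggregation.
    static double avg(List<Double> prices) {
        return prices.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }

    // Multi-value metric: count, min, max, sum and average come back together, like ES "stats".
    static DoubleSummaryStatistics stats(List<Double> prices) {
        return prices.stream().mapToDouble(Double::doubleValue).summaryStatistics();
    }

    public static void main(String[] args) {
        List<Double> prices = List.of(10.0, 20.0, 60.0); // made-up sample data
        System.out.println(avg(prices));   // prints 30.0
        System.out.println(stats(prices)); // one object carrying five statistics
    }
}
```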

3. Overview

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.4/search-aggregations-metrics.html
Syntax:

"aggregations" : {
	"<aggregation_name>" : { <!-- name of the aggregation -->
		"<aggregation_type>" : { <!-- type of the aggregation -->
			<aggregation_body> <!-- aggregation body: which fields to aggregate on -->
		}
		[,"meta" : { [<meta_data_body>] } ]? <!-- metadata -->
		[,"aggregations" : { [<sub_aggregation>]+ } ]? <!-- sub-aggregations nested inside this aggregation -->
	}
	[,"<aggregation_name_2>" : { ... } ]* <!-- further named aggregations -->
}
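
The name / type / body nesting in the syntax above can be sketched in Java as nested maps. This is only an illustration under assumptions: the class AggBodySketch and its metricAgg helper are hypothetical names, and the sketch covers just the simplest case of a body with a single "field" entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; it mirrors the "<aggregation_name>" -> "<aggregation_type>" -> body nesting.
final class AggBodySketch {

    // Builds {"<name>": {"<type>": {"field": "<field>"}}} as nested maps.
    static Map<String, Object> metricAgg(String name, String type, String field) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("field", field);            // aggregation body: the field to aggregate on
        Map<String, Object> typed = new LinkedHashMap<>();
        typed.put(type, body);               // aggregation type, e.g. avg, max, stats
        Map<String, Object> named = new LinkedHashMap<>();
        named.put(name, typed);              // caller-chosen aggregation name
        return named;
    }

    public static void main(String[] args) {
        // prints {result={avg={field=price}}}
        System.out.println(metricAgg("result", "avg", "price"));
    }
}
```

The returned map is what would sit under the "aggs" key of a request.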

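Open API design goals and principles:
1. Abstract DSL invocation and syntax to a high degree, with dynamically designed parameters
2. Through result converters, the Open API supports hundreds of combinations of query, constant_score, match/match_all/filter/sort/size/from/highlight/_source/includes
3. Route shared logic through common calls to strengthen the API's business-processing capability
4. Preserve the usage of the native API and its parameters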

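II. Single-Value Analysis API Design

1. Avg (Average)

Computes the average of the price values extracted from the aggregated documents.

(1) avg aggregation over all documents (DSL)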

POST product_list_info/_search
{
	"size": 0,
	"aggs": {
		"result": {
			"avg": {
				"field": "price"
			}
		}
	}
}

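The request above computes the average over all documents.
"size": 0 means only the aggregation result is returned, with no document hits; to also return 50 documents, set size to 50.
aggs: declares an aggregation
result: a custom name; the aggregated value is returned under this name

OpenAPI query parameter design: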

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "result": {
                "avg": {
                    "field": "price"
                }
            }
        }
    }
}
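
The OpenAPI parameter is simply the native DSL wrapped in an envelope with indexName and map. As a rough sketch of that idea, the snippet below builds the envelope and serializes it with a tiny hand-written JSON writer. The class OpenApiRequestSketch and both helpers are hypothetical; a real service would more likely rely on a JSON library, and this writer only handles nested maps, strings, numbers, and booleans.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch; not the article's actual request-building code.
final class OpenApiRequestSketch {

    // Wraps a DSL body in the {"indexName": ..., "map": ...} envelope.
    static Map<String, Object> envelope(String indexName, Map<String, Object> dsl) {
        Map<String, Object> request = new LinkedHashMap<>();
        request.put("indexName", indexName);
        request.put("map", dsl);
        return request;
    }

    // Minimal JSON writer for nested maps of strings, numbers and booleans (illustrative only).
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringJoiner joiner = new StringJoiner(",", "{", "}");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                joiner.add(toJson(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()));
            }
            return joiner.toString();
        }
        if (value instanceof String) {
            String s = (String) value;
            // Escape backslashes first, then double quotes.
            return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        return String.valueOf(value); // numbers, booleans, null
    }

    public static void main(String[] args) {
        Map<String, Object> field = new LinkedHashMap<>();
        field.put("field", "price");
        Map<String, Object> avg = new LinkedHashMap<>();
        avg.put("avg", field);
        Map<String, Object> aggs = new LinkedHashMap<>();
        aggs.put("result", avg);
        Map<String, Object> dsl = new LinkedHashMap<>();
        dsl.put("size", 0);
        dsl.put("aggs", aggs);
        System.out.println(toJson(envelope("product_list_info", dsl)));
    }
}
```

Because the DSL travels unchanged inside map, any query that works in the native API can be passed through the same envelope.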

(2) Aggregating filtered documents

POST product_list_info/_search
{
    "size": 0,
    "query": {
        "term": {
            "onelevel": "手机通讯"
        }
    },
    "aggs": {
        "result": {
            "avg": {
                "field": "price"
            }
        }
    }
}

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "query": {
            "term": {
                "onelevel": "手机通讯"
            }
        },
        "aggs": {
            "result": {
                "avg": {
                    "field": "price"
                }
            }
        }
    }
}

(3) Computing the average with a script

ES uses Painless as its scripting language: a secure, efficient language that runs on the JVM.

# All documents
POST product_list_info/_search?size=0
{
    "aggs": {
        "result": {
            "avg": {
                "script": {
                    "source": "doc.evalcount.value"
                }
            }
        }
    }
}
Result: "value" : 599929.2282791147
The script source can also be written as:
"source": "doc['evalcount']"
"source": "doc.evalcount"

# With a filter condition
POST product_list_info/_search?size=0
{
    "query": {
        "term": {
            "onelevel": "手机通讯"
        }
    },
    "aggs": {
        "czbk": {
            "avg": {
                "script": {
                    "source": "doc.evalcount"
                }
            }
        }
    }
}
Result: "value" : 600055.6935087288

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "czbk": {
                "avg": {
                    "script": {
                        "source": "doc.evalcount"
                    }
                }
            }
        }
    }
}

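(4) Summary

avg covers four cases:
1. Plain avg (all documents)
2. Conditional avg (a subset of documents)
3. Script-based avg (all documents)
4. Script-based avg (a subset of documents)

2. Max (Maximum)

Computes the maximum of the numeric values extracted from the aggregated documents.

(1) All documents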

POST product_list_info/_search
{
    "size": 0,
    "aggs": {
        "result": {
            "max": {
                "field": "price"
            }
        }
    }
}

Result: "value" : 9.9999999E7

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "result": {
                "max": {
                    "field": "price"
                }
            }
        }
    }
}

(2) Filtered documents

POST product_list_info/_search
{
    "size": 0,
    "query": {
        "term": {
            "onelevel": "手机通讯"
        }
    },
    "aggs": {
        "result": {
            "max": {
                "field": "price"
            }
        }
    }
}

Result: "value" : 2474000.0

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "query": {
            "term": {
                "onelevel": "手机通讯"
            }
        },
        "aggs": {
            "czbk": {
                "max": {
                    "field": "price"
                }
            }
        }
    }
}

Result: "value" : 2474000.0

3. Min (Minimum)

Computes the minimum of the numeric values extracted from the aggregated documents.

(1) All documents

POST product_list_info/_search
{
    "size": 0,
    "aggs": {
        "result": {
            "min": {
                "field": "price"
            }
        }
    }
}

Result: "value": 0.0

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "result": {
                "min": {
                    "field": "price"
                }
            }
        }
    }
}

(2) Filtered documents

POST product_list_info/_search
{
    "size": 1,
    "query": {
        "term": {
            "onelevel": "手机通讯"
        }
    },
    "aggs": {
        "czbk": {
            "min": {
                "field": "price"
            }
        }
    }
}

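Result: "value": 0.0

With size set to 1, the response also returns one matching document next to the aggregation result. That hit is simply the first match, so to see a document whose price is 0 you would additionally need to sort by price in ascending order.

OpenAPI query parameter design: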

{
    "indexName": "product_list_info",
    "map": {
        "size": 1,
        "query": {
            "term": {
                "onelevel": "手机通讯"
            }
        },
        "aggs": {
            "result": {
                "min": {
                    "field": "price"
                }
            }
        }
    }
}

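4. Sum (Total)

(1) Sum over all documents

Note that the example below restricts the sum to documents whose threelevel field matches 手机; drop the query clause to sum over every document in the index.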

POST product_list_info/_search
{
    "size": 0,
    "query": {
        "constant_score": {
            "filter": {
                "match": {
                    "threelevel": "手机"
                }
            }
        }
    },
    "aggs": {
        "result": {
            "sum": {
                "field": "price"
            }
        }
    }
}

Result: "value" : 3.433611809E7

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "query": {
            "constant_score": {
                "filter": {
                    "match": {
                        "threelevel": "手机"
                    }
                }
            }
        },
        "aggs": {
            "result": {
                "sum": {
                    "field": "price"
                }
            }
        }
    }
}

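5. Cardinality (Distinct Values)

Cardinality Aggregation. It is a single-value metric that, based on a value from each document (a specific field, or a value computed by a script), counts the number of distinct values (a deduplicated count), comparable to COUNT(DISTINCT ...) in SQL. The count is approximate, since ES computes it with the HyperLogLog++ algorithm.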
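
As a point of comparison, an exact distinct count can be sketched in plain Java with a HashSet. The class DistinctCountSketch and the sample store names are hypothetical. This is not how ES works internally: the cardinality aggregation trades exactness for bounded memory, whereas the set below keeps every distinct value in memory.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical illustration of an exact COUNT(DISTINCT ...) in memory.
final class DistinctCountSketch {

    // Counts distinct store names exactly by collecting them into a set.
    static long distinctCount(List<String> storeNames) {
        return new HashSet<>(storeNames).size();
    }

    public static void main(String[] args) {
        List<String> storeNames = List.of("store-a", "store-b", "store-a"); // made-up data
        System.out.println(distinctCount(storeNames)); // prints 2
    }
}
```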

(1) All documents

POST product_list_info/_search
{
    "size": 0,
    "aggs": {
        "result": {
            "cardinality": {
                "field": "storename"
            }
        }
    }
}

Result: "value" : 103169

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "result": {
                "cardinality": {
                    "field": "storename"
                }
            }
        }
    }
}

(2) Filtered documents

POST product_list_info/_search
{
    "size": 0,
    "query": {
        "constant_score": {
            "filter": {
                "match": {
                    "threelevel": "手机"
                }
            }
        }
    },
    "aggs": {
        "result": {
            "cardinality": {
                "field": "storename"
            }
        }
    }
}

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "query": {
            "constant_score": {
                "filter": {
                    "match": {
                        "threelevel": "手机"
                    }
                }
            }
        },
        "aggs": {
            "result": {
                "cardinality": {
                    "field": "storename"
                }
            }
        }
    }
}

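III. Multi-Value Analysis API Design

1. Stats Aggregation

Stats Aggregation. It is a multi-value aggregation that, based on a value from each document (a specific numeric field, or a value computed by a script), returns five statistics: min, max, sum, count, and avg.

(1) All documents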

POST product_list_info/_search
{
    "size": 0,
    "aggs": {
        "result": {
            "stats": {
                "field": "price"
            }
        }
    }
}

Response:
"aggregations" : {
	"result" : {
		"count" : 5072447,
		"min" : 0.0,
		"max" : 9.9999999E7,
		"avg" : 920.1537270512633,
		"sum" : 4.66743101232E9
	}
}

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "result": {
                "stats": {
                    "field": "price"
                }
            }
        }
    }
}

(2) Filtered documents

POST product_list_info/_search
{
    "size": 0,
    "query": {
        "constant_score": {
            "filter": {
                "match": {
                    "threelevel": "手机"
                }
            }
        }
    },
    "aggs": {
        "result": {
            "stats": {
                "field": "price"
            }
        }
    }
}

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "query": {
            "constant_score": {
                "filter": {
                    "match": {
                        "threelevel": "手机"
                    }
                }
            }
        },
        "aggs": {
            "result": {
                "stats": {
                    "field": "price"
                }
            }
        }
    }
}

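2. Extended Stats

Extended Stats Aggregation. It is a multi-value aggregation that adds four results on top of stats: the sum of squares, the variance, the standard deviation, and the interval of the mean plus/minus two standard deviations.

(1) All documents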

POST product_list_info/_search
{
	"size": 0,
	"aggs": {
		"result": {
			"extended_stats": {
				"field": "price"
			}
		}
	}
}
Response:
"aggregations" : {
	"result" : {
		"count" : 5072447,
		"min" : 0.0,
		"max" : 9.9999999E7,
		"avg" : 920.1537270512633,
		"sum" : 4.66743101232E9,
		"sum_of_squares" : 2.0182209054045464E16,
		"variance" : 3.9779448262354884E9,
		"std_deviation" : 63070.950731977144,
		"std_deviation_bounds" : {
			"upper" : 127062.05519100555,
			"lower" : -125221.74773690302
		}
	}
}

sum_of_squares: sum of squares
variance: variance
std_deviation: standard deviation
std_deviation_bounds: interval of the mean plus/minus two standard deviations
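
How these extra fields relate to each other can be sketched in plain Java. The class ExtendedStatsSketch is a hypothetical name, and the formulas are an assumption that happens to be consistent with the sample response: variance as sum_of_squares / count minus avg squared (the population variance), and bounds as avg plus/minus two standard deviations.

```java
import java.util.List;

// Hypothetical in-memory illustration of the extended_stats fields.
final class ExtendedStatsSketch {
    final long count;
    final double sum, sumOfSquares, avg, variance, stdDeviation, upper, lower;

    // For an empty list the derived values are NaN, since the count is 0.
    ExtendedStatsSketch(List<Double> values) {
        double s = 0, sq = 0;
        for (double v : values) {
            s += v;       // running sum
            sq += v * v;  // running sum of squares
        }
        count = values.size();
        sum = s;
        sumOfSquares = sq;
        avg = s / count;
        variance = sq / count - avg * avg;   // population variance
        stdDeviation = Math.sqrt(variance);
        upper = avg + 2 * stdDeviation;      // std_deviation_bounds.upper
        lower = avg - 2 * stdDeviation;      // std_deviation_bounds.lower
    }

    public static void main(String[] args) {
        ExtendedStatsSketch s = new ExtendedStatsSketch(List.of(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0));
        System.out.println(s.variance + " " + s.stdDeviation + " " + s.lower + " " + s.upper);
    }
}
```

A negative lower bound, as in the response, is expected when a few very large prices inflate the standard deviation.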

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "result": {
                "extended_stats": {
                    "field": "price"
                }
            }
        }
    }
}

(2) Filtered documents

POST product_list_info/_search
{
    "size": 1,
    "query": {
        "constant_score": {
            "filter": {
                "match": {
                    "threelevel": "手机"
                }
            }
        }
    },
    "aggs": {
        "result": {
            "extended_stats": {
                "field": "price"
            }
        }
    }
}

Result:
"aggregations" : {
	"result" : {
		"count" : 12402,
		"min" : 0.0,
		"max" : 2474000.0,
		"avg" : 2768.595233833253,
		"sum" : 3.433611809E7,
		"sum_of_squares" : 6.445447222627729E12,
		"variance" : 5.120451870452684E8,
		"std_deviation" : 22628.41547800615,
		"std_deviation_bounds" : {
			"upper" : 48025.42618984555,
			"lower" : -42488.23572217905
		}
	}
}

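sum_of_squares: sum of squares
variance: variance
std_deviation: standard deviation
std_deviation_bounds: interval of the mean plus/minus two standard deviations

OpenAPI query parameter design: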

{
    "indexName": "product_list_info",
    "map": {
        "size": 1,
        "query": {
            "constant_score": {
                "filter": {
                    "match": {
                        "threelevel": "手机"
                    }
                }
            }
        },
        "aggs": {
            "czbk": {
                "extended_stats": {
                    "field": "price"
                }
            }
        }
    }
}

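3. Percentiles

Percentiles Aggregation. It is a multi-value aggregation: the values of the given field (or script) are ordered from smallest to largest, the cumulative share of documents at each value is tracked (as a percentage of all matching documents), and the value found at each requested percentage is returned. By default it returns the values at the [1, 5, 25, 50, 75, 95, 99] percentiles.

These are the percentiles people are most commonly interested in.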
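
The idea of "the value found at a given percentage" can be sketched in plain Java with the simple nearest-rank method. This is an assumption made for illustration: PercentileSketch is a hypothetical class, and ES itself uses an approximate algorithm (TDigest by default) with interpolation, so its results on real data will generally differ from this exact in-memory version.

```java
import java.util.Arrays;

// Hypothetical nearest-rank percentile over an in-memory array.
final class PercentileSketch {

    // Returns the smallest value such that at least `percent` percent of the values are <= it.
    static double percentile(double[] values, double percent) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);                                    // ascending order
        int rank = (int) Math.ceil(percent / 100.0 * sorted.length);
        int index = Math.min(Math.max(rank - 1, 0), sorted.length - 1);
        return sorted[index];
    }

    public static void main(String[] args) {
        double[] prices = {50, 10, 40, 20, 30}; // made-up sample data
        System.out.println(percentile(prices, 50)); // prints 30.0
        System.out.println(percentile(prices, 95)); // prints 50.0
    }
}
```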

(1) All documents

POST product_list_info/_search
{
    "size": 0,
    "aggs": {
        "result": {
            "percentiles": {
                "field": "price"
            }
        }
    }
}

Response:
"aggregations" : {
	"result" : {
		"values" : {
			"1.0" : 0.0,
			"5.0" : 15.021825109603165,
			"25.0" : 58.669333121791,
			"50.0" : 139.7398105623917,
			"75.0" : 388.2363222057536,
			"95.0" : 3630.78148822216,
			"99.0" : 12561.562823894474
		}
	}
}

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "result": {
                "percentiles": {
                    "field": "price"
                }
            }
        }
    }
}

(2) Filtered documents

POST product_list_info/_search
{
    "size": 0,
    "query": {
        "constant_score": {
            "filter": {
                "match": {
                    "threelevel": "手机"
                }
            }
        }
    },
    "aggs": {
        "result": {
            "percentiles": {
                "field": "price"
            }
        }
    }
}

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "query": {
            "constant_score": {
                "filter": {
                    "match": {
                        "threelevel": "手机"
                    }
                }
            }
        },
        "aggs": {
            "result": {
                "percentiles": {
                    "field": "price"
                }
            }
        }
    }
}

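4. Percentile Ranks

Percentile ranks aggregation: percentile_ranks is a closely related metric. The percentiles metric tells you the value below which a given percentage of documents fall; percentile_ranks answers the inverse question, namely what percentage of documents fall at or below a given value.

(1) All documents

Compute the percentage of documents priced within 15 yuan and within 30 yuan.

Tips:
The statistics change as the data changes.
The 15 and 30 here play the same role as the 200 in an SLA check; only the field being compared differs.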

POST product_list_info/_search
{
    "size": 0,
    "aggs": {
        "result": {
            "percentile_ranks": {
                "field": "price",
                "values": [
                    15,
                    30
                ]
            }
        }
    }
}

Response:
Documents priced within 15 yuan make up 4.92% of the total.
Documents priced within 30 yuan make up 12.72% of the total.
"aggregations" : {
	"result" : {
		"values" : {
			"15.0" : 4.92128378837021,
			"30.0" : 12.724827959646579
		}
	}
}
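
What percentile_ranks reports can be reproduced exactly on a small in-memory array: the share of values at or below a threshold. PercentileRankSketch and the sample prices are hypothetical, and ES computes its answer approximately, so this sketch illustrates the meaning of the numbers rather than the ES algorithm.

```java
import java.util.Arrays;

// Hypothetical exact percentile rank over an in-memory array.
final class PercentileRankSketch {

    // Percentage of values that are <= threshold (NaN for an empty array).
    static double percentileRank(double[] values, double threshold) {
        long atOrBelow = Arrays.stream(values).filter(v -> v <= threshold).count();
        return 100.0 * atOrBelow / values.length;
    }

    public static void main(String[] args) {
        double[] prices = {5, 10, 15, 20, 40}; // made-up sample data
        System.out.println(percentileRank(prices, 15)); // prints 60.0
        System.out.println(percentileRank(prices, 30)); // prints 80.0
    }
}
```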

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "aggs": {
            "result": {
                "percentile_ranks": {
                    "field": "price",
                    "values": [
                        15,
                        30
                    ]
                }
            }
        }
    }
}

(2) Filtered documents

POST product_list_info/_search
{
    "size": 0,
    "query": {
        "constant_score": {
            "filter": {
                "match": {
                    "threelevel": "手机"
                }
            }
        }
    },
    "aggs": {
        "result": {
            "percentile_ranks": {
                "field": "price",
                "values": [
                    15,
                    30
                ]
            }
        }
    }
}

OpenAPI query parameter design:

{
    "indexName": "product_list_info",
    "map": {
        "size": 0,
        "query": {
            "constant_score": {
                "filter": {
                    "match": {
                        "threelevel": "手机"
                    }
                }
            }
        },
        "aggs": {
            "result": {
                "percentile_ranks": {
                    "field": "price",
                    "values": [
                        15,
                        30
                    ]
                }
            }
        }
    }
}

IV. Java API Implementation

Call the metricAgg method, passing a CommonEntity.

/*
 * @Description: Metric aggregation (Open API)
 * @Method: metricAgg
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: java.util.Map<java.lang.Object,java.lang.Object>
 *
 */
public Map<Object, Object> metricAgg(CommonEntity commonEntity) throws Exception {
    // Shared query call; the request parameters are templated
    SearchResponse response = getSearchResponse(commonEntity);
    // Result container
    Map<Object, Object> map = new HashMap<Object, Object>();
    // We could return the ParsedAggregation directly and skip the instanceof checks, but the payload would
    // carry many extra fields and callers would have to hard-code keys; iterating the map reads the key dynamically
    Map<String, Aggregation> aggregationMap = response.getAggregations().asMap();
    // Store the response in a thread-local variable
    SearchTools.setResponseThreadLocal(response);
    // Loop over the aggregations to pick up the client-supplied name (e.g. "result") dynamically
    for (Map.Entry<String, Aggregation> m : aggregationMap.entrySet()) {
        // Convert the metric aggregation result
        metricResultConverter(map, m);

    }
    // Shared post-processing
    mbCommonConverter(map);
    return map;
}
/*
 * @Description: Shared query call with templated parameters
 * @Method: getSearchResponse
 * @Param: [commonEntity]
 * @Update:
 * @since: 1.0.0
 * @Return: org.elasticsearch.action.search.SearchResponse
 *
 */
private SearchResponse getSearchResponse(CommonEntity commonEntity) throws Exception {
    // Create the search request
    SearchRequest searchRequest = new SearchRequest();
    // Specify which index to search
    searchRequest.indices(commonEntity.getIndexName());
    // Create the source builder that holds the query conditions
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // Turn the DSL sent by the client into an XContentParser
    XContentParser parser = SearchTools.getXContentParser(commonEntity);
    // Parse the DSL into the source builder
    sourceBuilder.parseXContent(parser);
    // Attach the source builder to the search request
    searchRequest.source(sourceBuilder);
    // Execute the search
    SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
    return response;
}
/*
 * @Description: Converter for metric aggregation results
 * @Method: metricResultConverter
 * @Param: [map, m]
 * @Update:
 * @since: 1.0.0
 * @Return: void
 *
 */
private void metricResultConverter(Map<Object, Object> map, Map.Entry<String, Aggregation> m) {
    // Average
    if (m.getValue() instanceof ParsedAvg) {
        map.put("value", ((ParsedAvg) m.getValue()).getValue());
    }
    // Maximum
    else if (m.getValue() instanceof ParsedMax) {
        map.put("value", ((ParsedMax) m.getValue()).getValue());
    }
    // Minimum
    else if (m.getValue() instanceof ParsedMin) {
        map.put("value", ((ParsedMin) m.getValue()).getValue());
    }
    // Sum
    else if (m.getValue() instanceof ParsedSum) {
        map.put("value", ((ParsedSum) m.getValue()).getValue());
    }
    // Distinct count
    else if (m.getValue() instanceof ParsedCardinality) {
        map.put("value", ((ParsedCardinality) m.getValue()).getValue());
    }
    // Extended stats (checked before ParsedStats because ParsedExtendedStats is a subclass of ParsedStats)
    else if (m.getValue() instanceof ParsedExtendedStats) {
        ParsedExtendedStats stats = (ParsedExtendedStats) m.getValue();
        map.put("count", stats.getCount());
        map.put("min", stats.getMin());
        map.put("max", stats.getMax());
        map.put("avg", stats.getAvg());
        map.put("sum", stats.getSum());
        map.put("sum_of_squares", stats.getSumOfSquares());
        map.put("variance", stats.getVariance());
        map.put("std_deviation", stats.getStdDeviation());
        map.put("lower", stats.getStdDeviationBound(ExtendedStats.Bounds.LOWER));
        map.put("upper", stats.getStdDeviationBound(ExtendedStats.Bounds.UPPER));
    }
    // Stats
    else if (m.getValue() instanceof ParsedStats) {
        ParsedStats stats = (ParsedStats) m.getValue();
        map.put("count", stats.getCount());
        map.put("min", stats.getMin());
        map.put("max", stats.getMax());
        map.put("avg", stats.getAvg());
        map.put("sum", stats.getSum());
    }

    // Percentile ranks: the key is the queried value, the value is its percentage
    else if (m.getValue() instanceof ParsedTDigestPercentileRanks) {
        for (Iterator<Percentile> iterator = ((ParsedTDigestPercentileRanks) m.getValue()).iterator(); iterator.hasNext(); ) {
            Percentile p = iterator.next();
            map.put(p.getValue(), p.getPercent());
        }
    }
    // Percentiles: the key is the percentage, the value is the value at that percentile
    else if (m.getValue() instanceof ParsedTDigestPercentiles) {
        for (Iterator<Percentile> iterator = ((ParsedTDigestPercentiles) m.getValue()).iterator(); iterator.hasNext(); ) {
            Percentile p = iterator.next();
            map.put(p.getPercent(), p.getValue());

        }
    }


}

/*
 * @Description: Shared post-processing (metric and bucket aggregations)
 * @Method: mbCommonConverter
 * @Param: [map]
 * @Update:
 * @since: 1.0.0
 * @Return: void
 *
 */
private void mbCommonConverter(Map<Object, Object> map) {
    if (!CollectionUtils.isEmpty(ResponseThreadLocal.get())) {
        // Read the data stored for the current thread
        map.put("list", ResponseThreadLocal.get());
        // Clear the thread-local data to prevent memory leaks
        ResponseThreadLocal.clear();
    }

}
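
One thing worth considering in the thread-local handling: the method above clears the data only when the stored collection is non-empty, and an exception thrown earlier in the request would skip the cleanup altogether. A common alternative is to clear in a finally block. The sketch below shows that pattern in isolation; ResponseHolderSketch and its methods are hypothetical stand-ins, not the article's ResponseThreadLocal or SearchTools classes, whose source is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a per-thread response holder.
final class ResponseHolderSketch {
    private static final ThreadLocal<List<Object>> HITS = ThreadLocal.withInitial(ArrayList::new);

    // Stores one hit for the current thread.
    static void add(Object hit) {
        HITS.get().add(hit);
    }

    // Returns a copy so callers keep their data after the thread-local is cleared.
    static List<Object> snapshot() {
        return new ArrayList<>(HITS.get());
    }

    // Removes the current thread's value entirely.
    static void clear() {
        HITS.remove();
    }

    // Runs the work and always clears the thread-local afterwards, even if the work throws.
    static <T> T withCleanup(Supplier<T> work) {
        try {
            return work.get();
        } finally {
            clear();
        }
    }

    public static void main(String[] args) {
        List<Object> result = withCleanup(() -> {
            add("doc-1");
            return snapshot();
        });
        System.out.println(result);     // prints [doc-1]
        System.out.println(snapshot()); // prints [] because the holder was cleared
    }
}
```

With pooled request threads, unconditional cleanup keeps one request's data from leaking into the next request served by the same thread.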