Elasticsearch Single-String Multi-Field Queries: Dis Max Query Explained
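© Copyright belongs to the author: this is an original work by 51CTO blog author ghostwritten. Contact the author for permission to reprint; unauthorized reprints may be subject to legal action.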
1. Single-string query
DEMO
PUT /blogs/_doc/1
{
"title": "Quick brown rabbits",
"body": "Brown rabbits are commonly seen."
}
PUT /blogs/_doc/2
{
"title": "Keeping pets healthy",
"body": "My quick brown fox eats rabbits on a regular basis."
}
// Query
POST /blogs/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"title": "Brown fox"
}
},
{
"match": {
"body": "Brown fox"
}
}
]
}
}
}
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.90425634,
"hits" : [
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.90425634, // both fields of this document contain "brown"
"_source" : {
"title" : "Quick brown rabbits",
"body" : "Brown rabbits are commonly seen."
}
},
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.77041256,
"_source" : {
"title" : "Keeping pets healthy",
"body" : "My quick brown fox eats rabbits on a regular basis."
}
}
]
}
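2. Scoring process
- Run both queries in the should clause
- Add up the scores of the two queries
- Multiply by the number of matching clauses
- Divide by the total number of clauses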
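The four steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as listed, not Elasticsearch's actual implementation: the function name `bool_should_score`, the `use_coord` flag, and the per-clause scores are hypothetical. As far as I know, the "multiply by matching clauses, divide by total clauses" factor (the coordination factor) was dropped in Lucene 7, so recent Elasticsearch versions effectively stop after the sum; the flag lets you compare both behaviours.

```python
def bool_should_score(clause_scores, use_coord=True):
    """Combine per-clause scores the way the steps above describe.

    clause_scores has one entry per should clause: a float if the clause
    matched the document, or None if it did not match.
    """
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    total = sum(matched)  # add up the scores of the matching clauses
    if not use_coord:
        return total      # assumed behaviour without the coordination factor
    # multiply by the number of matching clauses, divide by all clauses
    return total * len(matched) / len(clause_scores)


# Hypothetical scores, not taken from the response above.
print(bool_should_score([0.7, 0.2]))    # title and body both matched
print(bool_should_score([None, 0.8]))   # only body matched
print(bool_should_score([None, 0.8], use_coord=False))
```

With the coordination factor, a document that matches weakly in both fields can outrank a document that matches strongly in only one, which is the problem the next section addresses.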
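3. Disjunction Max Query
In the example above, the title and body fields compete with each other.
- Their scores should not simply be added together; instead, the score of the single best-matching field should be used.
Disjunction Max Query
- Returns every document that matches any of the queries, and uses the score of the best-matching field as the document's score.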
POST /blogs/_search
{
"query": {
"dis_max": {
"queries": [
{
"match": {
"title": "Quick fox"
}
},
{
"match": {
"body": "Quick fox"
}
}
]
}
}
}
Result
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.2199391,
"hits" : [
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.2199391,
"_source" : {
"title" : "Keeping pets healthy",
"body" : "My quick brown fox eats rabbits on a regular basis."
}
},
{
"_index" : "blogs",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.6931472,
"_source" : {
"title" : "Quick brown rabbits",
"body" : "Brown rabbits are commonly seen."
}
}
]
}
}
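To make the result above easier to read, here is a minimal Python sketch of how a dis_max query without tie_breaker picks a score. The function name `dis_max_score` is hypothetical. The first two calls use the scores from the response above: for "Quick fox", document 2 matches only in body and document 1 matches only in title, so each document's score is simply its one matching clause. The last call uses made-up scores to show the difference from summing.

```python
def dis_max_score(clause_scores):
    """Return the best single clause score, ignoring all other clauses.

    clause_scores has one entry per query in "queries": a float if that
    query matched the document, or None if it did not match.
    """
    matched = [s for s in clause_scores if s is not None]
    return max(matched) if matched else 0.0


# [title score, body score]
doc_2 = dis_max_score([None, 1.2199391])   # only body matches "Quick fox"
doc_1 = dis_max_score([0.6931472, None])   # only title matches "Quick"
print(doc_2, doc_1)

# Hypothetical document matching both fields: only the best field counts.
print(dis_max_score([0.5, 0.4]))
```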
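4. Tuning the best-field query
In some cases, a document that matches both the title and body fields is more relevant than a document that matches only one of them.
However, the disjunction max query simply uses the _score of the single best-matching clause as the overall score.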
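5. Adjusting with the tie_breaker parameter
- Take the score of the best-matching clause
- Multiply the score of every other matching clause by tie_breaker
- Sum the scores above and normalize
- tie_breaker is a floating-point number between 0 and 1. 0 means only the best match is used; 1 means all clauses are equally important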
POST blogs/_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "title": "Quick pets" }},
{ "match": { "body": "Quick pets" }}
],
"tie_breaker": 0.2
}
}
}
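The tie_breaker steps listed in this section can be sketched as follows. This is an illustration under stated assumptions rather than the engine's implementation: the function name `dis_max_with_tie_breaker` and the example scores are hypothetical, and the normalization step is deliberately left out because the text does not say how it is computed.

```python
def dis_max_with_tie_breaker(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the other matching scores.

    clause_scores has one entry per query in "queries": a float if that
    query matched the document, or None if it did not match.
    Normalization is omitted in this sketch.
    """
    if not 0.0 <= tie_breaker <= 1.0:
        raise ValueError("tie_breaker must be between 0 and 1")
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    best = max(matched)            # score of the best-matching clause
    others = sum(matched) - best   # combined score of the remaining clauses
    return best + tie_breaker * others


# Hypothetical [title score, body score] for one document.
scores = [0.6, 0.3]
print(dis_max_with_tie_breaker(scores, 0.0))  # best match only
print(dis_max_with_tie_breaker(scores, 0.2))  # value used in the query above
print(dis_max_with_tie_breaker(scores, 1.0))  # all clauses count equally
```

A small value such as 0.2 keeps the best field dominant while still letting a document that matches in both fields edge ahead of one that matches in only one.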
References:
Geek Time (极客时间): Elasticsearch核心技术与实战 (Elasticsearch Core Technology and Practice)