Elasticsearch 单字符串多字段查询: multi_match详解

原创

ghostwritten 2022-07-10 00:44:35 博主文章分类：elk ©著作权

文章标签 elasticsearch es 字段 analyzer 文章分类 虚拟化云计算

©著作权归作者所有：来自51CTO博客作者ghostwritten的原创作品，请联系作者获取转载授权，否则将追究法律责任

文章目录

1. 三种场景
2. Multi Match Query
3. 查询案例
4. 重新设置 mapping
5. 使用多字段匹配解决
6. 跨字段搜索

1. 三种场景

最佳字段（Best Fields）
当字段之间相互竞争，又相互关联。例如 title 和 body 这样的字段，评分来自最匹配字段
多数字段（Most Fields）
处理英文内容时：一种常见的手段是，在主字段（English
Analyzer），抽取词干，加入同义词，以匹配更多的文档。相同的文本，加入子字段（Standard Analyzer），以提供更加精确的匹配。其他字段作为匹配文档提高性相关度的信号。匹配字段越多越好
混合字段（Cross Field）
对于某些实体，例如人名，地址，图书信息。需要在多个字段中确定信息，单个字段只能作为整体的一部分。希望在任何这些列出的字段中尽可能找出多的词

2. Multi Match Query

Best Fields 是默认类型，可不指定
Minimum should match 等参数可以传递到生成的 query 中

POST blogs/_search
{
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "Quick pets",
      "fields": ["title","body"],
      "tie_breaker": 0.2,
      "minimum_should_match": "20%"
    }
  }
}

3. 查询案例

PUT /titles
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}

POST titles/_bulk
{"index":{"_id":1}}
{"title":"My dog barks"}
{"index":{"_id":2}}
{"title":"I see a lot of barking dogs on the road "}

GET titles/_search
{
  "query": {
    "match": {
      "title": "barking dogs"
    }
  }
}
//结果 因为是english 分词 ，且短 则 id 排第一个
"hits" : [
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.24399278,
        "_source" : {
          "title" : "My dog barks"
        }
      },
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.1854345,
        "_source" : {
          "title" : "I see a lot of barking dogs on the road "
        }
      }
    ]

4. 重新设置 mapping

DELETE titles

PUT /titles
{
  "mappings": {
    "properties": {
      "title":{
        "type": "text",
        "analyzer": "english",
        "fields": {
          "std":{
            "type":"text",
            "analyzer":"standard"
          }
        }
      }
    }
  }
}
POST titles/_bulk
{"index":{"_id":1}}
{"title":"My dog barks"}
{"index":{"_id":2}}
{"title":"I see a lot of barking dogs on the road "}
//multi_match 查询
GET titles/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "type": "most_fields", //默认是best_fields
      "fields": ["title","title.std"]//累计叠加
    }
  }
}
//返回
"hits" : [
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.4569323,
        "_source" : {
          "title" : "I see a lot of barking dogs on the road "
        }
      },
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.42221838,
        "_source" : {
          "title" : "My dog barks"
        }
      }
    ]

5. 使用多字段匹配解决

用广度匹配字段 title 包括尽可能多的文档 - 以提高召回率，同时又使用字段 title.std
作为信息将相关度更高的文档结至于文档顶部
每个字段对于最终评分的贡献可以通过自定义值 boost 来控制。比如，使 title 字段更为重要，这样同时也降低了其他信号字段的作用

GET titles/_search
{
  "query": {
    "multi_match": {
      "query": "barking dogs",
      "type": "most_fields", 
      "fields": ["title^10","title.std"]
    }
  }
}

结果

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 4.344906,
    "hits" : [
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 4.344906,
        "_source" : {
          "title" : "I see a lot of barking dogs on the road "
        }
      },
      {
        "_index" : "titles",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 4.2221837,
        "_source" : {
          "title" : "My dog barks"
        }
      }
    ]
  }
}

6. 跨字段搜索

most_fields 无法使用 opeartor
可以用copy_to 解决，但是需要额外的储存空间
cross_fields 可以支持 operator
与copy_to 相比，其中一个优势就是可以在搜索时为某个字段提升权重

PUT address/_doc/1
{
  "street":"5 Poland Street",
  "city" : "Lodon",
  "country":"United Kingdom",
  "postcode" : "W1V 3DG"
}

POST address/_search
{
  "query":{
    "multi_match": {
      "query": "Poland Street W1V",
      "type": "cross_fields",  //most_fields查询为空
      "operator": "and", 
      "fields": ["street","city","country","postcode"]
    }
  }
}
"hits" : [
      {
        "_index" : "address",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.8630463,
        "_source" : {
          "street" : "5 Poland Street",
          "city" : "Lodon",
          "country" : "United Kingdom",
          "postcode" : "W1V 3DG"
        }
      }
    ]