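Elasticsearch search cannot find special characters

Reposted

mob64ca14048514 2024-05-21 19:37:22

Tags: ES search cannot find special characters, elasticsearch, ik, space, plus sign. Category: architecture, back-end development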

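One more note.

When you define custom terms that should not be split, they must not contain a space (" "). Otherwise IK splits the term at the space. For example, if you define both "蚂蚁 搬家" (with a space) and "蚂蚁搬家" (without one), the first is split and the second is not.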
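A quick way to catch this before loading a custom dictionary is to scan the entries for whitespace. The following is a minimal sketch; the function name `find_entries_with_whitespace` and the sample entries are hypothetical, and it only checks the entries themselves, it does not call IK.

```python
def find_entries_with_whitespace(entries):
    """Return the dictionary entries that contain any whitespace character."""
    return [entry for entry in entries if any(ch.isspace() for ch in entry)]


# Hypothetical dictionary entries, one custom term per item.
entries = ["蚂蚁搬家", "蚂蚁 搬家", "c++"]

# Entries reported here would be split by IK at the space.
print(find_entries_with_whitespace(entries))  # ['蚂蚁 搬家']
```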

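In practice, IK also ignores terms such as "c++". One workaround is to replace these symbols with words, both when creating the index and when analyzing the search text.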
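The replacement workaround could look roughly like the sketch below. The table `SYMBOL_WORDS`, the placeholder words, and the function name `normalize_symbols` are assumptions for illustration; the same function would have to be applied to documents before indexing and to query strings before searching, and how IK then tokenizes the placeholder word is not verified here.

```python
import re

# Hypothetical replacement table; the placeholder words are illustrative only.
SYMBOL_WORDS = {
    "c++": "cplusplus",
    "c#": "csharp",
}

# Longest keys first so a longer term is never shadowed by a shorter one.
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(SYMBOL_WORDS, key=len, reverse=True)),
    re.IGNORECASE,
)


def normalize_symbols(text):
    """Replace symbol-bearing terms with plain-word placeholders."""
    return _PATTERN.sub(lambda m: SYMBOL_WORDS[m.group(0).lower()], text)


# Apply the same normalization at index time and at search time.
print(normalize_symbols("C++服务器开发工程师"))  # cplusplus服务器开发工程师
```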

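I later found that you can also add c++ to ik/custom/my.dic, so that c++ is no longer ignored. It is still split off, however: even if you define "c++服务器开发工程师" as a term that should not be split, it is still split into "c++" and other tokens.

When I created the index, the mapping was as follows: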

{
     "C_INDEX_APPKEYWORD": {
         "_all": {
             "analyzer": "ik_syno",
             "search_analyzer": "ik_smart",
             "term_vector": "no",
             "store": "false"
         },
         "properties": {
             "keyword": {
                 "type": "string",
                 "analyzer": "ik_smart",
                 "search_analyzer": "ik_smart"
             }
         }
     }
 }

My search request was as follows:

{
   "from" : 0,
   "size" : 1000,
   "query" : {
     "bool" : {
       "must" : [{
         "query_string" : {
           "query" : "网络",
           "fields" : [ "keyword" ],
           "analyzer" : "ik_smart"
         }
       } ]
     }
   }
 }

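As a result, searching _all returned results that differed greatly from searching the "keyword" field explicitly. After asking around, I finally found the answer. Look at the analyzer settings in the mapping above (highlighted in red in the original post). In effect:

ik_syno = ik_max_word

ik_syno_smart = ik_smart

In other words, the mapping specifies analyzers of two different granularities: _all is indexed with ik_syno, whose tokenizer is ik_max_word, while keyword is indexed with ik_smart. The results are therefore bound to differ. Two references I found and one piece of official documentation follow.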
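To see why the granularity at index time matters, here is a toy inverted index in plain Python. It does not call Elasticsearch or IK: the token lists are copied from the IK documentation example for 「中华人民共和国国歌」 quoted further down in this article, and the assumption that the query 「人民」 is analyzed into a single token is mine.

```python
# Token lists copied from the IK documentation example quoted in this article.
MAX_WORD_TOKENS = ["中华人民共和国", "中华人民", "中华", "华人", "人民共和国",
                   "人民", "人", "民", "共和国", "共和", "和", "国国", "国歌"]
SMART_TOKENS = ["中华人民共和国", "国歌"]


def build_index(docs):
    """Build a toy inverted index: token -> set of document ids."""
    index = {}
    for doc_id, tokens in docs.items():
        for token in tokens:
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query_tokens):
    """Return the ids of documents that contain every query token."""
    hits = [index.get(token, set()) for token in query_tokens]
    return set.intersection(*hits) if hits else set()


fine_index = build_index({1: MAX_WORD_TOKENS})   # like a field indexed with ik_max_word
coarse_index = build_index({1: SMART_TOKENS})    # like a field indexed with ik_smart

query = ["人民"]  # assumption: search-time analysis yields this single token
print(search(fine_index, query))    # {1}
print(search(coarse_index, query))  # set()
```

The same document and the same query produce a hit in one index and none in the other, which is the kind of gap described above between _all and keyword.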

Official documentation:
https:///guide/cn/elasticsearch/guide/current/mapping-analysis.html#mapping-analysis

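Relationship between ik_syno and ik_syno_smart, and between ik_max_word and ik_smart

The elasticsearch-analysis-ik analyzer

2. Configure IK synonyms

Elasticsearch ships with a synonym filter named synonym. To make IK and synonym work together, we need to define new analyzers that use IK as the tokenizer and synonym as a filter. It sounds complicated, but all it takes is a block of configuration.

Open /config/elasticsearch.yml and add the following configuration: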

index:
  analysis:
    analyzer:
      ik_syno:
        type: custom
        tokenizer: ik_max_word
        filter: [my_synonym_filter]
      ik_syno_smart:
        type: custom
        tokenizer: ik_smart
        filter: [my_synonym_filter]
    filter:
      my_synonym_filter:
        type: synonym
        synonyms_path: analysis/synonym.txt

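The configuration above defines two new analyzers, ik_syno and ik_syno_smart, which correspond to IK's two tokenization strategies, ik_max_word and ik_smart. According to the IK documentation, they differ as follows:

ik_max_word: splits text at the finest granularity and exhausts every possible combination. For example, 「中华人民共和国国歌」 is split into 「中华人民共和国、中华人民、中华、华人、人民共和国、人民、人、民、共和国、共和、和、国国、国歌」.
ik_smart: splits text at the coarsest granularity. For example, 「中华人民共和国国歌」 is split into 「中华人民共和国、国歌」.

Both ik_syno and ik_syno_smart use the synonym filter to perform synonym conversion.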
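Conceptually, such a custom analyzer is a tokenizer followed by a filter. The sketch below models only the filter stage in plain Python: it takes the ik_smart tokens from the example above as given and expands them with a synonym group. The function name `apply_synonyms` and the synonym pair are hypothetical, and the real synonym filter also tracks token positions, which this toy model ignores.

```python
def apply_synonyms(tokens, synonym_groups):
    """Expand each token with the other members of its synonym group."""
    lookup = {}
    for group in synonym_groups:
        for word in group:
            lookup[word] = group

    expanded = []
    for token in tokens:
        # Tokens without a synonym group pass through unchanged.
        for word in lookup.get(token, [token]):
            if word not in expanded:
                expanded.append(word)
    return expanded


smart_tokens = ["中华人民共和国", "国歌"]  # ik_smart output quoted above
groups = [["国歌", "anthem"]]              # hypothetical synonym group

print(apply_synonyms(smart_tokens, groups))  # ['中华人民共和国', '国歌', 'anthem']
```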

3. Create the file /config/analysis/synonym.txt, enter some synonyms, and save it in UTF-8 encoding. (The example in the original post was an image and is not preserved.)
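As a rough illustration of producing such a file, the sketch below writes one comma-separated synonym group per line, which is the Solr-style format the synonym filter accepts, and encodes the file as UTF-8. The helper name `write_synonyms` and the synonym pairs are hypothetical, and the example writes to a temporary directory rather than to the real config path.

```python
import tempfile
from pathlib import Path


def write_synonyms(path, groups):
    """Write one comma-separated synonym group per line, encoded as UTF-8."""
    lines = [", ".join(group) for group in groups]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical synonym groups, for illustration only.
sample_groups = [["西红柿", "番茄"], ["土豆", "马铃薯"]]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "synonym.txt"
    write_synonyms(target, sample_groups)
    print(target.read_text(encoding="utf-8"))
```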

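The synonym configuration is now complete. Restart ES, and specify ik_syno or ik_syno_smart as the analyzer when searching.

Create the mapping by running the following curl command: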

curl -XPOST http://192.168.1.99:9200/goodsindex/goods/_mapping -d'{
  "goods": {
    "_all": {
      "enabled": true,
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word",
      "term_vector": "no",
      "store": "false"
    },
    "properties": {
      "title": {
        "type": "string",
        "term_vector": "with_positions_offsets",
        "analyzer": "ik_syno",
        "search_analyzer": "ik_syno"
      },
      "content": {
        "type": "string",
        "term_vector": "with_positions_offsets",
        "analyzer": "ik_syno",
        "search_analyzer": "ik_syno"
      },
      "tags": {
        "type": "string",
        "term_vector": "no",
        "analyzer": "ik_syno",
        "search_analyzer": "ik_syno"
      },
      "slug": {
        "type": "string",
        "term_vector": "no"
      },
      "update_date": {
        "type": "date",
        "term_vector": "no",
        "index": "no"
      }
    }
  }
}'

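The command above defines the field settings for the goods type in the goodsindex index. The title, content and tags fields use ik_syno as their analyzer, which means they are tokenized with ik_max_word and have the synonym filter applied. The slug field specifies no analyzer, so it uses the default one. The update_date field is not indexed.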
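Because title, content and tags repeat the same settings, the request body can also be assembled programmatically. The sketch below only builds and prints the JSON body and does not send it anywhere; the helper names `text_field` and `build_goods_mapping` are hypothetical, and the field settings are copied from the curl command above.

```python
import json


def text_field(analyzer, term_vector="no"):
    """Return a string-field definition that uses one analyzer for index and search."""
    return {
        "type": "string",
        "term_vector": term_vector,
        "analyzer": analyzer,
        "search_analyzer": analyzer,
    }


def build_goods_mapping():
    """Assemble the mapping body for the goods type shown above."""
    return {
        "goods": {
            "_all": {
                "enabled": True,
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word",
                "term_vector": "no",
                "store": "false",
            },
            "properties": {
                "title": text_field("ik_syno", "with_positions_offsets"),
                "content": text_field("ik_syno", "with_positions_offsets"),
                "tags": text_field("ik_syno"),
                # No analyzer given, so the default analyzer applies.
                "slug": {"type": "string", "term_vector": "no"},
                # "index": "no" means the field is not indexed.
                "update_date": {"type": "date", "term_vector": "no", "index": "no"},
            },
        }
    }


print(json.dumps(build_goods_mapping(), ensure_ascii=False, indent=2))
```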

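Elasticsearch custom index: rules for the _all setting

The _all field in Elasticsearch

By default, Elasticsearch defines a special field named '_all' for every indexed document. It automatically contains the content of one or more fields of the document. If a search does not specify which field to search, Elasticsearch searches the _all field. _all makes searching convenient, at the cost of extra CPU and storage overhead at index time.

By default, Elasticsearch adds every field of a document to _all for indexing. You can disable this with "_all" : {"enabled":false}. If you do not want a particular field to be added to _all, set "include_in_all":false on it. For example: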

{
   "person": {
      "_all": { "enabled": true },
      "properties": {
         "name": {
            "type": "object",
            "dynamic": false,
            "properties": {
               "first": {
                  "type": "string",
                  "store": true,
                  "include_in_all": false
               },
               "last": {
                  "type": "string",
                  "index": "not_analyzed"
               }
            }
         },
         "address": {
            "type": "object",
            "include_in_all": false,
            "properties": {
               "first": {
                  "properties": {
                     "location": {
                        "type": "string",
                        "store": true,
                        "index_name": "firstLocation"
                     }
                  }
               },
               "last": {
                  "properties": {
                     "location": {
                        "type": "string"
                     }
                  }
               }
            }
         },
         "simple1": {
            "type": "long",
            "include_in_all": true
         },
         "simple2": {
            "type": "long",
            "include_in_all": false
         }
      }
   }
}
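To make the effect of include_in_all easier to follow, the sketch below walks a trimmed copy of the mapping above and lists the leaf fields that would end up in _all. It is a toy model that assumes a setting on an object field is inherited by its sub-fields unless they override it; the function name `fields_in_all` is hypothetical and nothing here talks to Elasticsearch.

```python
def fields_in_all(properties, inherited=True, prefix=""):
    """List the leaf field paths whose values would be copied into _all."""
    included = []
    for name, spec in properties.items():
        path = prefix + name
        # A field's own setting wins; otherwise it inherits from its parent.
        flag = spec.get("include_in_all", inherited)
        if "properties" in spec:
            included += fields_in_all(spec["properties"], flag, path + ".")
        elif flag:
            included.append(path)
    return included


# Trimmed copy of the "person" properties from the mapping above.
PERSON_PROPERTIES = {
    "name": {
        "type": "object",
        "properties": {
            "first": {"type": "string", "include_in_all": False},
            "last": {"type": "string", "index": "not_analyzed"},
        },
    },
    "address": {
        "type": "object",
        "include_in_all": False,
        "properties": {
            "first": {"properties": {"location": {"type": "string"}}},
            "last": {"properties": {"location": {"type": "string"}}},
        },
    },
    "simple1": {"type": "long", "include_in_all": True},
    "simple2": {"type": "long", "include_in_all": False},
}

print(fields_in_all(PERSON_PROPERTIES))  # ['name.last', 'simple1']
```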

At query time, _all is used like any other field:

GET /profiles/_search
{
    "query": {
        "match": {
           "_all": "food"
        }
    }
}

Alternatively, if no search field is given, _all is searched by default. For example:

GET /profiles/_search
{
    "query": {
        "query_string": {
            "query": "food"
        }
    }
}

