Built-in analyzers

fingerprint

The fingerprint analyzer implements a fingerprinting algorithm that the OpenRefine project uses to assist in clustering.

Internally, it works as follows:


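  1. Lowercase the text
  2. Remove extended characters (for example, ö becomes o)
  3. Sort the tokens
  4. Remove duplicate tokens
  5. Remove stop words, if a stop word list is configured (none by default)
  6. Concatenate the remaining tokens into a single token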

Example:

POST _analyze
{
  "analyzer": "fingerprint",
  "text": "Yes yes, Gödel said this sentence is consistent and."
}
[ and consistent godel is said sentence this yes ]
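The steps above can be sketched in Python. This is a toy model, not Elasticsearch's implementation: the regex `\w+` stands in for the standard tokenizer, Unicode NFKD decomposition followed by dropping combining marks stands in for ASCII folding, and `fingerprint` is a hypothetical helper whose `stopwords` and `separator` parameters loosely mirror the analyzer's options of the same names (`max_output_size` is not modelled).

```python
import re
import unicodedata


def fold_ascii(token: str) -> str:
    """Approximate ASCII folding: decompose accented characters, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fingerprint(text: str, stopwords: frozenset = frozenset(), separator: str = " ") -> list[str]:
    # Assumption: \w+ is a rough stand-in for the standard tokenizer.
    tokens = re.findall(r"\w+", text)
    # Steps 1-2: lowercase, then strip extended (accented) characters.
    tokens = [fold_ascii(t.lower()) for t in tokens]
    # Step 5: drop configured stop words (the default set is empty).
    tokens = [t for t in tokens if t not in stopwords]
    # Steps 3-4: deduplicate and sort.
    unique_sorted = sorted(set(tokens))
    # Step 6: concatenate everything into one token.
    return [separator.join(unique_sorted)] if unique_sorted else []


print(fingerprint("Yes yes, Gödel said this sentence is consistent and."))
# ['and consistent godel is said sentence this yes']
```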

keyword

The keyword analyzer is a no-op analyzer: it returns the entire input string unchanged, as a single token.

POST _analyze
{
  "analyzer": "keyword",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
[ The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. ]

Language

Language analyzers are a set of analyzers aimed at analyzing text in a specific language. The following are supported:


arabic, armenian, basque, bengali, brazilian, bulgarian, catalan, cjk, czech, danish, dutch, english, estonian, finnish, french, galician, german, greek, hindi, hungarian, indonesian, irish, italian, latvian, lithuanian, norwegian, persian, portuguese, romanian, russian, sorani, spanish, swedish, turkish, thai


pattern

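The pattern analyzer splits text into terms using a regular expression and lowercases them. The default pattern is \W+, that is, any run of non-word characters.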

POST _analyze
{
  "analyzer": "pattern",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
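A rough Python equivalent of the pattern analyzer's default behaviour, for illustration only: split on `\W+`, drop the empty strings left by leading or trailing separators, and lowercase. The `re.ASCII` flag is an assumption meant to mimic Java regular expressions, where `\W` covers only ASCII word characters unless a Unicode flag is set; `pattern_analyze` is a hypothetical helper.

```python
import re

# Assumption: Java's \W is ASCII-only by default, so re.ASCII is used to match it.
DEFAULT_PATTERN = re.compile(r"\W+", re.ASCII)


def pattern_analyze(text: str, pattern: re.Pattern = DEFAULT_PATTERN, lowercase: bool = True) -> list[str]:
    # Split on every match of the pattern; a trailing "." leaves an empty string, so filter it out.
    terms = [t for t in pattern.split(text) if t]
    return [t.lower() for t in terms] if lowercase else terms


print(pattern_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
# ['the', '2', 'quick', 'brown', 'foxes', 'jumped', 'over', 'the', 'lazy', 'dog', 's', 'bone']
```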

simple


  • Splits text into terms at every non-letter character, discarding the non-letters
  • Lowercases the terms

POST _analyze
{
  "analyzer": "simple",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
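A sketch of the two steps above, assuming the character class `[^\W\d_]` (word characters minus digits and underscore) is a close enough stand-in for Lucene's notion of a letter; `simple_analyze` is a hypothetical helper. Note how the digit 2 disappears because it is not a letter.

```python
import re

# Approximately "any Unicode letter": a word character that is neither a digit nor "_".
LETTER_RUN = re.compile(r"[^\W\d_]+")


def simple_analyze(text: str) -> list[str]:
    # Non-letters (digits, spaces, punctuation) act as separators and are dropped.
    return [t.lower() for t in LETTER_RUN.findall(text)]


print(simple_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
# ['the', 'quick', 'brown', 'foxes', 'jumped', 'over', 'the', 'lazy', 'dog', 's', 'bone']
```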

standard

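The standard analyzer is the default: it is used whenever no analyzer is specified. It splits text on word boundaries as defined by the Unicode Text Segmentation algorithm (Unicode Standard Annex #29), lowercases the terms and, by default, does not remove stop words.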

POST _analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]

stop

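The stop analyzer is the same as the simple analyzer, but adds support for removing stop words (configured through stopwords). By default it uses the _english_ stop word list.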

POST _analyze
{
  "analyzer": "stop",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
[ quick, brown, foxes, jumped, over, lazy, dog, s, bone ]
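The stop analyzer can be modelled as the simple analyzer's tokenization (split on non-letters, lowercase) followed by a stop-word filter. The word list below is assumed to be Lucene's default English stop word set, which is what `_english_` refers to; check it against your Elasticsearch version before relying on it. `stop_analyze` is a hypothetical helper. Since s is not in the list, it survives, as in the output above.

```python
import re

# Approximately "any Unicode letter": a word character that is neither a digit nor "_".
LETTER_RUN = re.compile(r"[^\W\d_]+")

# Assumed to match Lucene's default English stop word set ("_english_").
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
})


def stop_analyze(text: str, stopwords: frozenset = ENGLISH_STOP_WORDS) -> list[str]:
    # Same tokenization as the simple analyzer: split on non-letters and lowercase.
    terms = [t.lower() for t in LETTER_RUN.findall(text)]
    # The extra step: drop stop words.
    return [t for t in terms if t not in stopwords]


print(stop_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
# ['quick', 'brown', 'foxes', 'jumped', 'over', 'lazy', 'dog', 's', 'bone']
```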

whitespace

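The whitespace analyzer breaks text into terms whenever it encounters a whitespace character. It neither lowercases the terms nor strips punctuation.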

POST _analyze
{
  "analyzer": "whitespace",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}
[ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]
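For comparison, a one-line Python analogue (the helper `whitespace_analyze` is hypothetical): `str.split()` with no arguments splits on runs of whitespace and leaves case and punctuation untouched, which reproduces the output above. Python's set of whitespace characters may differ from Lucene's on unusual characters such as non-breaking spaces.

```python
def whitespace_analyze(text: str) -> list[str]:
    # Split on runs of whitespace; empty strings are never produced, and
    # case and punctuation ("Brown-Foxes", "bone.") are kept as-is.
    return text.split()


print(whitespace_analyze("The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."))
# ['The', '2', 'QUICK', 'Brown-Foxes', 'jumped', 'over', 'the', 'lazy', "dog's", 'bone.']
```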

Points to note


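  • The built-in analyzers can be used directly in searches with their default configuration
  • If you need extra configuration, such as a custom stopwords list, you have to define your own analyzer in the index settings
  • At search time, the analyzer's output is used as the query terms; this amounts to preprocessing the user's input yourself before searching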

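Querying with an analyzer

The analyzer parameter of a match query overrides the search analyzer mapped for the field. Here the stop analyzer turns the query text "a Pose" into the single term pose, because a is a stop word.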

GET _search
{
  "query": {
    "match": {
      "message": {
        "query": "a Pose",
        "analyzer": "stop"
      }
    }
  }
}
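To illustrate the last note above, here is a toy model of what this request does. It is not how Elasticsearch stores or scores documents: the two documents are hypothetical, the message field is assumed to be indexed with an analyzer that keeps stop words (the simple analyzer here), the stop word list is assumed to be Lucene's English default, and a document counts as a hit if it shares any term with the analyzed query, roughly like the match query's default or operator.

```python
import re
from collections import defaultdict

# Approximately "any Unicode letter": a word character that is neither a digit nor "_".
LETTER_RUN = re.compile(r"[^\W\d_]+")

# Assumed to match Lucene's default English stop word set ("_english_").
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
})


def simple_analyze(text: str) -> list[str]:
    return [t.lower() for t in LETTER_RUN.findall(text)]


def stop_analyze(text: str) -> list[str]:
    return [t for t in simple_analyze(text) if t not in ENGLISH_STOP_WORDS]


def build_index(docs: dict, analyze) -> dict:
    # Toy inverted index: term -> set of ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, message in docs.items():
        for term in analyze(message):
            index[term].add(doc_id)
    return index


def match(index: dict, query_text: str, analyze) -> set:
    # The query text is analyzed first; any shared term makes a document a hit.
    hits = set()
    for term in analyze(query_text):
        hits |= index.get(term, set())
    return hits


# Hypothetical documents, indexed with an analyzer that keeps "a".
docs = {1: "Strike a pose", 2: "A quiet afternoon"}
index = build_index(docs, simple_analyze)

print(match(index, "a Pose", stop_analyze))    # {1}
print(match(index, "a Pose", simple_analyze))  # {1, 2}
```

With the stop analyzer, a is removed from the query and only documents containing pose match; analyzing the query with the simple analyzer instead keeps a, so document 2 matches as well.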
