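Elasticsearch uses suggesters to recommend content similar to what the user typed. The principle: the input text is analyzed into tokens, and the index is searched for similar terms, which are returned as suggestions.

A suggester runs in one of three modes (suggest_mode): missing suggests only when the input term is not found in the index; popular suggests only similar terms that occur in more documents than the input term; always suggests whether or not the input term exists.

The main suggester types are the term, phrase, and completion suggesters. The term suggester proposes similar terms for each input term. By default a candidate must share its first character with the input (prefix_length defaults to 1); set prefix_length to 0 to allow suggestions whose first letter differs. The phrase suggester proposes a corrected phrase for an input phrase, and max_errors limits how many words may be treated as misspelled. The completion suggester provides autocomplete, matching suggestions by prefix as the user types. It does not use the inverted index; instead it uses a dedicated data structure (a finite state transducer, FST) that encodes the analyzed input and is loaded entirely into memory, so lookups are very fast. To use the completion suggester, set the field type to completion in the mapping. To make results more precise, declare a category context under contexts in the mapping, then supply that category both when indexing and when searching.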
Comparison of the three suggesters:
Precision:
completion > phrase > term
Recall:
term > phrase > completion
Performance:
completion > phrase > term
Initialize the data
DELETE articles
POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack "}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{ "body": "elasticsearch is rock solid"}
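Term suggester with suggest_mode set to missing, which suggests only for terms that are not in the index: lucen gets the suggestion lucene, while rock gets none because it already exists in the index.
POST articles/_search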
{
  "size": 1,
  "query": {
    "match": {
      "body": "lucen rock"
    }
  },
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "missing",
        "field": "body"
      }
    }
  }
}
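The response to a request like this nests its suggestions under the suggester name, with one entry per analyzed token and a list of options for each. Below is a minimal Python sketch of reading that structure. The sample_response dict is hand-written for illustration (its score and freq values are placeholders, not captured output), and extract_suggestions is a hypothetical helper name.

```python
from typing import Dict, List


def extract_suggestions(response: dict, name: str) -> Dict[str, List[str]]:
    """Map each input token to its suggested replacement texts, in response order."""
    result = {}
    for entry in response.get("suggest", {}).get(name, []):
        # Each entry describes one token of the suggest text.
        result[entry["text"]] = [opt["text"] for opt in entry["options"]]
    return result


# Hand-written response in the shape returned for a term suggester.
# The score/freq numbers are illustrative placeholders.
sample_response = {
    "suggest": {
        "term-suggestion": [
            {"text": "lucen", "offset": 0, "length": 5,
             "options": [{"text": "lucene", "score": 0.8, "freq": 2}]},
            {"text": "rock", "offset": 6, "length": 4, "options": []},
        ]
    }
}

suggestions = extract_suggestions(sample_response, "term-suggestion")
print(suggestions)
```

An empty options list, as for rock here, is how a token with no suggestion shows up in the response.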
Term suggester with suggest_mode set to popular: rock now also gets a suggestion, rocks, because rocks occurs in more documents than rock.
POST articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "popular",
        "field": "body"
      }
    }
  }
}
Term suggester with suggest_mode set to always: suggestions are returned whether or not the input term exists in the index.
POST articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen rock",
      "term": {
        "suggest_mode": "always",
        "field": "body"
      }
    }
  }
}
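The difference between the three suggest_mode values can be modeled roughly as follows. This is a simplified Python sketch, not how Elasticsearch implements it: it assumes lowercased whitespace tokenization, plain Levenshtein distance with at most 2 edits, and a shared first character. DOCS, DOC_FREQ, edit_distance, and suggest are names made up for the illustration.

```python
from collections import Counter

# The six sample documents indexed into "articles".
DOCS = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack ",
    "Elk stack rocks",
    "elasticsearch is rock solid",
]

# Document frequency: in how many documents each lowercased token occurs.
DOC_FREQ = Counter(term for doc in DOCS for term in set(doc.lower().split()))


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def suggest(term: str, mode: str, max_edits: int = 2, prefix_length: int = 1) -> list:
    """Return candidate corrections for one term under the given mode."""
    own_freq = DOC_FREQ.get(term, 0)
    if mode == "missing" and own_freq > 0:
        return []  # the term exists, so "missing" suggests nothing
    found = []
    for cand, freq in DOC_FREQ.items():
        if cand == term or cand[:prefix_length] != term[:prefix_length]:
            continue
        if edit_distance(term, cand) > max_edits:
            continue
        if mode == "popular" and freq <= own_freq:
            continue  # "popular" keeps only more frequent terms
        found.append(cand)
    return sorted(found)


for mode in ("missing", "popular", "always"):
    print(mode, suggest("lucen", mode), suggest("rock", mode))
```

In this model lucen is corrected to lucene in every mode because it never occurs in the index, while rock only gets rocks under popular and always.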
Same as above, but querying hocks: because the first letter must match by default, hocks is not corrected to rocks.
POST articles/_search
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body"
      }
    }
  }
}
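Same as above, but now hocks does get a suggestion: with prefix_length set to 0, terms are suggested even when the first letter differs. When there are several suggestions they are sorted by score by default; setting sort to frequency sorts them by how often the suggested term occurs.
POST articles/_search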
{
  "suggest": {
    "term-suggestion": {
      "text": "lucen hocks",
      "term": {
        "suggest_mode": "always",
        "field": "body",
        "prefix_length": 0,
        "sort": "frequency"
      }
    }
  }
}
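The effect of prefix_length and sort can be sketched in a few lines of Python. This is an illustration under stated assumptions: shares_prefix and rank_options are hypothetical helper names, and the option scores and frequencies are made-up numbers chosen so that the two orderings differ (they are not taken from the sample index). The tie-breaking order follows the documented behavior: score sorts by score, then document frequency, then the term itself; frequency sorts by document frequency, then score, then the term.

```python
def shares_prefix(term: str, candidate: str, prefix_length: int) -> bool:
    """True if the first prefix_length characters of both words are equal."""
    return term[:prefix_length] == candidate[:prefix_length]


def rank_options(options: list, sort: str = "score") -> list:
    """Order suggestion options the way the chosen sort mode describes."""
    if sort == "score":
        key = lambda o: (-o["score"], -o["freq"], o["text"])
    elif sort == "frequency":
        key = lambda o: (-o["freq"], -o["score"], o["text"])
    else:
        raise ValueError(f"unknown sort mode: {sort}")
    return [o["text"] for o in sorted(options, key=key)]


# With the default prefix_length of 1, "rocks" is ruled out for "hocks".
print(shares_prefix("hocks", "rocks", 1))  # False
# With prefix_length 0 there is no prefix requirement at all.
print(shares_prefix("hocks", "rocks", 0))  # True

# Made-up options: the closer match is the rarer term.
options = [
    {"text": "rocks", "score": 0.8, "freq": 2},
    {"text": "rock", "score": 0.6, "freq": 5},
]
print(rank_options(options, "score"))      # ['rocks', 'rock']
print(rank_options(options, "frequency"))  # ['rock', 'rocks']
```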
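Phrase suggester: max_errors allows at most 2 misspelled words, and confidence affects how many suggestions come back; the higher the value, the fewer are returned.
POST articles/_search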
{
  "suggest": {
    "my-suggestion": {
      "text": "lucne and elasticsear rock hello world",
      "phrase": {
        "field": "body",
        "max_errors": 2,
        "confidence": 2,
        "direct_generator": [
          {
            "suggest_mode": "always",
            "field": "body"
          }
        ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}
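Why a larger confidence returns fewer phrases can be shown with a small Python sketch. Per the Elasticsearch documentation, confidence is a factor applied to the score of the input phrase, and only candidates that score above that threshold are kept. The function name filter_by_confidence and all scores below are hypothetical; real phrase scores come from the language model and are not computed here.

```python
def filter_by_confidence(input_score: float, candidates: list, confidence: float = 1.0) -> list:
    """Keep candidates scoring above confidence * input_score, best first."""
    threshold = confidence * input_score
    kept = [(text, score) for text, score in candidates if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores: the input phrase scores 1.0, two corrections score higher.
input_score = 1.0
candidates = [
    ("lucne and elasticsearch rock hello world", 1.5),
    ("lucene and elasticsearch rock hello world", 3.0),
]

loose = filter_by_confidence(input_score, candidates, confidence=1.0)
strict = filter_by_confidence(input_score, candidates, confidence=2.0)
print(len(loose), len(strict))
```

With confidence 1.0 both made-up candidates beat the input phrase; with confidence 2.0 only the one scoring above 2.0 survives.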
Completion suggester, for autocomplete:
Set the field type in the index mapping to completion
DELETE articles
PUT articles
{
  "mappings": {
    "properties": {
      "title_completion": {
        "type": "completion"
      }
    }
  }
}
Initialize the data
POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}
Autocomplete for the prefix elk
POST articles/_search
{
  "size": 0,
  "suggest": {
    "article-suggester": {
      "prefix": "elk",
      "completion": {
        "field": "title_completion"
      }
    }
  }
}
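Prefix completion differs from full-text matching in that it matches from the start of the indexed input, not anywhere inside it. The Python sketch below illustrates this with a sorted list and binary search standing in for the FST. It is a toy model under stated assumptions: PrefixIndex is a made-up class, lowercasing replaces real analysis, and results come back in alphabetical order rather than by weight.

```python
import bisect


class PrefixIndex:
    """Toy in-memory prefix lookup over a sorted list of lowercased inputs."""

    def __init__(self, inputs):
        # Keep the lowercased key next to the original text.
        self._entries = sorted((text.lower(), text) for text in inputs)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix: str, size: int = 5) -> list:
        prefix = prefix.lower()
        # Binary search to the first key that is >= the prefix.
        start = bisect.bisect_left(self._keys, prefix)
        matches = []
        for key, original in self._entries[start:]:
            if not key.startswith(prefix) or len(matches) == size:
                break
            matches.append(original)
        return matches


TITLES = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
]

index = PrefixIndex(TITLES)
print(index.complete("elk"))    # only the title that starts with "elk"
print(index.complete("stack"))  # no title starts with "stack"
```

The title containing ELK in the middle is not returned for the prefix elk, because the match has to begin at the start of the input.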
Give the field comment_autocomplete the completion type and add a category context
DELETE comments
PUT comments
PUT comments/_mapping
{
  "properties": {
    "comment_autocomplete": {
      "type": "completion",
      "contexts": [
        {
          "type": "category",
          "name": "comment_category"
        }
      ]
    }
  }
}
Initialize the data, with the context category set to movies
POST comments/_doc
{
  "comment": "I love the star war movies",
  "comment_autocomplete": {
    "input": ["star wars"],
    "contexts": {
      "comment_category": "movies"
    }
  }
}
Initialize the data, with the context category set to coffee
POST comments/_doc
{
  "comment": "Where can I find a Starbucks",
  "comment_autocomplete": {
    "input": ["starbucks"],
    "contexts": {
      "comment_category": "coffee"
    }
  }
}
Autocomplete by prefix, restricted to the coffee category
POST comments/_search
{
  "suggest": {
    "my-suggestion": {
      "prefix": "sta",
      "completion": {
        "field": "comment_autocomplete",
        "contexts": {
          "comment_category": "coffee"
        }
      }
    }
  }
}
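The way a category context narrows completion results can be sketched as a prefix lookup that is partitioned by category. This Python model is only an illustration: ContextPrefixIndex is a made-up class, and a linear scan with lowercasing stands in for the real data structure.

```python
from collections import defaultdict


class ContextPrefixIndex:
    """Toy completion index that keeps suggestions separate per category."""

    def __init__(self):
        self._by_category = defaultdict(list)

    def add(self, text: str, category: str) -> None:
        self._by_category[category].append(text)

    def complete(self, prefix: str, category: str) -> list:
        prefix = prefix.lower()
        # Only inputs indexed under the requested category are considered.
        inputs = self._by_category.get(category, [])
        return sorted(t for t in inputs if t.lower().startswith(prefix))


contexts = ContextPrefixIndex()
contexts.add("star wars", "movies")
contexts.add("starbucks", "coffee")

print(contexts.complete("sta", "coffee"))
print(contexts.complete("sta", "movies"))
```

Both inputs start with sta, yet each query sees only the input stored under its own category, which is what makes the context-restricted suggestion more precise.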