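Elasticsearch uses suggesters to propose content similar to what the user typed. The principle: the input text is analyzed into tokens, and the index is searched for similar terms, which are returned as suggestions.

A suggester runs in one of three modes (suggest_mode): missing suggests only for input terms that are not found in the index; popular suggests only terms that occur in more documents than the input term; always suggests regardless of whether the input term exists.

The main suggester types are the term, phrase and completion suggesters.

term suggester: suggests similar terms for each input term, based on edit distance. By default a candidate must share its first character with the input (prefix_length defaults to 1); set prefix_length to 0 to also get candidates whose first character differs.

phrase suggester: suggests similar phrases for the input phrase; max_errors sets the maximum number of terms that may be misspelled.

completion suggester: provides autocomplete by matching the characters typed so far as a prefix. It does not use the inverted index. Instead, the analyzed input is encoded into a specialized data structure (a finite state transducer, FST) that is loaded entirely into memory, which makes lookups very fast.

To use the completion suggester, the field type must be set to completion in the mapping. To improve precision, define a category context under contexts in the mapping and supply that category both when indexing and when searching.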

Comparison of the three suggesters:

Precision:

completion > phrase > term

Recall:

term > phrase > completion

Performance:

completion > phrase > term

Initialize the data
DELETE articles
POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack "}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{  "body": "elasticsearch is rock solid"}
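
The bulk request above alternates an action line with a source line, one JSON object per line. As a minimal sketch of how such a payload could be generated from a plain list, the hypothetical helper build_bulk_body below (not part of any Elasticsearch client) produces the same NDJSON text. It only builds the string and does not send anything to a cluster.

```python
import json

BODIES = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
    "elasticsearch is rock solid",
]


def build_bulk_body(docs):
    """Return an NDJSON _bulk payload that indexes each doc with an auto-generated id.

    Hypothetical helper for illustration only.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps(doc))            # source line
    # The bulk API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_bulk_body([{"body": text} for text in BODIES])
print(payload)
```

Every action line needs a matching source line, so the payload always has an even number of lines.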

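term suggester with suggest_mode set to missing, which suggests only for terms that are not in the index: lucen is corrected to lucene, while rock gets no suggestion because it already exists in the index
POST articles/_search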
{
  "size":1,
  "query":{
    "match": {
      "body": "lucen rock"
    }
  },
  "suggest":{
    "term-suggestion":{
      "text":"lucen rock",
      "term":{
        "suggest_mode":"missing",
        "field":"body"
      }
    }
  }
}

term suggester with suggest_mode set to popular: rock now also gets a suggestion, rocks, because rocks occurs in more documents than rock
POST articles/_search
{
  "suggest":{
    "term-suggestion":{
      "text":"lucen rock",
      "term":{
        "suggest_mode":"popular",
        "field":"body"
      }
    }
  }
}

term suggester with suggest_mode set to always: suggestions are returned whether or not the input term exists in the index
POST articles/_search
{
  "suggest":{
    "term-suggestion":{
      "text":"lucen rock",
      "term":{
        "suggest_mode":"always",
        "field":"body"
      }
    }
  }
}

Same as above, but searching for hocks: by default the first character must match (prefix_length is 1), so hocks is not corrected to rocks
POST articles/_search
{
  "suggest":{
    "term-suggestion":{
      "text":"lucen hocks",
      "term":{
        "suggest_mode":"always",
        "field":"body"
      }
    }
  }
}

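Same as above, but with prefix_length set to 0, so candidates no longer have to share the first character and hocks now gets suggestions. When there are several candidates they are sorted by score by default; setting sort to frequency orders them by how often the suggested term occurs
POST articles/_search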
{
  "suggest":{
    "term-suggestion":{
      "text":"lucen hocks",
      "term":{
        "suggest_mode":"always",
        "field":"body",
        "prefix_length":0,
        "sort":"frequency"
      }
    }
  }
}
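
The effect of suggest_mode, prefix_length and sort in the queries above can be reproduced with a small toy model. This is only a sketch under simplifying assumptions: it lowercases and splits on whitespace instead of running a real analyzer, uses plain Levenshtein distance with a limit of 2 edits, and ranks "score" as fewest edits first. The function names are made up for illustration, and Elasticsearch's actual scoring is more involved.

```python
from collections import Counter

DOCS = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
    "elasticsearch is rock solid",
]

# Document frequency: in how many documents each lowercased term occurs.
DOC_FREQ = Counter()
for doc in DOCS:
    DOC_FREQ.update(set(doc.lower().split()))


def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def suggest(token, mode="missing", prefix_length=1, sort="score", max_edits=2):
    """Toy term suggester: returns candidate terms for one input token."""
    own_freq = DOC_FREQ.get(token, 0)
    if mode == "missing" and own_freq > 0:
        return []  # the term exists, so "missing" suggests nothing
    candidates = []
    for term, freq in DOC_FREQ.items():
        if term == token or term[:prefix_length] != token[:prefix_length]:
            continue  # candidate must share the first prefix_length characters
        dist = edit_distance(token, term)
        if dist > max_edits:
            continue
        if mode == "popular" and freq <= own_freq:
            continue  # "popular" keeps only terms found in more documents
        candidates.append((term, dist, freq))
    if sort == "frequency":
        candidates.sort(key=lambda c: (-c[2], c[1], c[0]))
    else:
        candidates.sort(key=lambda c: (c[1], -c[2], c[0]))
    return [term for term, _, _ in candidates]


print(suggest("lucen", mode="missing"))
print(suggest("rock", mode="missing"))
print(suggest("rock", mode="popular"))
print(suggest("hocks", mode="always"))
print(suggest("hocks", mode="always", prefix_length=0, sort="frequency"))
```

In this model rock yields nothing in missing mode but rocks in popular mode, and hocks yields candidates only once prefix_length is 0, which mirrors the behaviour described for the queries above.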

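phrase suggester: max_errors allows at most 2 misspelled terms; confidence controls how many suggestions come back, and the higher the value, the fewer are returned
POST articles/_search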
{
  "suggest":{
    "my-suggestion":{
      "text":"lucne and elasticsear rock hello world",
      "phrase":{
        "field":"body",
        "max_errors":2,
        "confidence":2,
        "direct_generator":[{
          "suggest_mode":"always",
          "field":"body"
        }],
        "highlight":{
          "pre_tag": "<em>",
          "post_tag": "</em>" 
        }
      }
    }
  }
}
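
Why a higher confidence returns fewer suggestions: the confidence value is a factor applied to the score of the input phrase, and only candidates that score above that threshold are returned. A minimal sketch of that filtering step follows. The function name and all scores are made up for illustration; real scores come from the suggester's language model.

```python
def filter_by_confidence(input_score, candidates, confidence=1.0):
    """Keep candidates scoring above confidence * input_score (toy model)."""
    threshold = confidence * input_score
    return [text for text, score in candidates if score > threshold]


# Hypothetical scores, chosen only to show the thresholding.
INPUT_SCORE = 1.0
CANDIDATES = [
    ("lucene and elasticsearch rock hello world", 3.0),
    ("lucne and elasticsearch rock hello world", 1.5),
]

print(filter_by_confidence(INPUT_SCORE, CANDIDATES, confidence=1.0))
print(filter_by_confidence(INPUT_SCORE, CANDIDATES, confidence=2.0))
```

With these made-up numbers, confidence 1.0 keeps both candidates and confidence 2.0 keeps only the higher-scoring one.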

completion suggester, for autocomplete:

Set the field type in the index mapping to completion
DELETE articles
PUT articles
{
  "mappings": {
    "properties": {
      "title_completion":{
        "type": "completion"
      }
    }
  }
}

Initialize the data
POST articles/_bulk
{ "index" : { } }
{ "title_completion": "lucene is very cool"}
{ "index" : { } }
{ "title_completion": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "title_completion": "Elasticsearch rocks"}
{ "index" : { } }
{ "title_completion": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "title_completion": "Elk stack rocks"}

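Autocomplete for the prefix elk. The match is against the start of the whole input value, so only titles that begin with elk are returned
POST articles/_search
{
  "size":0,
  "suggest":{
    "article-suggester":{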
      "prefix":"elk",
      "completion":{
        "field":"title_completion"
      }
    }
  }
}
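
Prefix completion can be pictured with a sorted list and a binary search. This is a stand-in chosen for readability, not how Elasticsearch stores completion fields (it uses an in-memory FST), and the lowercasing here only approximates what an analyzer would do. The function name complete is hypothetical.

```python
from bisect import bisect_left

TITLES = [
    "lucene is very cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "elastic is the company behind ELK stack",
    "Elk stack rocks",
]

# Sorted (normalized, original) pairs; KEYS holds only the normalized form.
INDEX = sorted((title.lower(), title) for title in TITLES)
KEYS = [key for key, _ in INDEX]


def complete(prefix, size=5):
    """Return up to `size` titles whose normalized form starts with prefix."""
    p = prefix.lower()
    start = bisect_left(KEYS, p)  # first key that is >= the prefix
    results = []
    for key, original in INDEX[start:]:
        if not key.startswith(p):
            break  # keys are sorted, so no later key can match
        results.append(original)
        if len(results) == size:
            break
    return results


print(complete("elk"))
print(complete("elas"))
```

Note that "elastic is the company behind ELK stack" is not returned for elk, because the prefix has to match from the beginning of the value.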

Map the field comment_autocomplete as type completion and add a category context to it
DELETE comments
PUT comments
PUT comments/_mapping
{
  "properties":{
    "comment_autocomplete":{
      "type":"completion",
      "contexts":[{
          "type":"category",
          "name":"comment_category"
        }]
    }
  }
}

Index a document with the context category set to movies
POST comments/_doc
{
  "comment":"I love the star war movies",
  "comment_autocomplete":{
    "input":["star wars"],
    "contexts":{
      "comment_category":"movies"
    }
  }
}
Index a document with the context category set to coffee
POST comments/_doc
{
  "comment":"Where can I find a Starbucks",
  "comment_autocomplete":{
    "input":["starbucks"],
    "contexts":{
      "comment_category":"coffee"
    }
  }
}

Autocomplete by prefix, restricted to the coffee category
POST comments/_search
{
  "suggest":{
    "my-suggestion":{
      "prefix":"sta",
      "completion":{
        "field":"comment_autocomplete",
        "contexts":{
          "comment_category":"coffee"
        }
      }
    }
  }
}
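
The context acts as a filter on top of the prefix match: both "star wars" and "starbucks" start with sta, but only the entry indexed under the requested category is returned. A toy sketch of that idea, using a hypothetical function complete_in_context and an in-memory list in place of the index:

```python
# Each entry mirrors one indexed document: a completion input plus its category.
ENTRIES = [
    {"input": "star wars", "comment_category": "movies"},
    {"input": "starbucks", "comment_category": "coffee"},
]


def complete_in_context(prefix, category):
    """Return inputs that start with prefix and belong to the given category."""
    p = prefix.lower()
    return [
        entry["input"]
        for entry in ENTRIES
        if entry["comment_category"] == category
        and entry["input"].lower().startswith(p)
    ]


print(complete_in_context("sta", "coffee"))
print(complete_in_context("sta", "movies"))
```

Switching the category from coffee to movies flips the result from starbucks to star wars, even though the prefix is the same.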