Using the Profile API in Elasticsearch

Reposted

mb5fdcad0be2e90 2019-05-08 17:00:00


1. Introduction

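The profile API is a relatively new interface in Elasticsearch: it first appeared as an experimental feature in 2.2 and is available throughout 5.x. It shows how a search or aggregation request is broken down into low-level Lucene operations, and how much time each part takes.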

2. Using the profile API

To enable the Profile API, add "profile": true at the top level of the request body, next to the query section.

GET /ljjtest/book/_search
{
  "profile": true,
  "query":{
    "match":{
      "author":"鲁迅"
    }
  }
}
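The same request body can also be assembled in a client program. The following is only a minimal sketch: the helper name build_profile_request is hypothetical, and no HTTP call is made. It just shows that "profile" is a sibling of "query" at the top level of the body.

```python
import json


def build_profile_request(query):
    """Return a search request body with profiling switched on."""
    # "profile" sits at the top level of the body, next to "query".
    return {"profile": True, "query": query}


body = build_profile_request({"match": {"author": "鲁迅"}})

# UTF-8 bytes that could be sent as the body of GET /ljjtest/book/_search.
payload = json.dumps(body, ensure_ascii=False).encode("utf-8")
```

Using the JSON boolean true (Python's True) rather than the string "true" avoids relying on lenient boolean parsing on the server side.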

3. Profile API response

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.3728157,
    "hits": [ ... ]
  },
  "profile": {
    "shards": [
      {
        "id": "[0mFoaNASRaGO050a_a28gA][ljjtest][0]",
        "searches": [
          {
            "query": [
              {
                "type": "BooleanQuery",
                "description": "author:鲁 author:迅",
                "time": "0.5203070000ms",
                "time_in_nanos": 520307,
                "breakdown": {
                  "score": 18400,
                  "build_scorer_count": 1,
                  "match_count": 0,
                  "create_weight": 213200,
                  "next_doc": 28200,
                  "match": 0,
                  "create_weight_count": 1,
                  "next_doc_count": 3,
                  "score_count": 2,
                  "build_scorer": 260500,
                  "advance": 0,
                  "advance_count": 0
                },
                "children": [
                  {
                    "type": "TermQuery",
                    "description": "author:鲁",
                    "time": "0.3040070000ms",
                    "time_in_nanos": 304007,
                    "breakdown": {
                      "score": 9100,
                      "build_scorer_count": 1,
                      "match_count": 0,
                      "create_weight": 118200,
                      "next_doc": 14500,
                      "match": 0,
                      "create_weight_count": 1,
                      "next_doc_count": 3,
                      "score_count": 2,
                      "build_scorer": 162200,
                      "advance": 0,
                      "advance_count": 0
                    }
                  },
                  {
                    "type": "TermQuery",
                    "description": "author:迅",
                    "time": "0.1005070000ms",
                    "time_in_nanos": 100507,
                    "breakdown": {
                      "score": 2600,
                      "build_scorer_count": 1,
                      "match_count": 0,
                      "create_weight": 63500,
                      "next_doc": 2200,
                      "match": 0,
                      "create_weight_count": 1,
                      "next_doc_count": 3,
                      "score_count": 2,
                      "build_scorer": 32200,
                      "advance": 0,
                      "advance_count": 0
                    }
                  }
                ]
              }
            ],
            "rewrite_time": 327100,
            "collector": [
              {
                "name": "CancellableCollector",
                "reason": "search_cancelled",
                "time": "0.04830000000ms",
                "time_in_nanos": 48300,
                "children": [
                  {
                    "name": "SimpleTopScoreDocCollector",
                    "reason": "search_top_hits",
                    "time": "0.03680000000ms",
                    "time_in_nanos": 36800
                  }
                ]
              }
            ]
          }
        ],
        "aggregations": []
      }
    ]
  }
}

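Notes on the response:

The response above contains a single shard. Each shard is assigned a unique ID in the format [nodeID][indexName][shardID]. Inside each shard, every entry of the "searches" array contains three elements:

query
rewrite_time
collector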
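As a rough illustration of this layout, the sketch below splits a shard ID into its three parts and walks the shards and their searches. The helper names parse_shard_id and iter_searches are hypothetical, the ID format is assumed to be exactly the [nodeID][indexName][shardID] form described above, and sample is a trimmed copy of the response shown earlier.

```python
import re

# Matches "[nodeID][indexName][shardID]" as described in the text.
SHARD_ID = re.compile(
    r"^\[(?P<node>[^\]]+)\]\[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\]$"
)


def parse_shard_id(shard_id):
    """Split a profile shard ID into (node_id, index_name, shard_number)."""
    m = SHARD_ID.match(shard_id)
    if m is None:
        raise ValueError("unexpected shard id format: %r" % shard_id)
    return m.group("node"), m.group("index"), int(m.group("shard"))


def iter_searches(response):
    """Yield (shard_id, search) for every search entry of every profiled shard."""
    for shard in response["profile"]["shards"]:
        for search in shard["searches"]:
            yield shard["id"], search


# Trimmed copy of the response above; query and collector details omitted.
sample = {
    "profile": {
        "shards": [
            {
                "id": "[0mFoaNASRaGO050a_a28gA][ljjtest][0]",
                "searches": [
                    {"query": [], "rewrite_time": 327100, "collector": []}
                ],
                "aggregations": [],
            }
        ]
    }
}

for shard_id, search in iter_searches(sample):
    node, index, shard = parse_shard_id(shard_id)
    print(node, index, shard, sorted(search))
```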

Query

The query section lists the elements that make up the query together with their timing information. Its basic components in the Profile API result are:

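type: the type of Lucene query that was executed. Here it is BooleanQuery, because a match query with multiple terms is rewritten into a BooleanQuery that combines one TermQuery per term.
description: Lucene's textual description of the query. Here it is "author:鲁 author:迅".
time: the time Lucene spent executing this query, in milliseconds.
time_in_nanos: the time Lucene spent executing this query, in nanoseconds.
breakdown: finer-grained detail about the query, mostly low-level Lucene timings and call counts.
children: a query with multiple keywords is split into one sub-query per term, and each sub-query is executed and profiled separately. The details of each sub-query are recorded in the children section of the Profile API output. In the response above, the first child is the query for "鲁", followed by its query time and breakdown values. The second child, for the keyword "迅", carries the same kind of information as its sibling. From the children we can tell which search term contributes most to the overall latency.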
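The fields above can be inspected programmatically. The sketch below is an illustration only: the helper names walk, slowest_child and top_phases are hypothetical, and the sample data is the query tree from the response above with the zero-valued and *_count breakdown entries of the children trimmed for brevity.

```python
def walk(node, depth=0):
    """Yield (depth, node) for a profiled query node and all its descendants."""
    yield depth, node
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


def slowest_child(node):
    """Return the direct child with the largest time_in_nanos, or None."""
    children = node.get("children", [])
    return max(children, key=lambda c: c["time_in_nanos"]) if children else None


def top_phases(node, n=2):
    """Return the n most expensive breakdown timings, ignoring *_count entries."""
    timings = {k: v for k, v in node["breakdown"].items()
               if not k.endswith("_count")}
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)[:n]


root = {
    "type": "BooleanQuery",
    "description": "author:鲁 author:迅",
    "time_in_nanos": 520307,
    "breakdown": {
        "score": 18400, "build_scorer_count": 1, "match_count": 0,
        "create_weight": 213200, "next_doc": 28200, "match": 0,
        "create_weight_count": 1, "next_doc_count": 3, "score_count": 2,
        "build_scorer": 260500, "advance": 0, "advance_count": 0,
    },
    "children": [
        {"type": "TermQuery", "description": "author:鲁",
         "time_in_nanos": 304007,
         "breakdown": {"score": 9100, "create_weight": 118200,
                       "next_doc": 14500, "build_scorer": 162200}},
        {"type": "TermQuery", "description": "author:迅",
         "time_in_nanos": 100507,
         "breakdown": {"score": 2600, "create_weight": 63500,
                       "next_doc": 2200, "build_scorer": 32200}},
    ],
}

slowest = slowest_child(root)
phases = top_phases(root)
types_by_depth = [(depth, node["type"]) for depth, node in walk(root)]
```

For this response, the term "鲁" is the slower of the two children, and build_scorer and create_weight dominate the parent's breakdown, so most of the measured time went into setting up the query rather than into iterating over and scoring documents.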

Rewrite Time

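Because a query with multiple keywords is decomposed into individual queries, this step takes some time of its own. The time spent rewriting the query into one or more combined queries is called the "rewrite time". It is reported in nanoseconds as a single value per search; in the response above it is 327100 ns, or about 0.33 ms.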
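To put the rewrite time next to the other timings of a search, the nanosecond values can be converted to milliseconds. This is only a sketch: nanos_to_ms and summarize_search are hypothetical helper names, and the sample keeps only the top-level numbers from the response above.

```python
def nanos_to_ms(nanos):
    """Convert a nanosecond count to milliseconds."""
    return nanos / 1_000_000


def summarize_search(search):
    """Return query, rewrite and collector time of one search entry in ms."""
    # Only top-level entries are summed; each already has its own time value.
    query_ns = sum(q["time_in_nanos"] for q in search["query"])
    collector_ns = sum(c["time_in_nanos"] for c in search["collector"])
    return {
        "query_ms": nanos_to_ms(query_ns),
        "rewrite_ms": nanos_to_ms(search["rewrite_time"]),
        "collector_ms": nanos_to_ms(collector_ns),
    }


search = {
    "query": [{"type": "BooleanQuery", "time_in_nanos": 520307}],
    "rewrite_time": 327100,
    "collector": [{"name": "CancellableCollector", "time_in_nanos": 48300}],
}

summary = summarize_search(search)
```

In this example the rewrite step takes more than half as long as query execution itself, which is why it is worth looking at separately.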

Collectors

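In Lucene, a collector is the component responsible for gathering the raw matches, combining them, sorting the results, and so on. In the query executed above, the collector is SimpleTopScoreDocCollector (reason "search_top_hits"), wrapped in a CancellableCollector. If the request had specified size: 0, the collector would instead be TotalHitCountCollector, with the reason "search_count"; it returns only the number of matching documents, not the documents themselves. The time spent in each collector is reported as well.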
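Like the query section, the collector section is a tree, and it can be flattened for easier reading. The sketch below uses a hypothetical helper, flatten_collectors, on the collector data from the response above.

```python
def flatten_collectors(collectors, depth=0):
    """Return (depth, name, reason, time_in_nanos) rows for a collector tree."""
    rows = []
    for c in collectors:
        rows.append((depth, c["name"], c["reason"], c["time_in_nanos"]))
        # Wrapped collectors appear under "children" of the wrapping one.
        rows.extend(flatten_collectors(c.get("children", []), depth + 1))
    return rows


collectors = [
    {
        "name": "CancellableCollector",
        "reason": "search_cancelled",
        "time_in_nanos": 48300,
        "children": [
            {
                "name": "SimpleTopScoreDocCollector",
                "reason": "search_top_hits",
                "time_in_nanos": 36800,
            }
        ],
    }
]

rows = flatten_collectors(collectors)
for depth, name, reason, nanos in rows:
    print("  " * depth + name, reason, nanos)
```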