Elasticsearch Cluster Monitoring Metrics


Author: 食品安全辛吉飞, 2023-10-31 17:25:23. Category: Operations


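© Copyright belongs to the author. This is an original work by 食品安全辛吉飞 on the 51CTO blog. Contact the author for permission to reprint; unauthorized reprinting may lead to legal action.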

Preface

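While an Elasticsearch cluster is running, it often runs into health problems for a variety of reasons. The most visible sign is that Kibana monitoring or the head plugin shows the cluster as non-green (red or yellow).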


Alternatively, check the health status with a request:

http://192.168.2.135:9200/_cluster/health?pretty#
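
The same request can also be issued from a script. The following is a minimal sketch using only the Python standard library. The helper names `health_url` and `fetch_cluster_health` are illustrative assumptions, and the sketch assumes the cluster is reachable without authentication or TLS.

```python
import json
import urllib.request


def health_url(base_url: str) -> str:
    """Build the cluster health URL for a node address such as http://host:9200."""
    return base_url.rstrip("/") + "/_cluster/health"


def fetch_cluster_health(base_url: str, timeout: float = 5.0) -> dict:
    """Call GET _cluster/health and return the parsed JSON body.

    Assumption: no authentication or TLS is required. A secured cluster
    would need extra request headers or an SSL context.
    """
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the address used above, `fetch_cluster_health("http://192.168.2.135:9200")["status"]` would return the cluster status string. The `pretty` parameter is left out because it only affects how the response is formatted.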

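Interpreting Elasticsearch Cluster Monitoring Metrics

Elasticsearch monitoring metrics fall into three levels

Cluster level: monitoring of the Elasticsearch cluster as a whole, including cluster health and cluster-wide state.
Node level: monitoring of each Elasticsearch instance, including its query and indexing metrics and its physical resource usage.
Index level: monitoring of each individual index, mainly its performance metrics.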

Elasticsearch Cluster-Level Monitoring Metrics

Check cluster health (GET _cluster/health)

http://192.168.2.135:9200/_cluster/health?pretty#
{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 8,
  "number_of_data_nodes" : 8,
  "active_primary_shards" : 373,
  "active_shards" : 752,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

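Key metrics:

- status: cluster status. green means every primary and replica shard of every index is active and the cluster is healthy. yellow means every primary shard is active but at least one replica shard is not, so the cluster remains usable with reduced redundancy. red means at least one primary shard is not active, so part of the data is unavailable and requests that touch the affected indices fail or return partial results.
- number_of_nodes/number_of_data_nodes: number of nodes and number of data nodes in the cluster.
- active_primary_shards: number of active primary shards in the cluster.
- active_shards: number of active shards in the cluster, primaries and replicas combined.
- relocating_shards: number of shards currently moving from one node to another. Usually 0; it rises when a node joins or leaves the cluster.
- initializing_shards: number of shards that are being initialized.
- unassigned_shards: number of shards not allocated to any node. Usually 0; it rises when, for example, a node holding replica shards drops out.
- number_of_pending_tasks: number of cluster-level tasks, such as creating an index or allocating shards, that are queued on the master node.
- active_shards_percent_as_number: shard health of the cluster, the percentage of active shards out of all shards.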
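
The metrics above can be turned into a simple automated check. The sketch below is one possible way to do it, not a standard tool: the function name `assess_cluster_health` and the `PENDING_TASKS_LIMIT` threshold are assumptions chosen for illustration, and the input is the parsed JSON body of GET _cluster/health.

```python
from typing import Dict, List

# Assumed example threshold, not an Elasticsearch default; tune it per cluster.
PENDING_TASKS_LIMIT = 10


def assess_cluster_health(health: Dict) -> List[str]:
    """Return human-readable warnings for a parsed _cluster/health response."""
    warnings = []
    status = health.get("status")
    if status == "red":
        warnings.append("red: at least one primary shard is not active")
    elif status == "yellow":
        warnings.append("yellow: at least one replica shard is not active")

    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        warnings.append("%d unassigned shard(s)" % unassigned)

    pending = health.get("number_of_pending_tasks", 0)
    if pending > PENDING_TASKS_LIMIT:
        warnings.append("%d pending master tasks" % pending)
    return warnings


# Subset of the response shown earlier in this section.
healthy = {
    "status": "green",
    "unassigned_shards": 0,
    "number_of_pending_tasks": 0,
    "active_shards_percent_as_number": 100.0,
}

print(assess_cluster_health(healthy))  # prints []
```

An empty list means none of the checked conditions fired. The function returns messages instead of raising so that a caller can forward them to whatever alerting channel it already uses.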

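View cluster statistics (GET _cluster/stats?pretty)
Cluster statistics cover the cluster as a whole, for example document count, shard count, and resource usage.

http://192.168.2.135:9200/_cluster/stats?pretty
The output is long. It mainly covers indices, docs, store, query_cache, segments, nodes, mem, process, jvm, and fs.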

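Key metrics:

- indices.count: total number of indices.
- indices.shards.total: total number of shards.
- indices.shards.primaries: number of primary shards.
- indices.docs.count: total number of documents.
- indices.store.size_in_bytes: total size of stored data, in bytes.
- indices.segments.count: total number of segments.
- nodes.count.total: total number of nodes.
- nodes.count.data: number of data nodes.
- nodes.process.cpu.percent: CPU usage of the Elasticsearch process, as a percentage.
- nodes.fs.total_in_bytes: total file system capacity, in bytes.
- nodes.fs.free_in_bytes: free file system space, in bytes.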
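
As a rough illustration of how these values can be pulled out of the response, the sketch below condenses a parsed GET _cluster/stats body into a few numbers and derives a disk usage percentage. It assumes the usual nesting of that response, with docs, store, and shards under `indices` and fs under `nodes`. The function name `summarize_cluster_stats` and the sample values are made up for illustration and are not taken from a real cluster.

```python
from typing import Dict


def summarize_cluster_stats(stats: Dict) -> Dict:
    """Condense a parsed _cluster/stats response into a few headline numbers."""
    indices = stats["indices"]
    nodes = stats["nodes"]
    total = nodes["fs"]["total_in_bytes"]
    used = total - nodes["fs"]["free_in_bytes"]
    return {
        "index_count": indices["count"],
        "shard_total": indices["shards"]["total"],
        "doc_count": indices["docs"]["count"],
        "store_bytes": indices["store"]["size_in_bytes"],
        "node_total": nodes["count"]["total"],
        # Guard against a zero total so the division cannot fail.
        "disk_used_percent": round(100.0 * used / total, 1) if total else 0.0,
    }


# Hypothetical sample values, for illustration only.
sample_stats = {
    "indices": {
        "count": 4,
        "shards": {"total": 16, "primaries": 8},
        "docs": {"count": 1200},
        "store": {"size_in_bytes": 500},
    },
    "nodes": {
        "count": {"total": 3, "data": 3},
        "fs": {"total_in_bytes": 1000, "free_in_bytes": 250},
    },
}

print(summarize_cluster_stats(sample_stats)["disk_used_percent"])  # prints 75.0
```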

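Elasticsearch Node-Level Monitoring Metrics

Node monitoring, covering per-node state such as thread pools (GET _nodes/stats?pretty)

http://9.16.108.37:9200/_nodes/stats?pretty
The output is long, so parts of it are omitted.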
indices: {
docs: {
count: 0,
deleted: 0
},
store: {
size_in_bytes: 0
},
indexing: {
index_total: 0,
index_time_in_millis: 0,
index_current: 0,
index_failed: 0,
delete_total: 0,
delete_time_in_millis: 0,
delete_current: 0,
noop_update_total: 0,
is_throttled: false,
throttle_time_in_millis: 0
},
…………
search: {
open_contexts: 0,
query_total: 0,
query_time_in_millis: 0,
query_current: 0,
fetch_total: 0,
fetch_time_in_millis: 0,
fetch_current: 0,
scroll_total: 0,
scroll_time_in_millis: 0,
scroll_current: 0,
suggest_total: 0,
suggest_time_in_millis: 0,
suggest_current: 0
},
merges: {
current: 0,
current_docs: 0,
current_size_in_bytes: 0,
total: 0,
total_time_in_millis: 0,
total_docs: 0,
total_size_in_bytes: 0,
total_stopped_time_in_millis: 0,
total_throttled_time_in_millis: 0,
total_auto_throttle_in_bytes: 0
},
…………
fielddata: {
memory_size_in_bytes: 0,
evictions: 0
},
…………
segments: {
count: 0,
memory_in_bytes: 0,
terms_memory_in_bytes: 0,
stored_fields_memory_in_bytes: 0,
term_vectors_memory_in_bytes: 0,
norms_memory_in_bytes: 0,
points_memory_in_bytes: 0,
doc_values_memory_in_bytes: 0,
index_writer_memory_in_bytes: 0,
version_map_memory_in_bytes: 0,
fixed_bit_set_memory_in_bytes: 0,
max_unsafe_auto_id_timestamp: -9223372036854776000,
file_sizes: { }
},
translog: {
operations: 0,
size_in_bytes: 0,
uncommitted_operations: 0,
uncommitted_size_in_bytes: 0,
earliest_last_modified_age: 0
},
request_cache: {
memory_size_in_bytes: 0,
evictions: 0,
hit_count: 0,
miss_count: 0
},
recovery: {
current_as_source: 0,
current_as_target: 0,
throttle_time_in_millis: 0
}
},
os: {
timestamp: 1697597490543,
cpu: {
percent: 4,
load_average: {
1m: 0.35,
5m: 0.46,
15m: 0.58
}
},
mem: {
total_in_bytes: 33566769152,
free_in_bytes: 9410244608,
used_in_bytes: 24156524544,
free_percent: 28,
used_percent: 72
},
swap: {
total_in_bytes: 12880703488,
free_in_bytes: 10536681472,
used_in_bytes: 2344022016
},
…………
cpu: {
control_group: "/",
cfs_period_micros: 100000,
cfs_quota_micros: -1,
…………
memory: {
control_group: "/",
limit_in_bytes: "9223372036854771712",
usage_in_bytes: "22675898368"
}
}
},
process: {
timestamp: 1697597490544,
open_file_descriptors: 2096,
max_file_descriptors: 131072,
cpu: {
percent: 0,
total_in_millis: 772163670
},
mem: {
total_virtual_in_bytes: 27340054528
}
},
jvm: {
timestamp: 1697597490546,
uptime_in_millis: 1472206810,
mem: {
heap_used_in_bytes: 8560137888,
heap_used_percent: 53,
heap_committed_in_bytes: 15992750080,
heap_max_in_bytes: 15992750080,
non_heap_used_in_bytes: 148567352,
non_heap_committed_in_bytes: 162770944,
pools: {
young: {
used_in_bytes: 494578408,
max_in_bytes: 907345920,
peak_used_in_bytes: 907345920,
peak_max_in_bytes: 907345920
},
survivor: {
used_in_bytes: 10615936,
max_in_bytes: 113377280,
peak_used_in_bytes: 113377280,
peak_max_in_bytes: 113377280
},
old: {
used_in_bytes: 8054943544,
max_in_bytes: 14972026880,
peak_used_in_bytes: 11245893752,
peak_max_in_bytes: 14972026880
}
}
},
threads: {
count: 139,
peak_count: 258
},
gc: {
collectors: {
young: {
collection_count: 21985,
collection_time_in_millis: 1527935
},
old: {
collection_count: 6,
collection_time_in_millis: 56776
}
}
},
…………
thread_pool: {
analyze: {
threads: 0,
queue: 0,
active: 0,
rejected: 0,
largest: 0,
completed: 0
},
…………
fs: {
timestamp: 1697597490546,
total: {
total_in_bytes: 322066976768,
free_in_bytes: 319637000192,
available_in_bytes: 319637000192
},
data: [
{
path: "/data/elasticsearch/data/nodes/0",
mount: "/data (/dev/mapper/vg--data-lv--data)",
type: "xfs",
total_in_bytes: 322066976768,
free_in_bytes: 319637000192,
available_in_bytes: 319637000192
}
],
io_stats: {
devices: [
{
device_name: "dm-2",
operations: 203729,
read_operations: 163,
write_operations: 203566,
read_kilobytes: 2788,
write_kilobytes: 1547381
}
],
…………

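Key metrics:

- indices.docs.count: number of documents on the node.
- indices.segments.count: number of segments.
- jvm.mem.heap_used_percent: percentage of the JVM heap in use.
- thread_pool.{bulk, index, get, search}.{active, queue, rejected}: thread pool statistics for the bulk, index, get, and search pools. The main values are the number of active threads, the number of queued tasks, and the number of rejected tasks. Pool names depend on the Elasticsearch version; newer releases, for example, use write in place of bulk.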
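
Heap usage and thread pool rejections are common candidates for alerting. The sketch below scans a parsed GET _nodes/stats body, where each node sits under the top-level `nodes` object keyed by node ID. The function name `find_node_pressure`, the 85 percent heap limit, and the sample node data are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def find_node_pressure(
    nodes_stats: Dict,
    heap_limit: int = 85,  # assumed example threshold, not an Elasticsearch default
    pools: Tuple[str, ...] = ("bulk", "index", "get", "search"),
) -> List[Tuple[str, str, int]]:
    """Return (node name, metric, value) tuples for nodes that look stressed."""
    findings = []
    for node_id, node in nodes_stats["nodes"].items():
        name = node.get("name", node_id)
        heap = node["jvm"]["mem"]["heap_used_percent"]
        if heap >= heap_limit:
            findings.append((name, "heap_used_percent", heap))
        for pool in pools:
            pool_stats = node.get("thread_pool", {}).get(pool)
            if pool_stats is None:
                continue  # this pool does not exist in this Elasticsearch version
            # rejected counts up from node start, so nonzero means tasks were dropped.
            if pool_stats["rejected"] > 0:
                findings.append((name, pool + ".rejected", pool_stats["rejected"]))
    return findings


# Hypothetical two-node sample; only the 53 percent heap value comes from the output above.
sample_nodes = {
    "nodes": {
        "id1": {
            "name": "node-1",
            "jvm": {"mem": {"heap_used_percent": 53}},
            "thread_pool": {"search": {"active": 0, "queue": 0, "rejected": 0}},
        },
        "id2": {
            "name": "node-2",
            "jvm": {"mem": {"heap_used_percent": 90}},
            "thread_pool": {"search": {"active": 4, "queue": 20, "rejected": 12}},
        },
    }
}

print(find_node_pressure(sample_nodes))
```

In this sample only node-2 is reported, once for heap usage and once for search rejections.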
 
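The following metrics are cumulative counters that reset to zero when the node restarts.
- indices.indexing.index_total: number of documents indexed.
- indices.indexing.index_time_in_millis: total time spent indexing, in milliseconds.
- indices.get.total: number of get requests.
- indices.get.time_in_millis: total time spent on get requests, in milliseconds.
- indices.search.query_total: total number of search queries.
- indices.search.query_time_in_millis: total time spent on search queries, in milliseconds.
- indices.search.fetch_total: total number of fetch operations.
- indices.search.fetch_time_in_millis: total time spent on fetch operations, in milliseconds.
- jvm.gc.collectors.young.collection_count: number of young-generation garbage collections.
- jvm.gc.collectors.young.collection_time_in_millis: total time spent on young-generation garbage collection, in milliseconds.
- jvm.gc.collectors.old.collection_count: number of old-generation garbage collections.
- jvm.gc.collectors.old.collection_time_in_millis: total time spent on old-generation garbage collection, in milliseconds.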
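
Because these counters only ever accumulate, a monitoring system usually samples them periodically and works with the difference between two samples. The sketch below shows one way to derive a per-second rate and an average latency from two samples, treating a counter that went backwards as a node restart. The function names are illustrative assumptions.

```python
def counter_delta(previous: int, current: int) -> int:
    """Increase of a cumulative counter between two samples.

    If the counter went backwards, the node is assumed to have restarted,
    so the current value is everything accumulated since the reset.
    """
    return current - previous if current >= previous else current


def rate_per_second(prev_total: int, cur_total: int, interval_seconds: float) -> float:
    """Operations per second over the sampling interval."""
    return counter_delta(prev_total, cur_total) / interval_seconds


def average_latency_ms(prev_total: int, prev_time_ms: int,
                       cur_total: int, cur_time_ms: int) -> float:
    """Average milliseconds per operation between two samples."""
    ops = counter_delta(prev_total, cur_total)
    spent = counter_delta(prev_time_ms, cur_time_ms)
    return spent / ops if ops else 0.0


# Two hypothetical samples of query_total / query_time_in_millis, taken 60 s apart.
print(rate_per_second(1000, 1600, 60))             # prints 10.0
print(average_latency_ms(1000, 5000, 1600, 8000))  # prints 5.0
```

Passing zeros as the previous sample gives the average since the node started. For the young-generation GC figures in the output above (21985 collections, 1527935 ms), that works out to roughly 69.5 ms per collection.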

Elasticsearch Index-Level Monitoring Metrics

View statistics for all indices (GET _stats)

http://9.16.108.37:9200/_stats
The output is long and is omitted here.

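Key metrics:

- indexname.primaries.docs.count: number of documents in the index.

The following metrics are cumulative counters that reset to zero when the node restarts.
- indexname.primaries.indexing.index_total: number of documents indexed.
- indexname.primaries.indexing.index_time_in_millis: total time spent indexing documents, in milliseconds.
- indexname.primaries.get.total: number of get requests.
- indexname.primaries.get.time_in_millis: total time spent on get requests, in milliseconds.
- indexname.primaries.search.query_total: total number of search queries.
- indexname.primaries.search.query_time_in_millis: total time spent on search queries, in milliseconds.
- indexname.primaries.search.fetch_total: total number of fetch operations.
- indexname.primaries.search.fetch_time_in_millis: total time spent on fetch operations, in milliseconds.
- indexname.primaries.refresh.total: total number of refresh operations.
- indexname.primaries.refresh.total_time_in_millis: total time spent on refresh operations, in milliseconds.
- indexname.primaries.flush.total: total number of flush operations.
- indexname.primaries.flush.total_time_in_millis: total time spent on flush operations, in milliseconds.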
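
To compare indices with one another, the counters above can be reduced to an average query latency per index. The sketch below works on a parsed GET _stats body, in which each `indexname` from the list above is a key under the top-level `indices` object. The function names and the two sample indices (`logs-a`, `logs-b`) are hypothetical.

```python
from typing import Dict, List, Tuple


def index_query_latency(stats: Dict) -> Dict[str, float]:
    """Map each index name to its average query latency in milliseconds.

    The average covers the period since the counters were last reset.
    """
    result = {}
    for name, body in stats["indices"].items():
        search = body["primaries"]["search"]
        total = search["query_total"]
        result[name] = search["query_time_in_millis"] / total if total else 0.0
    return result


def slowest_indices(stats: Dict, top: int = 3) -> List[Tuple[str, float]]:
    """Return the indices with the highest average query latency, slowest first."""
    ranked = sorted(index_query_latency(stats).items(),
                    key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Hypothetical sample values, for illustration only.
sample_index_stats = {
    "indices": {
        "logs-a": {"primaries": {"search": {"query_total": 200, "query_time_in_millis": 1000}}},
        "logs-b": {"primaries": {"search": {"query_total": 0, "query_time_in_millis": 0}}},
    }
}

print(slowest_indices(sample_index_stats, top=1))  # prints [('logs-a', 5.0)]
```

An index with no queries is reported as 0.0 so that it does not cause a division by zero.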