How to check ES local disk usage: Elasticsearch disk space

Reposted

网络安全战士 2024-05-20 21:34:17


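Yesterday a customer complained that the ES data directory had reached 100% usage. My first thought was that monitoring is set to alert at 80%, so why had nobody caught this? It turned out the monitoring team had received the alert several days earlier and ignored it. I was furious: this is a production environment, not a test environment, and I could hardly believe a colleague would be that careless. When I logged in to the servers, the worst disk was at 100% and several others were at 99%. The cluster had been built only two months earlier and was already out of space. At build time we had assumed the data volume would never get this large, and I dug a deep hole for myself by storing all the data on a single disk.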
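For reference, an 80% check like the one mentioned above can be sketched at the operating-system level. This is only an illustration: `over_threshold` is a hypothetical helper, it reads `df -P` style output from standard input, and it assumes mount points contain no spaces.

```shell
# Print mount points whose usage is at or above a threshold (default 80%).
# Input is `df -P` output: Filesystem, blocks, Used, Available, Capacity, Mounted on.
over_threshold() {
    threshold="${1:-80}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)                 # strip the trailing % from the Capacity column
        if (use + 0 >= t) print $6, use "%"
    }'
}

# Typical use on a data node (directory names are assumptions):
# df -P /data01 /data02 | over_threshold 80
```

Running it from a scheduled job and acting on any output line is one simple way to make sure an alert is not silently dropped.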

Two options came to mind at the time:

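Option 1: combine the other disks into one large LVM volume group and create a single large volume on it. The advantage is that it is simple to carry out. The drawback is that the data ends up stored in one place, which hurts later insert and query performance, so I kept this option as a last resort.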

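Option 2: configure multiple disks through path.data (the setting that holds the data directories; path.logs only controls where log files go). I did not know ES in depth at the time and could only set it up and use it in simple ways. As far as I understood it this option should work, but the customer asked for evidence. I had no choice but to build another ES cluster, test it, and collect the evidence. (This is the option we ultimately used.)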

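1. Stop the upper-layer applications and the ES cluster, and stop the scheduled jobs.

2. Back up the existing ES data to a new disk (about 1 hour for 500 GB), as preparation for a rollback.

3. Edit the ES configuration file so that path.data lists multiple disks.

4. ES automatically synchronizes the data files to the other disks (this took 11 hours).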
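The configuration change in step 3 might look like the following sketch. Multiple data directories are listed under the `path.data` setting. The directory names /data01/es5/data and /data02/es5/data are assumptions inferred from the directory listings later in this article, `write_data_paths` is a hypothetical helper, and the fragment is written to a scratch file for review rather than to the live elasticsearch.yml.

```shell
# Hypothetical helper: write a path.data fragment listing the given directories.
write_data_paths() {
    out_file="$1"
    shift
    {
        echo "path.data:"
        for dir in "$@"; do
            echo "  - $dir"
        done
    } > "$out_file"
}

# Write to a scratch file for review; the directory names are assumptions.
sample="$(mktemp)"
write_data_paths "$sample" /data01/es5/data /data02/es5/data
cat "$sample"
```

Reviewing the generated fragment before merging it by hand keeps the backup from step 2 as the only thing touched until the change is confirmed.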

Test steps

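Test 1: start only one node, with a single data directory, /data01, in the ES configuration file. Add data to ES; the ES data is stored only under that directory on that node.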

Index one document from a client:

curl -XPUT 'datanode10:9200/customer/external/1?pretty' -d '
  {
    "name": "John Doe"
  }'

The data as seen through the head plugin:

[Figure 1: head plugin screenshot of the cluster (image not preserved)]

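Conclusion: all the shards are on one machine, and the replica shards show as UNASSIGNED, since a single node has nowhere else to place a replica.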
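The same state can be checked without the head plugin through the `_cat/shards` API. The sketch below filters that output for unassigned shards; `unassigned_shards` is a hypothetical helper, and it assumes the columns were requested in the order index, shard, prirep, state, node.

```shell
# Print index, shard number and primary/replica flag for every unassigned shard.
# Input columns: index shard prirep state node
unassigned_shards() {
    awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}

# Against a live node (host and index name taken from the article):
# curl -s 'datanode10:9200/_cat/shards/customer?h=index,shard,prirep,state,node' | unassigned_shards
```

On a single-node cluster each reported line would be expected to carry `r` in the last column, meaning a replica.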

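Test 2: start only one node, and change the data directories in the ES configuration file to two disks, /data01 and /data02. After ES starts, its data is synchronized across both data directories. A new index was then added, and ES balanced it across the two data directories automatically.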

The picture is the same as above, except that the shards are now stored across the two data directories.

Test on datanode10 with the two directories /data01 and /data02. Note the directory structure below carefully:

[root@datanode10 nodes]# tree  /data01/es5/data/nodes/0/indices/ 
/data01/es5/data/nodes/0/indices/
├── _2nXGRdBQWqEtALngH_xaQ
│   ├──  0
│   │  ├──index
│   │  │  ├── segments_2
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-1.st
│   │  └──translog
│   │       ├──translog-1.ckp
│   │       ├──translog-1.tlog
│   │       ├──translog-2.tlog
│   │       └──translog.ckp
│   ├──  1
│   │  ├──index
│   │  │  ├── segments_2
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-1.st
│   │  └──translog
│   │       ├──translog-1.ckp
│   │       ├──translog-1.tlog
│   │       ├──translog-2.tlog
│   │       └──translog.ckp
│   ├──  2
│   │  ├──index
│   │  │  ├── segments_2
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-1.st
│   │  └──translog
│   │       ├──translog-1.ckp
│   │       ├──translog-1.tlog
│   │       ├──translog-2.tlog
│   │       └──translog.ckp
│   ├──  3
│   │  ├──index
│   │  │  ├── _0.cfe
│   │  │  ├── _0.cfs
│   │  │  ├── _0.si
│   │  │  ├── segments_4
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-1.st
│   │  └──translog
│   │       ├──translog-2.ckp
│   │       ├──translog-2.tlog
│   │       ├──translog-3.tlog
│   │       └──translog.ckp
│   ├──  4
│   │  ├──index
│   │  │  ├── segments_2
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-1.st
│   │  └──translog
│   │       ├──translog-1.ckp
│   │       ├──translog-1.tlog
│   │       ├──translog-2.tlog
│   │       └──translog.ckp
│   └──  _state
│       └──state-12.st
└── ZytkImACR4CkB53vG4XyNw
    ├── 0
    │   ├── index
    │   │   ├── segments_2
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-0.st
    │   └── translog
    │       ├── translog-1.tlog
    │       └── translog.ckp
    ├── 3
    │   ├── index
    │   │   ├── _0.cfe
    │   │   ├── _0.cfs
    │   │   ├── _0.si
    │   │   ├── segments_1
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-0.st
    │   └── translog
    │       ├── translog-1.tlog
    │       └── translog.ckp
    └── _state
        └── state-7.st
 
32 directories, 53 files
[root@datanode10 nodes]# 
[root@datanode10 nodes]# tree  /data02/es5/data/nodes/0/indices/ 
/data02/es5/data/nodes/0/indices/
├── _2nXGRdBQWqEtALngH_xaQ
│   └──  _state
│       └──state-12.st
└── ZytkImACR4CkB53vG4XyNw
    ├── 1
    │   ├── index
    │   │   ├── segments_1
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-0.st
    │   └── translog
    │       ├── translog-1.tlog
    │       └── translog.ckp
    ├── 2
    │   ├── index
    │   │   ├── segments_2
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-0.st
    │   └── translog
    │       ├── translog-1.tlog
    │       └── translog.ckp
    ├── 4
    │   ├── index
    │   │   ├── segments_2
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-0.st
    │   └── translog
    │       ├── translog-1.tlog
    │       └── translog.ckp
    └── _state
        └── state-7.st
 
16 directories, 17 files
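To compare how shards are spread across data directories without reading full `tree` listings, the shard directories can be counted per index. A minimal sketch, assuming the `nodes/0/indices` layout shown above and a `find` that supports `-mindepth` and `-maxdepth`; `count_shard_dirs` is a hypothetical helper name.

```shell
# For each index directory under the given indices directory, print the index
# name and the number of shard directories (those whose name starts with a digit).
count_shard_dirs() {
    indices_dir="$1"
    for index_dir in "$indices_dir"/*/; do
        [ -d "$index_dir" ] || continue
        n=$(find "$index_dir" -mindepth 1 -maxdepth 1 -type d -name '[0-9]*' | wc -l | tr -d ' ')
        echo "$(basename "$index_dir") $n"
    done
}

# Usage with the paths from the listings above:
# count_shard_dirs /data01/es5/data/nodes/0/indices
# count_shard_dirs /data02/es5/data/nodes/0/indices
```

The `_state` directory is skipped because its name does not start with a digit, so only shard directories are counted.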

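Test 3: start two nodes. Remove /data02 from the first node's ES configuration file, and on the newly added ES machine set the data directory in its configuration file to /data01. After ES starts, the data of the two existing indices is automatically synchronized to /data01 on both machines.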

[Figure 2: screenshot (image not preserved)]

[root@datanode10 nodes]# tree  /data01/es5/data/nodes/0/indices/ 
/data01/es5/data/nodes/0/indices/
├── _2nXGRdBQWqEtALngH_xaQ
│   ├──  0
│   │  ├──index
│   │  │  ├── segments_2
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-2.st
│   │  └──translog
│   │       ├──translog-1.ckp
│   │       ├──translog-1.tlog
│   │       ├──translog-2.ckp
│   │       ├──translog-2.tlog
│   │       ├──translog-3.tlog
│   │       └──translog.ckp
│   ├──  1
│   │  ├──index
│   │  │  ├── segments_2
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-2.st
│   │  └──translog
│   │       ├──translog-1.ckp
│   │       ├──translog-1.tlog
│   │       ├──translog-2.ckp
│   │       ├──translog-2.tlog
│   │       ├──translog-3.tlog
│   │       └──translog.ckp
│   ├──  2
│   │  ├──index
│   │  │  ├── segments_2
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-2.st
│   │  └──translog
│   │       ├──translog-1.ckp
│   │       ├──translog-1.tlog
│   │       ├──translog-2.ckp
│   │       ├──translog-2.tlog
│   │       ├──translog-3.tlog
│   │       └──translog.ckp
│   ├──  3
│   │  ├──index
│   │  │  ├── _0.cfe
│   │  │  ├── _0.cfs
│   │  │  ├── _0.si
│   │  │  ├── segments_4
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-2.st
│   │  └──translog
│   │       ├──translog-2.ckp
│   │       ├──translog-2.tlog
│   │       ├──translog-3.ckp
│   │       ├──translog-3.tlog
│   │       ├──translog-4.tlog
│   │       └──translog.ckp
│   ├──  4
│   │  ├──index
│   │  │  ├── segments_2
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-2.st
│   │  └──translog
│   │       ├──translog-1.ckp
│   │       ├──translog-1.tlog
│   │       ├──translog-2.ckp
│   │       ├──translog-2.tlog
│   │       ├──translog-3.tlog
│   │       └──translog.ckp
│   └──  _state
│       └──state-21.st
└── ZytkImACR4CkB53vG4XyNw
    ├── 0
    │   ├── index
    │   │   ├── segments_2
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-1.st
    │   └── translog
    │       ├── translog-1.tlog
    │       └── translog.ckp
    ├── 1
    │   ├── index
    │   │   ├── segments_3
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-0.st
    │   └── translog
    │       ├── translog-1.tlog
    │       └── translog.ckp
    ├── 2
    │   ├── index
    │   │   ├── segments_2
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-0.st
    │   └── translog
    │       ├── translog-1.tlog
    │       └── translog.ckp
    ├── 3
    │   ├── index
    │   │   ├── _0.cfe
    │   │   ├── _0.cfs
    │   │   ├── _0.si
    │   │   ├── segments_3
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-1.st
    │   └── translog
    │       ├── translog-2.ckp
    │       ├── translog-2.tlog
    │       ├── translog-3.tlog
    │       └── translog.ckp
    ├── 4
    │   ├── index
    │   │   ├── segments_2
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-0.st
    │   └── translog
    │       ├── translog-1.tlog
    │       └── translog.ckp
    └── _state
        └── state-16.st
 
44 directories, 80 files
# Note: the shards from /data02 have all been synchronized to /data01
 
[root@datanode08 opt]# tree  /data01/es5/data/nodes/0/indices/ 
/data01/es5/data/nodes/0/indices/
├── _2nXGRdBQWqEtALngH_xaQ
│   ├──  0
│   │  ├──index
│   │  │  ├── segments_4
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-1.st
│   │  └──translog
│   │       ├──translog-1.tlog
│   │       └──translog.ckp
│   ├──  1
│   │  ├──index
│   │  │  ├── segments_4
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-1.st
│   │  └──translog
│   │       ├──translog-1.tlog
│   │       └──translog.ckp
│   ├──  2
│   │  ├──index
│   │  │  ├── segments_4
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-1.st
│   │  └──translog
│   │       ├──translog-1.tlog
│   │       └──translog.ckp
│   ├──  3
│   │  ├──index
│   │  │  ├── _0.cfe
│   │  │  ├── _0.cfs
│   │  │  ├── _0.si
│   │  │  ├── segments_6
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-1.st
│   │  └──translog
│   │       ├──translog-1.tlog
│   │       └──translog.ckp
│   ├──  4
│   │  ├──index
│   │  │  ├── segments_4
│   │  │  └── write.lock
│   │  ├──_state
│   │  │  └── state-1.st
│   │  └──translog
│   │       ├──translog-1.tlog
│   │       └──translog.ckp
│   └──  _state
│       └──state-13.st
└── ZytkImACR4CkB53vG4XyNw
    ├── 0
    │   ├── index
    │   │   ├── segments_1
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-1.st
    │   └── translog
    │       ├── translog-1.ckp
    │       ├── translog-1.tlog
    │       ├── translog-2.tlog
    │       └── translog.ckp
    ├── 1
    │   ├── index
    │   │   ├── segments_2
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-1.st
    │   └── translog
    │       ├── translog-1.ckp
    │       ├── translog-1.tlog
    │       ├── translog-2.tlog
    │       └── translog.ckp
    ├── 2
    │   ├── index
    │   │   ├── segments_1
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-1.st
    │   └── translog
    │       ├── translog-1.ckp
    │       ├── translog-1.tlog
    │       ├── translog-2.tlog
    │       └── translog.ckp
    ├── 3
    │   ├── index
    │   │   ├── _0.cfe
    │   │   ├── _0.cfs
    │   │   ├── _0.si
    │   │   ├── segments_5
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-1.st
    │   └── translog
    │       ├── translog-1.tlog
    │       └── translog.ckp
    ├── 4
    │   ├── index
    │   │   ├── segments_1
    │   │   └── write.lock
    │   ├── _state
    │   │   └── state-1.st
    │   └── translog
    │       ├── translog-1.ckp
    │       ├── translog-1.tlog
    │       ├── translog-2.tlog
    │       └── translog.ckp
    └── _state
        └── state-16.st

44 directories, 66 files

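Conclusion: the shards under /data02 on datanode10 were all automatically synchronized to /data01, and the data was also spread onto the /data01 disk of datanode08, so the data was balanced automatically.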
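The per-node result can also be read from the `_cat/allocation` API, which reports the shard count and disk usage of each node. The sketch below flags nodes at or above a usage threshold; `nodes_over` is a hypothetical helper, and it assumes the columns node, shards and disk.percent were requested in that order and that node names contain no spaces.

```shell
# Print nodes whose disk.percent is at or above a threshold (default 80).
# Input columns: node shards disk.percent
# Rows without a numeric third column (for example an UNASSIGNED row) are skipped.
nodes_over() {
    threshold="${1:-80}"
    awk -v t="$threshold" '$3 ~ /^[0-9]+$/ && $3 + 0 >= t { print $1, $3 "%" }'
}

# Against a live node (host taken from the article):
# curl -s 'datanode10:9200/_cat/allocation?h=node,shards,disk.percent' | nodes_over 80
```

`_cat/allocation` reports one figure per node. To see each data directory separately, `curl 'datanode10:9200/_nodes/stats/fs?pretty'` lists every data path with its total and available bytes.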

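Test 4: start three nodes and change the data directories in every ES configuration file to /data01, /data02 and /data03. The data is automatically synchronized to the three directories and is evenly distributed. Based on these results, this is the option we chose yesterday to bring disk usage down.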

[Figure 3: screenshot (image not preserved)]

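From this we can see that datanode09 synchronized the data automatically and was assigned its share of the shards. We then inserted a document:

curl -XPUT 'datanode08:9200/teacher/external/1?pretty' -d '
  {
    "name": "John Doe"
  }'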

[Figure 4: screenshot (image not preserved)]

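The shards can be seen to be stored on different machines. After repeated testing I reached an important conclusion: an ES cluster can be expanded online. Changing the relevant configuration parameters is enough, and data skew does not occur, because ES balances both the existing data and the data inserted later. This was a deep hole to fall into, but it let me verify the theory clearly, which made it worthwhile. I hope you will not repeat my mistake: think through every problem you can anticipate when you build the cluster, so that you do not dig a hole for yourself and your own oversight does not cause failures later. I am recording how I handled this for anyone who needs it. Good luck!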
