Taking Hadoop out of safe mode

hdfs dfsadmin -safemode get      check the safe mode status
hdfs dfsadmin -safemode enter    enter safe mode
hdfs dfsadmin -safemode leave    leave safe mode

  hdfs dfsadmin -safemode leave
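
The NameNode normally leaves safe mode on its own once the fraction of blocks meeting minimal replication reaches dfs.namenode.safemode.threshold-pct (0.999 by default); missing or corrupt blocks keep the ratio below that threshold, which is why the cluster stays stuck in safe mode after a restart. A quick way to check the threshold and the current state (a minimal sketch, assuming the hdfs client is available on the node):

hdfs getconf -confKey dfs.namenode.safemode.threshold-pct    # typically prints 0.999f
hdfs dfsadmin -safemode get                                  # e.g. "Safe mode is ON"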


Step 1: Check the Hadoop filesystem with hadoop fsck

Command: hadoop fsck

hadoop fsck

Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]
        <path>             check the files under this path for integrity
        -move              move corrupted files to /lost+found
        -delete            delete corrupted files
        -openforwrite      print files currently open for write
        -files             print the names of the files being checked
        -blocks            print the block report (use together with -files)
        -locations         print the location of every block (use together with -files)
        -racks             print the network topology of the block locations (use together with -files)
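
For example, to check a single subtree and list every file it visits (a sketch; /tmp is just an arbitrary path here):

hadoop fsck /tmp -files            # per-file integrity status under /tmp
hadoop fsck /tmp -files -blocks    # the same, plus a block report per file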

Inspect the state of the Hadoop cluster in the fsck summary. The lines to watch are CORRUPT FILES and MISSING BLOCKS; for example, CORRUPT FILES: 2 means two files are corrupted. On the cluster checked here, fsck reported:

Status: CORRUPT
 Number of data-nodes:    1
 Number of racks:        1
 Total dirs:            72
 Total symlinks:        0

Replicated Blocks:
 Total size:    574348338 B
 Total files:    76
 Total blocks (validated):    74 (avg. block size 7761464 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:    35 (47.2973 %)
  MINIMAL BLOCK REPLICATION:    1
  CORRUPT FILES:    35
  MISSING BLOCKS:    35
  MISSING SIZE:        327840971 B
  ********************************
 Minimally replicated blocks:    39 (52.7027 %)
 Over-replicated blocks:    0 (0.0 %)
 Under-replicated blocks:    38 (51.351353 %)
 Mis-replicated blocks:        0 (0.0 %)
 Default replication factor:    3
 Average block replication:    0.527027
 Missing blocks:        35
 Corrupt blocks:        0
 Missing replicas:        160 (38.46154 %)

Erasure Coded Block Groups:
 Total size:    0 B
 Total files:    0
 Total block groups (validated):    0
 Minimally erasure-coded block groups:    0
 Over-erasure-coded block groups:    0
 Under-erasure-coded block groups:    0
 Unsatisfactory placement block groups:    0
 Average block group size:    0.0
 Missing block groups:        0
 Corrupt block groups:        0
 Missing internal blocks:    0
FSCK ended at Wed Jan 12 20:13:16 CST 2022 in 28 milliseconds

Step 2: Print status information for each file

Command: hadoop fsck / -files -blocks -locations -racks

In the output, healthy files end with OK; the lines marked MISSING are the lost files.

/tmp/hadoop-yarn/staging/history/done_intermediate/root/job_1641978125772_0001_conf.xml_tmp 0 bytes, replicated: replication=3, 0 block(s): OK

/tmp/hadoop-yarn/staging/root/.staging/job_1641780921957_0001/job.split 315 bytes, replicated: replication=10, 1 block(s): MISSING
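
On a larger cluster this listing gets long, so it helps to filter out just the damaged entries. Two ways to do that (a sketch; the exact output format of -list-corruptfileblocks differs slightly between Hadoop versions):

hadoop fsck / -files | grep -i MISSING    # per-file view, damaged entries only
hdfs fsck / -list-corruptfileblocks       # block-id / path pairs of corrupt files only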

From this path you can locate the corresponding file in the HDFS web UI, as shown in the screenshot below. If permission errors get in the way while browsing or cleaning up, the /tmp tree can be opened up (use with care on a shared cluster):

hadoop fs -chmod -R  777 /tmp

[Figure: the corresponding file located in the HDFS web UI]

Step 3: Repair the two lost/corrupted files

[root@node03 conf]# hdfs debug recoverLease -path /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/19195239-a205-4462-921d-09e0483a4080 -retries 10

[root@node03 conf]# hdfs debug recoverLease -path /flink-checkpoint/626ea65de810a2ec3b1799b605a6a995/chk-175/_metadata -retries 10
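
When many files are affected, running recoverLease by hand does not scale. A rough sketch of a batch version, which feeds every corrupt path reported by fsck into recoverLease (the awk field assumes the usual "blk_...<TAB>/path" output lines, so adjust it for your version):

hdfs fsck / -list-corruptfileblocks 2>/dev/null \
  | awk '/^blk_/ {print $2}' \
  | while read -r f; do
      hdfs debug recoverLease -path "$f" -retries 10
    done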

 

..Status: HEALTHY    # cluster status: healthy

Now restarting Hadoop no longer leaves it stuck in safe mode, and HiveServer2 starts up normally.

Step 4: Unexpected situations

If the files cannot be repaired, or the repair reports success but the cluster status still looks like this:

.............Status: CORRUPT					# Hadoop status: not healthy
 Total size:	273821489 B
 Total dirs:	403
 Total files:	213
 Total symlinks:		0
 Total blocks (validated):	201 (avg. block size 1362295 B)
  ********************************
  UNDER MIN REPL'D BLOCKS:	2 (0.99502486 %)
  dfs.namenode.replication.min:	1
  CORRUPT FILES:	2							# two files are corrupted
  MISSING BLOCKS:	2							# two blocks are missing
  MISSING SIZE:		6174 B
  CORRUPT BLOCKS: 	2
  ********************************
 Minimally replicated blocks:	199 (99.004974 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	0 (0.0 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	3
 Average block replication:	2.8208954
 Corrupt blocks:		2
 Missing replicas:		0 (0.0 %)
 Number of data-nodes:		3
 Number of racks:		1
FSCK ended at Fri Aug 23 10:43:11 CST 2019 in 12 milliseconds

1. If the corrupted files are not important

First: make a backup of the corrupted files you found.

Then: delete the corrupted files by running hadoop fsck / -delete:

[root@node03 export]# hadoop fsck / -delete

If the command does not succeed on the first try, run it a few more times. The precondition is that the lost and corrupted files really are unimportant!
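
A rough sketch of the whole backup-then-delete flow. Note that files with missing blocks usually cannot be read back in full; -ignoreCrc salvages whatever is still readable, and /backup/corrupt is just a hypothetical local directory:

mkdir -p /backup/corrupt
hdfs fsck / -list-corruptfileblocks 2>/dev/null \
  | awk '/^blk_/ {print $2}' \
  | while read -r f; do
      # copy what can still be read; report anything that cannot be salvaged
      hdfs dfs -get -ignoreCrc "$f" /backup/corrupt/ || echo "could not salvage $f"
    done
hadoop fsck / -delete    # only after the backups are safe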

2. If the corrupted files are important and must not be lost

You can first force the cluster out of safe mode with hdfs dfsadmin -safemode leave:

[root@node03 export]# hdfs dfsadmin -safemode leave

This does not completely solve the problem; it only gets the cluster working again temporarily!

Moreover, you will have to run this command after every Hadoop cluster start until the problem is fixed for good.
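
Until the root cause is fixed, that per-start step can be scripted. A minimal sketch of a wrapper to run after start-dfs.sh (it only assumes the hdfs client is on PATH):

#!/bin/bash
# Wait until the NameNode answers RPC calls, then force-leave safe mode.
until hdfs dfsadmin -safemode get >/dev/null 2>&1; do
    sleep 5
done
hdfs dfsadmin -safemode leave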