Errors when stopping a Hadoop cluster

1. The error output is as follows:

[root@server4 sbin]# ./stop-yarn.sh 
stopping yarn daemons
no resourcemanager to stop
server5: no nodemanager to stop
server6: no nodemanager to stop
server4: no nodemanager to stop
no proxyserver to stop
[root@server4 sbin]# ./stop-dfs.sh
Stopping namenodes on [server4]
server4: no namenode to stop
server5: no datanode to stop
server6: no datanode to stop
server4: no datanode to stop
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: no secondarynamenode to stop

Inspect the stop script:

[root@server4 sbin]# cat -n yarn-daemon.sh 
  1  #!/usr/bin/env bash
  2
···  (Apache license header elided)
 17
 18
 19  # Runs a yarn command as a daemon.
 20  #
 21  # Environment Variables
 22  #
 23  #   YARN_CONF_DIR      Alternate conf dir. Default is ${HADOOP_YARN_HOME}/conf.
 24  #   YARN_LOG_DIR       Where log files are stored. PWD by default.
 25  #   YARN_MASTER        host:path where hadoop code should be rsync'd from
 26  #   YARN_PID_DIR       The pid files are stored. /tmp by default.
 27  #   YARN_IDENT_STRING  A string representing this instance of hadoop. $USER by default
 28  #   YARN_NICENESS      The scheduling priority for daemons. Defaults to 0.
 29  ##
 30
 31  usage="Usage: yarn-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] (start|stop) <yarn-command> "
 32
 33  # if no args specified, show usage
 34  if [ $# -le 1 ]; then
 35    echo $usage
 36    exit 1
 37  fi
 38
 39  bin=`dirname "${BASH_SOURCE-$0}"`
 40  bin=`cd "$bin"; pwd`
 41
 42  DEFAULT_LIBEXEC_DIR="$bin"/../libexec
 43  HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
 44  . $HADOOP_LIBEXEC_DIR/yarn-config.sh
 45
 46  # get arguments
 47  startStop=$1
 48  shift
 49  command=$1
 50  shift
 51
 52  hadoop_rotate_log ()
 53  {
 54    log=$1;
 55    num=5;
 56    if [ -n "$2" ]; then
 57      num=$2
 58    fi
 59    if [ -f "$log" ]; then # rotate logs
 60      while [ $num -gt 1 ]; do
 61        prev=`expr $num - 1`
 62        [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"
 63        num=$prev
 64      done
 65      mv "$log" "$log.$num";
 66    fi
 67  }
 68
 69  if [ -f "${YARN_CONF_DIR}/yarn-env.sh" ]; then
 70    . "${YARN_CONF_DIR}/yarn-env.sh"
 71  fi
 72
 73  if [ "$YARN_IDENT_STRING" = "" ]; then
 74    export YARN_IDENT_STRING="$USER"
 75  fi
 76
 77  # get log directory
 78  if [ "$YARN_LOG_DIR" = "" ]; then
 79    export YARN_LOG_DIR="$HADOOP_YARN_HOME/logs"
 80  fi
 81
 82  if [ ! -w "$YARN_LOG_DIR" ] ; then
 83    mkdir -p "$YARN_LOG_DIR"
 84    chown $YARN_IDENT_STRING $YARN_LOG_DIR
 85  fi
 86
 87  if [ "$YARN_PID_DIR" = "" ]; then
 88    YARN_PID_DIR=/tmp
 89  fi
 90
 91  # some variables
 92  export YARN_LOGFILE=yarn-$YARN_IDENT_STRING-$command-$HOSTNAME.log
 93  export YARN_ROOT_LOGGER=${YARN_ROOT_LOGGER:-INFO,RFA}
 94  log=$YARN_LOG_DIR/yarn-$YARN_IDENT_STRING-$command-$HOSTNAME.out
 95  pid=$YARN_PID_DIR/yarn-$YARN_IDENT_STRING-$command.pid
 96  YARN_STOP_TIMEOUT=${YARN_STOP_TIMEOUT:-5}
 97
 98  # Set default scheduling priority
 99  if [ "$YARN_NICENESS" = "" ]; then
100    export YARN_NICENESS=0
101  fi
102
103  case $startStop in
104
105    (start)
106
107      [ -w "$YARN_PID_DIR" ] || mkdir -p "$YARN_PID_DIR"
108
109      if [ -f $pid ]; then
110        if kill -0 `cat $pid` > /dev/null 2>&1; then
111          echo $command running as process `cat $pid`. Stop it first.
112          exit 1
113        fi
114      fi
115
116      if [ "$YARN_MASTER" != "" ]; then
117        echo rsync from $YARN_MASTER
118        rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $YARN_MASTER/ "$HADOOP_YARN_HOME"
119      fi
120
121      hadoop_rotate_log $log
122      echo starting $command, logging to $log
123      cd "$HADOOP_YARN_HOME"
124      nohup nice -n $YARN_NICENESS "$HADOOP_YARN_HOME"/bin/yarn --config $YARN_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &
125      echo $! > $pid
126      sleep 1
127      head "$log"
128      # capture the ulimit output
129      echo "ulimit -a" >> $log
130      ulimit -a >> $log 2>&1
131      ;;
132
133    (stop)
134
135      if [ -f $pid ]; then
136        TARGET_PID=`cat $pid`
137        if kill -0 $TARGET_PID > /dev/null 2>&1; then
138          echo stopping $command
139          kill $TARGET_PID
140          sleep $YARN_STOP_TIMEOUT
141          if kill -0 $TARGET_PID > /dev/null 2>&1; then
142            echo "$command did not stop gracefully after $YARN_STOP_TIMEOUT seconds: killing with kill -9"
143            kill -9 $TARGET_PID
144          fi
145        else
146          echo no $command to stop
147        fi
148        rm -f $pid
149      else
150        echo no $command to stop
151      fi
152      ;;
153
154    (*)
155      echo $usage
156      exit 1
157      ;;
158
159  esac

Filter the script for the lines that mention pid:

[root@server4 sbin]# cat -n yarn-daemon.sh | grep pid 
 26  #   YARN_PID_DIR       The pid files are stored. /tmp by default.
 95  pid=$YARN_PID_DIR/yarn-$YARN_IDENT_STRING-$command.pid
109      if [ -f $pid ]; then
110        if kill -0 `cat $pid` > /dev/null 2>&1; then
111          echo $command running as process `cat $pid`. Stop it first.
125      echo $! > $pid
135      if [ -f $pid ]; then
136        TARGET_PID=`cat $pid`
148        rm -f $pid
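
Line 95 is where the pid-file path comes from: YARN_PID_DIR defaults to /tmp (line 88) and YARN_IDENT_STRING defaults to $USER (line 74). A quick sanity check of what the path expands to on this cluster, running as root:

YARN_PID_DIR=/tmp; YARN_IDENT_STRING=root; command=resourcemanager
echo "$YARN_PID_DIR/yarn-$YARN_IDENT_STRING-$command.pid"
# prints: /tmp/yarn-root-resourcemanager.pid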

2. Cause

From the stop script above, you can see that the stop branch looks for the file yarn-$YARN_IDENT_STRING-$command.pid in the default /tmp directory. Because /tmp is cleaned out periodically, the .pid file disappears; the script then cannot find the pid of the running daemon, never actually kills the Hadoop processes, and shutting down the cluster fails with the errors shown above.
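
Note that after such a failed stop the daemons are still running; the scripts simply can no longer find their pids. A quick way to confirm this, assuming the daemons were started as root with the defaults above:

ls /tmp/yarn-*.pid /tmp/hadoop-*.pid   # gone after /tmp cleanup: "No such file or directory"
jps                                    # yet NameNode, DataNode, ResourceManager, ... are still listed

Before restarting the cluster, stop each leftover daemon by hand with kill <pid>, using the pids that jps prints.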

3. Solutions

3.1 Method 1
  • Modify the hadoop-env.sh script
    Change the value of export HADOOP_PID_DIR=${HADOOP_PID_DIR} to some fixed path other than /tmp (the comment above it reads "The directory where pid files are stored. /tmp by default."). After my change it looks like this:
[root@server4 hadoop]# tail -10 hadoop-env.sh 
# NOTE: this should be set to a directory that can only be written to by
# the user that will run the hadoop daemons. Otherwise there is the
# potential for a symlink attack.
#export HADOOP_PID_DIR=${HADOOP_PID_DIR}
#/usr/local/hadoop-2.6.4/pids
export HADOOP_PID_DIR=/usr/local/hadoop-2.6.4/pids
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER

However, changing hadoop-env.sh alone is far from enough. If we start the Hadoop cluster now and look inside /usr/local/hadoop-2.6.4/pids, this is what we see:

[root@server4 pids]# ll
total 12
-rw-r--r--. 1 root root 6 Oct 23 16:53 hadoop-root-datanode.pid
-rw-r--r--. 1 root root 6 Oct 23 16:53 hadoop-root-namenode.pid
-rw-r--r--. 1 root root 6 Oct 23 16:54 hadoop-root-secondarynamenode.pid
[root@server4 pids]# pwd
/usr/local/hadoop-2.6.4/pids
[root@server4 pids]#

Only the Hadoop (HDFS) pid files are here; there is nothing for YARN. Since we have not specified a storage path for the YARN pid files, they are still written to /tmp after the cluster starts:

[root@server4 pids]# cd /tmp
[root@server4 tmp]# ll
total 8
drwxr-xr-x. 3 root root 19 Oct 20 11:47 hbase-root
drwxr-xr-x. 2 root root 71 Oct 23 16:55 hsperfdata_root
drwxr-xr-x. 4 root root 32 Oct 23 16:53 Jetty_0_0_0_0_50070_hdfs____w2cu08
drwxr-xr-x. 4 root root 32 Oct 23 16:54 Jetty_0_0_0_0_50075_datanode____hwtdwq
drwxr-xr-x. 4 root root 32 Oct 23 16:54 Jetty_0_0_0_0_50090_secondary____y6aanv
drwxr-xr-x. 5 root root 46 Oct 23 16:55 Jetty_0_0_0_0_8042_node____19tj0x
drwxr-xr-x. 5 root root 46 Oct 23 16:55 Jetty_server4_8088_cluster____y51xml
drwx------. 3 root root 17 Oct 5 11:03 systemd-private-10dd88eabf284681a53d4e9aa58ca6ca-chronyd.service-9PTHJi
drwx------. 3 root root 17 Oct 5 11:03 systemd-private-10dd88eabf284681a53d4e9aa58ca6ca-cups.service-tbyfMo
drwx------. 3 root root 17 Oct 8 20:55 systemd-private-10dd88eabf284681a53d4e9aa58ca6ca-httpd.service-tZE9aa
drwx------. 2 root root 6 Oct 15 10:54 vmware-root
-rw-r--r--. 1 root root 6 Oct 23 16:54 yarn-root-nodemanager.pid
-rw-r--r--. 1 root root 6 Oct 23 16:54 yarn-root-resourcemanager.pid

So we also need to set YARN_PID_DIR in yarn-env.sh.

  • Modify yarn-env.sh as follows:
    Append the following line to the end of the file: export YARN_PID_DIR=/usr/local/hadoop-2.6.4/pids
[root@server4 hadoop]# tail -10 yarn-env.sh 
YARN_OPTS="$YARN_OPTS -Dyarn.home.dir=$YARN_COMMON_HOME"
YARN_OPTS="$YARN_OPTS -Dyarn.id.str=$YARN_IDENT_STRING"
YARN_OPTS="$YARN_OPTS -Dhadoop.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
YARN_OPTS="$YARN_OPTS -Dyarn.root.logger=${YARN_ROOT_LOGGER:-INFO,console}"
if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
YARN_OPTS="$YARN_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
fi
YARN_OPTS="$YARN_OPTS -Dyarn.policy.file=$YARN_POLICYFILE"

export YARN_PID_DIR=/usr/local/hadoop-2.6.4/pids
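
In a fully distributed cluster these edits must reach every node, otherwise the DataNodes and NodeManagers on the other hosts will keep writing their pid files to /tmp. One way to propagate both files (hostnames from this cluster; etc/hadoop is the default config directory in Hadoop 2.6.4, adjust if yours differs):

for h in server5 server6; do
  scp /usr/local/hadoop-2.6.4/etc/hadoop/{hadoop-env.sh,yarn-env.sh} \
      root@$h:/usr/local/hadoop-2.6.4/etc/hadoop/
done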

Run start-yarn.sh and check the corresponding directories:

[root@server4 shells]# cd /usr/local/hadoop-2.6.4/pids/
[root@server4 pids]# ll
total 8
-rw-r--r--. 1 root root 6 Oct 23 17:16 yarn-root-nodemanager.pid
-rw-r--r--. 1 root root 6 Oct 23 17:16 yarn-root-resourcemanager.pid
[root@server4 pids]# cd /tmp
[root@server4 tmp]# ll
total 0
drwxr-xr-x. 3 root root 19 Oct 20 11:47 hbase-root
drwxr-xr-x. 2 root root 32 Oct 23 17:16 hsperfdata_root
drwxr-xr-x. 4 root root 32 Oct 23 16:53 Jetty_0_0_0_0_50070_hdfs____w2cu08
drwxr-xr-x. 4 root root 32 Oct 23 16:54 Jetty_0_0_0_0_50075_datanode____hwtdwq
drwxr-xr-x. 4 root root 32 Oct 23 16:54 Jetty_0_0_0_0_50090_secondary____y6aanv
drwxr-xr-x. 5 root root 46 Oct 23 17:16 Jetty_0_0_0_0_8042_node____19tj0x
drwxr-xr-x. 5 root root 46 Oct 23 17:16 Jetty_server4_8088_cluster____y51xml
drwx------. 3 root root 17 Oct 5 11:03 systemd-private-10dd88eabf284681a53d4e9aa58ca6ca-chronyd.service-9PTHJi
drwx------. 3 root root 17 Oct 5 11:03 systemd-private-10dd88eabf284681a53d4e9aa58ca6ca-cups.service-tbyfMo
drwx------. 3 root root 17 Oct 8 20:55 systemd-private-10dd88eabf284681a53d4e9aa58ca6ca-httpd.service-tZE9aa
drwx------. 2 root root 6 Oct 15 10:54 vmware-root
[root@server4 tmp]#
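
With the pid files now stored in a directory that survives /tmp cleanup, the stop scripts can find them again and should report real shutdowns rather than "no ... to stop":

./stop-yarn.sh   # should now print "stopping resourcemanager", "stopping nodemanager", ...
./stop-dfs.sh    # should now print "stopping namenode", "stopping datanode", ...
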
3.2 Method 2

Modify the hadoop-daemon.sh and yarn-daemon.sh files so that the stop branch obtains the pid directly from jps, bypassing the pid file entirely.
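
The sketch below shows what the (stop) branch of yarn-daemon.sh could look like under this approach. It is a minimal illustration, not the stock script: it assumes the JDK's jps is on the PATH and that at most one daemon of each type runs per host. jps prints "PID ClassName", and the class names (ResourceManager, NodeManager) match $command when compared case-insensitively:

  (stop)

    # Look the process up via jps instead of reading $pid from the file.
    TARGET_PID=$(jps | awk -v cmd="$command" 'tolower($2) == cmd { print $1 }')
    if [ -n "$TARGET_PID" ]; then
      echo stopping $command
      kill $TARGET_PID
      sleep $YARN_STOP_TIMEOUT
      if kill -0 $TARGET_PID > /dev/null 2>&1; then
        echo "$command did not stop gracefully after $YARN_STOP_TIMEOUT seconds: killing with kill -9"
        kill -9 $TARGET_PID
      fi
    else
      echo no $command to stop
    fi
    rm -f $pid   # clean up any stale pid file as well
    ;;

hadoop-daemon.sh can be patched the same way; its commands (namenode, datanode, secondarynamenode) also match their jps class names case-insensitively.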

The same problem is of course not limited to HDFS and YARN; HBase is affected in exactly the same way. I will not repeat the details here.
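
For HBase the analogous setting is HBASE_PID_DIR in hbase-env.sh, whose pid files also land in /tmp by default. Pointing it at a persistent directory works the same way, for example (the path below simply reuses this cluster's pids directory; any stable location will do):

export HBASE_PID_DIR=/usr/local/hadoop-2.6.4/pids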
