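Overview:
YARN supports two container implementations: the YARN container (DefaultContainerExecutor) and the Linux container (LinuxContainerExecutor). Reportedly there is also a Docker container (Docker does not really work well with CM, though it is fine with Apache Hadoop). Compared with the YARN container, the Linux container offers better scalability and isolation. This post covers configuring the Linux container.
I happened to get the chance to test the Hadoop YARN Linux Container Executor configuration myself. To be honest I had never worked with containers before, so although the configuration succeeded, I did not really understand what it was for. I therefore searched Baidu and found several articles by more experienced people, which I am referencing here. My apologies if this causes any offense!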
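1. An introduction to the different containers a NodeManager can launch: (AlstonWilliams) https://www.jianshu.com/p/e79b6a10dc85
In that article I noticed Cgroup being mentioned, got curious about what Cgroup is, kept searching, and found this:
2. An introduction to Cgroup: (刘光华_zhou)
From this I learned that Cgroup, roughly speaking, limits the resources (such as CPU) an application may use once it is started. That led me to another article:
3. Upgrading Hadoop to use Cgroup: (哪天改改)
It is well written, I genuinely understood it, and it also covers fixes for quite a few errors!
4. A walkthrough of the Hadoop Cgroup source code: (刘光华_zhou)
I then read through the source code walkthrough and took some notes.
After that I started my own testing.
I call it testing, but I did not do all that much; this is mainly a record of the errors I ran into (my own skills are limited, after all).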
Error 1: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /opt/modules/hadoop-2.5.0/bin/container-executor)
2018-03-14 12:39:23,852 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:192)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:425)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:472)
Caused by: java.io.IOException: Linux container executor not configured properly (error=1)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:175)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:190)
... 3 more
Caused by: ExitCodeException exitCode=1: /opt/modules/hadoop-2.5.0/bin/container-executor: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /opt/modules/hadoop-2.5.0/bin/container-executor)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:169)
... 4 more
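The fixes suggested online vary, but there are roughly two:
The first is to recompile (I have not tried this):
Get the Hadoop source code. Under $HADOOP_SRC/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager there is a pom.xml file and a src directory. Build it directly with Maven by running: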
mvn package -Pdist,native -DskipTests -Dtar -Dcontainer-executor.conf.dir=$HADOOP_HOME/etc/hadoop
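The newly built container-executor can be found under target/native/usr/local/; use it to replace the existing binary.
After replacing it, run $HADOOP_HOME/bin/container-executor --checksetup. If it prints no error message, the problem is most likely solved.
The second is to make the machine support GLIBC_2.14 (this is the approach I used):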
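Before choosing between the two approaches, it can help to confirm which glibc version the machine actually has. The following is a minimal sketch, assuming a glibc-based Linux with GNU coreutils (`sort -V`) and `getconf` available; the function names `version_ge` and `installed_glibc_version` are hypothetical helpers invented for this illustration, not part of Hadoop.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Uses GNU sort's -V (version sort), so 2.5 is correctly treated as older than 2.14.
version_ge() (
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
)

# Print the glibc version of this machine.
# On glibc systems getconf prints a line such as "glibc 2.12".
installed_glibc_version() (
  getconf GNU_LIBC_VERSION | awk '{print $2}'
)

# Usage sketch (2.14 is the version named in the error message):
#   if version_ge "$(installed_glibc_version)" 2.14; then
#     echo "system glibc is new enough"
#   else
#     echo "system glibc is older than 2.14"
#   fi
```

A plain numeric comparison would get this wrong, since 2.5 is larger than 2.14 as a decimal number, which is why the sketch relies on version sort.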
Error 2: /opt/modules/hadoop-2.5.0/etc/hadoop must be owned by root, but is owned by 500
2018-03-14 13:47:37,961 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:192)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:425)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:472)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:175)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:190)
... 3 more
Caused by: ExitCodeException exitCode=24: File /opt/modules/hadoop-2.5.0/etc/hadoop must be owned by root, but is owned by 500
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:169)
... 4 more
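Cause: this concerns ownership of the ${HADOOP_HOME}/etc/hadoop/container-executor.cfg configuration file. The file and its parent directory must be owned by root.
Fix: chown root:root ${HADOOP_HOME}/etc/ (at first I only changed the ownership of etc to root:root)
Then I found it reported an error like this: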
Caused by: ExitCodeException exitCode=24: File /opt/modules/hadoop-2.5.0 must be owned by root, but is owned by 500
or like this:
ExitCodeException exitCode=24: File /opt/modules must be owned by root, but is owned by 500
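Then it clicked: every parent directory of the container-executor.cfg configuration file has to be owned by root. In other words, opt, modules, hadoop-2.5.0, etc, and hadoop all need to be changed to root:root (note: nothing else needs to be changed, so the chown does not have to be recursive).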
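Rather than fixing one directory per NodeManager restart, the whole chain can be inspected in one pass. Here is a rough sketch, assuming GNU `stat` and `readlink`; `owner_chain_violations` is a hypothetical helper name, and the walk only mirrors the behaviour seen in the error messages above, not the actual container-executor source.

```shell
# Print every path from $1 up to / whose owner uid differs from $2 (default 0, i.e. root).
# Prints nothing when the whole chain has the expected owner.
owner_chain_violations() (
  path=$(readlink -f "$1") || exit 2
  expected_uid=${2:-0}
  while :; do
    if [ "$(stat -c '%u' "$path")" != "$expected_uid" ]; then
      printf '%s\n' "$path"
    fi
    if [ "$path" = "/" ]; then
      break
    fi
    path=$(dirname "$path")
  done
)

# Usage sketch with the path from this post:
#   owner_chain_violations /opt/modules/hadoop-2.5.0/etc/hadoop/container-executor.cfg
# Each printed path could then be fixed with: sudo chown root:root <path>
```

Only ownership is checked here; permission bits on those directories are outside the scope of this sketch.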
Error 3: Caused by: ExitCodeException exitCode=22: Invalid permissions on container-executor binary.
2018-03-14 14:36:21,416 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:192)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:425)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:472)
Caused by: java.io.IOException: Linux container executor not configured properly (error=22)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:175)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:190)
... 3 more
Caused by: ExitCodeException exitCode=22: Invalid permissions on container-executor binary.
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:169)
... 4 more
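Cause: the permissions are wrong on a file that is being executed.
Fix: specifically, the container-executor binary under ${HADOOP_HOME}/bin has the wrong permissions.
Run: sudo chmod 6050 container-executor
At this point another error may appear: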
2018-03-15 11:43:09,610 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:192)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:425)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:472)
Caused by: java.io.IOException: Cannot run program "/opt/modules/hadoop-2.5.0/bin/container-executor": error=13, Permission denied
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:485)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:169)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:190)
... 3 more
Caused by: java.io.IOException: error=13, Permission denied
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
... 8 more
Here is what is behind this error.
The etc/hadoop/container-executor.cfg configuration file contains:
yarn.nodemanager.linux-container-executor.group=hadoop
yarn.nodemanager.local-dirs=/opt/modules/hadoop-2.5.0/yarn/local
yarn.nodemanager.log-dirs=/opt/modules/hadoop-2.5.0/yarn/log
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
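Mode 6050 gives execute permission only to the file's group, so the group of the binary must match yarn.nodemanager.linux-container-executor.group (hadoop here), and the user running the NodeManager must belong to that group. The ownership and permissions of container-executor under ${HADOOP_HOME}/bin should therefore be set as follows: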
sudo chown root:hadoop container-executor
sudo chmod 6050 container-executor
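To confirm the result of the two commands above without restarting the NodeManager, the owner, group, and mode can be compared in one step. This is a small sketch, assuming GNU `stat`; `check_container_executor` is a hypothetical helper name, and the group argument is whatever container-executor.cfg names (hadoop in this post).

```shell
# Compare a file's owner, group and octal mode against root:<group> 6050.
# $1 = path to the container-executor binary, $2 = expected group name.
check_container_executor() (
  file=$1
  group=$2
  actual=$(stat -c '%U:%G %a' "$file") || exit 2
  expected="root:$group 6050"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file is $actual"
  else
    echo "MISMATCH: $file is $actual, expected $expected"
    exit 1
  fi
)

# Usage sketch with the path and group from this post:
#   check_container_executor /opt/modules/hadoop-2.5.0/bin/container-executor hadoop
```

In 6050, the leading 6 sets the setuid and setgid bits, the owner and others get no permissions, and the group gets read and execute.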
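==================== Divider =========================
How Hadoop Cgroup CPU resources are calculated:
Suppose a machine is a NodeManager node with 16 CPU cores, so its total CPU capacity is 1600%. With yarn.nodemanager.resource.percentage-physical-cpu-limit set to 90, the NodeManager may use 16 * 0.9 = 14.4 cores, so the CPU it consumes peaks at 1440%. If, in addition, yarn.nodemanager.resource.cpu-vcores sets the number of virtual CPU cores to 12, then each vcore corresponds to 1440% / 12 = 120%, which is the most CPU a container allocated one vcore on this NodeManager can use.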
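The arithmetic above can be sketched as two small shell functions. The names `nm_cpu_percent` and `per_vcore_cpu_percent` are hypothetical, chosen only for this illustration; all inputs are integers and the division truncates, as shell arithmetic does.

```shell
# Total CPU percentage available to the NodeManager.
# $1 = physical cores, $2 = yarn.nodemanager.resource.percentage-physical-cpu-limit
nm_cpu_percent() (
  cores=$1
  limit=$2
  echo $(( cores * limit ))
)

# CPU percentage that corresponds to a single vcore.
# $3 = yarn.nodemanager.resource.cpu-vcores
per_vcore_cpu_percent() (
  cores=$1
  limit=$2
  vcores=$3
  echo $(( cores * limit / vcores ))
)

# Values from the example in this post:
#   nm_cpu_percent 16 90            # prints 1440
#   per_vcore_cpu_percent 16 90 12  # prints 120
```

With these numbers a one-vcore container is capped at 120%, slightly more than one physical core, because 12 vcores are spread over 14.4 usable cores.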