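(1) Environment: Ubuntu, JDK 1.8, hadoop-2.7.2
(2) Problem: every MapReduce application run on Hadoop hangs at "Running job".
After configuring a pseudo-distributed Hadoop cluster and starting it, I ran a test job (such as the bundled pi example) to check that the configuration works, using the command:
$ hadoop jar myapp.jar data/ncdc/wc data/result
However, the job stalls as soon as it reaches "Running job":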
INFO mapreduce.Job: Running job: job_1403905542893_0004
The ResourceManager web UI shows the application as UNASSIGNED:
Tracking UI - UNASSIGNED
Apps Submitted - 1
Apps Pending - 1
Apps Running - 0
jps output:
4764 Jps
2148 DataNode
3280 ResourceManager
2053 NameNode
3378 NodeManager
2318 SecondaryNameNode
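The same "pending but never running" state can also be read from the ResourceManager REST API rather than the browser. The following is a minimal sketch, assuming the ResourceManager web address is http://localhost:8088 (an assumption, adjust it for your cluster); the function names fetch_cluster_metrics and describe_metrics are hypothetical helpers, not part of Hadoop.

```python
import json
import urllib.request


def fetch_cluster_metrics(rm_url="http://localhost:8088"):
    """Fetch cluster metrics from the ResourceManager REST API.

    rm_url is an assumption; point it at your own ResourceManager.
    """
    url = rm_url + "/ws/v1/cluster/metrics"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def describe_metrics(metrics):
    """Summarize the metrics and flag applications that are pending but not running."""
    pending = metrics.get("appsPending", 0)
    running = metrics.get("appsRunning", 0)
    available_mb = metrics.get("availableMB", 0)
    active_nodes = metrics.get("activeNodes", 0)
    summary = "pending=%d running=%d availableMB=%d activeNodes=%d" % (
        pending, running, available_mb, active_nodes)
    # Applications waiting while nothing runs matches the symptom described above.
    stuck = pending > 0 and running == 0
    return summary, stuck


# Usage against a live cluster:
#   summary, stuck = describe_metrics(fetch_cluster_metrics())
#   print(summary, "(stuck)" if stuck else "")
```

If activeNodes is 0, the NodeManager has not registered with the ResourceManager; if availableMB is too small for the ApplicationMaster container, the application stays pending.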
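(3) Possible solutions:
After searching online, I found two commonly suggested causes. The first is that /etc/hosts contains unrelated host entries; the fix is to edit /etc/hosts and delete them. The second is that the cluster does not have enough resources to allocate to a new job; the fix is to tune the scheduler resource parameters in yarn-site.xml.
As for the first, my hosts file only lists the local host, so it is not a hosts problem. As for the second, I had hit a similar hang before on an Apache Hadoop pseudo-distributed cluster, where the job stalled at map 0% reduce 0%; after adjusting the yarn-site.xml parameters it ran cleanly. The original yarn-site.xml settings were: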
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
Settings after tuning:
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>3072</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>256</value>
</property>
For a detailed description of the YARN configuration parameters, see: http://dongxicheng.org/mapreduce-nextgen/hadoop-yarn-configurations-resourcemanager-nodemanager/
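The effect of the memory settings can be approximated with some arithmetic. The sketch below is a simplified model, not YARN's actual scheduler code: it assumes each request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb, and it assumes the Apache Hadoop 2.7 defaults of 1536 MB for the MapReduce ApplicationMaster, 1024 MB per map task and 1024 MB for the minimum allocation. The function names are hypothetical, and a distribution such as CDH may ship different defaults.

```python
def normalize_mb(requested_mb, min_alloc_mb, max_alloc_mb=8192):
    """Round a request up to a multiple of the minimum allocation, capped at the maximum."""
    multiples = (requested_mb + min_alloc_mb - 1) // min_alloc_mb  # integer ceiling
    rounded = max(multiples, 1) * min_alloc_mb
    return min(rounded, max_alloc_mb)


def containers_that_fit(node_mb, am_mb, task_mb, min_alloc_mb):
    """Return (am_fits, task_slots) for a single NodeManager.

    am_fits: whether the ApplicationMaster container fits on the node.
    task_slots: how many task containers fit next to the ApplicationMaster.
    """
    am = normalize_mb(am_mb, min_alloc_mb)
    task = normalize_mb(task_mb, min_alloc_mb)
    if am > node_mb:
        return False, 0
    return True, (node_mb - am) // task


# First configuration: 2048 MB per node, minimum allocation left at the assumed default.
print(containers_that_fit(2048, 1536, 1024, 1024))  # (True, 0)

# Tuned configuration: 3072 MB per node, minimum allocation 256 MB.
print(containers_that_fit(3072, 1536, 1024, 256))   # (True, 1)
```

Under these assumptions, the first configuration leaves room for the ApplicationMaster but for no task container, which would be consistent with a hang at map 0% reduce 0%, while the tuned one leaves room for one task at a time.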
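However, the same kind of settings on CDH's Hadoop caused jobs to hang at "Running job".
(4) Final solution:
Delete the resource-tuning settings from yarn-site.xml, so that YARN falls back to its default values for them.
Original yarn-site.xml: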
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2560</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>256</value>
</property>
</configuration>
yarn-site.xml after the change:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
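The edit above can also be scripted. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the helper name strip_properties and the RESOURCE_KEYS set are hypothetical, chosen to match the three properties removed here. Note that ElementTree drops XML comments and the XML declaration when re-serializing, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET

# The three resource-tuning properties removed in this article.
RESOURCE_KEYS = {
    "yarn.nodemanager.resource.memory-mb",
    "yarn.nodemanager.resource.cpu-vcores",
    "yarn.scheduler.minimum-allocation-mb",
}


def strip_properties(xml_text, names=RESOURCE_KEYS):
    """Return (new_xml_text, removed_names) with the named <property> blocks removed."""
    root = ET.fromstring(xml_text)
    removed = []
    # Iterate over a copy of the list because we remove children while looping.
    for prop in list(root.findall("property")):
        name = (prop.findtext("name") or "").strip()
        if name in names:
            root.remove(prop)
            removed.append(name)
    return ET.tostring(root, encoding="unicode"), removed


# Usage (the path is an assumption; use your own Hadoop configuration directory):
#   with open("etc/hadoop/yarn-site.xml") as f:
#       new_text, removed = strip_properties(f.read())
#   print("removed:", removed)
```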
Restart the ResourceManager and NodeManager daemons and run the pi example again; the job now completes successfully.
Reference: http://stackoverflow.com/questions/24481439/cant-run-a-mapreduce-job-on-hadoop-2-4-0