2019-03-10
This post documents my investigation into, and fix for, a problem I ran into where Hive could not execute MapReduce jobs under one particular setup. I make no guarantee that everything here is completely and absolutely correct.
Solution
The directory configured as hive.exec.scratchdir in Hive's hive-site.xml must point to a location on HDFS, not on a local filesystem.
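For concreteness, a minimal sketch of the relevant hive-site.xml entry. The hdfs:// URI is illustrative: m254 is the master host seen later in the logs, but the NameNode port (9000 here) and the /tmp/hive path are assumptions you should adapt to your cluster:

<property>
  <name>hive.exec.scratchdir</name>
  <!-- must resolve to HDFS; an explicit hdfs:// URI leaves no ambiguity
       about which filesystem a scheme-less path would resolve against -->
  <value>hdfs://m254:9000/tmp/hive</value>
</property>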
Symptoms
While running MapReduce jobs through Hive, I suddenly found that every job failed to execute. The console where I issued the HQL command printed only a few short lines:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = chorm_20190310001344_e4ed74d8-4048-4918-aa6f-d3a1a2a60698
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1552147755103_0003, Tracking URL = http://m254:8088/proxy/application_1552147755103_0003/
Kill Command = /usr/bigdata/hadoop/bin/hadoop job -kill job_1552147755103_0003
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2019-03-10 00:13:47,528 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1552147755103_0003 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
At first this was baffling: everything had been working fine, and the failures appeared out of nowhere. Searching the web turned up no solution either. Fortunately, after some digging around, I found the root cause.
Investigation
First, it was clear that the console output alone could not pinpoint the cause or support any hypothesis about a fix, so I had to look elsewhere.
1. Check that Hadoop and YARN are working properly.
Hadoop is easy to check: the web UI and the CLI are enough. In my case, Hadoop was fine.
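For reference, two standard CLI checks of this kind (sketches of what I mean, not a transcript of my session):

hdfs dfsadmin -report   # live/dead DataNodes, capacity
hdfs fsck /             # block-level health of the filesystem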
Next came YARN, again via its web UI. Also remember to check that the corresponding daemons are actually running on every machine in the cluster. These checked out fine as well.
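A command-line sketch of the same checks (jps has to be run on each host):

yarn node -list -all    # NodeManagers registered with the ResourceManager
jps                     # per-host: are ResourceManager / NodeManager running?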
Finally, MapReduce itself: I checked it by running the wordcount example from the examples jar that ships with Hadoop (a sketch of the invocation follows). That also passed.
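The invocation was along these lines; the jar version wildcard and the HDFS input/output paths are placeholders (the output directory must not exist yet):

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  wordcount /tmp/wc-input /tmp/wc-output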
After these three checks, Hadoop was ruled out as the cause.
2. Check Hive
Honestly, this step turned up nothing, and I wasn't sure how to test Hive in isolation anyway; one belated idea is sketched below.
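In hindsight, one cheap sanity check would have helped (my suggestion, not part of the original session): with default settings a plain SELECT ... LIMIT usually runs as a local fetch task without submitting a job, while an aggregate forces a MapReduce job, so comparing the two separates "Hive itself is broken" from "job execution is broken". The table name is a placeholder:

-- usually a fetch task, no MapReduce job (default hive.fetch.task.conversion)
SELECT * FROM some_table LIMIT 5;
-- forces a MapReduce job and would reproduce the failure above
SELECT COUNT(*) FROM some_table;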
3. Look at the job's logs in YARN
Honestly, I should have gone straight to the logs the moment the problem appeared, but somehow I didn't think of it.
Open the YARN web UI at http://yarn-host:8088, find the failed application, and click into it. The following error information shows up:
Diagnostics:
Application application_1552147755103_0003 failed 2 times due to AM Container for appattempt_1552147755103_0003_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://m254:8088/cluster/app/application_1552147755103_0003Then, click on links to logs of each attempt.
Diagnostics: File file:/var/bigdata/hive/scratchdir/chorm/46b600b8-9250-48c8-8284-a2f6b649bcae/hive_2019-03-10_00-13-44_741_6711852286526745896-1/-mr-10005/cd1fe621-e494-4ddd-b8f8-a9c80e052c6c/reduce.xml does not exist
java.io.FileNotFoundException: File file:/var/bigdata/hive/scratchdir/chorm/46b600b8-9250-48c8-8284-a2f6b649bcae/hive_2019-03-10_00-13-44_741_6711852286526745896-1/-mr-10005/cd1fe621-e494-4ddd-b8f8-a9c80e052c6c/reduce.xml does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
    at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1758)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:359)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Failing this attempt. Failing the application.
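Incidentally, the same diagnostics are reachable from the command line as well; both commands below are standard YARN tooling (yarn logs requires log aggregation to be enabled, and for a -1000 exit code the container logs may be empty since the AM never started):

yarn application -status application_1552147755103_0003   # prints the Diagnostics field
yarn logs -applicationId application_1552147755103_0003   # aggregated container logs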
This diagnostics output contains the crucial clue: reduce.xml does not exist. Just as telling is the file: scheme in front of the path. reduce.xml is part of the query plan that Hive writes into its scratch directory; with hive.exec.scratchdir pointing at the local filesystem, the plan file exists only on the machine where the Hive client wrote it, so when YARN tries to localize it for a container (the FSDownload.copy call in the stack trace), a container running on any other node simply cannot find the file.
After pointing the directory referenced by hive.exec.scratchdir back to a location on HDFS, Hive's MapReduce jobs ran normally again.
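If the HDFS scratch directory has to be (re)created first, a sketch assuming the conventional /tmp/hive location; Hive needs the directory to be writable by every user that runs queries (the 1777 sticky-bit mode is a common choice, not something the fix strictly requires):

hdfs dfs -mkdir -p /tmp/hive
hdfs dfs -chmod 1777 /tmp/hive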