

1. Problem Description


A Hive query that launches a MapReduce job fails to run. The log is as follows:

0: jdbc:hive2://localhost:10000> select count(*) from student;
command(queryId=hive_20170902081616_d676f921-c62c-4fac-84b9-272663a2fca0); Time taken: 10.029 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)
0: jdbc:hive2://localhost:10000>


A plain MapReduce job also fails to run. The log is as follows:

[root@ip-172-31-6-148 hadoop-mapreduce]# hadoop jar hadoop-mapreduce-examples.jar pi 5 5
...
Diagnostics: Exception from container-launch.
Container id: container_1504338960864_0005_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
        at org.apache.hadoop.util.Shell.run(Shell.java:504)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
17/09/02 08:19:36 INFO mapreduce.Job: Counters: 0
Job Finished in 8.452 seconds
java.io.FileNotFoundException: File does not exist: hdfs://ip-172-31-6-148:8020/user/root/QuasiMonteCarlo_1504340365604_1994724640/out/reduce-out
        at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1266)
        at org.apache.hadoop.hdfs.DistributedFileSystem$20.doCall(DistributedFileSystem.java:1258)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1258)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1820)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1844)
        at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
        at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
[root@ip-172-31-6-148 hadoop-mapreduce]#


The job's logs also cannot be viewed through the JobHistory web page.


2. Problem Analysis


1. Check the YARN ResourceManager log. The Container cannot be launched, and the exception is as follows:

Exit code: 1
Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
        at org.apache.hadoop.util.Shell.run(Shell.java:504)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
Container id: container_1504341269835_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
        at org.apache.hadoop.util.Shell.run(Shell.java:504)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)



2. Check the NodeManager log on the node. The exception is as follows:

2017-09-02 08:37:35,317 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1504341269835_0001_01_000001 and exit code: 1
ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:601)
        at org.apache.hadoop.util.Shell.run(Shell.java:504)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
2017-09-02 08:37:35,326 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Exception from container-launch.
2017-09-02 08:37:35,326 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: Container id: container_1504341269835_0001_01_000001
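
The NodeManager entry identifies only the container, while the other logs refer to the application or job number. As a minimal sketch, the helper below derives the application ID from a container ID. The function name `app_id_from_container` is hypothetical, and the ID layout in the comment is an assumption based on the IDs seen in these logs (plus the optional epoch prefix some YARN versions add).

```shell
# Derive the YARN application ID from a container ID.
# Assumed layout:
#   container_[e<epoch>_]<clusterTimestamp>_<appNumber>_<attempt>_<containerNumber>
app_id_from_container() {
  printf '%s\n' "$1" \
    | sed -E 's/^container_(e[0-9]+_)?([0-9]+)_([0-9]+)_.*$/application_\2_\3/'
}

# Example call using the container ID from the NodeManager log above.
app_id_from_container container_1504341269835_0001_01_000001
```

The result can be passed to `yarn logs -applicationId <id>`, which prints aggregated container logs if log aggregation is enabled on the cluster. Whether any logs were aggregated for a container that failed at launch depends on the cluster setup.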



3. Check the JobHistory service log:

2017-09-02 08:40:31,676 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Starting scan to move intermediate done files
2017-09-02 08:40:32,880 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:PROXY) via mapred (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /user/root/.staging/job_1504341269835_0001/job_1504341269835_0001.summary
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2037)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2007)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1920)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
2017-09-02 08:40:32,882 WARN org.apache.hadoop.mapreduce.v2.hs.KilledHistoryService: Could not process job files
java.io.FileNotFoundException: File does not exist: /user/root/.staging/job_1504341269835_0001/job_1504341269835_0001.summary
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2037)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2007)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1920)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:572)
        at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:89)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)



4. Check the HDFS NameNode log. The exception is as follows:

2017-09-02 08:37:29,445 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /user/root/.staging/job_1504341269835_0001/job.xml is closed by DFSClient_NONMAPREDUCE_478129775_1
2017-09-02 08:37:29,451 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 172.31.10.118:50010 is added to blk_1073744484_3660 size 106954
2017-09-02 08:37:35,265 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:35,265 INFO org.apache.hadoop.ipc.Server: IPC Server handler 29 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 172.31.5.190:46293 Call#5 Retry#0: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:40,188 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:40,188 INFO org.apache.hadoop.ipc.Server: IPC Server handler 17 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 172.31.10.118:49343 Call#5 Retry#0: org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=EXECUTE, inode="/user/history":mapred:supergroup:drwxrwx---
2017-09-02 08:37:41,200 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/fail/root_appattempt_1504341269835_0001_000002 is closed by DFSClient_NONMAPREDUCE_-860670620_215
2017-09-02 08:37:41,276 INFO BlockStateChange: BLOCK* addToInvalidates: blk_1073744476_3652 172.31.10.118:50010 172.31.9.33:50010 172.31.5.190:50010
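
The relevant entries are easy to miss among the routine INFO lines. The following is a minimal sketch of a filter that pulls only the permission failures out of a NameNode log and prints one summary line per distinct denial. The function name `extract_denials` is hypothetical, and the pattern assumes the `Permission denied: user=..., access=..., inode="...":owner:group:mode` layout seen in the excerpt above.

```shell
# Read NameNode log lines on stdin and print, for each distinct denial:
#   <user> <access> <path> <owner> <group> <mode>
# Lines that do not contain "Permission denied:" are ignored.
extract_denials() {
  grep 'Permission denied:' \
    | sed -E 's/.*Permission denied: user=([^,]+), access=([A-Z_]+), inode="([^"]+)":([^:]+):([^:]+):([-drwxtT]{10}).*/\1 \2 \3 \4 \5 \6/' \
    | sort -u
}
```

It would be run as `extract_denials < namenode.log`, where `namenode.log` is a placeholder for the actual NameNode log file. For the excerpt above, all four denial lines collapse into a single summary: user root, access EXECUTE, path /user/history, owner mapred, group supergroup, mode drwxrwx---.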



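Analysis process:

  1. The ResourceManager log does not reveal the cause.
  2. The NodeManager log does not reveal the cause.
  3. The JobHistory logs cannot be viewed. A MapReduce job first creates its temporary log files under the submitting user's directory (/user/<user>/<job>) and then moves them to the /user/history directory.
  4. The HDFS NameNode log shows that the temporary log files produced by the job cannot be written to /user/history: user root is denied EXECUTE access to the directory, which is owned by mapred:supergroup with mode drwxrwx---.
  5. The root cause is that the permissions on the HDFS /user/history directory are too restrictive, so the YARN job logs cannot be recorded.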
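
To make the denial concrete, here is a small sketch of the owner/group/other check that HDFS applies, reduced to the execute (traverse) bit. The function name `has_execute` is hypothetical. The sketch ignores the HDFS superuser bypass and ACLs, and the group list passed for root in the example (just `root`) is an assumption, since the article does not show root's group membership.

```shell
# has_execute MODE OWNER GROUP USER [USER_GROUP...]
# Prints which permission class applies to USER and whether that class
# has the execute bit set in MODE (e.g. drwxrwx---).
has_execute() {
  mode=$1
  owner=$2
  group=$3
  user=$4
  shift 4

  # Pick the permission class: owner first, then group, otherwise other.
  class=other
  if [ "$user" = "$owner" ]; then
    class=owner
  else
    for g in "$@"; do
      if [ "$g" = "$group" ]; then
        class=group
      fi
    done
  fi

  # The execute bit sits at character 4, 7, or 10 of the mode string.
  case $class in
    owner) bit=$(printf '%s' "$mode" | cut -c4) ;;
    group) bit=$(printf '%s' "$mode" | cut -c7) ;;
    other) bit=$(printf '%s' "$mode" | cut -c10) ;;
  esac

  # "t" in the last position means sticky bit plus execute.
  case $bit in
    x|t) echo "$class: allowed" ;;
    *)   echo "$class: denied" ;;
  esac
}

# The situation from the NameNode log: root is neither mapred nor,
# by assumption, a member of supergroup.
has_execute drwxrwx--- mapred supergroup root root
```

Under these assumptions root falls into the "other" class, which has no bits set in drwxrwx---, so the call reports a denial. The same call with mode drwxrwxrwx reports that access is allowed.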


3. Solution

Change the permissions and ownership of the /user/history directory:

sudo -u hdfs hadoop dfs -chmod 777 /user/history
sudo -u hdfs hadoop dfs -chown mapred:hadoop /user/history
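
The two commands can be wrapped so they are previewed before being run against the cluster. This is only a sketch: the `run` and `fix_history_dir` helpers and the `DRY_RUN` variable are hypothetical names, and it uses `hdfs dfs`, the form that Hadoop 2.x recommends in place of the deprecated `hadoop dfs`. The mode is a parameter so that a stricter value such as 1777 (world-writable with the sticky bit) can be tried; that alternative is a suggestion and is not part of the original fix.

```shell
# Print each command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# fix_history_dir [DIR] [MODE]
# Applies the permission and ownership change, then lists the directory
# entry itself so the new mode and owner can be checked.
fix_history_dir() {
  dir=${1:-/user/history}
  mode=${2:-777}
  run sudo -u hdfs hdfs dfs -chmod "$mode" "$dir"
  run sudo -u hdfs hdfs dfs -chown mapred:hadoop "$dir"
  run sudo -u hdfs hdfs dfs -ls -d "$dir"
}
```

Calling `fix_history_dir` prints the three commands without touching HDFS. `DRY_RUN=0 fix_history_dir` executes them, and `fix_history_dir /user/history 1777` previews the sticky-bit variant.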

Before the change, /user/history was owned by mapred:supergroup with mode drwxrwx---, as the NameNode log shows.

After the change, data is written normally and MapReduce jobs run successfully.








Original article. Reposting is welcome; please credit the WeChat public account Hadoop实操.