File downloads
- WordCount.java (extraction code: 2kwo)
- log4j.properties (extraction code: tpz9)
- data.txt (extraction code: zefp)
Steps
Note: complete every step of connecting Eclipse to the Hadoop cluster before carrying out the operations below.
- Open Eclipse and click "File" → "New" → "Map/Reduce Project", then click "Next".
- In the dialog that appears, enter a project name, choose the project location, and click "Finish".
- In the MapReduce project's src directory, create a new package named cn.neu and click "Finish".
- Copy the downloaded WordCount.java file into the cn.neu package (dragging and dropping it works).
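For orientation, the computation that the WordCount job performs can be sketched in plain Java. This is only the counting logic, not the downloaded MapReduce code itself; the class and method names below are illustrative:

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Count whitespace-separated words, mirroring what the MapReduce job
    // computes: each mapper emits (word, 1) and the reducer sums the ones.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints each distinct word with its count, sorted by key,
        // just like the lines in the job's part-r-00000 output file.
        System.out.println(count("hello hadoop hello eclipse"));
    }
}
```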
- Use a file-transfer tool such as Xftp to copy core-site.xml and hdfs-site.xml from the remote Hadoop cluster's configuration directory (hadoop/hadoop-2.6.0/etc/hadoop under the installation directory) to the local machine.
Copy these two XML files, together with the downloaded log4j.properties file, into src.
Note: if you are unsure how these XML files should be configured, see the guide "Installing and Deploying a Hadoop Cluster on Multiple Linux Virtual Machines (Detailed Edition)".
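The exact values depend on your cluster, so copy the real files from the cluster rather than writing them by hand. As a sketch, the essential property in core-site.xml tells clients where the NameNode is; the hostname master and port 9000 below are placeholders:

```xml
<!-- core-site.xml (sketch): fs.defaultFS must match your cluster's NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```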
If the two XML files are missing, Hadoop falls back to its default local filesystem (note the file:/ scheme in the path below) and errors like the following occur:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/G:/hadoop-2.6.0/share/hadoop/common/lib/hadoop-auth-2.6.0.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/test/input/data.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at cn.neu.WordCount.main(WordCount.java:60)
- Right-click the HDFS root directory and click "Create new directory".
- Enter test and click "OK".
- Right-click inside the Project Explorer pane and click "Refresh"; the new directory now appears.
- Right-click the test folder and create a directory named input under it; refresh to see it as shown below.
- Right-click the input directory and choose "Upload files to DFS" (HDFS was formerly also called DFS).
- Select the downloaded data.txt file and click "Open", then refresh Project Explorer again; the file appears as shown below.
- The WordCount.java code reads two argument values (input and output paths), so they must be configured:
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
Right-click in the code editor and click "Run As" → "Run Configurations".
Click the "Arguments" tab, enter the path of the data.txt file set up in the previous step and the program's output path, click "Apply", then click "Run" to start the program.
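For illustration, assuming the cluster's NameNode is at master:9000 (a placeholder — use your own cluster's address), the two arguments entered in the arguments box, one per line and in this order, could look like:

```
hdfs://master:9000/test/input/data.txt
hdfs://master:9000/test/output
```

Because core-site.xml is on the project's classpath, unqualified paths such as /test/input/data.txt and /test/output also resolve against HDFS.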
Note: do not create an output directory under test before the program runs. The output directory must not already exist, or a "directory already exists" error is thrown.
- The following error may be reported (if it is not, skip this step):
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at cn.neu.WordCount.main(WordCount.java:45)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
at java.base/java.lang.String.checkBoundsBeginEnd(Unknown Source)
at java.base/java.lang.String.substring(Unknown Source)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:49)
... 5 more
Click the (Shell.java:49) link in the stack trace; on the page that opens, click "Attach Source".
In the dialog, click "External location" → "External file", navigate to the sources folder under the path shown above, open it, select hadoop-common-2.6.0-sources.jar, click "Open", and finally click "OK".
Click (Shell.java:49) again to view the source; line 49 reads:
private static boolean IS_JAVA7_OR_ABOVE =
    System.getProperty("java.version").substring(0, 3).compareTo("1.7") >= 0;
Combined with the following error information:
at java.base/java.lang.String.checkBoundsBeginEnd(Unknown Source)
at java.base/java.lang.String.substring(Unknown Source)
the problem is that on newer JDKs the java.version string (e.g. "9" or "11") is shorter than three characters, so substring(0, 3) fails. The fix is to add the following line at the start of the main method: System.setProperty("java.version", "1.8"); — any version string that compares greater than or equal to "1.7" works.
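The failure can be reproduced without Hadoop at all. A minimal sketch of what the static initializer in Shell.java does, and why it breaks on JDK 9+ version strings (class and method names here are illustrative):

```java
public class VersionStringDemo {
    // Hadoop 2.6.0's Shell class evaluates this expression at class-load time:
    //   System.getProperty("java.version").substring(0, 3).compareTo("1.7") >= 0
    // Pre-9 JDKs report versions like "1.8.0_201", but JDK 9+ may report
    // just "9", which is shorter than 3 characters, so substring(0, 3) throws.
    static boolean isJava7OrAbove(String version) {
        return version.substring(0, 3).compareTo("1.7") >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isJava7OrAbove("1.8.0_201")); // a pre-9 version string
        try {
            isJava7OrAbove("9"); // the JDK 9 case
        } catch (StringIndexOutOfBoundsException e) {
            // This is the exception seen in the stack trace above.
            System.out.println("version string too short for substring(0, 3)");
        }
    }
}
```

This is why overriding java.version with any string of at least three characters that compares greater than or equal to "1.7" (such as "1.8") avoids the crash.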
- If the program runs normally, wait for it to finish, then right-click the test directory under Hadoop in Project Explorer and click "Refresh"; the output directory appears inside it.
Double-click the part-r-00000 file to view the program's results.
- To run the job again, either change the output directory in the argument configuration or delete the files under the output path.
A once-and-for-all alternative is a small change to the main method: before each run, check whether the output path exists and delete it if it does.
Before:
System.setProperty("HADOOP_USER_NAME", "root");
System.setProperty("java.version", "1.8");
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: WordCount <in> <out>");
    System.exit(2);
}
After:
System.setProperty("HADOOP_USER_NAME", "root");
System.setProperty("java.version", "1.8");
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: WordCount <in> <out>");
    System.exit(2);
}
// Delete the output path if it already exists, so re-runs do not fail.
Path outPath = new Path(otherArgs[1]);
if (fs.exists(outPath)) {
    fs.delete(outPath, true); // true = delete recursively
}