File downloads
- WordCount.java (extraction code: 2kwo)
- log4j.properties (extraction code: tpz9)
- data.txt (extraction code: zefp)
Steps
Note: complete every step of connecting Eclipse to the Hadoop cluster before carrying out the operations below.
- Open Eclipse and click "File" → "New" → "Map/Reduce Project", then click "Next".
- In the dialog that appears, enter a project name, choose the project location, and click "Finish".
- In the MapReduce project's src directory, create a new package named cn.neu and click "Finish".
- Copy the downloaded WordCount.java file into the cn.neu package (dragging and dropping it works).
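For orientation, the computation that the WordCount job performs can be sketched in plain Java. This is only the counting logic, not the downloaded MapReduce code itself; the class and method names below are illustrative:

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Count whitespace-separated words, mirroring what the MapReduce job
    // computes: each mapper emits (word, 1) and the reducer sums the ones.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints each distinct word with its count, sorted by key,
        // just like the lines in the job's part-r-00000 output file.
        System.out.println(count("hello hadoop hello eclipse"));
    }
}
```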
- Use a file-transfer tool such as Xftp to copy core-site.xml and hdfs-site.xml from the remote Hadoop cluster's configuration directory (hadoop/hadoop-2.6.0/etc/hadoop under the installation directory) to the local machine.
Copy these two XML files, together with the downloaded log4j.properties file, into src.
Note: if you are unsure how these XML files should be configured, see the guide "Installing and Deploying a Hadoop Cluster on Multiple Linux Virtual Machines (Detailed Edition)".
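The exact values depend on your cluster, so copy the real files from the cluster rather than writing them by hand. As a sketch, the essential property in core-site.xml tells clients where the NameNode is; the hostname master and port 9000 below are placeholders:

```xml
<!-- core-site.xml (sketch): fs.defaultFS must match your cluster's NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```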
If the two XML files are missing, Hadoop falls back to its default local filesystem (note the file:/ scheme in the path below) and errors like the following occur:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/G:/hadoop-2.6.0/share/hadoop/common/lib/hadoop-auth-2.6.0.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/test/input/data.txt
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
at cn.neu.WordCount.main(WordCount.java:60)
- Right-click the HDFS root directory and click "Create new directory".
- Enter test and click "OK".
- Right-click inside the Project Explorer pane and click "Refresh"; the new directory now appears.
- Right-click the test folder and create a directory named input under it; refresh to see it as shown below.
- Right-click the input directory and choose "Upload files to DFS" (HDFS was formerly also called DFS).
- Select the downloaded data.txt file and click "Open", then refresh Project Explorer again; the file appears as shown below.
- The WordCount.java code reads two argument values (input and output paths), so they must be configured:
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
Right-click in the code editor and click "Run As" → "Run Configurations".
Click the "Arguments" tab, enter the path of the data.txt file set up in the previous step and the program's output path, click "Apply", then click "Run" to start the program.
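For illustration, assuming the cluster's NameNode is at master:9000 (a placeholder — use your own cluster's address), the two arguments entered in the arguments box, one per line and in this order, could look like:

```
hdfs://master:9000/test/input/data.txt
hdfs://master:9000/test/output
```

Because core-site.xml is on the project's classpath, unqualified paths such as /test/input/data.txt and /test/output also resolve against HDFS.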
Note: do not create an output directory under test before the program runs. The output directory must not already exist, or a "directory already exists" error is thrown.
- The following error may be reported (if it is not, skip this step):
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at cn.neu.WordCount.main(WordCount.java:45)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
at java.base/java.lang.String.checkBoundsBeginEnd(Unknown Source)
at java.base/java.lang.String.substring(Unknown Source)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:49)
... 5 more
Click the (Shell.java:49) link in the stack trace; on the page that opens, click "Attach Source".
In the dialog, click "External location" → "External file", navigate to the sources folder under the path shown above, open it, select hadoop-common-2.6.0-sources.jar, click "Open", and finally click "OK".
Click (Shell.java:49) again to view the source; line 49 reads:
private static boolean IS_JAVA7_OR_ABOVE =
    System.getProperty("java.version").substring(0, 3).compareTo("1.7") >= 0;
Combined with the following error information:
at java.base/java.lang.String.checkBoundsBeginEnd(Unknown Source)
at java.base/java.lang.String.substring(Unknown Source)
the problem is that on newer JDKs the java.version string (e.g. "9" or "11") is shorter than three characters, so substring(0, 3) fails. The fix is to add the following line at the start of the main method: System.setProperty("java.version", "1.8"); — any version string that compares greater than or equal to "1.7" works.
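The failure can be reproduced without Hadoop at all. A minimal sketch of what the static initializer in Shell.java does, and why it breaks on JDK 9+ version strings (class and method names here are illustrative):

```java
public class VersionStringDemo {
    // Hadoop 2.6.0's Shell class evaluates this expression at class-load time:
    //   System.getProperty("java.version").substring(0, 3).compareTo("1.7") >= 0
    // Pre-9 JDKs report versions like "1.8.0_201", but JDK 9+ may report
    // just "9", which is shorter than 3 characters, so substring(0, 3) throws.
    static boolean isJava7OrAbove(String version) {
        return version.substring(0, 3).compareTo("1.7") >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isJava7OrAbove("1.8.0_201")); // a pre-9 version string
        try {
            isJava7OrAbove("9"); // the JDK 9 case
        } catch (StringIndexOutOfBoundsException e) {
            // This is the exception seen in the stack trace above.
            System.out.println("version string too short for substring(0, 3)");
        }
    }
}
```

This is why overriding java.version with any string of at least three characters that compares greater than or equal to "1.7" (such as "1.8") avoids the crash.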
- If the program runs normally, wait for it to finish, then right-click the test directory under Hadoop in Project Explorer and click "Refresh"; the output directory appears inside it.
Double-click the part-r-00000 file to view the program's results.
- To run the job again, either change the output directory in the argument configuration or delete the files under the output path.
A once-and-for-all alternative is a small change to the main method: before each run, check whether the output path exists and delete it if it does.
Before:
System.setProperty("HADOOP_USER_NAME", "root");
System.setProperty("java.version", "1.8");
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: WordCount <in> <out>");
    System.exit(2);
}
After:
System.setProperty("HADOOP_USER_NAME", "root");
System.setProperty("java.version", "1.8");
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: WordCount <in> <out>");
    System.exit(2);
}
// Delete the output path if it already exists, so re-runs do not fail.
Path outPath = new Path(otherArgs[1]);
if (fs.exists(outPath)) {
    fs.delete(outPath, true); // true = delete recursively
}