I. Preparation
1. Install the JDK: https://www.oracle.com/technetwork/java/javase/downloads/index.html
If you are installing Hadoop 3.1.1, choose JDK 1.8 or later.
2. Download Hadoop: https://hadoop.apache.org/releases.html
3. Get the helper binaries Hadoop needs on Windows
Search GitHub for winutils and pick the repository matching the Hadoop version you downloaded. Since I downloaded Hadoop 3.1.1, I used https://github.com/dafeng-lin/apache-hadoop-3.1.1-winutils
II. Install the JDK
After installing the JDK, configure its environment variables (JAVA_HOME and PATH); any JDK installation guide covers this step.
III. Install and configure the Hadoop environment
Extract Hadoop to a directory of your choice; the path should contain no spaces, Chinese characters, or other special characters. I installed it to D:\Hadoop-3.1.1.
Configure the system environment variables and add Hadoop to PATH:
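The environment variables can also be set from a command prompt instead of the System Properties dialog; a minimal sketch, assuming the install path above (note that setx writes user-level variables and may truncate a PATH longer than 1024 characters, so the GUI is safer for long paths):

```shell
rem Point HADOOP_HOME at the extracted distribution (assumed path)
setx HADOOP_HOME "D:\hadoop-3.1.1"
rem Append Hadoop's bin and sbin directories to the user PATH
setx PATH "%PATH%;D:\hadoop-3.1.1\bin;D:\hadoop-3.1.1\sbin"
```

Open a new console afterwards; setx does not affect the current session.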
First back up the D:\hadoop-3.1.1\etc directory, so you can diff against the original files if something goes wrong later.
Create the following directories:
D:\hadoop-3.1.1\workplace\data
D:\hadoop-3.1.1\workplace\tmp
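Both directories can be created from a command prompt; a sketch, assuming the same install path:

```shell
rem mkdir creates intermediate directories (workplace) automatically
mkdir D:\hadoop-3.1.1\workplace\data
mkdir D:\hadoop-3.1.1\workplace\tmp
```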
Edit D:\hadoop-3.1.1\etc\hadoop\core-site.xml
Because Hadoop runs from the D: drive here, the /D: prefix can be omitted below; a path like /hadoop-3.1.1 is resolved from the root of the current drive, so Hadoop still finds the directory on D:.
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop-3.1.1/workplace/tmp</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/hadoop-3.1.1/workplace/name</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9500</value>
</property>
</configuration>
Edit D:\hadoop-3.1.1\etc\hadoop\mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hdfs://localhost:9501</value>
</property>
</configuration>
Edit D:\hadoop-3.1.1\etc\hadoop\hdfs-site.xml
<configuration>
<!-- Set replication to 1, since this is a single-node Hadoop -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/hadoop-3.1.1/workplace/data</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.http.address</name>
<value>localhost:50070</value>
</property>
</configuration>
Edit D:\hadoop-3.1.1\etc\hadoop\yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<!-- Total number of virtual CPU cores available to the NodeManager -->
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
<property>
<!-- Maximum memory available on each node, in MB -->
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<!-- Where intermediate (localized) data is stored -->
<name>yarn.nodemanager.local-dirs</name>
<value>/D:/hadoop-3.1.1/workplace/tmp/nm-local-dir</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/hadoop-3.1.1/logs/yarn</value>
</property>
</configuration>
Edit D:\hadoop-3.1.1\etc\hadoop\hadoop-env.cmd
Comment out "set JAVA_HOME=%JAVA_HOME%" and add a JAVA_HOME entry with an absolute path (note that @rem, not @set, is what comments out a line in a .cmd script):
@rem set JAVA_HOME=%JAVA_HOME%
set JAVA_HOME=C:\Java\jdk1.8.0_172
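A quick way to confirm the path is valid before starting Hadoop (C:\Java\jdk1.8.0_172 is the install path used above; substitute your own):

```shell
rem Verify the JDK referenced in hadoop-env.cmd actually runs
"C:\Java\jdk1.8.0_172\bin\java" -version
rem Or, if JAVA_HOME is set in the current console:
"%JAVA_HOME%\bin\java" -version
```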
IV. Copy winutils
First back up D:\hadoop-3.1.1\bin.
Then extract apache-hadoop-3.1.1-winutils-master.zip and copy its contents into the corresponding Hadoop folder (the bin directory), overwriting the existing files.
V. Format the HDFS filesystem
Run D:\hadoop-3.1.1\bin\hdfs.cmd namenode -format
hdfs supports many other subcommands; run hdfs.cmd without arguments to list them, or see the official documentation.
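For example, once the cluster is running (section VI), a few common subcommands look like this (a sketch; the HDFS paths and local file are illustrative):

```shell
rem List the HDFS root directory
D:\hadoop-3.1.1\bin\hdfs.cmd dfs -ls /
rem Create a directory and upload a local file into it
D:\hadoop-3.1.1\bin\hdfs.cmd dfs -mkdir /test
D:\hadoop-3.1.1\bin\hdfs.cmd dfs -put D:\sample.txt /test
rem Report overall capacity and DataNode status
D:\hadoop-3.1.1\bin\hdfs.cmd dfsadmin -report
```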
VI. Start Hadoop
Run D:\hadoop-3.1.1\sbin\start-all.cmd. It launches four console windows; watch all four startup logs closely for any exceptions or errors.
If everything is normal, the following services become available:
HDFS NameNode web UI: http://localhost:50070/
YARN ResourceManager web UI: http://localhost:8088
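Besides the web UIs, jps (shipped with the JDK) lists the running Java processes; after a clean start all four daemons should appear:

```shell
jps
rem Expected process names (PIDs will differ):
rem   NameNode
rem   DataNode
rem   ResourceManager
rem   NodeManager
```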
Troubleshooting notes:
1. Hadoop fails to start, reporting a DataNode volume error:
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
2018-09-30 10:45:10,306 INFO datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = DESKTOP-ORMM49N/10.200.130.178
STARTUP_MSG: args = []
STARTUP_MSG: version = 3.1.1
STARTUP_MSG: classpath = C:\hadoop-3.1.1\etc\hadoop;C:\hadoop-3.1.1\share\hadoop\common;
... ...
C:\hadoop-3.1.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-3.1.1.jar
STARTUP_MSG: build = https://github.com/apache/hadoop -r 2b9a8c1d3a2caf1e733d57f346af3ff0d5ba529c; compiled by 'leftnoteasy' on 2018-08-02T04:26Z
STARTUP_MSG: java = 1.8.0_172
************************************************************/
2018-09-30 10:45:11,277 INFO checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/C:/hadoop-3.1.1/workplace/data
2018-09-30 10:45:11,312 WARN checker.StorageLocationChecker: Exception checking StorageLocation [DISK]file:/C:/hadoop-3.1.1/workplace/data
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getStat(NativeIO.java:455)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNativeIO(RawLocalFileSystem.java:796)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:710)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:678)
at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:233)
at org.apache.hadoop.util.DiskChecker.checkDirInternal(DiskChecker.java:141)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:116)
at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:239)
at org.apache.hadoop.hdfs.server.datanode.StorageLocation.check(StorageLocation.java:52)
at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$1.call(ThrottledAsyncChecker.java:142)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-09-30 10:45:11,314 ERROR datanode.DataNode: Exception in secureMain
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
at org.apache.hadoop.hdfs.server.datanode.checker.StorageLocationChecker.check(StorageLocationChecker.java:220)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2762)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2677)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2719)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2863)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2887)
2018-09-30 10:45:11,318 INFO util.ExitUtil: Exiting with status 1: org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 0, volumes configured: 1, volumes failed: 1, volume failures tolerated: 0
2018-09-30 10:45:11,321 INFO datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at DESKTOP-ORMM49N/10.200.130.178
************************************************************/
C:\hadoop-3.1.1\sbin>
Solution: delete the manually pre-created data folder and let Hadoop create it itself.
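A sketch of that fix from the command line, assuming the paths used above; stop the daemons first and re-format afterwards:

```shell
rem Stop all running daemons
D:\hadoop-3.1.1\sbin\stop-all.cmd
rem Remove the pre-created data directory so the DataNode can recreate it
rmdir /S /Q D:\hadoop-3.1.1\workplace\data
rem Re-format HDFS and start again
D:\hadoop-3.1.1\bin\hdfs.cmd namenode -format
D:\hadoop-3.1.1\sbin\start-all.cmd
```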
2. Starting the YARN NodeManager fails. The cause: the winutils binaries must be compiled from source matching your Hadoop version (3.1.1 here).
Solution: find the winutils build on GitHub that matches your Hadoop version.
3. localhost:50070 cannot be reached.
Adding the following to hdfs-site.xml resolved it:
<property>
<name>dfs.http.address</name>
<value>localhost:50070</value>
</property>
But that left the root cause unfound. So I commented the property back out, restarted, and noticed something interesting in the namenode log:
I then remembered I had mapped a domain to an IP in the hosts file: ??????.??????.com 0.0.0.0. That hosts entry was overriding the default address the site starts on.
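To check whether the hosts file is interfering in this way, you can search it for such mappings (findstr is built into Windows):

```shell
rem Show hosts entries that map a name to 0.0.0.0
type C:\Windows\System32\drivers\etc\hosts | findstr "0.0.0.0"
```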
4. The YARN NodeManager reports an error: the temp path cannot be found
2018-11-08 21:15:41,191 INFO localizer.ResourceLocalizationService: Localizer started on port 8040
2018-11-08 21:15:41,192 WARN nativeio.NativeIO: NativeIO.getStat error (3): 系统找不到指定的路径。
-- file path: hadoop-3.1.1/workplace/tmp/nm-local-dir/filecache
2018-11-08 21:15:41,193 INFO ipc.Server: IPC Server listener on 8040: starting
2018-11-08 21:15:41,193 INFO ipc.Server: IPC Server Responder: starting
2018-11-08 21:15:41,259 WARN nativeio.NativeIO: NativeIO.getStat error (3): 系统找不到指定的路径。
-- file path: hadoop-3.1.1/workplace/tmp/nm-local-dir/usercache
2018-11-08 21:15:41,328 WARN nativeio.NativeIO: NativeIO.getStat error (3): 系统找不到指定的路径。
-- file path: hadoop-3.1.1/workplace/tmp/nm-local-dir/nmPrivate
The log above shows that the hadoop-3.1.1 path is being resolved incorrectly. Back in yarn-site.xml, the configuration was:
<property>
<!-- 中间结果存放位置 -->
<name>yarn.nodemanager.local-dirs</name>
<value>/hadoop-3.1.1/workplace/tmp/nm-local-dir</value>
</property>
That style of path works fine in core-site.xml. My suspicion is that YARN strips the leading "/" when parsing it, so I changed the configuration to:
<property>
<!-- 中间结果存放位置 -->
<name>yarn.nodemanager.local-dirs</name>
<value>/D:/hadoop-3.1.1/workplace/tmp/nm-local-dir</value>
</property>
Restart the YARN NodeManager and the problem is resolved.