Installing Hadoop 3.2.1 (with plenty of pitfalls)

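Download the Hadoop package, hadoop-3.2.1.tar.gz, from the official Apache distribution (a mirror is used here). The file is 342.56M; at 931KB/s the download took 8m 19s.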

$ wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

Extract it; the resulting path is /home/wang/hadoop-3.2.1:

$ tar -zxvf hadoop-3.2.1.tar.gz

Set the environment variables

$ sudo vim /etc/profile

Add the following settings:

export HADOOP_HOME=/home/wang/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin

Apply the changes:

$ source /etc/profile

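If that fails for lack of permissions, set a root password, switch to root, and loosen the file's permissions (editing the file with sudo avoids the need for this):

$ sudo passwd root

Set the root password (123456 in this example).

$ su
$ chmod 777 /etc/profile
$ source /etc/profile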

Check that the environment variables took effect:

$ hadoop version

This fails with: ERROR: JAVA_HOME is not set and could not be found.

Install JDK 14.0.1 from Oracle's Java download pages (the link recorded here is the JDK 8 page):

http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Extract it:

$ tar -zxvf jdk-14.0.1_linux-x64_bin.tar.gz

Configure the environment:

$ sudo vim /etc/profile

Append the following to the end of the file:

# JDK 9 and later ship no separate jre/ directory, so the JRE_HOME and
# CLASSPATH settings common in JDK 8 guides are not needed
export JAVA_HOME=/home/wang/jdk-14.0.1
export PATH=$JAVA_HOME/bin:$PATH

Apply the changes:

$ source /etc/profile
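
Since a wrong or missing JAVA_HOME is the cause of the error seen earlier, a quick sanity check can save a round trip. The following is a minimal sketch; the helper name is_valid_java_home is hypothetical (it is not part of Hadoop or the JDK), and it only assumes that a usable JDK directory contains an executable bin/java.

```shell
# Hypothetical helper (not part of Hadoop or the JDK): succeeds only if the
# given directory looks like a JDK install, i.e. it has an executable bin/java.
is_valid_java_home() {
  [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Report on the current shell's JAVA_HOME (empty string if it is unset).
if is_valid_java_home "${JAVA_HOME:-}"; then
  echo "JAVA_HOME looks usable: $JAVA_HOME"
else
  echo "JAVA_HOME is unset or has no executable bin/java"
fi
```

If the second message appears after sourcing /etc/profile, the path in the export line probably does not match the directory the JDK was extracted to.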

Then reboot the computer.

Verify that the installation succeeded:

$ java -version

Output like the following means the JDK is installed:

wang@wang2019:~$ java -version
java version "14.0.1" 2020-04-14
Java(TM) SE Runtime Environment (build 14.0.1+7)
Java HotSpot(TM) 64-Bit Server VM (build 14.0.1+7, mixed mode, sharing)

Now check the Hadoop installation again:

$ hadoop version

Output like the following means the setup succeeded:

Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /home/wang/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar

Configuration

core-site.xml: cluster-wide settings that define system-level parameters, such as the HDFS URL and Hadoop's temporary directory.
Edit /home/wang/hadoop-3.2.1/etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- file:/// is the default (local filesystem); pseudo-distributed mode uses HDFS -->
        <value>hdfs://localhost/</value>
    </property>
</configuration>

hdfs-site.xml: where the NameNode and DataNode store their data, the number of file replicas, file access permissions, and so on.
Edit /home/wang/hadoop-3.2.1/etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Edit /home/wang/hadoop-3.2.1/etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Edit /home/wang/hadoop-3.2.1/etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>localhost</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
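
A misspelled property name in these files is easy to miss, because an unrecognized key is simply not used and the default applies instead. As a rough check, the sketch below reads a property's value back out of a site file. The function name get_property is hypothetical, and the parsing assumes the layout used above, with each <name> and <value> tag on its own line; it is not a general XML parser.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop *-site.xml file. Assumes one <name>/<value> tag per line, as in the
# files shown above; it is not a general XML parser.
#   $1 = path to the site file, $2 = property name
get_property() {
  awk -v want="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>/, "", n)
      sub(/<\/name>.*/, "", n)
    }
    /<value>/ && n == want {
      v = $0
      sub(/.*<value>/, "", v)
      sub(/<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$1"
}

# Example (path from this article):
#   get_property /home/wang/hadoop-3.2.1/etc/hadoop/yarn-site.xml \
#       yarn.resourcemanager.hostname
```

An empty result for a key that should be set points to a typo in the name. Hadoop's own `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually resolves, which is the more authoritative check once the hdfs command works.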

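Initialize HDFS

Format HDFS with $ hdfs namenode -format. If that command is not found, run it from the bin directory:

$ cd /home/wang/hadoop-3.2.1/bin
$ ./hdfs namenode -format

Start Hadoop

$ cd /home/wang/hadoop-3.2.1/sbin
$ ./start-all.sh

This reports the following errors: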

WARNING: Attempting to start all Apache Hadoop daemons as wang in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
localhost: ssh: connect to host localhost port 22: Connection refused
Starting datanodes
localhost: ssh: connect to host localhost port 22: Connection refused
Starting secondary namenodes [wang2019]
wang2019: ssh: connect to host wang2019 port 22: Connection refused
Starting resourcemanager
Starting nodemanagers
localhost: ssh: connect to host localhost port 22: Connection refused

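Solution:

Ubuntu does not install an SSH server by default, so connecting to the machine over SSH (which start-all.sh does, even for localhost) requires installing openssh-server manually. To check whether an SSH server is available, run:

$ ssh localhost

Output: ssh: connect to host localhost port 22: Connection refused
To confirm that openssh-server is missing, list the SSH-related processes; if only ssh-agent shows up, the server is not installed:

$ ps -e|grep ssh

Output: 2015 ? 00:00:00 ssh-agent
With the cause identified, install openssh-server:

$ sudo apt-get install openssh-server

After the installation finishes, run ps -e|grep ssh again. If sshd appears in the result, the server is installed and running.
Output: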
2015 ? 00:00:00 ssh-agent
15588 ? 00:00:00 sshd
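
The same check can be scripted rather than read by eye. This is a small sketch; sshd_running is a hypothetical helper name, and it assumes the `ps -e` layout shown above, where the command name is the last column.

```shell
# Hypothetical helper: reads a `ps -e` listing on stdin and succeeds only if
# some process is named exactly "sshd" (so ssh-agent alone does not count).
sshd_running() {
  awk '$NF == "sshd" { found = 1 } END { exit !found }'
}

# Typical use:
#   if ps -e | sshd_running; then
#     echo "sshd is running"
#   else
#     echo "sshd not found; install openssh-server"
#   fi
```

Matching the whole command name avoids the false positive that a plain `grep ssh` gives when only ssh-agent is present.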

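Finally, run ssh localhost to connect to the local machine. If it prompts for a password, the SSH server is working.

$ ssh localhost

Running start-all.sh again, however, produces a new problem: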

WARNING: Attempting to start all Apache Hadoop daemons as wang in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
localhost: wang@localhost: Permission denied (publickey,password).
Starting datanodes
localhost: wang@localhost: Permission denied (publickey,password).
Starting secondary namenodes [wang2019]
wang2019: Warning: Permanently added 'wang2019' (ECDSA) to the list of known hosts.
wang2019: wang@wang2019: Permission denied (publickey,password).
Starting resourcemanager
Starting nodemanagers
localhost: wang@localhost: Permission denied (publickey,password).

Configure passwordless SSH login

(base) wang@wang2019:~$ cd ~/.ssh
(base) wang@wang2019:~/.ssh$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/wang/.ssh/id_rsa):       
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/wang/.ssh/id_rsa.
Your public key has been saved in /home/wang/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:7VvvtwaEAU/zOk6GG8N02w1G/Ffr2A2ccrLzJHFMzLY wang@wang2019
The key's randomart image is:
+---[RSA 2048]----+
|         ..o.+   |
|          o.+.= .|
|         . ooO.oo|
|        o.o.O.Eo.|
|        S=.*.O+oo|
|         .B =o.o.|
|         ....=.  |
|           o ....|
|          .  .+o.|
+----[SHA256]-----+
(base) wang@wang2019:~/.ssh$ ls 
id_rsa  id_rsa.pub  known_hosts
(base) wang@wang2019:~/.ssh$ cat ./id_rsa.pub >> ./authorized_keys
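
Running the `cat ... >> authorized_keys` step more than once leaves duplicate entries, and sshd may ignore the file if its permissions are too open. The sketch below wraps the step so it can be repeated safely. The function name authorize_key is hypothetical; the 700/600 modes are the conventional ones for ~/.ssh and authorized_keys.

```shell
# Hypothetical helper: append a public key to an authorized_keys file only if
# it is not already there, and tighten permissions on the directory and file.
#   $1 = public key file, $2 = authorized_keys file
authorize_key() {
  ak_pub="$1"
  ak_auth="$2"
  ak_dir=$(dirname "$ak_auth")
  mkdir -p "$ak_dir" || return 1
  chmod 700 "$ak_dir" || return 1
  touch "$ak_auth" || return 1
  chmod 600 "$ak_auth" || return 1
  ak_key=$(cat "$ak_pub") || return 1
  # -x matches whole lines, -F treats the key as a literal string
  grep -qxF -e "$ak_key" "$ak_auth" || printf '%s\n' "$ak_key" >> "$ak_auth"
}

# Typical use, after generating a key without a passphrase:
#   ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```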

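Testing passwordless loopback SSH login on a single machine

Log in over SSH on the single node and check that the login succeeds without a password prompt, then log out: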

(base) wang@wang2019:~/.ssh$ ssh localhost
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 5.3.0-62-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage


 * Canonical Livepatch is available for installation.
   - Reduce system reboots and improve kernel security. Activate at:
     https://ubuntu.com/livepatch

1 package can be updated.
0 updates are security updates.

Your Hardware Enablement Stack (HWE) is supported until April 2023.
*** System restart required ***
Last login: Wed Jul  8 09:53:25 2020 from 127.0.0.1
(base) wang@wang2019:~$ exit
logout
Connection to localhost closed.
(base) wang@wang2019:~/.ssh$

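Output like the above means the operation succeeded: the single-node loopback SSH login and logout both work, which prepares the ground for passwordless SSH login across worker nodes later.

Starting Hadoop again now fails with this error: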

WARNING: Attempting to start all Apache Hadoop daemons as wang in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
localhost: ERROR: JAVA_HOME is not set and could not be found.
Starting datanodes
localhost: ERROR: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [wang2019]
wang2019: ERROR: JAVA_HOME is not set and could not be found.
Starting resourcemanager
Starting nodemanagers
localhost: ERROR: JAVA_HOME is not set and could not be found.

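The cause is that the Java path in Hadoop's own hadoop-env.sh is not set: the daemons are launched over SSH in non-login shells, which do not read /etc/profile. The file is in the hadoop-3.2.1/etc/hadoop directory; change it as follows:

Change the line   export JAVA_HOME=
to                export JAVA_HOME=/home/wang/jdk-14.0.1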
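
The edit above can also be applied from the command line. The following is a sketch under stated assumptions: set_java_home is a hypothetical helper, and it assumes the file contains at most a line of the form `export JAVA_HOME=...`, possibly commented out with a leading `#`. The first such line is rewritten; if none exists, the setting is appended.

```shell
# Hypothetical helper: set JAVA_HOME in a hadoop-env.sh style file.
#   $1 = path to hadoop-env.sh, $2 = JDK directory
# Rewrites the first "export JAVA_HOME=" line (commented out or not);
# appends the setting if no such line exists.
set_java_home() {
  sj_file="$1"
  sj_jdk="$2"
  sj_tmp=$(mktemp) || return 1
  awk -v jdk="$sj_jdk" '
    !seen && /^[ \t]*#?[ \t]*export JAVA_HOME=/ {
      print "export JAVA_HOME=" jdk
      seen = 1
      next
    }
    { print }
    END { if (!seen) print "export JAVA_HOME=" jdk }
  ' "$sj_file" > "$sj_tmp" && cat "$sj_tmp" > "$sj_file"
  sj_status=$?
  rm -f "$sj_tmp"
  return "$sj_status"
}

# Example (paths from this article):
#   set_java_home /home/wang/hadoop-3.2.1/etc/hadoop/hadoop-env.sh \
#       /home/wang/jdk-14.0.1
```

Writing through a temporary file and copying the content back keeps the original file's permissions and avoids depending on a particular sed's in-place option.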

Start Hadoop once more:

$ cd hadoop-3.2.1/sbin
$ ./start-all.sh

Output:

WARNING: Attempting to start all Apache Hadoop daemons as wang in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [wang2019]
Starting resourcemanager
Starting nodemanagers

Confirm that startup succeeded

After starting, run jps to check which daemons are running:
$ jps

The result:

(base) wang@wang2019:~/hadoop-3.2.1/sbin$ jps
23488 NameNode
26931 Jps
23721 DataNode
24010 SecondaryNameNode
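
To compare a jps listing against the daemons that start-all.sh is expected to launch, a short filter helps. This is a sketch; missing_daemons is a hypothetical helper, and the expected list (the three HDFS daemons plus YARN's ResourceManager and NodeManager) is an assumption based on the startup messages shown earlier.

```shell
# Hypothetical helper: reads `jps` output on stdin ("<pid> <name>" per line)
# and prints each expected daemon that does not appear in it.
missing_daemons() {
  md_listing=$(cat)
  for md_daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    printf '%s\n' "$md_listing" |
      awk -v d="$md_daemon" '$2 == d { found = 1 } END { exit !found }' ||
      printf '%s\n' "$md_daemon"
  done
}

# Typical use:
#   jps | missing_daemons
```

Applied to the listing above, it prints ResourceManager and NodeManager, which suggests the YARN daemons did not stay up in this run; the log files under hadoop-3.2.1/logs are the place to look for the reason.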

Default address of the Hadoop (NameNode) web UI: http://localhost:9870/

Default address of the YARN web UI: http://localhost:8088