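A basic Hadoop installation consists of the following six steps:
1. Create a Hadoop user
2. Install Java
3. Set up SSH login
4. Standalone (single-node) installation
5. Pseudo-distributed installation
6. Fully distributed installation
This article deploys the Hadoop cluster on virtual machines. The VM environment is VirtualBox + Ubuntu 16.04, with one master and two slave nodes (node1 is the master; node2 and node3 are the slaves). The host operating system is Windows 10.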
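I. Create a Hadoop user
For convenience, we create a user named "hadoop" to run Hadoop. This keeps permissions clearly separated between users, and Hadoop-specific configuration changes will not affect other users.
sudo adduser USERNAME # add a user, e.g. sudo adduser hadoop
# Some other useful commands
sudo passwd USERNAME # change a user's password
sudo chfn USERNAME # change a user's profile information
sudo deluser USERNAME # delete a user
su USERNAME # switch to another user (requires that user's password)
sudo vim /etc/hostname # change the hostname
Next, grant the hadoop user sudo privileges with:
sudo adduser hadoop sudo
Alternatively, edit /etc/sudoers (preferably with sudo visudo, as the file's own header instructs) and add the line "hadoop ALL=(ALL:ALL) ALL" after "%sudo ALL=(ALL:ALL) ALL", where hadoop is the username. The resulting file looks like this: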
#
# This file MUST be edited with the 'visudo' command as root.
#
# Please consider adding local content in /etc/sudoers.d/ instead of
# directly modifying this file.
#
# See the man page for details on how to write a sudoers file.
#
Defaults env_reset
Defaults mail_badpass
Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin"
# Host alias specification
# User alias specification
# Cmnd alias specification
# User privilege specification
root ALL=(ALL:ALL) ALL
# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL
# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL
hadoop ALL=(ALL:ALL) ALL
# See sudoers(5) for more information on "#include" directives:
#includedir /etc/sudoers.d
Once the hadoop user has been created (create it on every node), perform all of the following steps as that user.
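To confirm that the sudo membership described above has been recorded, the group list of the account can be inspected. The helper below is a minimal sketch; the function name in_group is made up for this illustration and is not part of any tool.

```shell
# in_group USER GROUP
# Succeeds (exit status 0) when USER is a member of GROUP.
# Hypothetical helper for illustration only.
in_group() {
    # id -nG prints the user's group names separated by spaces;
    # put one name per line and look for an exact match.
    id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}

# Usage:
#   in_group hadoop sudo && echo "hadoop may use sudo"
```

Group changes normally apply to new login sessions, so log out and back in as hadoop before relying on sudo.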
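II. Install Java
See "Installing the JDK on Ubuntu 16.04" for reference. JDK 1.8 is recommended, and the JDK must be installed on every node.
III. Set up SSH login
In both pseudo-distributed and fully distributed mode, the Hadoop start scripts run on the NameNode host have to start the Hadoop daemons on every machine in the cluster, which they do by logging in over SSH. Hadoop offers no way to type an SSH password during this process, so every machine must be configured to let the NameNode host log in without a password.
Think of the NameNode host as the master node and the hosts it controls as slave nodes: the master must be able to reach each slave over SSH.
1. First, check on the slave node whether the SSH daemon is running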
ps -e | grep ssh
Output like the following means the daemon is already running:
3167 ? 00:00:00 sshd
Otherwise, install the OpenSSH server with the following command:
sudo apt-get install openssh-server
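After installation, check again that the service is running.
2. From the master node, try logging in to the slave node with SSH and a password
Use either
ssh hadoop@192.168.13.227 # hadoop is the username on the slave; the IP is the slave's IP
or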
ssh 192.168.13.227
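Either command attempts a login. A successful login proves that the two virtual machines (computers) can reach each other.
3. Generate a key pair on the master node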
ssh-keygen -t rsa
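The default location is /home/hadoop/.ssh/. Key generation creates two files there: id_rsa (the private key) and id_rsa.pub (the public key). .ssh is a hidden directory; use ls -a to see hidden files.
4. Run ssh-copy-id, which appends the master's public key to /home/hadoop/.ssh/authorized_keys on the slave node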
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@192.168.13.227
Here hadoop is the username on the slave node and 192.168.13.227 is the slave's address.
5. Log in to the slave node from the master over SSH
ssh hadoop@192.168.13.227
The login no longer asks for a password. Note that the key only applies to the hadoop account.
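With several slave nodes it can help to check every target in one pass. The sketch below relies on the OpenSSH options BatchMode (never prompt for a password) and ConnectTimeout; the function name check_passwordless is an assumption made for this example.

```shell
# check_passwordless TARGET...
# Reports, for each user@host, whether key-based login already works.
# Hypothetical helper for illustration only.
check_passwordless() {
    local target
    for target in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password,
        # so a zero exit status means the key was accepted.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true 2>/dev/null; then
            echo "ok: $target"
        else
            echo "needs setup: $target"
        fi
    done
}

# Usage:
#   check_passwordless hadoop@192.168.13.227 hadoop@localhost
```

Any target reported as "needs setup" still requires the ssh-copy-id step shown earlier. Including hadoop@localhost is useful because the start scripts also log in to the local machine.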
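IV. Standalone Hadoop installation
In standalone mode Hadoop runs on a single machine and stores data on the local file system rather than on the distributed file system HDFS.
Hadoop download: http://mirrors.cnnic.cn/apache/hadoop/common (this article uses version 2.10.1).
The standalone installation is done on the master node. Extract the downloaded archive and place it wherever you like, for example /usr/local/hadoop. Note that both the owner and the group of the directory must be hadoop.
For a standalone installation, the first file to look at is hadoop-env.sh, which sets the environment variables Hadoop runs with; here you only need to point JAVA_HOME at the local JDK directory. The commands are as follows: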
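$ su hadoop # switch to the hadoop user
$ sudo tar -zxvf hadoop-2.10.1.tar.gz -C /usr/local # extract the archive
$ cd /usr/local/ # go to the install location
$ sudo mv ./hadoop-2.10.1/ ./hadoop # rename the directory to hadoop
$ echo $JAVA_HOME # Java was installed earlier; check it again. A correct setup prints the JDK path
$ cd /usr/local/hadoop/bin # go to Hadoop's bin directory
$ ./hadoop version # print Hadoop's version; a correct installation prints version information
$ cd /usr/local
$ sudo chown -R hadoop ./hadoop # change the owner to hadoop
$ sudo chgrp -R hadoop ./hadoop # change the group to hadoop
$ ll # check that both the owner and the group of the hadoop directory are now hadoop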
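With the standalone installation finished, test it with one of the examples that ship with Hadoop; here we use the grep example. First create an input directory under the Hadoop directory to hold the input data, then copy the configuration files from ./etc/hadoop into it. The job writes its results to an output directory, which it creates itself; that directory must not exist beforehand. The commands are as follows: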
$ cd /usr/local/hadoop
$ mkdir input # create the input directory
$ cp ./etc/hadoop/*.xml ./input # copy the configuration files into input
$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep ./input ./output 'dfs[a-z.]+'
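$ cat ./output/* # view the result
1 dfsadmin # this is the output
After a successful run, the grep program has taken the input directory as its input, extracted every string that matches the regular expression dfs[a-z.]+, and written the number of occurrences of each match to /usr/local/hadoop/output. Note: running the command a second time fails, because Hadoop does not overwrite an existing output directory by default, so delete the output directory before running it again.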
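To see what the example job computes without running Hadoop at all, the same counting can be approximated with ordinary shell tools. This is only a local sketch of the idea, not how the MapReduce job is implemented; the function name count_matches is invented for this illustration.

```shell
# count_matches DIR REGEX
# Prints "count<TAB>match" for every distinct string in DIR's files that
# matches the extended regular expression REGEX, most frequent first.
# Hypothetical helper for illustration only.
count_matches() {
    local dir="$1"
    local pattern="$2"
    # -o prints only the matching part, -h drops file names, -E enables +.
    grep -ohE -- "$pattern" "$dir"/* \
        | sort \
        | uniq -c \
        | sort -rn \
        | awk '{ print $1 "\t" $2 }'
}

# Usage (from /usr/local/hadoop):
#   count_matches ./input 'dfs[a-z.]+'
```

Comparing this listing with cat ./output/* is a quick sanity check on the job's result.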
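V. Pseudo-distributed installation
A pseudo-distributed installation simulates a small cluster on a single machine; the cluster has only one node. Whenever Hadoop runs as a cluster, whether pseudo-distributed or truly distributed, the cooperation between its components is set up through configuration files. The most important ones are listed in the table below.
File name | Format | Description |
hadoop-env.sh | Bash script | Environment variables needed to run Hadoop |
core-site.xml | Hadoop configuration XML | Hadoop core settings, such as I/O settings shared by HDFS and MapReduce |
hdfs-site.xml | Hadoop configuration XML | Settings for the HDFS daemons: NameNode, SecondaryNameNode, DataNode, etc. |
mapred-site.xml | Hadoop configuration XML | Settings for the MapReduce daemons, including JobTracker and TaskTracker |
masters | Plain text | List of machines that run the SecondaryNameNode (one per line) |
slaves | Plain text | List of machines that run DataNode and TaskTracker (one per line) |
hadoop-metrics.properties | Java properties | Properties that control how Hadoop publishes metrics |
Note: JobTracker and TaskTracker are Hadoop 1.x terms. In Hadoop 2.x, including the 2.10.1 release used here, YARN's ResourceManager and NodeManager take over those roles.
1. For a pseudo-distributed setup, two files under /usr/local/hadoop/etc/hadoop/ need to be changed: core-site.xml and hdfs-site.xml. As before, the pseudo-distributed installation is done on the master node.
In core-site.xml, change the <configuration></configuration> element as follows: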
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
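As you can see, the format of core-site.xml is simple: <name> holds the name of a setting and <value> holds its value. hadoop.tmp.dir is the directory for temporary files; if it is not set, the default is /tmp/hadoop-hadoop (that is, /tmp/hadoop-${user.name}), which the operating system may wipe on reboot. Beyond that, core-site.xml only needs the HDFS address and port, set through fs.defaultFS; the port is 9000, as in the official documentation.
Change hdfs-site.xml as follows: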
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
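dfs.replication sets the number of replicas HDFS keeps of each block. The default is 3; we set it to 1, the minimum, because a pseudo-distributed cluster has only one node and therefore cannot hold more than one replica. dfs.namenode.name.dir is the directory where the NameNode stores its metadata. dfs.datanode.data.dir is the directory where the DataNode stores its data.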
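Before formatting, it is worth reading the values back from the two files to catch typos in property names. The function below is a rough sketch that assumes each <name> and <value> element sits on its own line, as in the listings above; it is not a general XML parser, and the name get_prop is made up for this example.

```shell
# get_prop FILE PROPERTY
# Prints the <value> that follows <name>PROPERTY</name> in a *-site.xml file.
# Assumes one element per line; hypothetical helper for illustration only.
get_prop() {
    awk -v key="$2" '
        /<name>/ {
            name = $0
            # Strip everything up to <name> and from </name> onwards.
            gsub(/.*<name>|<\/name>.*/, "", name)
        }
        /<value>/ {
            value = $0
            gsub(/.*<value>|<\/value>.*/, "", value)
            if (name == key) { print value; exit }
        }
    ' "$1"
}

# Usage (from /usr/local/hadoop/etc/hadoop):
#   get_prop core-site.xml fs.defaultFS
#   get_prop hdfs-site.xml dfs.replication
```

An empty result means the property was not found, which usually points to a misspelled name. Hadoop's own hdfs getconf -confKey command reports the value the tools actually see and is the more reliable check once the installation works.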
2. Once the configuration is done, initialize (format) the file system
The command is:
$ cd /usr/local/hadoop
$ ./bin/hdfs namenode -format
Output like the following means the format succeeded. If it fails, first check core-site.xml and hdfs-site.xml for configuration mistakes.
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
21/04/24 20:05:51 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = node1/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.10.1
STARTUP_MSG: classpath = /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar:... [several hundred further jar entries omitted] ...:/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar
STARTUP_MSG: build = https://github.com/apache/hadoop -r 1827467c9a56f133025f28557bfc2c562d78e816; compiled by 'centos' on 2020-09-14T13:17Z
STARTUP_MSG: java = 1.8.0_291
************************************************************/
21/04/24 20:05:51 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
21/04/24 20:05:51 INFO namenode.NameNode: createNameNode [-format]
Java HotSpot(TM) Client VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
21/04/24 20:05:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-48db268d-e942-412c-b33e-724efc960870
21/04/24 20:05:55 INFO namenode.FSEditLog: Edit logging is async:true
21/04/24 20:05:56 INFO namenode.FSNamesystem: KeyProvider: null
21/04/24 20:05:56 INFO namenode.FSNamesystem: fsLock is fair: true
21/04/24 20:05:56 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
21/04/24 20:05:56 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
21/04/24 20:05:56 INFO namenode.FSNamesystem: supergroup = supergroup
21/04/24 20:05:56 INFO namenode.FSNamesystem: isPermissionEnabled = true
21/04/24 20:05:56 INFO namenode.FSNamesystem: HA Enabled: false
21/04/24 20:05:56 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
21/04/24 20:05:56 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
21/04/24 20:05:56 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
21/04/24 20:05:56 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
21/04/24 20:05:56 INFO blockmanagement.BlockManager: The block deletion will start around 2021 四月 24 20:05:56
21/04/24 20:05:56 INFO util.GSet: Computing capacity for map BlocksMap
21/04/24 20:05:56 INFO util.GSet: VM type = 32-bit
21/04/24 20:05:56 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
21/04/24 20:05:56 INFO util.GSet: capacity = 2^22 = 4194304 entries
21/04/24 20:05:56 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
21/04/24 20:05:56 WARN conf.Configuration: No unit for dfs.heartbeat.interval(3) assuming SECONDS
21/04/24 20:05:56 WARN conf.Configuration: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
21/04/24 20:05:56 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
21/04/24 20:05:56 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
21/04/24 20:05:56 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
21/04/24 20:05:56 INFO blockmanagement.BlockManager: defaultReplication = 1
21/04/24 20:05:56 INFO blockmanagement.BlockManager: maxReplication = 512
21/04/24 20:05:56 INFO blockmanagement.BlockManager: minReplication = 1
21/04/24 20:05:56 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
21/04/24 20:05:56 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
21/04/24 20:05:56 INFO blockmanagement.BlockManager: encryptDataTransfer = false
21/04/24 20:05:56 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
21/04/24 20:05:56 INFO namenode.FSNamesystem: Append Enabled: true
21/04/24 20:05:56 INFO namenode.FSDirectory: GLOBAL serial map: bits=24 maxEntries=16777215
21/04/24 20:05:57 INFO util.GSet: Computing capacity for map INodeMap
21/04/24 20:05:57 INFO util.GSet: VM type = 32-bit
21/04/24 20:05:57 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
21/04/24 20:05:57 INFO util.GSet: capacity = 2^21 = 2097152 entries
21/04/24 20:05:57 INFO namenode.FSDirectory: ACLs enabled? false
21/04/24 20:05:57 INFO namenode.FSDirectory: XAttrs enabled? true
21/04/24 20:05:57 INFO namenode.NameNode: Caching file names occurring more than 10 times
21/04/24 20:05:57 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: falseskipCaptureAccessTimeOnlyChange: false
21/04/24 20:05:57 INFO util.GSet: Computing capacity for map cachedBlocks
21/04/24 20:05:57 INFO util.GSet: VM type = 32-bit
21/04/24 20:05:57 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
21/04/24 20:05:57 INFO util.GSet: capacity = 2^19 = 524288 entries
21/04/24 20:05:57 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
21/04/24 20:05:57 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
21/04/24 20:05:57 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
21/04/24 20:05:57 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
21/04/24 20:05:57 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
21/04/24 20:05:57 INFO util.GSet: Computing capacity for map NameNodeRetryCache
21/04/24 20:05:57 INFO util.GSet: VM type = 32-bit
21/04/24 20:05:57 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
21/04/24 20:05:57 INFO util.GSet: capacity = 2^16 = 65536 entries
21/04/24 20:05:57 INFO namenode.FSImage: Allocated new BlockPoolId: BP-529316488-127.0.1.1-1619265957484
21/04/24 20:05:57 INFO common.Storage: Storage directory /usr/local/hadoop/tmp/dfs/name has been successfully formatted.
21/04/24 20:05:57 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
21/04/24 20:05:58 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 325 bytes saved in 0 seconds .
21/04/24 20:05:58 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
21/04/24 20:05:58 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
21/04/24 20:05:58 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at node1/127.0.1.1
************************************************************/
3. After formatting, start Hadoop
$ cd /usr/local/hadoop
$ ./sbin/start-dfs.sh
This failed with the following errors:
hadoop@node1:/usr/local/hadoop$ ./sbin/start-dfs.sh
Java HotSpot(TM) Client VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
21/04/24 20:18:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: ssh: connect to host localhost port 22: Connection refused
localhost: ssh: connect to host localhost port 22: Connection refused
Starting secondary namenodes [0.0.0.0]
0.0.0.0: ssh: connect to host 0.0.0.0 port 22: Connection refused
Java HotSpot(TM) Client VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
21/04/24 20:18:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
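The fix:
$ ssh localhost # this showed that openssh-server was not installed
$ sudo apt-get install openssh-server # install it
$ ps -e|grep ssh # after installing, check that the SSH server is running
Starting Hadoop again then produced another error: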
localhost: Error: JAVA_HOME is not set and could not be found.
The fix:
hadoop@node1:/usr/local/hadoop$ echo $JAVA_HOME
/usr/lib/jdk/jdk1.8.0
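JAVA_HOME is clearly set in the login shell. However, start-dfs.sh launches each daemon over SSH, and that non-interactive session does not necessarily inherit the variable, so JAVA_HOME has to be written as an absolute path in /usr/local/hadoop/etc/hadoop/hadoop-env.sh: change export JAVA_HOME=$JAVA_HOME to export JAVA_HOME=/usr/lib/jdk/jdk1.8.0.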
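The same edit can be scripted, which is convenient when it has to be repeated on several nodes. The following is a minimal sketch: set_java_home is a hypothetical name, and it assumes the JDK path contains no | or & characters, since those are special in the sed expression.

```shell
# set_java_home ENV_FILE JDK_PATH
# Rewrites the "export JAVA_HOME=..." line of hadoop-env.sh to an absolute
# path and keeps the original file as ENV_FILE.bak.
# Hypothetical helper for illustration only.
set_java_home() {
    local env_file="$1"
    local java_home="$2"
    sed -i.bak "s|^export JAVA_HOME=.*|export JAVA_HOME=${java_home}|" "$env_file"
}

# Usage:
#   set_java_home /usr/local/hadoop/etc/hadoop/hadoop-env.sh /usr/lib/jdk/jdk1.8.0
```

If the result is not what you expected, the .bak copy restores the previous state.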
Start Hadoop again (enter the password when prompted). This time there are no errors, only the following warning, which does not affect actual use:
21/04/24 20:37:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
A possible fix (use with caution): add the following environment variables to /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
Then apply the change:
source /etc/profile
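Also append the same settings to the end of hadoop-env.sh. This did not remove the warning in my environment. One likely reason is that the logs above report a 32-bit JVM, which cannot load the bundled native library if that library is a 64-bit build. Treat this only as a starting point and use it with caution!
Use the jps command to check whether Hadoop started successfully. If the DataNode, NameNode and SecondaryNameNode processes are listed, the startup succeeded, as shown below: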
hadoop@node1:/usr/local/hadoop$ jps
11203 DataNode
11367 SecondaryNameNode
11047 NameNode
11583 Jps
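The jps check described above can be automated so that a missing daemon is reported by name. This is a small sketch that reads jps output on standard input; the function name check_hdfs_daemons is an assumption for this example.

```shell
# check_hdfs_daemons
# Reads "PID Name" lines (the format jps prints) from standard input and
# reports any of the three HDFS daemons that is absent.
# Exit status is 0 when all three are present, 1 otherwise.
# Hypothetical helper for illustration only.
check_hdfs_daemons() {
    local listing
    local daemon
    local missing=0
    listing=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode; do
        # Compare whole process names so that "NameNode" does not
        # accidentally match "SecondaryNameNode".
        if ! printf '%s\n' "$listing" | awk '{ print $2 }' | grep -qx "$daemon"; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   jps | check_hdfs_daemons && echo "HDFS is up"
```

When a daemon is reported missing, its log file under /usr/local/hadoop/logs is the first place to look.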
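If problems remain, repeat the following commands and work through whatever the error logs report:
$ ./sbin/stop-dfs.sh # stop Hadoop
$ rm -r ./tmp # delete the tmp directory; note that this deletes all existing data in HDFS
$ ./bin/hdfs namenode -format # re-format the NameNode
$ ./sbin/start-dfs.sh # restart Hadoop
4. View HDFS information in a browser
Open the web interface at http://localhost:50070 to see information about Hadoop.
5. Run a pseudo-distributed Hadoop example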
$ cd /usr/local/hadoop
$ ./bin/hdfs dfs -mkdir -p /user/hadoop # create the user's home directory in HDFS
$ ./bin/hdfs dfs -mkdir input # create the hadoop user's input directory in HDFS
$ ./bin/hdfs dfs -put ./etc/hadoop/*.xml input # copy the local files into HDFS
$ ./bin/hdfs dfs -ls input # list the files
Found 8 items
-rw-r--r-- 1 hadoop supergroup 8814 2021-04-24 21:21 input/capacity-scheduler.xml
-rw-r--r-- 1 hadoop supergroup 1076 2021-04-24 21:21 input/core-site.xml
-rw-r--r-- 1 hadoop supergroup 10206 2021-04-24 21:21 input/hadoop-policy.xml
-rw-r--r-- 1 hadoop supergroup 1134 2021-04-24 21:21 input/hdfs-site.xml
-rw-r--r-- 1 hadoop supergroup 620 2021-04-24 21:21 input/httpfs-site.xml
-rw-r--r-- 1 hadoop supergroup 3518 2021-04-24 21:21 input/kms-acls.xml
-rw-r--r-- 1 hadoop supergroup 5939 2021-04-24 21:21 input/kms-site.xml
-rw-r--r-- 1 hadoop supergroup 690 2021-04-24 21:21 input/yarn-site.xml
$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
....
$ ./bin/hdfs dfs -cat output/* # view the result
1 dfsadmin
1 dfs.replication
1 dfs.namenode.name.dir
1 dfs.datanode.data.dir
Before running the job again, delete the output directory:
$ ./bin/hdfs dfs -rm -r output # delete the output directory
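For repeated runs it can be convenient to delete the directory only when it is actually there. The sketch below assumes the hdfs command is on PATH (otherwise substitute ./bin/hdfs); the function name remove_hdfs_dir_if_exists is invented for this example. It relies on hdfs dfs -test -d, which returns success when the path is an existing directory.

```shell
#!/usr/bin/env bash
# remove_hdfs_dir_if_exists (hypothetical helper): delete a directory in
# HDFS only when it exists. Assumes `hdfs` is on PATH.
remove_hdfs_dir_if_exists() {
  local dir="$1"
  if hdfs dfs -test -d "$dir"; then
    hdfs dfs -rm -r "$dir"
  else
    echo "skip: $dir does not exist in HDFS"
  fi
}

# Typical use before re-running the grep example:
#   remove_hdfs_dir_if_exists output
```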
6. Stop Hadoop
./sbin/stop-dfs.sh
On the next start there is no need to format the NameNode again (doing so causes errors); just run start-dfs.sh directly.
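One way to avoid reformatting by accident is to check for the metadata that a format leaves behind. A formatted NameNode directory contains a current/VERSION file. The sketch below assumes the name directory is /usr/local/hadoop/tmp/dfs/name, which is an assumption about your configuration; the function name namenode_is_formatted is hypothetical.

```shell
#!/usr/bin/env bash
# namenode_is_formatted (hypothetical helper): succeed when the given
# NameNode directory already holds a current/VERSION file.
# The default path is an assumption; pass your dfs.namenode.name.dir instead.
namenode_is_formatted() {
  local name_dir="${1:-/usr/local/hadoop/tmp/dfs/name}"
  [ -f "$name_dir/current/VERSION" ]
}

# Typical use from /usr/local/hadoop:
#   if namenode_is_formatted; then
#     echo "NameNode already formatted, skipping format"
#   else
#     ./bin/hdfs namenode -format
#   fi
```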
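VI. Fully distributed installation
Fully distributed mode: storage uses the distributed file system HDFS, and the NameNode and the DataNodes run on different machines.
To tell the nodes apart, change each node's hostname with sudo vim /etc/hostname. In this article the hostnames are node1, node2, and node3, where node1 is the master and node2 and node3 are slaves. Then run the following command to edit the IP mapping of the nodes: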
sudo vim /etc/hosts
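For example, the node names and IP addresses used in this tutorial are:
192.168.43.202 node1
192.168.43.106 node2
192.168.43.177 node3
Add these mappings to /etc/hosts, as shown below. Normally the file contains a single 127.0.0.1 entry, mapped to localhost; remove any extra ones, and in particular make sure there is no entry such as “127.0.0.1 node1”: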
127.0.0.1 localhost
192.168.43.202 node1
192.168.43.106 node2
192.168.43.177 node3
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
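The loopback rule is easy to violate without noticing, for instance when the installer has already written a 127.x.x.x line for the hostname. The sketch below scans a hosts file for cluster hostnames bound to a loopback address. The function name loopback_conflicts is made up for illustration, and treating every 127.x.x.x address (not only 127.0.0.1) as a conflict is an assumption of this example.

```shell
#!/usr/bin/env bash
# loopback_conflicts (hypothetical helper): print a line for every given
# hostname that a hosts file maps to a 127.x.x.x address.
loopback_conflicts() {
  local hosts_file="$1" host
  shift
  for host in "$@"; do
    awk -v h="$host" '
      /^[[:space:]]*#/ { next }            # ignore comment lines
      $1 ~ /^127\./ {
        for (i = 2; i <= NF; i++)
          if ($i == h) print h " is mapped to " $1
      }
    ' "$hosts_file"
  done
}

# Typical use:
#   loopback_conflicts /etc/hosts node1 node2 node3
```

No output means none of the listed hostnames is bound to a loopback address.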
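Note: if the virtual machines obtain their IP addresses dynamically, the addresses can change when the network reconnects, which makes the nodes unreachable over SSH. Setting static IP addresses for the virtual machines is therefore recommended.
The network configuration must be completed on every node. The steps above describe the master node; on each slave node, also edit /etc/hostname (set it to node2 or node3 in this article) and /etc/hosts (same content as on the master). Reboot afterwards; the new hostname only shows up in the terminal after a restart.
After configuration, run the following commands on every node to check that the nodes can ping each other. If they cannot, the later configuration will not succeed:
ping node1 -c 3 # ping only 3 times; otherwise press Ctrl+C to stop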
ping node2 -c 3
ping node3 -c 3
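With more than a few nodes, the same check is easier as a loop. This is only a sketch; the function name unreachable_hosts is invented here, and it prints the hosts that did not answer three pings.

```shell
#!/usr/bin/env bash
# unreachable_hosts (hypothetical helper): ping each given host three times
# and print the names of the hosts that did not respond.
unreachable_hosts() {
  local host
  for host in "$@"; do
    if ! ping -c 3 "$host" > /dev/null 2>&1; then
      printf '%s\n' "$host"
    fi
  done
}

# Typical use on each node:
#   unreachable_hosts node1 node2 node3
```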
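Following the pseudo-distributed setup above, once sections I to III (creating the hadoop user, installing Java, and configuring SSH login) are complete and Hadoop is installed, add the Hadoop installation directories to the PATH variable so that hadoop, hdfs, and other commands can be used from any directory. If this has not been configured yet, do it on the master node. First run sudo gedit /etc/profile and add the line: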
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
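Save the file, then run source /etc/profile to apply the change.
Cluster (distributed) mode requires changes to 5 configuration files in /usr/local/hadoop/etc/hadoop: slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. Only the settings required for a normal startup are set here; see the official documentation for more options.
1. The slaves file lists the hostnames that act as DataNodes, one per line. The default is localhost, which is why in the pseudo-distributed setup the node acts as both NameNode and DataNode. In a fully distributed setup you can keep localhost, or remove it so that the master node acts only as the NameNode. This tutorial uses the master node only as the NameNode, so delete the original localhost entry and add the list of slave nodes of the Hadoop cluster; slave nodes that are not in the list are not treated as compute nodes.
Edit the slaves file: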
vim /usr/local/hadoop/etc/hadoop/slaves
Add the following nodes (my slave nodes are node2 and node3):
node2
node3
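Every name in the slaves file must resolve through /etc/hosts, otherwise the master cannot reach that worker. The following sketch cross-checks the two files; the function name unresolved_workers is hypothetical, and it only looks at the hosts file, not at DNS.

```shell
#!/usr/bin/env bash
# unresolved_workers (hypothetical helper): print every hostname from a
# slaves file that has no entry in the given hosts file.
unresolved_workers() {
  local slaves_file="$1" hosts_file="$2" worker
  while IFS= read -r worker || [ -n "$worker" ]; do
    [ -z "$worker" ] && continue           # skip blank lines
    if ! awk -v h="$worker" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }' "$hosts_file"; then
      printf '%s\n' "$worker"
    fi
  done < "$slaves_file"
}

# Typical use:
#   unresolved_workers /usr/local/hadoop/etc/hadoop/slaves /etc/hosts
```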
2. core-site.xml is located in /usr/local/hadoop/etc/hadoop. Edit it with vi and change it to the following configuration:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
</configuration>
Note that node1 in <value>hdfs://node1:9000</value> is the hostname of my master node; change it to match your environment.
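Since only the master hostname changes between environments, the file can also be generated. The sketch below prints the same two properties for a given hostname; the function name render_core_site is invented, and port 9000 and the default tmp path are simply the values used in this article.

```shell
#!/usr/bin/env bash
# render_core_site (hypothetical helper): print a core-site.xml for the
# given master hostname. Port 9000 and the tmp path mirror this article.
render_core_site() {
  local master_host="$1"
  local tmp_dir="${2:-/usr/local/hadoop/tmp}"
  cat <<EOF
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://${master_host}:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:${tmp_dir}</value>
  </property>
</configuration>
EOF
}

# Typical use (review the output before overwriting the real file):
#   render_core_site node1 > /tmp/core-site.xml
```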
3. hdfs-site.xml: dfs.replication is usually set to 3, but there are only 2 slave nodes here, so set dfs.replication to 2:
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node1:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/tmp/dfs/data</value>
</property>
</configuration>
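The rule applied to dfs.replication here (use 3, but never more than the number of DataNodes) can be written down as a small calculation. This is a sketch under that assumption; the function name suggest_replication is hypothetical, and it counts the non-blank, non-comment lines of the slaves file as DataNodes.

```shell
#!/usr/bin/env bash
# suggest_replication (hypothetical helper): print min(3, number of workers),
# with a floor of 1, based on the entries in a slaves file.
suggest_replication() {
  local slaves_file="$1" count
  # Count lines that are neither blank nor comments.
  count="$(awk 'NF && $1 !~ /^#/ { n++ } END { print n + 0 }' "$slaves_file")"
  if [ "$count" -gt 3 ]; then
    echo 3
  elif [ "$count" -lt 1 ]; then
    echo 1
  else
    echo "$count"
  fi
}

# Typical use:
#   suggest_replication /usr/local/hadoop/etc/hadoop/slaves
```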
4. mapred-site.xml: the file may need to be created first, because by default only mapred-site.xml.template exists.
Copy the template to the new name:
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
Then configure it as follows:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node1:19888</value>
</property>
</configuration>
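If this step is repeated, a plain cp would overwrite an already edited mapred-site.xml with the empty template. The sketch below copies the template only when the target is missing; the function name ensure_mapred_site is made up for illustration, and the default directory is the one used in this article.

```shell
#!/usr/bin/env bash
# ensure_mapred_site (hypothetical helper): create mapred-site.xml from the
# template only if it does not exist yet, so edits are never overwritten.
ensure_mapred_site() {
  local conf_dir="${1:-/usr/local/hadoop/etc/hadoop}"
  if [ -f "$conf_dir/mapred-site.xml" ]; then
    echo "mapred-site.xml already exists, leaving it untouched"
  elif [ -f "$conf_dir/mapred-site.xml.template" ]; then
    cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    echo "created mapred-site.xml from template"
  else
    echo "no mapred-site.xml.template found in $conf_dir" >&2
    return 1
  fi
}

# Typical use:
#   ensure_mapred_site
```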
5. yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>node1:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>node1:18030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>node1:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>node1:18141</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>node1:18088</value>
</property>
</configuration>
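6. Once configured, copy the /usr/local/hadoop directory from the master to each node. Because pseudo-distributed mode was run earlier, it is best to delete the old temporary files before switching to cluster mode. On the master node, run:
cd /usr/local
sudo rm -r ./hadoop/tmp # delete Hadoop temporary files
sudo rm -r ./hadoop/logs/* # delete log files
tar -zcf ~/hadoop.master.tar.gz ./hadoop # compress first, then copy
cd ~
scp ./hadoop.master.tar.gz hadoop@node2:/home/hadoop # copy to slave node node2
scp ./hadoop.master.tar.gz hadoop@node3:/home/hadoop # copy to slave node node3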
On the slave nodes (node2 and node3), run:
sudo rm -r /usr/local/hadoop # remove the old copy (if it exists)
sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
sudo chown -R hadoop /usr/local/hadoop # make hadoop the owner
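When the cluster grows, the copy step on the master can be driven by the slaves file instead of one scp line per node. This is a sketch, not the procedure used above; the function name distribute_archive is hypothetical, and the remote user hadoop and target directory /home/hadoop are taken from this article.

```shell
#!/usr/bin/env bash
# distribute_archive (hypothetical helper): scp an archive to the home
# directory of the hadoop user on every host listed in a slaves file.
distribute_archive() {
  local archive="$1" slaves_file="$2" worker failed=0
  while IFS= read -r worker || [ -n "$worker" ]; do
    [ -z "$worker" ] && continue           # skip blank lines
    # Redirect stdin so the copy cannot consume the rest of the slaves file.
    if ! scp "$archive" "hadoop@$worker:/home/hadoop" < /dev/null; then
      echo "copy to $worker failed" >&2
      failed=1
    fi
  done < "$slaves_file"
  return "$failed"
}

# Typical use on the master:
#   distribute_archive ~/hadoop.master.tar.gz /usr/local/hadoop/etc/hadoop/slaves
```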
7. Before the first start, format the NameNode on the master node:
hdfs namenode -format # required only for the first run, not afterwards
Hadoop can now be started. Start it from the master node:
cd /usr/local/hadoop/sbin
./start-dfs.sh
./start-yarn.sh
./mr-jobhistory-daemon.sh start historyserver
Run jps on each node to see which processes were started. If everything is correct, the master node shows the NameNode, ResourceManager, SecondaryNameNode, and JobHistoryServer processes:
hadoop@node1:/usr/local/hadoop$ jps
3072 Jps
2340 NameNode
2984 JobHistoryServer
2525 SecondaryNameNode
2702 ResourceManager
hadoop@node1:/usr/local/hadoop$
The slave nodes show the DataNode and NodeManager processes:
hadoop@node2:/usr/local/hadoop$ jps
2544 Jps
2273 DataNode
2376 NodeManager
hadoop@node3:/usr/local/hadoop$ jps
2172 DataNode
2271 NodeManager
2463 Jps
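A missing process indicates an error. In addition, run hdfs dfsadmin -report on the master node to check that the DataNodes started correctly. If Live datanodes is not 0, the cluster has started successfully. For example, my cluster has 2 DataNodes: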
hadoop@node1:/usr/local/hadoop$ hdfs dfsadmin -report
Java HotSpot(TM) Client VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
21/04/26 20:03:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 18850250752 (17.56 GB)
Present Capacity: 4380217344 (4.08 GB)
DFS Remaining: 4380168192 (4.08 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Pending deletion blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 192.168.43.106:50010 (node2)
Hostname: node2
Decommission Status : Normal
Configured Capacity: 9425125376 (8.78 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6671843328 (6.21 GB)
DFS Remaining: 2250887168 (2.10 GB)
DFS Used%: 0.00%
DFS Remaining%: 23.88%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Apr 26 20:03:19 CST 2021
Last Block Report: Mon Apr 26 19:58:04 CST 2021
Name: 192.168.43.177:50010 (node3)
Hostname: node3
Decommission Status : Normal
Configured Capacity: 9425125376 (8.78 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6793449472 (6.33 GB)
DFS Remaining: 2129281024 (1.98 GB)
DFS Used%: 0.00%
DFS Remaining%: 22.59%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Apr 26 20:03:20 CST 2021
Last Block Report: Mon Apr 26 19:58:05 CST 2021
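In a script, the live DataNode count can be pulled out of the report instead of being read by eye. The sketch below assumes the report contains a line of the form "Live datanodes (N):" as in the output above; the function name live_datanode_count is invented for this example.

```shell
#!/usr/bin/env bash
# live_datanode_count (hypothetical helper): read `hdfs dfsadmin -report`
# output on stdin and print the number from the "Live datanodes (N):" line.
# Prints nothing if no such line is present.
live_datanode_count() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p' | head -n 1
}

# Typical use on the master (warnings on stderr are discarded):
#   count="$(hdfs dfsadmin -report 2> /dev/null | live_datanode_count)"
#   [ "${count:-0}" -eq 2 ] || echo "expected 2 live DataNodes, got ${count:-0}"
```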
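You can also check the status of the NameNode and DataNodes on the web page http://node1:50070/. If startup fails, check the startup logs to find the cause.
8. Run a distributed example
Running a distributed example works the same way as in pseudo-distributed mode. First create the user directory in HDFS: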
hdfs dfs -mkdir -p /user/hadoop
Copy the configuration files in /usr/local/hadoop/etc/hadoop into the distributed file system as input:
hdfs dfs -mkdir input
hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input
Checking the DataNode status again (the used space changes) confirms that the input files have been copied to the DataNodes.
Now the MapReduce job can be run:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
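The output while running resembles the pseudo-distributed case and shows the job progress. It may be slow, but if there is no progress for a long time, say 5 minutes, try restarting Hadoop. If that does not help, the likely cause is insufficient memory; increase the virtual machine's memory or adjust YARN's memory settings.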
...
21/04/26 20:24:33 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.43.202:18040
21/04/26 20:24:35 INFO input.FileInputFormat: Total input files to process : 1
21/04/26 20:24:36 INFO mapreduce.JobSubmitter: number of splits:1
21/04/26 20:24:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1619438292942_0002
21/04/26 20:24:37 INFO impl.YarnClientImpl: Submitted application application_1619438292942_0002
21/04/26 20:24:37 INFO mapreduce.Job: The url to track the job: http://node1:18088/proxy/application_1619438292942_0002/
21/04/26 20:24:37 INFO mapreduce.Job: Running job: job_1619438292942_0002
21/04/26 20:24:57 INFO mapreduce.Job: Job job_1619438292942_0002 running in uber mode : false
21/04/26 20:24:57 INFO mapreduce.Job: map 0% reduce 0%
21/04/26 20:25:06 INFO mapreduce.Job: map 100% reduce 0%
21/04/26 20:25:16 INFO mapreduce.Job: map 100% reduce 100%
21/04/26 20:25:17 INFO mapreduce.Job: Job job_1619438292942_0002 completed successfully
21/04/26 20:25:17 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=153
FILE: Number of bytes written=416619
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=390
HDFS: Number of bytes written=107
HDFS: Number of read operations=7
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=6535
Total time spent by all reduces in occupied slots (ms)=5743
Total time spent by all map tasks (ms)=6535
Total time spent by all reduce tasks (ms)=5743
Total vcore-milliseconds taken by all map tasks=6535
Total vcore-milliseconds taken by all reduce tasks=5743
Total megabyte-milliseconds taken by all map tasks=6691840
Total megabyte-milliseconds taken by all reduce tasks=5880832
Map-Reduce Framework
Map input records=5
Map output records=5
Map output bytes=137
Map output materialized bytes=153
Input split bytes=127
Combine input records=0
Combine output records=0
Reduce input groups=1
Reduce shuffle bytes=153
Reduce input records=5
Reduce output records=5
Spilled Records=10
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=274
CPU time spent (ms)=1290
Physical memory (bytes) snapshot=254038016
Virtual memory (bytes) snapshot=729309184
Total committed heap usage (bytes)=137498624
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=263
File Output Format Counters
Bytes Written=107
Output after the job completes:
hadoop@node1:/usr/local/hadoop$ ./bin/hdfs dfs -cat output/*
Java HotSpot(TM) Client VM warning: You have loaded library /usr/local/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
21/04/26 20:27:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1 dfsadmin
1 dfs.replication
1 dfs.namenode.secondary.http
1 dfs.namenode.name.dir
1 dfs.datanode.data.dir
Shutting down the Hadoop cluster is also done on the master node:
cd /usr/local/hadoop/sbin
./stop-yarn.sh
./stop-dfs.sh
./mr-jobhistory-daemon.sh stop historyserver