The previous post covered building a highly available (HA) hadoop cluster, and I trust you have that material down by now. In this post I will continue by showing how to set up HA federation. Admittedly, many companies never reach a data volume that actually calls for federation, so plenty of them never use it; large game companies and BAT-scale companies do use it, however, so to give you a bit more confidence in interviews I will briefly walk through the federation setup process.

    1. Overview

           Federation is essentially the earlier HA setup with a few extra pairs of namenodes, so this post simply modifies the configuration of the HA cluster built last time. If you are not yet comfortable with that material, please start with the previous post --> 大数据教程(11.3)hadoop集群HA高可用搭建

    2. Cluster Planning

         

Hostname                      IP                Installed software        Running processes
            centos-aaron-ha-01    192.168.29.149    jdk, hadoop               NameNode, DFSZKFailoverController(zkfc)
            centos-aaron-ha-02    192.168.29.150    jdk, hadoop               NameNode, DFSZKFailoverController(zkfc)
            centos-aaron-ha-03    192.168.29.151    jdk, hadoop               NameNode, DFSZKFailoverController(zkfc), ResourceManager
            centos-aaron-ha-04    192.168.29.152    jdk, hadoop               NameNode, DFSZKFailoverController(zkfc), ResourceManager
            centos-aaron-ha-05    192.168.29.153    jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
            centos-aaron-ha-06    192.168.29.154    jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain
            centos-aaron-ha-07    192.168.29.155    jdk, hadoop, zookeeper    DataNode, NodeManager, JournalNode, QuorumPeerMain

    3. Cluster Setup

          (1) Modify core-site.xml

<configuration>
<!-- Set the default file system to the client-side ViewFs mount table -->
<property>
<name>fs.defaultFS</name>
<value>viewfs:///</value>
</property>
<property>
<name>fs.viewfs.mounttable.default.link./bi</name>
<value>hdfs://bi/</value>
</property>
<property>
<name>fs.viewfs.mounttable.default.link./dt</name>
<value>hdfs://dt/</value>
</property>
<!-- Mount a /tmp directory; many components that depend on hdfs may use it -->
<property>
<name>fs.viewfs.mounttable.default.link./tmp</name>
<value>hdfs://bi/tmp</value>
</property>
<!-- Hadoop temporary (working) directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hdpdata</value>
</property>
<!-- ZooKeeper ensemble addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>centos-aaron-ha-05:2181,centos-aaron-ha-06:2181,centos-aaron-ha-07:2181</value>
</property>
</configuration>
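
Once the cluster is up (section 4), the mount table above can be sanity-checked from any client. The following is a small sketch of how ViewFs routes paths; it assumes this configuration has been distributed to the machine you run it on:

# The ViewFs root shows the configured mount points, i.e. /bi, /dt and /tmp
hadoop fs -ls viewfs:///
# A path under /bi is transparently routed to the bi nameservice...
hadoop fs -ls /bi
# ...which is equivalent to addressing that nameservice directly
hadoop fs -ls hdfs://bi/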

          (2) Modify hdfs-site.xml

<configuration>
<!-- The federated nameservices are bi and dt; they must match the mount table targets in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>bi,dt</value>
</property>
<!-- Nameservice bi has two NameNodes, nn1 and nn2 -->
<property>
<name>dfs.ha.namenodes.bi</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.ha.namenodes.dt</name>
<value>nn3,nn4</value>
</property>
<!-- RPC address of nn1 -->
<property>
<name>dfs.namenode.rpc-address.bi.nn1</name>
<value>centos-aaron-ha-01:9000</value>
</property>
<!-- HTTP address of nn1 -->
<property>
<name>dfs.namenode.http-address.bi.nn1</name>
<value>centos-aaron-ha-01:50070</value>
</property>
<!-- RPC address of nn2 -->
<property>
<name>dfs.namenode.rpc-address.bi.nn2</name>
<value>centos-aaron-ha-02:9000</value>
</property>
<!-- HTTP address of nn2 -->
<property>
<name>dfs.namenode.http-address.bi.nn2</name>
<value>centos-aaron-ha-02:50070</value>
</property>

<!-- Nameservice dt: RPC address of nn3 -->
<property>
<name>dfs.namenode.rpc-address.dt.nn3</name>
<value>centos-aaron-ha-03:9000</value>
</property>
<!-- HTTP address of nn3 -->
<property>
<name>dfs.namenode.http-address.dt.nn3</name>
<value>centos-aaron-ha-03:50070</value>
</property>
<!-- RPC address of nn4 -->
<property>
<name>dfs.namenode.rpc-address.dt.nn4</name>
<value>centos-aaron-ha-04:9000</value>
</property>
<!-- HTTP address of nn4 -->
<property>
<name>dfs.namenode.http-address.dt.nn4</name>
<value>centos-aaron-ha-04:50070</value>
</property>

<!-- JournalNode URI where the NameNode edits are stored (bi namespace; keep this on the bi NameNodes) -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://centos-aaron-ha-05:8485;centos-aaron-ha-06:8485;centos-aaron-ha-07:8485/bi</value>
</property>

<!-- On the two NameNodes of the dt namespace, use the following configuration instead -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://centos-aaron-ha-05:8485;centos-aaron-ha-06:8485;centos-aaron-ha-07:8485/dt</value>
</property>

<!-- Local directory where the JournalNodes store their data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/journaldata</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Failover proxy provider used by clients to locate the active NameNode of each nameservice -->
<property>
<name>dfs.client.failover.proxy.provider.bi</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.dt</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; multiple methods are separated by newlines, one method per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- The sshfence mechanism requires passwordless ssh between the NameNodes -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<!-- Timeout for the sshfence ssh connection (ms) -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
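
With both nameservices declared, the usual HA admin commands take an extra -ns flag to select the nameservice. A quick sketch, to be run after the cluster has been started in section 4:

# Check which NameNode is active and which is standby in each nameservice
hdfs haadmin -ns bi -getServiceState nn1
hdfs haadmin -ns bi -getServiceState nn2
hdfs haadmin -ns dt -getServiceState nn3
hdfs haadmin -ns dt -getServiceState nn4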

          (3) Distribute the two files hdfs-site.xml and core-site.xml to every machine in the cluster

sudo scp -r /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/hdfs-site.xml  hadoop@centos-aaron-ha-02:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/hdfs-site.xml
sudo scp -r /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/core-site.xml  hadoop@centos-aaron-ha-02:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/core-site.xml

sudo scp -r /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/hdfs-site.xml  hadoop@centos-aaron-ha-03:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/hdfs-site.xml
sudo scp -r /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/core-site.xml  hadoop@centos-aaron-ha-03:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/core-site.xml

sudo scp -r /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/hdfs-site.xml  hadoop@centos-aaron-ha-04:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/hdfs-site.xml
sudo scp -r /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/core-site.xml  hadoop@centos-aaron-ha-04:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/core-site.xml

sudo scp -r /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/hdfs-site.xml  hadoop@centos-aaron-ha-05:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/hdfs-site.xml
sudo scp -r /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/core-site.xml  hadoop@centos-aaron-ha-05:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/core-site.xml

sudo scp -r /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/hdfs-site.xml  hadoop@centos-aaron-ha-06:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/hdfs-site.xml
sudo scp -r /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/core-site.xml  hadoop@centos-aaron-ha-06:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/core-site.xml

sudo scp -r /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/hdfs-site.xml  hadoop@centos-aaron-ha-07:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/hdfs-site.xml
sudo scp -r /home/hadoop/apps/hadoop-2.9.1/etc/hadoop/core-site.xml  hadoop@centos-aaron-ha-07:/home/hadoop/apps/hadoop-2.9.1/etc/hadoop/core-site.xml
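
If you prefer not to repeat scp for every host, the same distribution can be done with a small loop. This is just a convenience sketch; it assumes the hadoop user can ssh to every node, in which case sudo is not needed because the target files are already owned by hadoop:

CONF_DIR=/home/hadoop/apps/hadoop-2.9.1/etc/hadoop
for host in centos-aaron-ha-02 centos-aaron-ha-03 centos-aaron-ha-04 \
            centos-aaron-ha-05 centos-aaron-ha-06 centos-aaron-ha-07; do
    # copy the two modified files to the same location on each node
    scp $CONF_DIR/core-site.xml $CONF_DIR/hdfs-site.xml hadoop@$host:$CONF_DIR/
done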

          (4) On the two namenode nodes centos-aaron-ha-01 and centos-aaron-ha-02, delete the following part of hdfs-site.xml

<!-- On the two NameNodes of the dt namespace, use the following configuration instead -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://centos-aaron-ha-05:8485;centos-aaron-ha-06:8485;centos-aaron-ha-07:8485/dt</value>
</property>

          (5) On the two namenode nodes centos-aaron-ha-03 and centos-aaron-ha-04, delete the following part of hdfs-site.xml

<!-- JournalNode URI where the NameNode edits are stored (bi namespace; keep this on the bi NameNodes) -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://centos-aaron-ha-05:8485;centos-aaron-ha-06:8485;centos-aaron-ha-07:8485/bi</value>
</property>
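
To confirm that each NameNode sees only one journal URI after steps (4) and (5), you can print the effective value on every NameNode host; a quick check, assuming the edited files are in place:

# Should end with /bi on centos-aaron-ha-01/02 and with /dt on centos-aaron-ha-03/04
hdfs getconf -confKey dfs.namenode.shared.edits.dir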

          (6) Clean up the data left over from the previous HA cluster (namenode, datanode, journalnode)

# Clean out the working directories of the namenodes, datanodes and journalnodes
rm -rf /home/hadoop/hdpdata/
rm -rf /home/hadoop/journaldata/
# Clean out the old cluster information stored in zookeeper
rm -rf /home/hadoop/apps/zookeeper-3.4.13/data/version-2/
rm -rf /home/hadoop/apps/zookeeper-3.4.13/data/zookeeper_server.pid
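
The cleanup has to happen on every node, so it can also be driven from one shell over ssh. A sketch assuming the same paths as above; the zookeeper part only applies to centos-aaron-ha-05/06/07:

# Remove the old HA cluster's namenode/datanode/journalnode working directories on all nodes
for host in centos-aaron-ha-01 centos-aaron-ha-02 centos-aaron-ha-03 centos-aaron-ha-04 \
            centos-aaron-ha-05 centos-aaron-ha-06 centos-aaron-ha-07; do
    ssh hadoop@$host "rm -rf /home/hadoop/hdpdata/ /home/hadoop/journaldata/"
done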

          (7) Note: passwordless ssh must be configured between the two namenodes of each pair; do not forget to set up passwordless login from centos-aaron-ha-04 to centos-aaron-ha-03

# Generate a key pair on centos-aaron-ha-04 and copy it to centos-aaron-ha-03
ssh-keygen -t rsa
ssh-copy-id centos-aaron-ha-03
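
A quick way to confirm the trust works is to run a remote command and check that no password prompt appears; and since fencing can be triggered from either NameNode of a pair, it does no harm to set up the reverse direction too:

# Should print the remote hostname without asking for a password
ssh centos-aaron-ha-03 hostname
# Optional but harmless: on centos-aaron-ha-03, repeat ssh-keygen/ssh-copy-id towards centos-aaron-ha-04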

    4. Cluster Initialization and Startup

          (1) Start the zookeeper cluster (start zk on each of centos-aaron-ha-05, centos-aaron-ha-06 and centos-aaron-ha-07)

cd /home/hadoop/apps/zookeeper-3.4.13/bin/
./zkServer.sh start
# Check the status: there should be one leader and two followers
./zkServer.sh status
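
The three zk nodes can also be started from a single shell; a convenience sketch, assuming JAVA_HOME is available to non-interactive ssh sessions on each node:

for host in centos-aaron-ha-05 centos-aaron-ha-06 centos-aaron-ha-07; do
    ssh hadoop@$host "/home/hadoop/apps/zookeeper-3.4.13/bin/zkServer.sh start"
done
# once all three are up, check the roles (expect one leader, two followers)
for host in centos-aaron-ha-05 centos-aaron-ha-06 centos-aaron-ha-07; do
    ssh hadoop@$host "/home/hadoop/apps/zookeeper-3.4.13/bin/zkServer.sh status"
done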

          (2) Start the journalnodes (run on each of centos-aaron-ha-05, centos-aaron-ha-06 and centos-aaron-ha-07)

cd /home/hadoop/apps/hadoop-2.9.1/
hadoop-daemon.sh start journalnode
# Run jps to verify: a JournalNode process should now be running on centos-aaron-ha-05, centos-aaron-ha-06 and centos-aaron-ha-07
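
To check all three nodes without logging into each one, jps can also be run over ssh; a small sketch, assuming the JDK's bin directory is on the PATH of non-interactive shells on those nodes:

for host in centos-aaron-ha-05 centos-aaron-ha-06 centos-aaron-ha-07; do
    echo "== $host =="
    ssh hadoop@$host jps
done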

          (3) Initialize the namenode for nn1 of the bi nameservice

# Run on centos-aaron-ha-01:
hdfs namenode -format -clusterid hdp2019
# Formatting generates metadata under the directory configured by hadoop.tmp.dir in core-site.xml (here /home/hadoop/hdpdata/); copy /home/hadoop/hdpdata/ to /home/hadoop/ on centos-aaron-ha-02.
scp -r hdpdata/ centos-aaron-ha-02:/home/hadoop/
## Alternative (recommended): run hdfs namenode -bootstrapStandby on centos-aaron-ha-02 [note: this requires the namenode on centos-aaron-ha-01 to be started first: hadoop-daemon.sh start namenode]
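
If you take the recommended -bootstrapStandby route instead of copying hdpdata by hand, the sequence is roughly as follows (a sketch; the first command runs on centos-aaron-ha-01, the second on centos-aaron-ha-02):

# on centos-aaron-ha-01: start the freshly formatted namenode so the standby can pull from it
hadoop-daemon.sh start namenode
# on centos-aaron-ha-02: copy the namespace metadata from the running namenode
hdfs namenode -bootstrapStandby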

          (4) Initialize the namenode for nn3 of the dt nameservice

# Run on centos-aaron-ha-03:
hdfs namenode -format -clusterid hdp2019
# Formatting generates metadata under the directory configured by hadoop.tmp.dir in core-site.xml (here /home/hadoop/hdpdata/); copy /home/hadoop/hdpdata/ to /home/hadoop/ on centos-aaron-ha-04.
scp -r hdpdata/ centos-aaron-ha-04:/home/hadoop/
## Alternative (recommended): run hdfs namenode -bootstrapStandby on centos-aaron-ha-04, following the same two-step sequence sketched above [note: this requires the namenode on centos-aaron-ha-03 to be started first: hadoop-daemon.sh start namenode]

          (5) Format ZKFC (running it once on centos-aaron-ha-01 and once on centos-aaron-ha-03 is enough)

hdfs zkfc -formatZK
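
After formatting, zookeeper should contain one failover znode per nameservice under the default parent znode /hadoop-ha; a quick check from any node, sketched with the zkCli shipped with zookeeper:

# expect the output to list [bi, dt]
/home/hadoop/apps/zookeeper-3.4.13/bin/zkCli.sh -server centos-aaron-ha-05:2181 ls /hadoop-ha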

          (6) On nn1 of bi (centos-aaron-ha-01), start HDFS

start-dfs.sh

          (7) Start yarn on the host where resourcemanager is configured (centos-aaron-ha-03)

start-yarn.sh

          (8) Start the second resourcemanager on centos-aaron-ha-04

yarn-daemon.sh start resourcemanager
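
To confirm that ResourceManager HA is working, the RM states can be queried as well. This sketch assumes the ResourceManager ids rm1 and rm2 from the yarn-site.xml of the previous HA post (adjust them if yours differ):

# one should report active and the other standby
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2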

    5. Cluster Verification

          (1) NameNode verification: open the following in a browser

http://centos-aaron-ha-01:50070
NameNode 'centos-aaron-ha-01:9000' (active)
http://centos-aaron-ha-02:50070
NameNode 'centos-aaron-ha-02:9000' (standby)
http://centos-aaron-ha-03:50070
NameNode 'centos-aaron-ha-03:9000' (active)
http://centos-aaron-ha-04:50070
NameNode 'centos-aaron-ha-04:9000' (standby)

          (2) Verify HDFS HA federation

# List the root of the federated file system; the two mount points /bi and /dt should be visible
hadoop fs -ls /
# Create a directory
hadoop fs -mkdir -p /bi/wd
# Then upload a test file (ad.txt) to hdfs
hadoop fs -put ad.txt /bi/wd
hadoop fs -ls /bi/wd
Then kill the active NameNode (here the one on centos-aaron-ha-01):
kill -9 <pid of NN>
Open http://centos-aaron-ha-02:50070 in a browser:
NameNode 'centos-aaron-ha-02:9000' (active)
The NameNode on centos-aaron-ha-02 has now become active.
Run the command again:
hadoop fs -ls /bi/wd
Found 1 items
-rw-r--r--   3 hadoop supergroup         44 2019-01-11 18:20 /bi/wd/ad.txt
The file uploaded earlier is still there!!!
Manually start the NameNode that was killed:
hadoop-daemon.sh start namenode
Open http://centos-aaron-ha-01:50070 in a browser:
NameNode 'centos-aaron-ha-01:9000' (standby)

** The same test applies to the other NameNode pair (the dt nameservice); a short sketch follows.
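
As mentioned above, the dt nameservice can be exercised the same way through its own mount point; a minimal sketch (ad.txt is the same local test file used above):

hadoop fs -mkdir -p /dt/wd
hadoop fs -put ad.txt /dt/wd
hadoop fs -ls /dt/wd
# then kill the active NameNode of the dt pair (centos-aaron-ha-03/04) and repeat the check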

          (3) Verify YARN: run the WordCount program from the demo examples that ship with Hadoop:

hadoop jar /home/hadoop/apps/hadoop-2.9.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar wordcount viewfs:///bi/wd viewfs:///bi/wdout

    6. Results


[hadoop@centos-aaron-ha-01 ~]$ hadoop jar /home/hadoop/apps/hadoop-2.9.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar wordcount viewfs:///bi/wd viewfs:///bi/wdout
19/01/11 19:02:37 INFO input.FileInputFormat: Total input files to process : 1
19/01/11 19:02:37 INFO mapreduce.JobSubmitter: number of splits:1
19/01/11 19:02:37 INFO Configuration.deprecation: yarn.resourcemanager.zk-address is deprecated. Instead, use hadoop.zk.address
19/01/11 19:02:37 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
19/01/11 19:02:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1547204450523_0001
19/01/11 19:02:38 INFO impl.YarnClientImpl: Submitted application application_1547204450523_0001
19/01/11 19:02:38 INFO mapreduce.Job: The url to track the job: http://centos-aaron-ha-03:8088/proxy/application_1547204450523_0001/
19/01/11 19:02:38 INFO mapreduce.Job: Running job: job_1547204450523_0001
19/01/11 19:02:51 INFO mapreduce.Job: Job job_1547204450523_0001 running in uber mode : false
19/01/11 19:02:51 INFO mapreduce.Job:  map 0% reduce 0%
19/01/11 19:03:03 INFO mapreduce.Job:  map 100% reduce 0%
19/01/11 19:03:16 INFO mapreduce.Job:  map 100% reduce 100%
19/01/11 19:03:16 INFO mapreduce.Job: Job job_1547204450523_0001 completed successfully
19/01/11 19:03:16 INFO mapreduce.Job: Counters: 54
        File System Counters
                FILE: Number of bytes read=98
                FILE: Number of bytes written=405013
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=129
                HDFS: Number of bytes written=60
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
                VIEWFS: Number of bytes read=0
                VIEWFS: Number of bytes written=0
                VIEWFS: Number of read operations=0
                VIEWFS: Number of large read operations=0
                VIEWFS: Number of write operations=0
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=9134
                Total time spent by all reduces in occupied slots (ms)=8261
                Total time spent by all map tasks (ms)=9134
                Total time spent by all reduce tasks (ms)=8261
                Total vcore-milliseconds taken by all map tasks=9134
                Total vcore-milliseconds taken by all reduce tasks=8261
                Total megabyte-milliseconds taken by all map tasks=9353216
                Total megabyte-milliseconds taken by all reduce tasks=8459264
        Map-Reduce Framework
                Map input records=5
                Map output records=8
                Map output bytes=76
                Map output materialized bytes=98
                Input split bytes=85
                Combine input records=8
                Combine output records=8
                Reduce input groups=8
                Reduce shuffle bytes=98
                Reduce input records=8
                Reduce output records=8
                Spilled Records=16
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=595
                CPU time spent (ms)=3070
                Physical memory (bytes) snapshot=351039488
                Virtual memory (bytes) snapshot=4140134400
                Total committed heap usage (bytes)=139972608
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=0
[hadoop@centos-aaron-ha-01 ~]$ 
[hadoop@centos-aaron-ha-01 ~]$ hdfs dfs -ls  viewfs:///bi/wdout
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2019-01-11 19:03 viewfs:///bi/wdout/_SUCCESS
-rw-r--r--   3 hadoop supergroup         60 2019-01-11 19:03 viewfs:///bi/wdout/part-r-00000
[hadoop@centos-aaron-ha-01 ~]$ hdfs dfs -cat  viewfs:///bi/wdout/part-r-00000
ddfsZZ  1
df      1
dsfsd   1
hello   1
sdfdsf  1
sdfsd   1
sdss    1
xxx     1
[hadoop@centos-aaron-ha-01 ~]$

    7. Summary

           I ran into a few problems while building this federated cluster. For example: (1) running a MapReduce job failed with an error about a missing temp directory, which in my case was caused by leaving the /tmp mount entry out of core-site.xml; (2) after initializing HDFS, the cluster IDs shown in the web UIs did not match, because the clusterid argument was mistyped (clustId instead of clustid) when formatting the namenodes. In short, when something goes wrong, work through the logs and the cause will turn up.

           Finally, that is all for this post. If you think it was worthwhile, please give it a like; if you are interested in my other posts on servers and big data, or in the author himself, please follow this blog, and feel free to get in touch at any time.