Installing Avatar Hadoop is a process that grinds down your spirit; I am recording the pain here for posterity:

1. Basic configuration: hosts, firewall, passwordless SSH;

2. Floating IP configuration:

   Install the ucarp-1.5.2-1.el6.rf.x86_64.rpm package;

   Copy ucarp.sh, vip-down.sh, and vip-up.sh into /etc on both the master and the backup machine, and make them executable (see the sketch after the three scripts):


ucarp.sh

#!/bin/sh
# Keep the floating IP alive between master and backup (VRRP-style).
# --srcip is this machine's fixed IP; --addr is the floating IP.
ucarp --interface=eth0 --srcip=192.168.1.1 --vhid=24 --pass=mypassword \
--addr=192.168.1.204 \
--upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh


vip-down.sh

#!/bin/sh
# Drop the floating IP when this node loses mastership.
/sbin/ip addr del 192.168.1.204/24 dev eth0


vip-up.sh

#!/bin/sh
# Bind the floating IP, then, if the local AvatarNode is running as
# Standby, checkpoint the namespace and promote it to primary.
/sbin/ip addr add 192.168.1.204/24 dev eth0
AvatarNode=$(/xxx/jdk/bin/jps | grep "AvatarNode")
if [ -n "$AvatarNode" ]; then
    Standby=$(/xxx/hadoop/bin/hadoop org.apache.hadoop.hdfs.AvatarShell -showAvatar | grep "Standby")
    if [ -n "$Standby" ]; then
        /xxx/hadoop/bin/hadoop dfsadmin -saveNamespace
        /xxx/hadoop/bin/hadoop org.apache.hadoop.hdfs.AvatarShell -setAvatar primary
    fi
fi
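   The copy-and-chmod step mentioned above is simply this (a minimal sketch; run it on both machines from wherever the scripts live):

cp ucarp.sh vip-down.sh vip-up.sh /etc/
chmod +x /etc/ucarp.sh /etc/vip-down.sh /etc/vip-up.sh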


   Change the first IP address in ucarp.sh to the machine's own fixed IP and the second to the floating IP; change the IP addresses in vip-down.sh and vip-up.sh to the floating IP;

   Run ucarp.sh on the master and the backup; whichever machine runs it first becomes the master, the other the backup. ucarp.sh runs indefinitely, so it must be launched in the background with nohup;
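   A minimal launch sketch (the log path is my own choice, not from the original setup):

nohup /etc/ucarp.sh > /var/log/ucarp.log 2>&1 &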

   Once the floating IP is up, check it with ip address show. You can also test failover: shut down the first machine or pull its network cable; if the virtual IP still answers pings, the configuration works!
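   A quick verification sequence (IP values taken from the scripts above):

ip address show eth0      # 192.168.1.204 should appear as a secondary address
ping -c 3 192.168.1.204   # from another host; should keep answering after failover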

3. NFS configuration:

  Purpose: use NFS to keep hot backups of the namespace image (fsimage) and the edit log;

  On the standby machine, export the parent directory of the dfs.name.dir.shared0/1 properties from hdfs-site.xml over NFS: vim /etc/exports and add "/xxx/avatarshare *(rw,sync,no_root_squash)", as sketched below;
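  A minimal sketch of the export step (the commands assume RHEL/CentOS 6, matching the rpm above):

echo '/xxx/avatarshare *(rw,sync,no_root_squash)' >> /etc/exports
exportfs -r                # re-read /etc/exports
showmount -e localhost     # confirm the directory is exported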

4. Node configuration:

  Primary node (AvatarNode0):

    1).core-site.xml:


core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- special parameters for avatarnode -->
  <property>
        <name>fs.default.name0</name>
        <value>hdfs://0.0.0.0:9000</value>
  </property>
  <property>
        <name>fs.default.name1</name>
        <value>hdfs://192.168.1.2:9000</value>
  </property>
<!-- special parameters for avatarnode -->
<property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.1.204:9000</value>
</property>
<property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
</property>
</configuration>

    Here, fs.default.name0 is the RPC address of the NameNode on AvatarNode0; the value hdfs://0.0.0.0:9000 lets clients reach this NameNode via both the physical IP and the virtual IP. fs.default.name1 is the RPC address of the NameNode on AvatarNode1;

    2).hdfs-site.xml


hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<!-- special parameters for avatarnode -->
<configuration>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
        <name>dfs.name.dir.shared0</name>
        <value>/xxx/avatarshare/share0/namenode</value>
</property> 
<property>
        <name>dfs.name.edits.dir.shared0</name>
        <value>/xxx/avatarshare/share0/editlog</value>
</property>
<property>
        <name>dfs.http.address0</name>
        <value>0.0.0.0:50070</value>
</property> 
<property>
        <name>dfs.name.dir.shared1</name>
        <value>/xxx/avatarshare/share1/namenode</value>
</property> 
<property>
        <name>dfs.name.edits.dir.shared1</name>
        <value>/xxx/avatarshare/share1/editlog</value>
</property>
<property>
        <name>dfs.http.address1</name>
        <value>192.168.1.2:50070</value>
</property> 
<property>
        <name>dfs.http.address</name>
        <value>0.0.0.0:50070</value>
</property> 
<property>
        <name>dfs.name.dir</name>
        <value>/xxx/local/namenode</value>
</property>
<property>
        <name>dfs.name.edits.dir</name>
        <value>/xxx/local/editlog</value>
</property> 
</configuration>

    In this file, besides the standard Hadoop properties, the Avatar-specific properties dfs.name.dir.shared0, dfs.name.edits.dir.shared0, dfs.name.dir.shared1, and dfs.name.edits.dir.shared1 are the image and edit-log directories of HDFS on AvatarNode0 and AvatarNode1, respectively. Note that all of them live inside the NFS share. When the Primary NameNode runs on AvatarNode0, it writes its edit log into dfs.name.edits.dir.shared0 and the Standby NameNode on AvatarNode1 reads those logs; conversely, when the Primary NameNode runs on AvatarNode1, it writes into dfs.name.edits.dir.shared1 and the Standby NameNode on AvatarNode0 reads them. The shared directory skeleton can be created as shown below.
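    Spelled out as shell commands, the directories these properties expect on the NFS export are (paths taken from the config above):

mkdir -p /xxx/avatarshare/share0/namenode /xxx/avatarshare/share0/editlog
mkdir -p /xxx/avatarshare/share1/namenode /xxx/avatarshare/share1/editlog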

  Standby node (AvatarNode1):

    1).core-site.xml


core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- special parameters for avatarnode -->
   <property>
        <name>fs.default.name0</name>
        <value>hdfs://192.168.1.1:9000</value>
  </property>
  <property>
        <name>fs.default.name1</name>
        <value>hdfs://0.0.0.0:9000</value>
  </property>
<!-- special parameters for avatarnode -->
<property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.1.204:9000</value>
</property>
<property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
</property>
</configuration>

    2).hdfs-site.xml


hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<!-- special parameters for avatarnode -->
<configuration>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
        <name>dfs.name.dir.shared0</name>
        <value>/xxx/avatarshare/share0/namenode</value>
</property> 
<property>
        <name>dfs.name.edits.dir.shared0</name>
        <value>/xxx/avatarshare/share0/editlog</value>
</property>
<property>
        <name>dfs.http.address0</name>
        <value>192.168.1.1:50070</value>
</property> 
<property>
        <name>dfs.name.dir.shared1</name>
        <value>/xxx/avatarshare/share1/namenode</value>
</property> 
<property>
        <name>dfs.name.edits.dir.shared1</name>
        <value>/xxx/avatarshare/share1/editlog</value>
</property>
<property>
        <name>dfs.http.address1</name>
        <value>0.0.0.0:50070</value>
</property> 
<property>
        <name>dfs.name.dir</name>
        <value>/xxx/local/namenode</value>
</property>
<property>
        <name>dfs.name.edits.dir</name>
        <value>/xxx/local/editlog</value>
</property> 
</configuration>

  DataNode nodes:

    1).core-site.xml


core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- special parameters for avatarnode -->
  <property>
        <name>fs.default.name0</name>
        <value>hdfs://192.168.1.1:9000</value>
  </property>
  <property>
        <name>fs.default.name1</name>
        <value>hdfs://192.168.1.2:9000</value>
  </property>
<!-- special parameters for avatarnode -->
<property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.1.204:9000</value>
</property>
<property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
</property>
</configuration>

    2).hdfs-site.xml


hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<!-- special parameters for avatarnode -->
<configuration>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
<property>
        <name>dfs.http.address0</name>
        <value>192.168.1.1:50070</value>
</property> 
<property>
        <name>dfs.http.address1</name>
        <value>192.168.1.2:50070</value>
</property> 
<!-- special parameters for avatarnode -->
<property>
        <name>dfs.http.address</name>
        <value>192.168.1.204:50070</value>
</property> 
<property>
        <name>dfs.replication</name>
        <value>3</value>
</property>
<property>
        <name>dfs.data.dir</name>
        <value>/data0,/data1</value>
</property>
</configuration>


5. Startup:

  1). On the primary machine, run "mount -v -t nfs -o tcp,soft,retry=2,timeo=2,rsize=32768,wsize=32768 192.168.1.x:/xxx/avatarshare /xxx/avatarshare" to mount the standby node's persistence directory locally;
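  A quick way to confirm the mount took effect (my own check, not from the original notes):

mount | grep avatarshare
df -h /xxx/avatarshare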

  2). On the primary node, run "$HADOOP_HOME/bin/hadoop namenode -format"; then copy the namenode and editlog directories under /xxx/local into the NFS shared directories share0 and share1 (see the sketch below); then run "$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.namenode.AvatarNode -zero -format" to format;
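  The copy step, spelled out (a sketch assuming the local and shared paths from the configs above):

cp -r /xxx/local/namenode /xxx/avatarshare/share0/
cp -r /xxx/local/editlog  /xxx/avatarshare/share0/
cp -r /xxx/local/namenode /xxx/avatarshare/share1/
cp -r /xxx/local/editlog  /xxx/avatarshare/share1/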

  3). On the standby node, run "$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.namenode.AvatarNode -one -format" to format;

  4). Start the primary node: "$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.namenode.AvatarNode -zero";

  5). Start the standby node: "$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.namenode.AvatarNode -one -standby -sync";

  6). Start the datanode nodes: "$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.server.datanode.AvatarDataNode"; a quick health check follows below;
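  After everything is up, a quick sanity check (the jps path comes from vip-up.sh, and AvatarShell -showAvatar is the same call that script uses):

/xxx/jdk/bin/jps                                                         # AvatarNode / AvatarDataNode should be listed
$HADOOP_HOME/bin/hadoop org.apache.hadoop.hdfs.AvatarShell -showAvatar   # reports whether this node is Primary or Standby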

6. The process was brutal: I groped around for three days and still got wrecked by a version mismatch. Approaches to the common problems:

  1). The shared-path problem in hdfs-site.xml: test your own understanding by rewriting the 0/1 configuration the way you think it should be; sooner or later it will suddenly click;

  2). The most maddening issue is formatting: I reformatted over and over. In the end, with a mismatched version, I found a workaround: start the primary node first, then format the standby node and start it. This is, of course, a last resort for when the steps above fail;

  3). Use netstat -anp | grep myport to check whether the local services actually came up. A frequent problem is the address getting bound to IPv6; there are two ways around it: either disable IPv6 completely, or add "export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"" to hadoop-env.sh so the JVM uses IPv4 (see the exact line below). Watch the spelling carefully: it is preferIPv4Stack, not preferlIPv4Stack; that extra "l" cost me half a day and no small amount of grief.
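  For reference, the exact line for hadoop-env.sh (note the spelling of preferIPv4Stack):

export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"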