Installation environment:

OS: Oracle Linux 5.6

JDK: jdk1.6.0_18

Hadoop: hadoop-0.20.2

HBase: hbase-0.90.5

Installation prerequisites:

1. JDK installed: version 1.6 or later

2. Hadoop installed: fully distributed mode, set up previously

3. Choosing an HBase version

The HBase version must match the Hadoop version; otherwise the installation will fail or the cluster will not work properly. To find out which versions pair correctly, check the official documentation or search for successful installation reports.

4. Downloading HBase

http://mirror.bjtu.edu.cn/apache/hbase/hbase-0.90.5/


Installation overview:

- Configure hosts so that every hostname involved resolves to an IP
- Edit hbase-env.sh
- Edit hbase-site.xml
- Edit the regionservers file
- Copy HBase to the other nodes
- Start HBase
- Verify the startup

Installation steps:

1. Configure hosts

This step was already completed when configuring Hadoop:

[root@gc ~]$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.2.101 rac1.localdomain rac1
192.168.2.102 rac2.localdomain rac2
192.168.2.100 gc.localdomain gc

2. Copy and unpack the installation package
[grid@gc ~]$ pwd
/home/grid
[grid@gc ~]$ tar -xzvf hbase-0.90.5.tar.gz

3. Replace the Hadoop core jar
The goal is to avoid compatibility problems between the HBase and Hadoop versions, which would otherwise cause HMaster to fail on startup.
$ pwd
/home/grid/hbase-0.90.5/lib
$ mv hadoop-core-0.20-append-r1056497.jar hadoop-core-0.20-append-r1056497.jar.bak
$ cp /home/grid/hadoop-0.20.2/hadoop-0.20.2-core.jar /home/grid/hbase-0.90.5/lib/
$ chmod 775 hadoop-0.20.2-core.jar
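To confirm the copy, one can check that the jar now in HBase's lib directory is byte-identical to the one shipped with Hadoop. A minimal Python sketch (the checksum helpers are illustrative, not part of either project):

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 checksum of a file without loading it all into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_file(src, dst):
    """Return True if src and dst have identical contents."""
    return md5sum(src) == md5sum(dst)

# e.g. same_file('/home/grid/hadoop-0.20.2/hadoop-0.20.2-core.jar',
#                '/home/grid/hbase-0.90.5/lib/hadoop-0.20.2-core.jar')
```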
4. Edit hbase-env.sh
[grid@gc conf]$ pwd
/home/grid/hbase-0.90.5/conf
[grid@gc conf]$ vi hbase-env.sh
# Add the following
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/usr/java/jdk1.6.0_18
# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=/home/grid/hadoop-0.20.2/conf
# Where log files are stored. $HBASE_HOME/logs by default.
export HBASE_LOG_DIR=${HBASE_HOME}/logs
# Tell HBase whether it should manage its own instance of ZooKeeper or not.
export HBASE_MANAGES_ZK=true

5. Edit hbase-site.xml
[grid@gc conf]$ vi hbase-site.xml
# Add the following
<property>
  <!-- directory where HBase stores its data -->
  <name>hbase.rootdir</name>
  <value>hdfs://gc:9000/hbase</value>
</property>
<property>
  <!-- enable fully distributed mode -->
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <!-- the HBase cluster's master node -->
  <name>hbase.master</name>
  <value>gc:60000</value>
</property>
<property>
  <!-- ZooKeeper quorum members; use an odd number of nodes so the
       election algorithm can always reach a majority -->
  <name>hbase.zookeeper.quorum</name>
  <value>gc,rac1,rac2</value>
</property>
<property>
  <!-- ZooKeeper data directory -->
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/home/grid/hbase-0.90.5/zookeeper</value>
</property>
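As a sanity check, the property list can be parsed back with Python's standard xml module; a sketch with the same values inlined as a string (`load_properties` is a made-up helper). The ZooKeeper quorum should normally have an odd number of members so majority elections work:

```python
import xml.etree.ElementTree as ET

# The same properties as the hbase-site.xml above, inlined for illustration.
HBASE_SITE = """
<configuration>
  <property><name>hbase.rootdir</name><value>hdfs://gc:9000/hbase</value></property>
  <property><name>hbase.cluster.distributed</name><value>true</value></property>
  <property><name>hbase.master</name><value>gc:60000</value></property>
  <property><name>hbase.zookeeper.quorum</name><value>gc,rac1,rac2</value></property>
  <property><name>hbase.zookeeper.property.dataDir</name><value>/home/grid/hbase-0.90.5/zookeeper</value></property>
</configuration>
"""

def load_properties(xml_text):
    """Return the <property> entries as a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.findall("property")}

props = load_properties(HBASE_SITE)
quorum = props["hbase.zookeeper.quorum"].split(",")  # 3 members -> odd, good
```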

6. Edit the regionservers file
[grid@gc conf]$ cat regionservers 
# Replace the default localhost with:
rac1
rac2

7. Sync the modified HBase directory to the other nodes
--copy to the rac1 and rac2 nodes
[grid@gc ~]$ scp -r hbase-0.90.5 rac1:/home/grid/
[grid@gc ~]$ scp -r hbase-0.90.5 rac2:/home/grid/

8. Start/stop the HBase cluster
--before starting HBase, verify that Hadoop is already running
[grid@gc ~]$ hadoop-0.20.2/bin/hadoop dfsadmin -report
Configured Capacity: 45702094848 (42.56 GB)
Present Capacity: 3562618880 (3.32 GB)
DFS Remaining: 3562348544 (3.32 GB)
DFS Used: 270336 (264 KB)
DFS Used%: 0.01%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Name: 192.168.2.101:50010
Decommission Status : Normal
Configured Capacity: 22851047424 (21.28 GB)
DFS Used: 135168 (132 KB)
Non DFS Used: 20131606528 (18.75 GB)
DFS Remaining: 2719305728(2.53 GB)
DFS Used%: 0%
DFS Remaining%: 11.9%
Last contact: Tue Dec 25 09:40:14 CST 2012


Name: 192.168.2.102:50010
Decommission Status : Normal
Configured Capacity: 22851047424 (21.28 GB)
DFS Used: 135168 (132 KB)
Non DFS Used: 22007869440 (20.5 GB)
DFS Remaining: 843042816(803.99 MB)
DFS Used%: 0%
DFS Remaining%: 3.69%
Last contact: Tue Dec 25 09:40:13 CST 2012

--start the HBase cluster
----on the gc master node
[grid@gc ~]$ hbase-0.90.5/bin/start-hbase.sh 
rac2: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-rac2.localdomain.out
gc: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-gc.localdomain.out
rac1: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-rac1.localdomain.out
starting master, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-master-gc.localdomain.out
rac1: starting regionserver, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-rac1.localdomain.out
rac2: starting regionserver, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-rac2.localdomain.out

--two new HBase processes appear (HQuorumPeer and HMaster)
[grid@gc ~]$ jps
2718 HQuorumPeer
6875 JobTracker
6799 SecondaryNameNode
8129 org.eclipse.equinox.launcher_1.1.1.R36x_v20101122_1400.jar
2864 Jps
6651 NameNode
2772 HMaster

--on the rac1 and rac2 slave nodes
[grid@rac1 ~]$ jps
23663 HRegionServer
3736 DataNode
23585 HQuorumPeer
23737 Jps
3840 TaskTracker

[grid@rac2 ~]$ jps
10579 TaskTracker
29735 HQuorumPeer
29897 Jps
10480 DataNode
29812 HRegionServer

--verify through the browser:

http://192.168.2.100:60010/master.jsp


--stop the HBase cluster
[grid@gc hbase-0.90.5]$ bin/stop-hbase.sh 
stopping hbase...................
gc: stopping zookeeper.
rac2: stopping zookeeper.
rac1: stopping zookeeper.

Command-line operation:
1. The hbase command
--enter the HBase shell
[grid@gc ~]$ hbase-0.90.5/bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011
hbase(main):001:0>

--check cluster status
hbase(main):002:0> status
2 servers, 0 dead, 1.0000 average load

--check the version
hbase(main):004:0> version
0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011

--help command
hbase(main):003:0> help
HBase Shell, version 0.90.5, r1212209, Fri Dec 9 05:40:36 UTC 2011
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
 Group name: general
 Commands: status, version

 Group name: ddl
 Commands: alter, create, describe, disable, drop, enable, exists, is_disabled, is_enabled, list

 Group name: dml
 Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate

 Group name: tools
 Commands: assign, balance_switch, balancer, close_region, compact, flush, major_compact, move, split, unassign, zk_dump

 Group name: replication
 Commands: add_peer, disable_peer, enable_peer, remove_peer, start_replication, stop_replication

SHELL USAGE:
Quote all names in HBase Shell such as table and column names. Commas delimit
command parameters. Type <RETURN> after entering a command to run it.
Dictionaries of configuration used in the creation and alteration of tables are
Ruby Hashes. They look like this:

 {'key1' => 'value1', 'key2' => 'value2', ...}

and are opened and closed with curley-braces. Key/values are delimited by the
'=>' character combination. Usually keys are predefined constants such as
NAME, VERSIONS, COMPRESSION, etc. Constants do not need to be quoted. Type
'Object.constants' to see a (messy) list of all constants in the environment.

If you are using binary keys or values and need to enter them in the shell, use
double-quote'd hexadecimal representation. For example:

 hbase> get 't1', "key\x03\x3f\xcd"
 hbase> get 't1', "key\003\023\011"
 hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"

The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.
For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html

2. HBase data manipulation commands
--creating a table
Logical model of the resume table:

Row key    | Timestamp | Column family binfo  | Column family edu       | Column family work
-----------|-----------|----------------------|-------------------------|----------------------
lichangzai | T2        | binfo:age='1980-1-1' |                         |
           | T3        | binfo:sex='man'      |                         |
           | T5        |                      | edu:mschool='rq no.1'   |
           | T6        |                      | edu:university='qhddx'  |
           | T7        |                      |                         | work:company1='12580'
changfei   | T10       | binfo:age='1986-2-1' |                         |
           | T11       |                      | edu:university='bjdx'   |
           | T12       |                      |                         | work:company1='LG'
......     | Tn        |                      |                         |
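Conceptually, HBase stores a table like this as a sparse, sorted map: row key → 'family:qualifier' → timestamp → value. A toy Python rendering of the logical model above (the symbolic timestamps T2, T3, ... stand in for real ones; `get` loosely mimics the shell command):

```python
# Sparse map: row key -> "family:qualifier" -> {timestamp: value}
resume = {
    "lichangzai": {
        "binfo:age":      {"T2": "1980-1-1"},
        "binfo:sex":      {"T3": "man"},
        "edu:mschool":    {"T5": "rq no.1"},
        "edu:university": {"T6": "qhddx"},
        "work:company1":  {"T7": "12580"},
    },
    "changfei": {
        "binfo:age":      {"T10": "1986-2-1"},
        "edu:university": {"T11": "bjdx"},
        "work:company1":  {"T12": "LG"},
    },
}

def get(table, row, column=None):
    """Mimic the shell's get: all cells of a row, or one column's cells."""
    cells = table.get(row, {})
    return cells if column is None else cells.get(column, {})
```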

--create the table
hbase(main):005:0> create 'resume','binfo','edu','work'
0 row(s) in 16.5710 seconds

--list tables
hbase(main):006:0> list
TABLE 
resume 
1 row(s) in 1.6080 seconds

--describe the table structure
hbase(main):007:0> describe 'resume'
DESCRIPTION ENABLED 
{NAME => 'resume', FAMILIES => [{NAME => 'binfo', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', C true 
OMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'fals 
e', BLOCKCACHE => 'true'}, {NAME => 'edu', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESS 
ION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLO 
CKCACHE => 'true'}, {NAME => 'work', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 
 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACH 
E => 'true'}]} 
1 row(s) in 1.8590 seconds

--add a column family
hbase(main):014:0> disable 'resume' 
0 row(s) in 4.2630 seconds
hbase(main):015:0> alter 'resume',NAME=>'f1'
0 row(s) in 4.6990 seconds

--delete a column family
hbase(main):017:0> alter 'resume',{NAME=>'f1',METHOD=>'delete'}
0 row(s) in 1.1390 seconds
--or equivalently
hbase(main):021:0> alter 'resume','delete' => 'f1'
0 row(s) in 1.9310 seconds
hbase(main):022:0> enable 'resume'
0 row(s) in 5.9060 seconds


Note:
(1) DDL commands are case-sensitive: shell commands such as alter, create, drop, and enable must be lowercase, while attribute names inside {} must be uppercase.
(2) A table must be disabled (disable) before alter or drop, and re-enabled (enable) afterwards; otherwise the command fails with an error.

--check the disabled/enabled state
hbase(main):024:0> is_disabled 'resume'
false 
0 row(s) in 0.4930 seconds

hbase(main):021:0> is_enabled 'resume'
true 
0 row(s) in 0.2450 seconds

--drop a table
hbase(main):015:0> create 't1','f1'
0 row(s) in 15.3730 seconds

hbase(main):016:0> disable 't1'
0 row(s) in 6.4840 seconds

hbase(main):017:0> drop 't1'
0 row(s) in 7.3730 seconds

--check whether a table exists
hbase(main):018:0> exists 'resume'
Table resume does exist 
0 row(s) in 2.3900 seconds

hbase(main):019:0> exists 't1'
Table t1 does not exist 
0 row(s) in 1.3270 seconds

--insert data
put 'resume','lichangzai','binfo:age','1980-1-1'
put 'resume','lichangzai','binfo:sex','man'
put 'resume','lichangzai','edu:mschool','rq no.1'
put 'resume','lichangzai','edu:university','qhddx'
put 'resume','lichangzai','work:company1','12580'
put 'resume','lichangzai','work:company2','china mobile'
put 'resume','lichangzai','binfo:site','blog.csdn.net/lichangzai'
put 'resume','lichangzai','binfo:mobile','13712345678'
put 'resume','changfei','binfo:age','1986-2-1'
put 'resume','changfei','edu:university','bjdx'
put 'resume','changfei','work:company1','LG'
put 'resume','changfei','binfo:mobile','13598765401'
put 'resume','changfei','binfo:site','hi.baidu/lichangzai'

--get all data for one row key
hbase(main):014:0> get 'resume','lichangzai'
COLUMN CELL 
binfo:age timestamp=1356485720612, value=1980-1-1 
binfo:mobile timestamp=1356485865523, value=13712345678 
binfo:sex timestamp=1356485733603, value=man 
binfo:site timestamp=1356485859806, value=blog.csdn.net/lichangzai 
edu:mschool timestamp=1356485750361, value=rq no.1 
edu:university timestamp=1356485764211, value=qhddx 
work:company1 timestamp=1356485837743, value=12580 
work:company2 timestamp=1356485849365, value=china mobile 
8 row(s) in 2.1090 seconds

Note: data must be retrieved by row key.

--get all data for one row key and one column family
hbase(main):015:0> get 'resume','lichangzai','binfo'
COLUMN CELL 
binfo:age timestamp=1356485720612, value=1980-1-1 
binfo:mobile timestamp=1356485865523, value=13712345678 
binfo:sex timestamp=1356485733603, value=man 
binfo:site timestamp=1356485859806, value=blog.csdn.net/lichangzai 
4 row(s) in 1.6010 seconds

--get the data for one row key and a single column within a column family
hbase(main):017:0> get 'resume','lichangzai','binfo:sex' 
COLUMN CELL 
binfo:sex timestamp=1356485733603, value=man 
1 row(s) in 0.8980 seconds

--update a record
hbase(main):018:0> put 'resume','lichangzai','binfo:mobile','13899999999'
0 row(s) in 1.7640 seconds

hbase(main):019:0> get 'resume','lichangzai','binfo:mobile'
COLUMN CELL 
binfo:mobile timestamp=1356486691591, value=13899999999 
1 row(s) in 1.5710 seconds

Note: an update is really just an insert with a newer timestamp; by default, get shows only the version with the latest timestamp.
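That behavior can be illustrated with a toy versioned cell (a sketch of the semantics, not HBase's implementation): put appends a timestamped version rather than overwriting, and get returns the newest version unless an explicit timestamp is given:

```python
import time

class VersionedCell:
    """An HBase-style cell: every put adds a (timestamp, value) version."""
    def __init__(self):
        self.versions = {}  # timestamp -> value

    def put(self, value, ts=None):
        ts = ts if ts is not None else int(time.time() * 1000)
        self.versions[ts] = value  # never overwrites an older timestamp
        return ts

    def get(self, ts=None):
        """Latest version by default, or the version at an explicit timestamp."""
        if ts is not None:
            return self.versions.get(ts)
        return self.versions[max(self.versions)] if self.versions else None

# The binfo:mobile updates from the transcript above:
mobile = VersionedCell()
mobile.put("13712345678", ts=1356485865523)
mobile.put("13899999999", ts=1356486691591)
```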

--retrieve data by timestamp
------query the version with the latest timestamp
hbase(main):020:0> get 'resume','lichangzai',{COLUMN=>'binfo:mobile',TIMESTAMP=>1356486691591}
COLUMN CELL 
binfo:mobile timestamp=1356486691591, value=13899999999 
1 row(s) in 0.4060 seconds

------query the earlier (superseded) version by its timestamp
hbase(main):021:0> get 'resume','lichangzai',{COLUMN=>'binfo:mobile',TIMESTAMP=>1356485865523} 
COLUMN CELL 
binfo:mobile timestamp=1356485865523, value=13712345678 
1 row(s) in 0.7780 seconds

--full table scan
hbase(main):022:0> scan 'resume'
ROW COLUMN+CELL 
changfei column=binfo:age, timestamp=1356485874056, value=1986-2-1 
changfei column=binfo:mobile, timestamp=1356485897477, value=13598765401 
changfei column=binfo:site, timestamp=1356485906106, value=hi.baidu/lichangzai 
changfei column=edu:university, timestamp=1356485880977, value=bjdx 
changfei column=work:company1, timestamp=1356485888939, value=LG 
lichangzai column=binfo:age, timestamp=1356485720612, value=1980-1-1 
lichangzai column=binfo:mobile, timestamp=1356486691591, value=13899999999 
lichangzai column=binfo:sex, timestamp=1356485733603, value=man 
lichangzai column=binfo:site, timestamp=1356485859806, value=blog.csdn.net/lichangzai 
lichangzai column=edu:mschool, timestamp=1356485750361, value=rq no.1 
lichangzai column=edu:university, timestamp=1356485764211, value=qhddx 
lichangzai column=work:company1, timestamp=1356485837743, value=12580 
lichangzai column=work:company2, timestamp=1356485849365, value=china mobile 
2 row(s) in 3.6300 seconds

--delete one column of a given row key
hbase(main):023:0> put 'resume','changfei','binfo:sex','man'
0 row(s) in 1.2630 seconds

hbase(main):024:0> delete 'resume','changfei','binfo:sex'
0 row(s) in 0.5890 seconds

hbase(main):026:0> get 'resume','changfei','binfo:sex'
COLUMN CELL 
0 row(s) in 0.5560 seconds

--delete an entire row
hbase(main):028:0> create 't1','f1','f2'
0 row(s) in 8.3950 seconds

hbase(main):029:0> put 't1','a','f1:col1','xxxxx' 
0 row(s) in 2.6790 seconds

hbase(main):030:0> put 't1','a','f1:col2','xyxyx'
0 row(s) in 0.5130 seconds

hbase(main):031:0> put 't1','b','f2:cl1','ppppp'
0 row(s) in 1.2620 seconds

hbase(main):032:0> deleteall 't1','a'
0 row(s) in 1.2030 seconds

hbase(main):033:0> get 't1','a'
COLUMN CELL 
0 row(s) in 0.8980 seconds

--count the rows in a table
hbase(main):035:0> count 'resume'
2 row(s) in 2.8150 seconds
hbase(main):036:0> count 't1' 
1 row(s) in 0.9500 seconds

--truncate a table
hbase(main):037:0> truncate 't1'
Truncating 't1' table (it may take a while):
- Disabling table...
- Dropping table...
- Creating table...
0 row(s) in 21.0060 seconds

Note on how truncate works: HDFS files cannot be modified in place, so the only way to empty a table is to drop it and re-create it.
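In other words, truncate is just disable + drop + create with the remembered schema; a toy sketch of the semantics (the dict layout is made up for illustration):

```python
def truncate(tables, name):
    """Re-create a table with its original schema, discarding all rows.
    `tables` maps table name -> {"schema": [column families], "rows": {...}}."""
    schema = tables[name]["schema"]               # remember the column families
    del tables[name]                              # disable + drop
    tables[name] = {"schema": schema, "rows": {}} # create, now empty

tables = {"t1": {"schema": ["f1", "f2"], "rows": {"a": {"f1:col1": "xxxxx"}}}}
truncate(tables, "t1")
```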




Troubleshooting
Problem:
Right after the HBase installation was finished, the processes on all nodes were normal, but after a short while the HMaster process on the master node stopped on its own. Restarting the master node then produced the following symptoms.

--the HMaster process is missing on the master node
[grid@gc bin]$ ./start-hbase.sh
 rac1: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-rac1.localdomain.out
 rac2: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-rac2.localdomain.out
 gc: starting zookeeper, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-zookeeper-gc.localdomain.out
 starting master, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-master-gc.localdomain.out
 rac2: starting regionserver, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-rac2.localdomain.out
 rac1: starting regionserver, logging to /home/grid/hbase-0.90.5/bin/../logs/hbase-grid-regionserver-rac1.localdomain.out
 [grid@gc bin]$ jps
 3871 NameNode
 4075 JobTracker
 8853 Jps
 4011 SecondaryNameNode
8673 HQuorumPeer

--processes on the two slave nodes rac1 and rac2 are normal
[grid@rac1 bin]$ jps
10353 HQuorumPeer
10576 Jps
 6457 DataNode
 6579 TaskTracker
10448 HRegionServer
 [grid@rac2 ~]$ jps
10311 HQuorumPeer
10534 Jps
 6426 DataNode
 6546 TaskTracker
10391 HRegionServer

Partial logs follow.
--log on the master node gc
[grid@gc logs]$ tail -100f hbase-grid-master-gc.localdomain.log
2012-12-25 15:23:45,842 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server gc/192.168.2.100:2181
 2012-12-25 15:23:45,853 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to gc/192.168.2.100:2181, initiating session
 2012-12-25 15:23:45,861 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
 2012-12-25 15:23:46,930 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rac1/192.168.2.101:2181
 2012-12-25 15:23:47,167 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
 java.net.ConnectException: Connection refused
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
 2012-12-25 15:23:48,251 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rac2/192.168.2.102:2181
 2012-12-25 15:23:48,362 INFO org.apache.zookeeper.ZooKeeper: Session: 0x0 closed
 2012-12-25 15:23:48,362 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
 2012-12-25 15:23:48,367 ERROR org.apache.hadoop.hbase.master.HMasterCommandLine: Failed to start master
 java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1065)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:142)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:102)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1079)
 Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndFailSilent(ZKUtil.java:931)
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:134)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:219)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:1060)

[grid@gc logs]$ tail -100f hbase-grid-zookeeper-gc.localdomain.log
2012-12-25 15:23:57,380 WARN org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open channel to 2 at election address rac2/192.168.2.102:3888
 java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
 at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:366)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:335)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
at java.lang.Thread.run(Thread.java:619)
 .......
 2012-12-25 15:23:57,670 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:user.home=/home/grid
 2012-12-25 15:23:57,671 INFO org.apache.zookeeper.server.ZooKeeperServer: Server environment:user.dir=/home/grid/hbase-0.90.5
 2012-12-25 15:23:57,679 INFO org.apache.zookeeper.server.ZooKeeperServer: Created server with tickTime 3000 minSessionTimeout 6000 maxSessionTimeout 180000 datadir /home/grid/hbase-0.90.5/zookeeper/version-2 snapdir /home/grid/hbase-0.90.5/zookeeper/version-2
2012-12-25 15:23:58,118 WARN org.apache.zookeeper.server.quorum.Learner: Unexpected exception, tries=0, connecting to rac1/192.168.2.101:2888
 java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:525)
at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:212)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:65)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:644)at 
 2012-12-25 15:24:00,886 INFO org.apache.zookeeper.server.quorum.Learner: Getting a snapshot from leader
 2012-12-25 15:24:00,897 INFO org.apache.zookeeper.server.quorum.Learner: Setting leader epoch 9
 2012-12-25 15:24:01,051 INFO org.apache.zookeeper.server.persistence.FileTxnSnapLog: Snapshotting: 900000000
 2012-12-25 15:24:03,218 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /192.168.2.101:12397
 2012-12-25 15:24:03,377 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /192.168.2.101:12397
 2012-12-25 15:24:03,396 WARN org.apache.zookeeper.server.quorum.Learner: Got zxid 0x900000001 expected 0x1
 2012-12-25 15:24:03,400 INFO org.apache.zookeeper.server.persistence.FileTxnLog: Creating new log file: log.900000001
 2012-12-25 15:24:03,470 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x3bd0f2560e0000 with negotiated timeout 180000 for client /192.168.2.101:12397
 2012-12-25 15:24:07,057 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /192.168.2.102:52300
 2012-12-25 15:24:07,690 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /192.168.2.102:52300
 2012-12-25 15:24:07,712 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x3bd0f2560e0001 with negotiated timeout 180000 for client /192.168.2.102:52300
 2012-12-25 15:24:10,016 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2 (n.leader), 34359738398 (n.zxid), 1 (n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)
 2012-12-25 15:24:30,422 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2 (n.leader), 34359738398 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)
 2012-12-25 15:24:30,423 INFO org.apache.zookeeper.server.quorum.FastLeaderElection: Notification: 2 (n.leader), 34359738398 (n.zxid), 2 (n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)

--log on the slave node rac2
[grid@rac2 logs]$ tail -100f hbase-grid-regionserver-rac2.localdomain.log
2012-12-25 15:23:46,939 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server rac1/192.168.2.101:2181
 2012-12-25 15:23:47,154 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to rac1/192.168.2.101:2181, initiating session
 2012-12-25 15:23:47,453 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
 2012-12-25 15:23:47,977 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server gc/192.168.2.100:2181
 2012-12-25 15:23:48,354 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to gc/192.168.2.100:2181, initiating session
 2012-12-25 15:23:49,583 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server gc/192.168.2.100:2181, sessionid = 0x3bd0f2560e0001, negotiated timeout = 180000
 2012-12-25 15:23:52,052 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020
Solution
Disable IPv6: remove (or comment out) the "::1 localhost..." line in /etc/hosts on every node, then restart.
[grid@rac1 ~]$ cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
# ::1 localhost6.localdomain6 localhost6
192.168.2.101 rac1.localdomain rac1
192.168.2.102 rac2.localdomain rac2
192.168.2.100 gc.localdomain gc
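If several nodes need the same fix, the edit can be scripted; a sketch that drops IPv6 entries (such as ::1) from hosts-file text (`strip_ipv6_loopback` is a made-up helper; in practice, back up /etc/hosts before rewriting it):

```python
def strip_ipv6_loopback(hosts_text):
    """Drop lines whose address field is an IPv6 address (contains ':'),
    e.g. the '::1 localhost6 ...' entry that confused ZooKeeper here."""
    kept = []
    for line in hosts_text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#") and ":" in fields[0]:
            continue  # skip IPv6 entries such as ::1
        kept.append(line)
    return "\n".join(kept)

HOSTS = """127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.2.101 rac1.localdomain rac1"""
```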

HBase troubleshooting reference:

http://wiki.apache.org/hadoop/Hbase/Troubleshooting

This article also draws on this blogger's post: http://chfpdxx.blog.163.com/blog/static/29542296201241411325789/