Fully Distributed Deployment of HBase


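guojianrui 2014-12-19 17:29:25 Category: Linux labs

© Copyright belongs to the author. This is an original work by guojianrui on the 51CTO blog; contact the author for permission to repost, unauthorized reposting may be subject to legal action.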


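An HBase cluster needs Hadoop installed first. The previous article covered deploying the Hadoop cluster, so this one goes straight to HBase:

HBase version: hbase-0.94.12

Note: the HBase version has to match the Hadoop version. To check, compare the version number in the name of the hadoop-core jar under hbase-0.94.12/lib with your Hadoop version. If they differ, copy the hadoop-core jar from the Hadoop installation into that lib directory in place of the bundled one. A mismatch here is exactly what made table creation fail for me.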
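The jar comparison described in the note can be sketched as a small shell check. This is only a sketch: the function names `jar_version` and `check_hadoop_core` are made up for illustration, and it assumes the Hadoop 1.x layout where `hadoop-core-<version>.jar` sits at the top of the Hadoop installation directory.

```shell
#!/bin/sh
# Extract the version from a hadoop-core jar file name,
# e.g. hadoop-core-1.0.4.jar -> 1.0.4
jar_version() {
    basename "$1" | sed -e 's/^hadoop-core-//' -e 's/\.jar$//'
}

# Compare the jar bundled with HBase against the one shipped with Hadoop.
# Both directory arguments are assumptions about your own layout.
check_hadoop_core() {
    hbase_lib=$1      # e.g. /usr/local/hbase-0.94.12/lib
    hadoop_home=$2    # e.g. /usr/local/hadoop-1.2.1
    hbase_jar=$(ls "$hbase_lib"/hadoop-core-*.jar 2>/dev/null | head -n 1)
    hadoop_jar=$(ls "$hadoop_home"/hadoop-core-*.jar 2>/dev/null | head -n 1)
    if [ -z "$hbase_jar" ] || [ -z "$hadoop_jar" ]; then
        echo "hadoop-core jar not found in one of the directories" >&2
        return 2
    fi
    hbase_ver=$(jar_version "$hbase_jar")
    hadoop_ver=$(jar_version "$hadoop_jar")
    if [ "$hbase_ver" = "$hadoop_ver" ]; then
        echo "match: $hbase_ver"
    else
        echo "mismatch: hbase has $hbase_ver, hadoop has $hadoop_ver"
        return 1
    fi
}
```

A call such as `check_hadoop_core /usr/local/hbase-0.94.12/lib /usr/local/hadoop-1.2.1` returns non-zero when the two versions differ, which is the case where the jar should be replaced.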

1. Extract hbase-0.94.12 into /usr/local/

2. Configure hbase-env.sh

# vim conf/hbase-env.sh

export JAVA_HOME=/usr/java/jdk                             # JDK location
export HBASE_CLASSPATH=/usr/local/hadoop-1.2.1/conf        # Hadoop configuration directory
export HBASE_MANAGES_ZK=false                              # do not use the bundled ZooKeeper
export HBASE_HEAPSIZE=1024                                 # heap size in MB

3. Configure hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop01:9000/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value>
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles has
    grown to exceed this value, the hosting HRegion is split in two.
    Default: 256M.
    </description>
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>
    <description>
    Memstore will be flushed to disk if size of the memstore
    exceeds this number of bytes.  Value is checked by a thread that runs
    every hbase.server.thread.wakefrequency.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  <property>
      <name>hbase.zookeeper.property.clientPort</name>
      <value>2181</value>
      <description>Property from ZooKeeper's config zoo.cfg.
      The port at which the clients will connect.
      </description>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>6000</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop01,hadoop02,hadoop03</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum.
    For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed modes
    of operation. For a fully-distributed setup, this should be set to a full
    list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
    this is the list of servers which we will start/stop ZooKeeper on.
    </description>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/hadoop/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>192.168.1.138:60000</value>
  </property>
</configuration>
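The raw byte and millisecond values in this file are easier to sanity-check once converted. A quick sketch with shell arithmetic follows; the helper name `to_mib` is made up for illustration. The last line relies on ZooKeeper's default upper bound for a session timeout, which is 20 times tickTime. With an external ZooKeeper, the tickTime that actually applies is the one in that ensemble's zoo.cfg.

```shell
#!/bin/sh
# Convert a byte count to MiB using integer arithmetic.
to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

echo "hbase.hregion.max.filesize:        $(to_mib 1073741824) MiB"
echo "hbase.hregion.memstore.flush.size: $(to_mib 134217728) MiB"

# ZooKeeper's default maximum session timeout is 20 * tickTime,
# so a tickTime of 6000 ms allows the 120000 ms session timeout set above.
echo "20 * tickTime:                     $(( 20 * 6000 )) ms"
```

So regions split at 1 GiB, memstores flush at 128 MiB, and the 120000 ms session timeout sits exactly at the 20 x 6000 ms limit.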


# vim conf/regionservers        # plays the same role as Hadoop's slaves file

hadoop01
hadoop02
hadoop03

Next, distribute the configured hbase directory to the other nodes with a remote copy of the form scp -r hbase <node name>:/<directory>.
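To avoid typing one scp per node, the copy can be driven from the regionservers file. A minimal sketch: the function name `distribute_hbase` and the `DRY_RUN` switch are made up for illustration, and it assumes passwordless SSH between the nodes. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Print (dry run) or execute one scp per host listed in a regionservers file.
distribute_hbase() {
    src_dir=$1          # local HBase directory, e.g. /usr/local/hbase-0.94.12
    dest_parent=$2      # parent directory on the remote node, e.g. /usr/local/
    hosts_file=$3       # e.g. conf/regionservers
    self=$4             # name of the local host, skipped in the loop
    # "|| [ -n ... ]" keeps the last line even without a trailing newline
    while read -r host || [ -n "$host" ]; do
        # skip blank lines, comments and the local node
        case "$host" in ''|'#'*) continue ;; esac
        [ "$host" = "$self" ] && continue
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "scp -r $src_dir $host:$dest_parent"
        else
            # stdin from /dev/null so scp cannot swallow the host list
            scp -r "$src_dir" "$host:$dest_parent" < /dev/null
        fi
    done < "$hosts_file"
}
```

Run on hadoop01 as `distribute_hbase /usr/local/hbase-0.94.12 /usr/local/ conf/regionservers hadoop01` to review the commands, then repeat with `DRY_RUN=0` set in the environment to actually copy.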

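Then turn off the firewall on every node and start the cluster. Note: start the Hadoop cluster first and the HBase cluster second; the order must not be reversed. Because HBASE_MANAGES_ZK is set to false in hbase-env.sh, the external ZooKeeper ensemble named in hbase.zookeeper.quorum also has to be running before HBase starts.

# bin/start-hbase.sh        # start all HBase processes

# bin/stop-hbase.sh         # stop them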

The cluster is now up. Open the HBase web page on port 60010 to see the cluster information.


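Note: to reach the HBase ports from a Windows machine, the firewall has to be off and the cluster hostnames have to be mapped in the Windows hosts file.

That completes the configuration. When shutting everything down, stop the HBase cluster first and the Hadoop cluster second.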
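The start and stop ordering is easy to get backwards, so it can help to keep it in one place. A sketch under some assumptions: `cluster_plan` is a made-up helper, the two install paths are guesses based on the paths used earlier in this post, and start-all.sh / stop-all.sh are the Hadoop 1.x scripts. The external ZooKeeper ensemble is not handled here.

```shell
#!/bin/sh
# Install locations are assumptions; override them from the environment.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop-1.2.1}
HBASE_HOME=${HBASE_HOME:-/usr/local/hbase-0.94.12}

# Print the commands for the requested action in dependency order:
# Hadoop before HBase on start, HBase before Hadoop on stop.
cluster_plan() {
    case "$1" in
        start)
            echo "$HADOOP_HOME/bin/start-all.sh"
            echo "$HBASE_HOME/bin/start-hbase.sh"
            ;;
        stop)
            echo "$HBASE_HOME/bin/stop-hbase.sh"
            echo "$HADOOP_HOME/bin/stop-all.sh"
            ;;
        *)
            echo "usage: cluster_plan start|stop" >&2
            return 1
            ;;
    esac
}
```

`cluster_plan start` only prints the two commands; piping the output to `sh` on the master node would run them in that order.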

Connecting to HBase and creating a table

# bin/hbase shell

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.94.12, r1524863, Fri Sep 20 04:44:41 UTC 2013

hbase(main):001:0>

Create a table named small with a single column family, cf. List all tables to confirm it was created, then insert a few values.

hbase(main):003:0> create 'small', 'cf'
0 row(s) in 1.2200 seconds
hbase(main):003:0> list
small
1 row(s) in 0.0550 seconds
hbase(main):004:0> put 'small', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0560 seconds
hbase(main):005:0> put 'small', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0370 seconds
hbase(main):006:0> put 'small', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0450 seconds

Check the inserted data by scanning the table:
hbase(main):005:0> scan 'small'
Get a single row:
hbase(main):008:0> get 'small', 'row1'
Disable and then drop the table to undo everything you just did:
hbase(main):012:0> disable 'small'
0 row(s) in 1.0930 seconds
hbase(main):013:0> drop 'small'
0 row(s) in 0.0770 seconds
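The same session can be replayed without typing it interactively by putting the commands in a file. This is a sketch: `write_small_script` is a made-up helper, and it assumes your HBase version accepts a script path as the argument to `hbase shell`.

```shell
#!/bin/sh
# Write the HBase shell commands from the session above to a script file.
write_small_script() {
    cat > "$1" <<'EOF'
create 'small', 'cf'
list
put 'small', 'row1', 'cf:a', 'value1'
put 'small', 'row2', 'cf:b', 'value2'
put 'small', 'row3', 'cf:c', 'value3'
scan 'small'
get 'small', 'row1'
disable 'small'
drop 'small'
exit
EOF
}

# Example, on a node with HBase installed:
#   write_small_script /tmp/small_test.rb
#   bin/hbase shell /tmp/small_test.rb
```

The trailing `exit` is there so the shell quits when the script finishes.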

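Export and import

1. Export the table small to an HDFS directory that is also named small:

[hadoop@master hbase-0.94.12]$ bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export small small

The exported table is written to the small directory under the current user's home directory in HDFS. For example, the directory layout in HDFS after the export: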

[hadoop@master hadoop-1.0.4]$ bin/hadoop dfs -ls

Found 1 items

drwxr-xr-x - hadoop supergroup 0 2013-10-22 10:44 /user/hadoop/small

[hadoop@master hadoop-1.0.4]$ bin/hadoop dfs -ls ./small

Found 3 items

-rw-r--r-- 2 hadoop supergroup 0 2013-10-22 10:44 /user/hadoop/small/_SUCCESS

drwxr-xr-x - hadoop supergroup 0 2013-10-22 10:44 /user/hadoop/small/_logs

-rw-r--r-- 2 hadoop supergroup 285 2013-10-22 10:44 /user/hadoop/small/part-m-00000

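2. To import the table into HBase on another cluster, first put part-m-00000 into that cluster's HDFS. Assume it is put under the same path:

/user/hadoop/small/

The target HBase must also already have the same table (same name and column family).

Then import the data from HDFS into HBase:

# hbase org.apache.hadoop.hbase.mapreduce.Driver import small /user/hadoop/small/part-m-00000

If nothing goes wrong, the data is now imported into the other HBase cluster.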
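Put together, the round trip looks roughly like the sketch below. The helpers `export_plan` and `import_plan` are made up for illustration and only print commands. How part-m-00000 travels between the two clusters (here: `dfs -get` on one side, `dfs -put` on the other, with a copy such as scp in between) is an assumption. The bin/hbase and bin/hadoop commands are meant to be run from their respective install directories.

```shell
#!/bin/sh
# Commands to run on the source cluster: export table $1 to HDFS directory $2,
# then fetch the exported file to the local file system.
export_plan() {
    table=$1; dir=$2
    echo "bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export $table $dir"
    echo "bin/hadoop dfs -get $dir/part-m-00000 ./part-m-00000"
}

# Commands to run on the target cluster, where table $1 must already exist:
# upload the file to HDFS directory $2, then import it.
import_plan() {
    table=$1; dir=$2
    echo "bin/hadoop dfs -mkdir $dir"
    echo "bin/hadoop dfs -put ./part-m-00000 $dir/"
    echo "bin/hbase org.apache.hadoop.hbase.mapreduce.Driver import $table $dir/part-m-00000"
}

export_plan small small
import_plan small /user/hadoop/small
```

The import uses the full HDFS path, so it does not depend on which user's home directory relative paths resolve against.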

Web UI

Used to view and monitor the running state of the Hadoop system

Component	Daemon	Default port	Configuration parameter
HDFS	Namenode	50070	dfs.http.address
HDFS	Datanodes	50075	dfs.datanode.http.address
HDFS	Secondarynamenode	50090	dfs.secondary.http.address
HDFS	Backup/Checkpoint node*	50105	dfs.backup.http.address
MR	Jobtracker	50030	mapred.job.tracker.http.address
MR	Tasktrackers	50060	mapred.task.tracker.http.address
HBase	HMaster	60010	hbase.master.info.port
HBase	HRegionServer	60030	hbase.regionserver.info.port
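For a quick reachability check, the default ports in the table can be turned into URLs. A sketch: `ui_port` and `ui_url` are made-up helper names, the ports are the defaults listed above, and a cluster that overrides the configuration parameters will use different ones.

```shell
#!/bin/sh
# Map a daemon name from the table to its default web UI port.
ui_port() {
    case "$1" in
        namenode)          echo 50070 ;;
        datanode)          echo 50075 ;;
        secondarynamenode) echo 50090 ;;
        jobtracker)        echo 50030 ;;
        tasktracker)       echo 50060 ;;
        hmaster)           echo 60010 ;;
        hregionserver)     echo 60030 ;;
        *) return 1 ;;
    esac
}

# Build the web UI URL for a daemon on a host: ui_url HOST DAEMON
ui_url() {
    port=$(ui_port "$2") || return 1
    echo "http://$1:$port/"
}

# Example: print the HTTP status of the HMaster page
#   curl -s -o /dev/null -w '%{http_code}\n' "$(ui_url hadoop01 hmaster)"
```

From a Windows client the hostname only resolves once the hosts file mapping is in place; otherwise pass the IP address as the host.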