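Fully distributed deployment of HBase:
An HBase cluster needs Hadoop installed first. The previous article covered how to deploy a Hadoop cluster, so this one goes straight to deploying HBase:
HBase version: hbase-0.94.12
Note: the HBase version must match the Hadoop version. To check, compare the version number in the name of the hadoop-core jar under hbase-0.94.12/lib/ with your Hadoop version. If they differ, copy the hadoop-core jar from the Hadoop directory over. A version mismatch is exactly what made table creation fail for me.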
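The version check described above can be scripted. The following is a minimal sketch, not a definitive tool: it assumes HBase lives in /usr/local/hbase-0.94.12 and Hadoop in /usr/local/hadoop-1.2.1, and the helper names `jar_version` and `find_jar` are made up for this illustration.

```shell
#!/bin/sh
# Assumed install locations; adjust them to your own layout.
HBASE_HOME=${HBASE_HOME:-/usr/local/hbase-0.94.12}
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop-1.2.1}

# Extract the version from a hadoop-core jar name,
# e.g. /some/dir/hadoop-core-1.2.1.jar -> 1.2.1
jar_version() {
    basename "$1" | sed -e 's/^hadoop-core-//' -e 's/\.jar$//'
}

# Print the first hadoop-core jar found in directory $1 (nothing if none).
find_jar() {
    for f in "$1"/hadoop-core-*.jar; do
        if [ -e "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 0
}

hbase_jar=$(find_jar "$HBASE_HOME/lib")
hadoop_jar=$(find_jar "$HADOOP_HOME")

if [ -z "$hbase_jar" ] || [ -z "$hadoop_jar" ]; then
    echo "hadoop-core jar not found; check HBASE_HOME and HADOOP_HOME"
elif [ "$(jar_version "$hbase_jar")" = "$(jar_version "$hadoop_jar")" ]; then
    echo "hadoop-core versions match: $(jar_version "$hbase_jar")"
else
    echo "version mismatch: $hbase_jar vs $hadoop_jar"
fi
```

If the script reports a mismatch, that is the situation in which copying the jar over, as described above, is needed.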
1. Extract hbase-0.94.12 into /usr/local/
2. Configure hbase-env.sh
# vim conf/hbase-env.sh
export JAVA_HOME=/usr/java/jdk                        # path to the JDK
export HBASE_CLASSPATH=/usr/local/hadoop-1.2.1/conf   # Hadoop configuration directory
export HBASE_MANAGES_ZK=false                         # do not use the bundled ZooKeeper
export HBASE_HEAPSIZE=1024
3. Configure hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop01:9000/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value>
    <description>
      Maximum HStoreFile size. If any one of a column families' HStoreFiles has
      grown to exceed this value, the hosting HRegion is split in two.
      Default: 256M.
    </description>
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>134217728</value>
    <description>
      Memstore will be flushed to disk if size of the memstore exceeds this
      number of bytes. Value is checked by a thread that runs every
      hbase.server.thread.wakefrequency.
    </description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
    <description>Property from ZooKeeper's config zoo.cfg.
      The port at which the clients will connect.
    </description>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>120000</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.tickTime</name>
    <value>6000</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop01,hadoop02,hadoop03</value>
    <description>Comma separated list of servers in the ZooKeeper Quorum.
      For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
      By default this is set to localhost for local and pseudo-distributed
      modes of operation. For a fully-distributed setup, this should be set to
      a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in
      hbase-env.sh this is the list of servers which we will start/stop
      ZooKeeper on.
    </description>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/hadoop/hbase</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>192.168.1.138:60000</value>
  </property>
</configuration>
# vim conf/regionservers    (the equivalent of Hadoop's slaves file)
hadoop01
hadoop02
hadoop03
Next, copy the HBase directory to every other node with scp -r hbase <node name>:/<directory>.
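With more than a couple of nodes, the copy can be driven from the regionservers file. This is only a sketch under a few assumptions: HBase is installed in /usr/local/hbase-0.94.12, the same parent directory exists on every node, and passwordless ssh is already set up from the Hadoop installation. The function name `distribute` and the `DRY_RUN` switch are invented for this example; by default it only prints the commands.

```shell
#!/bin/sh
HBASE_DIR=${HBASE_DIR:-/usr/local/hbase-0.94.12}   # assumed install path
DRY_RUN=${DRY_RUN:-1}                              # 1 = print commands only

# distribute <regionservers file> <hostname of this node>
# Copies HBASE_DIR to every listed host except this node.
distribute() {
    list=$1
    self=$2
    dest=$(dirname "$HBASE_DIR")
    while IFS= read -r host || [ -n "$host" ]; do
        case "$host" in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        if [ "$host" = "$self" ]; then
            continue                # no need to copy to ourselves
        fi
        if [ "$DRY_RUN" = 1 ]; then
            echo "scp -r $HBASE_DIR $host:$dest/"
        else
            scp -r "$HBASE_DIR" "$host:$dest/" < /dev/null
        fi
    done < "$list"
}

# Example call, run on hadoop01:
#   distribute "$HBASE_DIR/conf/regionservers" hadoop01
```

Run it once with the default dry run to inspect the commands, then set DRY_RUN=0 to perform the copy.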
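Then turn off the firewall on every node and start the cluster. Note that the Hadoop cluster must be started first and the HBase cluster second; the order cannot be reversed.
# bin/start-hbase.sh    start all HBase processes
# bin/stop-hbase.sh     stop them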
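The cluster is now up. Open the HBase web page on port 60010 to see the cluster information.
Note: to reach the HBase ports from a Windows machine, turn off the firewall and add the hostname mappings to the hosts file on Windows.
That completes the configuration. When shutting down, stop the HBase cluster first and the Hadoop cluster afterwards.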
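The start and stop ordering is easy to get wrong by hand, so it can be wrapped in two small functions. The sketch below assumes the install paths used earlier and Hadoop 1.x's bin/start-all.sh and bin/stop-all.sh; `run`, `cluster_start` and `cluster_stop` are names made up here. Since HBASE_MANAGES_ZK is false, it also assumes the external ZooKeeper ensemble is already running before HBase starts.

```shell
#!/bin/sh
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop-1.2.1}   # assumed path
HBASE_HOME=${HBASE_HOME:-/usr/local/hbase-0.94.12}    # assumed path
DRY_RUN=${DRY_RUN:-1}                                 # 1 = print commands only

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hadoop first, then HBase.
cluster_start() {
    run "$HADOOP_HOME/bin/start-all.sh" &&
    run "$HBASE_HOME/bin/start-hbase.sh"
}

# HBase first, then Hadoop.
cluster_stop() {
    run "$HBASE_HOME/bin/stop-hbase.sh" &&
    run "$HADOOP_HOME/bin/stop-all.sh"
}
```

Because each step is chained with &&, HBase is not started if the Hadoop start script fails, and Hadoop is not stopped if stopping HBase fails.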
Connecting to HBase and creating a table
# bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.12, r1524863, Fri Sep 20 04:44:41 UTC 2013
hbase(main):001:0>
Create a table named small with a single column family, cf. List all tables to check that it was created, then insert a few values.
hbase(main):003:0> create 'small', 'cf'
0 row(s) in 1.2200 seconds
hbase(main):003:0> list
small
1 row(s) in 0.0550 seconds
hbase(main):004:0> put 'small', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0560 seconds
hbase(main):005:0> put 'small', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0370 seconds
hbase(main):006:0> put 'small', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0450 seconds
Check the inserts by scanning the table
hbase(main):005:0> scan 'small'
Get a single row as follows
hbase(main):008:0> get 'small', 'row1'
Disable and then drop the table to undo everything you just did
hbase(main):012:0> disable 'small'
0 row(s) in 1.0930 seconds
hbase(main):013:0> drop 'small'
0 row(s) in 0.0770 seconds
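The same walkthrough can be replayed non-interactively, which is handy as a smoke test after a fresh deployment. This sketch writes the commands to a temporary file and passes that file to bin/hbase shell; the HBASE_HOME path is an assumption, and the trailing exit is there so the shell does not stay open after the script finishes.

```shell
#!/bin/sh
HBASE_HOME=${HBASE_HOME:-/usr/local/hbase-0.94.12}   # assumed install path
SCRIPT=$(mktemp)                                     # temporary command file

# The same commands as in the interactive session above.
cat > "$SCRIPT" <<'EOF'
create 'small', 'cf'
list
put 'small', 'row1', 'cf:a', 'value1'
put 'small', 'row2', 'cf:b', 'value2'
put 'small', 'row3', 'cf:c', 'value3'
scan 'small'
get 'small', 'row1'
disable 'small'
drop 'small'
exit
EOF

if [ -x "$HBASE_HOME/bin/hbase" ]; then
    "$HBASE_HOME/bin/hbase" shell "$SCRIPT"
else
    echo "hbase not found under $HBASE_HOME; commands were written to $SCRIPT"
fi
```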
Export and import
[hadoop@master hbase-0.94.12]$ bin/hbase org.apache.hadoop.hbase.mapreduce.Driver export small small
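The exported table lands in the small directory under the current user's home directory in HDFS. For example, this is the directory structure in HDFS after the export: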
[hadoop@master hadoop-1.0.4]$ bin/hadoop dfs -ls
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2013-10-22 10:44 /user/hadoop/small
[hadoop@master hadoop-1.0.4]$ bin/hadoop dfs -ls ./small
Found 3 items
-rw-r--r-- 2 hadoop supergroup 0 2013-10-22 10:44 /user/hadoop/small/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2013-10-22 10:44 /user/hadoop/small/_logs
-rw-r--r-- 2 hadoop supergroup 285 2013-10-22 10:44 /user/hadoop/small/part-m-00000
2. To import this table into HBase on another cluster, first put part-m-00000 into that cluster's HDFS. Suppose the put path is again:
/user/hadoop/small/
The target HBase must also already contain a table with the same name and column family.
Then import the data from HDFS into HBase:
# hbase org.apache.hadoop.hbase.mapreduce.Driver import small /user/hadoop/small/part-m-00000
If nothing goes wrong, the HBase data is now imported into the other HBase database.
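Put together, the export and import steps look roughly like the sequence below. This is a dry-run sketch that only prints the commands, labelled by the cluster they run on. The function name `migration_steps`, the /tmp staging path and the target host name are hypothetical. It passes an absolute HDFS path to the import, because relative HDFS paths are resolved against the current user's home directory.

```shell
#!/bin/sh
DRIVER=org.apache.hadoop.hbase.mapreduce.Driver

# migration_steps <table> <host on the target cluster>
# Prints the commands; the table must already exist on the target,
# e.g. created with: create 'small', 'cf'
migration_steps() {
    table=$1
    target=$2
    hdfs_dir=/user/hadoop/$table
    part=part-m-00000
    echo "source\$ bin/hbase $DRIVER export $table $table"
    echo "source\$ bin/hadoop fs -get $hdfs_dir/$part /tmp/$part"
    echo "source\$ scp /tmp/$part $target:/tmp/"
    echo "target\$ bin/hadoop fs -mkdir $hdfs_dir"
    echo "target\$ bin/hadoop fs -put /tmp/$part $hdfs_dir/"
    echo "target\$ bin/hbase $DRIVER import $table $hdfs_dir/$part"
}

# Example: migration_steps small hadoop-b
```

A larger table would be exported as several part files; in that case the whole export directory would need to be copied, not just part-m-00000.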
Web UI
Used to view and monitor the running state of the Hadoop system
Component | Daemon | Default port | Configuration parameter
HDFS | Namenode | 50070 | dfs.http.address
HDFS | Datanodes | 50075 | dfs.datanode.http.address
HDFS | Secondarynamenode | 50090 | dfs.secondary.http.address
HDFS | Backup/Checkpoint node* | 50105 | dfs.backup.http.address
MR | Jobtracker | 50030 | mapred.job.tracker.http.address
MR | Tasktrackers | 50060 | mapred.task.tracker.http.address
HBase | HMaster | 60010 | hbase.master.info.port
HBase | HRegionServer | 60030 | hbase.regionserver.info.port
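For quick checks from the command line, the default ports in the table can be turned into URLs. The sketch below hard-codes the defaults listed above, so it is only valid when the corresponding configuration parameters have not been changed; `default_port` and `ui_url` are names invented for this example.

```shell
#!/bin/sh
# Map a daemon name from the table to its default web UI port.
default_port() {
    case "$1" in
        Namenode)          echo 50070 ;;
        Datanodes)         echo 50075 ;;
        Secondarynamenode) echo 50090 ;;
        Jobtracker)        echo 50030 ;;
        Tasktrackers)      echo 50060 ;;
        HMaster)           echo 60010 ;;
        HRegionServer)     echo 60030 ;;
        *)                 return 1 ;;
    esac
}

# ui_url <host> <daemon> prints the web UI address for that daemon.
ui_url() {
    port=$(default_port "$2") || return 1
    echo "http://$1:$port/"
}

# Example: print the HTTP status code of the HMaster page on hadoop01.
#   curl -s -o /dev/null -w '%{http_code}\n' "$(ui_url hadoop01 HMaster)"
```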