This post covers deploying HBase. Apache HBase provides large-scale tabular storage for Hadoop using the Hadoop Distributed File System (HDFS).
1 Installing HBase
apt-get install hbase
2 HBase Configuration Settings
1 Using DNS with HBase
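HBase uses the local hostname to report its IP address, so DNS must work correctly on every node: both forward and reverse resolution of the hostname should succeed.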
2 Using the Network Time Protocol (NTP) with HBase
The clocks on all nodes in the cluster must also be kept in sync.
3 Setting User Limits for HBase
#vi /etc/security/limits.conf
hdfs - nofile 32768
hbase - nofile 32768
#vi /etc/pam.d/common-session
session required pam_limits.so
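The new limit only applies to sessions opened after the change, so it is worth checking it from a fresh login of the hdfs or hbase user. The following is a minimal sketch, assuming the 32768 value configured above; the helper name check_nofile is hypothetical and not part of HBase.

```shell
# Hypothetical helper: succeed only if the given open-file limit
# meets the required minimum (32768 in the limits.conf lines above).
check_nofile() {
    required=$1
    current=$2
    # "unlimited" always satisfies the requirement.
    if [ "$current" = "unlimited" ]; then
        echo "ok: nofile is unlimited"
        return 0
    fi
    if [ "$current" -ge "$required" ]; then
        echo "ok: nofile=$current"
        return 0
    fi
    echo "too low: nofile=$current, need at least $required"
    return 1
}

# Usage (run in a fresh session of the user that runs HBase or HDFS):
#   check_nofile 32768 "$(ulimit -n)"
```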
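4 Setting dfs.datanode.max.xcievers for HBase
An HDFS DataNode has an upper bound on the number of files it can serve at the same time. Raising this value improves efficiency under HBase; set it to at least 4096 in /etc/hadoop/conf/hdfs-site.xml (the misspelling "xcievers" is part of the property name), then restart HDFS: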
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
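To confirm that the value made it into the file on each DataNode, the property can be read back from hdfs-site.xml. This is only a sketch: the helper name get_property is hypothetical, and it assumes the simple layout shown above, where the name and value elements follow each other.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop-style XML configuration file.
get_property() {
    file=$1
    name=$2
    awk -v want="<name>$name</name>" '
        # Remember that we have seen the property name.
        index($0, want) { found = 1 }
        # The first <value> at or after the name line is the one we want.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$file"
}

# Usage:
#   get_property /etc/hadoop/conf/hdfs-site.xml dfs.datanode.max.xcievers
```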
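3 HBase Standalone Mode
HBase can run in several modes. By default, the HBase configuration files are set up for standalone mode, in which a single JVM hosts the HBase Master, an HBase Region Server, and ZooKeeper.
1 Installing the HBase Master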
apt-get install hbase-master
2 Starting the HBase Master
service hbase-master start
3 Verifying Standalone Mode
http://localhost:60010
4 Installing and Configuring REST
apt-get install hbase-rest
service hbase-rest start
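By default the REST server listens on port 8080. To use a different port, set hbase.rest.port in hbase-site.xml and restart hbase-rest: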
<property>
<name>hbase.rest.port</name>
<value>60050</value>
</property>
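Once the REST server is running, a quick request against its /version resource shows whether it is listening on the configured port. The sketch below only builds the URL; the helper name rest_url is hypothetical, and the defaults (localhost, 60050) simply mirror the configuration above.

```shell
# Hypothetical helper: build a URL for a resource on the HBase REST server.
# Defaults assume the server runs locally on the port configured above.
rest_url() {
    path=${1:-/version}
    host=${2:-localhost}
    port=${3:-60050}
    echo "http://$host:$port$path"
}

# Usage (requires curl and a running hbase-rest service):
#   curl -s "$(rest_url /version)"
#   curl -s "$(rest_url /status/cluster)"
```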
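4 HBase Pseudo-Distributed Mode
Pseudo-distributed mode differs from standalone mode in that each of the component processes runs in a separate JVM.
1 Editing the HBase configuration file /etc/hbase/conf/hbase-site.xml (replace myhost with the hostname of your HDFS NameNode)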
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://myhost:8020/hbase</value>
</property>
2 Creating the /hbase Directory in HDFS
sudo -u hdfs hadoop fs -mkdir /hbase
sudo -u hdfs hadoop fs -chown hbase /hbase
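3 Enabling Pseudo-Distributed Mode
HBase needs other components to be running before it can work properly.
1 Installing and Starting the ZooKeeper Server
ZooKeeper can run on the same machine, on a separate port.
2 Starting the HBase Master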
service hbase-master start
3 Installing and Starting the HBase RegionServer
apt-get install hbase-regionserver
service hbase-regionserver start
4 Verifying Pseudo-Distributed Mode
jps
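In pseudo-distributed mode, jps should list three separate JVMs: HMaster, HRegionServer, and QuorumPeerMain (the ZooKeeper server). As a rough sketch, the check can be scripted; the helper name missing_hbase_daemons is hypothetical, and jps may need to be run as root to see JVMs owned by other users.

```shell
# Hypothetical helper: read `jps` output on stdin and print the names of
# the expected daemons that are NOT running (empty output means all are up).
missing_hbase_daemons() {
    jps_output=$(cat)
    missing=""
    for daemon in QuorumPeerMain HMaster HRegionServer; do
        # jps prints "<pid> <main class>" per line; match the class name exactly.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
            missing="$missing $daemon"
        fi
    done
    # Strip the leading space left by the concatenation above.
    echo "${missing# }"
}

# Usage:
#   jps | missing_hbase_daemons
```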
4 Installing and Starting the HBase Thrift Server
The HBase Thrift Server is an alternative gateway for accessing the HBase server. Thrift mirrors most of the HBase client APIs while enabling popular programming languages to interact with HBase. The Thrift Server is multiplatform and more performant than REST in many situations. Thrift can be run collocated along with the region servers, but should not be collocated with the NameNode or the JobTracker.
apt-get install hbase-thrift
service hbase-thrift start
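5 Deploying Distributed HBase
1 Choosing Where to Deploy
Master node: you will typically run the HBase Master and a ZooKeeper quorum peer here, alongside the NameNode and JobTracker.
Slave nodes: on each node, Cloudera recommends running a Region Server, alongside a TaskTracker (MRv1) and a DataNode.
2 Deploying the Configuration Files
Once you have decided which machine runs which service, edit the configuration files and sync them to the other machines. Moving from pseudo-distributed to fully distributed mode requires adding only one property to hbase-site.xml, which points HBase at the ZooKeeper quorum: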
<property>
<name>hbase.zookeeper.quorum</name>
<value>mymasternode</value>
</property>
The services of an HBase cluster should be started in this order:
1 The ZooKeeper Quorum Peer
2 The HBase Master
3 Each of the HBase RegionServers
At this point the HBase web interface is reachable on port 60010 of the master node.
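The startup order can be written down as a small script so that it is not left to memory. This is a sketch under assumptions: the helper name hbase_start_plan is hypothetical, the hbase-master and hbase-regionserver service names come from the packages installed earlier, and zookeeper-server is an assumed service name for the ZooKeeper package. It only prints the commands, so nothing is started until the output is piped to a shell.

```shell
# Hypothetical helper: print the start commands for one node, in the order
# ZooKeeper -> HBase Master -> RegionServer.
hbase_start_plan() {
    role=$1   # "master" or "slave"
    case "$role" in
        master)
            echo "service zookeeper-server start"
            echo "service hbase-master start"
            ;;
        slave)
            echo "service hbase-regionserver start"
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}

# Usage, as root. Run on the master node first, then on each slave node:
#   hbase_start_plan master | sh
#   hbase_start_plan slave | sh
```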
3 Accessing HBase with the HBase Shell
hbase shell
4 Using MapReduce with HBase
To run MapReduce jobs that use HBase, you need to add the HBase and ZooKeeper JARs to the Hadoop Java classpath.
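One way to do this is through the HADOOP_CLASSPATH environment variable, using the output of the hbase classpath command, which prints the classpath HBase itself uses. The sketch below is an assumption about how you might wire it up, not the only way; the helper name append_classpath is hypothetical, and placing the export in /etc/hadoop/conf/hadoop-env.sh is likewise an assumption.

```shell
# Hypothetical helper: append one entry to a colon-separated classpath,
# without leaving a leading colon when the classpath starts out empty.
append_classpath() {
    current=$1
    extra=$2
    if [ -z "$current" ]; then
        echo "$extra"
    else
        echo "$current:$extra"
    fi
}

# Usage, for example in /etc/hadoop/conf/hadoop-env.sh (requires the hbase
# command to be on the PATH):
#   export HADOOP_CLASSPATH=$(append_classpath "$HADOOP_CLASSPATH" "$(hbase classpath)")
```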
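5 HBase Replication
HBase replication copies data from one HBase cluster to another. The cluster that receives data from user applications is called the master cluster; the cluster that receives data from the master cluster is called the slave cluster. There are three modes: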
1 Master-Slave Replication
2 Master-Master Replication
3 Cyclic Replication
1 Notes on Cluster Replication
*) You make the configuration changes on the master cluster side.
*) In the case of master-master replication, you make the changes on both sides.
*) Replication works at the table-column-family level. The family should exist on all the slaves. (You can have additional, non-replicating families on both sides.)
*) The timestamps of the replicated HLog entries are kept intact. In case of a collision (two entries identical as to row key, column family, column qualifier, and timestamp), only the entry that arrives later will be read.
*) Increment Column Values (ICVs) are treated as simple puts when they are replicated. In the master-master case, this may be undesirable, creating identical counters that overwrite one another.
*) Make sure the master and slave clusters are time-synchronized with each other.
2 Deploying HBase Replication
1 Edit the configuration file hbase-site.xml
<property>
<name>hbase.replication</name>
<value>true</value>
</property>
2 Copy hbase-site.xml to all nodes
3 Restart HBase
4 Run add_peer in the HBase shell on the master cluster
add_peer '<n>', "slave.zookeeper.quorum:zookeeper.clientPort:zookeeper.znode.parent"
example: hbase> add_peer '1', "zk.server.com:2181:/hbase"
5 Once a peer exists, enable replication on each column family you want replicated
disable 'your_table'
alter 'your_table', {NAME => 'family_name', REPLICATION_SCOPE => '1'}
enable 'your_table'
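When several tables or families need the same change, the three shell commands can be generated and piped into hbase shell, which reads commands from standard input. A minimal sketch; the helper name replication_commands is hypothetical, and your_table and family_name are the placeholders used above.

```shell
# Hypothetical helper: print the HBase shell commands that turn on
# replication for one column family of one table.
replication_commands() {
    table=$1
    family=$2
    printf "disable '%s'\n" "$table"
    printf "alter '%s', {NAME => '%s', REPLICATION_SCOPE => '1'}\n" "$table" "$family"
    printf "enable '%s'\n" "$table"
}

# Usage (the table is briefly unavailable while it is disabled):
#   replication_commands your_table family_name | hbase shell
```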
6 List all configured peers from the HBase shell on the master cluster
list_peers
7 Disabling and re-enabling replication at the peer level
disable_peer("<peerID>")
enable_peer("<peerID>")
8 Stopping Replication in an Emergency
stop_replication
9 Initiating Replication of Pre-existing Data
10 Verifying Replicated Data
hadoop jar $HBASE_HOME/hbase-<version>.jar verifyrep [--starttime=timestamp1] [--stoptime=timestamp] [--families=comma separated list of families] <peerId> <tablename>