This post belongs to my Hadoop environment-setup series. Tencent Cloud and Baidu Cloud both offer ready-built environments you can use directly, but setting one up with your own hands teaches you considerably more.

Table of Contents

(1) Preparing the software environment

(2) Installing ZooKeeper

(3) Installing and configuring HBase

(4) Testing HBase


(1) Preparing the software environment

A working Hadoop environment, i.e. one in which Hadoop already runs. See my previous post:

A Detailed Guide to Installing and Configuring Hadoop 3.1.2: Standalone, Pseudo-Distributed, and Fully Distributed

HBase package: can be downloaded from http://mirror.bit.edu.cn/apache/hbase/

ZooKeeper package: version 3.4.6 is used here

(2) Installing ZooKeeper

First install ZooKeeper. Extract the tar package from the soft directory into modules under the home directory, matching the Hadoop installation location:
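A minimal sketch of the extraction, assuming the tarball file name below (adjust it to the file you actually downloaded):

# Extract the ZooKeeper tarball into ~/modules, alongside the Hadoop installation
tar -zxvf ~/soft/zookeeper-3.4.6.tar.gz -C ~/modules/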


Once extraction is complete, the next steps are the environment variables and the configuration file. First set the environment variables for ZooKeeper: as with the components installed earlier, add the ZooKeeper path to the bashrc file.

#setting for zookeeper

export ZOOKEEPER_HOME=/home/hadoop/modules/zookeeper-3.4.6

export PATH=$PATH:$ZOOKEEPER_HOME/bin

Once the variables are set, run the source command to make them take effect.
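For example, given that the lines above were appended to ~/.bashrc:

source ~/.bashrc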


ZooKeeper's configuration file lives in the conf folder under the installation directory.


Rename zoo_sample.cfg to zoo.cfg, then set the data directory and log directory in it (dataDir and dataLogDir below), and register the server node and its ports. This lab environment is pseudo-distributed, so there is only one node.

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/home/hadoop/tmp/zookeeper/data
dataLogDir=/home/hadoop/tmp/zookeeper/logs
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=big01:2888:3888

When you are done editing, save zoo.cfg.
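One easily missed detail: zoo.cfg only declares the data and log directories; they still have to exist on disk. And because a server.1 entry is registered, ZooKeeper conventionally expects a myid file inside dataDir whose content matches that server number. A short sketch using the paths from the zoo.cfg above:

# Create the directories declared as dataDir and dataLogDir
mkdir -p /home/hadoop/tmp/zookeeper/data /home/hadoop/tmp/zookeeper/logs
# Write the server id; it must match the N in server.N=big01:2888:3888
echo 1 > /home/hadoop/tmp/zookeeper/data/myid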


Now the ZooKeeper service can be started. Go to the bin folder under the installation directory.


Then run the zkServer.sh script: at the command line, enter ./zkServer.sh start to launch the ZooKeeper service.


You can append status to ./zkServer.sh to check the state of the ZooKeeper process:
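Both commands, run from the bin directory (status reports, among other things, whether the instance runs standalone):

./zkServer.sh start     # launch the ZooKeeper service
./zkServer.sh status    # report the current state of the process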


(3) Installing and configuring HBase

1. Extract the installation package:

[hadoop@master ~]$ tar -zxvf hbase-2.2.0-bin.tar.gz

2. Set the environment variables:

[root@master ~]# vi /etc/profile
#setting for hbase
export HBASE_HOME=/home/hadoop/hbase-2.2.0
export PATH=$HBASE_HOME/bin:$PATH

After saving, run source /etc/profile to apply the change.

3. Edit the configuration files. Go to the conf folder under the HBase installation directory; the files to modify are mainly hbase-env.sh, hbase-site.xml, and regionservers.

[hadoop@master]$ cd hbase-2.2.0/conf
[hadoop@master]$ vi hbase-env.sh
# The java implementation to use.  Java 1.8+ required. 
export JAVA_HOME=/home/hadoop/jdk1.8.0_11

# Extra Java CLASSPATH elements.  Optional.
export HBASE_CLASSPATH=/home/hadoop/hbase-2.2.0/conf

Next, edit hbase-site.xml:

<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://master:9000/hbase</value>
        </property>

        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>master</value>
                <description>Comma-separated list of servers in the ZooKeeper quorum.
                </description>
        </property>

        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>

        <property>
                <name>hbase.master.info.port</name>
                <value>16010</value>
        </property>

        <property>
                <name>hbase.unsafe.stream.capability.enforce</name>
                <value>false</value>
        </property>
</configuration>

Edit the regionservers file to set the RegionServer nodes.

Open the file with vi; the default entry is localhost. Change it to this machine's hostname, which in this walkthrough is master. The resulting one-line file is shown below.
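After the edit the whole file is a single line; verifying with cat (the conf path and hostname match this walkthrough):

[hadoop@master conf]$ cat regionservers
master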

4. Start HBase by running ./start-hbase.sh from the bin directory:

[hadoop@master bin]$ ./start-hbase.sh
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-2.2.0/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-2.2.0/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
localhost: running zookeeper, logging to /home/hadoop/hbase-2.2.0/bin/../logs/hbase-hadoop-zookeeper-master.out
running master, logging to /home/hadoop/hbase-2.2.0/bin/../logs/hbase-hadoop-master-master.out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-2.2.0/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
: running regionserver, logging to /home/hadoop/hbase-2.2.0/bin/../logs/hbase-hadoop-regionserver-master.out

Check the running processes:

[hadoop@master bin]$ jps
10595 NodeManager
19224 HMaster
10473 ResourceManager
10090 DataNode
19642 Jps
9947 NameNode
19371 HRegionServer
19167 HQuorumPeer

HMaster, HRegionServer, and HQuorumPeer are all HBase processes; seeing them confirms that HBase started correctly.

5. The status can also be checked from the web UI. Since hbase-site.xml sets the port to 16010, enter the server's IP address and that port in an external browser.
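For example, with the hostname used throughout this walkthrough (substitute the server's IP address if the hostname does not resolve from the browser's machine):

http://master:16010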


Common errors:

1. If this error appears:

java.lang.IllegalStateException: The procedure WAL relies on the ability to hsync for proper operation during component failures, but the underlying filesystem does not support doing so. Please check the config value of 'hbase.procedure.store.wal.use.hsync' to set the desired level of robustness and ensure the config value of 'hbase.wal.dir' points to a FileSystem mount that can provide it.

then add the following to hbase-site.xml:

<property>
        <name>hbase.unsafe.stream.capability.enforce</name>
        <value>false</value>
</property>

2. If running hbase shell produces: ERROR: KeeperErrorCode = NoNode for /hbase/meta-region-server

then ZooKeeper did not start properly.
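A quick way to check, since this setup lets HBase manage its own ZooKeeper (the HQuorumPeer process): look for the process with jps, or ask zkServer.sh directly if you run the standalone ZooKeeper installed earlier:

jps | grep -E 'HQuorumPeer|QuorumPeerMain'    # is any ZooKeeper process running?
./zkServer.sh status                          # run from the ZooKeeper bin directory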

3. If running HBase produces: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutputHelper

then add the following to hbase-site.xml:

<property>
        <name>hbase.wal.provider</name>
        <value>filesystem</value>
</property>


 

(4) Testing HBase

HBase is a typical NoSQL database. Like Redis and MongoDB, it imposes no rigid data model. A relational database defines a complete schema up front, satisfies the normal forms, and stores and reads data row by row; HBase instead stores and reads by column: every cell is addressed by a row key, a column family and qualifier, and a value, plus a timestamp that serves as a version number, so a single cell can retain several versions of its value. HBase is built specifically for distributed storage of very large datasets, so unless you genuinely have big-data volumes that let HBase play to its strengths, using it for ordinary data volumes is rather wasteful.
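To make the cell addressing concrete, here is a small illustrative sketch in the HBase shell (the table name and values are made up): a cell is located by row key plus 'family:qualifier', and a column family must have versioning enabled for older values to be retained.

hbase(main):001:0> create 'demo', {NAME => 'info', VERSIONS => 3}
hbase(main):002:0> put 'demo', 'row1', 'info:name', 'alice'
hbase(main):003:0> put 'demo', 'row1', 'info:name', 'bob'
hbase(main):004:0> get 'demo', 'row1', {COLUMN => 'info:name', VERSIONS => 3}

The final get returns both versions of the info:name cell, newest first.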

1. From the current user's home directory, run the hbase shell command to start working with HBase:

[hadoop@master ~]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.1.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-2.2.0/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.2.0, rUnknown, Tue Jun 11 04:30:30 UTC 2019
Took 0.0016 seconds                                                                                                                                   
hbase(main):001:0> exit

Once the hbase(main):001:0> prompt appears, you can type commands at it.

You can type help to see the built-in documentation:

hbase(main):003:0> help
HBase Shell, version 2.2.0, rUnknown, Tue Jun 11 04:30:30 UTC 2019
Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.
Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.

COMMAND GROUPS:
  Group name: general
  Commands: processlist, status, table_help, version, whoami

  Group name: ddl
  Commands: alter, alter_async, alter_status, clone_table_schema, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters

  Group name: namespace
  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml
  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

2. Create a namespace. In HBase the role of a database name is played by a namespace, which you can think of as a collection for one business domain or project. Create one with create_namespace, list the existing ones with list_namespace, and delete one with drop_namespace.

hbase(main):004:0> list_namespace
NAMESPACE                                                                                                                                             
default                                                                                                                                               
hbase                                                                                                                                                 
stuinfo                                                                                                                                               
3 row(s)
Took 0.0254 seconds                                                                                                                                   
hbase(main):005:0> create_namespace 'sinaWeiboData'
Took 0.4083 seconds                                                                                                                                   
hbase(main):006:0> list_namespace
NAMESPACE                                                                                                                                             
default                                                                                                                                               
hbase                                                                                                                                                 
sinaWeiboData                                                                                                                                         
stuinfo                                                                                                                                               
4 row(s)
Took 0.0186 seconds

3. Table operations within a namespace. Once a namespace such as sinaWeiboData exists, you can add record tables to it. Because storage is organized by column, what you declare when creating a table are its column families.

Use the form create 'namespace:table_name', 'column_family_1', 'column_family_2'. After the table is created, you can inspect its structure with describe:

hbase(main):010:0> create 'sinaWeiboData:logs','user','record'
Created table sinaWeiboData:logs
Took 2.4328 seconds                                                                                                                                   
=> Hbase::Table - sinaWeiboData:logs
hbase(main):011:0> list
TABLE                                                                                                                                                 
sinaWeiboData:logs                                                                                                                                    
user                                                                                                                                                  
2 row(s)
Took 0.0075 seconds                                                                                                                                   
=> ["sinaWeiboData:logs", "user"]
hbase(main):013:0> describe 'sinaWeiboData:logs'
Table sinaWeiboData:logs is ENABLED                                                                                                                   
sinaWeiboData:logs                                                                                                                                    
COLUMN FAMILIES DESCRIPTION                                                                                                                           
{NAME => 'record', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WR
ITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_W
RITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'tru
e', BLOCKSIZE => '65536'}                                                                                                                             

{NAME => 'user', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRIT
E => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRI
TE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true'
, BLOCKSIZE => '65536'}                                                                                                                               

2 row(s)

QUOTAS                                                                                                                                                
0 row(s)
Took 0.3043 seconds

4. Insert cell data. In HBase, data is added with the put command. Note that a single put writes one value into one column of one row of one table, i.e. one cell at a time, so inserting data directly through shell commands is very inefficient; in practice, data is almost always written programmatically.

For example, first put a value into the user column family of row 1001 of the logs table, then put a value into the record column family of the same row:

hbase(main):002:0> put 'sinaWeiboData:logs','1001','user','caojianhua'
Took 0.2171 seconds                                                                                                                                   
hbase(main):003:0> put 'sinaWeiboData:logs','1001','record','visiting all the news and be focused by 333 fans'
Took 0.0106 seconds

5. Read records with the get command. The format is: get 'table_name', 'row_key', optionally followed by a column.

hbase(main):004:0> get 'sinaWeiboData:logs','1001'
COLUMN                                 CELL                                                                                                           
 record:                               timestamp=1581038283677, value=visiting all the news and be focused by 333 fans                                
 user:                                 timestamp=1581038245344, value=caojianhua                                                                      
1 row(s)
Took 0.0481 seconds

You can also retrieve rows with a scan, though this becomes slow once the table holds a lot of data (a way to bound the scan is sketched after the output below):

hbase(main):005:0> scan 'sinaWeiboData:logs'
ROW                                    COLUMN+CELL                                                                                                    
 1001                                  column=record:, timestamp=1581038283677, value=visiting all the news and be focused by 333 fans                
 1001                                  column=user:, timestamp=1581038245344, value=caojianhua                                                        
1 row(s)
Took 0.0463 seconds
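
To keep scans affordable on a large table, bound them. A minimal sketch using the shell's standard scan options (the row range here is illustrative):

hbase(main):006:0> scan 'sinaWeiboData:logs', {LIMIT => 10}
hbase(main):007:0> scan 'sinaWeiboData:logs', {STARTROW => '1001', STOPROW => '1002'}

The first call returns at most 10 rows; the second returns only rows whose keys fall in the range [1001, 1002).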