Apache HBase article series

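1. hbase-2.1.0: introduction, distributed cluster deployment, HA cluster deployment, verification, and recommended hardware configuration
2. hbase-2.1.0: basic shell operations in detail
3. Basic HBase Java API operations (creating and deleting tables; adding, deleting, and querying data; multi-condition queries)
4. Using HBase (namespaces, data partitioning, rowkey design, accessing HBase through the native API)
5. Apache Phoenix (5.0.0-5.1.2): introduction, deployment, and usage (basic usage, combined usage, secondary index examples), with data partitioning examples
6. HBase bulk load (example 1: basic usage)
7. HBase bulk load (example 2: writing tens of millions of rows; MySQL data is written to HDFS as ORCFile and then imported into HBase)
8. HBase bulk load (example 3: writing tens of millions of rows; MySQL data is written directly in the format HBase requires and then imported into HBase)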




This article covers three topics: an introduction to the HBase data model, common shell operations with examples, and shell administration operations with examples. It assumes a working HBase environment.

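I. HBase Data Model

1. Introduction

In HBase, data is stored in tables with rows and columns. This looks like a relational database (RDBMS), but an HBase table is easier to understand as a multi-dimensional map. Example: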

{
  "col1" : "hbase",
  "col2" : "hello",
  "col3" : "world"
}
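The multi-dimensional map idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how HBase stores data: the row key `row1`, the column family `C1`, and the helper `read_cell` are hypothetical names, and the timestamps are placeholders. The column names and values are the ones from the example above.

```python
# Hypothetical illustration: an HBase table viewed as nested maps.
# row key -> column family -> column qualifier -> {timestamp: value}
table = {
    "row1": {
        "C1": {
            "col1": {1: "hbase"},
            "col2": {1: "hello"},
            "col3": {1: "world"},
        }
    }
}


def read_cell(table, row, family, qualifier):
    """Return the value with the highest timestamp, or None if the cell is absent."""
    versions = table.get(row, {}).get(family, {}).get(qualifier)
    if not versions:
        return None
    return versions[max(versions)]


print(read_cell(table, "row1", "C1", "col2"))  # hello
print(read_cell(table, "row2", "C1", "col2"))  # None
```

Each level of nesting corresponds to one coordinate of a value: row key, column family, column qualifier, and timestamp.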

2. Concepts

1) Table

All data in HBase is organized into tables, and each table consists of multiple rows. The tables that currently exist can be viewed in the HBase Web UI (http://server1:16010).

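2) Row

A row in HBase consists of a rowkey and one or more columns, and each column value is associated with the rowkey and the column. Rows are stored sorted by rowkey in lexicographic (byte) order.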
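Lexicographic ordering compares rowkeys byte by byte, which differs from numeric ordering. The short Python sketch below shows the effect with made-up keys, and why fixed-width, zero-padded keys such as 000001 sort as expected. The helper `padded_key` is a hypothetical name used only for this illustration.

```python
# Rowkeys are byte arrays; sorting them byte by byte mimics HBase row order.
row_keys = ["2", "10", "1", "000010", "000002"]
sorted_keys = sorted(k.encode("utf-8") for k in row_keys)
print(sorted_keys)  # [b'000002', b'000010', b'1', b'10', b'2']


def padded_key(number, width=6):
    """Zero-pad a number so that byte order matches numeric order."""
    return str(number).zfill(width)


print(sorted(padded_key(n) for n in (10, 2, 1)))  # ['000001', '000002', '000010']
```

Note that "10" sorts before "2" because the first byte "1" is smaller than "2".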

3) Column

A column in HBase consists of a column family and a column qualifier, written as family:qualifier, for example C1:USER_ID and C1:SEX.

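4) Column Family

Each column family has a set of storage properties, for example:

  • whether its data should be cached in memory
  • how its data is compressed and how its rowkeys are encoded

Every row in a table has the same column families, although a given row may store nothing in a given column family. The columns of one column family are stored together physically (in files on HDFS). The HBase reference guide recommends keeping the number of column families small and placing columns of the same kind in the same column family.

5) Column Qualifier

A column family contains individual column qualifiers, which provide an index into the stored data. Column families are fixed when the table is created, but column qualifiers are unrestricted, so different rows may have different column qualifiers.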

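6) Cell

A cell is the combination of a row, a column family, and a column qualifier. It contains a value and a timestamp, which represents the version of the value. The content of a cell is stored as raw bytes.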

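II. Common Shell Operations

When an HBase cluster is created, two special namespaces are predefined:

  • hbase - the system namespace, which contains HBase internal tables
  • default - tables created without an explicit namespace automatically fall into this namespace

1. Creating a table

  • Creating a table requires two values: the table name and at least one column family name.
  • All other settings are optional. They are constraints on the table (in practice on the column family), such as compression, timestamps, and versions, and are added as production requirements dictate. Each attribute can be set on its own; attributes that are not set take their default values.
  • In HBase, all data is stored in tables. To store order data in HBase, the table must be created first.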

1) Starting the HBase shell

The HBase shell is JRuby's IRB (interactive Ruby) with some HBase-specific commands added.

[alanchan@server1 conf]$ hbase shell
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
Version 2.1.0, re1673bb0bbfea21d6e5dba73e013b09b8b49b89b, Tue Jul 10 17:26:48 CST 2018
Took 0.0025 seconds                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
hbase(main):001:0> 

2) Creating a table

Syntax: create 'table name', 'column family name', ...

Example: create an order table named ORDER with one column family, C1.

create 'ORDER','C1'

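# Note: table / column family attributes in HBase:
# NAME => '', the column family name
# VERSIONS => '1', number of versions. By default one version is kept and extra versions are deleted; often tuned in production, and can be set higher
# EVICT_BLOCKS_ON_CLOSE => 'false', whether to evict cached blocks from the block cache when a file is closed
# NEW_VERSION_BEHAVIOR => 'false', optional new version behavior, an HBase 2 feature related to deletes
# KEEP_DELETED_CELLS => 'FALSE', whether the column family keeps deleted cells. If true, deleted cells can still be retrieved. By default deleted cells are not kept (false). See the Apache HBase ™ Reference Guide for details
# CACHE_DATA_ON_WRITE => 'false', whether to cache data blocks on write
# DATA_BLOCK_ENCODING => 'NONE', encoding of data blocks. HBase 2.x offers PREFIX, DIFF, FAST_DIFF, and ROW_INDEX_V1 (PREFIX_TREE was removed in HBase 2.0)
# TTL => 'FOREVER', time to live. A column family can set a TTL in seconds; once the expiration time is reached, HBase automatically deletes the rows. This applies to all versions of a row, even the current one. The TTL time encoded in HBase for the row is specified in UTC. Very commonly used: production tables usually set a TTL (for example one month), similar to a data lifecycle in a data warehouse, so that data does not grow without bound
# MIN_VERSIONS => '0', only takes effect when a TTL is set on the table
# REPLICATION_SCOPE => '0', a column-family-level attribute whose value is 0 or 1. 0 disables replication and 1 enables it. The default is 0
# BLOOMFILTER => 'ROW', Bloom filter level, row level by default
# CACHE_INDEX_ON_WRITE => 'false', whether to cache index blocks on write
# IN_MEMORY => 'false', whether blocks of this column family get in-memory priority in the block cache. Default false. When true, HBase caches the family more aggressively, but this does not guarantee that the whole family stays in memory, and data is still persisted to disk as usual
# CACHE_BLOOMS_ON_WRITE => 'false', whether to cache Bloom filter blocks on write
# PREFETCH_BLOCKS_ON_OPEN => 'false', whether to prefetch blocks when a file is opened, default false
# COMPRESSION => 'NONE', whether data is compressed and with which algorithm (snappy, for example). Configured per column family, so different families of one table can use different algorithms
# BLOCKCACHE => 'true', whether the block cache is enabled, enabled by default
# BLOCKSIZE => '65536', HFile data block size (default 64 KB). Rarely changed; if it is changed, this is usually done uniformly for the cluster rather than for a single table or column family

# The attributes above are all column-family-level attributes. They can be set per column family or left at their defaults.
# Creating tables with column family attributes.
# 1. Use the NAME attribute to specify the column family name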
hbase(main):040:0> create 't4', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'} 
hbase(main):040:0> create 't4', 'f1','f2','f3' 
# Note: both forms create the same table structure
# 2. Creating tables with other column family attributes
hbase> create 't1', 'f1', 'f2', 'f3' 
hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true} 
hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}} 
hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 1000000, MOB_COMPACT_PARTITION_POLICY => 'weekly'}
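# The statements above use defaults for everything not listed; many other attributes, such as pre-splitting and compression, can be specified when the table is created

# Examples
# 1. Specify an array of split points when creating the table
# n split points define n+1 regions. A split point of '10' actually specifies the bytes '\x31\x30'
hbase>create 't1','f',SPLITS => ['10','20','30']
# Split on the first byte of the rowkey (prefix length 1)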
hbase> create 't',{NAME => 'f0',VERSIONS => 1, COMPRESSION => 'snappy'},{NAME => 'f1',VERSIONS => 10000, COMPRESSION => 'snappy'},CONFIGURATION => {'SPLIT_POLICY' => 'org.apache.hadoop.hbase.regionserver.KeyPrefixRegionSplitPolicy','KeyPrefixRegionSplitPolicy.prefix_length' => '1'},SPLITS => ['2', '4', '6', '8']

# 2. Use SPLITS_FILE to point to a text file containing the split points. Each line in the file specifies one split point key
# Contents of splits.txt:
10
20
30
hbase>create 't1','f',SPLITS_FILE=>'splits.txt'

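# 3. Compute the splits automatically from the desired number of regions and a split algorithm. HBase provides algorithms that split the key range uniformly or by hexadecimal keys, and a custom split algorithm can also be supplied
# Create a table with 4 initial regions using UniformSplit
hbase>create 't','f', { NUMREGIONS => 4 , SPLITALGO => 'UniformSplit' }
# Create a table with 4 initial regions using hex keys
hbase>create 't','f', { NUMREGIONS => 4, SPLITALGO => 'HexStringSplit' }
# Notes:
    # NUMREGIONS: the number of initial regions. For sizing, a region is split once it grows past hbase.hregion.max.filesize (default 10737418240 = 10G)
    # HexStringSplit [larger keys; the rowkey is prefixed with a hexadecimal string]: rows are hex-encoded long values in the range "00000000" => "FFFFFFFF", such as the ASCII representation of an MD5 checksum or any other uniformly distributed hexadecimal value.
    # UniformSplit [smaller keys; the rowkey prefix is fully random]: the key space is divided evenly. Recommended when rowkeys are approximately uniform random bytes (for example hashes).
    # DecimalStringSplit: the rowkey is prefixed with a decimal digit string.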
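To make the pre-splitting rules above concrete, here is a small Python sketch. It is a simplified model under stated assumptions, not HBase's implementation: `region_index` and `hex_split_points` are hypothetical helpers. The first shows how n split points yield n+1 regions, with each split point acting as the first key of the next region. The second approximates the idea behind HexStringSplit by dividing an 8-digit hexadecimal key space evenly; the exact boundaries HBase computes may differ.

```python
from bisect import bisect_right


def region_index(split_points, row_key):
    """Return the index of the region that would hold row_key.

    split_points must be sorted. A key equal to a split point belongs
    to the region on its right, because a split point is a start key.
    """
    return bisect_right(split_points, row_key)


def hex_split_points(num_regions, width=8):
    """Evenly divide a hexadecimal key space of `width` digits (simplified)."""
    span = 16 ** width
    return [format(span * i // num_regions, "0{}x".format(width))
            for i in range(1, num_regions)]


splits = [b"10", b"20", b"30"]
print(b"10" == bytes([0x31, 0x30]))   # True: '10' is the bytes \x31\x30
print(len(splits) + 1)                # 4 regions from 3 split points
print([region_index(splits, k) for k in (b"05", b"10", b"25", b"99")])  # [0, 1, 2, 3]
print(hex_split_points(4))            # ['40000000', '80000000', 'c0000000']
```

Keys that sort before the first split point land in region 0, and keys at or after the last split point land in the last region.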

3) Listing tables

hbase(main):002:0> list
TABLE                                                                                                                                                                                                                                                                    
0 row(s)
Took 0.0358 seconds                                                                                                                                                                                                                                                      
=> []
hbase(main):003:0> create 'order','C1';
hbase(main):004:0* list
Created table order
Took 2.5173 seconds                                                                                                                                                                                                                                                      
TABLE                                                                                                                                                                                                                                                                    
order                                                                                                                                                                                                                                                                    
1 row(s)
Took 0.0106 seconds                                                                                                                                                                                                                                                      
=> ["order"]

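4) Disabling a table

Syntax: disable 'table name'

5) Dropping a table

A table must be disabled before it can be dropped.
Syntax: drop 'table name'
Example: drop the order table ('order', created in the previous step).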

hbase(main):004:0* list
Created table order
Took 2.5173 seconds                                                                                                                                                                                                                                                      
TABLE                                                                                                                                                                                                                                                                    
order                                                                                                                                                                                                                                                                    
1 row(s)
Took 0.0106 seconds                                                                                                                                                                                                                                                      
=> ["order"]
hbase(main):005:0> disable 'order';
hbase(main):006:0* drop 'order';
hbase(main):007:0* list
Took 1.6657 seconds                                                                                                                                                                                                                                                      
Took 0.8121 seconds                                                                                                                                                                                                                                                      
TABLE                                                                                                                                                                                                                                                                    
0 row(s)
Took 0.0046 seconds                                                                                                                                                                                                                                                      
=> []

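2. Adding data

Add one order record to the order table.

1) The put command

The put command saves data to a table, but each put stores the value of only one column. Syntax:

put 'table name','ROWKEY','column family:column qualifier','value'

# Create the table first (see the previous section)
# Adding the record takes 7 put operations, one per column: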
create 'orderinfo','C1' 

put 'orderinfo','000001','C1:ID','000001'
put 'orderinfo','000001','C1:STATUS','已提交'
put 'orderinfo','000001','C1:PAY_MONEY','4070'
put 'orderinfo','000001','C1:PAYWAY','1'
put 'orderinfo','000001','C1:USER_ID','4944191'
put 'orderinfo','000001','C1:OPERATION_DATE','2020-04-25 12:09:16'
put 'orderinfo','000001','C1:CATEGORY','手机;'

hbase(main):011:0> put 'orderinfo','000001','C1:ID','000001';
hbase(main):012:0* put 'orderinfo','000001','C1:STATUS','已提交'
Took 0.0349 seconds                                                                                                                                                                                                                                                      
Took 0.0022 seconds                                                                                                                                                                                                                                                      
hbase(main):013:0> put 'orderinfo','000001','C1:PAY_MONEY','4070'
Took 0.0031 seconds                                                                                                                                                                                                                                                      
hbase(main):014:0> put 'orderinfo','000001','C1:PAYWAY','1'
Took 0.0023 seconds                                                                                                                                                                                                                                                      
hbase(main):015:0> put 'orderinfo','000001','C1:USER_ID','4944191'
Took 0.0039 seconds                                                                                                                                                                                                                                                      
hbase(main):016:0> put 'orderinfo','000001','C1:OPERATION_DATE','2020-04-25 12:09:16'
Took 0.0022 seconds                                                                                                                                                                                                                                                      
hbase(main):017:0> put 'orderinfo','000001','C1:CATEGORY','手机;'
Took 0.0038 seconds 
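Because each put writes a single column, a record with many columns turns into many put statements. The Python sketch below generates those statements from a dictionary so they can be pasted into the shell. The helper `put_commands` is a hypothetical name, and the sketch assumes that values contain no single quotes (it performs no escaping).

```python
def put_commands(table, row_key, family, record):
    """Build one HBase shell put statement per column of the record."""
    return [
        "put '{}','{}','{}:{}','{}'".format(table, row_key, family, column, value)
        for column, value in record.items()
    ]


# The order record used in this section.
order = {
    "ID": "000001",
    "STATUS": "已提交",
    "PAY_MONEY": "4070",
    "PAYWAY": "1",
    "USER_ID": "4944191",
    "OPERATION_DATE": "2020-04-25 12:09:16",
    "CATEGORY": "手机;",
}

commands = put_commands("orderinfo", "000001", "C1", order)
for line in commands:
    print(line)
```

The output reproduces the seven put statements shown above, in the order the columns appear in the dictionary.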

2) Viewing the added data with get

Goal: retrieve the data for rowkey 000001.

In HBase, the get command fetches a single row. Syntax: get 'table name','rowkey'

# Query the data for the given order ID
get 'orderinfo','000001'

hbase(main):020:0> get 'orderinfo','000001'
COLUMN                                                              CELL                                                                                                                                                                                                 
 C1:CATEGORY                                                        timestamp=1664352765105, value=\xE6\x89\x8B\xE6\x9C\xBA;                                                                                                                                             
 C1:ID                                                              timestamp=1664352751623, value=000001                                                                                                                                                                
 C1:OPERATION_DATE                                                  timestamp=1664352763364, value=2020-04-25 12:09:16                                                                                                                                                   
 C1:PAYWAY                                                          timestamp=1664352763327, value=1                                                                                                                                                                     
 C1:PAY_MONEY                                                       timestamp=1664352763314, value=4070                                                                                                                                                                  
 C1:STATUS                                                          timestamp=1664352751638, value=\xE5\xB7\xB2\xE6\x8F\x90\xE4\xBA\xA4                                                                                                                                  
 C1:USER_ID                                                         timestamp=1664352763348, value=4944191                                                                                                                                                               
1 row(s)
Took 0.0183 seconds          

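3) Displaying Chinese characters

If the data contains Chinese (or other non-ASCII) characters, the HBase shell by default displays them as hexadecimal byte escapes. To display them as text, add the option {FORMATTER => 'toString'} to the get command.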

get 'orderinfo','000001', {FORMATTER => 'toString'}

hbase(main):021:0> get 'orderinfo','000001', {FORMATTER => 'toString'}
COLUMN                                                              CELL                                                                                                                                                                                                 
 C1:CATEGORY                                                        timestamp=1664352765105, value=手机;                                                                                                                                                                   
 C1:ID                                                              timestamp=1664352751623, value=000001                                                                                                                                                                
 C1:OPERATION_DATE                                                  timestamp=1664352763364, value=2020-04-25 12:09:16                                                                                                                                                   
 C1:PAYWAY                                                          timestamp=1664352763327, value=1                                                                                                                                                                     
 C1:PAY_MONEY                                                       timestamp=1664352763314, value=4070                                                                                                                                                                  
 C1:STATUS                                                          timestamp=1664352751638, value=已提交                                                                                                                                                                   
 C1:USER_ID                                                         timestamp=1664352763348, value=4944191                                                                                                                                                               
1 row(s)
Took 0.0314 seconds 

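Notes:

  • { key => value } is Ruby syntax that defines a hash.
  • get is an HBase Ruby method; 'orderinfo', '000001', and {FORMATTER => 'toString'} are the three arguments of the get method.
  • FORMATTER must be written in uppercase.
  • In Ruby, {} denotes a dictionary, similar to a hashtable; FORMATTER is the key and 'toString' is the value.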
3. Updating data with put

Change the status of order ID 000001 to 已付款 (paid).

In HBase, updates are also done with the put command; the syntax is the same as for adding data.

put 'orderinfo', '000001', 'C1:STATUS', '已付款'

hbase(main):022:0> put 'orderinfo', '000001', 'C1:STATUS', '已付款'
Took 0.0053 seconds                                                                                                                                                                                                                                                      
hbase(main):023:0> get 'orderinfo','000001', {FORMATTER => 'toString'}
COLUMN                                                              CELL                                                                                                                                                                                                 
 C1:CATEGORY                                                        timestamp=1664352765105, value=手机;                                                                                                                                                                   
 C1:ID                                                              timestamp=1664352751623, value=000001                                                                                                                                                                
 C1:OPERATION_DATE                                                  timestamp=1664352763364, value=2020-04-25 12:09:16                                                                                                                                                   
 C1:PAYWAY                                                          timestamp=1664352763327, value=1                                                                                                                                                                     
 C1:PAY_MONEY                                                       timestamp=1664352763314, value=4070                                                                                                                                                                  
 C1:STATUS                                                          timestamp=1664353267667, value=已付款                                                                                                                                                                   
 C1:USER_ID                                                         timestamp=1664352763348, value=4944191                                                                                                                                                               
1 row(s)
Took 0.0132 seconds                                                             

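Note:

  • HBase maintains versions of the data automatically.
  • Each put generates a new timestamp, as the two STATUS versions show (已提交 = submitted, 已付款 = paid):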
 C1:STATUS                                                          timestamp=1664352751638, value=已提交
 C1:STATUS                                                          timestamp=1664353267667, value=已付款
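The relationship between puts, timestamps, and the value a read returns can be modeled as a map from timestamp to value. This is a conceptual sketch only, using the two timestamps shown above; `newest_first` is a hypothetical helper, not an HBase API.

```python
def newest_first(versions, limit=1):
    """versions maps timestamp -> value; return up to `limit` pairs, newest first."""
    ordered = sorted(versions.items(), reverse=True)
    return ordered[:limit]


status = {}
status[1664352751638] = "已提交"   # first put
status[1664353267667] = "已付款"   # second put gets a newer timestamp

print(newest_first(status))            # [(1664353267667, '已付款')]
print(newest_first(status, limit=2))   # both versions, newest first
```

A plain read corresponds to `limit=1`: it returns only the value with the highest timestamp, which is why the get output shows 已付款 after the update.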

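4. Delete operations

When data is deleted, HBase does not remove it immediately. It writes a marker (a tombstone) for the column, and queries skip data covered by the marker. The data is physically removed later, during a major compaction.

1) The delete command

The delete command removes the value of a specified column, that is, the data of one cell. Syntax: delete 'table name', 'rowkey', 'column family:column qualifier'
Note: a cell can hold several timestamped versions, and delete without a timestamp removes only the newest version. An older version that has not yet been cleaned up becomes visible again, as the transcript shows.

# Delete the STATUS column of order ID 000001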
delete 'orderinfo','000001','C1:STATUS'

hbase(main):023:0> get 'orderinfo','000001', {FORMATTER => 'toString'}
COLUMN                                                              CELL                                                                                                                                                                                                 
 C1:CATEGORY                                                        timestamp=1664352765105, value=手机;                                                                                                                                                                   
 C1:ID                                                              timestamp=1664352751623, value=000001                                                                                                                                                                
 C1:OPERATION_DATE                                                  timestamp=1664352763364, value=2020-04-25 12:09:16                                                                                                                                                   
 C1:PAYWAY                                                          timestamp=1664352763327, value=1                                                                                                                                                                     
 C1:PAY_MONEY                                                       timestamp=1664352763314, value=4070                                                                                                                                                                  
 C1:STATUS                                                          timestamp=1664353267667, value=已付款                                                                                                                                                                   
 C1:USER_ID                                                         timestamp=1664352763348, value=4944191                                                                                                                                                               
1 row(s)
Took 0.0132 seconds                                                                                                                                                                                                                                                      
hbase(main):024:0> delete 'orderinfo','000001','C1:STATUS'
Took 0.0177 seconds                                                                                                                                                                                                                                                      
hbase(main):025:0> get 'orderinfo','000001', {FORMATTER => 'toString'}
COLUMN                                                              CELL                                                                                                                                                                                                 
 C1:CATEGORY                                                        timestamp=1664352765105, value=手机;                                                                                                                                                                   
 C1:ID                                                              timestamp=1664352751623, value=000001                                                                                                                                                                
 C1:OPERATION_DATE                                                  timestamp=1664352763364, value=2020-04-25 12:09:16                                                                                                                                                   
 C1:PAYWAY                                                          timestamp=1664352763327, value=1                                                                                                                                                                     
 C1:PAY_MONEY                                                       timestamp=1664352763314, value=4070                                                                                                                                                                  
 C1:STATUS                                                          timestamp=1664352751638, value=已提交                                                                                                                                                                   
 C1:USER_ID                                                         timestamp=1664352763348, value=4944191                                                                                                                                                               
1 row(s)
Took 0.0124 seconds      

The value previously updated to 已付款 is gone; only the 已提交 version remains:

 C1:STATUS                                                          timestamp=1664352751638, value=已提交 

If the column has no data left, it is no longer displayed:

hbase(main):045:0> delete 'orderinfo','000001','C1:STATUS'
Took 0.0100 seconds                                                                                                                                                                                                                                                      
hbase(main):046:0> get 'orderinfo','000001', {FORMATTER => 'toString'}
COLUMN                                                              CELL                                                                                                                                                                                                 
 C1:CATEGORY                                                        timestamp=1664354127539, value=手机;                                                                                                                                                                   
 C1:ID                                                              timestamp=1664354126226, value=000001                                                                                                                                                                
 C1:OPERATION_DATE                                                  timestamp=1664354126294, value=2020-04-25 12:09:16                                                                                                                                                   
 C1:PAYWAY                                                          timestamp=1664354126270, value=1                                                                                                                                                                     
 C1:PAY_MONEY                                                       timestamp=1664354126256, value=4070                                                                                                                                                                  
 C1:USER_ID                                                         timestamp=1664354126281, value=4944191
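The behavior seen in these transcripts (the first delete reveals the older value, the second makes the column disappear) follows from delete markers hiding individual versions. The Python sketch below models that behavior under simplifying assumptions: the class `Cell` and its methods are hypothetical, and real HBase also applies version limits and compactions that are ignored here.

```python
class Cell:
    """Toy model of one cell with versions and delete markers (tombstones)."""

    def __init__(self):
        self.versions = {}       # timestamp -> value
        self.tombstones = set()  # timestamps hidden by a delete marker

    def put(self, timestamp, value):
        self.versions[timestamp] = value

    def _visible(self):
        return sorted(ts for ts in self.versions if ts not in self.tombstones)

    def delete_latest(self):
        """Mark the newest visible version as deleted; nothing is erased."""
        visible = self._visible()
        if visible:
            self.tombstones.add(visible[-1])

    def get(self):
        """Return the newest visible value, or None if every version is hidden."""
        visible = self._visible()
        return self.versions[visible[-1]] if visible else None


status = Cell()
status.put(1664352751638, "已提交")
status.put(1664353267667, "已付款")
status.delete_latest()
print(status.get())   # 已提交
status.delete_latest()
print(status.get())   # None
```

In this model the data is still present in `versions` after both deletes, which mirrors the point that deletion only sets a marker until the data is cleaned up later.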
 

2) Deleting an entire row with deleteall

The deleteall command deletes all columns for the specified rowkey. Syntax: deleteall 'table name','rowkey'

Example: delete the order with ID 000001 entirely (all columns).

deleteall 'orderinfo','000001'

hbase(main):026:0> deleteall 'orderinfo','000001'
Took 0.0056 seconds                                                                                                                                                                                                                                                      
hbase(main):027:0> get 'orderinfo','000001', {FORMATTER => 'toString'}
COLUMN                                                              CELL                                                                                                                                                                                                 
0 row(s)
Took 0.0065 seconds 

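3) Clearing a table with truncate

The truncate command removes all data from a table. It does so by disabling, dropping, and recreating the table, so pre-split region boundaries are lost while column family settings such as compression are kept. Syntax: truncate 'table name'
Example: delete all data in orderinfo.

# Clear all data in orderinfo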
truncate 'orderinfo'

put 'orderinfo','000001','C1:ID','000001'
put 'orderinfo','000001','C1:STATUS','已提交'
put 'orderinfo','000001','C1:PAY_MONEY','4070'
put 'orderinfo','000001','C1:PAYWAY','1'
put 'orderinfo','000001','C1:USER_ID','4944191'
put 'orderinfo','000001','C1:OPERATION_DATE','2020-04-25 12:09:16'
put 'orderinfo','000001','C1:CATEGORY','手机;'

put 'orderinfo','000002','C1:ID','000002'
put 'orderinfo','000002','C1:STATUS','已提交'
put 'orderinfo','000002','C1:PAY_MONEY','88480'
put 'orderinfo','000002','C1:PAYWAY','1'
put 'orderinfo','000002','C1:USER_ID','654321'
put 'orderinfo','000002','C1:OPERATION_DATE','2022-04-25 12:09:16'
put 'orderinfo','000002','C1:CATEGORY','computer;'

hbase(main):042:0> get 'orderinfo','000001', {FORMATTER => 'toString'}
COLUMN                                                              CELL                                                                                                                                                                                                 
 C1:CATEGORY                                                        timestamp=1664354127539, value=手机;                                                                                                                                                                   
 C1:ID                                                              timestamp=1664354126226, value=000001                                                                                                                                                                
 C1:OPERATION_DATE                                                  timestamp=1664354126294, value=2020-04-25 12:09:16                                                                                                                                                   
 C1:PAYWAY                                                          timestamp=1664354126270, value=1                                                                                                                                                                     
 C1:PAY_MONEY                                                       timestamp=1664354126256, value=4070                                                                                                                                                                  
 C1:STATUS                                                          timestamp=1664354126240, value=已提交                                                                                                                                                                   
 C1:USER_ID                                                         timestamp=1664354126281, value=4944191                                                                                                                                                               
1 row(s)
Took 0.0089 seconds                                                                                                                                                                                                                                                      
hbase(main):043:0> get 'orderinfo','000002', {FORMATTER => 'toString'}
COLUMN                                                              CELL                                                                                                                                                                                                 
 C1:CATEGORY                                                        timestamp=1664354138837, value=computer;                                                                                                                                                             
 C1:ID                                                              timestamp=1664354137455, value=000002                                                                                                                                                                
 C1:OPERATION_DATE                                                  timestamp=1664354137513, value=2022-04-25 12:09:16                                                                                                                                                   
 C1:PAYWAY                                                          timestamp=1664354137490, value=1                                                                                                                                                                     
 C1:PAY_MONEY                                                       timestamp=1664354137478, value=88480                                                                                                                                                                 
 C1:STATUS                                                          timestamp=1664354137467, value=已提交                                                                                                                                                                   
 C1:USER_ID                                                         timestamp=1664354137502, value=654321                                                                                                                                                                
1 row(s)
Took 0.0073 seconds 
hbase(main):047:0> truncate 'orderinfo'
Truncating 'orderinfo' table (it may take a while):
Disabling table...
Truncating table...
Took 3.7296 seconds                                                                                                                                                                                                                                                      
hbase(main):048:0> scan 'orderinfo'
ROW                                                                 COLUMN+CELL                                                                                                                                                                                          
0 row(s)
Took 0.1142 seconds        

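5、Importing a dataset

Requirement: import the dataset orderinfo.txt into HBase. The file contains one HBase shell put command per line, in the following format: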

put 'orderinfo' ,'02602f66-adc7-40d4-8485-76b5632b5b53', 'C1:STATUS','已提交'
put 'orderinfo' ,'0968a418-f2bc-49b4-b9a9-2157cf214cfd', 'C1:STATUS','已完成'
put 'orderinfo' ,'0e01edba-5e55-425e-837a-7efb91c56630', 'C1:STATUS','已提交'

Run the file with the hbase shell command to import it:

hbase shell <file name>

# First upload the file to be imported to the target location
[alanchan@server4 testMR]$ hbase shell orderinfo.txt 
Took 0.4314 seconds                                                                                                                                                                                                                                                      
Took 0.0037 seconds                                                                                                                                                                                                                                                      
Took 0.0028 seconds                                                                                                                                                                                                                                                      
Took 0.0032 seconds                                                                                                                                                                                                                                                      
Took 0.0025 seconds                                                                                                                                                                                                                                                      
Took 0.0026 seconds                                                                                                                                                                                                                                                      
Took 0.0025 seconds                                                                                                                                                                                                                                                      
Took 0.0029 seconds                                                                                                                                                                                                                                                      
Took 0.0026 seconds                                                                                                                                                                                                                                                      
Took 0.0026 seconds 
...
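A file like orderinfo.txt is normally generated rather than typed by hand. The following is a minimal plain-Ruby sketch of one way to produce such put lines from in-memory records. The helper name `put_lines`, the sample record, and the quote-escaping rule are assumptions made for illustration; they are not the tooling used to build the original dataset.

```ruby
# Hypothetical helper: build HBase shell put commands for one row.
# columns maps a column qualifier to its value.
def put_lines(table, rowkey, family, columns)
  columns.map do |qualifier, value|
    # Escape backslashes and single quotes so the value stays inside
    # the single-quoted Ruby string that the shell will parse.
    safe = value.to_s.gsub(/[\\']/) { |ch| "\\" + ch }
    "put '#{table}','#{rowkey}','#{family}:#{qualifier}','#{safe}'"
  end
end

# Sample record; the values are copied from the orderinfo output in this article.
record = { 'STATUS' => '已提交', 'PAYWAY' => '1', 'PAY_MONEY' => '4070' }

# Print one put command per line, ready to be redirected into a file.
puts put_lines('orderinfo', '02602f66-adc7-40d4-8485-76b5632b5b53', 'C1', record)
```

Redirecting the printed lines into a text file gives input in the same shape as the sample shown above.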

6、Counting rows

Find out how many rows the orderinfo table in HBase contains.

1)、The count command

The count command counts the rows in a table. Syntax: count 'table name'

hbase(main):050:0> count 'orderinfo'
66 row(s)
Took 0.0638 seconds                                                                                                                                                                                                                                                      
=> 66
# Note: count scans the entire table, so it can run for a long time when the table is large.

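2)、Counting with the RowCounter MapReduce program

For tables that hold a large amount of data, use the RowCounter MapReduce program that ships with HBase to count rows. Syntax: $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'table name'

# Start the YARN cluster
start-yarn.sh
# Start the history server (on Hadoop 3.x, mapred --daemon start historyserver is the non-deprecated equivalent)
mr-jobhistory-daemon.sh start historyserver
# Run the MapReduce job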
$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'orderinfo'

[alanchan@server4 testMR]$ $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'orderinfo'
/usr/local/bigdata/hadoop-3.1.4/libexec/hadoop-functions.sh:行2360: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_USER: 错误的替换
/usr/local/bigdata/hadoop-3.1.4/libexec/hadoop-functions.sh:行2455: HADOOP_ORG.APACHE.HADOOP.HBASE.UTIL.GETJAVAPROPERTY_OPTS: 错误的替换
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hadoop-3.1.4/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/bigdata/hbase-2.1.0/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2022-09-28 17:01:54,073 INFO  [main] impl.TimelineClientImpl: Timeline service address: http://server1:8188/ws/v1/timeline/
2022-09-28 17:01:54,187 INFO  [main] client.AHSProxy: Connecting to Application History server at server1/192.168.10.41:10200
2022-09-28 17:02:00,083 INFO  [main] zookeeper.ReadOnlyZKClient: Connect 0x0dca2615 to server1:2118,server2:2118,server3:2118 with session timeout=90000ms, retries 30, retry interval 1000ms, keepAlive=60000ms
2022-09-28 17:02:00,098 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
2022-09-28 17:02:00,098 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:host.name=server4
2022-09-28 17:02:00,098 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_144
2022-09-28 17:02:00,098 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2022-09-28 17:02:00,098 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.8.0_144/jre
2022-09-28 17:02:00,098 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:java.class.path=...rn/hadoop-yarn-server-common-3.1.4.jar:/usr/local/bigdata/hadoop-3.1.4/share/hadoop/yarn/hadoop-yarn-server-timeline-pluginstorage-3.1.4.jar:/usr/local/bigdata/hadoop-3.1.4/share/hadoop/yarn/hadoop-yarn-server-router-3.1.4.jar:/usr/local/bigdata/hadoop-3.1.4/share/hadoop/yarn/hadoop-yarn-services-api-3.1.4.jar:/usr/local/bigdata/hadoop-3.1.4/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-3.1.4.jar:/usr/local/bigdata/hadoop-3.1.4/share/hadoop/yarn/hadoop-yarn-registry-3.1.4.jar:/usr/local/bigdata/hadoop-3.1.4/share/hadoop/yarn/hadoop-yarn-client-3.1.4.jar:/usr/local/bigdata/hadoop-3.1.4/share/hadoop/yarn/hadoop-yarn-services-core-3.1.4.jar:/usr/local/bigdata/hadoop-3.1.4/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-3.1.4.jar:/usr/local/bigdata/hadoop-3.1.4/share/hadoop/yarn/hadoop-yarn-common-3.1.4.jar:/usr/local/bigdata/hadoop-3.1.4/share/hadoop/yarn/hadoop-yarn-api-3.1.4.jar:/usr/local/bigdata/hbase-2.1.0/lib/client-facing-thirdparty/slf4j-log4j12-1.7.25.jar
2022-09-28 17:02:00,098 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:java.library.path=/usr/local/bigdata/hadoop-3.1.4/lib/native
2022-09-28 17:02:00,099 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2022-09-28 17:02:00,099 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2022-09-28 17:02:00,099 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:os.name=Linux
2022-09-28 17:02:00,099 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2022-09-28 17:02:00,099 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:os.version=2.6.32-696.el6.x86_64
2022-09-28 17:02:00,099 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:user.name=alanchan
2022-09-28 17:02:00,099 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:user.home=/home/alanchan
2022-09-28 17:02:00,099 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Client environment:user.dir=/usr/local/bigdata/hadoop-3.1.4/testMR
2022-09-28 17:02:00,100 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Initiating client connection, connectString=server1:2118,server2:2118,server3:2118 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$$Lambda$16/1583756749@310c600e
2022-09-28 17:02:00,116 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615-SendThread(server2:2118)] zookeeper.ClientCnxn: Opening socket connection to server server2/192.168.10.42:2118. Will not attempt to authenticate using SASL (unknown error)
2022-09-28 17:02:00,117 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615-SendThread(server2:2118)] zookeeper.ClientCnxn: Socket connection established to server2/192.168.10.42:2118, initiating session
2022-09-28 17:02:00,141 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615-SendThread(server2:2118)] zookeeper.ClientCnxn: Session establishment complete on server server2/192.168.10.42:2118, sessionid = 0x2003f52c58c012f, negotiated timeout = 40000
2022-09-28 17:02:00,264 INFO  [main] mapreduce.RegionSizeCalculator: Calculating region sizes for table "orderinfo".
2022-09-28 17:02:00,773 INFO  [main] zookeeper.ReadOnlyZKClient: Close zookeeper connection 0x0dca2615 to server1:2118,server2:2118,server3:2118
2022-09-28 17:02:00,780 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615] zookeeper.ZooKeeper: Session: 0x2003f52c58c012f closed
2022-09-28 17:02:00,781 INFO  [ReadOnlyZKClient-server1:2118,server2:2118,server3:2118@0x0dca2615-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x2003f52c58c012f
2022-09-28 17:02:00,934 INFO  [main] mapreduce.JobSubmitter: number of splits:1
2022-09-28 17:02:00,963 INFO  [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2022-09-28 17:02:01,129 INFO  [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1664255440621_0007
2022-09-28 17:02:01,546 INFO  [main] impl.YarnClientImpl: Submitted application application_1664255440621_0007
2022-09-28 17:02:01,570 INFO  [main] mapreduce.Job: The url to track the job: http://server1:8088/proxy/application_1664255440621_0007/
2022-09-28 17:02:01,571 INFO  [main] mapreduce.Job: Running job: job_1664255440621_0007
2022-09-28 17:02:07,634 INFO  [main] mapreduce.Job: Job job_1664255440621_0007 running in uber mode : true
2022-09-28 17:02:07,635 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2022-09-28 17:02:09,666 INFO  [main] mapreduce.Job: Job job_1664255440621_0007 completed successfully
2022-09-28 17:02:09,741 WARN  [main] counters.FrameworkCounterGroup: MAP_PHYSICAL_MEMORY_BYTES_MAX is not a recognized counter.
2022-09-28 17:02:09,741 WARN  [main] counters.FrameworkCounterGroup: MAP_VIRTUAL_MEMORY_BYTES_MAX is not a recognized counter.
2022-09-28 17:02:09,751 INFO  [main] mapreduce.Job: ...NUM_UBER_SUBMAPS=1
                Total time spent by all map tasks (ms)=1219
                Total vcore-milliseconds taken by all map tasks=0
                Total megabyte-milliseconds taken by all map tasks=0
        Map-Reduce Framework
                Map input records=66
                Map output records=0
                Input split bytes=202
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=113
                CPU time spent (ms)=2210
                Physical memory (bytes) snapshot=446148608
                Virtual memory (bytes) snapshot=6484230144
                Total committed heap usage (bytes)=453509120
        HBase Counters
                BYTES_IN_REMOTE_RESULTS=0
                BYTES_IN_RESULTS=5616
                MILLIS_BETWEEN_NEXTS=598
                NOT_SERVING_REGION_EXCEPTION=0
                NUM_SCANNER_RESTARTS=0
                NUM_SCAN_RESULTS_STALE=0
                REGIONS_SCANNED=1
                REMOTE_RPC_CALLS=0
                REMOTE_RPC_RETRIES=0
                ROWS_FILTERED=0
                ROWS_SCANNED=66
                RPC_CALLS=1
                RPC_RETRIES=0
        org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
                ROWS=66
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=0
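In the job output above, the row count is the ROWS value reported under the RowCounterMapper counters group (66 here, matching the result of count). As a rough sketch, assuming the job output has already been captured into a string, the value could be extracted in Ruby as follows. The helper name `row_count` and the shortened sample text are illustrative assumptions.

```ruby
# Hypothetical helper: extract the RowCounter result from captured job output.
# Only a line of the form "ROWS=<n>" matches, so the ROWS_SCANNED and
# ROWS_FILTERED counters are ignored. Returns nil when no such line exists.
def row_count(job_output)
  match = job_output.match(/^\s*ROWS=(\d+)\s*$/)
  match && match[1].to_i
end

# Shortened excerpt of the counters printed by the job in this article.
sample = <<~LOG
  HBase Counters
          ROWS_FILTERED=0
          ROWS_SCANNED=66
  org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
          ROWS=66
LOG

puts row_count(sample)
```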

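7、Scanning with the scan command

1)、Query all order data

In HBase, the scan command scans a table. Syntax: scan 'table name'. To view all data in the orderinfo table:

scan 'orderinfo'
# Note: avoid running an unrestricted scan on a large table!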

hbase(main):051:0> scan 'orderinfo'
ROW                                                                 COLUMN+CELL                                                                                                                                                                                          
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:CATEGORY, timestamp=1664355077238, value=\xE6\x89\x8B\xE6\x9C\xBA;                                                                                                                         
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:OPERATION_DATE, timestamp=1664355077059, value=2020-04-25 12:09:16                                                                                                                         
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:PAYWAY, timestamp=1664355076673, value=1                                                                                                                                                   
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:PAY_MONEY, timestamp=1664355076448, value=4070                                                                                                                                             
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:STATUS, timestamp=1664355076051, value=\xE5\xB7\xB2\xE6\x8F\x90\xE4\xBA\xA4                                                                                                                
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:USER_ID, timestamp=1664355076873, value=4944191                                                                                                                                            
 0968a418-f2bc-49b4-b9a9-2157cf214cfd                               column=C1:CATEGORY, timestamp=1664355077240, value=\xE5\xAE\xB6\xE7\x94\xA8\xE7\x94\xB5\xE5\x99\xA8;;\xE7\x94\xB5\xE8\x84\x91;                                                                       
 0968a418-f2bc-49b4-b9a9-2157cf214cfd                               column=C1:OPERATION_DATE, timestamp=1664355077061, value=2020-04-25 12:09:37

2)、Query order data (show only 3 rows)

scan 'orderinfo', {LIMIT => 3, FORMATTER => 'toString'}

hbase(main):052:0> scan 'orderinfo', {LIMIT => 3, FORMATTER => 'toString'}
ROW                                                                 COLUMN+CELL                                                                                                                                                                                          
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:CATEGORY, timestamp=1664355077238, value=手机;                                                                                                                                               
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:OPERATION_DATE, timestamp=1664355077059, value=2020-04-25 12:09:16                                                                                                                         
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:PAYWAY, timestamp=1664355076673, value=1                                                                                                                                                   
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:PAY_MONEY, timestamp=1664355076448, value=4070                                                                                                                                             
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:STATUS, timestamp=1664355076051, value=已提交                                                                                                                                                 
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:USER_ID, timestamp=1664355076873, value=4944191                                                                                                                                            
 0968a418-f2bc-49b4-b9a9-2157cf214cfd                               column=C1:CATEGORY, timestamp=1664355077240, value=家用电器;;电脑;                                                                                                                                         
 0968a418-f2bc-49b4-b9a9-2157cf214cfd                               column=C1:OPERATION_DATE, timestamp=1664355077061, value=2020-04-25 12:09:37                                                                                                                         
 0968a418-f2bc-49b4-b9a9-2157cf214cfd                               column=C1:PAYWAY, timestamp=1664355076675, value=1                                                                                                                                                   
 0968a418-f2bc-49b4-b9a9-2157cf214cfd                               column=C1:PAY_MONEY, timestamp=1664355076450, value=4350                                                                                                                                             
 0968a418-f2bc-49b4-b9a9-2157cf214cfd                               column=C1:STATUS, timestamp=1664355076103, value=已完成                                                                                                                                                 
 0968a418-f2bc-49b4-b9a9-2157cf214cfd                               column=C1:USER_ID, timestamp=1664355076875, value=1625615                                                                                                                                            
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:CATEGORY, timestamp=1664355077244, value=男装;男鞋;                                                                                                                                            
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:OPERATION_DATE, timestamp=1664355077065, value=2020-04-25 12:09:44                                                                                                                         
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:PAYWAY, timestamp=1664355076680, value=3                                                                                                                                                   
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:PAY_MONEY, timestamp=1664355076455, value=6370                                                                                                                                             
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:STATUS, timestamp=1664355076110, value=已付款                                                                                                                                                 
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:USER_ID, timestamp=1664355076878, value=3919700                                                                                                                                            
3 row(s)
Took 0.0080 seconds
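Without FORMATTER => 'toString', the shell prints non-ASCII bytes as \xNN escapes, which is why the first scan shows \xE6\x89\x8B\xE6\x9C\xBA; where this one shows 手机;. The plain-Ruby sketch below illustrates that the two are the same bytes by converting the escaped form back to UTF-8 text. The helper name `decode_shell_value` is an assumption for illustration, and the sketch only handles the \xNN notation seen in the output above.

```ruby
# Hypothetical helper: turn the \xNN notation printed by the shell
# back into a UTF-8 string.
def decode_shell_value(printed)
  # Each match is either a \xNN escape (first group) or one ordinary character.
  bytes = printed.scan(/\\x([0-9A-Fa-f]{2})|(.)/m).flat_map do |hex, char|
    hex ? [hex.to_i(16)] : char.bytes
  end
  bytes.pack('C*').force_encoding('UTF-8')
end

# Value exactly as printed by the plain scan earlier in this article.
# Single quotes keep the backslashes literal.
printed = '\xE6\x89\x8B\xE6\x9C\xBA;'
puts decode_shell_value(printed)
```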

3)、Query order status and payment method

# Query only the order status and payment method, and show only 3 rows
scan 'orderinfo', {LIMIT => 3, COLUMNS => ['C1:STATUS', 'C1:PAYWAY'], FORMATTER => 'toString'}

hbase(main):053:0> scan 'orderinfo', {LIMIT => 3, COLUMNS => ['C1:STATUS', 'C1:PAYWAY'], FORMATTER => 'toString'}
ROW                                                                 COLUMN+CELL                                                                                                                                                                                          
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:PAYWAY, timestamp=1664355076673, value=1                                                                                                                                                   
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:STATUS, timestamp=1664355076051, value=已提交                                                                                                                                                 
 0968a418-f2bc-49b4-b9a9-2157cf214cfd                               column=C1:PAYWAY, timestamp=1664355076675, value=1                                                                                                                                                   
 0968a418-f2bc-49b4-b9a9-2157cf214cfd                               column=C1:STATUS, timestamp=1664355076103, value=已完成                                                                                                                                                 
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:PAYWAY, timestamp=1664355076680, value=3                                                                                                                                                   
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:STATUS, timestamp=1664355076110, value=已付款                                                                                                                                                 
3 row(s)

Note:

['C1:STATUS', …] is a Ruby array: in Ruby, [] denotes an array literal.
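Because the HBase shell is JRuby, the options passed to scan are ordinary Ruby values: [] builds an Array and {} builds a Hash. The sketch below can be run in plain Ruby to see this. Plain strings are used as hash keys here only so that the snippet runs outside the shell, where LIMIT, COLUMNS and FORMATTER are predefined constants; the helper name `split_column` is an assumption for illustration.

```ruby
# Hypothetical helper: split "family:qualifier" on the first colon only.
def split_column(column)
  column.split(':', 2)
end

# [] is an Array literal: the list of columns to return.
columns = ['C1:STATUS', 'C1:PAYWAY']

# {} is a Hash literal; each key => value pair is one scan option.
options = { 'LIMIT' => 3, 'COLUMNS' => columns, 'FORMATTER' => 'toString' }

puts columns.class
puts options.class
options['COLUMNS'].each do |column|
  family, qualifier = split_column(column)
  puts "family=#{family} qualifier=#{qualifier}"
end
```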

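4)、Query the data for a given order ID and display Chinese text

Look up the row whose ROWKEY is 02602f66-adc7-40d4-8485-76b5632b5b53, return only the order status and payment method, and display Chinese values as readable text.

To restrict a scan by ROWKEY, use ROWPREFIXFILTER, which returns every row whose key starts with the given prefix. Passing a complete ROWKEY therefore returns that row, plus any other rows whose keys begin with the same string. Syntax: scan 'table name', {ROWPREFIXFILTER => 'rowkey prefix'}

scan 'orderinfo', {ROWPREFIXFILTER => '02602f66-adc7-40d4-8485-76b5632b5b53', COLUMNS => ['C1:STATUS', 'C1:PAYWAY'], FORMATTER => 'toString'}

hbase(main):054:0> scan 'orderinfo', {ROWPREFIXFILTER => '02602f66-adc7-40d4-8485-76b5632b5b53', COLUMNS => ['C1:STATUS', 'C1:PAYWAY'], FORMATTER => 'toString'}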
ROW                                                                 COLUMN+CELL                                                                                                                                                                                          
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:PAYWAY, timestamp=1664355076673, value=1                                                                                                                                                   
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:STATUS, timestamp=1664355076051, value=已提交                                                                                                                                                 
1 row(s)
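ROWPREFIXFILTER matches by prefix, so the result depends on how much of the row key is supplied. The following plain-Ruby sketch is a simplified model of that behaviour over a sorted list of row keys, not HBase's implementation. The helper name `prefix_scan` is an assumption, and the three keys are the ones that appear in this article's output.

```ruby
# Hypothetical model: return the row keys that a prefix scan would visit.
# sorted_keys must already be in ascending order, as rows are in HBase.
def prefix_scan(sorted_keys, prefix)
  sorted_keys
    .drop_while { |key| key < prefix }             # skip keys that sort before the prefix
    .take_while { |key| key.start_with?(prefix) }  # stop at the first key without the prefix
end

row_keys = [
  '02602f66-adc7-40d4-8485-76b5632b5b53',
  '0968a418-f2bc-49b4-b9a9-2157cf214cfd',
  '0e01edba-5e55-425e-837a-7efb91c56630'
].sort

# A complete row key selects a single row.
p prefix_scan(row_keys, '02602f66-adc7-40d4-8485-76b5632b5b53')
# A short prefix selects every row that starts with it.
p prefix_scan(row_keys, '0')
p prefix_scan(row_keys, '09')
```

Since the matching keys are contiguous in sorted order, the model can stop as soon as it sees a key without the prefix.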

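8、Filters

To query large volumes of data in HBase, use filters. A filter can select data by row key, column family, column, version, and similar criteria. Because row keys, columns, and versions are all stored in sorted order in HBase, filters can be evaluated efficiently. When a scan with a filter runs, HBase ships the filter to the HBase server nodes, which apply it to the data they serve.

HBase filters are implemented in Java; in the shell they are used interactively through JRuby-based syntax. The Java API documentation for HBase 2.2 is at http://hbase.apache.org/2.2/devapidocs/index.html

1)、Filters available in HBase

In the HBase shell, the show_filters command lists the built-in filters: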

hbase(main):055:0> show_filters
DependentColumnFilter                                                                                                                                                                                                                                                    
KeyOnlyFilter                                                                                                                                                                                                                                                            
ColumnCountGetFilter                                                                                                                                                                                                                                                     
SingleColumnValueFilter                                                                                                                                                                                                                                                  
PrefixFilter                                                                                                                                                                                                                                                             
SingleColumnValueExcludeFilter                                                                                                                                                                                                                                           
FirstKeyOnlyFilter                                                                                                                                                                                                                                                       
ColumnRangeFilter                                                                                                                                                                                                                                                        
ColumnValueFilter                                                                                                                                                                                                                                                        
TimestampsFilter                                                                                                                                                                                                                                                         
FamilyFilter                                                                                                                                                                                                                                                             
QualifierFilter                                                                                                                                                                                                                                                          
ColumnPrefixFilter                                                                                                                                                                                                                                                       
RowFilter                                                                                                                                                                                                                                                                
MultipleColumnPrefixFilter                                                                                                                                                                                                                                               
InclusiveStopFilter                                                                                                                                                                                                                                                      
PageFilter                                                                                                                                                                                                                                                               
ValueFilter                                                                                                                                                                                                                                                              
ColumnPaginationFilter                                                                                                                                                                                                                                                   
Took 0.0041 seconds                                                                                                                                                                                                                                                      
=> #<Java::JavaUtil::HashMap::KeySet:0x2358443e>

Official Java API documentation: https://hbase.apache.org/devapidocs/index.html

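2)、Filter usage

Filters are generally used together with the scan command: scan 'table_name', {FILTER => "FilterName(comparison_operator, 'comparator_expression')"}

  • Comparison operators: <, <=, =, !=, >, >=
  • Comparators: for example binary (BinaryComparator), binaryprefix (BinaryPrefixComparator), regexstring (RegexStringComparator), substring (SubStringComparator)
  • Comparator expression, basic syntax: comparator_type:comparator_value, for example binary:abc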
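
The syntax above can be sketched as a small string-building helper. This is only an illustration in Python: `build_filter` and `build_scan` are hypothetical names that are not part of HBase, and the operator and comparator sets are assumptions based on the filter language as commonly documented. The helper only produces the text you would paste into the HBase shell; it does not talk to a cluster.

```python
# Hypothetical helpers (not part of HBase): they only assemble the text of a
# shell scan command that uses the filter syntax described above.
OPERATORS = {"<", "<=", "=", "!=", ">", ">="}
COMPARATORS = {"binary", "binaryprefix", "regexstring", "substring"}


def build_filter(name, operator, comparator, value, *leading_args):
    """Return e.g. SingleColumnValueFilter('C1', 'STATUS', =, 'binary:x')."""
    if operator not in OPERATORS:
        raise ValueError("unknown comparison operator: " + operator)
    if comparator not in COMPARATORS:
        raise ValueError("unknown comparator type: " + comparator)
    # Leading arguments (column family, qualifier) are quoted; the operator is not.
    args = ["'" + a + "'" for a in leading_args]
    args.append(operator)
    args.append("'" + comparator + ":" + value + "'")
    return name + "(" + ", ".join(args) + ")"


def build_scan(table, filter_expr):
    """Wrap a filter expression in a scan command for the given table."""
    return "scan '" + table + "', {FILTER => \"" + filter_expr + "\"}"


row_filter = build_filter("RowFilter", "=", "binary", "02602f66-adc7-40d4-8485-76b5632b5b53")
print(build_scan("orderinfo", row_filter))

payway = build_filter("SingleColumnValueFilter", "=", "binary", "1", "C1", "PAYWAY")
print(build_scan("orderinfo", payway))
```

Keeping the operator unquoted and the comparator expression quoted mirrors the scan commands used in the examples of this section.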

3)、Example 1: use RowFilter to query the data for a given order ID

Query the order whose ID is 02602f66-adc7-40d4-8485-76b5632b5b53, including its order status and payment method.

scan 'orderinfo', {FILTER => "RowFilter(=,'binary:02602f66-adc7-40d4-8485-76b5632b5b53')"}


hbase(main):056:0> scan 'orderinfo', {FILTER => "RowFilter(=,'binary:02602f66-adc7-40d4-8485-76b5632b5b53')"}
ROW                                                                 COLUMN+CELL                                                                                                                                                                                          
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:CATEGORY, timestamp=1664355077238, value=\xE6\x89\x8B\xE6\x9C\xBA;                                                                                                                         
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:OPERATION_DATE, timestamp=1664355077059, value=2020-04-25 12:09:16                                                                                                                         
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:PAYWAY, timestamp=1664355076673, value=1                                                                                                                                                   
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:PAY_MONEY, timestamp=1664355076448, value=4070                                                                                                                                             
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:STATUS, timestamp=1664355076051, value=\xE5\xB7\xB2\xE6\x8F\x90\xE4\xBA\xA4                                                                                                                
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:USER_ID, timestamp=1664355076873, value=4944191                                                                                                                                            
1 row(s)
Took 0.0476 seconds

4)、Query orders whose status is 已付款 (paid)

scan 'orderinfo', {FILTER => "SingleColumnValueFilter('C1', 'STATUS', =, 'binary:已付款')", FORMATTER => 'toString'}
# Limit the number of rows returned
scan 'orderinfo', {LIMIT => 3,FILTER => "SingleColumnValueFilter('C1', 'STATUS', =, 'binary:已付款')", FORMATTER => 'toString'}


 hbase(main):058:0> scan 'orderinfo', {LIMIT => 3,FILTER => "SingleColumnValueFilter('C1', 'STATUS', =, 'binary:已付款')", FORMATTER => 'toString'}
ROW                                                                 COLUMN+CELL                                                                                                                                                                                          
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:CATEGORY, timestamp=1664355077244, value=男装;男鞋;                                                                                                                                            
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:OPERATION_DATE, timestamp=1664355077065, value=2020-04-25 12:09:44                                                                                                                         
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:PAYWAY, timestamp=1664355076680, value=3                                                                                                                                                   
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:PAY_MONEY, timestamp=1664355076455, value=6370                                                                                                                                             
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:STATUS, timestamp=1664355076110, value=已付款                                                                                                                                                 
 0e01edba-5e55-425e-837a-7efb91c56630                               column=C1:USER_ID, timestamp=1664355076878, value=3919700                                                                                                                                            
 0f46d542-34cb-4ef4-b7fe-6dcfa5f14751                               column=C1:CATEGORY, timestamp=1664355077247, value=维修;手机;                                                                                                                                            
 0f46d542-34cb-4ef4-b7fe-6dcfa5f14751                               column=C1:OPERATION_DATE, timestamp=1664355077068, value=2020-04-25 12:09:46                                                                                                                         
 0f46d542-34cb-4ef4-b7fe-6dcfa5f14751                               column=C1:PAYWAY, timestamp=1664355076685, value=1                                                                                                                                                   
 0f46d542-34cb-4ef4-b7fe-6dcfa5f14751                               column=C1:PAY_MONEY, timestamp=1664355076459, value=9380                                                                                                                                             
 0f46d542-34cb-4ef4-b7fe-6dcfa5f14751                               column=C1:STATUS, timestamp=1664355076117, value=已付款                                                                                                                                                 
 0f46d542-34cb-4ef4-b7fe-6dcfa5f14751                               column=C1:USER_ID, timestamp=1664355076882, value=2993700                                                                                                                                            
 23275016-996b-420c-8edc-3e3b41de1aee                               column=C1:CATEGORY, timestamp=1664355077251, value=男鞋;汽车;                                                                                                                                            
 23275016-996b-420c-8edc-3e3b41de1aee                               column=C1:OPERATION_DATE, timestamp=1664355077072, value=2020-04-25 12:09:53                                                                                                                         
 23275016-996b-420c-8edc-3e3b41de1aee                               column=C1:PAYWAY, timestamp=1664355076689, value=1                                                                                                                                                   
 23275016-996b-420c-8edc-3e3b41de1aee                               column=C1:PAY_MONEY, timestamp=1664355076463, value=280                                                                                                                                              
 23275016-996b-420c-8edc-3e3b41de1aee                               column=C1:STATUS, timestamp=1664355076124, value=已付款                                                                                                                                                 
 23275016-996b-420c-8edc-3e3b41de1aee                               column=C1:USER_ID, timestamp=1664355076886, value=3018827                                                                                                                                            
3 row(s)
Took 0.0077 seconds

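5)、Query orders whose payment method is 1 and whose amount is greater than 3000

# 1. Match payment method 1
SingleColumnValueFilter('C1', 'PAYWAY', = , 'binary:1')
# 2. Match orders whose amount is greater than 3000
SingleColumnValueFilter('C1', 'PAY_MONEY', > , 'binary:3000')
# 3. Combine both conditions
scan 'orderinfo', {FILTER => "SingleColumnValueFilter('C1', 'PAYWAY', =, 'binary:1') AND SingleColumnValueFilter('C1', 'PAY_MONEY', > , 'binary:3000')", FORMATTER => 'toString'}

# Note:
# Comparisons in the HBase shell are string (byte-wise) comparisons by default, so comparing numeric values can give inaccurate results
# For example, in a string comparison 2000 is greater than 100000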

hbase(main):060:0> scan 'orderinfo', {FILTER => "SingleColumnValueFilter('C1', 'PAYWAY', =, 'binary:1') AND SingleColumnValueFilter('C1', 'PAY_MONEY', >=, 'binary:3000')", FORMATTER => 'toString'}
ROW                                                                 COLUMN+CELL                                                                                                                                                                                          
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:CATEGORY, timestamp=1664355077238, value=手机;                                                                                                                                               
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:OPERATION_DATE, timestamp=1664355077059, value=2020-04-25 12:09:16                                                                                                                         
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:PAYWAY, timestamp=1664355076673, value=1                                                                                                                                                   
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:PAY_MONEY, timestamp=1664355076448, value=4070                                                                                                                                             
 02602f66-adc7-40d4-8485-76b5632b5b53                               column=C1:STATUS, timestamp=1664355076051, value=已提交
 ...
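
The note above about string comparison can be checked outside HBase. The sketch below, in Python, mimics a byte-wise lexicographic comparison on the text form of the numbers. The `pad` helper and its width of 10 are assumptions chosen for illustration; the orderinfo table in this article does not store padded values.

```python
def bytes_greater(a: str, b: str) -> bool:
    """Compare two values byte by byte, the way a lexicographic comparison does."""
    return a.encode("utf-8") > b.encode("utf-8")


def pad(n: int, width: int = 10) -> str:
    """Zero-pad a non-negative integer to a fixed width (hypothetical storage format)."""
    return str(n).zfill(width)


# As text, "2000" sorts after "100000" because '2' > '1' at the first byte.
print(bytes_greater("2000", "100000"))            # True
# Numerically the result is the opposite.
print(2000 > 100000)                              # False
# A filter for "amount > 3000" would also match 500 when values are compared as text.
print(bytes_greater("500", "3000"))               # True
# With fixed-width zero padding, byte order agrees with numeric order.
print(bytes_greater(pad(2000), pad(100000)))      # False
```

If numeric range filters matter, one common option is to store non-negative amounts zero-padded to a fixed width so that lexicographic and numeric order coincide.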

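9、INCR

incr atomically increments the value of a cell. Syntax: incr 'table_name', 'rowkey', 'column_family:column_qualifier', increment (defaults to 1)

For a column to work as a counter, it must be created with incr. A column created with put cannot be incremented, because incr expects the cell to hold an 8-byte long value. Example: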

# 1. Get the counter value of row 0000000020
hbase(main):063:0> get_counter 'VISIT_CNT','0000000020','C1:CNT'
COUNTER VALUE = 6
Took 0.0162 seconds   
# Reading the same cell with get shows the raw bytes instead
hbase(main):064:0> get 'VISIT_CNT','0000000020','C1:CNT'
COLUMN                                                              CELL                                                                                                                                                                                                 
 C1:CNT                                                             timestamp=1664414360079, value=\x00\x00\x00\x00\x00\x00\x00\x06                                                                                                                                      
1 row(s)
Took 0.0065 seconds  
    
# 2. Increment the counter with incr
hbase(main):065:0> incr 'VISIT_CNT','0000000020','C1:CNT'
COUNTER VALUE = 7
Took 0.0089 seconds   

# 3. Read the counter value of the row again
hbase(main):066:0> get_counter 'VISIT_CNT','0000000020','C1:CNT'
COUNTER VALUE = 7
Took 0.0026 seconds
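
The get output above shows the counter as eight raw bytes rather than the number 6. Here is a minimal sketch in Python of how such a value maps to an integer, assuming the counter is stored as an 8-byte big-endian signed long, which is consistent with the \x00...\x06 output shown. The function names are hypothetical and nothing here calls HBase.

```python
import struct


def decode_counter(raw: bytes) -> int:
    """Interpret 8 bytes as a big-endian signed 64-bit integer."""
    (value,) = struct.unpack(">q", raw)
    return value


def encode_counter(n: int) -> bytes:
    """Produce the 8-byte big-endian form of a counter value."""
    return struct.pack(">q", n)


raw = b"\x00\x00\x00\x00\x00\x00\x00\x06"   # the value printed by get
print(decode_counter(raw))                   # 6
print(encode_counter(7))                     # b'\x00\x00\x00\x00\x00\x00\x00\x07'

# Assumption: a value written as text, such as the string "6", is a single
# byte and not an 8-byte long, so it cannot be read back as a counter.
print(len("6".encode("utf-8")))              # 1
```

This also suggests why a cell written as plain text cannot be incremented: its bytes do not form an 8-byte long.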

三、Shell administration operations

1、status

# Show the cluster status
hbase(main):067:0> status
1 active master, 1 backup masters, 3 servers, 0 dead, 1.3333 average load

2、whoami

# Show the current HBase user
hbase(main):068:0> whoami
alanchan (auth:SIMPLE)
    groups: root

3、list

# List all tables
hbase(main):069:0> list
TABLE                                                                                                                                                                                                                                                                    
NEWS_VISIT_CNT                                                                                                                                                                                                                                                           
orderinfo                                                                                                                                                                                                                                                                
2 row(s)
Took 0.0079 seconds                                                                                                                                                                                                                                                      
=> ["NEWS_VISIT_CNT", "orderinfo"]

4、count

# Count the rows in the given table
hbase(main):070:0> count 'orderinfo'
66 row(s)
Took 0.0177 seconds                                                                                                                                                                                                                                                      
=> 66

5、describe

# Show the table schema
hbase(main):071:0> describe 'orderinfo'
Table orderinfo is ENABLED                                                                                                                                                                                                                                               
orderinfo                                                                                                                                                                                                                                                                
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                                                                              
{NAME => 'C1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFI
LTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                                           
1 row(s)
Took 0.0191 seconds

6、exists

# Check whether a table exists; useful when there are a very large number of tables
hbase(main):072:0> exists 'orderinfo'
Table orderinfo does exist                                                                                                                                                                                                                                               
Took 0.0024 seconds                                                                                                                                                                                                                                                      
=> true
hbase(main):073:0> exists 'orderinfo_test'
Table orderinfo_test does not exist                                                                                                                                                                                                                                      
Took 0.0020 seconds                                                                                                                                                                                                                                                      
=> false

7、is_enabled、is_disabled

# Check whether a table is enabled or disabled
hbase(main):074:0> is_enabled 'orderinfo'
true                                                                                                                                                                                                                                                                     
Took 0.0041 seconds                                                                                                                                                                                                                                                      
=> true
hbase(main):075:0> is_disabled 'orderinfo'
false                                                                                                                                                                                                                                                                    
Took 0.0032 seconds                                                                                                                                                                                                                                                      
=> 1
# The table orderinfo_t does not exist
hbase(main):076:0> is_disabled 'orderinfo_t'

ERROR: Unknown table orderinfo_t!

For usage try 'help "is_disabled"'

Took 0.0058 seconds

8、alter

This command changes the schema of a table and its column families.

# Create a user table with two column families, C1 and C2
hbase(main):077:0> create 'user','C1','C2'
Created table user
Took 2.4103 seconds                                                                                                                                                                                                                                                      
=> Hbase::Table - user

# Check the table schema: two column families, C1 and C2
hbase(main):079:0> describe 'user'
Table user is ENABLED                                                                                                                                                                                                                                                    
user                                                                                                                                                                                                                                                                     
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                                                                              
{NAME => 'C1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFI
LTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                                           
{NAME => 'C2', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFI
LTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                                           
2 row(s)
Took 0.0163 seconds  
                                                                                                                                                                                                                                                    
# Add column family C4
hbase(main):081:0> alter 'user', 'C4'
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 2.9800 seconds  
  
# Check the table schema: three column families, C1, C2 and C4
hbase(main):082:0> describe 'user'
Table user is ENABLED                                                                                                                                                                                                                                                    
user                                                                                                                                                                                                                                                                     
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                                                                              
{NAME => 'C1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFI
LTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                                           
{NAME => 'C2', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFI
LTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                                           
{NAME => 'C4', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFI
LTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                                           
3 row(s)
Took 0.0145 seconds

# Delete column family C2
# Can also be written as: alter 'user', 'delete' => 'C2'
hbase(main):083:0> alter 'user' ,{'delete'=>'C2'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 2.8873 seconds  

# Check the table schema: two column families, C1 and C4
hbase(main):084:0> describe 'user'
Table user is ENABLED                                                                                                                                                                                                                                                    
user                                                                                                                                                                                                                                                                     
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                                                                              
{NAME => 'C1', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFI
LTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                                           
{NAME => 'C4', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFI
LTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}                                                           
2 row(s)
Took 0.0157 seconds                                                                                                                                                                                                                                                      
hbase(main):085:0> 
# Note: 'delete' => 'C2' is still a Map (a Ruby hash); because it is the last argument of the call, the surrounding {} can be omitted

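9、disable/enable

Disable a table / enable a table.

10、drop

Drop a table. A table must be disabled before it can be dropped.

11、truncate

Clear all data in a table. The operation amounts to disabling the table, dropping it, and creating it again.

The session below disables and drops the user table, then confirms that it no longer exists: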

hbase(main):085:0> disable 'user'
Took 1.4119 seconds                                                                                                                                                                                                                                                      
hbase(main):086:0> drop 'user'
Took 0.7600 seconds                                                                                                                                                                                                                                                      
hbase(main):088:0> exists 'user'
Table user does not exist                                                                                                                                                                                                                                                
Took 0.0021 seconds                                                                                                                                                                                                                                                      
=> false

The above is a general introduction to using the HBase shell.