I. Create the Table

1. Connect to the running HBase instance from the command line:

hbase shell

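2. Before using the filters, create a table named student with two column families: stuinfo (basic student information) and grades (scores). Column family names are case-sensitive, and the filters below refer to stuinfo in lowercase.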


3. Run the following commands.

Create the table:

create 'student','stuinfo','grades'

Insert the data for the first logical row:

put 'student', '001', 'stuinfo:name','alice'
put 'student', '001', 'stuinfo:age','18'
put 'student', '001', 'stuinfo:sex','female'
put 'student', '001', 'grades:english','80'
put 'student', '001', 'grades:math','90'

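Insert rows 002 and 003 the same way, using the values shown in the scan output below. Row 002 has no stuinfo:age cell but adds stuinfo:class and grades:bigdata; row 003 adds stuinfo:class.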

Result:

hbase(main):028:0> scan 'student'
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
 002                  column=grades:bigdata, timestamp=1586248539502, value=88  
 002                  column=grades:english, timestamp=1586248508242, value=85  
 002                  column=grades:math, timestamp=1586248524101, value=78     
 002                  column=stuinfo:class, timestamp=1586248476375, value=1802 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 002                  column=stuinfo:sex, timestamp=1586248462931, value=male   
 003                  column=grades:english, timestamp=1586248639980, value=90  
 003                  column=grades:math, timestamp=1586248651102, value=80     
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
 003                  column=stuinfo:class, timestamp=1586248611878, value=1803 
 003                  column=stuinfo:name, timestamp=1586248574358, value=harry 
 003                  column=stuinfo:sex, timestamp=1586248601271, value=male   
3 row(s) in 0.1320 seconds

II. Filter Operations

1. Row key filters

This group includes RowFilter, PrefixFilter, KeyOnlyFilter, FirstKeyOnlyFilter, and others.

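General syntax for comparison filters such as RowFilter: scan 'table_name', {FILTER => "FilterName(compare_operator, 'comparator_type:value')"}

The compare operator is one of =, !=, <, <=, >, >=. Common comparator types are binary (compares the whole byte array), binaryprefix (compares only the leading bytes), substring (tests whether the value contains the string), and regexstring (regular-expression match). FILTER must be written in uppercase, and only plain ASCII quotes work; typographic quotes break the command. The braces around the options may be omitted, as several examples below do.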
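To make the syntax concrete, here is a small Python sketch that splits a filter string of this form into its parts. `parse_compare_filter` is a hypothetical helper written for illustration; it only handles the two-argument compare form with a quoted 'type:value' comparator, whereas HBase's own filter-language parser accepts many more forms.

```python
import re

# Hypothetical helper: matches strings such as "RowFilter(=,'substring:001')".
# Only the simple two-argument compare-filter form is handled.
_PATTERN = re.compile(
    r"^\s*(?P<name>\w+)\(\s*(?P<op><=|>=|!=|=|<|>)\s*,\s*'(?P<ctype>\w+):(?P<value>[^']*)'\s*\)\s*$"
)

VALID_COMPARATORS = {"binary", "binaryprefix", "substring", "regexstring"}


def parse_compare_filter(text: str) -> dict:
    """Split a compare-filter string into name, operator, comparator type and value."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unsupported filter string: {text!r}")
    parts = match.groupdict()
    if parts["ctype"] not in VALID_COMPARATORS:
        raise ValueError(f"unknown comparator type: {parts['ctype']!r}")
    return parts


print(parse_compare_filter("RowFilter(=,'substring:001')"))
# {'name': 'RowFilter', 'op': '=', 'ctype': 'substring', 'value': '001'}
```

The same helper rejects a string written with typographic quotes, such as "FamilyFilter(=,‘binary:stu’)", by raising ValueError, which mirrors why such a command fails in the shell.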

(1) RowFilter: filters on the row key.

Example 1: show the cells of rows whose row key contains the substring 001.

scan 'student',{FILTER=>"RowFilter(=,'substring:001')"}

Result:

 

hbase(main):031:0> scan 'student',{FILTER=>"RowFilter(=,'substring:001')"}
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
1 row(s) in 0.2700 seconds

Example 2: show the cells of rows whose row key is greater than 001 in byte (lexicographic) order.

scan 'student',{FILTER=>"RowFilter(>,'binary:001')"}

Result:

hbase(main):032:0> scan 'student',{FILTER=>"RowFilter(>,'binary:001')"}
ROW                   COLUMN+CELL                                               
 002                  column=grades:bigdata, timestamp=1586248539502, value=88  
 002                  column=grades:english, timestamp=1586248508242, value=85  
 002                  column=grades:math, timestamp=1586248524101, value=78     
 002                  column=stuinfo:class, timestamp=1586248476375, value=1802 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 002                  column=stuinfo:sex, timestamp=1586248462931, value=male   
 003                  column=grades:english, timestamp=1586248639980, value=90  
 003                  column=grades:math, timestamp=1586248651102, value=80     
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
 003                  column=stuinfo:class, timestamp=1586248611878, value=1803 
 003                  column=stuinfo:name, timestamp=1586248574358, value=harry 
 003                  column=stuinfo:sex, timestamp=1586248601271, value=male   
2 row(s) in 0.2130 seconds
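The binary comparator compares raw bytes lexicographically, not numerically. The Python sketch below is a model, not HBase code (`row_filter_binary` is a hypothetical name): it reproduces Example 2 and shows why zero-padded keys such as 001 matter, since without padding the key 10 sorts before 9.

```python
import operator

# Python bytes compare lexicographically by unsigned byte value,
# the same ordering HBase uses for row keys.
OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def row_filter_binary(row_keys, op, operand):
    """Keep the keys k for which `k op operand` holds (model of RowFilter with binary:)."""
    target = operand.encode()
    return [k for k in row_keys if OPS[op](k.encode(), target)]


print(row_filter_binary(["001", "002", "003"], ">", "001"))  # ['002', '003']
# Unpadded numeric keys do not sort numerically:
print(row_filter_binary(["2", "9", "10"], ">", "9"))         # []
print(sorted(["2", "9", "10"]))                               # ['10', '2', '9']
```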

(2) PrefixFilter: row key prefix filter.

Example 3: scan the rows whose row key begins with 001.

scan 'student',FILTER=>"PrefixFilter('001')"

Result:

hbase(main):033:0> scan 'student',FILTER=>"PrefixFilter('001')"
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
1 row(s) in 0.1300 seconds
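Because row keys are stored in sorted order, all keys that share a prefix form one contiguous range [prefix, stop). The Python sketch below computes that stop key with a hypothetical helper, `prefix_stop_row`, and checks that the range holds exactly the prefixed keys; the key 0010 is invented for illustration. For the prefix 001 on this table the range corresponds to STARTROW=>'001', ENDROW=>'002'.

```python
def prefix_stop_row(prefix: bytes):
    """Smallest key greater than every key that starts with `prefix`.

    Increment the last byte that is not 0xFF and drop everything after it.
    Returns None when the prefix is all 0xFF bytes (no upper bound).
    """
    buf = bytearray(prefix)
    while buf:
        if buf[-1] < 0xFF:
            buf[-1] += 1
            return bytes(buf)
        buf.pop()
    return None


def prefix_scan(sorted_keys, prefix: bytes):
    """Return the keys in [prefix, stop), the same set a prefix filter keeps."""
    stop = prefix_stop_row(prefix)
    return [k for k in sorted_keys if k >= prefix and (stop is None or k < stop)]


keys = [b"001", b"0010", b"002", b"003"]  # b"0010" is a hypothetical extra row
print(prefix_stop_row(b"001"))    # b'002'
print(prefix_scan(keys, b"001"))  # [b'001', b'0010']
```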

(3) FirstKeyOnlyFilter: scans the whole table and returns only the first cell (key-value pair) of each logical row.

Example 4: return the first cell of every row.

scan 'student',FILTER=>"FirstKeyOnlyFilter()"

Result:

hbase(main):034:0> scan 'student',FILTER=>"FirstKeyOnlyFilter()"
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 002                  column=grades:bigdata, timestamp=1586248539502, value=88  
 003                  column=grades:english, timestamp=1586248639980, value=90  
3 row(s) in 0.0780 seconds
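Since a scan returns cells sorted by row key and then by column, "first cell of each row" is simply the first element of each group of cells sharing a row key. The Python sketch below models this with `itertools.groupby` over (row, column, value) tuples taken from a subset of the scan output above; `first_key_only` is a hypothetical name. Because only one small cell per row comes back, this filter is commonly used to count rows cheaply.

```python
from itertools import groupby
from operator import itemgetter

# A subset of the (row key, column, value) cells above, in scan order
cells = [
    ("001", "grades:english", "80"), ("001", "grades:math", "90"),
    ("001", "stuinfo:age", "18"),
    ("002", "grades:bigdata", "88"), ("002", "grades:english", "85"),
    ("003", "grades:english", "90"), ("003", "grades:math", "80"),
]


def first_key_only(sorted_cells):
    """Yield the first cell of every row (model of FirstKeyOnlyFilter)."""
    for _row, row_cells in groupby(sorted_cells, key=itemgetter(0)):
        yield next(row_cells)


for cell in first_key_only(cells):
    print(cell)
# ('001', 'grades:english', '80')
# ('002', 'grades:bigdata', '88')
# ('003', 'grades:english', '90')

row_count = sum(1 for _ in first_key_only(cells))  # 3
```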

(4) InclusiveStopFilter: an alternative to ENDROW that includes the stop row itself in the result (ENDROW is exclusive).

Example 5: scan the cells of rows 001 through 002, inclusive.

scan 'student', {STARTROW =>'001',FILTER =>"InclusiveStopFilter('002')"}

Result:

hbase(main):037:0> scan 'student',{STARTROW=>'001',FILTER=>"InclusiveStopFilter('002')"}
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
 002                  column=grades:bigdata, timestamp=1586248539502, value=88  
 002                  column=grades:english, timestamp=1586248508242, value=85  
 002                  column=grades:math, timestamp=1586248524101, value=78     
 002                  column=stuinfo:class, timestamp=1586248476375, value=1802 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 002                  column=stuinfo:sex, timestamp=1586248462931, value=male   
2 row(s) in 0.0500 seconds

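For this data set the command returns the same rows as the following scan, because ENDROW excludes the stop row and no row key sorts between 002 and 003: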

scan 'student', {STARTROW =>'001',ENDROW => '003'}

Result:

hbase(main):038:0> scan 'student',{STARTROW=>'001',ENDROW=>'003'}
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
 002                  column=grades:bigdata, timestamp=1586248539502, value=88  
 002                  column=grades:english, timestamp=1586248508242, value=85  
 002                  column=grades:math, timestamp=1586248524101, value=78     
 002                  column=stuinfo:class, timestamp=1586248476375, value=1802 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 002                  column=stuinfo:sex, timestamp=1586248462931, value=male   
2 row(s) in 0.0540 seconds
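The two scans agree here only because no key falls between 002 and 003. The Python sketch below models both stop conditions with a hypothetical extra key 0025 to show where they diverge; the function names are illustrative, not HBase APIs.

```python
def scan_exclusive_stop(sorted_keys, start, stop):
    """Model of STARTROW/ENDROW: start inclusive, stop exclusive."""
    return [k for k in sorted_keys if start <= k < stop]


def scan_inclusive_stop(sorted_keys, start, stop):
    """Model of STARTROW plus InclusiveStopFilter: the stop row is included."""
    result = []
    for k in sorted_keys:
        if k < start:
            continue
        if k > stop:
            break  # keys are sorted, so nothing later can qualify
        result.append(k)
    return result


keys = ["001", "002", "0025", "003"]  # "0025" is a hypothetical extra row
print(scan_inclusive_stop(keys, "001", "002"))  # ['001', '002']
print(scan_exclusive_stop(keys, "001", "003"))  # ['001', '002', '0025']
```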

(5) KeyOnlyFilter: returns only the key of each cell (row, column, and timestamp) and leaves the value empty.

scan 'student',FILTER=>"KeyOnlyFilter()"

Result:

hbase(main):035:0> scan 'student',FILTER=>"KeyOnlyFilter()"
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=    
 001                  column=grades:math, timestamp=1586248399580, value=       
 001                  column=stuinfo:age, timestamp=1586248291518, value=       
 001                  column=stuinfo:name, timestamp=1586248245854, value=      
 001                  column=stuinfo:sex, timestamp=1586248315971, value=       
 002                  column=grades:bigdata, timestamp=1586248539502, value=    
 002                  column=grades:english, timestamp=1586248508242, value=    
 002                  column=grades:math, timestamp=1586248524101, value=       
 002                  column=stuinfo:class, timestamp=1586248476375, value=     
 002                  column=stuinfo:name, timestamp=1586248440973, value=      
 002                  column=stuinfo:sex, timestamp=1586248462931, value=       
 003                  column=grades:english, timestamp=1586248639980, value=    
 003                  column=grades:math, timestamp=1586248651102, value=       
 003                  column=stuinfo:age, timestamp=1586248586426, value=       
 003                  column=stuinfo:class, timestamp=1586248611878, value=     
 003                  column=stuinfo:name, timestamp=1586248574358, value=      
 003                  column=stuinfo:sex, timestamp=1586248601271, value=       
3 row(s) in 0.1940 seconds

2. Column family and column filters

(1) FamilyFilter: compares and filters on the column family name.

Example 1: show the cells in column families whose names begin with stu.

scan 'student',FILTER=>"FamilyFilter(=,'substring:stu')"
scan 'student',FILTER=>"FamilyFilter(=,'binary:stu')"

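Result of the binary version. It returns 0 rows because the binary comparator requires an exact match and no family is named exactly stu. The substring version matches the stuinfo family, and so would FamilyFilter(=,'binaryprefix:stu'):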

hbase(main):042:0* scan 'student',{FILTER=>"FamilyFilter(=,'binary:stu')"}
ROW                           COLUMN+CELL                                                                          
0 row(s) in 0.0580 seconds
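The empty result comes from the comparator, not from FamilyFilter itself. The Python sketch below models the = test for three comparator types: binary needs the whole name to match, binaryprefix compares only the leading bytes, and substring checks containment (HBase's SubstringComparator ignores case). `comparator_matches` is a hypothetical name used only for illustration.

```python
def comparator_matches(comparator: str, actual: str) -> bool:
    """Model of the '=' test for binary, binaryprefix and substring comparators."""
    ctype, _, operand = comparator.partition(":")
    if ctype == "binary":
        return actual == operand                  # exact equality
    if ctype == "binaryprefix":
        return actual[: len(operand)] == operand  # compare the leading bytes only
    if ctype == "substring":
        return operand.lower() in actual.lower()  # case-insensitive containment
    raise ValueError(f"unsupported comparator type: {ctype!r}")


families = ["grades", "stuinfo"]
for spec in ("binary:stu", "binaryprefix:stu", "substring:stu"):
    print(spec, [f for f in families if comparator_matches(spec, f)])
# binary:stu []
# binaryprefix:stu ['stuinfo']
# substring:stu ['stuinfo']
```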

(2) QualifierFilter: filters on the column qualifier (column name).

Example 2: show the cells whose column name contains name.

scan 'student',FILTER=>"QualifierFilter(=,'substring:name')"

Result:

hbase(main):001:0> scan 'student',{FILTER=>"QualifierFilter(=,'substring:name')"}
ROW                   COLUMN+CELL                                               
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 003                  column=stuinfo:name, timestamp=1586248574358, value=harry 
3 row(s) in 0.7890 seconds

(3) ColumnPrefixFilter: filters on a column-name prefix.

Example 3: show the cells whose column name begins with name.

scan 'student',FILTER=>"ColumnPrefixFilter('name')"

Result:

hbase(main):002:0> scan 'student',FILTER=>"ColumnPrefixFilter('name')"
ROW                   COLUMN+CELL                                               
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 003                  column=stuinfo:name, timestamp=1586248574358, value=harry 
3 row(s) in 0.2490 seconds

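On this data it returns the same cells as the QualifierFilter substring query (a substring match would also accept any column name that merely contains name):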

scan 'student',FILTER=>"QualifierFilter(=,'substring:name')"

Result:

hbase(main):004:0> scan 'student',FILTER=>"QualifierFilter(=,'substring:name')"
ROW                   COLUMN+CELL                                               
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 003                  column=stuinfo:name, timestamp=1586248574358, value=harry 
3 row(s) in 0.1020 seconds
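ColumnPrefixFilter('name') and QualifierFilter(=,'substring:name') agree on this table only because no other column name contains name. The Python comparison below adds a hypothetical qualifier nickname to show the difference; both functions are illustrative models, not HBase APIs.

```python
def column_prefix(qualifiers, prefix):
    """Model of ColumnPrefixFilter: the qualifier must start with the prefix."""
    return [q for q in qualifiers if q.startswith(prefix)]


def qualifier_substring(qualifiers, needle):
    """Model of QualifierFilter(=,'substring:...'): case-insensitive containment."""
    return [q for q in qualifiers if needle.lower() in q.lower()]


qualifiers = ["age", "class", "name", "nickname", "sex"]  # "nickname" is hypothetical
print(column_prefix(qualifiers, "name"))        # ['name']
print(qualifier_substring(qualifiers, "name"))  # ['name', 'nickname']
```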

(4) MultipleColumnPrefixFilter: accepts several prefixes at once.

Example 4: show the cells whose column name begins with name or age.

scan 'student',FILTER=>"MultipleColumnPrefixFilter('name','age')"

Result:

hbase(main):005:0> scan 'student',FILTER=>"MultipleColumnPrefixFilter('name','age')"
ROW                   COLUMN+CELL                                               
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
 003                  column=stuinfo:name, timestamp=1586248574358, value=harry 
3 row(s) in 0.0870 seconds
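A column passes MultipleColumnPrefixFilter when its name starts with any one of the listed prefixes. The Python sketch below models that over the column names of each row from the scan output (family names omitted), and shows why row 002, which has no age column, returns only name. `multiple_column_prefix` is a hypothetical name.

```python
def multiple_column_prefix(qualifiers, prefixes):
    """Keep qualifiers that start with at least one of the prefixes."""
    return [q for q in qualifiers if any(q.startswith(p) for p in prefixes)]


rows = {
    "001": ["english", "math", "age", "name", "sex"],
    "002": ["bigdata", "english", "math", "class", "name", "sex"],
    "003": ["english", "math", "age", "class", "name", "sex"],
}
for row, quals in rows.items():
    print(row, multiple_column_prefix(quals, ("name", "age")))
# 001 ['age', 'name']
# 002 ['name']
# 003 ['age', 'name']
```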

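(5) ColumnRangeFilter: filters column names by a lexicographic range. The arguments are the lower bound, whether the lower bound is inclusive, the upper bound, and whether the upper bound is inclusive.

Example 5: show the columns from bi to na, both bounds inclusive. Note that name sorts after na, so it is excluded.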

scan 'student',FILTER=>"ColumnRangeFilter('bi',true,'na',true)"

Result:

hbase(main):001:0> scan 'student',FILTER=>"ColumnRangeFilter('bi',true,'na',true)"
ROW                           COLUMN+CELL                                                                          
 001                          column=grades:english, timestamp=1586248371684, value=80                             
 001                          column=grades:math, timestamp=1586248399580, value=90                                
 002                          column=grades:bigdata, timestamp=1586248539502, value=88                             
 002                          column=grades:english, timestamp=1586248508242, value=85                             
 002                          column=grades:math, timestamp=1586248524101, value=78                                
 002                          column=stuinfo:class, timestamp=1586248476375, value=1802                            
 003                          column=grades:english, timestamp=1586248639980, value=90                             
 003                          column=grades:math, timestamp=1586248651102, value=80                                
 003                          column=stuinfo:class, timestamp=1586248611878, value=1803                            
3 row(s) in 0.8940 seconds
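The range test is a plain lexicographic comparison of the column name against two bounds, each of which can be inclusive or exclusive. The Python sketch below models it with a hypothetical helper, `in_column_range`, and reproduces the example: age sorts before bi, while name and sex sort after na.

```python
def in_column_range(q, min_col, min_inclusive, max_col, max_inclusive):
    """Model of ColumnRangeFilter's bounds check on a column name."""
    above_min = q >= min_col if min_inclusive else q > min_col
    below_max = q <= max_col if max_inclusive else q < max_col
    return above_min and below_max


qualifiers = ["age", "bigdata", "class", "english", "math", "name", "sex"]
print([q for q in qualifiers if in_column_range(q, "bi", True, "na", True)])
# ['bigdata', 'class', 'english', 'math']
```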

3. Value filters

(1) ValueFilter: filters on the cell value.

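Example 1: find the cells whose value is 19. The binary comparator requires an exact match, while substring also matches values that merely contain 19; on this data both return the same cell.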

scan 'student',FILTER=>"ValueFilter(=,'binary:19')"
scan 'student',FILTER=>"ValueFilter(=,'substring:19')"

Result:

hbase(main):002:0> scan 'student',FILTER=>"ValueFilter(=,'binary:19')"
ROW                   COLUMN+CELL                                               
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
1 row(s) in 0.1480 seconds

hbase(main):003:0> scan 'student',FILTER=>"ValueFilter(=,'substring:19')"
ROW                   COLUMN+CELL                                               
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
1 row(s) in 0.0470 seconds

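(2) SingleColumnValueFilter: tests the value of one specified column (family and qualifier) and keeps or drops the entire row based on the result.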

Example 2: find the rows whose stuinfo:age value is 19, displaying only the stuinfo:age column.

scan 'student',{COLUMN=>'stuinfo:age',FILTER=>"SingleColumnValueFilter('stuinfo','age',=,'binary:19')"}

Result:

hbase(main):007:0> scan 'student',{COLUMN=>'stuinfo:age',FILTER=>"SingleColumnValueFilter('stuinfo','age',=,'binary:19')"}
ROW                   COLUMN+CELL                                               
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
1 row(s) in 0.1500 seconds

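By default, a row that lacks the tested column entirely (row 002 has no stuinfo:age) also passes the filter. To drop such rows, pass the optional filterIfColumnMissing and latestVersionOnly arguments, for example SingleColumnValueFilter('stuinfo','age',=,'binary:19',true,true).

Without the COLUMN option, every cell of each matching row is returned, as this query on stuinfo:name shows: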

scan 'student',FILTER=>"SingleColumnValueFilter('stuinfo','name',=,'binary:alice')"

Result:

hbase(main):005:0> scan 'student',FILTER=>"SingleColumnValueFilter('stuinfo','name',=,'binary:alice')"
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
1 row(s) in 0.0490 seconds
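SingleColumnValueFilter decides row by row and then returns all of that row's cells, unless COLUMN narrows what is displayed. The Python model below shows both that behavior and the default for rows missing the tested column: they pass unless filterIfColumnMissing is true. With COLUMN=>'stuinfo:age', row 002 would pass but has no stuinfo:age cell to display, which is likely why Example 2 shows a single row. The table is a subset of the scan output, only an exact (binary) = match is modeled, and the function name is hypothetical.

```python
# A subset of the rows above, as {column: value}
table = {
    "001": {"stuinfo:name": "alice", "stuinfo:age": "18", "grades:math": "90"},
    "002": {"stuinfo:name": "nancy", "stuinfo:class": "1802", "grades:math": "78"},
    "003": {"stuinfo:name": "harry", "stuinfo:age": "19", "grades:math": "80"},
}


def single_column_value_filter(rows, column, expected, filter_if_missing=False):
    """Return whole rows whose `column` equals `expected`.

    Rows without the column are kept unless filter_if_missing is True,
    mirroring the default of HBase's filterIfColumnMissing flag.
    """
    result = {}
    for key, cells in rows.items():
        if column not in cells:
            if not filter_if_missing:
                result[key] = cells
        elif cells[column] == expected:
            result[key] = cells
    return result


print(sorted(single_column_value_filter(table, "stuinfo:age", "19")))        # ['002', '003']
print(sorted(single_column_value_filter(table, "stuinfo:age", "19", True)))  # ['003']
```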

4. Other filters

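(1) ColumnCountGetFilter: limits the number of cells returned for each logical row. It is designed for get; in a scan it ends the whole scan once the limit is reached, so it is unsuitable as a scan filter.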

Example 1: return the first 2 cells of row 002.

get 'student','002',FILTER=>"ColumnCountGetFilter(2)"

Result:

hbase(main):001:0> get 'student','002',FILTER=>"ColumnCountGetFilter(2)"
COLUMN                CELL                                                      
 grades:bigdata       timestamp=1586248539502, value=88                         
 grades:english       timestamp=1586248508242, value=85                         
1 row(s) in 0.7400 seconds

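(2) PageFilter: row-based pagination filter that limits the number of rows returned. The limit is applied separately on each region server, so a table spread across several regions can return more rows than requested.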

Example 2: return one row.

scan 'student',FILTER=>"PageFilter(1)"

Result:

hbase(main):002:0> scan 'student',FILTER=>"PageFilter(1)"
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
1 row(s) in 0.2050 seconds
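Because PageFilter's limit is enforced per region server rather than globally, the client may need to trim the combined result itself. The Python sketch below models two hypothetical regions, each applying the page size, followed by the client-side trim; the region split is invented for illustration and `page_filter_per_region` is not an HBase API.

```python
def page_filter_per_region(regions, page_size):
    """Each region returns at most page_size rows; the client concatenates them."""
    out = []
    for region_rows in regions:
        out.extend(region_rows[:page_size])
    return out


# Hypothetical split of the student table into two regions
regions = [["001", "002"], ["003"]]

server_rows = page_filter_per_region(regions, 1)
print(server_rows)      # ['001', '003'], more than the requested 1 row
print(server_rows[:1])  # ['001'], after the client enforces the limit
```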

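(3) ColumnPaginationFilter: column-based pagination filter. It takes a limit (how many cells to return per row) followed by an offset (how many cells to skip first).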

Example 3: in each row, skip the first cell and show the next 2.

scan 'student',FILTER=>"ColumnPaginationFilter(2,1)"

Result:

hbase(main):004:0> scan 'student',FILTER=>"ColumnPaginationFilter(2,1)"
ROW                   COLUMN+CELL                                               
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 002                  column=grades:english, timestamp=1586248508242, value=85  
 002                  column=grades:math, timestamp=1586248524101, value=78     
 003                  column=grades:math, timestamp=1586248651102, value=80     
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
3 row(s) in 0.1980 seconds
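ColumnPaginationFilter(limit, offset) skips `offset` cells in each row and then returns at most `limit` cells. The Python sketch below models it per row and checks Example 3 against the column lists from the scan output; on this data, ColumnCountGetFilter(2) on row 002 corresponds to a limit of 2 with offset 0. `column_pagination` is a hypothetical name.

```python
def column_pagination(row_cells, limit, offset):
    """Model of ColumnPaginationFilter: skip `offset` cells, keep the next `limit`."""
    return row_cells[offset:offset + limit]


rows = {
    "001": ["grades:english", "grades:math", "stuinfo:age", "stuinfo:name", "stuinfo:sex"],
    "002": ["grades:bigdata", "grades:english", "grades:math",
            "stuinfo:class", "stuinfo:name", "stuinfo:sex"],
    "003": ["grades:english", "grades:math", "stuinfo:age",
            "stuinfo:class", "stuinfo:name", "stuinfo:sex"],
}
for row, cells in rows.items():
    print(row, column_pagination(cells, 2, 1))
# 001 ['grades:math', 'stuinfo:age']
# 002 ['grades:english', 'grades:math']
# 003 ['grades:math', 'stuinfo:age']

# ColumnCountGetFilter(2) on row 002 behaves like limit 2, offset 0 here:
print(column_pagination(rows["002"], 2, 0))  # ['grades:bigdata', 'grades:english']
```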