HBase shell prefix filter and HBase shell filters

Reposted

mob6454cc7c268c 2023-10-17 22:19:10


I. Creating the table

1. Connect to a running HBase instance from the command line with:

hbase shell

2. Before using the filters, create a table named student with two column families, stuinfo and grades.

3. The commands are as follows.

Create the table:

create 'student','stuinfo','grades'

Insert the data for the first logical row:

put 'student', '001', 'stuinfo:name','alice'
put 'student', '001', 'stuinfo:age','18'
put 'student', '001', 'stuinfo:sex','female'
put 'student', '001', 'grades:english','80'
put 'student', '001', 'grades:math','90'

Insert the other two rows (002 and 003) in the same way.

Result:

hbase(main):028:0> scan 'student'
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
 002                  column=grades:bigdata, timestamp=1586248539502, value=88  
 002                  column=grades:english, timestamp=1586248508242, value=85  
 002                  column=grades:math, timestamp=1586248524101, value=78     
 002                  column=stuinfo:class, timestamp=1586248476375, value=1802 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 002                  column=stuinfo:sex, timestamp=1586248462931, value=male   
 003                  column=grades:english, timestamp=1586248639980, value=90  
 003                  column=grades:math, timestamp=1586248651102, value=80     
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
 003                  column=stuinfo:class, timestamp=1586248611878, value=1803 
 003                  column=stuinfo:name, timestamp=1586248574358, value=harry 
 003                  column=stuinfo:sex, timestamp=1586248601271, value=male   
3 row(s) in 0.1320 seconds

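II. Filter operations

1. Row key filters

These include RowFilter, PrefixFilter, KeyOnlyFilter, FirstKeyOnlyFilter, and others.

Format: scan 'table name', {FILTER => "FilterName(comparison operator, 'comparator')"}

(1) RowFilter: filters on the row key.

Example 1: display the key-value pairs whose row key contains the substring 001: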

scan 'student',{FILTER=>"RowFilter(=,'substring:001')"}

Result:

hbase(main):031:0> scan 'student',{FILTER=>"RowFilter(=,'substring:001')"}
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
1 row(s) in 0.2700 seconds

Example 2: display the key-value pairs whose row key is greater than 001 in byte order:

scan 'student',{FILTER=>"RowFilter(>,'binary:001')"}

Result:

hbase(main):032:0> scan 'student',{FILTER=>"RowFilter(>,'binary:001')"}
ROW                   COLUMN+CELL                                               
 002                  column=grades:bigdata, timestamp=1586248539502, value=88  
 002                  column=grades:english, timestamp=1586248508242, value=85  
 002                  column=grades:math, timestamp=1586248524101, value=78     
 002                  column=stuinfo:class, timestamp=1586248476375, value=1802 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 002                  column=stuinfo:sex, timestamp=1586248462931, value=male   
 003                  column=grades:english, timestamp=1586248639980, value=90  
 003                  column=grades:math, timestamp=1586248651102, value=80     
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
 003                  column=stuinfo:class, timestamp=1586248611878, value=1803 
 003                  column=stuinfo:name, timestamp=1586248574358, value=harry 
 003                  column=stuinfo:sex, timestamp=1586248601271, value=male   
2 row(s) in 0.2130 seconds
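
The two comparators used above behave differently, and a small model may make the difference easier to see. The following is a minimal Python sketch, not HBase code: the helper names substring_match and binary_greater are hypothetical, and the logic is a simplified stand-in for what the substring and binary comparators do with the = and > operators.

```python
# Row keys from the example table.
row_keys = ["001", "002", "003"]

def substring_match(key: str, needle: str) -> bool:
    # Models '=' with a substring comparator: true when needle occurs anywhere in key.
    return needle in key

def binary_greater(key: str, bound: str) -> bool:
    # Models '>' with a binary comparator: byte-wise lexicographic comparison.
    return key.encode() > bound.encode()

contains_001 = [k for k in row_keys if substring_match(k, "001")]
greater_than_001 = [k for k in row_keys if binary_greater(k, "001")]

print(contains_001)      # ['001']
print(greater_than_001)  # ['002', '003']
```

The second list matches the two rows returned by the scan in Example 2.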

(2) PrefixFilter: filters on a row key prefix.

Example 3: scan the rows whose row key starts with 001:

scan 'student',FILTER=>"PrefixFilter('001')"

Result:

hbase(main):033:0> scan 'student',FILTER=>"PrefixFilter('001')"
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
1 row(s) in 0.1300 seconds

(3) FirstKeyOnlyFilter: scans the whole table and displays only the first key-value pair of each logical row.

Example 4:

scan 'student',FILTER=>"FirstKeyOnlyFilter()"

Result:

hbase(main):034:0> scan 'student',FILTER=>"FirstKeyOnlyFilter()"
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 002                  column=grades:bigdata, timestamp=1586248539502, value=88  
 003                  column=grades:english, timestamp=1586248639980, value=90  
3 row(s) in 0.0780 seconds

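(4) InclusiveStopFilter: used in place of ENDROW so that the stop row itself is included in the results.

Example 5: scan and display the key-value pairs whose row keys fall in the range 001 to 002, inclusive:

scan 'student', {STARTROW =>'001',FILTER =>"InclusiveStopFilter('002')"}

Result: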

hbase(main):037:0> scan 'student',{STARTROW=>'001',FILTER=>"InclusiveStopFilter('002')"}
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
 002                  column=grades:bigdata, timestamp=1586248539502, value=88  
 002                  column=grades:english, timestamp=1586248508242, value=85  
 002                  column=grades:math, timestamp=1586248524101, value=78     
 002                  column=stuinfo:class, timestamp=1586248476375, value=1802 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 002                  column=stuinfo:sex, timestamp=1586248462931, value=male   
2 row(s) in 0.0500 seconds

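For this table, the command returns the same rows as the following scan, because ENDROW is exclusive and 003 is the next row key after 002:

scan 'student', {STARTROW =>'001',ENDROW => '003'}

Result: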

hbase(main):038:0> scan 'student',{STARTROW=>'001',ENDROW=>'003'}
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
 002                  column=grades:bigdata, timestamp=1586248539502, value=88  
 002                  column=grades:english, timestamp=1586248508242, value=85  
 002                  column=grades:math, timestamp=1586248524101, value=78     
 002                  column=stuinfo:class, timestamp=1586248476375, value=1802 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 002                  column=stuinfo:sex, timestamp=1586248462931, value=male   
2 row(s) in 0.0540 seconds
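
The two scans agree here only because no row key lies between 002 and 003. The sketch below models the difference between an inclusive and an exclusive stop row in plain Python; scan_range and the extra row key 0025 are hypothetical and serve only to illustrate the boundary behaviour.

```python
def scan_range(keys, start, stop, inclusive_stop):
    """Return the row keys a range scan would visit (simplified model)."""
    selected = []
    for key in sorted(keys):
        if key < start:
            continue  # STARTROW is inclusive, so skip only smaller keys
        if key > stop or (key == stop and not inclusive_stop):
            break     # past the stop row, or at an exclusive stop row
        selected.append(key)
    return selected

row_keys = ["001", "002", "003"]
with_filter = scan_range(row_keys, "001", "002", inclusive_stop=True)
with_endrow = scan_range(row_keys, "001", "003", inclusive_stop=False)
print(with_filter, with_endrow)  # both are ['001', '002']

# With a hypothetical extra row key between 002 and 003 the results differ.
more_keys = row_keys + ["0025"]
filter_more = scan_range(more_keys, "001", "002", inclusive_stop=True)
endrow_more = scan_range(more_keys, "001", "003", inclusive_stop=False)
print(filter_more)  # ['001', '002']
print(endrow_more)  # ['001', '002', '0025']
```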

(5) KeyOnlyFilter: returns only the key part of each cell; the values are not displayed.

scan 'student',FILTER=>"KeyOnlyFilter()"

Result:

hbase(main):035:0> scan 'student',FILTER=>"KeyOnlyFilter()"
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=    
 001                  column=grades:math, timestamp=1586248399580, value=       
 001                  column=stuinfo:age, timestamp=1586248291518, value=       
 001                  column=stuinfo:name, timestamp=1586248245854, value=      
 001                  column=stuinfo:sex, timestamp=1586248315971, value=       
 002                  column=grades:bigdata, timestamp=1586248539502, value=    
 002                  column=grades:english, timestamp=1586248508242, value=    
 002                  column=grades:math, timestamp=1586248524101, value=       
 002                  column=stuinfo:class, timestamp=1586248476375, value=     
 002                  column=stuinfo:name, timestamp=1586248440973, value=      
 002                  column=stuinfo:sex, timestamp=1586248462931, value=       
 003                  column=grades:english, timestamp=1586248639980, value=    
 003                  column=grades:math, timestamp=1586248651102, value=       
 003                  column=stuinfo:age, timestamp=1586248586426, value=       
 003                  column=stuinfo:class, timestamp=1586248611878, value=     
 003                  column=stuinfo:name, timestamp=1586248574358, value=      
 003                  column=stuinfo:sex, timestamp=1586248601271, value=       
3 row(s) in 0.1940 seconds

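2. Column family and column filters

(1) FamilyFilter: compares and filters on the column family.

Example 1: display the key-value pairs whose column family name contains stu:

scan 'student',FILTER=>"FamilyFilter(=,'substring:stu')"
scan 'student',FILTER=>"FamilyFilter(=,'binary:stu')"

The substring comparator matches any column family whose name contains stu, so the first command returns the stuinfo cells. The binary comparator requires the family name to equal stu exactly, so the second command returns no rows.

Result of the second command: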

hbase(main):042:0* scan 'student',{FILTER=>"FamilyFilter(=,'binary:stu')"}
ROW                           COLUMN+CELL                                                                          
0 row(s) in 0.0580 seconds
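
The empty result follows from how the comparators treat the operand. The Python sketch below is an illustration only, not HBase code; it applies three kinds of comparison to the two column family names of the example table. The prefix case corresponds to the binaryprefix comparator of the HBase filter language, which is mentioned here as an alternative and is not used in the original commands.

```python
families = ["grades", "stuinfo"]

# '=' with a binary comparator: the whole name must equal the operand.
binary_matches = [f for f in families if f.encode() == b"stu"]

# '=' with a substring comparator: the operand may appear anywhere in the name.
substring_matches = [f for f in families if "stu" in f]

# '=' with a prefix comparison: the name must start with the operand.
prefix_matches = [f for f in families if f.encode().startswith(b"stu")]

print(binary_matches)     # []
print(substring_matches)  # ['stuinfo']
print(prefix_matches)     # ['stuinfo']
```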

(2) QualifierFilter: filters on the column qualifier.

Example 2: display the cells whose column name contains name:

scan 'student',FILTER=>"QualifierFilter(=,'substring:name')"

Result:

hbase(main):001:0> scan 'student',{FILTER=>"QualifierFilter(=,'substring:name')"}
ROW                   COLUMN+CELL                                               
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 003                  column=stuinfo:name, timestamp=1586248574358, value=harry 
3 row(s) in 0.7890 seconds

(3) ColumnPrefixFilter: filters on a column name prefix.

Example 3: display the cells whose column name starts with name:

scan 'student',FILTER=>"ColumnPrefixFilter('name')"

Result:

hbase(main):002:0> scan 'student',FILTER=>"ColumnPrefixFilter('name')"
ROW                   COLUMN+CELL                                               
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 003                  column=stuinfo:name, timestamp=1586248574358, value=harry 
3 row(s) in 0.2490 seconds

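For this table, this returns the same cells as the substring-based QualifierFilter, because name is the only column name that contains that string:

scan 'student',FILTER=>"QualifierFilter(=,'substring:name')"

Result: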

hbase(main):004:0> scan 'student',FILTER=>"QualifierFilter(=,'substring:name')"
ROW                   COLUMN+CELL                                               
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 003                  column=stuinfo:name, timestamp=1586248574358, value=harry 
3 row(s) in 0.1020 seconds
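
A prefix match and a substring match give the same result here, but they are not the same test in general. The short Python sketch below models both on the column names of the example table plus one extra name, nickname, which is hypothetical and does not exist in the student table.

```python
# Column names from the example table, plus a hypothetical 'nickname' column.
qualifiers = ["age", "class", "name", "sex", "nickname"]

# Models ColumnPrefixFilter('name'): the column name must start with the prefix.
prefix_matches = [q for q in qualifiers if q.startswith("name")]

# Models QualifierFilter(=,'substring:name'): the string may appear anywhere.
substring_matches = [q for q in qualifiers if "name" in q]

print(prefix_matches)     # ['name']
print(substring_matches)  # ['name', 'nickname']
```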

(4) MultipleColumnPrefixFilter: accepts several column name prefixes.

Example 4: display the cells whose column name starts with name or age:

scan 'student',FILTER=>"MultipleColumnPrefixFilter('name','age')"

Result:

hbase(main):005:0> scan 'student',FILTER=>"MultipleColumnPrefixFilter('name','age')"
ROW                   COLUMN+CELL                                               
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 002                  column=stuinfo:name, timestamp=1586248440973, value=nancy 
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
 003                  column=stuinfo:name, timestamp=1586248574358, value=harry 
3 row(s) in 0.0870 seconds

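(5) ColumnRangeFilter: filters column names by a lexicographic range. The arguments are the lower bound, whether it is inclusive, the upper bound, and whether it is inclusive.

Example 5: display the cells whose column name lies between bi and na, both bounds inclusive:

scan 'student',FILTER=>"ColumnRangeFilter('bi',true,'na',true)"

Result: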

hbase(main):001:0> scan 'student',FILTER=>"ColumnRangeFilter('bi',true,'na',true)"
ROW                           COLUMN+CELL                                                                          
 001                          column=grades:english, timestamp=1586248371684, value=80                             
 001                          column=grades:math, timestamp=1586248399580, value=90                                
 002                          column=grades:bigdata, timestamp=1586248539502, value=88                             
 002                          column=grades:english, timestamp=1586248508242, value=85                             
 002                          column=grades:math, timestamp=1586248524101, value=78                                
 002                          column=stuinfo:class, timestamp=1586248476375, value=1802                            
 003                          column=grades:english, timestamp=1586248639980, value=90                             
 003                          column=grades:math, timestamp=1586248651102, value=80                                
 003                          column=stuinfo:class, timestamp=1586248611878, value=1803                            
3 row(s) in 0.8940 seconds
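
The range check explains why name is missing from the output even though the upper bound is inclusive: the string name sorts after na. The following Python sketch reproduces the selection on the column names of the example table; in_column_range is a hypothetical helper, not part of HBase.

```python
def in_column_range(name, min_col, min_inclusive, max_col, max_inclusive):
    # Lexicographic comparison, with each bound optionally inclusive.
    above_min = name > min_col or (min_inclusive and name == min_col)
    below_max = name < max_col or (max_inclusive and name == max_col)
    return above_min and below_max

qualifiers = ["age", "bigdata", "class", "english", "math", "name", "sex"]
selected = [q for q in qualifiers if in_column_range(q, "bi", True, "na", True)]

print(selected)  # ['bigdata', 'class', 'english', 'math']
```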

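3. Value filters

(1) ValueFilter: filters on the cell value.

Example 1: query all key-value pairs whose value equals 19 (binary comparator) or contains 19 (substring comparator):

scan 'student',FILTER=>"ValueFilter(=,'binary:19')"
scan 'student',FILTER=>"ValueFilter(=,'substring:19')"

Result: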

hbase(main):002:0> scan 'student',FILTER=>"ValueFilter(=,'binary:19')"
ROW                   COLUMN+CELL                                               
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
1 row(s) in 0.1480 seconds

hbase(main):003:0> scan 'student',FILTER=>"ValueFilter(=,'substring:19')"
ROW                   COLUMN+CELL                                               
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
1 row(s) in 0.0470 seconds

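(2) SingleColumnValueFilter: filters on the value of a specified column family and column.

Example 2: query the key-value pairs whose age column in the stuinfo column family equals 19:

scan 'student',{COLUMN=>'stuinfo:age',FILTER=>"SingleColumnValueFilter('stuinfo','age',=,'binary:19')"}

Result: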

hbase(main):007:0> scan 'student',{COLUMN=>'stuinfo:age',FILTER=>"SingleColumnValueFilter('stuinfo','age',=,'binary:19')"}
ROW                   COLUMN+CELL                                               
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
1 row(s) in 0.1500 seconds

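Without the COLUMN restriction, the filter returns every cell of each matching row. For example, to return the whole row whose stuinfo:name equals alice:

scan 'student',FILTER=>"SingleColumnValueFilter('stuinfo','name',=,'binary:alice')"

Result: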

hbase(main):005:0> scan 'student',FILTER=>"SingleColumnValueFilter('stuinfo','name',=,'binary:alice')"
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
1 row(s) in 0.0490 seconds
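
SingleColumnValueFilter decides per row, not per cell, which is why the whole row comes back. The Python sketch below models that behaviour on a reduced copy of the example data. It rests on one assumption taken from the HBase API documentation as I understand it: rows that lack the tested column pass the filter unless the filterIfMissing option is set. The function name and the filter_if_missing parameter are hypothetical names for this sketch.

```python
# A reduced copy of the example table: row key -> {column: value}.
rows = {
    "001": {"stuinfo:name": "alice", "stuinfo:age": "18"},
    "002": {"stuinfo:name": "nancy"},  # row 002 has no age column
    "003": {"stuinfo:name": "harry", "stuinfo:age": "19"},
}

def single_column_value_filter(rows, column, expected, filter_if_missing=False):
    """Keep whole rows whose tested column equals the expected value."""
    kept = {}
    for key, cells in rows.items():
        if column not in cells:
            # Assumed default: a row without the column is kept.
            if not filter_if_missing:
                kept[key] = cells
            continue
        if cells[column] == expected:
            kept[key] = cells
    return kept

by_name = single_column_value_filter(rows, "stuinfo:name", "alice")
by_age_default = single_column_value_filter(rows, "stuinfo:age", "19")
by_age_strict = single_column_value_filter(rows, "stuinfo:age", "19", filter_if_missing=True)

print(sorted(by_name))         # ['001']
print(sorted(by_age_default))  # ['002', '003']
print(sorted(by_age_strict))   # ['003']
```

Under that assumption, restricting the scan with COLUMN=>'stuinfo:age', as in Example 2, is what keeps row 002 out of the output, since the scan then reads no cells from that row.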

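4. Other filters

(1) ColumnCountGetFilter: limits the number of key-value pairs returned for each logical row.

Example 1: return the first 2 key-value pairs of row 002:

get 'student','002',FILTER=>"ColumnCountGetFilter(2)"

Result: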

hbase(main):001:0> get 'student','002',FILTER=>"ColumnCountGetFilter(2)"
COLUMN                CELL                                                      
 grades:bigdata       timestamp=1586248539502, value=88                         
 grades:english       timestamp=1586248508242, value=85                         
1 row(s) in 0.7400 seconds

(2) PageFilter: row-based pagination filter that sets the number of rows returned.

Example 2: display one row:

scan 'student',FILTER=>"PageFilter(1)"

Result:

hbase(main):002:0> scan 'student',FILTER=>"PageFilter(1)"
ROW                   COLUMN+CELL                                               
 001                  column=grades:english, timestamp=1586248371684, value=80  
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 001                  column=stuinfo:name, timestamp=1586248245854, value=alice 
 001                  column=stuinfo:sex, timestamp=1586248315971, value=female 
1 row(s) in 0.2050 seconds

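(3) ColumnPaginationFilter: column-based pagination filter. It takes a limit (the number of columns to return) followed by an offset (the number of columns to skip).

Example 3: for each row, skip the first column and display the next 2 key-value pairs:

scan 'student',FILTER=>"ColumnPaginationFilter(2,1)"

Result: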

hbase(main):004:0> scan 'student',FILTER=>"ColumnPaginationFilter(2,1)"
ROW                   COLUMN+CELL                                               
 001                  column=grades:math, timestamp=1586248399580, value=90     
 001                  column=stuinfo:age, timestamp=1586248291518, value=18     
 002                  column=grades:english, timestamp=1586248508242, value=85  
 002                  column=grades:math, timestamp=1586248524101, value=78     
 003                  column=grades:math, timestamp=1586248651102, value=80     
 003                  column=stuinfo:age, timestamp=1586248586426, value=19     
3 row(s) in 0.1980 seconds
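
The limit and offset act on each row's columns in sorted order. The Python sketch below models this as a list slice over the column names of the example table; paginate_columns is a hypothetical helper used only for illustration.

```python
# Columns of each row in the order the scan returns them.
rows = {
    "001": ["grades:english", "grades:math", "stuinfo:age",
            "stuinfo:name", "stuinfo:sex"],
    "002": ["grades:bigdata", "grades:english", "grades:math",
            "stuinfo:class", "stuinfo:name", "stuinfo:sex"],
    "003": ["grades:english", "grades:math", "stuinfo:age",
            "stuinfo:class", "stuinfo:name", "stuinfo:sex"],
}

def paginate_columns(columns, limit, offset):
    # Skip 'offset' columns, then return at most 'limit' columns.
    return columns[offset:offset + limit]

page = {key: paginate_columns(cols, limit=2, offset=1) for key, cols in rows.items()}

for key, cols in page.items():
    print(key, cols)
# 001 ['grades:math', 'stuinfo:age']
# 002 ['grades:english', 'grades:math']
# 003 ['grades:math', 'stuinfo:age']
```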
