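Setting a TTL (time to live) on an HBase table gives its data an expiration time; once data is past that time, it is deleted automatically during compaction.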

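  • By default an HBase table's TTL is FOREVER; alternatively you can specify a TTL value (in seconds). TTL is a column family attribute, which is why the commands below pass NAME => 'f'.
  • There are two commands for changing the table schema: alter and alter_async. With the asynchronous form you can check progress with alter_status. The asynchronous form is usually preferred, and the examples below use alter_async.
  • Be careful when modifying a production table: a schema change is disruptive. Regions have to be closed and reopened during the change, so clients may see NotServingRegionException while it runs.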

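 Cases in which the TTL changes:

  1. The TTL is too long and you want to reduce it
  2. The current TTL is too short and you want to increase it
  3. A permanent table is given a TTL (in effect, also a reduction)
  4. A table with a TTL is made permanent (in effect, an increase)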

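Testing

The first three cases are syntactically the same:

-- Note that the TTL unit is seconds; convert your target duration accordingly (10368000 s = 120 days)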
> alter_async 'TABLE_NAME',{NAME => 'f',TTL => '10368000'}
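
Since the TTL has to be given in seconds, it can be safer to compute the value than to type it by hand. The following is a minimal Ruby sketch of that conversion; `days_to_ttl_seconds` and `alter_ttl_command` are hypothetical helper names introduced here for illustration, not HBase commands. The HBase shell is JRuby-based, so the snippet should also paste into the shell, though that is an assumption about your version.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Convert a whole number of days into a TTL value in seconds.
def days_to_ttl_seconds(days)
  unless days.is_a?(Integer) && days > 0
    raise ArgumentError, 'days must be a positive integer'
  end
  days * SECONDS_PER_DAY
end

# Build the alter_async statement as a string so it can be reviewed
# before it is run against a real table.
def alter_ttl_command(table, family, ttl_seconds)
  "alter_async '#{table}',{NAME => '#{family}',TTL => '#{ttl_seconds}'}"
end

puts alter_ttl_command('TABLE_NAME', 'f', days_to_ttl_seconds(120))
# prints: alter_async 'TABLE_NAME',{NAME => 'f',TTL => '10368000'}
```

Printing the statement first, instead of executing it directly, leaves room to double-check the table and column family names before touching a production table.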

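For case 4 the table should become permanent, but how long is "permanent"?
When no TTL is specified, describe shows TTL => 'FOREVER'. Can that string be passed directly? On the version tested here, it cannot: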

hbase(main):004:0> alter_async 'TABLE_NAME',{NAME => 'f',TTL => 'FOREVER'}
ERROR: For input string: "FOREVER"

In the source code this value is 2147483647, which is Integer.MAX_VALUE (2^31 - 1):

hbase(main):005:0> alter_async 'TABLE_NAME',{NAME => 'f',TTL => '2147483647'}
0 row(s) in 0.9220 seconds

-- describe now shows TTL => 'FOREVER', so the change succeeded
hbase(main):002:0> desc 'TABLE_NAME'
Table TABLE_NAME is ENABLED
TABLE_NAME, {CONFIGURATION => {'hbase.hregion.max.filesize' => '10737418240'}}
COLUMN FAMILIES DESCRIPTION                                                                  
{NAME => 'f', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'SNAPPY
', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                    
1 row(s) in 0.1070 seconds

What about TTL => '-1'?

-- The change appears to succeed
hbase(main):003:0> alter_async 'TABLE_NAME',{NAME => 'f',TTL => '-1'}
0 row(s) in 0.9660 seconds
-- describe shows TTL => '-1 SECONDS', so -1 is clearly not treated as FOREVER
hbase(main):002:0> desc 'TABLE_NAME'
Table TABLE_NAME is ENABLED
TABLE_NAME, {CONFIGURATION => {'hbase.hregion.max.filesize' => '10737418240'}}
COLUMN FAMILIES DESCRIPTION                                                                  
{NAME => 'f', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '-1 SECONDS', COMPRESSION => 'SNAPPY
', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                    
1 row(s) in 0.1070 seconds
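
The two failed attempts above suggest validating the TTL before it reaches the shell. The sketch below is one way to do that in plain Ruby, assuming the behavior observed in this test (the literal string FOREVER is rejected, -1 is accepted but meaningless, 2147483647 is displayed as FOREVER). `normalize_ttl` and `TTL_FOREVER` are hypothetical names, not part of HBase.

```ruby
# Largest 32-bit signed integer; describe displayed this value as FOREVER above.
TTL_FOREVER = 2**31 - 1 # 2147483647

# Map a requested TTL to the numeric string to pass to alter/alter_async.
# Accepts :forever / 'FOREVER' (any case) or a positive number of seconds.
def normalize_ttl(value)
  return TTL_FOREVER.to_s if value.to_s.upcase == 'FOREVER'

  seconds = Integer(value.to_s, 10) # raises ArgumentError on non-numeric input
  unless seconds > 0 && seconds <= TTL_FOREVER
    raise ArgumentError, "TTL must be between 1 and #{TTL_FOREVER} seconds, got #{seconds}"
  end
  seconds.to_s
end

puts normalize_ttl('FOREVER') # prints 2147483647
puts normalize_ttl(2592000)   # prints 2592000
```

With this guard, a value such as '-1' raises an ArgumentError instead of silently producing a column family whose TTL reads '-1 SECONDS'.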

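Test on a real table:

In table user_1, a new rowkey 0831 was inserted and two cells were written to it; the data under rowkeys 1001-1005 is old data written several months ago.

Next, the TTL is set to 30 days to see whether the data under 1001-1005 gets deleted. The table's TTL is currently FOREVER.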

-- The TTL is currently FOREVER
hbase(main):009:0> describe 'user_1'
Table user_1 is ENABLED                                                                                                                                                                                           
user_1                                                                                                                                                                                                            
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                       
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCAC
HE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}                                                                                                                                                     
1 row(s) in 0.0590 seconds

-- Disable the table
hbase(main):010:0> disable 'user_1'
0 row(s) in 2.2860 seconds

-- Change the TTL
hbase(main):011:0> alter 'user_1',{NAME =>'info',TTL => '2592000'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.2100 seconds

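-- The TTL is now 30 days (describe reports the table as ENABLED again; the enable step is not shown in this transcript)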
hbase(main):013:0> describe 'user_1'
Table user_1 is ENABLED                                                                                                                                                                                           
user_1                                                                                                                                                                                                            
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                                       
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '2592000 SECONDS (30 DAYS)', COMPRESSION => 'NONE', MIN_VERSION
S => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}                                                                                                                                   
1 row(s) in 0.0230 seconds

  

Then run major_compact on the table. The command returns immediately; the compaction itself runs in the background.

-- major_compact
hbase(main):014:0> major_compact 'user_1'
0 row(s) in 0.1450 seconds

-- Check the data: the rows 1001-1005 are gone
hbase(main):015:0> scan 'user_1'
ROW                                                   COLUMN+CELL                                                                                                                                                 
 0831                                                 column=info:age, timestamp=1661925584562, value=23                                                                                                          
 0831                                                 column=info:name, timestamp=1661925475903, value=qq831                                                                                                      
1 row(s) in 0.0170 seconds
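
Why row 0831 survives while the older rows disappear can be illustrated with a simplified model of the expiry rule: a cell counts as expired once its timestamp is older than the current time minus the TTL. This is a sketch for intuition only, not HBase's actual implementation; `expired?`, the assumed "now", and the 90-day-old cell are hypothetical, while the 0831 timestamp is taken from the scan output above.

```ruby
FOREVER_TTL_SECONDS = 2147483647

# Simplified model: timestamps are epoch milliseconds (as printed by scan),
# the TTL is in seconds. A TTL of FOREVER never expires anything.
def expired?(cell_ts_ms, ttl_seconds, now_ms)
  return false if ttl_seconds == FOREVER_TTL_SECONDS
  cell_ts_ms < now_ms - ttl_seconds * 1000
end

ttl      = 30 * 24 * 60 * 60                   # 2592000 seconds, the TTL set above
now_ms   = 1661925600000                       # assumed "now", shortly after the 0831 writes
fresh_ts = 1661925584562                       # info:age timestamp of row 0831 from the scan
old_ts   = now_ms - 90 * 24 * 60 * 60 * 1000   # hypothetical cell written 90 days earlier

puts expired?(fresh_ts, ttl, now_ms)                # false: written seconds ago
puts expired?(old_ts, ttl, now_ms)                  # true: older than 30 days
puts expired?(old_ts, FOREVER_TTL_SECONDS, now_ms)  # false: no expiry
```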

 

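If you only wanted a one-off cleanup, change the table's TTL back to FOREVER afterwards. Do this only once the major compaction has finished, because expired cells that have not yet been compacted away would become readable again when the TTL is raised: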

alter_async 'TABLE_NAME',{NAME => 'f',TTL => '2147483647'}

 

Summary

-- Set, increase, or reduce the TTL
alter_async 'TABLE_NAME',{NAME => 'f',TTL => '<seconds>'}

-- Restore the TTL to permanent; neither FOREVER nor -1 can be used as the value
alter_async 'TABLE_NAME',{NAME => 'f',TTL => '2147483647'}