Set an expiry time (TTL) on HBase table data. Once data passes its TTL, it is deleted automatically during compaction.
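- An HBase table's TTL normally defaults to FOREVER; alternatively, you can specify a TTL value in seconds.
- There are two commands for changing a table's schema: alter and alter_async. With the asynchronous one, progress can be checked with alter_status. The asynchronous form is the usual choice, and the examples below use alter_async.
- Take care when altering a table that serves live traffic: a schema change is disruptive. Regions have to be closed and reopened during the change, so clients may see NotServingRegionException while it runs.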
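Cases where the TTL changes:
- The TTL is too long and needs to be shortened
- The current TTL is too short and needs to be lengthened
- A permanent table is given a TTL (effectively shortening it)
- A table with a TTL is made permanent (effectively lengthening it)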
Testing
The first three cases are syntactically the same:
-- Note that TTL is in seconds; convert your retention period to the value you need
> alter_async 'TABLE_NAME',{NAME => 'f',TTL => '10368000'}
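The TTL in the command above is a plain number of seconds, so the retention period has to be converted by hand. A minimal sketch of that conversion in Python follows. The helper names ttl_seconds and alter_ttl_command are hypothetical, and the second one only builds the shell command as text; it does not talk to HBase.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def ttl_seconds(days: int) -> int:
    """Convert a retention period in whole days to a TTL in seconds."""
    if days <= 0:
        raise ValueError("days must be positive")
    return days * SECONDS_PER_DAY


def alter_ttl_command(table: str, family: str, days: int) -> str:
    """Render the alter_async command as text (hypothetical helper)."""
    return "alter_async '%s',{NAME => '%s',TTL => '%d'}" % (
        table, family, ttl_seconds(days))


# 120 days corresponds to the 10368000 seconds used above.
print(alter_ttl_command("TABLE_NAME", "f", 120))
```

The same helper gives 2592000 for 30 days.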
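The fourth case changes the table to permanent. But how long is permanent?
When no TTL is specified, the table description shows TTL => 'FOREVER'.
Can that value be used directly? In the version tested here, it cannot: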
hbase(main):004:0> alter_async 'TABLE_NAME',{NAME => 'f',TTL => 'FOREVER'}
ERROR: For input string: "FOREVER"
hbase(main):005:0> alter_async 'TABLE_NAME',{NAME => 'f',TTL => '2147483647'}
0 row(s) in 0.9220 seconds
-- The description shows TTL => 'FOREVER', so the change succeeded
hbase(main):002:0> desc 'TABLE_NAME'
Table TABLE_NAME is ENABLED
TABLE_NAME, {CONFIGURATION => {'hbase.hregion.max.filesize' => '10737418240'}}
COLUMN FAMILIES DESCRIPTION
{NAME => 'f', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1070 seconds
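The value 2147483647 is the largest signed 32-bit integer (2^31 - 1), and as the output above shows, the shell displays it as FOREVER. A small Python check of the arithmetic, where FOREVER_SECONDS is an illustrative name chosen here rather than anything taken from HBase:

```python
FOREVER_SECONDS = 2147483647  # value passed to alter_async above

# It is exactly the largest signed 32-bit integer.
assert FOREVER_SECONDS == 2**31 - 1

# Read as a duration in seconds, it is roughly 68 years.
days = FOREVER_SECONDS / 86400
years = days / 365.25
print(round(days), round(years, 1))
```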
Let's also try TTL => '-1'. Does that work?
-- The change appears to succeed
hbase(main):003:0> alter_async 'TABLE_NAME',{NAME => 'f',TTL => '-1'}
0 row(s) in 0.9660 seconds
-- TTL => '-1 SECONDS'? Clearly this does not make the table permanent
hbase(main):002:0> desc 'TABLE_NAME'
Table TABLE_NAME is ENABLED
TABLE_NAME, {CONFIGURATION => {'hbase.hregion.max.filesize' => '10737418240'}}
COLUMN FAMILIES DESCRIPTION
{NAME => 'f', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '-1 SECONDS', COMPRESSION => 'SNAPPY', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
1 row(s) in 0.1070 seconds
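Test on a real table:
A new rowkey, 0831, was inserted into the user_1 table with two cells written to it. The data under rowkeys 1001-1005 is several months old.
Next, set the TTL to 30 days and see whether the data for 1001-1005 gets deleted. The table's TTL is currently FOREVER.
-- At this point the TTL is FOREVER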
hbase(main):009:0> describe 'user_1'
Table user_1 is ENABLED
user_1
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}
1 row(s) in 0.0590 seconds
-- Disable the table
hbase(main):010:0> disable 'user_1'
0 row(s) in 2.2860 seconds
-- Change the TTL
hbase(main):011:0> alter 'user_1',{NAME =>'info',TTL => '2592000'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 2.2100 seconds
-- The TTL is now 30 days
hbase(main):013:0> describe 'user_1'
Table user_1 is ENABLED
user_1
COLUMN FAMILIES DESCRIPTION
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '2592000 SECONDS (30 DAYS)', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}
1 row(s) in 0.0230 seconds
Then run major_compact on the table:
-- major_compact
hbase(main):014:0> major_compact 'user_1'
0 row(s) in 0.1450 seconds
-- Check the data: the rows for 1001-1005 are gone
hbase(main):015:0> scan 'user_1'
ROW COLUMN+CELL
0831 column=info:age, timestamp=1661925584562, value=23
0831 column=info:name, timestamp=1661925475903, value=qq831
1 row(s) in 0.0170 seconds
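The scan result is consistent with the TTL rule: a cell counts as expired once its timestamp is older than the current time minus the TTL. A rough Python sketch of that comparison follows. The function is_expired and the sample "now" and "old" values are assumptions made for illustration, and this is not HBase's internal code.

```python
TTL_SECONDS = 2592000  # 30 days, as set with alter above


def is_expired(cell_ts_ms: int, now_ms: int, ttl_seconds: int = TTL_SECONDS) -> bool:
    """Return True when the cell timestamp is older than now minus the TTL."""
    return cell_ts_ms < now_ms - ttl_seconds * 1000


# Timestamp (milliseconds) of the freshly written info:age cell in the scan above.
fresh_ts = 1661925584562
# Assumed "now": one hour after that write (illustrative value).
now = fresh_ts + 3600 * 1000
# Assumed old cell: written 90 days before "now", standing in for rows 1001-1005.
old_ts = now - 90 * 86400 * 1000

print(is_expired(fresh_ts, now))  # False: well inside the 30-day window
print(is_expired(old_ts, now))    # True: older than 30 days
```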
If this was only meant as a one-off cleanup, set the table's TTL back to FOREVER afterwards:
alter_async 'TABLE_NAME',{NAME => 'f',TTL => '2147483647'}
Summary
-- Set, increase, or decrease the TTL
alter_async 'TABLE_NAME',{NAME => 'f',TTL => '<seconds>'}
-- Restore the TTL to permanent; neither FOREVER nor -1 can be used as the value
alter_async 'TABLE_NAME',{NAME => 'f',TTL => '2147483647'}
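The two summary rules can be folded into one small helper. The Python sketch below uses a hypothetical function name, shell_ttl. It maps the word FOREVER to 2147483647 and rejects values such as -1, which the shell accepted in the test above but which did not make the table permanent.

```python
FOREVER_SECONDS = 2147483647  # the value that restored TTL => 'FOREVER' above


def shell_ttl(value) -> str:
    """Return the TTL string to place in an alter_async command.

    Accepts a positive number of seconds or the word 'FOREVER'.
    """
    if isinstance(value, str) and value.strip().upper() == "FOREVER":
        return str(FOREVER_SECONDS)
    seconds = int(value)
    if seconds <= 0:
        raise ValueError("TTL must be a positive number of seconds")
    if seconds > FOREVER_SECONDS:
        raise ValueError("TTL does not fit in a signed 32-bit integer")
    return str(seconds)


print(shell_ttl("FOREVER"))
print(shell_ttl(2592000))
```

Rejecting non-positive values up front is a design choice of this sketch, made so that a mistaken -1 fails before it reaches the table.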