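- Background
Yesterday a colleague reported that an HBase query had stopped returning data. A quick check showed it was throwing RowTooBigException, which at first glance pointed to an oversized value, so I assumed some abnormal data had been written. In our HBase design there is one column family, and each time slice is written as a dynamic column holding the metrics for that slice. The daily query does a get by rowkey plus time window and returns the corresponding metrics.
- Analysis
Initial analysis: the data looked abnormal, with some value being too large. Note that the limit in question applies to the size of a whole row, and its default is 1 GB: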
/**
* Default max row size (1 Gb).
*/
public static final long TABLE_MAX_ROWSIZE_DEFAULT = 1024 * 1024 * 1024L;
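Normal writes should never produce that much data, so I went into the hbase shell and requested the metrics for individual windows one at a time.
#get 'namespace:table','rowkey','column_family:column'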
get 'toffline:tmetric','top','slicetop:96430'
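Looping over every window in the request range turned up nothing wrong, so I requested the whole rowkey directly, and that is where the exception appeared. Most likely the query was not filtering on the specific column, so the entire row behind the rowkey was being read and it exceeded the limit. (Raising the hbase.table.max.rowsize parameter would also let the query return, but that is not what we want.)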
get 'toffline:tmetric','top'
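To make the difference between the two gets concrete, here is a small sketch of the idea. It is a simplified model and not HBase's implementation: I assume a read adds up the size of every cell it has to return and fails once the running total passes the limit. The class `RowSizeModel`, the column count and the cell size are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model, not HBase code: a read accumulates cell sizes and
// fails once the running total passes the configured maximum row size.
class RowSizeModel {
    // Same value as TABLE_MAX_ROWSIZE_DEFAULT quoted above (1 GB in bytes).
    static final long MAX_ROW_SIZE = 1024 * 1024 * 1024L;

    // Returns the total bytes read, or throws if the limit is exceeded.
    static long readCells(Map<String, Long> cellSizes, long maxRowSize) {
        long total = 0;
        for (Map.Entry<String, Long> cell : cellSizes.entrySet()) {
            total += cell.getValue();
            if (total > maxRowSize) {
                throw new IllegalStateException("row too big after column " + cell.getKey());
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical wide row: 3000 dynamic columns of 512 KB each (about 1.46 GB).
        Map<String, Long> row = new LinkedHashMap<>();
        for (int slice = 0; slice < 3000; slice++) {
            row.put("slicetop:" + slice, 512 * 1024L);
        }

        // Reading a single column stays far below the limit.
        Map<String, Long> single = new LinkedHashMap<>();
        single.put("slicetop:42", row.get("slicetop:42"));
        System.out.println("single column bytes: " + readCells(single, MAX_ROW_SIZE));

        // Reading the whole row crosses the limit part-way through.
        try {
            readCells(row, MAX_ROW_SIZE);
        } catch (IllegalStateException e) {
            System.out.println("whole row failed: " + e.getMessage());
        }
    }
}
```

No single cell is anywhere near 1 GB in this model; it is the sum over the row that breaks the read, which matches what the shell showed.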
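A quick look at the code showed that a filter was in place, yet the request still could not be served. What to do? First, shrink the data so that queries can respond again.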
hbase(main):002:0> desc 'toffline:tmetric'
Table toffline:tmetric is DISABLED
toffline:tmetric
COLUMN FAMILIES DESCRIPTION
{NAME => 'content', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'slicetop', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.1090 seconds
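#TTL => 'FOREVER': the data has no expiry time. That is why everything was fine when the project went live and is failing now: the data volume has kept accumulating as dynamic columns are added. For our business, keeping three months of data is enough.
#change the TTL of the column family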
hbase(main):003:0> alter 'toffline:tmetric',{NAME=>'slicetop',TTL=>'7776000'}
Updating all regions with the new schema...
80/80 regions updated.
Done.
0 row(s) in 2.7270 seconds
#check the result
hbase(main):004:0> desc 'toffline:tmetric'
Table toffline:tmetric is DISABLED
toffline:tmetric
COLUMN FAMILIES DESCRIPTION
{NAME => 'content', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'slicetop', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '7776000 SECONDS (90 DAYS)', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 0.0130 seconds
#major_compact merges the store files and drops the expired data
hbase(main):005:0> major_compact 'toffline:tmetric'
0 row(s) in 1.6660 seconds
#enable the table
hbase(main):006:0> enable 'toffline:tmetric'
0 row(s) in 4.3420 seconds
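For reference, the TTL value 7776000 passed to alter above is simply 90 days expressed in seconds. The sketch below checks that arithmetic and models expiry as "cell age greater than TTL". The expiry rule is my simplified assumption about how a TTL behaves, and the class `TtlModel` is hypothetical.

```java
import java.time.Duration;

// Simplified model of a column family TTL, not HBase code.
class TtlModel {
    // Converts a retention period in days to the seconds value used for TTL.
    static long ttlSeconds(int days) {
        return Duration.ofDays(days).getSeconds();
    }

    // A cell counts as expired once its age exceeds the TTL.
    static boolean isExpired(long cellTimestampMs, long nowMs, long ttlSeconds) {
        return nowMs - cellTimestampMs > ttlSeconds * 1000L;
    }

    public static void main(String[] args) {
        long ttl = ttlSeconds(90);
        System.out.println("TTL for 90 days: " + ttl); // prints 7776000

        long now = System.currentTimeMillis();
        long hundredDaysAgo = now - Duration.ofDays(100).toMillis();
        long tenDaysAgo = now - Duration.ofDays(10).toMillis();
        System.out.println("100-day-old cell expired: " + isExpired(hundredDaysAgo, now, ttl)); // true
        System.out.println("10-day-old cell expired: " + isExpired(tenDaysAgo, now, ttl));      // false
    }
}
```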
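With the change done, I ran the query again.
No more errors, but two days of data took 20 minutes to come back. Damn. Time to read the code properly after all.
Only then did I notice that the hbase get request we had wrapped did add a filter, but only on the column family and not on the column. So I made a small change: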
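import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.FamilyFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.util.Bytes;

// Method of our wrapper class; conf and tableMap are fields of that class.
public String get(String tableName, String family, String qualifier, String rawKey) throws IOException {
    // Create the HTable on first use and cache it per table name.
    if (!tableMap.containsKey(tableName)) {
        HTable table = new HTable(conf, tableName);
        tableMap.put(tableName, table);
    }
    HTable table = tableMap.get(tableName);
    Get get = new Get(Bytes.toBytes(rawKey));
    // A FilterList is a chain of filters applied to the target data set; here it restricts both the column family and the column.
    FilterList filters = new FilterList();
    Filter qualifierFilter = new QualifierFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(qualifier)));
    Filter familyFilter = new FamilyFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(family)));
    filters.addFilter(qualifierFilter);
    filters.addFilter(familyFilter);
    // Attach the filter list to the get request.
    get.setFilter(filters);
    Result result = table.get(get);
    if (!result.isEmpty()) {
        if (result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier)) != null) {
            return new String(result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier)));
        } else {
            return null;
        }
    } else {
        return null;
    }
}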
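After packaging and deploying, I sent the same request again and it returned in 4 seconds. Finally OK. (The severe full GC problem that had been reported for this table went away at the same time.)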
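The effect of the extra qualifier filter can be seen in a tiny in-memory model. This is only a sketch: `Cell` is a hypothetical stand-in and not the HBase class, the row contents are made up, and the filters are plain Java predicates that must all pass, which is how I understand a default FilterList to combine its filters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// In-memory model only; Cell is a hypothetical stand-in, not the HBase class.
class FilterChainModel {
    static final class Cell {
        final String family;
        final String qualifier;

        Cell(String family, String qualifier) {
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // Builds one 'content' cell plus the given number of dynamic 'slicetop' columns.
    static List<Cell> wideRow(int dynamicColumns) {
        List<Cell> row = new ArrayList<>();
        row.add(new Cell("content", "meta"));
        for (int i = 0; i < dynamicColumns; i++) {
            row.add(new Cell("slicetop", String.valueOf(i)));
        }
        return row;
    }

    // Keeps a cell only if every filter accepts it (all filters must pass).
    static List<Cell> apply(List<Cell> row, List<Predicate<Cell>> filters) {
        List<Cell> kept = new ArrayList<>();
        for (Cell cell : row) {
            boolean pass = true;
            for (Predicate<Cell> filter : filters) {
                if (!filter.test(cell)) {
                    pass = false;
                    break;
                }
            }
            if (pass) {
                kept.add(cell);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cell> row = wideRow(100_000);
        Predicate<Cell> familyFilter = c -> c.family.equals("slicetop");
        Predicate<Cell> qualifierFilter = c -> c.qualifier.equals("96430");

        // Before the fix: family filter only.
        List<Predicate<Cell>> before = new ArrayList<>();
        before.add(familyFilter);
        // After the fix: family filter plus qualifier filter.
        List<Predicate<Cell>> after = new ArrayList<>(before);
        after.add(qualifierFilter);

        System.out.println("family only: " + apply(row, before).size());       // 100000
        System.out.println("family + qualifier: " + apply(row, after).size()); // 1
    }
}
```

With only the family filter, every dynamic column of the row comes back; with both filters, the result is the single cell that was asked for.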
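- A look at what HBase Filter offers
A Filter mainly takes two parameters: a comparison operator, CompareFilter.CompareOp, and a comparator, ByteArrayComparable.
(1) Comparison operator CompareFilter.CompareOp
The comparison operator defines the comparison relation and can take one of the following values:
EQUAL equal to
GREATER greater than
GREATER_OR_EQUAL greater than or equal to
LESS less than
LESS_OR_EQUAL less than or equal to
NOT_EQUAL not equal to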
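As a rough illustration of how these operators behave on byte data, here is a small sketch. The enum `Op` only mirrors the names listed above and is not the HBase class. I assume the cell value is the left operand and that bytes are compared lexicographically as unsigned values.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: Op mirrors the operator names above, it is not CompareFilter.CompareOp.
class CompareOpModel {
    enum Op { EQUAL, GREATER, GREATER_OR_EQUAL, LESS, LESS_OR_EQUAL, NOT_EQUAL }

    // Lexicographic comparison of two byte arrays, treating bytes as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // True when "cellValue <op> reference" holds.
    static boolean matches(Op op, byte[] cellValue, byte[] reference) {
        int c = compareBytes(cellValue, reference);
        switch (op) {
            case EQUAL: return c == 0;
            case GREATER: return c > 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case LESS: return c < 0;
            case LESS_OR_EQUAL: return c <= 0;
            case NOT_EQUAL: return c != 0;
            default: throw new IllegalArgumentException("unknown operator " + op);
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(matches(Op.EQUAL, bytes("top"), bytes("top"))); // true
        // Values are compared as bytes, so the string "10" sorts before "9".
        System.out.println(matches(Op.LESS, bytes("10"), bytes("9")));     // true
    }
}
```

The second call is the thing to watch for: numbers stored as strings compare byte by byte, not numerically.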
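(2) Comparator ByteArrayComparable
Comparators make a variety of matching behaviors possible. The following subclasses are available:
BinaryComparator matches the complete byte array
BinaryPrefixComparator matches a prefix of the byte array
BitComparator bitwise comparison
NullComparator checks whether the value is null
RegexStringComparator regular expression match
SubstringComparator substring match, case-insensitive
Comparison is the core of a filter and covers byte comparison, string comparison, and so on.
For example, string comparison: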
Scan scan = new Scan();
// Three alternative comparators; pass whichever one fits to the filter below.
RegexStringComparator regexComp = new RegexStringComparator("^wukong."); // regex: values starting with wukong (the ^ anchors the match at the start)
SubstringComparator substringComp = new SubstringComparator("2018"); // values containing 2018, case-insensitive
BinaryPrefixComparator prefixComp = new BinaryPrefixComparator(Bytes.toBytes("sf")); // values starting with sf; unlike BinaryComparator, only the prefix is compared
SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("family"), Bytes.toBytes("qualifier"), CompareOp.EQUAL, regexComp);
scan.setFilter(filter);
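To compare the matching rules side by side without a cluster, the sketch below reimplements them in plain Java. These are my approximations of what each comparator accepts, not the HBase classes, and I assume the regex comparator uses find-style matching, so an unanchored pattern matches anywhere in the value. The class `ComparatorModel` and the sample values are made up.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

// Plain-Java approximations of the matching rules, not the HBase comparators.
class ComparatorModel {
    // BinaryComparator-style: the whole byte array must be identical.
    static boolean binaryEquals(byte[] value, byte[] expected) {
        return Arrays.equals(value, expected);
    }

    // BinaryPrefixComparator-style: only the leading bytes are compared.
    static boolean binaryPrefix(byte[] value, byte[] prefix) {
        if (value.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (value[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // SubstringComparator-style: case-insensitive containment.
    static boolean substring(String value, String needle) {
        return value.toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
    }

    // RegexStringComparator-style, assuming find() semantics (match anywhere unless anchored).
    static boolean regex(String value, String pattern) {
        return Pattern.compile(pattern).matcher(value).find();
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(binaryEquals(bytes("sf_express"), bytes("sf"))); // false
        System.out.println(binaryPrefix(bytes("sf_express"), bytes("sf"))); // true
        System.out.println(substring("Report-2018-Q1", "2018"));            // true
        System.out.println(regex("xwukong1", "wukong."));                   // true
        System.out.println(regex("xwukong1", "^wukong."));                  // false
    }
}
```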