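1. Background
    Yesterday a colleague reported that HBase queries were returning no data. On inspection, the query was failing with RowTooBigException, which at first glance suggested an oversized value, so I assumed some abnormal data had been written. By design, our table uses a column family in which each time slice is recorded as a dynamic column holding that slice's metrics. Daily queries use a rowkey plus a time window, and a get returns the corresponding metrics.
2. Analysis
    Initial hypothesis: the data is abnormal and some value is too large. Note that the 1 GB default is the maximum size of a row, not of a single value: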
/**
   * Default max row size (1 Gb).
   */
  public static final long TABLE_MAX_ROWSIZE_DEFAULT = 1024 * 1024 * 1024L;
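
To make the limit concrete, here is a back-of-the-envelope sketch of how a row that keeps gaining dynamic columns approaches the default limit. The class name, the cell counts, and the cell sizes are hypothetical, and the check is modeled simply as "accumulated bytes exceed the limit", which may differ in detail from what the RegionServer actually does.

```java
/** Rough model of row growth against HBase's default max row size. */
final class RowSizeEstimate {
    // Same value as HConstants.TABLE_MAX_ROWSIZE_DEFAULT (1 GiB).
    static final long TABLE_MAX_ROWSIZE_DEFAULT = 1024 * 1024 * 1024L;

    /** Total bytes in a row holding cellCount cells of bytesPerCell each. */
    static long rowBytes(long cellCount, long bytesPerCell) {
        return Math.multiplyExact(cellCount, bytesPerCell);
    }

    /** Simplified check: the row is too big once its bytes exceed the limit. */
    static boolean tooBig(long rowBytes, long maxRowSize) {
        return rowBytes > maxRowSize;
    }

    /** First day on which a row growing by cellsPerDay * bytesPerCell exceeds maxRowSize. */
    static long daysUntilTooBig(long cellsPerDay, long bytesPerCell, long maxRowSize) {
        long bytesPerDay = rowBytes(cellsPerDay, bytesPerCell);
        return maxRowSize / bytesPerDay + 1;
    }
}
```

With a hypothetical 288 slices per day at 10 KiB per cell, `RowSizeEstimate.daysUntilTooBig(288, 10 * 1024, RowSizeEstimate.TABLE_MAX_ROWSIZE_DEFAULT)` gives the day on which a full-row read would first cross the limit under this model. That kind of slow growth fits a table that works at launch and fails much later.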

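Normal writes should never produce that much data, so I went into the hbase shell and requested the metrics for individual windows:

# get 'namespace:table','rowkey','family:qualifier'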
get 'toffline:tmetric','top','slicetop:96430'

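Looping over every window in the requested range turned up no problem, so I requested the entire rowkey directly. Sure enough, that is where the exception appeared. Most likely the query was not restricting the request to the relevant column, so the total data under the rowkey was too large. (Raising hbase.table.max.rowsize would also let the query return, but that is not what we want.)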

get 'toffline:tmetric','top'

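A quick look at the code showed that a filter was in place, yet the request still could not complete. What to do? First, shrink the data so that queries can respond again.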

hbase(main):002:0> desc  'toffline:tmetric'
Table toffline:tmetric is DISABLED                                                                                                                                                        
toffline:tmetric                                                                                                                                                                          
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                         
{NAME => 'content', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS
 => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                                                      
{NAME => 'slicetop', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSION
S => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                                                     
2 row(s) in 0.1090 seconds
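# TTL => 'FOREVER': no expiry was set. That is why everything was fine right after launch and broke later: the data kept accumulating as dynamic columns were added. For our business, keeping three months of data is enough.
# Set a TTL on the column family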
hbase(main):003:0> alter 'toffline:tmetric',{NAME=>'slicetop',TTL=>'7776000'} 
Updating all regions with the new schema...
80/80 regions updated.
Done.
0 row(s) in 2.7270 seconds
# Verify
hbase(main):004:0> desc  'toffline:tmetric'
Table tjrecoffline:doc_rt_metric is DISABLED                                                                                                                                                        
toffline:tmetric                                                                                                                                                                          
COLUMN FAMILIES DESCRIPTION                                                                                                                                                                         
{NAME => 'content', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS
 => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                                                      
{NAME => 'slicetop', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => '7776000 SECONDS (90 DAYS)', COMPRESSION => '
NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}                                                                                                   
2 row(s) in 0.0130 seconds
# major_compact merges the store files and drops expired data
hbase(main):005:0> major_compact 'toffline:tmetric'
0 row(s) in 1.6660 seconds
# Re-enable the table
hbase(main):006:0> enable 'toffline:tmetric'
0 row(s) in 4.3420 seconds
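
The TTL is given in seconds: 7776000 is 90 days, which matches the three-month retention and the `7776000 SECONDS (90 DAYS)` reported by desc. A minimal sketch of that arithmetic follows. The class name is hypothetical, and the expiry rule is a simplified assumption (a cell counts as expired once its age exceeds the TTL), while the physical removal happens during compaction.

```java
import java.time.Duration;

/** TTL arithmetic for a column family, under a simplified expiry rule. */
final class TtlMath {
    /** TTL in seconds for a retention period given in days. */
    static long ttlSeconds(long days) {
        return Duration.ofDays(days).getSeconds();
    }

    /** True once the cell's age in milliseconds exceeds the TTL. */
    static boolean isExpired(long cellTimestampMs, long nowMs, long ttlSeconds) {
        return nowMs - cellTimestampMs > ttlSeconds * 1000L;
    }
}
```

`TtlMath.ttlSeconds(90)` yields the value passed to `alter` above.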

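With the change done, I tried the query again.
The error was gone, but a query over 2 days of data took 20 minutes to return. Time to read the code properly.
Only then did I notice that our wrapper around the HBase get request did apply a filter, but only on the column family, not on the column. So I made a small change: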

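import java.io.IOException;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.FamilyFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.QualifierFilter;
import org.apache.hadoop.hbase.util.Bytes;

// conf and tableMap are fields of the enclosing wrapper class.
public String get(String tableName, String family, String qualifier, String rawKey) throws IOException {
    // Create the HTable on first use and cache it per table name.
    if (!tableMap.containsKey(tableName)) {
        tableMap.put(tableName, new HTable(conf, tableName));
    }
    HTable table = tableMap.get(tableName);

    Get get = new Get(Bytes.toBytes(rawKey));
    // A FilterList is a chain of filters applied to the target data set.
    // By default every filter must pass, so a cell has to match both family and qualifier.
    FilterList filters = new FilterList();
    Filter qualifierFilter = new QualifierFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(qualifier)));
    Filter familyFilter = new FamilyFilter(CompareFilter.CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(family)));
    filters.addFilter(qualifierFilter);
    filters.addFilter(familyFilter);
    // Attach the filter chain to the get request.
    get.setFilter(filters);

    Result result = table.get(get);
    if (result.isEmpty()) {
        return null;
    }
    byte[] value = result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier));
    return value == null ? null : Bytes.toString(value);
}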
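
To illustrate why restricting only the column family was not enough, here is a toy model of one row. It is plain Java and does not use the HBase client; the class and method names are hypothetical. It compares how many bytes come back when only the family is restricted with how many come back when both family and qualifier are restricted.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for a single HBase row: family -> (qualifier -> value). */
final class RowModel {
    private final Map<String, TreeMap<String, byte[]>> families = new TreeMap<>();

    void put(String family, String qualifier, byte[] value) {
        families.computeIfAbsent(family, f -> new TreeMap<>()).put(qualifier, value);
    }

    /** Bytes returned when only the family is restricted: every column in it. */
    long bytesForFamily(String family) {
        long total = 0;
        for (byte[] v : families.getOrDefault(family, new TreeMap<>()).values()) {
            total += v.length;
        }
        return total;
    }

    /** Bytes returned when both family and qualifier are restricted: one cell at most. */
    long bytesForColumn(String family, String qualifier) {
        byte[] v = families.getOrDefault(family, new TreeMap<>()).get(qualifier);
        return v == null ? 0 : v.length;
    }
}
```

With 1000 dynamic columns of 100 bytes each in `slicetop`, the family-only read returns all 100000 bytes while the single-column read returns 100. In the HBase client, the same narrowing can also be requested with `Get.addColumn(family, qualifier)`, which may be worth trying as an alternative to the filter chain.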

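After packaging and deploying, the same request returned in 4 seconds. Finally OK. (The severe full GC problem that had been reported for this table was resolved at the same time.)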

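3. A look at what HBase Filter offers
    A comparison filter takes two main parameters: a comparison operator (CompareFilter.CompareOp) and a comparator (ByteArrayComparable).
    (1) Comparison operator: CompareFilter.CompareOp
    The comparison operator defines the comparison relation. The following values are available:
    EQUAL equal to
    GREATER greater than
    GREATER_OR_EQUAL greater than or equal to
    LESS less than
    LESS_OR_EQUAL less than or equal to
    NOT_EQUAL not equal to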
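
The following sketch models how these operators behave on byte arrays. It is plain Java rather than the HBase classes, the names are hypothetical, and it assumes the comparison is unsigned lexicographic byte order with the cell value on the left of the operator.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Model of the six comparison operators over byte arrays. */
final class CompareOpModel {
    enum Op { EQUAL, GREATER, GREATER_OR_EQUAL, LESS, LESS_OR_EQUAL, NOT_EQUAL }

    /** True when "cellValue OP target" holds under unsigned lexicographic order. */
    static boolean matches(Op op, byte[] cellValue, byte[] target) {
        int c = Arrays.compareUnsigned(cellValue, target);
        switch (op) {
            case EQUAL: return c == 0;
            case GREATER: return c > 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case LESS: return c < 0;
            case LESS_OR_EQUAL: return c <= 0;
            case NOT_EQUAL: return c != 0;
            default: throw new IllegalArgumentException("unknown op: " + op);
        }
    }

    /** Helper: UTF-8 bytes of a string. */
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

One consequence worth keeping in mind: byte order is not numeric order. In this model the string "100000" is LESS than "96430" because the byte for '1' sorts before the byte for '9', so numeric qualifiers stored as strings need fixed-width padding if range comparisons are to behave as expected.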

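(2) Comparator: ByteArrayComparable
Comparators make a variety of matching behaviors possible. The following subclasses are available:
BinaryComparator matches the complete byte array
BinaryPrefixComparator matches a byte-array prefix
BitComparator bitwise comparison
NullComparator checks whether the value is null
RegexStringComparator regular-expression match
SubstringComparator substring match
Comparison is the core of a filter; it supports byte comparison, string comparison, and so on.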
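
As a rough illustration of what the most commonly used comparators match, here is a plain-Java model. It does not call HBase, the names are hypothetical, and the regex case assumes "find" semantics, meaning the pattern may match anywhere in the value unless it is anchored.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

/** Plain-Java model of four comparator behaviors. */
final class ComparatorModel {
    /** Whole-array match: every byte must be equal. */
    static boolean binaryEquals(byte[] value, byte[] expected) {
        return Arrays.equals(value, expected);
    }

    /** Prefix match: only the first prefix.length bytes are compared. */
    static boolean binaryPrefix(byte[] value, byte[] prefix) {
        return value.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(value, prefix.length), prefix);
    }

    /** Case-insensitive substring match. */
    static boolean substring(String value, String needle) {
        return value.toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
    }

    /** Regex match anywhere in the value (assumed "find" semantics). */
    static boolean regex(String value, String pattern) {
        return Pattern.compile(pattern).matcher(value).find();
    }

    /** Helper: UTF-8 bytes of a string. */
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

For instance, `binaryPrefix(utf8("sf-001"), utf8("sf"))` holds while `binaryEquals(utf8("sf-001"), utf8("sf"))` does not, which is the difference between the prefix comparator and the full binary comparator.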

For example, string comparison:

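Scan scan = new Scan();
// Pick one of the comparators below and pass it to the filter.
// Regex match: values that start with "wukong"
RegexStringComparator regexComp = new RegexStringComparator("^wukong.*");
// Substring match: values that contain "2018", case-insensitive
SubstringComparator substringComp = new SubstringComparator("2018");
// Binary prefix match: values that start with "sf". Unlike BinaryComparator, only the prefix is compared.
BinaryPrefixComparator prefixComp = new BinaryPrefixComparator(Bytes.toBytes("sf"));
SingleColumnValueFilter filter = new SingleColumnValueFilter(Bytes.toBytes("family"), Bytes.toBytes("qualifier"), CompareFilter.CompareOp.EQUAL, regexComp);
scan.setFilter(filter);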