HBase Pagination

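Data in HBase is stored sorted by rowkey in lexicographic order, and there are two ways to build pagination on top of that. The first is to find the page's start rowkey and then use PageFilter to limit the number of rows per page.

The second is to find both the start rowkey and the end rowkey of the page, then simply scan between them with Scan's withStartRow and withStopRow.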
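Both approaches walk the table in rowkey order, so a page is a slice of the rows in unsigned, byte-by-byte lexicographic order rather than numeric order: row10 sorts before row2. The plain-Java simulation below (no HBase dependency; RowKeyOrder and its methods are hypothetical names) imitates that ordering with Arrays.compareUnsigned and cuts out one page by offset, assuming a 1-based page number as in the code in this article.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// Plain-Java simulation (not HBase code): rowkeys are compared byte by byte
// as unsigned values, which is the order HBase keeps rows in.
class RowKeyOrder {
    // Sorts string rowkeys by their UTF-8 bytes; duplicates collapse,
    // just as a rowkey identifies a single row in HBase.
    static List<String> sortAsHBase(List<String> rowKeys) {
        Comparator<byte[]> unsignedBytes = Arrays::compareUnsigned;
        TreeMap<byte[], String> sorted = new TreeMap<>(unsignedBytes);
        for (String key : rowKeys) {
            sorted.put(key.getBytes(StandardCharsets.UTF_8), key);
        }
        return new ArrayList<>(sorted.values());
    }

    // Rows of a 1-based page, taken by offset from an already sorted list.
    static List<String> page(List<String> sorted, int currentPage, int pageSize) {
        int from = Math.min((currentPage - 1) * pageSize, sorted.size());
        int to = Math.min(currentPage * pageSize, sorted.size());
        return sorted.subList(from, to);
    }
}
```

For example, RowKeyOrder.sortAsHBase(List.of("row2", "row10", "row1")) returns [row1, row10, row2].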

Option 1: PageFilter

HBase ships with a PageFilter that can be used for pagination, but it is quite limited and rarely used in practice.


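The catch is that PageFilter's scope is a single region: the filter is applied separately in each region, and each region returns up to pageSize matching rows. If a table has several regions, the number of rows the client receives is therefore no longer accurate (the PageFilter Javadoc itself warns that it cannot guarantee the client gets at most pageSize rows). Relying on PageFilter alone for an exact page size thus requires the table to have only one region, which clearly does not suit HBase.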
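To make the per-region behavior concrete, here is a toy model in plain Java (not HBase code; all names are hypothetical). Each "region" is a sorted list, each region keeps its own page-size counter the way PageFilter does, and the client concatenates whatever each region returns. It also shows the usual workaround, which is an addition here rather than part of the original approach: capping the total on the client.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not HBase code) of a filter that limits rows per region.
class PageFilterSimulation {
    // regions: each inner list is one region's rows, in rowkey order.
    static List<String> scanWithPerRegionPageFilter(List<List<String>> regions, int pageSize) {
        List<String> results = new ArrayList<>();
        for (List<String> region : regions) {
            // The counter restarts in every region, so each region
            // may contribute up to pageSize rows.
            int accepted = 0;
            for (String row : region) {
                if (accepted >= pageSize) {
                    break;
                }
                results.add(row);
                accepted++;
            }
        }
        return results;
    }

    // Client-side cap: keep only the first pageSize rows received.
    static List<String> capOnClient(List<String> rows, int pageSize) {
        return new ArrayList<>(rows.subList(0, Math.min(pageSize, rows.size())));
    }
}
```

With three regions and pageSize = 2, the uncapped model returns 6 rows (a, b, d, e, g, h) instead of 2.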

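```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PageFilter;

// Members of a class that holds an open HBase `connection` and a `logger`.
// Returns the page as a list so the Table and ResultScanner are closed here.
public static List<Result> queryDataByPage(String tableName, int currentPage, int pageSize) {
    List<Result> page = new ArrayList<>();
    try (Table table = connection.getTable(TableName.valueOf(tableName))) {
        int from = (currentPage - 1) * pageSize;
        byte[] startRow = getStartRow(table, from);
        Scan scan = new Scan();
        scan.setFilter(new PageFilter(pageSize));
        if (startRow != null) {
            // Exclusive: startRow is the last row of the previous page.
            scan.withStartRow(startRow, false);
        }
        try (ResultScanner scanner = table.getScanner(scan)) {
            // PageFilter limits each region separately, so cap the total on the client.
            for (Result result : scanner) {
                if (page.size() >= pageSize) {
                    break;
                }
                page.add(result);
            }
        }
    } catch (IOException e) {
        logger.error(e.getMessage(), e);
    }
    return page;
}

// Rowkey of the from-th row (the last row of the previous page), or null for
// the first page. Counting on the client keeps the answer right even when
// several regions each return up to `from` rows.
private static byte[] getStartRow(Table table, int from) throws IOException {
    if (from <= 0) {
        return null; // first page: nothing to skip, so no scan is needed
    }
    Scan scan = new Scan();
    scan.setFilter(new PageFilter(from));
    byte[] startRow = null;
    int count = 0;
    try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
            startRow = result.getRow();
            if (++count >= from) {
                break;
            }
        }
    }
    return startRow;
}
```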

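Option 2: Limit

Since HBase 2.0, Scan provides a limit (Scan.setLimit) that caps the number of returned rows at the RegionServer level. Unlike PageFilter, the limit applies to the scan as a whole rather than to each region separately.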
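The contrast with PageFilter can be modeled the same way. In this plain-Java toy (hypothetical names; it imitates the idea, not HBase's actual client code), a single row budget is carried from one region to the next, so the total never exceeds the limit. The model requires a positive limit and does not try to reproduce what HBase does with zero or negative values.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not HBase code) of a limit that covers the whole scan.
class ScanLimitSimulation {
    // regions: each inner list holds one region's rows in rowkey order;
    // regions are visited in rowkey order, as a scan does.
    static List<String> scanWithLimit(List<List<String>> regions, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("this model expects a positive limit");
        }
        List<String> results = new ArrayList<>();
        int remaining = limit; // one budget for the whole scan, not per region
        for (List<String> region : regions) {
            for (String row : region) {
                if (remaining == 0) {
                    return results;
                }
                results.add(row);
                remaining--;
            }
        }
        return results;
    }
}
```

With the same three regions as in the PageFilter model and a limit of 2, this returns only a and b.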


```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

// Members of a class that holds an open HBase `connection` and a `logger`.
// Returns the page as a list so the Table and ResultScanner are closed here.
public static List<Result> queryDataByPage(String tableName, int currentPage, int pageSize) {
    List<Result> page = new ArrayList<>();
    try (Table table = connection.getTable(TableName.valueOf(tableName))) {
        int from = (currentPage - 1) * pageSize;
        int to = currentPage * pageSize;
        byte[] startRow = getLastRowWithin(table, from); // last row of the previous page
        byte[] stopRow = getLastRowWithin(table, to);    // last row of this page
        if (startRow != null && Arrays.equals(startRow, stopRow)) {
            return page; // the requested page lies past the end of the table
        }
        Scan scan = new Scan();
        if (startRow != null) {
            scan.withStartRow(startRow, false); // exclusive
        }
        if (stopRow != null) {
            scan.withStopRow(stopRow, true);    // inclusive
        }
        try (ResultScanner scanner = table.getScanner(scan)) {
            for (Result result : scanner) {
                page.add(result);
            }
        }
    } catch (IOException e) {
        logger.error(e.getMessage(), e);
    }
    return page;
}

// Rowkey of the last row among the first n rows in rowkey order; serves as the
// start row (n = from) and the stop row (n = to). Null when n <= 0 or the table is empty.
private static byte[] getLastRowWithin(Table table, int n) throws IOException {
    if (n <= 0) {
        return null; // first page: nothing to skip, so no scan is needed
    }
    Scan scan = new Scan();
    scan.setLimit(n); // the limit covers the whole scan, not each region
    byte[] lastRow = null;
    try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
            lastRow = result.getRow();
        }
    }
    return lastRow;
}
```

Option 2 essentially uses setLimit to find the page's start and stop rowkeys. The difference from Option 1 is that it removes PageFilter's single-region limitation.
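The rowkey arithmetic behind Option 2 can be checked without a cluster. This plain-Java sketch (hypothetical names; ASCII rowkeys assumed, so that String.compareTo agrees with HBase's byte order) takes the last row of the first from rows as an exclusive start, the last row of the first to rows as an inclusive stop, and keeps the rows in between.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java check (not HBase code) of Option 2's start/stop rowkey logic.
class RowKeyRangePaging {
    // Mimics a scan with setLimit(n): the last row among the first n rows, or null.
    static String lastRowWithin(List<String> sortedRows, int n) {
        if (n <= 0 || sortedRows.isEmpty()) {
            return null;
        }
        return sortedRows.get(Math.min(n, sortedRows.size()) - 1);
    }

    // Mimics the third scan over (startRow, stopRow]: start exclusive, stop inclusive.
    static List<String> page(List<String> sortedRows, int currentPage, int pageSize) {
        String startRow = lastRowWithin(sortedRows, (currentPage - 1) * pageSize);
        String stopRow = lastRowWithin(sortedRows, currentPage * pageSize);
        List<String> result = new ArrayList<>();
        for (String row : sortedRows) {
            boolean afterStart = startRow == null || row.compareTo(startRow) > 0;
            boolean upToStop = stopRow == null || row.compareTo(stopRow) <= 0;
            if (afterStart && upToStop) {
                result.add(row);
            }
        }
        return result;
    }
}
```

With rows a to e and pageSize = 2, pages 1 to 3 are [a, b], [c, d] and [e], and page 4 is empty because its start and stop rows are both e.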

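Of course, Option 2 is also rather clumsy: it scans the table twice just to find the start and stop rowkeys, and only the third scan fetches the page itself. I don't know of a better approach yet, so I'm using this one for now.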
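A rough count of the rows returned by the three scans shows why this gets slow for deep pages. The sketch below (plain Java, hypothetical names) counts rows only; it ignores row size, scanner caching and prefetching, so treat it as an order-of-magnitude estimate rather than a measurement.

```java
// Back-of-the-envelope cost model (not HBase code) for Option 2.
class PagingCost {
    // Rows returned by the three scans for a 1-based page of a table with totalRows rows.
    static long rowsReturned(long totalRows, int currentPage, int pageSize) {
        long from = Math.min((long) (currentPage - 1) * pageSize, totalRows); // scan 1: setLimit(from)
        long to = Math.min((long) currentPage * pageSize, totalRows);         // scan 2: setLimit(to)
        long page = to - from;                                               // scan 3: the page itself
        return from + to + page;
    }
}
```

For page 100 with pageSize = 10 on a large table, the three scans return 990 + 1000 + 10 = 2000 rows to deliver 10. The total equals 2 × to, so it grows linearly with the page number.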