HBase Pagination
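Data in HBase is stored sorted by rowkey in lexicographic (byte) order, so there are two basic ways to implement pagination. The first is to find the page's start rowkey and then use a PageFilter to limit the number of rows per page.
The second is to find both the start and end rowkeys of the page and then simply run a scan bounded by Scan's withStartRow and withStopRow.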
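Both approaches depend entirely on that ordering, so it helps to see it concretely. HBase compares rowkeys as raw bytes: byte by byte, unsigned, with a key that is a prefix of another sorting first. The sketch below reproduces that rule with the JDK's Arrays.compareUnsigned (Java 9+) rather than HBase's own Bytes utility. It also shows the offset arithmetic both approaches use. RowKeyOrder, sortAsHBase and rowsBefore are hypothetical names for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RowKeyOrder {
    // Sort keys the way HBase orders rowkeys: compare the raw bytes one by one as
    // unsigned values; a key that is a prefix of another sorts first.
    static List<String> sortAsHBase(String... keys) {
        List<byte[]> raw = new ArrayList<>();
        for (String key : keys) {
            raw.add(key.getBytes(StandardCharsets.UTF_8));
        }
        raw.sort((a, b) -> Arrays.compareUnsigned(a, b));
        List<String> sorted = new ArrayList<>();
        for (byte[] bytes : raw) {
            sorted.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return sorted;
    }

    // Rows that come before page `currentPage` (1-based) when each page holds pageSize rows.
    static int rowsBefore(int currentPage, int pageSize) {
        return (currentPage - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(sortAsHBase("row2", "row10", "row1")); // [row1, row10, row2]
        System.out.println(rowsBefore(3, 10)); // 20
    }
}
```

Note that "row10" sorts before "row2". When numeric order matters, rowkeys are usually zero-padded (for example "row02", "row10").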
Option 1: PageFilter
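HBase ships with a built-in PageFilter that can be used for pagination. However, it is so limited that it is rarely used in practice.
The problem is that a PageFilter is applied separately in each region. Every region returns up to pageSize matching rows, so on a table with several regions the scan as a whole can return more rows than requested. Relying on PageFilter alone would therefore require the table to have exactly one region, which clearly defeats the purpose of HBase.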
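The following pure-JDK model (not HBase code) illustrates the problem. Each inner list stands for one region's sorted rows, and the row counter restarts in every region. This mirrors PageFilter's documented behavior of being applied separately on different region servers. PageFilterModel and scanWithPerRegionCap are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class PageFilterModel {
    // Each inner list is one region's rows, already in rowkey order.
    // The accepted-row counter is per region, like a PageFilter evaluated region by region.
    static List<String> scanWithPerRegionCap(List<List<String>> regions, int pageSize) {
        List<String> returned = new ArrayList<>();
        for (List<String> region : regions) {
            int acceptedInThisRegion = 0; // restarts for every region
            for (String row : region) {
                if (acceptedInThisRegion >= pageSize) {
                    break;
                }
                returned.add(row);
                acceptedInThisRegion++;
            }
        }
        return returned;
    }

    public static void main(String[] args) {
        List<List<String>> regions = List.of(
                List.of("a1", "a2", "a3"),
                List.of("b1", "b2", "b3"),
                List.of("c1", "c2", "c3"));
        // Asked for 2 rows, got 2 per region.
        System.out.println(scanWithPerRegionCap(regions, 2)); // [a1, a2, b1, b2, c1, c2]
    }
}
```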
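// Assumes the enclosing class defines static fields `Connection connection` and `Logger logger`.
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.PageFilter;

public static ResultScanner queryDataByPage(String tableName, int currentPage, int pageSize) {
    ResultScanner resultScanner = null;
    try {
        Table table = connection.getTable(TableName.valueOf(tableName));
        // Number of rows on all pages before the current one.
        int from = (currentPage - 1) * pageSize;
        // Last row of the previous page, or null for the first page.
        byte[] startRow = getStartRow(table, from);
        Scan scan = new Scan();
        // Caution: PageFilter caps rows per region, so a multi-region table
        // can return more than pageSize rows here.
        scan.setFilter(new PageFilter(pageSize));
        if (startRow != null) {
            // Exclusive start: startRow already belongs to the previous page.
            scan.withStartRow(startRow, false);
        }
        // The caller is responsible for closing the returned scanner.
        resultScanner = table.getScanner(scan);
    } catch (IOException e) {
        logger.error(e.getMessage());
    }
    return resultScanner;
}

private static byte[] getStartRow(Table table, int from) {
    // The first page starts at the beginning of the table.
    if (from <= 0) {
        return null;
    }
    Scan scan = new Scan();
    // Same caveat: with several regions this can return more than `from` rows,
    // and the last row seen is then past the real page boundary.
    scan.setFilter(new PageFilter(from));
    byte[] startRow = null;
    // try-with-resources closes the scanner and skips iteration if getScanner fails.
    try (ResultScanner scanner = table.getScanner(scan)) {
        for (Result result : scanner) {
            startRow = result.getRow();
        }
    } catch (IOException e) {
        logger.error(e.getMessage());
    }
    return startRow;
}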
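If you keep PageFilter, one workaround is to also enforce the exact count on the client. Keep reading from the scanner and stop after pageSize rows, or after `from` rows inside getStartRow. Regions are scanned in rowkey order, and each region returns its own first rows. As a result, the first N rows the client receives are the first N rows of the scanned range, so truncating on the client yields the correct page. ResultScanner implements Iterable<Result>, so a generic helper like the hypothetical ClientCap below could be applied to it directly. The sketch uses a plain List so it runs without HBase.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ClientCap {
    // Collect at most n elements and stop iterating as soon as n have been taken,
    // so no further rows are pulled from the source.
    static <T> List<T> firstN(Iterable<T> source, int n) {
        List<T> taken = new ArrayList<>();
        Iterator<T> it = source.iterator();
        while (taken.size() < n && it.hasNext()) {
            taken.add(it.next());
        }
        return taken;
    }

    // The last of the first n elements, or null if n <= 0 or the source is empty.
    // Applied to scan results, its row is the one after which the requested page starts.
    static <T> T lastOfFirstN(Iterable<T> source, int n) {
        T last = null;
        int taken = 0;
        Iterator<T> it = source.iterator();
        while (taken < n && it.hasNext()) {
            last = it.next();
            taken++;
        }
        return last;
    }

    public static void main(String[] args) {
        // Stand-in for rows from a multi-region PageFilter scan that over-returned.
        List<String> overReturned = List.of("a1", "a2", "b1", "b2", "c1", "c2");
        System.out.println(firstN(overReturned, 2)); // [a1, a2]
        System.out.println(lastOfFirstN(overReturned, 4)); // b2
    }
}
```

With a real ResultScanner, open it in try-with-resources so that it is closed even when iteration stops early.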
Option 2: Limit
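Since HBase 2.0, Scan has a setLimit method that caps the total number of rows a scan returns. The limit is passed down to the RegionServers but counted for the scan as a whole. Unlike PageFilter, it therefore stays exact on tables with many regions.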
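The key difference from PageFilter is that the limit is a single counter for the whole scan. The hypothetical ScanLimitModel below models this with the same plain-Java region layout. It also encodes one detail worth knowing, stated here as my understanding of the HBase 2.x client: the default limit is -1, and a limit of 0 or less means "no limit". As a consequence, a setLimit(from) call for the first page (from = 0) would read the whole table.

```java
import java.util.ArrayList;
import java.util.List;

class ScanLimitModel {
    // A single counter for the whole scan: stop once `limit` rows have been returned,
    // regardless of which region they came from. A limit <= 0 is treated as "no limit",
    // which is assumed here to mirror the HBase 2.x client (its default limit is -1).
    static List<String> scanWithLimit(List<List<String>> regions, int limit) {
        List<String> returned = new ArrayList<>();
        for (List<String> region : regions) {
            for (String row : region) {
                if (limit > 0 && returned.size() >= limit) {
                    return returned;
                }
                returned.add(row);
            }
        }
        return returned;
    }

    public static void main(String[] args) {
        List<List<String>> regions = List.of(
                List.of("a1", "a2", "a3"),
                List.of("b1", "b2", "b3"),
                List.of("c1", "c2", "c3"));
        System.out.println(scanWithLimit(regions, 4)); // [a1, a2, a3, b1]
        System.out.println(scanWithLimit(regions, 0).size()); // 9: the whole "table"
    }
}
```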
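// Assumes the enclosing class defines static fields `Connection connection` and `Logger logger`.
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public static ResultScanner queryDataByPage(String tableName, int currentPage, int pageSize) {
    ResultScanner resultScanner = null;
    try {
        Table table = connection.getTable(TableName.valueOf(tableName));
        // Last row of the previous page (used as an exclusive start), or null for page 1.
        int from = (currentPage - 1) * pageSize;
        byte[] startRow = getStartRow(table, from);
        // Last row of the current page (used as an inclusive stop).
        int to = currentPage * pageSize;
        byte[] stopRow = getStopRow(table, to);
        Scan scan = new Scan();
        if (startRow != null) {
            scan.withStartRow(startRow, false);
        }
        if (stopRow != null) {
            scan.withStopRow(stopRow, true);
        }
        // The caller is responsible for closing the returned scanner.
        resultScanner = table.getScanner(scan);
    } catch (IOException e) {
        logger.error(e.getMessage());
    }
    return resultScanner;
}

private static byte[] getStartRow(Table table, int from) {
    // Page 1 has no preceding rows. The guard matters: a limit of 0 or less means
    // "no limit", so setLimit(0) would scan the whole table and return its last row.
    if (from <= 0) {
        return null;
    }
    Scan scan = new Scan();
    // The limit is counted for the whole scan, across all regions.
    scan.setLimit(from);
    byte[] startRow = null;
    // try-with-resources closes the scanner and skips iteration if getScanner fails.
    try (ResultScanner scanner = table.getScanner(scan)) {
        // The last of the first `from` rows ends the previous page.
        for (Result result : scanner) {
            startRow = result.getRow();
        }
    } catch (IOException e) {
        logger.error(e.getMessage());
    }
    return startRow;
}

private static byte[] getStopRow(Table table, int to) {
    if (to <= 0) {
        return null;
    }
    Scan scan = new Scan();
    scan.setLimit(to);
    byte[] stopRow = null;
    try (ResultScanner scanner = table.getScanner(scan)) {
        // The last of the first `to` rows ends the current page.
        for (Result result : scanner) {
            stopRow = result.getRow();
        }
    } catch (IOException e) {
        logger.error(e.getMessage());
    }
    return stopRow;
}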
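Strictly speaking, the page scan does not need the stop row. Once a non-null start row is known, the page can be read with `new Scan().withStartRow(startRow, false).setLimit(pageSize)`. This removes getStopRow and one prefix scan. It is offered as a possible simplification, not as part of the approach above. The model below shows this two-scan version, with a TreeSet of ASCII rowkeys standing in for the table. For ASCII keys, Java's String order matches HBase's byte order. TwoScanPaging, startRow and page are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class TwoScanPaging {
    // Scan 1 (the role of getStartRow with setLimit(from)): the last rowkey among the
    // first `from` rows, or null for the first page.
    static String startRow(NavigableSet<String> table, int from) {
        if (from <= 0) {
            return null;
        }
        String last = null;
        int seen = 0;
        for (String row : table) {
            if (seen == from) {
                break;
            }
            last = row;
            seen++;
        }
        return last;
    }

    // Scan 2: rows strictly after the start row, capped at pageSize
    // (the role of withStartRow(startRow, false) plus setLimit(pageSize)).
    static List<String> page(NavigableSet<String> table, int currentPage, int pageSize) {
        String start = startRow(table, (currentPage - 1) * pageSize);
        Iterable<String> rest = (start == null) ? table : table.tailSet(start, false);
        List<String> rows = new ArrayList<>();
        for (String row : rest) {
            if (rows.size() >= pageSize) {
                break;
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        NavigableSet<String> table = new TreeSet<>(List.of("r01", "r02", "r03", "r04", "r05"));
        System.out.println(page(table, 1, 2)); // [r01, r02]
        System.out.println(page(table, 3, 2)); // [r05]
        System.out.println(page(table, 4, 2)); // [] (past the end)
    }
}
```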
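In essence, Option 2 uses setLimit to find the page's start and stop rowkeys. Compared with Option 1, it removes PageFilter's single-region limitation.
That said, Option 2 is still clumsy. It scans the table twice just to find the start and stop rowkeys, and only the third scan fetches the page data itself. I don't know of a better approach yet, so I'm using this one for now.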
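For page p of size s, the three scans return (p-1)s, ps and up to s rows, which is about 2ps rows in total. Deep pages therefore get steadily more expensive. A different technique worth considering, usually called cursor or keyset pagination, avoids the lookup scans by having the client remember where the previous page ended. Each response carries its last rowkey, and the next request passes it back. Every page then becomes a single scan, roughly withStartRow(lastRowKey, false) plus setLimit(pageSize), and returns at most s rows. The trade-off is that it only moves forward page by page; it cannot jump straight to an arbitrary page number. The model below again uses a TreeSet in place of the table. CursorPaging, Page and nextPage are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class CursorPaging {
    // One page of rows plus the cursor to send back for the next page.
    static final class Page {
        final List<String> rows;
        final String lastRow; // null when the page is empty (no more data)

        Page(List<String> rows, String lastRow) {
            this.rows = rows;
            this.lastRow = lastRow;
        }
    }

    // A single scan per page: rows strictly after `cursor` (null means "from the
    // beginning"), capped at pageSize.
    static Page nextPage(NavigableSet<String> table, String cursor, int pageSize) {
        Iterable<String> rest = (cursor == null) ? table : table.tailSet(cursor, false);
        List<String> rows = new ArrayList<>();
        for (String row : rest) {
            if (rows.size() >= pageSize) {
                break;
            }
            rows.add(row);
        }
        String last = rows.isEmpty() ? null : rows.get(rows.size() - 1);
        return new Page(rows, last);
    }

    public static void main(String[] args) {
        NavigableSet<String> table = new TreeSet<>(List.of("r01", "r02", "r03", "r04", "r05"));
        String cursor = null;
        while (true) {
            Page page = nextPage(table, cursor, 2);
            if (page.rows.isEmpty()) {
                break;
            }
            System.out.println(page.rows); // [r01, r02], then [r03, r04], then [r05]
            cursor = page.lastRow;
        }
    }
}
```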