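Contents
- Background
- HBase write path
- Summary of the put operation
- HBase read path
- The get method
- The gets method
- Summary
Background
When a Flink streaming job reads messages from Kafka and writes them to HBase, the data often has to go to different HBase tables depending on the time. As the job keeps running, its GC activity gradually rises and throughput drops. A heap dump showed that the largest share of memory was held by the org.apache.hadoop.hbase.client.AsyncRegionLocator object. A long-running streaming job should therefore call the HBase connection's clearRegionLocationCache method when it switches tables, so that the client-side cache is cleared and tasks do not fall into frequent GC. The rest of this article walks through the HBase read and write paths to see where the region location cache is used.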
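The "clear the cache when the table changes" advice can be sketched as a small helper. This is only an illustration under stated assumptions: `TableSwitchCacheCleaner` and `onTargetTable` are hypothetical names, and the clearing action is passed in as a plain `Runnable` so the sketch does not depend on the HBase client library.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of HBase or Flink): runs a cache-clearing
 * action whenever the target table differs from the previous one.
 */
class TableSwitchCacheCleaner {
  private final Runnable clearAction; // assumed to wrap the real cache-clearing call
  private String currentTable;        // null until the first record arrives
  private int clearCount;

  TableSwitchCacheCleaner(Runnable clearAction) {
    this.clearAction = Objects.requireNonNull(clearAction);
  }

  /**
   * Call with the table the next record will be written to.
   * Returns true if the table changed and the cache was cleared.
   */
  boolean onTargetTable(String table) {
    Objects.requireNonNull(table);
    boolean switched = currentTable != null && !currentTable.equals(table);
    currentTable = table;
    if (switched) {
      clearAction.run();
      clearCount++;
    }
    return switched;
  }

  int clearCount() {
    return clearCount;
  }
}
```

In a real job the action would presumably be a lambda that calls clearRegionLocationCache on the shared HBase connection, as described above; the table names are whatever the job derives from the record time.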
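HBase write path
A put issued by the HBase client mainly goes through getBufferedMutator().mutate(put) inside HTable's put method. mutate is implemented in the BufferedMutatorImpl class. Inside mutate, the put operations submitted by the user are appended to the writeAsyncBuffer queue, and once the buffered data exceeds the configured threshold they are submitted by background threads. Note that for every Put, HBase calls validatePut to check that the cell size does not exceed the limit.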
public void mutate(List<? extends Mutation> ms) throws InterruptedIOException,
    RetriesExhaustedWithDetailsException {
  checkClose();
  long toAddSize = 0;
  int toAddCount = 0;
  for (Mutation m : ms) {
    if (m instanceof Put) {
      // Reject puts whose cells exceed the configured maximum KeyValue size.
      ConnectionUtils.validatePut((Put) m, maxKeyValueSize);
    }
    toAddSize += m.heapSize();
    ++toAddCount;
  }
  if (currentWriteBufferSize.get() == 0) {
    firstRecordInBufferTimestamp.set(System.currentTimeMillis());
  }
  currentWriteBufferSize.addAndGet(toAddSize);
  writeAsyncBuffer.addAll(ms);
  undealtMutationCount.addAndGet(toAddCount);
  // Sends data only if the buffer has grown past writeBufferSize.
  doFlush(false);
}
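Next, look at the doFlush method. If flushAll is true, all buffered data is sent to HBase and the call waits for the result; otherwise data is sent only once the buffered size exceeds the write buffer size (2 MB by default). The actual sending happens when submit is called on the AsyncProcess object ap, which submits a task.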
private void doFlush(boolean flushAll) throws InterruptedIOException,
    RetriesExhaustedWithDetailsException {
  List<RetriesExhaustedWithDetailsException> errors = new ArrayList<>();
  while (true) {
    if (!flushAll && currentWriteBufferSize.get() <= writeBufferSize) {
      // There is the room to accept more mutations.
      break;
    }
    AsyncRequestFuture asf;
    try (QueueRowAccess access = createQueueRowAccess()) {
      if (access.isEmpty()) {
        // It means someone has gotten the ticker to run the flush.
        break;
      }
      asf = ap.submit(createTask(access));
    }
    // DON'T do the wait in the try-with-resources. Otherwise, the undealt mutations won't
    // be released.
    asf.waitUntilDone();
    if (asf.hasError()) {
      errors.add(asf.getErrors());
    }
  }
  ......
}
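The buffering rule in mutate and doFlush can be reduced to a toy model. The sketch below is a single-threaded simplification, not HBase's BufferedMutatorImpl: `SimpleWriteBuffer` is a hypothetical class, payloads are raw byte arrays, and the `sender` callback stands in for the AsyncProcess submission.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Simplified, single-threaded model of a size-triggered write buffer. */
class SimpleWriteBuffer {
  private final long writeBufferSize;          // flush threshold in bytes
  private final Consumer<List<byte[]>> sender; // stand-in for the real submission
  private final List<byte[]> buffer = new ArrayList<>();
  private long currentSize;

  SimpleWriteBuffer(long writeBufferSize, Consumer<List<byte[]>> sender) {
    this.writeBufferSize = writeBufferSize;
    this.sender = sender;
  }

  /** Buffers one payload, then flushes only if the threshold is exceeded. */
  void mutate(byte[] payload) {
    buffer.add(payload);
    currentSize += payload.length;
    doFlush(false);
  }

  /** Explicit flush: sends whatever is buffered, regardless of size. */
  void flush() {
    doFlush(true);
  }

  private void doFlush(boolean flushAll) {
    // Same condition as the excerpt: keep buffering while at or below the threshold.
    if (!flushAll && currentSize <= writeBufferSize) {
      return;
    }
    if (buffer.isEmpty()) {
      return;
    }
    sender.accept(new ArrayList<>(buffer));
    buffer.clear();
    currentSize = 0;
  }

  long bufferedBytes() {
    return currentSize;
  }
}
```

With a threshold of 10 bytes, two 6-byte payloads trigger one send on the second call, while a single 3-byte payload stays buffered until flush() is called.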
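Going one level deeper into AsyncProcess.submit, the key parts of the method are listed below. The actionsByServer map groups the data to be written by HRegionServer, and each put taken from writeAsyncBuffer is wrapped in an Action object. For each of those puts, connection.locateRegion is called to determine the location of the remote region it must be written to. Finally, everything is submitted through submitMultiActions.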
private <CResult> AsyncRequestFuture submit(AsyncProcessTask<CResult> task,
    boolean atLeastOne) throws InterruptedIOException {
  TableName tableName = task.getTableName();
  RowAccess<? extends Row> rows = task.getRowAccess();
  Map<ServerName, MultiAction> actionsByServer = new HashMap<>();
  ......
  boolean firstIter = true;
  do {
    ......
    Iterator<? extends Row> it = rows.iterator();
    while (it.hasNext()) {
      Row r = it.next();
      HRegionLocation loc;
      try {
        if (r == null) {
          throw new IllegalArgumentException("#" + id + ", row cannot be null");
        }
        // Make sure we get 0-s replica.
        RegionLocations locs = connection.locateRegion(
            tableName, r.getRow(), true, true, RegionReplicaUtil.DEFAULT_REPLICA_ID);
        if (locs == null || locs.isEmpty() || locs.getDefaultRegionLocation() == null) {
          throw new IOException("#" + id + ", no location found, aborting submit for"
              + " tableName=" + tableName + " rowkey=" + Bytes.toStringBinary(r.getRow()));
        }
        loc = locs.getDefaultRegionLocation();
      } catch (IOException ex) {
        ......
      }
      ReturnCode code = checker.canTakeRow(loc, r);
      if (code == ReturnCode.END) {
        break;
      }
      if (code == ReturnCode.INCLUDE) {
        int priority = HConstants.NORMAL_QOS;
        if (r instanceof Mutation) {
          priority = ((Mutation) r).getPriority();
        }
        Action action = new Action(r, ++posInList, priority);
        setNonce(ng, r, action);
        retainedActions.add(action);
        // TODO: replica-get is not supported on this path
        byte[] regionName = loc.getRegionInfo().getRegionName();
        addAction(loc.getServerName(), regionName, action, actionsByServer, nonceGroup);
        it.remove();
      }
    }
    firstIter = false;
  } while (retainedActions.isEmpty() && atLeastOne && (locationErrors == null));
  if (retainedActions.isEmpty()) return NO_REQS_RESULT;
  return submitMultiActions(task, retainedActions, nonceGroup,
      locationErrors, locationErrorRows, actionsByServer);
}
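The grouping step (locate each row, then bucket it under its server) is easy to see in isolation. The following is a minimal sketch with string keys instead of byte arrays; `RowGrouper` and its in-memory routing table are hypothetical stand-ins for the connection's region lookup, not HBase classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model: route each row to a server, then group rows by that server. */
class RowGrouper {
  // Region start key -> server hosting the region that starts at that key.
  private final NavigableMap<String, String> serverByStartKey = new TreeMap<>();

  void addRegion(String startKey, String server) {
    serverByStartKey.put(startKey, server);
  }

  /** The owning region is the one with the greatest start key <= row. */
  String locate(String row) {
    Map.Entry<String, String> e = serverByStartKey.floorEntry(row);
    if (e == null) {
      throw new IllegalStateException("no location found for row=" + row);
    }
    return e.getValue();
  }

  /** Plays the role of actionsByServer: one list of rows per target server. */
  Map<String, List<String>> groupByServer(List<String> rows) {
    Map<String, List<String>> actionsByServer = new HashMap<>();
    for (String row : rows) {
      actionsByServer.computeIfAbsent(locate(row), s -> new ArrayList<>()).add(row);
    }
    return actionsByServer;
  }
}
```

Because several regions can live on the same server, rows from different regions may end up in the same bucket, which is why the grouping key here is the server rather than the region.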
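To show what the region cache does, look at HConnection's locateRegion method. The implementing class is ConnectionImplementation, and the work is done in locateRegionInMeta. It first checks whether the local cache already holds the location of the region; if not, it scans the meta table to find the region. (Note that clearing the region cache of an HBase client too often therefore puts considerable load on the HBase meta table.) Finally, the region information is added to the local cache.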
private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, boolean useCache,
    boolean retry, int replicaId) throws IOException {
  // If we are supposed to be using the cache, look in the cache to see if we already have the
  // region.
  if (useCache) {
    RegionLocations locations = getCachedLocation(tableName, row);
    if (locations != null && locations.getRegionLocation(replicaId) != null) {
      return locations;
    }
  }
  ......
  byte[] metaStartKey = RegionInfo.createRegionName(tableName, row, HConstants.NINES, false);
  byte[] metaStopKey =
      RegionInfo.createRegionName(tableName, HConstants.EMPTY_START_ROW, "", false);
  Scan s = new Scan().withStartRow(metaStartKey).withStopRow(metaStopKey, true)
      .addFamily(HConstants.CATALOG_FAMILY).setReversed(true).setCaching(5)
      .setReadType(ReadType.PREAD);
  ......
  // Instantiate the location
  cacheLocation(tableName, locations);
  ......
}
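The cache-then-meta pattern, and why clearing the cache has a cost, can be modelled in a few lines. The sketch below is an assumption-laden simplification: `RegionLoc` and `CachingRegionLocator` are hypothetical, keys are strings, and the `metaLookup` function stands in for the scan of the meta table. The one-sorted-map-per-table layout is chosen for illustration; it also shows how entries for tables that are no longer written to stay in memory until they are cleared.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

/** Hypothetical region descriptor: [startKey, endKey) hosted on server. */
final class RegionLoc {
  final String startKey; // inclusive
  final String endKey;   // exclusive; "" means no upper bound
  final String server;

  RegionLoc(String startKey, String endKey, String server) {
    this.startKey = startKey;
    this.endKey = endKey;
    this.server = server;
  }

  boolean contains(String row) {
    return row.compareTo(startKey) >= 0 && (endKey.isEmpty() || row.compareTo(endKey) < 0);
  }
}

/** Toy model of a client-side region location cache in front of a meta lookup. */
class CachingRegionLocator {
  // table -> (region start key -> location)
  private final Map<String, ConcurrentNavigableMap<String, RegionLoc>> cache =
      new ConcurrentHashMap<>();
  private final BiFunction<String, String, RegionLoc> metaLookup; // (table, row) -> location
  private final AtomicInteger metaLookups = new AtomicInteger();

  CachingRegionLocator(BiFunction<String, String, RegionLoc> metaLookup) {
    this.metaLookup = metaLookup;
  }

  RegionLoc locate(String table, String row, boolean useCache) {
    if (useCache) {
      ConcurrentNavigableMap<String, RegionLoc> regions = cache.get(table);
      Map.Entry<String, RegionLoc> e = regions == null ? null : regions.floorEntry(row);
      if (e != null && e.getValue().contains(row)) {
        return e.getValue(); // cache hit: no meta access
      }
    }
    metaLookups.incrementAndGet(); // cache miss: pay for a meta lookup
    RegionLoc loc = metaLookup.apply(table, row);
    cache.computeIfAbsent(table, t -> new ConcurrentSkipListMap<>()).put(loc.startKey, loc);
    return loc;
  }

  /** Drops every cached entry for every table. */
  void clear() {
    cache.clear();
  }

  int cachedTables() {
    return cache.size();
  }

  int metaLookups() {
    return metaLookups.get();
  }
}
```

After clear(), the next lookup for every region misses and goes back to the meta lookup, which is the load the paragraph above warns about.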
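Next, step into sendMultiAction() to see how the put requests are actually sent. Each task talks to the server through HBase's RPC framework and collects the returned result. On the client side, MultiServerCallable.call() invokes the region server's multi() RPC method to submit the puts.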
void sendMultiAction(Map<ServerName, MultiAction> actionsByServer,
    int numAttempt, List<Action> actionsForReplicaThread, boolean reuseThread) {
  // One iteration per region server.
  for (Map.Entry<ServerName, MultiAction> e : actionsByServer.entrySet()) {
    ServerName server = e.getKey();
    MultiAction multiAction = e.getValue();
    Collection<? extends Runnable> runnables = getNewMultiActionRunnable(server, multiAction,
        numAttempt);
    ......
    for (Runnable runnable : runnables) {
      ......
      pool.submit(runnable);
      ......
    }
  }
  ......
}
final class SingleServerRequestRunnable implements Runnable {
  public void run() {
    AbstractResponse res = null;
    CancellableRegionServerCallable callable = currentCallable;
    try {
      // setup the callable based on the actions, if we don't have one already from the request
      if (callable == null) {
        callable = createCallable(server, tableName, multiAction);
      }
      RpcRetryingCaller<AbstractResponse> caller = asyncProcess.createCaller(callable, rpcTimeout);
      try {
        if (callsInProgress != null) {
          callsInProgress.add(callable);
        }
        res = caller.callWithoutRetries(callable, operationTimeout);
        ......
      }
    }
  }
  ......
}
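The "one runnable per server, handed to a thread pool" structure can be sketched with plain java.util.concurrent. Everything here is hypothetical: `PerServerSender` is not an HBase class, and the task body only counts rows instead of issuing a multi() RPC.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Toy model: submit one task per server and wait for every result. */
class PerServerSender {
  /** Returns, per server, how many rows its task reported as written. */
  static Map<String, Integer> sendAll(Map<String, List<String>> actionsByServer,
      ExecutorService pool) {
    Map<String, Future<Integer>> pending = new TreeMap<>();
    for (Map.Entry<String, List<String>> e : actionsByServer.entrySet()) {
      final List<String> rows = e.getValue();
      // Stand-in for the real RPC: just report the batch size.
      Callable<Integer> task = () -> rows.size();
      pending.put(e.getKey(), pool.submit(task));
    }
    Map<String, Integer> acked = new TreeMap<>();
    for (Map.Entry<String, Future<Integer>> e : pending.entrySet()) {
      try {
        acked.put(e.getKey(), e.getValue().get());
      } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for " + e.getKey(), ex);
      } catch (ExecutionException ex) {
        throw new IllegalStateException("request to " + e.getKey() + " failed", ex.getCause());
      }
    }
    return acked;
  }
}
```

The caller owns the pool and is expected to shut it down; the sketch leaves out the retry and error bookkeeping that the real client performs.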
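Summary of the put operation
(1) The put is added to the writeAsyncBuffer queue. When the flush condition is met (auto-flush, or the buffered size exceeding the writeBufferSize threshold), the buffered puts are submitted asynchronously in bulk through AsyncProcess.
(2) Before submitting, the client has to find the region server that owns each row key. This lookup goes through HConnection's locateRegion method, and the row keys are then grouped by HRegionLocation. While resolving a region's location, the client caches recently used region server locations: if the cache already holds the matching entry, that entry is used directly to connect to the region server; otherwise one RPC against the meta table fetches it. Client operations such as put, get and delete are each wrapped in an Action object, and a series of actions submitted together forms a MultiAction.
(3) Using multiple threads, a MultiServerCallable is built for each HRegionLocation and invoked through rpcCallerFactory.newCaller(). Leaving aside resubmission on failure and error handling, the client side of the submission ends here.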
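HBase read path
A get issued by the HBase client mainly goes through HTable's get method. A single RPC request on an HBase connection can carry one get or several gets, and the two cases are implemented slightly differently. They are analyzed in turn below.
The get method
When a single get is sent as one RPC call, the method actually invoked is the get method in the HTable class, shown below. It issues the RPC directly and returns the result, so the code is fairly simple.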
private Result get(Get get, final boolean checkExistenceOnly) throws IOException {
  // if we are changing settings to the get, clone it.
  if (get.isCheckExistenceOnly() != checkExistenceOnly || get.getConsistency() == null) {
    get = ReflectionUtils.newInstance(get.getClass(), get);
    get.setCheckExistenceOnly(checkExistenceOnly);
    if (get.getConsistency() == null) {
      get.setConsistency(DEFAULT_CONSISTENCY);
    }
  }
  if (get.getConsistency() == Consistency.STRONG) {
    final Get configuredGet = get;
    ClientServiceCallable<Result> callable = new ClientServiceCallable<Result>(this.connection, getName(),
        get.getRow(), this.rpcControllerFactory.newController(), get.getPriority()) {
      @Override
      protected Result rpcCall() throws Exception {
        ClientProtos.GetRequest request = RequestConverter.buildGetRequest(
            getLocation().getRegionInfo().getRegionName(), configuredGet);
        ClientProtos.GetResponse response = doGet(request);
        return response == null ? null :
            ProtobufUtil.toResult(response.getResult(), getRpcControllerCellScanner());
      }
    };
    return rpcCallerFactory.<Result>newCaller(readRpcTimeoutMs).callWithRetries(callable,
        this.operationTimeoutMs);
  }
  // Call that takes into account the replica
  RpcRetryingCallerWithReadReplicas callable = new RpcRetryingCallerWithReadReplicas(
      rpcControllerFactory, tableName, this.connection, get, pool,
      connConfiguration.getRetriesNumber(), operationTimeoutMs, readRpcTimeoutMs,
      connConfiguration.getPrimaryCallTimeoutMicroSecond());
  return callable.call(operationTimeoutMs);
}
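At this point the reader may ask where the region cache comes in. The callWithRetries call on the caller obtained from rpcCallerFactory is implemented by RpcRetryingCallerImpl.callWithRetries, which calls RegionServerCallable.prepare, and that is where the region cache is used. In other words, when the local HBase client builds an RPC request for a remote region server and hits an error such as an empty RegionInfo, the client retries and refreshes the region cache held in the local client. The main call is connection.getRegionLocator; the region lookup behind it was analyzed in the write path above, so it is not repeated here.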
public void prepare(final boolean reload) throws IOException {
  // check table state if this is a retry
  if (reload && tableName != null && !tableName.equals(TableName.META_TABLE_NAME)
      && getConnection().isTableDisabled(tableName)) {
    throw new TableNotEnabledException(tableName.getNameAsString() + " is disabled.");
  }
  try (RegionLocator regionLocator = connection.getRegionLocator(tableName)) {
    this.location = regionLocator.getRegionLocation(row);
  }
  if (this.location == null) {
    throw new IOException("Failed to find location, tableName=" + tableName +
        ", row=" + Bytes.toString(row) + ", reload=" + reload);
  }
  setStubByServiceName(this.location.getServerName());
}
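The interplay between retries and the cached location can be sketched as a loop that asks for a fresh location on every attempt after the first. This is an illustrative model only: `RetryingCaller`, `Locator` and `Rpc` are hypothetical types, and the assumption that a retry requests a reload mirrors the `reload` flag of prepare rather than reproducing the real retry logic (backoff, timeouts, exception classification).

```java
/** Toy model of "retry, and re-resolve the location when retrying". */
class RetryingCaller {
  /** reload=true means: do not trust the cached location. */
  interface Locator {
    String locate(String row, boolean reload);
  }

  interface Rpc<T> {
    T call(String server, String row) throws Exception;
  }

  static <T> T callWithRetries(Locator locator, Rpc<T> rpc, String row, int maxAttempts) {
    Exception last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      // The first attempt may use a cached location; later attempts ask for a fresh one.
      String server = locator.locate(row, attempt > 0);
      try {
        return rpc.call(server, row);
      } catch (Exception e) {
        last = e; // remember the failure and try again
      }
    }
    throw new IllegalStateException("retries exhausted for row=" + row, last);
  }
}
```

If the cached server is stale, the first call fails, the second attempt resolves the location again, and the request succeeds without the caller noticing more than the extra latency.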
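The gets method
In the HBase client, "gets" refers to one request that carries several Get operations; the method involved is the get(List<Get>) overload in HTable. If the list holds exactly one Get, the single-get method is called. If it holds more than one, the batch method is called.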
public Result[] get(List<Get> gets) throws IOException {
  if (gets.size() == 1) {
    return new Result[]{get(gets.get(0))};
  }
  try {
    Object[] r1 = new Object[gets.size()];
    batch((List<? extends Row>)gets, r1, readRpcTimeoutMs);
    // Translate.
    Result [] results = new Result[r1.length];
    int i = 0;
    for (Object obj: r1) {
      // Batch ensures if there is a failure we get an exception instead
      results[i++] = (Result)obj;
    }
    return results;
  } catch (InterruptedException e) {
    throw (InterruptedIOException)new InterruptedIOException().initCause(e);
  }
}
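The single-versus-batch dispatch above boils down to one size check. A minimal sketch, with a hypothetical `Backend` interface standing in for the two code paths and strings standing in for Get and Result:

```java
import java.util.Collections;
import java.util.List;

/** Toy model of the get(List) dispatch: one item goes single, otherwise batch. */
class GetDispatcher {
  /** Hypothetical stand-in for the single-get path and the batch path. */
  interface Backend {
    String single(String row);

    List<String> batch(List<String> rows);
  }

  static List<String> get(List<String> rows, Backend backend) {
    if (rows.size() == 1) {
      // Exactly one request: take the direct single-get path.
      return Collections.singletonList(backend.single(rows.get(0)));
    }
    // Any other size goes through the batch path in one submission.
    return backend.batch(rows);
  }
}
```

As in the excerpt, only a list of exactly one element takes the single path; every other size, including an empty list, is handed to the batch path.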
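The batch method submits the task by calling submit on an AsyncProcess object (multiAp in the code below). The implementation was already described for the put path, so it is not repeated here.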
public void batch(final List<? extends Row> actions, final Object[] results, int rpcTimeout)
    throws InterruptedException, IOException {
  AsyncProcessTask task = AsyncProcessTask.newBuilder()
      .setPool(pool)
      .setTableName(tableName)
      .setRowAccess(actions)
      .setResults(results)
      .setRpcTimeout(rpcTimeout)
      .setOperationTimeout(operationTimeoutMs)
      .setSubmittedRows(AsyncProcessTask.SubmittedRows.ALL)
      .build();
  AsyncRequestFuture ars = multiAp.submit(task);
  ars.waitUntilDone();
  if (ars.hasError()) {
    throw ars.getErrors();
  }
}
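Summary
The region cache in the HBase connection plays a crucial role in both get and put requests. Frequent table switching can make the client's local memory usage grow, but the cache should be cleared only within reason: every entry that is thrown away has to be fetched again from the meta table, so clearing too aggressively can lower HBase throughput.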
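One way to keep cache clearing "within reason" is to put a minimum interval between two clears. The sketch below is a hypothetical policy, not something taken from HBase: `ThrottledCacheCleaner`, the interval, and the injected clock are all assumptions made for illustration.

```java
import java.util.function.LongSupplier;

/** Hypothetical policy: never clear the cache more often than once per interval. */
class ThrottledCacheCleaner {
  private final Runnable clearAction; // assumed to wrap the real cache-clearing call
  private final long minIntervalMs;
  private final LongSupplier clock;   // injected so the policy can be tested
  private long lastClearMs;
  private boolean clearedOnce;

  ThrottledCacheCleaner(Runnable clearAction, long minIntervalMs, LongSupplier clock) {
    this.clearAction = clearAction;
    this.minIntervalMs = minIntervalMs;
    this.clock = clock;
  }

  /** Clears the cache unless the previous clear was too recent. Returns true if it cleared. */
  boolean tryClear() {
    long now = clock.getAsLong();
    if (clearedOnce && now - lastClearMs < minIntervalMs) {
      return false; // too soon: keep the cache and spare the meta table
    }
    clearAction.run();
    lastClearMs = now;
    clearedOnce = true;
    return true;
  }
}
```

In production the clock would typically be System::currentTimeMillis, and the interval would be tuned against how often the job actually switches tables.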