Table of Contents

  • Background
  • HBase Write Path
  • Summary of the put path
  • HBase Read Path
  • The get method
  • The gets method
  • Summary


Background

When deploying a Flink streaming job that reads messages from Kafka and writes them to HBase, the data often has to go to different HBase tables depending on the (event) time. As the job keeps running, Flink's GC pressure gradually climbs and the job's throughput drops. A heap dump showed that the largest consumer of memory was the org.apache.hadoop.hbase.client.AsyncRegionLocator object. For a long-running streaming job, therefore, the HBase connection's clearRegionLocationCache method should be called whenever the job switches tables, so that the client-side cache is cleaned up and tasks do not fall into frequent GC. The rest of this article walks through the HBase write and read paths to see where the region location cache is used.
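
As a minimal sketch of that mitigation (assuming a 2.x HBase client that exposes Connection.clearRegionLocationCache; the class name, table naming scheme, and write() entry point below are illustrative, not the actual Flink job), the cache can be dropped whenever the job rolls over to a new table:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;

public class RollingTableWriter implements AutoCloseable {
  private final Connection connection;
  private String currentSuffix;   // e.g. a date string derived from event time (hypothetical)

  public RollingTableWriter(Configuration conf) throws IOException {
    this.connection = ConnectionFactory.createConnection(conf);
  }

  /** Writes one Put into the table for the given time bucket. */
  public void write(String suffix, Put put) throws IOException {
    if (currentSuffix != null && !currentSuffix.equals(suffix)) {
      // Rolling over to a new table: the locations cached for the old table will
      // never be used again, so drop them instead of letting them accumulate.
      connection.clearRegionLocationCache();
    }
    currentSuffix = suffix;
    try (Table table = connection.getTable(TableName.valueOf("events_" + suffix))) {
      table.put(put);
    }
  }

  @Override
  public void close() throws IOException {
    connection.close();
  }
}

Clearing only at the table switch keeps the extra load on hbase:meta bounded to one burst of location lookups per rollover.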

HBase Write Path

The HBase client's put request ultimately calls getBufferedMutator().mutate(put) inside the HTable put method. mutate is implemented in BufferedMutatorImpl. Looking into it, the put submitted by the user is appended to the writeAsyncBuffer list, and once the buffered data exceeds the configured threshold it is flushed by a background submission. Note that for every put request, HBase runs validatePut to check that no cell exceeds the maximum allowed size.
public void mutate(List<? extends Mutation> ms) throws InterruptedIOException, RetriesExhaustedWithDetailsException {
	this.checkClose();
    long toAddSize = 0L;
    int toAddCount = 0;

    for(Iterator var5 = ms.iterator(); var5.hasNext(); ++toAddCount) {
      Mutation m = (Mutation)var5.next();
      if (m instanceof Put) {
        ConnectionUtils.validatePut((Put)m, this.maxKeyValueSize);
      }

      toAddSize += m.heapSize();
    }

    if (this.currentWriteBufferSize.get() == 0L) {
      this.firstRecordInBufferTimestamp.set(System.currentTimeMillis());
    }

    this.currentWriteBufferSize.addAndGet(toAddSize);
    this.writeAsyncBuffer.addAll(ms);
    this.undealtMutationCount.addAndGet(toAddCount);
    this.doFlush(false);
}
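
Before digging deeper, here is roughly how application code reaches this path; a minimal sketch assuming a hypothetical table demo_table with a cf column family, using the public BufferedMutator API (BufferedMutatorParams.writeBufferSize raises the default 2 MB flush threshold):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public final class BufferedWriteExample {
  static void writeBatch(Connection connection) throws IOException {
    // Raise the flush threshold from the default 2 MB (hbase.client.write.buffer) to 8 MB.
    BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("demo_table"))
        .writeBufferSize(8L * 1024 * 1024);
    try (BufferedMutator mutator = connection.getBufferedMutator(params)) {
      for (int i = 0; i < 10_000; i++) {
        Put put = new Put(Bytes.toBytes("row-" + i));
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
        mutator.mutate(put);   // buffered in writeAsyncBuffer, flushed by doFlush()
      }
      mutator.flush();         // push out whatever is still buffered
    }
  }
}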

Next, let's dig into the doFlush method. If flushAll is true, it sends all buffered data to HBase and waits for the result; otherwise data is only sent once the buffered size exceeds the write buffer size (2 MB by default). The actual sending is done by submitting a task through the AsyncProcess object ap via its submit method.

private void doFlush(boolean flushAll) throws InterruptedIOException,
      RetriesExhaustedWithDetailsException {
	List<RetriesExhaustedWithDetailsException> errors = new ArrayList<>();
    while (true) {
      if (!flushAll && currentWriteBufferSize.get() <= writeBufferSize) {
        // There is the room to accept more mutations.
        break;
      }
      AsyncRequestFuture asf;
      try (QueueRowAccess access = createQueueRowAccess()) {
        if (access.isEmpty()) {
          // It means someone has gotten the ticker to run the flush.
          break;
        }
        asf = ap.submit(createTask(access));
      }
      // DON'T do the wait in the try-with-resources. Otherwise, the undealt mutations won't
      // be released.
      asf.waitUntilDone();
      if (asf.hasError()) {
        errors.add(asf.getErrors());
      }
 	}
 ......
  }

Going one level deeper into AsyncProcess.submit, the key parts of the method are listed below. The rows to be written are grouped by target HRegionServer in the actionsByServer map, and each put taken from writeAsyncBuffer is wrapped in an Action object. For every row, the connection's locateRegion method is called to determine which remote region it should be written to. Finally, the grouped actions are submitted through submitMultiActions.

private <CResult> AsyncRequestFuture submit(AsyncProcessTask<CResult> task,
    boolean atLeastOne) throws InterruptedIOException {
    TableName tableName = task.getTableName();
    RowAccess<? extends Row> rows = task.getRowAccess();
    Map<ServerName, MultiAction> actionsByServer = new HashMap<>();
    ......
    boolean firstIter = true;
    do {
      ......
      Iterator<? extends Row> it = rows.iterator();
      while (it.hasNext()) {
        Row r = it.next();
        HRegionLocation loc;
        try {
          if (r == null) {
            throw new IllegalArgumentException("#" + id + ", row cannot be null");
          }
          // Make sure we get 0-s replica.
          RegionLocations locs = connection.locateRegion(
              tableName, r.getRow(), true, true, RegionReplicaUtil.DEFAULT_REPLICA_ID);
          if (locs == null || locs.isEmpty() || locs.getDefaultRegionLocation() == null) {
            throw new IOException("#" + id + ", no location found, aborting submit for"
                + " tableName=" + tableName + " rowkey=" + Bytes.toStringBinary(r.getRow()));
          }
          loc = locs.getDefaultRegionLocation();
        } catch (IOException ex) {
          ......
        }
        ReturnCode code = checker.canTakeRow(loc, r);
        if (code == ReturnCode.END) {
          break;
        }
        if (code == ReturnCode.INCLUDE) {
          int priority = HConstants.NORMAL_QOS;
          if (r instanceof Mutation) {
            priority = ((Mutation) r).getPriority();
          }
          Action action = new Action(r, ++posInList, priority);
          setNonce(ng, r, action);
          retainedActions.add(action);
          // TODO: replica-get is not supported on this path
          byte[] regionName = loc.getRegionInfo().getRegionName();
          addAction(loc.getServerName(), regionName, action, actionsByServer, nonceGroup);
          it.remove();
        }
      }
      firstIter = false;
    } while (retainedActions.isEmpty() && atLeastOne && (locationErrors == null));

    if (retainedActions.isEmpty()) return NO_REQS_RESULT;

    return submitMultiActions(task, retainedActions, nonceGroup,
        locationErrors, locationErrorRows, actionsByServer);
  }

To make the role of the region cache clearer, let's look at the connection's locateRegion method. Its concrete implementation lives in ConnectionImplementation, in the locateRegionInMeta method. It first checks whether the local cache already contains the location of the region; if not, it scans the hbase:meta table to find the corresponding region (which is why clearing the client-side region cache too frequently puts a non-trivial load on hbase:meta). Finally, the freshly fetched region location is added to the local cache.

private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, boolean useCache,
      boolean retry, int replicaId) throws IOException {
    // If we are supposed to be using the cache, look in the cache to see if we already have the
    // region.
    if (useCache) {
      RegionLocations locations = getCachedLocation(tableName, row);
      if (locations != null && locations.getRegionLocation(replicaId) != null) {
        return locations;
      }
    }
   	......
   	byte[] metaStartKey = RegionInfo.createRegionName(tableName, row, HConstants.NINES, false);
    byte[] metaStopKey =
      RegionInfo.createRegionName(tableName, HConstants.EMPTY_START_ROW, "", false);
    Scan s = new Scan().withStartRow(metaStartKey).withStopRow(metaStopKey, true)
      .addFamily(HConstants.CATALOG_FAMILY).setReversed(true).setCaching(5)
      .setReadType(ReadType.PREAD);
    ......
    // Instantiate the location
    cacheLocation(tableName, locations);
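
The same caching behavior can be observed from the application side through the public RegionLocator API, whose getRegionLocation ends up in the locateRegionInMeta path above. A minimal sketch (the table name and row key are hypothetical):

import java.io.IOException;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public final class RegionLocationProbe {
  static void probe(Connection connection) throws IOException {
    try (RegionLocator locator = connection.getRegionLocator(TableName.valueOf("demo_table"))) {
      // First lookup for this row: cache miss, triggers the reversed meta scan shown
      // above and stores the result via cacheLocation().
      HRegionLocation first = locator.getRegionLocation(Bytes.toBytes("row-42"));
      // Second lookup: served from the client-side cache, no hbase:meta RPC.
      HRegionLocation second = locator.getRegionLocation(Bytes.toBytes("row-42"));
      System.out.println(first.getServerName() + " / " + second.getServerName());
    }
  }
}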

Now let's enter the sendMultiAction() method to see how the put requests are actually sent. Each task talks to the region server through HBase's RPC framework and collects the returned result: the client invokes the region server's multi() RPC through MultiServerCallable.call() to submit the puts.

void sendMultiAction(Map<ServerName, MultiAction> actionsByServer,
                     int numAttempt, List<Action> actionsForReplicaThread, boolean reuseThread) {
    for (Map.Entry<ServerName, MultiAction> e : actionsByServer.entrySet()) {
      ServerName server = e.getKey();
      MultiAction multiAction = e.getValue();
      Collection<? extends Runnable> runnables = getNewMultiActionRunnable(server, multiAction,
          numAttempt);
      for (Runnable runnable : runnables) {
        ......
        pool.submit(runnable);
        ......
      }
    }
    ......
}

final class SingleServerRequestRunnable implements Runnable {
    @Override
    public void run() {
      AbstractResponse res = null;
      CancellableRegionServerCallable callable = currentCallable;
      try {
        // setup the callable based on the actions, if we don't have one already from the request
        if (callable == null) {
          callable = createCallable(server, tableName, multiAction);
        }
        RpcRetryingCaller<AbstractResponse> caller =
            asyncProcess.createCaller(callable, rpcTimeout);
        try {
          if (callsInProgress != null) {
            callsInProgress.add(callable);
          }
          res = caller.callWithoutRetries(callable, operationTimeout);
          ......
        }
        ......
      }
      ......
    }
}

Summary of the put path

(1) The put is added to the writeAsyncBuffer queue; once the conditions are met (auto flush, or the buffer exceeding the writeBufferSize threshold), the buffered mutations are submitted asynchronously in batch through AsyncProcess.
(2) Before the submission, each rowkey has to be mapped to the region server it belongs to. This lookup goes through the connection's locateRegion method, after which the rows are grouped by HRegionLocation. When resolving a region's location, the client caches recently used region servers: if the cache already holds the information for the region, it is used directly to connect to that region server; otherwise one RPC scan against the meta table is made to fetch it. Client operations such as put, get and delete are each wrapped in an Action object, and a whole series of actions is submitted together as a MultiAction.
(3) Using a thread pool, a MultiServerCallable is built per HRegionLocation and executed through rpcCallerFactory.newCaller(); leaving aside failure retries and error handling, the client-side submission ends here.

HBase Read Path

The HBase client's get request mainly goes through the HTable get method. A single RPC issued over the HBase connection can carry either one get or several gets, and the two cases are implemented slightly differently. They are analyzed one at a time below.

The get method

For a single get sent as one RPC call, the HTable.get method shown below is invoked. It issues the RPC directly and returns the result; the code is fairly straightforward.

private Result get(Get get, final boolean checkExistenceOnly) throws IOException {
    // if we are changing settings to the get, clone it.
    if (get.isCheckExistenceOnly() != checkExistenceOnly || get.getConsistency() == null) {
      get = ReflectionUtils.newInstance(get.getClass(), get);
      get.setCheckExistenceOnly(checkExistenceOnly);
      if (get.getConsistency() == null){
        get.setConsistency(DEFAULT_CONSISTENCY);
      }
    }

    if (get.getConsistency() == Consistency.STRONG) {
      final Get configuredGet = get;
      ClientServiceCallable<Result> callable = new ClientServiceCallable<Result>(this.connection, getName(),
          get.getRow(), this.rpcControllerFactory.newController(), get.getPriority()) {
        @Override
        protected Result rpcCall() throws Exception {
          ClientProtos.GetRequest request = RequestConverter.buildGetRequest(
              getLocation().getRegionInfo().getRegionName(), configuredGet);
          ClientProtos.GetResponse response = doGet(request);
          return response == null? null:
            ProtobufUtil.toResult(response.getResult(), getRpcControllerCellScanner());
        }
      };
      return rpcCallerFactory.<Result>newCaller(readRpcTimeoutMs).callWithRetries(callable,
          this.operationTimeoutMs);
    }

    // Call that takes into account the replica
    RpcRetryingCallerWithReadReplicas callable = new RpcRetryingCallerWithReadReplicas(
        rpcControllerFactory, tableName, this.connection, get, pool,
        connConfiguration.getRetriesNumber(), operationTimeoutMs, readRpcTimeoutMs,
        connConfiguration.getPrimaryCallTimeoutMicroSecond());
    return callable.call(operationTimeoutMs);
  }
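
From the caller's side, the two branches above correspond to the consistency level set on the Get. A minimal sketch (table demo_table and family cf are hypothetical; TIMELINE reads additionally require region replication to be enabled on the table):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public final class SingleGetExample {
  static void read(Connection connection) throws IOException {
    try (Table table = connection.getTable(TableName.valueOf("demo_table"))) {
      // Default Consistency.STRONG: goes through the ClientServiceCallable branch above.
      Result strong = table.get(new Get(Bytes.toBytes("row-42")));
      System.out.println(Bytes.toString(strong.getValue(Bytes.toBytes("cf"), Bytes.toBytes("q"))));

      // Consistency.TIMELINE: may read from secondary replicas and takes the
      // RpcRetryingCallerWithReadReplicas branch instead.
      Get timelineGet = new Get(Bytes.toBytes("row-42"));
      timelineGet.setConsistency(Consistency.TIMELINE);
      Result timeline = table.get(timelineGet);
      System.out.println("stale read: " + timeline.isStale());
    }
  }
}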

At this point the reader may ask: where does the HBase region cache come into play here? The callWithRetries call obtained from rpcCallerFactory is implemented by RpcRetryingCallerImpl.callWithRetries, which invokes RegionServerCallable.prepare, and prepare is where the region cache is used. When the local client builds the RPC to the remote region server and hits an exception such as a missing RegionInfo, it retries and refreshes the region cache on the client side. The location lookup itself goes through connection.getRegionLocator, which follows the same locateRegion path analyzed earlier, so it is not repeated here.

public void prepare(final boolean reload) throws IOException {
    // check table state if this is a retry
    if (reload && tableName != null && !tableName.equals(TableName.META_TABLE_NAME)
        && getConnection().isTableDisabled(tableName)) {
      throw new TableNotEnabledException(tableName.getNameAsString() + " is disabled.");
    }
    try (RegionLocator regionLocator = connection.getRegionLocator(tableName)) {
      this.location = regionLocator.getRegionLocation(row);
    }
    if (this.location == null) {
      throw new IOException("Failed to find location, tableName=" + tableName +
          ", row=" + Bytes.toString(row) + ", reload=" + reload);
    }
    setStubByServiceName(this.location.getServerName());
  }
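
When a retry needs a fresh location (for example after a region has moved), the cached entry can also be bypassed explicitly from application code via the reload flag on RegionLocator.getRegionLocation; a minimal sketch with a hypothetical table and row key:

import java.io.IOException;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public final class ForceRelocateExample {
  static HRegionLocation relocate(Connection connection) throws IOException {
    try (RegionLocator locator = connection.getRegionLocator(TableName.valueOf("demo_table"))) {
      // reload = true skips the cached entry and re-reads hbase:meta, so the caller
      // sees the region's current server even if the cache is stale.
      return locator.getRegionLocation(Bytes.toBytes("row-42"), true);
    }
  }
}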

The gets method

The gets path in the HBase client means that a single RPC carries multiple gets; concretely it is the HTable get(List<Get>) overload shown below. If the list contains only one get, the single-get method above is called; if there is more than one, the batch method is used.

public Result[] get(List<Get> gets) throws IOException {
    if (gets.size() == 1) {
      return new Result[]{get(gets.get(0))};
    }
    try {
      Object[] r1 = new Object[gets.size()];
      batch((List<? extends Row>)gets, r1, readRpcTimeoutMs);
      // Translate.
      Result [] results = new Result[r1.length];
      int i = 0;
      for (Object obj: r1) {
        // Batch ensures if there is a failure we get an exception instead
        results[i++] = (Result)obj;
      }
      return results;
    } catch (InterruptedException e) {
      throw (InterruptedIOException)new InterruptedIOException().initCause(e);
    }
  }

The batch method submits the task through the AsyncProcess object (multiAp) via its submit method. The details were already covered for the put path, so they are not repeated here.

public void batch(final List<? extends Row> actions, final Object[] results, int rpcTimeout)
      throws InterruptedException, IOException {
    AsyncProcessTask task = AsyncProcessTask.newBuilder()
            .setPool(pool)
            .setTableName(tableName)
            .setRowAccess(actions)
            .setResults(results)
            .setRpcTimeout(rpcTimeout)
            .setOperationTimeout(operationTimeoutMs)
            .setSubmittedRows(AsyncProcessTask.SubmittedRows.ALL)
            .build();
    AsyncRequestFuture ars = multiAp.submit(task);
    ars.waitUntilDone();
    if (ars.hasError()) {
      throw ars.getErrors();
    }
  }
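
From the caller's side, this multi-get path is reached simply by passing a list of Gets to Table.get. A minimal sketch (table name and row keys are hypothetical):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public final class MultiGetExample {
  static Result[] readMany(Connection connection) throws IOException {
    List<Get> gets = new ArrayList<>();
    for (int i = 0; i < 100; i++) {
      gets.add(new Get(Bytes.toBytes("row-" + i)));
    }
    try (Table table = connection.getTable(TableName.valueOf("demo_table"))) {
      // With more than one Get in the list, this goes through batch()/AsyncProcess.submit(),
      // grouping the rows per region server exactly as in the put path.
      return table.get(gets);
    }
  }
}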

Summary

The region cache in the HBase connection plays a crucial role in both get and put requests. Under frequent table switching it can make client-side memory grow, so it should be cleared, but only within reasonable bounds: never clearing it leads to the GC-induced throughput drop described at the beginning, while clearing it too aggressively shifts the load onto hbase:meta and can likewise hurt HBase throughput.