Problem Description

After we upgraded Hive from 1.1 to 2.3.3 for our in-house query service, HiveStatement.getQueryLog could no longer fetch any execution logs. Before the upgrade it returned logs much like what the Hive console prints, including the query ID, the number of MapReduce jobs, and so on.

Troubleshooting

  • First, start hiveserver2 in the test environment in debug mode. The remote-debug port defaults to 8000; I changed it to 8101 here because 8000 was already taken by another service:
nohup ${HIVE_HOME}/bin/hive --service hiveserver2 --debug:port=8101 >> /data/logs/hive/hive.log 2>&1 &
  • Then start a remote debugging session in IDEA; see the JDWP note right after this list.
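
For reference, the IDEA "Remote JVM Debug" run configuration only needs the host and the port (8101 here). Under the hood it attaches through the standard JDWP agent; the canonical agent line looks roughly like the following (the exact flags Hive generates for --debug may differ slightly):

-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8101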

After a furious round of debugging, I found that every JDBC operation ultimately enters through org.apache.hive.service.cli.operation.SQLOperation.BackgroundWork#run:

    @Override
    public void run() {
      PrivilegedExceptionAction<Object> doAsAction = new PrivilegedExceptionAction<Object>() {
        @Override
        public Object run() throws HiveSQLException {
          Hive.set(parentHive);
          // TODO: can this result in cross-thread reuse of session state?
          SessionState.setCurrentSessionState(parentSessionState);
          PerfLogger.setPerfLogger(parentPerfLogger);
          // Set current OperationLog in this async thread for keeping on saving query log.
          registerCurrentOperationLog();
          registerLoggingContext();
          try {
            if (asyncPrepare) {
              prepare(queryState);
            }
            runQuery();
          } catch (HiveSQLException e) {
            // TODO: why do we invent our own error path op top of the one from Future.get?
            setOperationException(e);
            LOG.error("Error running hive query: ", e);
          } finally {
            unregisterLoggingContext();
            unregisterOperationLog();
          }
          return null;
        }
      };

      try {
        currentUGI.doAs(doAsAction);
      } catch (Exception e) {
        setOperationException(new HiveSQLException(e));
        LOG.error("Error running hive query as user : " + currentUGI.getShortUserName(), e);
      } finally {
        /**
         * We'll cache the ThreadLocal RawStore object for this background thread for an orderly cleanup
         * when this thread is garbage collected later.
         * @see org.apache.hive.service.server.ThreadWithGarbageCleanup#finalize()
         */
        if (ThreadWithGarbageCleanup.currentThread() instanceof ThreadWithGarbageCleanup) {
          ThreadWithGarbageCleanup currentThread =
              (ThreadWithGarbageCleanup) ThreadWithGarbageCleanup.currentThread();
          currentThread.cacheThreadLocalRawStore();
        }
      }
    }
  }

The call registerCurrentOperationLog() here stores an OperationLog object on the current thread:

private void registerCurrentOperationLog() {
    if (isOperationLogEnabled) {
      if (operationLog == null) {
        LOG.warn("Failed to get current OperationLog object of Operation: " +
            getHandle().getHandleIdentifier());
        isOperationLogEnabled = false;
        return;
      }
      OperationLog.setCurrentOperationLog(operationLog);
    }
}
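
OperationLog keeps this per-thread instance in a static ThreadLocal. A simplified sketch of that plumbing, with method names taken from the Hive source but bodies reduced to the essentials:

public class OperationLog {
  private static final ThreadLocal<OperationLog> THREAD_LOCAL_OPERATION_LOG =
      new ThreadLocal<OperationLog>();

  public static void setCurrentOperationLog(OperationLog operationLog) {
    // Bound to the calling thread only; other threads will not see it.
    THREAD_LOCAL_OPERATION_LOG.set(operationLog);
  }

  public static OperationLog getCurrentOperationLog() {
    // Returns null when called from any thread other than the one that set it.
    return THREAD_LOCAL_OPERATION_LOG.get();
  }

  public static void removeCurrentOperationLog() {
    THREAD_LOCAL_OPERATION_LOG.remove();
  }
}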

SQL execution logs are read and written mainly through this object:

/**
   * Write operation execution logs into log file
   * @param operationLogMessage one line of log emitted from log4j
   */
  public void writeOperationLog(String operationLogMessage) {
    logFile.write(operationLogMessage);
  }

  /**
   * Write operation execution logs into log file
   * @param operationLogMessage one line of log emitted from log4j
   */
  public void writeOperationLog(LoggingLevel level, String operationLogMessage) {
    if (opLoggingLevel.compareTo(level) < 0) return;
    logFile.write(operationLogMessage);
  }


  /**
   * Read operation execution logs from log file
   * @param isFetchFirst true if the Enum FetchOrientation value is Fetch_First
   * @param maxRows the max number of fetched lines from log
   * @return
   * @throws java.sql.SQLException
   */
  public List<String> readOperationLog(boolean isFetchFirst, long maxRows)
      throws SQLException{
    return logFile.read(isFetchFirst, maxRows);
  }

The JDBC client's getQueryLog ends up calling readOperationLog to read the log, while hiveserver2 writes it through writeOperationLog; once the SQL statement finishes, the log file is deleted.
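
For context, here is a minimal sketch of how a client typically consumes this log: poll HiveStatement.getQueryLog() from a side thread while execute() blocks (the connection URL is a placeholder; hasMoreLogs() and getQueryLog() are the relevant hive-jdbc methods):

import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.hive.jdbc.HiveStatement;

public class QueryLogDemo {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:hive2://xxx:10000/default");
         HiveStatement stmt = (HiveStatement) conn.createStatement()) {
      // Poll the operation log from a side thread while execute() blocks.
      Thread logThread = new Thread(() -> {
        try {
          while (stmt.hasMoreLogs()) {
            for (String line : stmt.getQueryLog()) {
              System.out.println(line); // query id, MapReduce progress, etc.
            }
            Thread.sleep(500L);
          }
        } catch (Exception e) {
          // statement closed or cancelled; stop polling
        }
      });
      logThread.start();
      stmt.execute("select count(*) from bdl_wutiao_event where ds='2018-11-18'");
      logThread.join();
    }
  }
}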

By tracing method usages, I found that the log lines are written by calling writeOperationLog from the appender org.apache.hive.service.cli.operation.LogDivertAppender, and that org.apache.hive.service.cli.operation.LogDivertAppender.NameFilter#filter decides whether an event is diverted into the operation log file:

    @Override
    public Result filter(LogEvent event) {
      OperationLog log = operationManager.getOperationLogByThread();
      boolean excludeMatches = (loggingMode == OperationLog.LoggingLevel.VERBOSE);

      if (log == null) {
        return Result.DENY;
      }

      OperationLog.LoggingLevel currentLoggingMode = log.getOpLoggingLevel();
      // If logging is disabled, deny everything.
      if (currentLoggingMode == OperationLog.LoggingLevel.NONE) {
        return Result.DENY;
      }
      // Look at the current session's setting
      // and set the pattern and excludeMatches accordingly.
      if (currentLoggingMode != loggingMode) {
        loggingMode = currentLoggingMode;
        excludeMatches = (loggingMode == OperationLog.LoggingLevel.VERBOSE);
        setCurrentNamePattern(loggingMode);
      }

      boolean isMatch = namePattern.matcher(event.getLoggerName()).matches();

      if (excludeMatches == isMatch) {
        // Deny if this is black-list filter (excludeMatches = true) and it
        // matched or if this is whitelist filter and it didn't match
        return Result.DENY;
      }
      return Result.NEUTRAL;
    }
  }

The very first line of this method fetches the OperationLog instance from the current thread, and the result determines whether the event gets written out:

OperationLog log = operationManager.getOperationLogByThread();
boolean excludeMatches = (loggingMode == OperationLog.LoggingLevel.VERBOSE);

if (log == null) {
  return Result.DENY;
}

Here is the crux. I set a breakpoint on this line and found that the OperationLog fetched was null every time, so the if condition below was satisfied and the log event was rejected with DENY.
Strange: the code earlier clearly set the OperationLog instance on the ThreadLocal, so why couldn't it be read back?
Debugging once more, I discovered that this method runs on a different thread, not on the thread executing the SQL operation. Then it dawned on me: had asynchronous logging been enabled, so that LogEvents were being processed on some other, asynchronous thread?
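
This is plain Java ThreadLocal semantics, easy to reproduce in isolation. A minimal standalone demo (not Hive code) of what the filter sees:

public class ThreadLocalDemo {
  private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

  public static void main(String[] args) throws InterruptedException {
    CURRENT.set("operation-log"); // set on the "SQL execution" thread
    System.out.println("same thread:  " + CURRENT.get()); // prints operation-log

    // The async logging thread plays the role of this second thread:
    Thread asyncLogger = new Thread(
        () -> System.out.println("other thread: " + CURRENT.get())); // prints null
    asyncLogger.start();
    asyncLogger.join();
  }
}
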
So I started debugging from the top once more, this time with a breakpoint at the HiveServer2 entry point, org.apache.hive.service.server.HiveServer2#main:

public static void main(String[] args) {
    HiveConf.setLoadHiveServer2Config(true);
    try {
      ServerOptionsProcessor oproc = new ServerOptionsProcessor("hiveserver2");
      ServerOptionsProcessorResponse oprocResponse = oproc.parse(args);

      // NOTE: It is critical to do this here so that log4j is reinitialized
      // before any of the other core hive classes are loaded
      String initLog4jMessage = LogUtils.initHiveLog4j();
      LOG.debug(initLog4jMessage);
      HiveStringUtils.startupShutdownMessage(HiveServer2.class, args, LOG);

      // Logger debug message from "oproc" after log4j initialize properly
      LOG.debug(oproc.getDebugMessage().toString());

      // Call the executor which will execute the appropriate command based on the parsed options
      oprocResponse.getServerOptionsExecutor().execute();
    } catch (LogInitializationException e) {
      LOG.error("Error initializing log: " + e.getMessage(), e);
      System.exit(-1);
    }
  }

Note this line in main:

String initLog4jMessage = LogUtils.initHiveLog4j();

It initializes log4j, and the work ultimately happens in org.apache.hadoop.hive.common.LogUtils#initHiveLog4jDefault:

private static String initHiveLog4jDefault(
    HiveConf conf, String logMessage, ConfVars confVarName)
    throws LogInitializationException {
    URL hive_l4j = null;
    switch (confVarName) {
      case HIVE_EXEC_LOG4J_FILE:
        hive_l4j = LogUtils.class.getClassLoader().getResource(HIVE_EXEC_L4J);
        if (hive_l4j == null) {
          hive_l4j = LogUtils.class.getClassLoader().getResource(HIVE_L4J);
        }
        System.setProperty(HiveConf.ConfVars.HIVEQUERYID.toString(),
          HiveConf.getVar(conf, HiveConf.ConfVars.HIVEQUERYID));
        break;
      case HIVE_LOG4J_FILE:
        hive_l4j = LogUtils.class.getClassLoader().getResource(HIVE_L4J);
        break;
      default:
        break;
    }
    if (hive_l4j != null) {
      // whether to enable asynchronous logging
      final boolean async = checkAndSetAsyncLogging(conf);
      Configurator.initialize(null, hive_l4j.toString());
      logConfigLocation(conf);
      return (logMessage + "\n" + "Logging initialized using configuration in " + hive_l4j +
          " Async: " + async);
    } else {
      throw new LogInitializationException(
        logMessage + "Unable to initialize logging using "
        + LogUtils.HIVE_L4J + ", not found on CLASSPATH!");
    }
  }

// whether to enable asynchronous logging
final boolean async = checkAndSetAsyncLogging(conf);

This line checks whether asynchronous logging should be turned on. The corresponding configuration property is hive.async.log.enabled, whose default value is true, i.e., asynchronous logging is enabled by default:

HIVE_ASYNC_LOG_ENABLED("hive.async.log.enabled", true,
    "Whether to enable Log4j2's asynchronous logging. Asynchronous logging can give\n" +
    " significant performance improvement as logging will be handled in separate thread\n" +
    " that uses LMAX disruptor queue for buffering log messages.\n" +
    " Refer https://logging.apache.org/log4j/2.x/manual/async.html for benefits and\n" +
    " drawbacks."),

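That explains the null OperationLog: with async logging on, Log4j2 hands every LogEvent to a disruptor thread, so NameFilter#filter runs there rather than on the SQL thread that set the ThreadLocal. Below is a simplified sketch of what checkAndSetAsyncLogging does, based on my reading of the Hive 2.3 source (details may vary across versions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.conf.HiveConf;

public final class AsyncLoggingCheck {
  // Simplified sketch of LogUtils.checkAndSetAsyncLogging; not the verbatim source.
  public static boolean checkAndSetAsyncLogging(Configuration conf) {
    boolean async = HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_ASYNC_LOG_ENABLED);
    if (async) {
      // The standard Log4j2 switch for global all-async loggers: every log call
      // is queued and handled on a separate disruptor thread.
      System.setProperty("Log4jContextSelector",
          "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector");
    }
    return async;
  }
}
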
So I added the following property to hive-site.xml to turn asynchronous logging off:

<property>
  <name>hive.async.log.enabled</name>
  <value>false</value>
</property>
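
Note that the flag is read only once, during the log4j initialization in main shown above, so hiveserver2 has to be restarted for the new value to take effect.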

Finally, a quick test via beeline:

beeline
!connect jdbc:hive2://xxx:10000/default
select ds,event,count(*) as cnt from bdl_wutiao_event where ds='2018-11-18' group by ds,event;

The execution log is now returned correctly:

[screenshot: beeline printing the query execution log]

Lastly, looking at the source on GitHub, this property was introduced in version 2.1.0, so if you are running an older Hive than that, you do not need to set it.