问题描述
公司内部查询服务将hive从1.1升级到2.3.3版本之后,通过HiveStatement.getQueryLog无法拉取到执行日志了,原来是可以拉取到类似控制台执行hive时候那样的日志,包括queryid, mapreduce个数等等信息。
问题排查
- 首先将测试环境的hiveserver2开启debug模式,远程debug端口默认是
8000
,这里我改成了8101
,因为8000
被其他服务占用了
nohup ${HIVE_HOME}/bin/hive --service hiveserver2 --debug:port=8101 2>&1 >> /data/logs/hive/hive.log &
- 然后在idea中启动远程调试
一顿debug猛如虎,最终我发现所有的jdbc操作的执行入口在org.apache.hive.service.cli.operation.SQLOperation.BackgroundWork#run
方法:
@Override
public void run() {
PrivilegedExceptionAction<Object> doAsAction = new PrivilegedExceptionAction<Object>() {
@Override
public Object run() throws HiveSQLException {
Hive.set(parentHive);
// TODO: can this result in cross-thread reuse of session state?
SessionState.setCurrentSessionState(parentSessionState);
PerfLogger.setPerfLogger(parentPerfLogger);
// Set current OperationLog in this async thread for keeping on saving query log.
registerCurrentOperationLog();
registerLoggingContext();
try {
if (asyncPrepare) {
prepare(queryState);
}
runQuery();
} catch (HiveSQLException e) {
// TODO: why do we invent our own error path op top of the one from Future.get?
setOperationException(e);
LOG.error("Error running hive query: ", e);
} finally {
unregisterLoggingContext();
unregisterOperationLog();
}
return null;
}
};
try {
currentUGI.doAs(doAsAction);
} catch (Exception e) {
setOperationException(new HiveSQLException(e));
LOG.error("Error running hive query as user : " + currentUGI.getShortUserName(), e);
} finally {
/**
* We'll cache the ThreadLocal RawStore object for this background thread for an orderly cleanup
* when this thread is garbage collected later.
* @see org.apache.hive.service.server.ThreadWithGarbageCleanup#finalize()
*/
if (ThreadWithGarbageCleanup.currentThread() instanceof ThreadWithGarbageCleanup) {
ThreadWithGarbageCleanup currentThread =
(ThreadWithGarbageCleanup) ThreadWithGarbageCleanup.currentThread();
currentThread.cacheThreadLocalRawStore();
}
}
}
}
其中registerCurrentOperationLog();
这句往当前线程中设置了一个OperationLog对象:
private void registerCurrentOperationLog() {
if (isOperationLogEnabled) {
if (operationLog == null) {
LOG.warn("Failed to get current OperationLog object of Operation: " +
getHandle().getHandleIdentifier());
isOperationLogEnabled = false;
return;
}
OperationLog.setCurrentOperationLog(operationLog);
}
}
SQL的执行日志主要就是通过这个对象来读写的:
/**
* Write operation execution logs into log file
* @param operationLogMessage one line of log emitted from log4j
*/
public void writeOperationLog(String operationLogMessage) {
logFile.write(operationLogMessage);
}
/**
* Write operation execution logs into log file
* @param operationLogMessage one line of log emitted from log4j
*/
public void writeOperationLog(LoggingLevel level, String operationLogMessage) {
if (opLoggingLevel.compareTo(level) < 0) return;
logFile.write(operationLogMessage);
}
/**
* Read operation execution logs from log file
* @param isFetchFirst true if the Enum FetchOrientation value is Fetch_First
* @param maxRows the max number of fetched lines from log
* @return
* @throws java.sql.SQLException
*/
public List<String> readOperationLog(boolean isFetchFirst, long maxRows)
throws SQLException{
return logFile.read(isFetchFirst, maxRows);
}
jdbc客户端的getQueryLog
就是调用readOperationLog
这个方法来读取日志,hiveserver2上通过writeOperationLog
来写入日志,sql执行完之后,日志文件会被删除。
通过查找方法使用关系,我发现日志是从org.apache.hive.service.cli.operation.LogDivertAppender
这个Appender中调用writeOperationLog
来写入日志的。由org.apache.hive.service.cli.operation.LogDivertAppender.NameFilter#filter
方法来决定是否要输出到日志文件:
@Override
public Result filter(LogEvent event) {
OperationLog log = operationManager.getOperationLogByThread();
boolean excludeMatches = (loggingMode == OperationLog.LoggingLevel.VERBOSE);
if (log == null) {
return Result.DENY;
}
OperationLog.LoggingLevel currentLoggingMode = log.getOpLoggingLevel();
// If logging is disabled, deny everything.
if (currentLoggingMode == OperationLog.LoggingLevel.NONE) {
return Result.DENY;
}
// Look at the current session's setting
// and set the pattern and excludeMatches accordingly.
if (currentLoggingMode != loggingMode) {
loggingMode = currentLoggingMode;
excludeMatches = (loggingMode == OperationLog.LoggingLevel.VERBOSE);
setCurrentNamePattern(loggingMode);
}
boolean isMatch = namePattern.matcher(event.getLoggerName()).matches();
if (excludeMatches == isMatch) {
// Deny if this is black-list filter (excludeMatches = true) and it
// matched or if this is whitelist filter and it didn't match
return Result.DENY;
}
return Result.NEUTRAL;
}
}
这个方法的第一行就是从当前线程中获取OperationLog
对象实例,然后判断是否要进行输出
OperationLog log = operationManager.getOperationLogByThread();
boolean excludeMatches = (loggingMode == OperationLog.LoggingLevel.VERBOSE);
if (log == null) {
return Result.DENY;
}
重点就在这里了,我在这里打了断点,发现每次取出来的OperationLog log
都为null
,然后下面的if条件满足,输出日志被DENY
拒绝处理了。
这个地方很奇怪,为啥前面代码明明设置了TreadLocal的Operation对象实例,怎么取不到?
于是重新debug,发现这个方法是在另外一个线程中执行的并不是在SQL操作执行线程中执行。于是恍然大悟,是不是开启了日志异步输出,然后导致LogEvent
在其他异步线程中被处理了。
继续重头debug,端点打在了HiveServer2的main函数入口上org.apache.hive.service.server.HiveServer2#main
:
public static void main(String[] args) {
HiveConf.setLoadHiveServer2Config(true);
try {
ServerOptionsProcessor oproc = new ServerOptionsProcessor("hiveserver2");
ServerOptionsProcessorResponse oprocResponse = oproc.parse(args);
// NOTE: It is critical to do this here so that log4j is reinitialized
// before any of the other core hive classes are loaded
String initLog4jMessage = LogUtils.initHiveLog4j();
LOG.debug(initLog4jMessage);
HiveStringUtils.startupShutdownMessage(HiveServer2.class, args, LOG);
// Logger debug message from "oproc" after log4j initialize properly
LOG.debug(oproc.getDebugMessage().toString());
// Call the executor which will execute the appropriate command based on the parsed options
oprocResponse.getServerOptionsExecutor().execute();
} catch (LogInitializationException e) {
LOG.error("Error initializing log: " + e.getMessage(), e);
System.exit(-1);
}
}
其中
String initLog4jMessage = LogUtils.initHiveLog4j();
这句初始化了log4j,最终是在org.apache.hadoop.hive.common.LogUtils#initHiveLog4jDefault
这个方法中:
private static String initHiveLog4jDefault(
HiveConf conf, String logMessage, ConfVars confVarName)
throws LogInitializationException {
URL hive_l4j = null;
switch (confVarName) {
case HIVE_EXEC_LOG4J_FILE:
hive_l4j = LogUtils.class.getClassLoader().getResource(HIVE_EXEC_L4J);
if (hive_l4j == null) {
hive_l4j = LogUtils.class.getClassLoader().getResource(HIVE_L4J);
}
System.setProperty(HiveConf.ConfVars.HIVEQUERYID.toString(),
HiveConf.getVar(conf, HiveConf.ConfVars.HIVEQUERYID));
break;
case HIVE_LOG4J_FILE:
hive_l4j = LogUtils.class.getClassLoader().getResource(HIVE_L4J);
break;
default:
break;
}
if (hive_l4j != null) {
//是否开启异步输出日志
final boolean async = checkAndSetAsyncLogging(conf);
Configurator.initialize(null, hive_l4j.toString());
logConfigLocation(conf);
return (logMessage + "\n" + "Logging initialized using configuration in " + hive_l4j +
" Async: " + async);
} else {
throw new LogInitializationException(
logMessage + "Unable to initialize logging using "
+ LogUtils.HIVE_L4J + ", not found on CLASSPATH!");
}
}
//是否开启异步输出
final boolean async = checkAndSetAsyncLogging(conf);
其中有一行检查了是否开启异步输出日志,对应的配置是hive.async.log.enabled
,默认值是true,即默认开启了异步输出日志:
HIVE_ASYNC_LOG_ENABLED("hive.async.log.enabled", true,
“Whether to enable Log4j2’s asynchronous logging. Asynchronous logging can give\n” +
" significant performance improvement as logging will be handled in separate thread\n" +
" that uses LMAX disruptor queue for buffering log messages.\n" +
" Refer https://logging.apache.org/log4j/2.x/manual/async.html for benefits and\n" +
" drawbacks."),
于是在hive-site.xml中增加配置,将异步输出关闭:
<property>
<name>hive.async.log.enabled</name>
<value>false</value>
</property>
最后,在beeline上测试一下:
beeline
!connect jdbc:hive2://xxx:10000/default
select ds,event,count(*) as cnt from bdl_wutiao_event where ds='2018-11-18' group by ds,event;
可以正确输出执行日志了:
另外查看github上的源码发现,该配置是在2.1.0
版本以后引入的,所以如果你使用的hive版本小于这个版本的话,就不用配置这个属性了。