Solving the "Read timed out" error when DolphinScheduler connects to Hive
When using DolphinScheduler, you may occasionally hit a "Read timed out" error while connecting to Hive. This article explains how to resolve it.
Problem description
When DolphinScheduler connects to Hive, you may see the following error:
java.io.IOException: Could not retrieve the results of the query
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:232)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:298)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:370)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:501)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:488)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:314)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:509)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1373)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1358)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_ExecuteStatement(TCLIService.java:269)
at org.apache.hive.service.cli.thrift.TCLIService$Client.ExecuteStatement(TCLIService.java:256)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:227)
... 15 more
Root cause
This error usually occurs because the Hive server takes too long to process a request, so the connection between client and server times out. The server's default timeout is 60 seconds; any query that runs longer than that produces a "Read timed out" error.
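The `Caused by` section of the stack trace shows the mechanics: the Thrift client is blocked on a plain socket read, and when no bytes arrive before the socket's read deadline, the JDK throws `SocketTimeoutException: Read timed out`. A minimal sketch of that failure mode, using a stand-in server that accepts connections but never replies (this is an illustration of the JDK socket behavior, not Hive code):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    // Connects to a local server that never sends data and reports
    // what the blocking read does once the timeout expires.
    static String readWithTimeout(int timeoutMillis) throws IOException {
        try (ServerSocket silentServer = new ServerSocket(0); // never writes back
             Socket client = new Socket("localhost", silentServer.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // client-side read deadline
            try {
                client.getInputStream().read(); // blocks until data or timeout
                return "got data";
            } catch (SocketTimeoutException e) {
                return "Read timed out"; // same message as in the Hive stack trace
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readWithTimeout(500));
    }
}
```

The Hive client's situation is the same, just with a much longer deadline: the server is still working on the query, but from the socket's point of view it has simply gone silent.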
Solution
The fix is to increase the Hive server's timeout. The steps are as follows:
1. Open the hive-site.xml file in your Hive installation directory.

2. Add the following configuration to hive-site.xml:

<property>
    <name>hive.server2.long.polling.timeout</name>
    <value>600s</value>
</property>

This sets hive.server2.long.polling.timeout to 600 seconds (10 minutes). The explicit s suffix avoids unit ambiguity: some Hive versions interpret a bare number for this property as milliseconds. Adjust the value to match your workload.

3. Save and close hive-site.xml.

4. Restart the Hive service.
After the Hive service restarts, the connection timeout is extended, which resolves the "Read timed out" error.
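To confirm that the property actually landed in hive-site.xml after editing, you can parse the file with the JDK's built-in XML parser. A minimal sketch, assuming a conventional config path such as /etc/hive/conf/hive-site.xml (substitute your own installation directory):

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HiveSiteCheck {
    // Returns the value of a property in a Hadoop-style XML config file,
    // or null if the property is absent.
    static String lookup(File hiveSite, String propertyName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(hiveSite);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String name = prop.getElementsByTagName("name")
                    .item(0).getTextContent().trim();
            if (name.equals(propertyName)) {
                return prop.getElementsByTagName("value")
                        .item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lookup(new File("/etc/hive/conf/hive-site.xml"),
                "hive.server2.long.polling.timeout"));
    }
}
```

If the lookup prints null, the property was added to the wrong file (or the wrong `<configuration>` block) and the restart will silently keep the old timeout.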
Example code
Below is an example of connecting to Hive the way DolphinScheduler does, using the Hive JDBC driver with Kerberos authentication. The hostnames, principal, and keytab path are placeholders; substitute your own values:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DolphinSchedulerHiveExample {
    public static void main(String[] args) throws Exception {
        // Authenticate to Hive with Kerberos. Note: this property is a
        // string and must be set to "kerberos", not a boolean.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Placeholder principal and keytab path: replace with your own.
        UserGroupInformation.loginUserFromKeytab(
                "hive/node1@EXAMPLE.COM", "/etc/security/keytabs/hive.keytab");

        // Placeholder HiveServer2 host, port, and principal.
        String url = "jdbc:hive2://node1:10000/default;principal=hive/node1@EXAMPLE.COM";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}