Entry point:
The bin/hive script, after its environment checks, runs ext/cli.sh, which enters the main class: CliDriver.main.
CliDriver.main:
Calls cli.processLine, which splits the input on semicolons (";") into individual statements and hands each one to processCmd.
processCmd:
Handles quit/exit, then source, then "!", then list; otherwise it obtains a CommandProcessor (implementations include Driver and various specialized processors). The set/dfs/add/delete commands each have a dedicated processor; everything else goes through Driver.
If the processor is a Driver:
The command is passed to the driver's run method, which enters compile. In compile, a ParseDriver produces an ASTNode (parsing uses antlr; roughly: the lexer tokenizes the input, the parser consumes the tokens, and a tree comes out). A detailed walkthrough of the compile process is at http://fromheartgo.wordpress.com/2010/04/02/hive%E7%9A%84compile%E8%BF%87%E7%A8%8B%EF%BC%881%EF%BC%89/ ;
From the resulting ASTNode, semantic analysis runs and its output is placed into a QueryPlan object, along with the initial tasks;
Test-only code in run reads the test.serialize.qplan setting and, in test mode, writes these query plans out to a file; then authorization checks run.
After compile returns, Driver.run breaks the plan down and executes the MapReduce jobs, and control returns to processCmd:
If the status is normal, the MapReduce results are fetched via getResults.
The full call chain:
CliDriver.main > processLine > processCmd >> Driver.run(cmd) > compile >> BaseSemanticAnalyzer >> xxxSemanticAnalyzer (a regular SELECT goes through SemanticAnalyzer) > analyze (sem.analyze) >> SemanticAnalyzer.analyzeInternal >> new Optimizer.optimize (column pruning and other optimizations, then Task generation) > genMapRedTasks >> back in Driver.run(cmd) >> ret = execute() >> launchTask >> TaskRunner.run > Task.executeTask > ExecDriver.execute > run the MR job (submitJob) >> getResults.
That is, in a debugger the stack looks like:
HiveCLI [Java Application]
org.apache.hadoop.hive.cli.CliDriver at localhost:38723
Thread [main] (Suspended)
Driver.execute() line: 1344
Driver.runExecute() line: 1219
Driver.run(String, Map<String,Object[]>) line: 1177
Driver.run(String) line: 1159
CliDriver.processLocalCmd(String, CommandProcessor, CliSessionState) line: 258
CliDriver.processCmd(String) line: 215
CliDriver.processLine(String, boolean) line: 411
CliDriver.run(String[]) line: 679
CliDriver.main(String[]) line: 562
/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/bin/java (2012-7-6 11:36:07 AM)
The entry point is CliDriver.main, which calls run; after some initialization and checks, run calls processLine, which in turn calls processCmd. processLocalCmd then calls Driver.run and Driver.runExecute.
This repeats until:
while ((line = reader.readLine(curPrompt + "> ")) != null) {
stops returning input, i.e. the CLI keeps prompting for and reading the next SQL statement.
1. cli/src/java: CliDriver.main is the entry point.
public static void main(String[] args) throws Exception {
  int ret = run(args);
  System.exit(ret);
}
2. The run function:
public static int run(String[] args) throws Exception {

  OptionsProcessor oproc = new OptionsProcessor();
  //(1) Parse args into cmdLine; -hiveconf var=val adds or overrides
  // hive/hadoop configuration and is set as a System property.
  if (!oproc.process_stage1(args)) {
    return 1;
  }

  //(2) Configure log4j from hive-log4j.properties.
  // NOTE: It is critical to do this here so that log4j is reinitialized
  // before any of the other core hive classes are loaded
  boolean logInitFailed = false;
  String logInitDetailMessage;
  try {
    logInitDetailMessage = LogUtils.initHiveLog4j();
  } catch (LogInitializationException e) {
    logInitFailed = true;
    logInitDetailMessage = e.getMessage();
  }

  //(3) Create a CliSessionState (a SessionState).
  CliSessionState ss = new CliSessionState(new HiveConf(SessionState.class));
  ss.in = System.in;
  try {
    ss.out = new PrintStream(System.out, true, "UTF-8");
    ss.err = new PrintStream(System.err, true, "UTF-8");
  } catch (UnsupportedEncodingException e) {
    return 3;
  }

  //(4) Handle -S, -e, -f, -h, -i etc. and record them in the SessionState.
  // For -h, print usage and exit.
  if (!oproc.process_stage2(ss)) {
    return 2;
  }

  //(5) Unless -S (silent mode) was given, print a message showing that
  // initialization finished, e.g.: 12/07/05 16:52:34 INFO SessionState:
  if (!ss.getIsSilent()) {
    if (logInitFailed) {
      System.err.println(logInitDetailMessage);
    } else {
      SessionState.getConsole().printInfo(logInitDetailMessage);
    }
  }

  //(6) Get the HiveConf and set all properties specified via command line.
  HiveConf conf = ss.getConf();
  for (Map.Entry<Object, Object> item : ss.cmdProperties.entrySet()) {
    conf.set((String) item.getKey(), (String) item.getValue());
  }

  //(7) Start the CliSessionState ss.
  SessionState.start(ss);

  //(8) Connect to the Hive Server if a host was given.
  if (ss.getHost() != null) {
    ss.connect();
    if (ss.isRemoteMode()) {
      prompt = "[" + ss.host + ':' + ss.port + "] " + prompt;
      char[] spaces = new char[prompt.length()];
      Arrays.fill(spaces, ' ');
      prompt2 = new String(spaces);
    }
  }

  //(9) Use ShimLoader to load the HadoopShims; set hiveJar (hive-exec-0.6.0.jar)
  // and load hive-default.xml and hive-site.xml.
  // CLI remote mode is a thin client: only load auxJars in local mode
  if (!ss.isRemoteMode() && !ShimLoader.getHadoopShims().usesJobShell()) {
    // hadoop-20 and above - we need to augment classpath using hiveconf
    // components
    // see also: code in ExecDriver.java
    ClassLoader loader = conf.getClassLoader();
    String auxJars = HiveConf.getVar(conf, HiveConf.ConfVars.HIVEAUXJARS);
    if (StringUtils.isNotBlank(auxJars)) {
      loader = Utilities.addToClassPath(loader, StringUtils.split(auxJars, ","));
    }
    conf.setClassLoader(loader);
    Thread.currentThread().setContextClassLoader(loader);
  }

  //(10) Create the CliDriver. Before accepting HiveQL, execute the init files:
  // -i names them explicitly; otherwise $HIVE_HOME/bin/.hiverc and
  // ${user.home}/.hiverc are used if present. Then, for -e run the given
  // command and exit; for -f run the commands in the given file and exit.
  CliDriver cli = new CliDriver();
  cli.setHiveVariables(oproc.getHiveVariables());

  // Execute -i init files (always in silent mode)
  cli.processInitFiles(ss);

  if (ss.execString != null) {
    return cli.processLine(ss.execString);
  }

  try {
    if (ss.fileName != null) {
      return cli.processFile(ss.fileName);
    }
  } catch (FileNotFoundException e) {
    System.err.println("Could not open input file for reading. (" + e.getMessage() + ")");
    return 3;
  }

  //(11) Create a ConsoleReader and read user input; a trailing ";" completes a
  // command, which is then executed (CliDriver.processLine). Input history is
  // kept in ${user.home}/.hivehistory.
  ConsoleReader reader = new ConsoleReader();
  reader.setBellEnabled(false);
  // reader.setDebug(new PrintWriter(new FileWriter("writer.debug", true)));
  reader.addCompletor(getCommandCompletor());

  String line;
  final String HISTORYFILE = ".hivehistory";
  String historyFile = System.getProperty("user.home") + File.separator + HISTORYFILE;
  reader.setHistory(new History(new File(historyFile)));
  int ret = 0;

  String prefix = "";
  String curDB = getFormattedDb(conf, ss);
  String curPrompt = prompt + curDB;
  String dbSpaces = spacesForString(curDB);

  while ((line = reader.readLine(curPrompt + "> ")) != null) {
    if (!prefix.equals("")) {
      prefix += '\n';
    }
    if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) {
      line = prefix + line;
      ret = cli.processLine(line, true);
      prefix = "";
      curDB = getFormattedDb(conf, ss);
      curPrompt = prompt + curDB;
      dbSpaces = dbSpaces.length() == curDB.length() ? dbSpaces : spacesForString(curDB);
    } else {
      prefix = prefix + line;
      curPrompt = prompt2 + dbSpaces;
      continue;
    }
  }

  ss.close();

  return ret;
}
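The interactive loop above is plain JLine. Here is a minimal sketch of the same ConsoleReader pattern, assuming jline 0.9.x (the version bundled with Hive CLIs of this era); the MiniRepl class and the history file name are mine, for illustration only:

import java.io.File;
import jline.ConsoleReader;
import jline.History;

public class MiniRepl {
  public static void main(String[] args) throws Exception {
    ConsoleReader reader = new ConsoleReader();
    reader.setBellEnabled(false);
    // Persist input lines across sessions, like ~/.hivehistory.
    reader.setHistory(new History(new File(
        System.getProperty("user.home"), ".minirepl_history")));

    String line;
    // readLine returns null on EOF (Ctrl+D), ending the loop.
    while ((line = reader.readLine("mini> ")) != null) {
      System.out.println("got: " + line);
    }
  }
}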
3. run mainly delegates to processLine.
processLine in turn calls processCmd.
CliDriver.processLine strips the trailing ";" from each command (a standalone sketch of the splitting logic follows the listing below):
public int processLine(String line, boolean allowInterupting) {
  SignalHandler oldSignal = null;
  Signal interupSignal = null;

  //(1) Optionally allow interruption with Ctrl+C.
  if (allowInterupting) {
    // Remember all threads that were running at the time we started line processing.
    // Hook up the custom Ctrl+C handler while processing this line
    interupSignal = new Signal("INT");
    oldSignal = Signal.handle(interupSignal, new SignalHandler() {
      private final Thread cliThread = Thread.currentThread();
      private boolean interruptRequested;

      @Override
      public void handle(Signal signal) {
        boolean initialRequest = !interruptRequested;
        interruptRequested = true;

        // Kill the VM on second ctrl+c
        if (!initialRequest) {
          console.printInfo("Exiting the JVM");
          System.exit(127);
        }

        // Interrupt the CLI thread to stop the current statement and return
        // to prompt
        console.printInfo("Interrupting... Be patient, this might take some time.");
        console.printInfo("Press Ctrl+C again to kill JVM");

        // First, kill any running MR jobs
        HadoopJobExecHelper.killRunningJobs();
        HiveInterruptUtils.interrupt();
        this.cliThread.interrupt();
      }
    });
  }

  try {
    int lastRet = 0, ret = 0;

    String command = "";
    //(2) Loop over the semicolon-terminated statements.
    for (String oneCmd : line.split(";")) {

      if (StringUtils.endsWith(oneCmd, "\\")) {
        command += StringUtils.chop(oneCmd) + ";";
        continue;
      } else {
        command += oneCmd;
      }
      if (StringUtils.isBlank(command)) {
        continue;
      }

      //(3) Execute the command.
      ret = processCmd(command);
      //(4) Wipe the cli query state.
      SessionState ss = SessionState.get();
      ss.setCommandType(null);
      command = "";
      lastRet = ret;
      boolean ignoreErrors = HiveConf.getBoolVar(conf, HiveConf.ConfVars.CLIIGNOREERRORS);
      if (ret != 0 && !ignoreErrors) {
        CommandProcessorFactory.clean((HiveConf) conf);
        return ret;
      }
    }
    CommandProcessorFactory.clean((HiveConf) conf);
    return lastRet;
  } finally {
    // Once we are done processing the line, restore the old handler
    if (oldSignal != null && interupSignal != null) {
      Signal.handle(interupSignal, oldSignal);
    }
  }
}
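As referenced above, here is a small standalone sketch of just that splitting logic (the SplitDemo class and its test input are mine; only the loop body is copied from processLine). It also shows the consequence of the naive split(";"): a literal semicolon must be escaped as "\;", because quoted strings are not understood at this level:

import org.apache.commons.lang.StringUtils;

public class SplitDemo {
  // Mimics the loop in CliDriver.processLine: a chunk ending in "\\"
  // continues the current statement; a bare ";" terminates it.
  static void process(String line) {
    String command = "";
    for (String oneCmd : line.split(";")) {
      if (StringUtils.endsWith(oneCmd, "\\")) {
        command += StringUtils.chop(oneCmd) + ";";
        continue;
      } else {
        command += oneCmd;
      }
      if (StringUtils.isBlank(command)) {
        continue;
      }
      System.out.println("statement: [" + command + "]");
      command = "";
    }
  }

  public static void main(String[] args) {
    // Prints two statements; the escaped semicolon survives in the first.
    process("select 'a\\;b' from t; show tables;");
  }
}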
4. processCmd
CliDriver.processCmd
Tokenizes the command and inspects the first word:
(1) quit or exit: leave the CLI.
(2) source: execute the HiveQL in the given file.
(3) !: run a shell command, e.g. !ls lists the files in the current directory.
(4) list: list the registered jar/file/archive resources.
(5) Anything else: obtain the matching CommandProcessor and hand the command to it (a dispatch sketch follows the listing below).
public int processCmd(String cmd) {
  CliSessionState ss = (CliSessionState) SessionState.get();
  String cmd_trimmed = cmd.trim();
  String[] tokens = tokenizeCmd(cmd_trimmed);
  int ret = 0;

  //(1) quit or exit: leave the CLI.
  if (cmd_trimmed.toLowerCase().equals("quit") || cmd_trimmed.toLowerCase().equals("exit")) {

    // if we have come this far - either the previous commands
    // are all successful or this is command line. in either case
    // this counts as a successful run
    ss.close();
    System.exit(0);

  //(2) source: execute the HiveQL in the given file.
  } else if (tokens[0].equalsIgnoreCase("source")) {
    String cmd_1 = getFirstCmd(cmd_trimmed, tokens[0].length());

    File sourceFile = new File(cmd_1);
    if (!sourceFile.isFile()) {
      console.printError("File: " + cmd_1 + " is not a file.");
      ret = 1;
    } else {
      try {
        this.processFile(cmd_1);
      } catch (IOException e) {
        console.printError("Failed processing file " + cmd_1 + " " + e.getLocalizedMessage(),
            org.apache.hadoop.util.StringUtils.stringifyException(e));
        ret = 1;
      }
    }

  //(3) !: run a shell command, e.g. !ls lists the current directory.
  } else if (cmd_trimmed.startsWith("!")) {

    String shell_cmd = cmd_trimmed.substring(1);
    shell_cmd = new VariableSubstitution().substitute(ss.getConf(), shell_cmd);

    // shell_cmd = "/bin/bash -c \'" + shell_cmd + "\'";
    try {
      Process executor = Runtime.getRuntime().exec(shell_cmd);
      StreamPrinter outPrinter = new StreamPrinter(executor.getInputStream(), null, ss.out);
      StreamPrinter errPrinter = new StreamPrinter(executor.getErrorStream(), null, ss.err);

      outPrinter.start();
      errPrinter.start();

      ret = executor.waitFor();
      if (ret != 0) {
        console.printError("Command failed with exit code = " + ret);
      }
    } catch (Exception e) {
      console.printError("Exception raised from Shell command " + e.getLocalizedMessage(),
          org.apache.hadoop.util.StringUtils.stringifyException(e));
      ret = 1;
    }

  //(4) list: list the registered jar/file/archive resources.
  } else if (tokens[0].toLowerCase().equals("list")) {

    SessionState.ResourceType t;
    if (tokens.length < 2 || (t = SessionState.find_resource_type(tokens[1])) == null) {
      console.printError("Usage: list ["
          + StringUtils.join(SessionState.ResourceType.values(), "|") + "] [<value> [<value>]*]");
      ret = 1;
    } else {
      List<String> filter = null;
      if (tokens.length >= 3) {
        System.arraycopy(tokens, 2, tokens, 0, tokens.length - 2);
        filter = Arrays.asList(tokens);
      }
      Set<String> s = ss.list_resource(t, filter);
      if (s != null && !s.isEmpty()) {
        ss.out.println(StringUtils.join(s, "\n"));
      }
    }

  //(5) Anything else goes to the matching CommandProcessor.
  // Remote mode: send the command to the Hive Server.
  } else if (ss.isRemoteMode()) { // remote mode -- connecting to remote hive server
    HiveClient client = ss.getClient();
    PrintStream out = ss.out;
    PrintStream err = ss.err;

    try {
      client.execute(cmd_trimmed);
      List<String> results;
      do {
        results = client.fetchN(LINES_TO_FETCH);
        for (String line : results) {
          out.println(line);
        }
      } while (results.size() == LINES_TO_FETCH);
    } catch (HiveServerException e) {
      ret = e.getErrorCode();
      if (ret != 0) { // OK if ret == 0 -- reached the EOF
        String errMsg = e.getMessage();
        if (errMsg == null) {
          errMsg = e.toString();
        }
        ret = e.getErrorCode();
        err.println("[Hive Error]: " + errMsg);
      }
    } catch (TException e) {
      String errMsg = e.getMessage();
      if (errMsg == null) {
        errMsg = e.toString();
      }
      ret = -10002;
      err.println("[Thrift Error]: " + errMsg);
    } finally {
      try {
        client.clean();
      } catch (TException e) {
        String errMsg = e.getMessage();
        if (errMsg == null) {
          errMsg = e.toString();
        }
        err.println("[Thrift Error]: Hive server is not cleaned due to thrift exception: "
            + errMsg);
      }
    }

  // Local mode: dispatch via CommandProcessorFactory.
  } else { // local mode
    CommandProcessor proc = CommandProcessorFactory.get(tokens[0], (HiveConf) conf);
    ret = processLocalCmd(cmd, proc, ss);
  }

  return ret;
}
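To make branch (5) concrete, here is a minimal sketch of the dispatch, assuming the Hive 0.8/0.9-era CommandProcessorFactory (the DispatchDemo class is mine; the processor class names are from that codebase): set/dfs/add/delete map to SetProcessor, DfsProcessor, AddResourceProcessor and DeleteResourceProcessor, while any other non-blank command gets a Driver.

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.processors.CommandProcessor;
import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory;

public class DispatchDemo {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();
    // Non-SQL commands get lightweight processors...
    for (String cmd : new String[] {"set", "dfs", "add", "delete"}) {
      CommandProcessor proc = CommandProcessorFactory.get(cmd, conf);
      System.out.println(cmd + " -> " + proc.getClass().getSimpleName());
    }
    // ...while a SQL keyword falls through to the Driver.
    System.out.println("select -> "
        + CommandProcessorFactory.get("select", conf).getClass().getSimpleName());
  }
}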
5. processLocalCmd
int processLocalCmd(String cmd, CommandProcessor proc, CliSessionState ss) {
  int tryCount = 0;
  boolean needRetry;
  int ret = 0;

  do {
    try {
      needRetry = false;
      if (proc != null) {
        if (proc instanceof Driver) {
          Driver qp = (Driver) proc;
          PrintStream out = ss.out;
          long start = System.currentTimeMillis();
          if (ss.getIsVerbose()) {
            out.println(cmd);
          }

          qp.setTryCount(tryCount);
          ret = qp.run(cmd).getResponseCode();
          if (ret != 0) {
            qp.close();
            return ret;
          }

          ArrayList<String> res = new ArrayList<String>();

          printHeader(qp, out);

          try {
            while (qp.getResults(res)) {
              for (String r : res) {
                out.println(r);
              }
              res.clear();
              if (out.checkError()) {
                break;
              }
            }
          } catch (IOException e) {
            console.printError("Failed with exception " + e.getClass().getName() + ":"
                + e.getMessage(), "\n"
                + org.apache.hadoop.util.StringUtils.stringifyException(e));
            ret = 1;
          }

          int cret = qp.close();
          if (ret == 0) {
            ret = cret;
          }

          long end = System.currentTimeMillis();
          if (end > start) {
            double timeTaken = (end - start) / 1000.0;
            console.printInfo("Time taken: " + timeTaken + " seconds", null);
          }

        } else {
          String firstToken = tokenizeCmd(cmd.trim())[0];
          String cmd_1 = getFirstCmd(cmd.trim(), firstToken.length());

          if (ss.getIsVerbose()) {
            ss.out.println(firstToken + " " + cmd_1);
          }
          ret = proc.run(cmd_1).getResponseCode();
        }
      }
    } catch (CommandNeedRetryException e) {
      console.printInfo("Retry query with a different approach...");
      tryCount++;
      needRetry = true;
    }
  } while (needRetry);

  return ret;
}
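The Driver branch above is effectively how you run a query with Hive embedded. A minimal sketch under those assumptions (the EmbeddedQuery class is mine; it presumes a hive-exec of this era on the classpath and a usable warehouse/metastore configuration; error handling is omitted):

import java.util.ArrayList;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.session.SessionState;

public class EmbeddedQuery {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf(SessionState.class);
    SessionState.start(new SessionState(conf));

    Driver qp = new Driver(conf);
    // run() compiles (parse -> analyze -> optimize -> tasks) and executes.
    int ret = qp.run("show tables").getResponseCode();
    if (ret != 0) {
      System.exit(ret);
    }

    // Fetch results batch by batch, exactly as processLocalCmd does.
    ArrayList<String> res = new ArrayList<String>();
    while (qp.getResults(res)) {
      for (String r : res) {
        System.out.println(r);
      }
      res.clear();
    }
    qp.close();
  }
}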
6. The Driver class's run method.
Driver
Driver.run(String command) // process one command
{
  int ret = compile(command); // analyze the command and generate Tasks
  ret = execute();            // run the Tasks
}
Driver.compile
Driver.compile(String command) // process one command
{
  // (1) Parser (antlr): HiveQL -> AbstractSyntaxTree (AST)
  ParseDriver pd = new ParseDriver();
  ASTNode tree = pd.parse(command, ctx);
  // (2) Semantic analysis and plan generation
  BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(conf, tree);
  sem.analyze(tree, ctx);
}
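The parse step can be exercised on its own, which is handy for inspecting the AST that the semantic analyzer receives. A minimal sketch, assuming hive-exec on the classpath (the AstDump class is mine; ParseDriver also offers a one-argument parse(String) overload that passes a null Context):

import org.apache.hadoop.hive.ql.parse.ASTNode;
import org.apache.hadoop.hive.ql.parse.ParseDriver;

public class AstDump {
  public static void main(String[] args) throws Exception {
    ParseDriver pd = new ParseDriver();
    ASTNode tree = pd.parse("select a, b from t where c = 1");
    // toStringTree() prints the LISP-style AST, e.g. (TOK_QUERY (TOK_FROM ...) ...)
    System.out.println(tree.toStringTree());
  }
}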
7. Where the plan is generated
Tracing into Driver.java (around line 663) shows the plan's path, e.g.:
/tmp/hive-gexing111/hive_2012-07-09_10-37-27_511_5073252372102100766/test2.py
For a plan generated by the system itself:
/tmp/hive-gexing111/hive_2012-07-09_12-45-55_479_6444298560478274273/-local-10002/plan.xml
8. Debugging show tables "ge*";
Handled by getTables in
hive/metastore/src/java/com.aliyun.apsara.odps.metastore.ots/OTSObjectStore.java.
9. Configuration file location
hive/build/dist/conf/hive-site.xml
Set these properties to use the modified build:
<property>
  <name>com.aliyun.odps.mode</name>
  <value>true</value>
</property>
<property>
  <name>native.hive.mode</name>
  <value>false</value>
</property>