Entry point:
In the bin/hive script, after the environment checks, ext/cli.sh is executed, which launches the main class CliDriver.main.

CliDriver.main:
Enters cli.processLine, which splits the input on semicolons (";") into individual statements and then passes each one to processCmd.

processCmd:
Handles quit/exit first, then source, then !, then list; otherwise it obtains a CommandProcessor (implementations include Driver and various specialized Processors). The set/dfs/add/delete commands each have a dedicated Processor; everything else goes through Driver.
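
A rough sketch of that dispatch (hedged: this mirrors the behavior of Hive's CommandProcessorFactory around the 0.8/0.9 line rather than its exact source, and the helper name getProcessorFor is made up for illustration):

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.processors.*;

public class ProcessorDispatchSketch {
  // "set"/"dfs"/"add"/"delete" get dedicated processors; any other statement becomes a Driver.
  static CommandProcessor getProcessorFor(String firstToken, HiveConf conf) {
    String token = firstToken.toLowerCase();
    if (token.equals("set")) {
      return new SetProcessor();
    } else if (token.equals("dfs")) {
      return new DfsProcessor(conf);
    } else if (token.equals("add")) {
      return new AddResourceProcessor();
    } else if (token.equals("delete")) {
      return new DeleteResourceProcessor();
    }
    return new Driver();  // plain HiveQL goes through the full compile/execute path
  }
}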

If the Processor is a Driver:
The command is passed to that driver's run method, which calls compile. Inside compile, a ParseDriver generates an ASTNode (this uses antlr; roughly: the lexer tokenizes the input, the parser consumes the tokens, and a tree comes out). A more detailed walkthrough of compile is at http://fromheartgo.wordpress.com/2010/04/02/hive%E7%9A%84compile%E8%BF%87%E7%A8%8B%EF%BC%881%EF%BC%89/ ;
Based on the resulting ASTNode, semantic analysis runs and its result is placed into a QueryPlan object; a set of Tasks is initialized and stored in the QueryPlan.

Test-only code in run reads the test.serialize.qplan setting; in test mode the query plans are serialized to a file. Authorization checks also happen here.

After compile returns, Driver.run breaks the plan into tasks and executes the MapReduce jobs, then control returns to processCmd:
If the status is normal, the MapReduce results are fetched via getResults.

The full flow is:

CliDriver.main > processLine > processCmd >> Driver.run(cmd) > compile >> BaseSemanticAnalyzer >> xxxSemanticAnalyzer (a regular select goes through SemanticAnalyzer) > analyze (sem.analyze) >> SemanticAnalyzer.analyzeInternal >> new Optimizer.optimize (column pruning and other optimizations, then Task generation) > genMapRedTasks >> back in Driver.run(cmd) >> ret = execute() >> launchTask >> TaskRunner.run > Task.executeTask > ExecDriver.execute > run MR (submitJob) >> getResults.
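
To make this chain concrete, here is a minimal, hedged sketch of pushing one statement through the Driver API directly (it assumes a working Hive configuration on the classpath; the table src and the query are placeholders, and error handling is trimmed):

import java.util.ArrayList;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.session.SessionState;

public class MiniHiveRunner {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf(SessionState.class);
    SessionState.start(new SessionState(conf));        // what CliDriver.run does via CliSessionState

    Driver driver = new Driver(conf);
    int ret = driver.run("SELECT count(1) FROM src").getResponseCode();  // compile + execute (launches MR jobs)
    if (ret != 0) {
      System.exit(ret);
    }

    ArrayList<String> res = new ArrayList<String>();
    while (driver.getResults(res)) {                    // fetch results, just like processLocalCmd does
      for (String r : res) {
        System.out.println(r);
      }
      res.clear();
    }
    driver.close();
  }
}

This is essentially what CliDriver.processLocalCmd does for a Driver-type processor, minus retries, result headers, and timing.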

In a debugger, the corresponding call stack looks like this:

HiveCLI [Java Application]

org.apache.hadoop.hive.cli.CliDriver at localhost:38723

Thread [main] (Suspended)

Driver.execute() line: 1344

Driver.runExecute() line: 1219

Driver.run(String, Map<String,Object[]>) line: 1177

Driver.run(String) line: 1159

CliDriver.processLocalCmd(String, CommandProcessor, CliSessionState) line: 258

CliDriver.processCmd(String) line: 215

CliDriver.processLine(String, boolean) line: 411

CliDriver.run(String[]) line: 679

CliDriver.main(String[]) line: 562

/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/bin/java (2012-7-6 11:36:07 AM)

The entry point is CliDriver.main, which goes into run; after some initialization and checks, run calls processLine, which in turn calls processCmd. processLocalCmd then invokes the Driver class's run method (and, internally, runExecute).

This continues until the read loop:

 while ((line = reader.readLine(curPrompt + "> ")) != null) {

which keeps prompting for and reading the next SQL statement.

1. cli/src/java: CliDriver.main is the entry point.

public static void main(String[] args) throws Exception {
  int ret = run(args);
  System.exit(ret);
}

2. The run function

public static int run(String[] args) throws Exception {

  OptionsProcessor oproc = new OptionsProcessor();

 

  // (1) Parse args into the command line; -hiveconf var=val options, used to add or
  //     override hive/hadoop configuration, are copied into System properties.
  if (!oproc.process_stage1(args)) {
    return 1;
  }
  // (2) Configure log4j by loading the settings in hive-log4j.properties.

 

  // NOTE: It is critical to do this here so that log4j is reinitialized
  // before any of the other core hive classes are loaded
  boolean logInitFailed = false;
  String logInitDetailMessage;
  try {
    logInitDetailMessage = LogUtils.initHiveLog4j();
  } catch (LogInitializationException e) {
    logInitFailed = true;
    logInitDetailMessage = e.getMessage();
  }

  // (3) Create a CliSessionState (a SessionState).
  CliSessionState ss = new CliSessionState(new HiveConf(SessionState.class));
  ss.in = System.in;
  try {
    ss.out = new PrintStream(System.out, true, "UTF-8");
    ss.err = new PrintStream(System.err, true, "UTF-8");
  } catch (UnsupportedEncodingException e) {
    return 3;
  }


  // (4) Process -S, -e, -f, -h, -i, etc., and store them in the SessionState.
  //     For -h, print the usage message and exit.
  if (!oproc.process_stage2(ss)) {
    return 2;
  }
  // (5) Unless -S (silent mode) was given, print a message showing that initialization is done.
  if (!ss.getIsSilent()) {
    if (logInitFailed) {
      System.err.println(logInitDetailMessage);
    } else {


      // (5) For example, the output looks like: 12/07/05 16:52:34 INFO SessionState: ...

 

      SessionState.getConsole().printInfo(logInitDetailMessage);
    }
  }
  // (6) Get the HiveConf and set all properties specified via the command line.
  HiveConf conf = ss.getConf();
  for (Map.Entry<Object, Object> item : ss.cmdProperties.entrySet()) {
    conf.set((String) item.getKey(), (String) item.getValue());
  }
  // (7) Start the CliSessionState ss.
  SessionState.start(ss);

  // (8) Connect to the Hive Server if a host was given.
  if (ss.getHost() != null) {
    ss.connect();
    if (ss.isRemoteMode()) {
      prompt = "[" + ss.host + ':' + ss.port + "] " + prompt;
      char[] spaces = new char[prompt.length()];
      Arrays.fill(spaces, ' ');
      prompt2 = new String(spaces);
    }
  }

 

  // (9) ShimLoader loads the HadoopShims.
  // CLI remote mode is a thin client: only load auxJars in local mode
  if (!ss.isRemoteMode() && !ShimLoader.getHadoopShims().usesJobShell()) {
    // hadoop-20 and above - we need to augment classpath using hiveconf
    // components
    // see also: code in ExecDriver.java
    ClassLoader loader = conf.getClassLoader();

 

    // (9) Set hiveJar (e.g. hive-exec-0.6.0.jar); hive-default.xml and hive-site.xml
    //     are loaded when the HiveConf is initialized.
    String auxJars = HiveConf.getVar(conf, HiveConf.ConfVars.HIVEAUXJARS);
    if (StringUtils.isNotBlank(auxJars)) {
      loader = Utilities.addToClassPath(loader, StringUtils.split(auxJars, ","));
    }
    conf.setClassLoader(loader);
    Thread.currentThread().setContextClassLoader(loader);
  }
  // (10) Create the CliDriver.
  CliDriver cli = new CliDriver();
  cli.setHiveVariables(oproc.getHiveVariables());
  // (10) Before accepting HiveQL, run the init commands stored in files: the file given
  //      with -i, or, if none was given, $HIVE_HOME/bin/.hiverc and
  //      System.getProperty("user.home")/.hiverc, if they exist.
  // Execute -i init files (always in silent mode)
  cli.processInitFiles(ss);
  // (10) With -e, run the given command and exit; with -f, run the commands in the file and exit.
  if (ss.execString != null) {
    return cli.processLine(ss.execString);
  }

  try {
    if (ss.fileName != null) {
      return cli.processFile(ss.fileName);
    }
  } catch (FileNotFoundException e) {
    System.err.println("Could not open input file for reading. (" + e.getMessage() + ")");
    return 3;
  }
  // (11) Create a ConsoleReader and read user input; a trailing ";" marks a complete command,
  //      which is executed via CliDriver.processLine, and then reading continues.
  //      The input history is kept in user.home/.hivehistory.
  ConsoleReader reader = new ConsoleReader();
  reader.setBellEnabled(false);
  // reader.setDebug(new PrintWriter(new FileWriter("writer.debug", true)));
  reader.addCompletor(getCommandCompletor());

  String line;
  final String HISTORYFILE = ".hivehistory";
  String historyFile = System.getProperty("user.home") + File.separator + HISTORYFILE;
  reader.setHistory(new History(new File(historyFile)));
  int ret = 0;

  String prefix = "";
  String curDB = getFormattedDb(conf, ss);
  String curPrompt = prompt + curDB;
  String dbSpaces = spacesForString(curDB);

  while ((line = reader.readLine(curPrompt + "> ")) != null) {
    if (!prefix.equals("")) {
      prefix += '\n';
    }
    if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) {
      line = prefix + line;
      ret = cli.processLine(line, true);
      prefix = "";
      curDB = getFormattedDb(conf, ss);
      curPrompt = prompt + curDB;
      dbSpaces = dbSpaces.length() == curDB.length() ? dbSpaces : spacesForString(curDB);
    } else {
      prefix = prefix + line;
      curPrompt = prompt2 + dbSpaces;
      continue;
    }
  }

  ss.close();

  return ret;
}

 

3. run mainly calls processLine.

     processLine in turn calls processCmd.

    CliDriver.processLine strips the trailing ";" from each command.

public int processLine(String line, boolean allowInterupting) {
  SignalHandler oldSignal = null;
  Signal interupSignal = null;
  // (1) If interruption is allowed, install a Ctrl+C handler.
  if (allowInterupting) {
    // Remember all threads that were running at the time we started line processing.
    // Hook up the custom Ctrl+C handler while processing this line
    interupSignal = new Signal("INT");
    oldSignal = Signal.handle(interupSignal, new SignalHandler() {
      private final Thread cliThread = Thread.currentThread();
      private boolean interruptRequested;

      @Override
      public void handle(Signal signal) {
        boolean initialRequest = !interruptRequested;
        interruptRequested = true;

        // Kill the VM on second ctrl+c
        if (!initialRequest) {
          console.printInfo("Exiting the JVM");
          System.exit(127);
        }

        // Interrupt the CLI thread to stop the current statement and return
        // to prompt
        console.printInfo("Interrupting... Be patient, this might take some time.");
        console.printInfo("Press Ctrl+C again to kill JVM");

        // First, kill any running MR jobs
        HadoopJobExecHelper.killRunningJobs();
        HiveInterruptUtils.interrupt();
        this.cliThread.interrupt();
      }
    });
  }

  try {
    int lastRet = 0, ret = 0;

    String command = "";


    // (2) Loop over each semicolon-terminated statement.
    for (String oneCmd : line.split(";")) {

      if (StringUtils.endsWith(oneCmd, "\\")) {
        command += StringUtils.chop(oneCmd) + ";";
        continue;
      } else {
        command += oneCmd;
      }
      if (StringUtils.isBlank(command)) {
        continue;
      }
      // (3) Execute the command.
      ret = processCmd(command);
      // (4) Wipe the CLI query state.
      SessionState ss = SessionState.get();
      ss.setCommandType(null);
      command = "";
      lastRet = ret;
      boolean ignoreErrors = HiveConf.getBoolVar(conf, HiveConf.ConfVars.CLIIGNOREERRORS);
      if (ret != 0 && !ignoreErrors) {
        CommandProcessorFactory.clean((HiveConf) conf);
        return ret;
      }
    }
    CommandProcessorFactory.clean((HiveConf) conf);
    return lastRet;
  } finally {
    // Once we are done processing the line, restore the old handler
    if (oldSignal != null && interupSignal != null) {
      Signal.handle(interupSignal, oldSignal);
    }
  }
}

 

4. processCmd

CliDriver.processCmd 

Splits the command and looks at the first token:
(1) quit or exit: leave the CLI.
(2) source: execute the HiveQL in the given file.
(3) !: run a shell command, e.g. !ls lists the files in the current directory.
(4) list: list the registered jar/file/archive resources.
(5) anything else: obtain the appropriate CommandProcessor and let it handle the command.

 

public int processCmd(String cmd) {
    CliSessionState ss = (CliSessionState) SessionState.get();
    String cmd_trimmed = cmd.trim();
    String[] tokens = tokenizeCmd(cmd_trimmed);
    int ret = 0;
    // (1) quit or exit: leave the CLI.
    if (cmd_trimmed.toLowerCase().equals("quit") || cmd_trimmed.toLowerCase().equals("exit")) {

      // if we have come this far - either the previous commands
      // are all successful or this is command line. in either case
      // this counts as a successful run
      ss.close();
      System.exit(0);
    // (2) source: execute the HiveQL in the given file.
    } else if (tokens[0].equalsIgnoreCase("source")) {
      String cmd_1 = getFirstCmd(cmd_trimmed, tokens[0].length());

      File sourceFile = new File(cmd_1);
      if (! sourceFile.isFile()){
        console.printError("File: "+ cmd_1 + " is not a file.");
        ret = 1;
      } else {
        try {
          this.processFile(cmd_1);
        } catch (IOException e) {
          console.printError("Failed processing file "+ cmd_1 +" "+ e.getLocalizedMessage(),
              org.apache.hadoop.util.StringUtils.stringifyException(e));
          ret = 1;
        }
      }
    // (3) !: run a shell command, e.g. !ls to list the current directory.
    } else if (cmd_trimmed.startsWith("!")) {

      String shell_cmd = cmd_trimmed.substring(1);
      shell_cmd = new VariableSubstitution().substitute(ss.getConf(), shell_cmd);

      // shell_cmd = "/bin/bash -c \'" + shell_cmd + "\'";
      try {
        Process executor = Runtime.getRuntime().exec(shell_cmd);
        StreamPrinter outPrinter = new StreamPrinter(executor.getInputStream(), null, ss.out);
        StreamPrinter errPrinter = new StreamPrinter(executor.getErrorStream(), null, ss.err);

        outPrinter.start();
        errPrinter.start();

        ret = executor.waitFor();
        if (ret != 0) {
          console.printError("Command failed with exit code = " + ret);
        }
      } catch (Exception e) {
        console.printError("Exception raised from Shell command " + e.getLocalizedMessage(),
            org.apache.hadoop.util.StringUtils.stringifyException(e));
        ret = 1;
      }
    // (4) list: list jar/file/archive resources.
    } else if (tokens[0].toLowerCase().equals("list")) {

      SessionState.ResourceType t;
      if (tokens.length < 2 || (t = SessionState.find_resource_type(tokens[1])) == null) {
        console.printError("Usage: list ["
            + StringUtils.join(SessionState.ResourceType.values(), "|") + "] [<value> [<value>]*]");
        ret = 1;
      } else {
        List<String> filter = null;
        if (tokens.length >= 3) {
          System.arraycopy(tokens, 2, tokens, 0, tokens.length - 2);
          filter = Arrays.asList(tokens);
        }
        Set<String> s = ss.list_resource(t, filter);
        if (s != null && !s.isEmpty()) {
          ss.out.println(StringUtils.join(s, "\n"));
        }
      }
    // (5) Everything else goes to the appropriate CommandProcessor.
    // Remote mode: send the command to the remote Hive Server.
    } else if (ss.isRemoteMode()) { // remote mode -- connecting to remote hive server
      HiveClient client = ss.getClient();
      PrintStream out = ss.out;
      PrintStream err = ss.err;

      try {
        client.execute(cmd_trimmed);
        List<String> results;
        do {
          results = client.fetchN(LINES_TO_FETCH);
          for (String line : results) {
            out.println(line);
          }
        } while (results.size() == LINES_TO_FETCH);
      } catch (HiveServerException e) {
        ret = e.getErrorCode();
        if (ret != 0) { // OK if ret == 0 -- reached the EOF
          String errMsg = e.getMessage();
          if (errMsg == null) {
            errMsg = e.toString();
          }
          ret = e.getErrorCode();
          err.println("[Hive Error]: " + errMsg);
        }
      } catch (TException e) {
        String errMsg = e.getMessage();
        if (errMsg == null) {
          errMsg = e.toString();
        }
        ret = -10002;
        err.println("[Thrift Error]: " + errMsg);
      } finally {
        try {
          client.clean();
        } catch (TException e) {
          String errMsg = e.getMessage();
          if (errMsg == null) {
            errMsg = e.toString();
          }
          err.println("[Thrift Error]: Hive server is not cleaned due to thrift exception: "
              + errMsg);
        }
      }
    // Local mode: build the CommandProcessor and run the command locally.
    } else { // local mode
      CommandProcessor proc = CommandProcessorFactory.get(tokens[0], (HiveConf) conf);
      ret = processLocalCmd(cmd, proc, ss);
    }

    return ret;
  }

 


5. processLocalCmd

int processLocalCmd(String cmd, CommandProcessor proc, CliSessionState ss) {
    int tryCount = 0;
    boolean needRetry;
    int ret = 0;

    do {
      try {
        needRetry = false;
        if (proc != null) {
          if (proc instanceof Driver) {
            Driver qp = (Driver) proc;
            PrintStream out = ss.out;
            long start = System.currentTimeMillis();
            if (ss.getIsVerbose()) {
              out.println(cmd);
            }

            qp.setTryCount(tryCount);
            ret = qp.run(cmd).getResponseCode();
            if (ret != 0) {
              qp.close();
              return ret;
            }

            ArrayList<String> res = new ArrayList<String>();

            printHeader(qp, out);

            try {
              while (qp.getResults(res)) {
                for (String r : res) {
                  out.println(r);
                }
                res.clear();
                if (out.checkError()) {
                  break;
                }
              }
            } catch (IOException e) {
              console.printError("Failed with exception " + e.getClass().getName() + ":"
                  + e.getMessage(), "\n"
                  + org.apache.hadoop.util.StringUtils.stringifyException(e));
              ret = 1;
            }

            int cret = qp.close();
            if (ret == 0) {
              ret = cret;
            }

            long end = System.currentTimeMillis();
            if (end > start) {
              double timeTaken = (end - start) / 1000.0;
              console.printInfo("Time taken: " + timeTaken + " seconds", null);
            }

          } else {
            String firstToken = tokenizeCmd(cmd.trim())[0];
            String cmd_1 = getFirstCmd(cmd.trim(), firstToken.length());

            if (ss.getIsVerbose()) {
              ss.out.println(firstToken + " " + cmd_1);
            }
            ret = proc.run(cmd_1).getResponseCode();
          }
        }
      } catch (CommandNeedRetryException e) {
        console.printInfo("Retry query with a different approach...");
        tryCount++;
        needRetry = true;
      }
    } while (needRetry);

    return ret;
}

6. The Driver class's run method

Driver
Driver.run(String command) // process one command

  int ret = compile(command);  // analyze the command and generate Tasks
  ret = execute();             // run the Tasks


Driver.compile 
 
Driver.compile(String command) // compile one command

(1) Parser(antlr):HiveQL->AbstractSyntaxTree(AST) 
      ParseDriver pd = new ParseDriver(); 
      ASTNode tree = pd.parse(command, ctx); 
(2) SemanticAnalyzer 
      BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(conf, tree); 
      // Do semantic analysis and plan generation 
      sem.analyze(tree, ctx); 
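
After sem.analyze, compile wraps the analyzer's output into a QueryPlan, and execute then launches the plan's root tasks (launchTask > TaskRunner.run > Task.executeTask). A small, hedged sketch that stops at the plan (it relies on Driver.compile and Driver.getPlan being publicly callable, which holds for the 0.8/0.9-era sources this walkthrough follows; src is a placeholder table):

import java.io.Serializable;

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.QueryPlan;
import org.apache.hadoop.hive.ql.exec.Task;
import org.apache.hadoop.hive.ql.session.SessionState;

public class CompileOnly {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf(SessionState.class);
    SessionState.start(new SessionState(conf));

    Driver driver = new Driver(conf);
    int ret = driver.compile("SELECT count(1) FROM src");   // parse + semantic analysis + optimization
    if (ret != 0) {
      System.exit(ret);
    }

    QueryPlan plan = driver.getPlan();                       // the QueryPlan built at the end of compile()
    for (Task<? extends Serializable> root : plan.getRootTasks()) {
      System.out.println("root task: " + root.getClass().getSimpleName());  // e.g. a MapRedTask
    }
  }
}

Calling driver.run() instead of driver.compile() would also execute these tasks, which is the path the CLI takes.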

 

7. Where the plan is generated

Tracing into Driver.java around line 663 reveals the plan path, e.g.:

/tmp/hive-gexing111/hive_2012-07-09_10-37-27_511_5073252372102100766/test2.py

For the plan generated by the system itself:

/tmp/hive-gexing111/hive_2012-07-09_12-45-55_479_6444298560478274273/-local-10002/plan.xml

8. Debugging show tables "ge*";

For the modified build, the code to step into is getTables in
hive/metastore/src/java/com.aliyun.apsara.odps.metastore.ots/OTSObjectStore.java.

9. Configuration file location

hive/build/dist/conf/hive-site.xml

To switch to the modified (ODPS) version, set:

<property>
  <name>com.aliyun.odps.mode</name>
  <value>true</value>
</property>

<property>
  <name>native.hive.mode</name>
  <value>false</value>
</property>
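
A hedged illustration of how code would consult these switches once they are in hive-site.xml (how the modified build actually reads them is an assumption; the snippet only shows the standard HiveConf/Configuration lookup):

import org.apache.hadoop.hive.conf.HiveConf;

public class CheckMode {
  public static void main(String[] args) {
    // HiveConf picks up hive-default.xml and hive-site.xml from the conf directory on the classpath.
    HiveConf conf = new HiveConf(CheckMode.class);
    // HiveConf extends Hadoop's Configuration, so getBoolean is available for custom properties.
    boolean odpsMode = conf.getBoolean("com.aliyun.odps.mode", false);
    boolean nativeHive = conf.getBoolean("native.hive.mode", true);
    System.out.println("com.aliyun.odps.mode = " + odpsMode + ", native.hive.mode = " + nativeHive);
  }
}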