How Hadoop's Tool and ToolRunner Work

wbj0110, 2023-07-24 18:03:32

Start with the Configurable interface:

public interface Configurable {
    void setConf(Configuration conf);
    Configuration getConf();
}

The Configurable interface declares only two methods: setConf and getConf.
The Configured class implements Configurable:

public class Configured implements Configurable {
    private Configuration conf;

    public Configured() {
        this(null);
    }

    public Configured(Configuration conf) {
        setConf(conf);
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    public Configuration getConf() {
        return conf;
    }
}

The Tool interface extends Configurable and declares a single method, run(). (This is an interface extending another interface.)

public interface Tool extends Configurable {
    int run(String[] args) throws Exception;
}

The inheritance relationships are: Configured implements Configurable, and Tool extends Configurable.
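To make these relationships concrete, here is a minimal, self-contained sketch. It does not use Hadoop itself: the nested Configurable, Configured and Tool types are simplified stand-ins that copy the shapes shown above, java.util.Properties stands in for Configuration, and GreetTool and the greet.name key are hypothetical names invented for illustration. It suggests why a tool typically extends Configured and implements Tool: the base class supplies setConf/getConf, so only run() is left to write.

```java
import java.util.Properties;

public class ConfiguredSketch {
    /** Stand-in for Hadoop's Configurable (Properties replaces Configuration). */
    public interface Configurable {
        void setConf(Properties conf);
        Properties getConf();
    }

    /** Stand-in for Hadoop's Configured: just stores the configuration. */
    public static class Configured implements Configurable {
        private Properties conf;
        public void setConf(Properties conf) { this.conf = conf; }
        public Properties getConf() { return conf; }
    }

    /** Stand-in for Hadoop's Tool: Configurable plus run(). */
    public interface Tool extends Configurable {
        int run(String[] args) throws Exception;
    }

    /** Hypothetical tool: only run() is written here; setConf/getConf are inherited. */
    public static class GreetTool extends Configured implements Tool {
        public int run(String[] args) {
            // Read a setting from whatever configuration was injected via setConf.
            String name = getConf().getProperty("greet.name", "world");
            System.out.println("hello, " + name);
            return 0;
        }
    }

    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.setProperty("greet.name", "hadoop");
        Tool tool = new GreetTool();
        tool.setConf(conf);              // the caller injects the configuration
        int exitCode = tool.run(args);   // the tool reads it back through getConf()
        System.out.println("exit code: " + exitCode);
    }
}
```

Because GreetTool never touches the configuration field directly, the same object can be handed any configuration by whoever launches it.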

Now look at part of the ToolRunner class:

public class ToolRunner {
    public static int run(Configuration conf, Tool tool, String[] args)
            throws Exception {
        if (conf == null) {
            conf = new Configuration();
        }

        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        // set the configuration back, so that Tool can configure itself
        tool.setConf(conf);
        // get the args w/o generic hadoop args
        String[] toolArgs = parser.getRemainingArgs();
        return tool.run(toolArgs);
    }
}

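As the static run() method shows, ToolRunner hands the conf and the command-line args it receives to GenericOptionsParser, which processes Hadoop's generic command-line options and applies them to conf. ToolRunner then sets that conf on the tool and passes the remaining, job-specific arguments (toolArgs = parser.getRemainingArgs();) to the tool, which executes its own run method.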
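The order of steps described above can be sketched without Hadoop on the classpath. Everything in this block is an assumption made for illustration: MiniTool stands in for Tool, a Map<String, String> stands in for Configuration, and parseGenericOptions is a deliberately tiny substitute for GenericOptionsParser that recognizes only the separated "-D key=value" form. It is not how Hadoop implements parsing; it only mirrors the parse, setConf, run sequence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToolRunnerSketch {
    /** Stand-in for Hadoop's Tool: receives a configuration, then runs on the leftover args. */
    public interface MiniTool {
        void setConf(Map<String, String> conf);
        int run(String[] args) throws Exception;
    }

    /** Stores each "-D key=value" pair in conf and returns all other arguments in order. */
    public static String[] parseGenericOptions(Map<String, String> conf, String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                remaining.add(args[i]);
            }
        }
        return remaining.toArray(new String[0]);
    }

    /** Same sequence as ToolRunner.run: default the conf, parse, setConf, run with the rest. */
    public static int run(Map<String, String> conf, MiniTool tool, String[] args) throws Exception {
        if (conf == null) {
            conf = new HashMap<>();
        }
        String[] toolArgs = parseGenericOptions(conf, args);
        tool.setConf(conf);
        return tool.run(toolArgs);
    }

    public static void main(String[] args) throws Exception {
        MiniTool tool = new MiniTool() {
            private Map<String, String> conf;
            public void setConf(Map<String, String> conf) { this.conf = conf; }
            public int run(String[] toolArgs) {
                // The tool sees only its own arguments; the -D pair is already in conf.
                System.out.println("conf=" + conf + " args=" + String.join(",", toolArgs));
                return 0;
            }
        };
        int exitCode = run(null, tool, new String[] {"-D", "color=red", "in", "out"});
        System.out.println("exit code: " + exitCode);
    }
}
```

The point of the split is that the tool's run method never has to know which generic options were present on the command line.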

Generic command-line options are the ones that can be added to any job, for example:

-conf <configuration file>     specify a configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>   specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

A typical program that implements Tool:

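import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * MyApp reads its arguments from the command line. The user runs, for example:
 *
 *   $ bin/hadoop jar MyApp.jar -archives test.tgz arg1 arg2
 *
 * -archives is a generic Hadoop option; arg1 and arg2 are the job's own arguments.
 */
public class MyApp extends Configured implements Tool {

    // Implement Tool's run(). MyMapper and MyReducer are nested classes of MyApp, not shown here.
    public int run(String[] args) throws Exception {
        // Configuration already processed by ToolRunner
        Configuration conf = getConf();

        // Create a JobConf using the processed conf
        JobConf job = new JobConf(conf, MyApp.class);

        // Process custom command-line options. The generic options have been
        // stripped, so for the command above args is {arg1, arg2}.
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);

        // Specify various job-specific parameters
        job.setJobName("my-app");
        // Set the paths through the mapred FileInputFormat/FileOutputFormat helpers
        // instead of the older JobConf.setInputPath/setOutputPath calls
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        job.setMapperClass(MyApp.MyMapper.class);
        job.setReducerClass(MyApp.MyReducer.class);

        // Submit the job and wait for it to finish
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // args are processed by ToolRunner
        int res = ToolRunner.run(new Configuration(), new MyApp(), args);
        System.exit(res);
    }
}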

http://hnote.org/big-data/hadoop/hadoop-tool-toolrunner