First, the Configurable interface:
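package org.apache.hadoop.conf;

public interface Configurable {
    void setConf(Configuration conf);  // set the configuration used by this object
    Configuration getConf();           // return the configuration used by this object
}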
The Configurable interface defines just two methods: setConf and getConf.
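To make the contract concrete without a Hadoop dependency, here is a minimal plain-Java sketch. `SimpleConf` (a hypothetical stand-in for Hadoop's `Configuration`, backed by a `HashMap`), `ConfAware` (mirroring `Configurable`) and `WordFilter` are illustrative names, not Hadoop API. The class implements the interface directly, so it has to declare the field and both accessors itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration: a simple string key/value store.
class SimpleConf {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) {
        props.put(key, value);
    }

    // Returns the stored value, or defaultValue when the key is absent.
    String get(String key, String defaultValue) {
        return props.getOrDefault(key, defaultValue);
    }
}

// Mirrors the shape of Configurable: one setter, one getter.
interface ConfAware {
    void setConf(SimpleConf conf);
    SimpleConf getConf();
}

// Hypothetical component implementing the interface directly,
// so it must hold the configuration field and write both accessors itself.
class WordFilter implements ConfAware {
    private SimpleConf conf;

    @Override
    public void setConf(SimpleConf conf) {
        this.conf = conf;
    }

    @Override
    public SimpleConf getConf() {
        return conf;
    }

    // Behavior comes from whatever configuration was injected.
    boolean accepts(String word) {
        int minLength = Integer.parseInt(conf.get("filter.min.length", "3"));
        return word.length() >= minLength;
    }
}
```

The caller, not the component, decides which configuration it sees: after `setConf` with `filter.min.length` set to `5`, `accepts("hadoop")` is true and `accepts("map")` is false.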
The Configured class implements the Configurable interface:
package org.apache.hadoop.conf;

public class Configured implements Configurable {
    private Configuration conf;

    // No-arg constructor: delegates with a null configuration
    public Configured() {
        this(null);
    }

    public Configured(Configuration conf) {
        setConf(conf);
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    @Override
    public Configuration getConf() {
        return conf;
    }
}
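Configured saves subclasses from repeating that boilerplate: they inherit the field and both accessors. The sketch below mirrors this in plain Java; `Conf`, `ConfAwareBase`, `BaseConfigured` and `Greeter` are hypothetical names, with `Conf` reduced to a `Map` wrapper. Because the no-argument constructor delegates with `this(null)`, such an object starts with no configuration until a caller injects one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration.
class Conf {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    String get(String key, String defaultValue) { return props.getOrDefault(key, defaultValue); }
}

// Mirrors Configurable.
interface ConfAwareBase {
    void setConf(Conf conf);
    Conf getConf();
}

// Mirrors Configured: holds the configuration once so subclasses don't have to.
class BaseConfigured implements ConfAwareBase {
    private Conf conf;

    BaseConfigured() {
        this(null); // same delegation as Configured(): start with no configuration
    }

    BaseConfigured(Conf conf) {
        setConf(conf);
    }

    @Override
    public void setConf(Conf conf) { this.conf = conf; }

    @Override
    public Conf getConf() { return conf; }
}

// Hypothetical subclass: no field or accessor code of its own, it just calls getConf().
class Greeter extends BaseConfigured {
    Greeter() { super(); }

    Greeter(Conf conf) { super(conf); }

    String greet() {
        return "Hello, " + getConf().get("greeter.name", "world");
    }
}
```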
The Tool interface extends Configurable (one interface extending another) and adds a single method, run():
package org.apache.hadoop.util;
import org.apache.hadoop.conf.Configurable;
public interface Tool extends Configurable {
    int run(String[] args) throws Exception;  // returns the exit code
}
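Because Tool extends Configurable, anything typed as a Tool can both receive a configuration and be run with arguments, returning an int exit code. A plain-Java sketch of that shape, with hypothetical names (`ToolConf`, `ConfHolder`, `SimpleTool`, `EchoTool`) standing in for the Hadoop types:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Configuration.
class ToolConf {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    String get(String key, String defaultValue) { return props.getOrDefault(key, defaultValue); }
}

// Mirrors Configurable.
interface ConfHolder {
    void setConf(ToolConf conf);
    ToolConf getConf();
}

// Mirrors Tool: an interface extending another interface, adding only run().
interface SimpleTool extends ConfHolder {
    int run(String[] args) throws Exception;
}

// Hypothetical tool: joins its arguments with a separator read from the configuration.
class EchoTool implements SimpleTool {
    private ToolConf conf;
    String lastOutput;

    @Override
    public void setConf(ToolConf conf) { this.conf = conf; }

    @Override
    public ToolConf getConf() { return conf; }

    @Override
    public int run(String[] args) {
        if (args.length == 0) {
            return 1; // non-zero exit code signals a usage error
        }
        lastOutput = String.join(conf.get("echo.separator", " "), args);
        return 0;
    }
}
```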
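Putting these together, the inheritance relationships are: Configured implements Configurable, Tool extends Configurable, and a typical application (such as MyApp below) extends Configured and implements Tool.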
Next, here is part of the ToolRunner class:
package org.apache.hadoop.util;

import org.apache.hadoop.conf.Configuration;

public class ToolRunner {
    public static int run(Configuration conf, Tool tool, String[] args)
            throws Exception {
        if (conf == null) {
            conf = new Configuration();
        }
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        // set the configuration back, so that Tool can configure itself
        tool.setConf(conf);
        // get the args w/o generic hadoop args
        String[] toolArgs = parser.getRemainingArgs();
        return tool.run(toolArgs);
    }
    // ... other members omitted
}
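As ToolRunner's static run() method shows, it creates a Configuration if none was passed in, then hands that conf and the command-line args to GenericOptionsParser, which applies Hadoop's generic command-line options to the conf. It then gives the conf back to the tool with setConf(), and passes only the remaining, job-specific arguments (toolArgs = parser.getRemainingArgs();) to the tool's own run() method.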
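The order of operations in ToolRunner.run() can be modeled in plain Java. This is a simplified sketch, not Hadoop's code: the hypothetical `MiniRunner.run` recognizes only `-D key=value` (GenericOptionsParser handles the full set of generic options), writes those pairs into the configuration, injects it with `setConf`, and passes only the leftover arguments to the tool. `RunnerConf`, `RunnableTool` and `MiniRunner` are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Configuration.
class RunnerConf {
    private final Map<String, String> props = new HashMap<>();

    void set(String key, String value) { props.put(key, value); }

    String get(String key) { return props.get(key); }
}

// Mirrors Tool, with the two Configurable methods declared directly.
interface RunnableTool {
    void setConf(RunnerConf conf);
    RunnerConf getConf();
    int run(String[] args) throws Exception;
}

final class MiniRunner {
    private MiniRunner() {}

    static int run(RunnerConf conf, RunnableTool tool, String[] args) throws Exception {
        if (conf == null) {
            conf = new RunnerConf(); // same fallback as ToolRunner.run
        }
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            // Simplified generic-option handling: only "-D key=value".
            if (args[i].equals("-D") && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                conf.set(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                remaining.add(args[i]);
            }
        }
        tool.setConf(conf);                                  // hand the populated conf to the tool
        return tool.run(remaining.toArray(new String[0]));  // the tool sees only its own args
    }
}
```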
Generic command-line options are those that can be given to any job, for example:
-conf <configuration file>    specify a configuration file
-D <property=value>    use value for given property
-fs <local|namenode:port>    specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
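Each generic option listed above takes exactly one value, which is what lets a parser separate them from the job's own arguments. The sketch below is a hypothetical, simplified splitter (`GenericArgsSplitter`), not GenericOptionsParser itself: it collects option/value pairs in order and stops at the first argument that is not a generic option, so it assumes generic options come before the job's arguments. It does not handle attached forms such as `-Dkey=value`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified splitter; not Hadoop's GenericOptionsParser.
final class GenericArgsSplitter {
    // The generic options from the list above; each is followed by one value.
    private static final Set<String> GENERIC =
            Set.of("-conf", "-D", "-fs", "-jt", "-files", "-libjars", "-archives");

    final List<String[]> generic = new ArrayList<>(); // {option, value} pairs, in order
    final String[] remaining;                          // the job's own arguments

    GenericArgsSplitter(String[] args) {
        int i = 0;
        // Consume option/value pairs until the first non-generic argument.
        while (i + 1 < args.length && GENERIC.contains(args[i])) {
            generic.add(new String[] {args[i], args[i + 1]});
            i += 2;
        }
        remaining = Arrays.copyOfRange(args, i, args.length);
    }
}
```

For `-archives test.tgz arg1 arg2`, `generic` holds one pair and `remaining` is `{"arg1", "arg2"}`, so a job would read those two values from `args[0]` and `args[1]`.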
A typical program that implements Tool:
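import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * MyApp reads its arguments from the command line, e.g.
 *   $ bin/hadoop jar MyApp.jar -archives test.tgz arg1 arg2
 * -archives is a generic Hadoop option; arg1 and arg2 are the job's own arguments.
 */
public class MyApp extends Configured implements Tool {
    // Implement Tool's run()
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        // Create a JobConf using the processed conf
        JobConf job = new JobConf(conf, MyApp.class);
        // Process custom command-line options: the generic options are already
        // stripped, so arg1 and arg2 arrive as args[0] and args[1]
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        // Specify various job-specific parameters
        job.setJobName("my-app");
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);
        // MyMapper and MyReducer are nested classes of MyApp (not shown)
        job.setMapperClass(MyApp.MyMapper.class);
        job.setReducerClass(MyApp.MyReducer.class);
        // runJob blocks until the job completes and throws IOException if it fails
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options from args before calling run()
        int res = ToolRunner.run(new Configuration(), new MyApp(), args);
        System.exit(res);
    }
}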
http://hnote.org/big-data/hadoop/hadoop-tool-toolrunner