The core of Hadoop is still the Map-Reduce programming model plus the Hadoop Distributed File System (HDFS). The classic word-count example below exercises both halves of the model.
Step 1: Define the Map process
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Description: Mapper that splits each input line into tokens and
 * emits a (word, 1) pair for every token.
 *
 * @author charles.wang
 * @created Mar 12, 2012 1:41:57 PM
 */
public class MyMap extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);
    // Reuse one Text instance instead of allocating a new one per token.
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
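For example, given the input line "hello hadoop hello", this mapper emits the pairs ("hello", 1), ("hadoop", 1), ("hello", 1). The framework then sorts these pairs and groups them by key before handing them to the reducer.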
Step 2: Define the Reduce process
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Description: Reducer that sums the counts emitted for each word
 * and writes the (word, total) pair to the output.
 *
 * @author charles.wang
 * @created Mar 12, 2012 1:48:18 PM
 */
public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // Add up all the partial counts collected for this word.
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
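Continuing the example, the reducer for the key "hello" receives the grouped values [1, 1], sums them, and writes ("hello", 2); the key "hadoop" yields ("hadoop", 1). Because the same class is also registered as a combiner in the driver below, part of this summing can already happen on the map side.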
Step 3: Write a Driver to run the Map-Reduce job
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Legacy user/group identity setting used by old (pre-security) Hadoop versions.
        conf.set("hadoop.job.ugi", "root,root123");

        Job job = new Job(conf, "Hello,hadoop! ^_^");
        job.setJarByClass(MyDriver.class);
        // Declare the types of the map output key/value pairs.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setMapperClass(MyMap.class);
        // The reducer doubles as a combiner: summing counts is associative,
        // so partial sums can safely be computed on the map side.
        job.setCombinerClass(MyReduce.class);
        job.setReducerClass(MyReduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // args[0] is the input path, args[1] the output path (must not exist yet).
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Submit the job and block until it finishes.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
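Assuming the three classes are packaged into a jar (the jar name and HDFS paths below are placeholders, not from the original), the job can be launched roughly like this:

    hadoop jar wordcount.jar MyDriver /user/root/input /user/root/output

The output directory must not exist before the run; the word counts end up in part files such as part-r-00000 under the output path.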