The core of Hadoop is still the Map-Reduce programming model plus the Hadoop Distributed File System (HDFS). The classic word-count example below exercises both halves of the model.
Step 1: Define the Map process
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * Description: Mapper that splits each input line into tokens and
 * emits a (word, 1) pair for every token.
 *
 * @author charles.wang
 * @created Mar 12, 2012 1:41:57 PM
 */
public class MyMap extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);
    // Reuse one Text instance instead of allocating a new one per token.
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}
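For example, given the input line "hello hadoop hello", this mapper emits the pairs ("hello", 1), ("hadoop", 1), ("hello", 1). The framework then sorts these pairs and groups them by key before handing them to the reducer.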
Step 2: Define the Reduce process
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * Description: Reducer that sums the counts emitted for each word
 * and writes the (word, total) pair to the output.
 *
 * @author charles.wang
 * @created Mar 12, 2012 1:48:18 PM
 */
public class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        // Add up all the partial counts collected for this word.
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
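Continuing the example, the reducer for the key "hello" receives the grouped values [1, 1], sums them, and writes ("hello", 2); the key "hadoop" yields ("hadoop", 1). Because the same class is also registered as a combiner in the driver below, part of this summing can already happen on the map side.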
Step 3: Write a Driver to run the Map-Reduce job
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Legacy user/group identity setting used by old (pre-security) Hadoop versions.
        conf.set("hadoop.job.ugi", "root,root123");

        Job job = new Job(conf, "Hello,hadoop! ^_^");
        job.setJarByClass(MyDriver.class);
        // Declare the types of the map output key/value pairs.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setMapperClass(MyMap.class);
        // The reducer doubles as a combiner: summing counts is associative,
        // so partial sums can safely be computed on the map side.
        job.setCombinerClass(MyReduce.class);
        job.setReducerClass(MyReduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        // args[0] is the input path, args[1] the output path (must not exist yet).
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Submit the job and block until it finishes.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
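Assuming the three classes are packaged into a jar (the jar name and HDFS paths below are placeholders, not from the original), the job can be launched roughly like this:

    hadoop jar wordcount.jar MyDriver /user/root/input /user/root/output

The output directory must not exist before the run; the word counts end up in part files such as part-r-00000 under the output path.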