MapReduce programming packages: the MapReduce programming model

Reposted

mob6454cc77db30 2024-04-28 19:59:50

Article tags: MapReduce programming packages, Hadoop, MapReduce, big data, hadoop. Article category: Architecture / backend development

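What is MapReduce

MapReduce is the programming model of Hadoop (the big-data processing ecosystem). Calling it a model implies that it has a fixed form.

The MapReduce programming model is the fixed form in which data-analysis programs are written for the Hadoop ecosystem.

This fixed form can be described as follows:

A MapReduce job is divided into two phases: the map phase and the reduce phase. Each phase takes key/value pairs as input and produces key/value pairs as output, and the programmer chooses their types.

In other words, the programmer only has to define two functions, the map function and the reduce function, and leaves the rest of the computation to Hadoop.

From this description we can see that:

The scenarios MapReduce can handle are actually quite specific and limited: essentially "statistical analysis of data".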
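The two-function structure described above can be imitated in memory. The following is a minimal sketch in plain Java with no Hadoop dependency; the class `MiniMapReduce`, its `run` method and the "year,temperature" input strings are hypothetical and only illustrate the map, shuffle/sort and reduce steps, not how Hadoop actually distributes the work across machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// In-memory simulation of the MapReduce data flow; no Hadoop classes are used.
class MiniMapReduce {

    // A key/value pair, the unit of data passed between the phases.
    record Pair<K, V>(K key, V value) {}

    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs,
            Function<I, List<Pair<K, V>>> mapFn,
            BiFunction<K, List<V>, R> reduceFn) {
        // Map phase: every input record can emit zero or more key/value pairs.
        // Shuffle: values are grouped by key; the TreeMap keeps keys sorted,
        // as Hadoop sorts keys before the reduce phase.
        Map<K, List<V>> grouped = new TreeMap<>();
        for (I input : inputs) {
            for (Pair<K, V> pair : mapFn.apply(input)) {
                grouped.computeIfAbsent(pair.key(), k -> new ArrayList<>()).add(pair.value());
            }
        }
        // Reduce phase: one call per distinct key, with all values for that key.
        Map<K, R> result = new TreeMap<>();
        for (Map.Entry<K, List<V>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduceFn.apply(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    // The two user-supplied functions for "maximum temperature per year",
    // applied to hypothetical "year,temperature" strings.
    static Map<String, Integer> maxByYear(List<String> lines) {
        Function<String, List<Pair<String, Integer>>> mapFn = line -> {
            String[] fields = line.split(",");
            return List.of(new Pair<>(fields[0], Integer.parseInt(fields[1])));
        };
        BiFunction<String, List<Integer>, Integer> reduceFn =
                (year, temps) -> temps.stream().mapToInt(Integer::intValue).max().getAsInt();
        return run(lines, mapFn, reduceFn);
    }

    public static void main(String[] args) {
        // Made-up input values, for illustration only.
        System.out.println(maxByYear(List.of("1901,-78", "1901,12", "1902,-100", "1902,-117")));
        // prints {1901=12, 1902=-100}
    }
}
```

Here `maxByYear` plays the role of the two functions the programmer writes; everything inside `run` stands in for the part Hadoop takes care of.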

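Preparing the input data

Official site for the weather data: ftp://ftp.ncdc.noaa.gov/pub/data/gsod/

However, the file format at this official site turned out to differ from the format used in "Hadoop: The Definitive Guide" ( http://www.linuxidc.com/Linux/2012-07/65972.htm ). I could not tell whether the official format had changed over time, whether the author had preprocessed the raw data, or whether this URL was simply the wrong one, so I downloaded a file from the location given in "Hadoop: The Definitive Guide" instead: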

https://github.com/tomwhite/hadoop-book/tree/master/input/ncdc/all

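For a quick test, you can also paste the following lines into a text file (one record per line, with no blank lines in between); this is a correctly formatted weather file: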

0035029070999991902010113004+64333+023450FM-12+000599999V0201401N011819999999N0000001N9-01001+99999100311ADDGF104991999999999999999999MW1381
0035029070999991902010120004+64333+023450FM-12+000599999V0201401N013919999999N0000001N9-01171+99999100121ADDGF108991999999999999999999MW1381
0035029070999991902010206004+64333+023450FM-12+000599999V0200901N009819999999N0000001N9-01611+99999100121ADDGF108991999999999999999999MW1381
0029029070999991902010213004+64333+023450FM-12+000599999V0200901N011819999999N0000001N9-01721+99999100121ADDGF108991999999999999999999
0029029070999991902010220004+64333+023450FM-12+000599999V0200901N009819999999N0000001N9-01781+99999100421ADDGF108991999999999999999999

In this article, the text file containing the weather records is named temperature.txt.
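The mappers in this article read the year and the temperature from fixed character positions of each record. A minimal plain-Java sketch (no Hadoop needed; the class `NcdcRecordParser` and its methods are hypothetical) applies the same offsets to the sample records so the data can be checked before a job is run. Temperatures are stored in tenths of a degree Celsius, so -100 means -10.0 °C. Note that the Hadoop mapper calls `substring` without a length check, so an empty line in temperature.txt would make it throw `StringIndexOutOfBoundsException`; the sketch skips blank lines instead.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;
import java.util.TreeMap;

// Plain-Java check of the fixed-width offsets the mapper relies on (no Hadoop needed).
class NcdcRecordParser {
    // Value the data uses for a missing temperature reading.
    static final int MISSING = 9999;

    // The five sample records from this article.
    static final List<String> SAMPLE = List.of(
        "0035029070999991902010113004+64333+023450FM-12+000599999V0201401N011819999999N0000001N9-01001+99999100311ADDGF104991999999999999999999MW1381",
        "0035029070999991902010120004+64333+023450FM-12+000599999V0201401N013919999999N0000001N9-01171+99999100121ADDGF108991999999999999999999MW1381",
        "0035029070999991902010206004+64333+023450FM-12+000599999V0200901N009819999999N0000001N9-01611+99999100121ADDGF108991999999999999999999MW1381",
        "0029029070999991902010213004+64333+023450FM-12+000599999V0200901N011819999999N0000001N9-01721+99999100121ADDGF108991999999999999999999",
        "0029029070999991902010220004+64333+023450FM-12+000599999V0200901N009819999999N0000001N9-01781+99999100421ADDGF108991999999999999999999");

    // Characters 15-18 hold the year of the observation.
    static String year(String line) {
        return line.substring(15, 19);
    }

    // Characters 87-91 hold the signed temperature, character 92 the quality code.
    // Empty if the reading is missing or its quality code is not 0, 1, 4, 5 or 9.
    static OptionalInt airTemperature(String line) {
        int temp = line.charAt(87) == '+'
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
        String quality = line.substring(92, 93);
        return (temp != MISSING && quality.matches("[01459]"))
                ? OptionalInt.of(temp)
                : OptionalInt.empty();
    }

    // Same result as the map + reduce pair: the highest valid reading per year.
    static Map<String, Integer> maxPerYear(List<String> lines) {
        Map<String, Integer> max = new TreeMap<>();
        for (String line : lines) {
            if (line.isBlank()) {
                continue; // the Hadoop mapper would fail on a blank line
            }
            OptionalInt temp = airTemperature(line);
            if (temp.isPresent()) {
                max.merge(year(line), temp.getAsInt(), Integer::max);
            }
        }
        return max;
    }

    public static void main(String[] args) throws Exception {
        // Optional argument: path to temperature.txt; otherwise use the sample lines.
        List<String> lines = args.length > 0 ? Files.readAllLines(Path.of(args[0])) : SAMPLE;
        System.out.println(maxPerYear(lines)); // for SAMPLE: {1902=-100}
    }
}
```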

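MapReduce programming in Java

There are two Java APIs. The old one is the org.apache.hadoop.mapred package, in which a MapReduce program implements interfaces; the new one is the org.apache.hadoop.mapreduce package, in which a MapReduce program extends abstract base classes. The two are very similar, as the two versions of the same program below show.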
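The difference between the two styles can be sketched without Hadoop. The types below (`OldStyleMapper`, `NewStyleMapper`, `ApiStyleDemo`) are hypothetical simplifications, not the real Hadoop interfaces: in the old style the user implements an interface and receives the output collector as a parameter; in the new style the user extends an abstract class that already contains optional `setup`/`cleanup` hooks and the `run` loop that calls `map` for each record, loosely mirroring the structure of `org.apache.hadoop.mapreduce.Mapper`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical, heavily simplified types that only mimic the *shape* of the two APIs.

// Old style (compare org.apache.hadoop.mapred): the user implements an interface
// and receives the output collector as a method parameter.
interface OldStyleMapper<KI, VI, KO, VO> {
    void map(KI key, VI value, BiConsumer<KO, VO> output);
}

// New style (compare org.apache.hadoop.mapreduce): the user extends an abstract class
// that already provides optional hooks and the loop that drives map().
abstract class NewStyleMapper<KI, VI, KO, VO> {
    protected void setup() {}    // called once before the first record
    protected void cleanup() {}  // called once after the last record

    protected abstract void map(KI key, VI value, BiConsumer<KO, VO> context);

    // Template method: setup, then map() for each record, then cleanup.
    public final void run(List<KI> keys, List<VI> values, BiConsumer<KO, VO> context) {
        setup();
        for (int i = 0; i < keys.size(); i++) {
            map(keys.get(i), values.get(i), context);
        }
        cleanup();
    }
}

class ApiStyleDemo {
    // Both mappers emit (word, 1) per whitespace-separated word; the line index
    // stands in for the byte offset Hadoop would pass as the key.
    static class OldWordMapper implements OldStyleMapper<Long, String, String, Integer> {
        @Override
        public void map(Long offset, String line, BiConsumer<String, Integer> output) {
            for (String word : line.split("\\s+")) {
                output.accept(word, 1);
            }
        }
    }

    static class NewWordMapper extends NewStyleMapper<Long, String, String, Integer> {
        @Override
        protected void map(Long offset, String line, BiConsumer<String, Integer> context) {
            for (String word : line.split("\\s+")) {
                context.accept(word, 1);
            }
        }
    }

    static List<String> runOld(List<String> lines) {
        List<String> out = new ArrayList<>();
        OldWordMapper mapper = new OldWordMapper();
        for (int i = 0; i < lines.size(); i++) {
            mapper.map((long) i, lines.get(i), (k, v) -> out.add(k + "=" + v));
        }
        return out;
    }

    static List<String> runNew(List<String> lines) {
        List<String> out = new ArrayList<>();
        List<Long> offsets = new ArrayList<>();
        for (long i = 0; i < lines.size(); i++) {
            offsets.add(i);
        }
        new NewWordMapper().run(offsets, lines, (k, v) -> out.add(k + "=" + v));
        return out;
    }
}
```

A design point usually given for the new API is that abstract classes are easier to evolve: a method with a default body can be added to a base class without breaking existing subclasses, whereas adding a method to an interface (before Java 8 default methods) breaks every implementation.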

Maven

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.4</version>
</dependency>

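Instead of the official jar, you can also use one that someone else has modified and recompiled, which lets you run MapReduce directly in Eclipse like an ordinary Java program.

This recompiled hadoop-core-1.0.4.jar can simulate MapReduce locally.

If the Eclipse workspace is on drive d:, a directory on d: such as d:\input can serve as the input directory and d:\output as the output directory.

In the MapReduce program, the paths are then written like this: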

FileInputFormat.setInputPaths(job, new Path("/input"));

FileOutputFormat.setOutputPath(job, new Path("/output"));
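When the job is run repeatedly from Eclipse against a local directory such as d:\output, the second run fails, because Hadoop refuses to start a job whose output directory already exists (FileOutputFormat checks this before the job starts). A small hypothetical helper, `OutputDirCleaner`, sketched below in plain Java with `java.nio.file`, removes the old output first. It is an assumption for local testing only, not part of Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for local runs only: removes a previous run's output directory,
// because Hadoop refuses to start a job whose output directory already exists.
class OutputDirCleaner {

    // Deletes dir and everything below it; returns false if dir did not exist.
    static boolean deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
        return true;
    }
}
```

For example, calling `OutputDirCleaner.deleteRecursively(java.nio.file.Path.of("d:/output"))` at the start of `main` (fully qualified, since `Path` in the driver refers to Hadoop's `org.apache.hadoop.fs.Path`) lets the job be rerun from Eclipse. It deletes files, so point it only at a scratch directory; it does not apply to output stored on HDFS.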

Download locations:

Free download at http://linux.linuxidc.com/

Both the username and the password are www.linuxidc.com

The files are in the directory /2014年资料/4月/16日/MapReduce编程实战 (2014 materials / April / 16th / MapReduce programming in practice)

For download instructions, see http://www.linuxidc.com/Linux/2013-07/87684.htm

Or:

FTP address: ftp://ftp1.linuxidc.com

Username: ftp1.linuxidc.com

Password: www.linuxidc.com

The files are in 2014年LinuxIDC.com\4月\MapReduce编程实战

For download instructions, see http://www.linuxidc.com/Linux/2013-10/91140.htm

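After downloading, simply overwrite the corresponding jar in your local Maven repository (by default ~/.m2/repository/org/apache/hadoop/hadoop-core/1.0.4/).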

The interface style (old API, org.apache.hadoop.mapred)

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MaxTemperature {

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MaxTemperature.class);
        conf.setJobName("Max Temperature");

        // To take the paths from the command line instead:
        // FileInputFormat.setInputPaths(conf, new Path(args[0]));
        // FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        FileInputFormat.setInputPaths(conf, new Path("/hadooptemp/input/2"));
        FileOutputFormat.setOutputPath(conf, new Path("/hadooptemp/output"));

        conf.setMapperClass(MaxTemperatureMapper.class);
        conf.setReducerClass(MaxTemperatureReduce.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        JobClient.runJob(conf);
    }

    // Static, so that Hadoop can instantiate it by reflection.
    public static class MaxTemperatureMapper extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {

        // Value the data uses for a missing temperature reading.
        private static final int MISSING = 9999;

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            String year = line.substring(15, 19);
            int airTemperature;
            // Character 87 is the sign; skip a leading '+' before parsing.
            if (line.charAt(87) == '+') {
                airTemperature = Integer.parseInt(line.substring(88, 92));
            } else {
                airTemperature = Integer.parseInt(line.substring(87, 92));
            }
            String quality = line.substring(92, 93);
            // Emit only readings that are present and have a valid quality code.
            if (airTemperature != MISSING && quality.matches("[01459]")) {
                output.collect(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    public static class MaxTemperatureReduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            // Keep the highest temperature seen for this year.
            int maxValue = Integer.MIN_VALUE;
            while (values.hasNext()) {
                maxValue = Math.max(maxValue, values.next().get());
            }
            output.collect(key, new IntWritable(maxValue));
        }
    }
}
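Hadoop creates the Mapper and Reducer instances itself, by reflection through a no-argument constructor (via its `ReflectionUtils.newInstance` utility), which is why such classes need to be top-level classes or `static` nested classes. A non-static inner class has no real no-argument constructor, because Java passes the enclosing instance as a hidden constructor argument. The plain-Java sketch below (class names are hypothetical) shows the difference with `Class.getDeclaredConstructor()`.

```java
// Plain-Java illustration: frameworks that create objects by reflection need a
// no-argument constructor, which a non-static inner class does not have.
class ReflectionDemo {
    // Non-static inner class: its constructor implicitly takes a ReflectionDemo instance.
    class InnerReducer {}

    // Static nested class: gets an ordinary no-argument default constructor.
    static class NestedReducer {}

    // True if the class declares a constructor with no parameters.
    static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            cls.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(NestedReducer.class)); // true
        System.out.println(hasNoArgConstructor(InnerReducer.class));  // false
    }
}
```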

The abstract-class style (new API, org.apache.hadoop.mapreduce)

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewMaxTemperature {

    public static void main(String[] args) throws Exception {
        Job job = new Job();
        job.setJarByClass(NewMaxTemperature.class);

        // To take the paths from the command line instead:
        // FileInputFormat.setInputPaths(job, new Path(args[0]));
        // FileOutputFormat.setOutputPath(job, new Path(args[1]));
        FileInputFormat.setInputPaths(job, new Path("/hadooptemp/input/2"));
        FileOutputFormat.setOutputPath(job, new Path("/hadooptemp/output"));

        job.setMapperClass(NewMaxTemperatureMapper.class);
        job.setReducerClass(NewMaxTemperatureReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    // Static, so that Hadoop can instantiate it by reflection.
    public static class NewMaxTemperatureMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        // Value the data uses for a missing temperature reading.
        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String year = line.substring(15, 19);
            int airTemperature;
            // Character 87 is the sign; skip a leading '+' before parsing.
            if (line.charAt(87) == '+') {
                airTemperature = Integer.parseInt(line.substring(88, 92));
            } else {
                airTemperature = Integer.parseInt(line.substring(87, 92));
            }
            String quality = line.substring(92, 93);
            // Emit only readings that are present and have a valid quality code.
            if (airTemperature != MISSING && quality.matches("[01459]")) {
                context.write(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    public static class NewMaxTemperatureReduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        // The new API passes the values as an Iterable, not an Iterator;
        // with Iterator this method would not override Reducer.reduce().
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Keep the highest temperature seen for this year.
            int maxValue = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }
}
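In the new API, `Reducer.reduce` receives the values as an `Iterable`, while the old API passes an `Iterator`. If a new-API reducer declares `Iterator<IntWritable>`, its method is only an overload: it compiles, Hadoop never calls it, and the inherited default `reduce` runs instead, writing every value through unchanged, so the output contains all temperatures rather than the maximum. The plain-Java sketch below imitates this with hypothetical classes (`BaseReducer` stands in for Hadoop's `Reducer`); adding `@Override`, as in `RightMaxReducer`, turns such a mistake into a compile error.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the new-API Reducer: its default reduce()
// is the identity and passes every value through unchanged.
class BaseReducer {
    final List<String> output = new ArrayList<>();

    protected void reduce(String key, Iterable<Integer> values) {
        for (Integer v : values) {
            output.add(key + "=" + v);
        }
    }
}

// Declares Iterator instead of Iterable: an overload, NOT an override,
// so callers using the base signature still get the identity version.
class WrongMaxReducer extends BaseReducer {
    protected void reduce(String key, Iterator<Integer> values) {
        int max = Integer.MIN_VALUE;
        while (values.hasNext()) {
            max = Math.max(max, values.next());
        }
        output.add(key + "=" + max);
    }
}

// Correct: same signature as the base method, checked by @Override.
class RightMaxReducer extends BaseReducer {
    @Override
    protected void reduce(String key, Iterable<Integer> values) {
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            max = Math.max(max, v);
        }
        output.add(key + "=" + max);
    }
}

class OverridePitfallDemo {
    // Calls reduce the way a framework would: through the base-class signature.
    static List<String> callAsFramework(BaseReducer r) {
        r.reduce("1902", List.of(-100, -117, -161));
        return r.output;
    }

    public static void main(String[] args) {
        System.out.println(callAsFramework(new WrongMaxReducer())); // [1902=-100, 1902=-117, 1902=-161]
        System.out.println(callAsFramework(new RightMaxReducer())); // [1902=-100]
    }
}
```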

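This article is reposted; we respect the original author's copyright. If there are errors or copyright issues, the original author is welcome to contact us to have the content corrected or the article removed.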
