References

  • Hadoop: The Definitive Guide, Chapter 6, Section 6.4

Background

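Hadoop and MapReduce, like MVC and Spring, are everywhere these days. I have used them, but have I read the source? No. Have I tuned the parameters? Yes, just until the job barely ran. Now that I have some time to read Hadoop: The Definitive Guide, I realize how many detours I took.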

MapReduce flow

Parameters

Shared by both sides

io.sort.factor

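The maximum number of streams merged at once in a multi-way merge. A larger value reduces the number of merge passes and therefore the amount of disk reads and writes.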
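How the factor drives the number of passes can be estimated with a deliberately simplified model. The assumption is that every pass merges groups of up to io.sort.factor sorted segments into one, until at most io.sort.factor remain for the final pass. Hadoop's actual Merger sizes its first pass differently to minimize the data written, so real pass counts and volumes can differ. MergePasses is a hypothetical name.

```java
// Simplified model (an assumption, not Hadoop's exact Merger logic):
// each pass merges groups of up to `factor` segments into one.
final class MergePasses {
    static int passes(int segments, int factor) {
        if (segments < 1 || factor < 2) {
            throw new IllegalArgumentException("need segments >= 1 and factor >= 2");
        }
        int passes = 1;                                   // the final merge pass
        while (segments > factor) {
            segments = (segments + factor - 1) / factor;  // ceil(segments / factor)
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        // 65 map outputs reach the single reducer in the experiments below
        System.out.println(passes(65, 10));   // 2: 65 -> 7 -> 1
        System.out.println(passes(65, 100));  // 1
    }
}
```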

Map side

io.sort.mb

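Size (in MB) of the map-side output buffer. Map output is first written here, then partitioned and sorted before being spilled to local disk; the spill files are merged again once the map finishes, and the result stays there until the reducers fetch it.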

io.sort.spill.percent

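The fraction of the map output buffer that, once filled, triggers a spill to disk. The right value should depend on the ratio between the rate at which the map produces output and the rate at which the disk absorbs it: this is a classic bounded-buffer producer-consumer problem.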
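The threshold arithmetic and its effect on the number of spill files can be sketched in Java. This is a deterministic simplification (an assumption): in Hadoop the spill runs in a background thread while the map keeps writing into the rest of the buffer, whereas here a spill empties the buffer instantly. SpillModel is a hypothetical name, and the 0.80 in main is, as far as I know, the default io.sort.spill.percent.

```java
// Deterministic sketch of the map-side spill trigger (simplified, see lead-in).
final class SpillModel {
    // Bytes of buffer usage at which a spill starts: io.sort.mb * io.sort.spill.percent.
    static long spillThreshold(int ioSortMb, float spillPercent) {
        return (long) (ioSortMb * 1024L * 1024L * spillPercent);
    }

    // Number of spill files produced by `records` records of `recordBytes` bytes each.
    static int spills(long records, int recordBytes, long thresholdBytes) {
        long used = 0;
        int spills = 0;
        for (long i = 0; i < records; i++) {
            used += recordBytes;            // producer: the map writes one record
            if (used >= thresholdBytes) {   // consumer: spill once the threshold is reached
                spills++;
                used = 0;
            }
        }
        if (used > 0) {
            spills++;                       // final flush when the map task ends
        }
        return spills;
    }

    public static void main(String[] args) {
        long threshold = spillThreshold(100, 0.80f);           // 100 MB buffer, 80%
        System.out.println(threshold);                         // 83886080 bytes = 80 MiB
        // ~5.98M records of 12 bytes (LongWritable + IntWritable), serialized bytes only
        System.out.println(spills(5_980_590L, 12, threshold)); // 1
    }
}
```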

Reduce side

mapred.job.shuffle.input.buffer.percent

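The fraction of the JVM heap used as the reduce-side input (shuffle) buffer. If all the data fetched from the maps fits, the map outputs can be merged entirely in memory and fed directly to the reduce without being written to disk.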

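Watch out, this is a trap: the JVM heap size used in this calculation is silently capped at Integer.MAX_VALUE (about 2GB) before being multiplied by the fraction, so the buffer can never exceed this fraction of 2GB, however large the heap is. To lift the cap you have to set mapreduce.reduce.memory.totalbytes explicitly to define the maximum usable heap size.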
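As far as I can tell from the Hadoop 2.x MergeManagerImpl source (treat this as an assumption and check your version), the buffer limit is the heap size, capped at Integer.MAX_VALUE unless mapreduce.reduce.memory.totalbytes is set, multiplied by the fraction. The sketch below mirrors that; ShuffleMemoryLimit and the -1 "unset" convention are hypothetical. Note that a long times a float is evaluated in float precision in Java.

```java
// Sketch of the reduce-side shuffle buffer limit (modeled on my reading of MergeManagerImpl).
final class ShuffleMemoryLimit {
    // maxHeapBytes:   what Runtime.getRuntime().maxMemory() would report
    // totalBytes:     mapreduce.reduce.memory.totalbytes, or -1 if unset
    // inputBufferPct: mapred.job.shuffle.input.buffer.percent
    static long memoryLimit(long maxHeapBytes, long totalBytes, float inputBufferPct) {
        long base = totalBytes >= 0
                ? totalBytes                                  // explicit override
                : Math.min(maxHeapBytes, Integer.MAX_VALUE);  // the silent ~2 GB cap
        return (long) (base * inputBufferPct);                // long * float -> float
    }

    public static void main(String[] args) {
        long heap = 5000L * 1024 * 1024;                      // -Xmx5000m
        System.out.println(memoryLimit(heap, -1, 0.9f));      // 1932735232
        System.out.println(memoryLimit(heap, heap, 0.9f));    // uses the full heap instead
    }
}
```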

mapred.job.shuffle.merge.percent

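The fraction of the reduce-side input buffer that, once used, triggers a merge to disk. In other words, once the data received from the maps exceeds heapsize * mapred.job.shuffle.input.buffer.percent * mapred.job.shuffle.merge.percent, the reducer starts merging and writing the result to local disk.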

mapred.inmem.merge.threshold

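The number of map outputs accumulated in the reduce-side input buffer at which merging to disk starts. This is an absolute count of map outputs, not a size in MB and not a ratio. If it is set to 0, there is no count threshold and the trigger is governed solely by the ratio parameters above.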
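The two triggers above (bytes via the merge percentage, count via mapred.inmem.merge.threshold) can be combined in a simplified model. This is an assumption for illustration: every map output is taken to be the same size, and outputs larger than the single-shuffle limit (which Hadoop writes straight to disk) are ignored. InMemMergeTrigger is a hypothetical name; main uses the memoryLimit from the reducer log shown later and the average materialized map output size from the counters.

```java
// Simplified model of when the reduce side merges buffered map outputs to disk.
final class InMemMergeTrigger {
    static long mergeThreshold(long memoryLimit, float mergePct) {
        return (long) (memoryLimit * mergePct);   // float math, like the limit itself
    }

    // Number of merges to disk while fetching `n` map outputs of `segBytes` each.
    static int merges(int n, long segBytes, long thresholdBytes, int countThreshold) {
        long used = 0;
        int buffered = 0;
        int merges = 0;
        for (int i = 0; i < n; i++) {
            used += segBytes;
            buffered++;
            boolean bytesHit = used >= thresholdBytes;
            boolean countHit = countThreshold > 0 && buffered >= countThreshold; // 0 disables
            if (bytesHit || countHit) {
                merges++;          // buffered outputs are merged into one file on disk
                used = 0;
                buffered = 0;
            }
        }
        return merges;
    }

    public static void main(String[] args) {
        long threshold = mergeThreshold(1932735232L, 0.9f);  // 1739461632
        long seg = 5442336912L / 65;                         // average materialized map output
        System.out.println(merges(65, seg, threshold, 0));   // 3 under this model
    }
}
```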

mapred.job.reduce.input.buffer.percent

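The fraction of the heap that map outputs may still occupy in the input buffer when the reduce itself starts. The input to the final round of the reduce does not have to come entirely from disk: the multi-way merge can read from both disk and memory (the in-memory part being map outputs that were merged but not yet written to disk). Setting it to 0 forces the buffer to be emptied, writing all merge results to disk.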

There is a trap here too: as above, the resulting size never exceeds 2GB, and there is no additional parameter to correct it.
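My reading of the Hadoop 2.x final-merge code (an assumption, verify against your version) is that here the cap is applied after the multiplication: heap * mapred.job.reduce.input.buffer.percent, clamped to Integer.MAX_VALUE, with no override parameter. ReduceInputBuffer is a hypothetical name.

```java
// Sketch: how much merged map output may stay in memory when the reduce starts.
final class ReduceInputBuffer {
    static int maxInMemReduce(long maxHeapBytes, float reduceInputPct) {
        // product computed in float, then clamped at Integer.MAX_VALUE (about 2 GB)
        return (int) Math.min(maxHeapBytes * reduceInputPct, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long heap = 5000L * 1024 * 1024;                 // -Xmx5000m
        System.out.println(maxInMemReduce(heap, 0.9f));  // 2147483647: clamped
        System.out.println(maxInMemReduce(heap, 0f));    // 0: everything goes to disk
    }
}
```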

Experiment

Sort about 4GB of integers (4GB being the size of the files that store the numbers as text).

Environment

Hadoop 2.6.0 
1 Namenode + 1 ResourceManager + 3 DataNode&NodeManager

Preparation

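Generate 4 text files of random integers with the following command (each invocation appends one integer to out.data, so it has to be run repeatedly, e.g. in a shell loop):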

echo $(od -An -N4 -i /dev/urandom) >> out.data

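Each resulting file is about 1GB. Because it reads from the Linux random number generator, generation is somewhat slow, so you can run it on four machines in parallel and then upload the data files into the sort_integer folder on HDFS (under the current user's home directory):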

ubuntu@dev00:~/sort-mr$ hadoop fs -ls sort_integer
Found 4 items
-rw-r--r--   1 ubuntu supergroup 1043473519 2015-08-06 13:05 sort_integer/int00.data
-rw-r--r--   1 ubuntu supergroup 1075257196 2015-08-06 13:07 sort_integer/int01.data
-rw-r--r--   1 ubuntu supergroup 1063854482 2015-08-06 13:08 sort_integer/int02.data
-rw-r--r--   1 ubuntu supergroup 1086774112 2015-08-06 13:08 sort_integer/int03.data
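Much of the slowness of the shell approach probably comes from starting an od process for every number, not only from /dev/urandom. A faster alternative, not what the author used, is a small Java program with a seeded pseudo-random generator (SplittableRandom, Java 8+). RandomIntFile is a hypothetical name; each line holds one signed 32-bit integer, the same range as od -N4 -i. The files above average about 11 bytes per line (about 4.27GB for 388,738,323 records), so roughly 97 million lines give about 1GB.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.SplittableRandom;

// Hypothetical generator: writes `count` random ints, one per line, to `out`.
final class RandomIntFile {
    static void write(Path out, long count, long seed) throws IOException {
        SplittableRandom rnd = new SplittableRandom(seed);   // fast PRNG, not /dev/urandom
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.US_ASCII)) {
            for (long i = 0; i < count; i++) {
                w.write(Integer.toString(rnd.nextInt()));    // full int range, signed
                w.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // usage: java RandomIntFile <output file> <line count> <seed>
        write(Paths.get(args[0]), Long.parseLong(args[1]), Long.parseLong(args[2]));
    }
}
```

Using a different seed on each machine keeps the four files from being identical.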

MapReduce program

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import java.io.IOException;


// Map: parse each text line as a number and emit (number, 1).
class SortMapper extends Mapper<Object, Text, LongWritable, IntWritable> {
    private LongWritable num = new LongWritable(0);
    private IntWritable one = new IntWritable(1);
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        num.set(Long.valueOf(value.toString().trim()));
        context.write(num, one);
    }
}

// Reduce: keys arrive sorted; write each key once per occurrence so duplicates are kept.
class SortReducer extends Reducer<LongWritable, IntWritable, LongWritable, NullWritable> {
    @Override
    protected void reduce(LongWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int count = 0;
        for (IntWritable i : values) {
            count += i.get();
        }
        for (int i=0; i<count; i++) {
            context.write(key, NullWritable.get());
        }
    }
}

public class SortMR {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("sort <input file/dir> <output dir>");
            return;
        }

        Configuration conf = new Configuration();

        // Per-experiment tuning parameters are set here, e.g.
        // conf.setInt("io.sort.factor", 100);

        Job job = Job.getInstance(conf, "sort-int");

        job.setJarByClass(SortMR.class);

        job.setMapperClass(SortMapper.class);

        job.setReducerClass(SortReducer.class);

        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(NullWritable.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setNumReduceTasks(1);   // a single reducer produces one globally sorted file
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
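To check the logic of SortMR without a cluster, its data flow can be imitated in plain Java. A TreeMap stands in for the shuffle's sort-and-group step; that substitution is made only for illustration (Hadoop sorts serialized keys with spills and external merges). The reducer's count loop matters because keys repeat: Reduce input groups (371667599) is smaller than Reduce input records (388738323) in the counters below. LocalSortSketch and its blank-line guard are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Imitates SortMapper + shuffle + SortReducer on a small in-memory input.
final class LocalSortSketch {
    static List<Long> sort(List<String> lines) {
        // map + shuffle: TreeMap keeps keys sorted; the summed value plays the role of the 1s
        TreeMap<Long, Integer> grouped = new TreeMap<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty()) {
                continue;  // hypothetical guard; SortMapper itself would throw on a blank line
            }
            grouped.merge(Long.valueOf(t), 1, Integer::sum);
        }
        // reduce: write each key once per occurrence so duplicates survive
        List<Long> out = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : grouped.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sort(Arrays.asList(" 3", "-1", "3", "2"))); // [-1, 2, 3, 3]
    }
}
```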

Experiment cases

Run with the default configuration

15/08/07 04:38:03 INFO mapreduce.Job: Running job: job_1438916755596_0002
15/08/07 04:38:10 INFO mapreduce.Job: Job job_1438916755596_0002 running in uber mode : false
15/08/07 04:38:10 INFO mapreduce.Job:  map 0% reduce 0%
15/08/07 04:38:26 INFO mapreduce.Job:  map 3% reduce 0%
15/08/07 04:38:27 INFO mapreduce.Job:  map 4% reduce 0%
...
15/08/07 04:39:14 INFO mapreduce.Job:  map 33% reduce 0%
15/08/07 04:39:15 INFO mapreduce.Job:  map 34% reduce 0%
15/08/07 04:39:17 INFO mapreduce.Job:  map 36% reduce 0%
15/08/07 04:39:18 INFO mapreduce.Job:  map 36% reduce 4%
15/08/07 04:39:19 INFO mapreduce.Job:  map 37% reduce 4%
15/08/07 04:39:21 INFO mapreduce.Job:  map 37% reduce 5%
15/08/07 04:39:24 INFO mapreduce.Job:  map 37% reduce 7%
15/08/07 04:39:28 INFO mapreduce.Job:  map 38% reduce 9%
...
15/08/07 04:41:06 INFO mapreduce.Job:  map 92% reduce 25%
15/08/07 04:41:08 INFO mapreduce.Job:  map 93% reduce 26%
15/08/07 04:41:09 INFO mapreduce.Job:  map 94% reduce 26%
15/08/07 04:41:14 INFO mapreduce.Job:  map 95% reduce 27%
15/08/07 04:41:17 INFO mapreduce.Job:  map 96% reduce 28%
15/08/07 04:41:19 INFO mapreduce.Job:  map 97% reduce 28%
15/08/07 04:41:21 INFO mapreduce.Job:  map 98% reduce 28%
15/08/07 04:41:23 INFO mapreduce.Job:  map 99% reduce 29%
15/08/07 04:41:26 INFO mapreduce.Job:  map 100% reduce 31%
15/08/07 04:41:29 INFO mapreduce.Job:  map 100% reduce 32%
15/08/07 04:41:32 INFO mapreduce.Job:  map 100% reduce 33%
15/08/07 04:46:22 INFO mapreduce.Job:  map 100% reduce 34%
15/08/07 04:46:25 INFO mapreduce.Job:  map 100% reduce 35%
15/08/07 04:46:28 INFO mapreduce.Job:  map 100% reduce 36%
...
15/08/07 04:56:56 INFO mapreduce.Job:  map 100% reduce 98%
15/08/07 04:57:14 INFO mapreduce.Job:  map 100% reduce 99%
15/08/07 04:57:32 INFO mapreduce.Job:  map 100% reduce 100%
15/08/07 04:57:42 INFO mapreduce.Job: Job job_1438916755596_0002 completed successfully

The whole job took about 20 minutes; the mappers all finished within about 3 minutes, while the reducer took much longer.

Job Counters
                Killed map tasks=1
                Launched map tasks=66
                Launched reduce tasks=1
                Data-local map tasks=51
                Rack-local map tasks=15
                Total time spent by all maps in occupied slots (ms)=3658033
                Total time spent by all reduces in occupied slots (ms)=1116041
                Total time spent by all map tasks (ms)=3658033
                Total time spent by all reduce tasks (ms)=1116041
                Total vcore-seconds taken by all map tasks=3658033
                Total vcore-seconds taken by all reduce tasks=1116041
                Total megabyte-seconds taken by all map tasks=3745825792
                Total megabyte-seconds taken by all reduce tasks=1142825984
        Map-Reduce Framework
                Map input records=388738323
                Map output records=388738323
                Map output bytes=4664859876
                Map output materialized bytes=5442336912
                Input split bytes=7670
                Combine input records=0
                Combine output records=0
                Reduce input groups=371667599
                Reduce shuffle bytes=5442336912
                Reduce input records=388738323
                Reduce output records=388738323
                Spilled Records=1527999455
                Shuffled Maps =65
                Failed Shuffles=0
                Merged Map outputs=65
                GC time elapsed (ms)=59949
                CPU time spent (ms)=2838000
                Physical memory (bytes) snapshot=17842167808
                Virtual memory (bytes) snapshot=46181015552
                Total committed heap usage (bytes)=13626769408

The counters show 388738323 input records and 388738323 reduce output records. The two match, so at least the record count is correct.

Increasing the merge factor

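That is, change io.sort.factor, the number of streams merged at the same time, to cut down on repeatedly writing merge results to disk and reading them back. The larger the factor, the fewer merge rounds are needed.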

conf.setInt("io.sort.factor", 100);

Following the example in the book, set it to 100 first.

15/08/07 08:29:31 INFO mapreduce.Job: Running job: job_1438916755596_0005
15/08/07 08:29:38 INFO mapreduce.Job: Job job_1438916755596_0005 running in uber mode : false
15/08/07 08:29:38 INFO mapreduce.Job:  map 0% reduce 0%
15/08/07 08:29:54 INFO mapreduce.Job:  map 2% reduce 0%
...
15/08/07 08:30:42 INFO mapreduce.Job:  map 34% reduce 0%
15/08/07 08:30:43 INFO mapreduce.Job:  map 35% reduce 4%
15/08/07 08:30:46 INFO mapreduce.Job:  map 36% reduce 5%
15/08/07 08:30:49 INFO mapreduce.Job:  map 37% reduce 7%
15/08/07 08:30:52 INFO mapreduce.Job:  map 38% reduce 9%
15/08/07 08:30:55 INFO mapreduce.Job:  map 38% reduce 11%
15/08/07 08:30:57 INFO mapreduce.Job:  map 39% reduce 11%
15/08/07 08:30:59 INFO mapreduce.Job:  map 41% reduce 11%
...
15/08/07 08:32:51 INFO mapreduce.Job:  map 98% reduce 27%
15/08/07 08:32:53 INFO mapreduce.Job:  map 99% reduce 27%
15/08/07 08:32:54 INFO mapreduce.Job:  map 100% reduce 28%
15/08/07 08:32:57 INFO mapreduce.Job:  map 100% reduce 30%
15/08/07 08:32:59 INFO mapreduce.Job:  map 100% reduce 32%
15/08/07 08:33:02 INFO mapreduce.Job:  map 100% reduce 44%
15/08/07 08:33:05 INFO mapreduce.Job:  map 100% reduce 67%
15/08/07 08:33:21 INFO mapreduce.Job:  map 100% reduce 68%
15/08/07 08:33:39 INFO mapreduce.Job:  map 100% reduce 69%
...
15/08/07 08:43:03 INFO mapreduce.Job:  map 100% reduce 99%
15/08/07 08:43:18 INFO mapreduce.Job:  map 100% reduce 100%
15/08/07 08:43:27 INFO mapreduce.Job: Job job_1438916755596_0005 completed successfully

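The job took 14 min in total. Mapper time is roughly the same as before. The input is about 4GB in total and 66 mappers were launched, so each mapper got about 60MB of data (and the map output does not grow much, since the input is numbers in text form and the intermediate output is a LongWritable key plus an IntWritable value). With the default io.sort.mb of 100MB, that should be sortable entirely in memory without an external merge, so io.sort.factor has little effect on the map phase. Reducer time, however, dropped noticeably, because the main load of this job is on the reducer: it has to merge the outputs of all the mappers, 65 of them (66 launched, 1 killed, hence Shuffled Maps=65). With the original io.sort.factor of 10 they cannot be merged in one go, so at least two rounds are needed (first 65->7, then 7->1). With the factor raised to 100, a single round is enough.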

Job Counters
                Killed map tasks=1
                Launched map tasks=66
                Launched reduce tasks=1
                Data-local map tasks=57
                Rack-local map tasks=9
                Total time spent by all maps in occupied slots (ms)=3727216
                Total time spent by all reduces in occupied slots (ms)=775881
                Total time spent by all map tasks (ms)=3727216
                Total time spent by all reduce tasks (ms)=775881
                Total vcore-seconds taken by all map tasks=3727216
                Total vcore-seconds taken by all reduce tasks=775881
                Total megabyte-seconds taken by all map tasks=3816669184
                Total megabyte-seconds taken by all reduce tasks=794502144

        Map-Reduce Framework
                Map input records=388738323
                Map output records=388738323
                Map output bytes=4664859876
                Map output materialized bytes=5442336912
                Input split bytes=7670
                Combine input records=0
                Combine output records=0
                Reduce input groups=371667599
                Reduce shuffle bytes=5442336912
                Reduce input records=388738323
                Reduce output records=388738323
                Spilled Records=1165028413
                Shuffled Maps =65
                Failed Shuffles=0
                Merged Map outputs=65
                GC time elapsed (ms)=56981
                CPU time spent (ms)=2405160
                Physical memory (bytes) snapshot=17848528896
                Virtual memory (bytes) snapshot=46269399040
                Total committed heap usage (bytes)=13616283648

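The counters show that with the default parameters the reducer needed 1116 s (about 18.6 min), while after tuning io.sort.factor it needed only 775 s (about 12.9 min), a sizeable improvement of about 30%. Since we estimated two merge rounds before and one now, with all other parameters unchanged, one merge round takes roughly (1116 - 775) s = 341 s.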

Spilled Records

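By the standard definition, this is the number of records that mappers and reducers spill to disk while they work. "Spilling" means moving data from some buffer to disk, for example the map-side sort buffer or the reduce-side merge buffer. Under the default configuration, (Spilled Records) / (Reduce output records) is about 3.93, i.e. roughly 4; after tuning io.sort.factor it is 2.99, i.e. roughly 3. So the number of records written to disk along the way dropped by about one output's worth, which is also the number of records a single merge pass writes. This again suggests that the tuning did remove one merge round.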
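The ratios can be recomputed directly from the two runs' counters. The last line shows that the spills saved amount to about 0.93 times the output, close to but not exactly one full merge pass. SpillRatio is a hypothetical name.

```java
// Spilled Records / Reduce output records, from the counters of the first two runs.
final class SpillRatio {
    static final long REDUCE_OUTPUT_RECORDS = 388_738_323L;

    static double ratio(long spilledRecords) {
        return (double) spilledRecords / REDUCE_OUTPUT_RECORDS;
    }

    public static void main(String[] args) {
        long defaults = 1_527_999_455L;   // Spilled Records, default configuration
        long factor100 = 1_165_028_413L;  // Spilled Records, io.sort.factor = 100
        System.out.printf("default: %.2f%n", ratio(defaults));
        System.out.printf("io.sort.factor=100: %.2f%n", ratio(factor100));
        // spills saved, in units of the output size
        System.out.printf("difference: %.2f%n", ratio(defaults - factor100));
    }
}
```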

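One puzzle remains, though: the spilled records amount to three times the number of integers being sorted. By the analysis above, if the map-side buffer can hold all of a mapper's output, the map phase performs exactly one full spill, contributing as many spilled records as there are records. The reducer, by the same analysis, needs only one merge, so the ratio should be 2. I don't know which other write is being counted in Spilled Records.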

Increasing mapper-side sort memory

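That is, change io.sort.mb. Since the mappers are already quick, I don't expect this to help much. Raising the buffer on its own usually fails with an OutOfMemoryError, because by default the task JVM gets at most 200MB of heap, so the JVM memory limits have to be raised as well: mapred.child.java.opts and mapreduce.map.memory.mb. The former passes the heap size to the JVM; the latter describes roughly how much memory the whole JVM process takes (including any processes it spawns), so it must be larger than the former. The three should satisfy io.sort.mb < heap size in mapred.child.java.opts < mapreduce.map.memory.mb. The code to adjust them: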

conf.setInt("io.sort.mb", 500); // set io.sort.mb = 500MB
conf.set("mapred.child.java.opts", "-Xmx800m"); // JVM HEAP = 800MB, default = 200MB

This raises the map output buffer to 500MB and increases the map-side JVM heap accordingly.
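The ordering rule io.sort.mb < heap < mapreduce.map.memory.mb can be turned into a quick check before submitting a job. MapMemoryCheck is a hypothetical helper, not a Hadoop class; it only understands -Xmx values with an m or g suffix (an assumption), and main assumes the Hadoop 2.x default of 1024 for mapreduce.map.memory.mb, since the code above does not set it.

```java
// Hypothetical sanity check: io.sort.mb < -Xmx < mapreduce.map.memory.mb (all in MB).
final class MapMemoryCheck {
    // Returns the last -Xmx value in MB, or -1 if none is found; throws on a malformed number.
    static long xmxMb(String javaOpts) {
        long mb = -1;
        for (String opt : javaOpts.trim().split("\\s+")) {
            if (!opt.startsWith("-Xmx") || opt.length() < 6) {
                continue;
            }
            char unit = Character.toLowerCase(opt.charAt(opt.length() - 1));
            long n = Long.parseLong(opt.substring(4, opt.length() - 1));
            if (unit == 'm') {
                mb = n;
            } else if (unit == 'g') {
                mb = n * 1024;
            }
        }
        return mb;
    }

    static boolean consistent(int ioSortMb, String javaOpts, int mapMemoryMb) {
        long heap = xmxMb(javaOpts);
        return heap > 0 && ioSortMb < heap && heap < mapMemoryMb;
    }

    public static void main(String[] args) {
        System.out.println(consistent(500, "-Xmx800m", 1024)); // true: 500 < 800 < 1024
        System.out.println(consistent(500, "-Xmx200m", 1024)); // false: buffer exceeds the default heap
    }
}
```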

15/08/07 10:50:15 INFO mapreduce.Job: Job job_1438916755596_0008 running in uber mode : false
15/08/07 10:50:15 INFO mapreduce.Job:  map 0% reduce 0%
...
15/08/07 10:51:15 INFO mapreduce.Job:  map 34% reduce 0%
15/08/07 10:51:16 INFO mapreduce.Job:  map 37% reduce 4%
...
15/08/07 10:53:11 INFO mapreduce.Job:  map 99% reduce 15%
15/08/07 10:53:12 INFO mapreduce.Job:  map 100% reduce 15%
...
15/08/07 11:07:17 INFO mapreduce.Job:  map 100% reduce 100%
15/08/07 11:07:26 INFO mapreduce.Job: Job job_1438916755596_0008 completed successfully

Job Counters
                Killed map tasks=1
                Launched map tasks=66
                Launched reduce tasks=1
                Data-local map tasks=57
                Rack-local map tasks=9
                Total time spent by all maps in occupied slots (ms)=3330624
                Total time spent by all reduces in occupied slots (ms)=981770
                Total time spent by all map tasks (ms)=3330624
                Total time spent by all reduce tasks (ms)=981770
                Total vcore-seconds taken by all map tasks=3330624
                Total vcore-seconds taken by all reduce tasks=981770
                Total megabyte-seconds taken by all map tasks=3410558976
                Total megabyte-seconds taken by all reduce tasks=1005332480
        Map-Reduce Framework
                Map input records=388738323
                Map output records=388738323
                Map output bytes=4664859876
                Map output materialized bytes=5442336912
                Input split bytes=7670
                Combine input records=0
                Combine output records=0
                Reduce input groups=371667599
                Reduce shuffle bytes=5442336912
                Reduce input records=388738323
                Reduce output records=388738323
                Spilled Records=891105922
                Shuffled Maps =65
                Failed Shuffles=0
                Merged Map outputs=65
                GC time elapsed (ms)=217041
                CPU time spent (ms)=2437680
                Physical memory (bytes) snapshot=50441203712
                Virtual memory (bytes) snapshot=89002385408
                Total committed heap usage (bytes)=44534071296

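The data show a total time of about 17 min and (Spilled Records) / (Reduce output records) = 2.29, so enlarging mapper-side memory does have some effect (the mappers may have produced more than the 100MB default buffer, although by my estimate each should only occupy about 67MB) and the spilled count drops noticeably. Total time spent by all map tasks is also somewhat lower than in the previous two runs.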
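One possible explanation for the gap between the ~67MB estimate and the observed spills, offered as a hypothesis: in Hadoop 2.x the map output buffer also stores about 16 bytes of accounting metadata per record (my reading of MapOutputBuffer; check your version). Under that assumption, and assuming the 65 map outputs are equal in size, the counters give about 68 MiB of serialized data but about 160 MiB of buffer use per map. That would exceed the 80 MiB spill threshold of the default 100MB buffer yet fit under the 400 MiB threshold with io.sort.mb=500, which would be consistent with the drop in Spilled Records. MapBufferEstimate is a hypothetical name.

```java
// Back-of-the-envelope buffer use per map, from the job counters (assumptions in lead-in).
final class MapBufferEstimate {
    static final long RECORDS = 388_738_323L;        // Map output records
    static final long OUTPUT_BYTES = 4_664_859_876L; // Map output bytes (12 bytes per record)
    static final int MAPS = 65;                      // map outputs shuffled
    static final int META_BYTES_PER_RECORD = 16;     // assumed per-record metadata

    static double mibPerMap(boolean withMetadata) {
        double bytes = (double) OUTPUT_BYTES / MAPS;
        if (withMetadata) {
            bytes += (double) RECORDS / MAPS * META_BYTES_PER_RECORD;
        }
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.printf("serialized only: %.1f MiB per map%n", mibPerMap(false));
        System.out.printf("with metadata:   %.1f MiB per map%n", mibPerMap(true));
        System.out.println("spill thresholds: 80 MiB (io.sort.mb=100), 400 MiB (io.sort.mb=500)");
    }
}
```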

Increasing reducer-side merge memory

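For this job, this should be the parameter that speeds things up the most. Since there is only one reducer, we can give it plenty of memory, say 5GB, so that it can hold most of the map output and the merge can happen in memory. Merge memory can be specified either as a ratio or as an absolute size; here only the ratio is used, and because the ratio is relative to the JVM heap size, both parameters have to be changed together.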

conf.setInt("mapreduce.reduce.memory.mb", 5500);        // JVM process & its sub processes
conf.set("mapreduce.reduce.java.opts", "-Xmx5000m");    // JVM max heap size
conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.9f);       // using percentage
conf.setFloat("mapreduce.reduce.shuffle.merge.percent", 0.9f);        // merge / total
conf.setInt("mapred.inmem.merge.threshold", 0);         // disable the map-output count threshold

Output

15/08/07 14:29:54 INFO mapreduce.Job: Job job_1438916755596_0009 running in uber mode : false
15/08/07 14:29:54 INFO mapreduce.Job:  map 0% reduce 0%
15/08/07 14:30:12 INFO mapreduce.Job:  map 2% reduce 0%
15/08/07 14:30:13 INFO mapreduce.Job:  map 4% reduce 0%
15/08/07 14:30:14 INFO mapreduce.Job:  map 7% reduce 0%
15/08/07 14:30:15 INFO mapreduce.Job:  map 11% reduce 0%
..
15/08/07 14:33:36 INFO mapreduce.Job:  map 98% reduce 19%
15/08/07 14:33:42 INFO mapreduce.Job:  map 99% reduce 19%
15/08/07 14:33:49 INFO mapreduce.Job:  map 100% reduce 19%
...
15/08/07 14:35:31 INFO mapreduce.Job:  map 100% reduce 33%
15/08/07 14:36:37 INFO mapreduce.Job:  map 100% reduce 39%
...
15/08/07 14:44:23 INFO mapreduce.Job:  map 100% reduce 99%
15/08/07 14:44:38 INFO mapreduce.Job:  map 100% reduce 100%
15/08/07 14:44:45 INFO mapreduce.Job: Job job_1438916755596_0009 completed successfully
        Job Counters
                Killed map tasks=1
                Launched map tasks=66
                Launched reduce tasks=1
                Data-local map tasks=63
                Rack-local map tasks=3
                Total time spent by all maps in occupied slots (ms)=3736173
                Total time spent by all reduces in occupied slots (ms)=4700526
                Total time spent by all map tasks (ms)=3736173
                Total time spent by all reduce tasks (ms)=783421
                Total vcore-seconds taken by all map tasks=3736173
                Total vcore-seconds taken by all reduce tasks=783421
                Total megabyte-seconds taken by all map tasks=3825841152
                Total megabyte-seconds taken by all reduce tasks=4308815500
        Map-Reduce Framework
                Map input records=388738323
                Map output records=388738323
                Map output bytes=4664859876
                Map output materialized bytes=5442336912
                Input split bytes=7670
                Combine input records=0
                Combine output records=0
                Reduce input groups=371667599
                Reduce shuffle bytes=5442336912
                Reduce input records=388738323
                Reduce output records=388738323
                Spilled Records=1165028413
                Shuffled Maps =65
                Failed Shuffles=0
                Merged Map outputs=65
                GC time elapsed (ms)=51656
                CPU time spent (ms)=2575510
                Physical memory (bytes) snapshot=21428072448
                Virtual memory (bytes) snapshot=51386806272
                Total committed heap usage (bytes)=16239820800

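The job took about 15 min in total, second only to the 14 min of the io.sort.factor run. However, (Spilled Records) / (Reduce output records) = 2.99, which is very similar to the io.sort.factor run. Something is off here, so let's look at the reducer's log: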

2015-08-08 04:28:55,296 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-08-08 04:28:55,360 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-08-08 04:28:55,360 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system started
2015-08-08 04:28:55,371 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-08-08 04:28:55,371 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1439006457627_0002, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@411e6b7e)
2015-08-08 04:28:55,463 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-08-08 04:28:55,902 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /tmp/hadoop/nm-local-dir/usercache/ubuntu/appcache/application_1439006457627_0002
2015-08-08 04:28:56,675 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-08-08 04:28:57,117 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2015-08-08 04:28:57,160 INFO [main] org.apache.hadoop.mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@38ed6f20
2015-08-08 04:28:57,178 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager: memoryLimit=1932735232, maxSingleShuffleLimit=483183808, mergeThreshold=1739461632, ioSortFactor=10, memToMemMergeOutputsThreshold=10

The last line shows the parameters that the MergeManagerImpl class ended up with:

memoryLimit=1932735232, about 1.9GB

maxSingleShuffleLimit=483183808, about 480MB

mergeThreshold=1739461632, about 1.7GB

ioSortFactor=10, the default value

memToMemMergeOutputsThreshold=10, ignored for now
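Dividing the logged values by one another shows what each is relative to. The ratios point to memoryLimit being about 0.9 × Integer.MAX_VALUE rather than 0.9 × 5000MB, to maxSingleShuffleLimit being exactly 25% of memoryLimit (I believe that is the default mapreduce.reduce.shuffle.memory.limit.percent of 0.25, but treat that as an assumption), and to mergeThreshold being about 0.9 × memoryLimit. MergeManagerRatios is a hypothetical name.

```java
// Ratios between the values printed in the MergerManager log line.
final class MergeManagerRatios {
    static double ratio(long a, long b) {
        return (double) a / b;
    }

    public static void main(String[] args) {
        long memoryLimit = 1932735232L;
        long maxSingleShuffleLimit = 483183808L;
        long mergeThreshold = 1739461632L;
        System.out.println(ratio(memoryLimit, Integer.MAX_VALUE));      // about 0.9
        System.out.println(ratio(maxSingleShuffleLimit, memoryLimit));  // 0.25
        System.out.println(ratio(mergeThreshold, memoryLimit));         // about 0.9
    }
}
```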

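We can see that memoryLimit * 0.9 ≈ 1.7GB, which is close to mergeThreshold, so mapreduce.reduce.shuffle.merge.percent (how full the input buffer gets before a merge to disk starts) did take effect. But we clearly gave the reduce task 5GB of memory in the program, so why is the limit computed from 1.9GB? Did mapreduce.reduce.java.opts not take effect? I reran the job and ran ps on the machine executing the reduce task, and the JVM launch arguments did contain the memory setting. Since the logged value is questionable, let's look at the MergeManagerImpl class to see how it computes this number.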