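References
- Hadoop: The Definitive Guide, Chapter 6, Section 6.4
Background
Hadoop and MapReduce, much like MVC and Spring, are everywhere by now. I have used them, but have I read the source? No. Have I tuned the parameters? Yes, just far enough to get a job to run. Now that I have time to read Hadoop: The Definitive Guide, I realize how many detours I took.
MR workflow
Parameters
Affecting both sides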
io.sort.factor
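The maximum number of streams merged at once in a multi-way merge. A larger value reduces the number of merge rounds and therefore the number of disk reads and writes.
Map side
io.sort.mb
Size of the map-side output buffer. Map output is collected here first, then sorted and partitioned before being written to local disk; the spill files are merged again until the map task finishes and the data is fetched by the reduce side.
io.sort.spill.percent
The fill ratio of the output buffer at which the map side starts spilling to disk. The right value depends on the ratio between the map output rate and the disk write rate; it is an ordinary bounded-buffer producer-consumer problem.
Reduce side
mapred.job.shuffle.input.buffer.percent
Fraction of the JVM heap used as the reduce-side input buffer. If all the data fetched from the map side fits, the map outputs can be merged entirely in memory and fed directly to reduce without being written to disk.
Beware of a pitfall here: when the available JVM heap is larger than Integer.MAX_VALUE bytes (about 2 GB), Integer.MAX_VALUE is silently used as the upper bound that this fraction is applied to. To get past it you have to explicitly set the
mapreduce.reduce.memory.totalbytes
parameter to define the maximum usable heap size.
mapred.job.shuffle.merge.percent
The fill ratio of the reduce-side input buffer at which merging to disk begins. That is, once the data received from the map side exceeds heapsize * mapred.job.shuffle.input.buffer.percent * mapred.job.shuffle.merge.percent,
the reducer starts writing merged output to local disk.
mapred.inmem.merge.threshold
The number of map outputs accumulated in the reduce-side input buffer at which merging to disk begins. This is an absolute count rather than a ratio (and not a size in MB). If it is set to 0, the trigger is determined by the ratio parameters alone.
mapred.job.reduce.input.buffer.percent
Fraction of the heap that may still hold buffered input when the reduce phase begins. The input to the final reduce pass does not have to come entirely from disk: through the multi-way merge it can read from disk or from memory (the in-memory data being map output that has been merged but not yet written to disk). Setting it to 0 forces the buffer to be emptied, writing all merged results to disk.
There is a pitfall here as well: as above, the resulting size never exceeds 2 GB, and there is no additional parameter to work around it.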
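The two fill-ratio thresholds described in this section are simple products. A minimal sketch in plain Java (no Hadoop dependency): the class and method names are made up for illustration, and the ratio defaults used in main (0.80 for io.sort.spill.percent, 0.70 and 0.66 for the two reduce-side shuffle ratios) are assumed defaults that this post does not state, so check them against your own cluster configuration.

```java
// Hypothetical helper, not part of Hadoop: computes the buffer fill levels
// at which spilling (map side) and merging to disk (reduce side) start.
class ShuffleThresholds {
    // Map side: io.sort.mb * io.sort.spill.percent
    static double mapSpillThresholdMb(int ioSortMb, double spillPercent) {
        return ioSortMb * spillPercent;
    }

    // Reduce side: heap * shuffle.input.buffer.percent * shuffle.merge.percent
    static double reduceMergeThresholdMb(double heapMb,
                                         double inputBufferPercent,
                                         double mergePercent) {
        return heapMb * inputBufferPercent * mergePercent;
    }

    public static void main(String[] args) {
        // 100 MB buffer and 200 MB heap are the defaults mentioned in this post;
        // the three ratios are assumed defaults.
        System.out.println("map spill threshold (MB):    "
                + mapSpillThresholdMb(100, 0.80));
        System.out.println("reduce merge threshold (MB): "
                + reduceMergeThresholdMb(200, 0.70, 0.66));
    }
}
```

With a small default heap the reduce-side threshold is well below 100 MB, which is why almost everything fetched by a single reducer in a multi-gigabyte job ends up on disk.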
Experiment
Sort 4 GB of integers (4 GB being the size of the files that hold the numbers, stored as text).
Environment
Hadoop 2.6.0
1 Namenode + 1 ResourceManager + 3 DataNode&NodeManager
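Preparation
Use the following command (run repeatedly, since each invocation appends one integer) to produce 4 text files containing random integers: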
echo $(od -An -N4 -i /dev/urandom) >> out.data
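Each generated file is about 1 GB. Because this uses the Linux random number generator, generating the data is rather slow; it can be done on the four machines separately, and the resulting data files are then uploaded to the sort_integer folder on HDFS (under the current user's home directory):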
ubuntu@dev00:~/sort-mr$ hadoop fs -ls sort_integer
Found 4 items
-rw-r--r-- 1 ubuntu supergroup 1043473519 2015-08-06 13:05 sort_integer/int00.data
-rw-r--r-- 1 ubuntu supergroup 1075257196 2015-08-06 13:07 sort_integer/int01.data
-rw-r--r-- 1 ubuntu supergroup 1063854482 2015-08-06 13:08 sort_integer/int02.data
-rw-r--r-- 1 ubuntu supergroup 1086774112 2015-08-06 13:08 sort_integer/int03.data
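Since the od-based command is slow, test data in the same one-integer-per-line format could also be produced by a small program. The following is only a sketch of an alternative, not what was used for the files listed above: the class name RandomIntFile is hypothetical, and it uses java.util.Random (a pseudo-random generator) instead of /dev/urandom, which should be good enough for a sorting benchmark.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Random;

// Hypothetical helper (not from the original experiment): writes `count`
// random signed 32-bit integers as text, one per line.
class RandomIntFile {
    static void write(Path out, long count, long seed) throws IOException {
        Random rnd = new Random(seed);
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            for (long i = 0; i < count; i++) {
                w.write(Integer.toString(rnd.nextInt()));
                w.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: RandomIntFile <output file> <number of integers>");
            return;
        }
        write(Paths.get(args[0]), Long.parseLong(args[1]), System.nanoTime());
    }
}
```

At roughly 11 bytes per line, about 97 million integers give a file of around 1 GB.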
MapReduce program
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import java.io.IOException;
class SortMapper extends Mapper<Object, Text, LongWritable, IntWritable> {
    // Reused across map() calls to avoid allocating an object per record.
    private final LongWritable num = new LongWritable(0);
    private final IntWritable one = new IntWritable(1);

    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line holds one integer in text form.
        num.set(Long.parseLong(value.toString().trim()));
        context.write(num, one);
    }
}

class SortReducer extends Reducer<LongWritable, IntWritable, LongWritable, NullWritable> {
    @Override
    protected void reduce(LongWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Count how many times this number occurred ...
        int count = 0;
        for (IntWritable i : values) {
            count += i.get();
        }
        // ... and emit it that many times so duplicates are preserved.
        for (int i = 0; i < count; i++) {
            context.write(key, NullWritable.get());
        }
    }
}

public class SortMR {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("sort <input file/dir> <output dir>");
            return;
        }
        Configuration conf = new Configuration();
        // Per-experiment tuning parameters are set here,
        // e.g. conf.setInt("io.sort.factor", 100);
        Job job = Job.getInstance(conf, "sort-int");
        job.setJarByClass(SortMR.class);
        job.setMapperClass(SortMapper.class);
        job.setReducerClass(SortReducer.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(NullWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // A single reducer yields one globally sorted output file.
        job.setNumReduceTasks(1);
        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Experiment cases
Running with the default configuration
15/08/07 04:38:03 INFO mapreduce.Job: Running job: job_1438916755596_0002
15/08/07 04:38:10 INFO mapreduce.Job: Job job_1438916755596_0002 running in uber mode : false
15/08/07 04:38:10 INFO mapreduce.Job: map 0% reduce 0%
15/08/07 04:38:26 INFO mapreduce.Job: map 3% reduce 0%
15/08/07 04:38:27 INFO mapreduce.Job: map 4% reduce 0%
...
15/08/07 04:39:14 INFO mapreduce.Job: map 33% reduce 0%
15/08/07 04:39:15 INFO mapreduce.Job: map 34% reduce 0%
15/08/07 04:39:17 INFO mapreduce.Job: map 36% reduce 0%
15/08/07 04:39:18 INFO mapreduce.Job: map 36% reduce 4%
15/08/07 04:39:19 INFO mapreduce.Job: map 37% reduce 4%
15/08/07 04:39:21 INFO mapreduce.Job: map 37% reduce 5%
15/08/07 04:39:24 INFO mapreduce.Job: map 37% reduce 7%
15/08/07 04:39:28 INFO mapreduce.Job: map 38% reduce 9%
...
15/08/07 04:41:06 INFO mapreduce.Job: map 92% reduce 25%
15/08/07 04:41:08 INFO mapreduce.Job: map 93% reduce 26%
15/08/07 04:41:09 INFO mapreduce.Job: map 94% reduce 26%
15/08/07 04:41:14 INFO mapreduce.Job: map 95% reduce 27%
15/08/07 04:41:17 INFO mapreduce.Job: map 96% reduce 28%
15/08/07 04:41:19 INFO mapreduce.Job: map 97% reduce 28%
15/08/07 04:41:21 INFO mapreduce.Job: map 98% reduce 28%
15/08/07 04:41:23 INFO mapreduce.Job: map 99% reduce 29%
15/08/07 04:41:26 INFO mapreduce.Job: map 100% reduce 31%
15/08/07 04:41:29 INFO mapreduce.Job: map 100% reduce 32%
15/08/07 04:41:32 INFO mapreduce.Job: map 100% reduce 33%
15/08/07 04:46:22 INFO mapreduce.Job: map 100% reduce 34%
15/08/07 04:46:25 INFO mapreduce.Job: map 100% reduce 35%
15/08/07 04:46:28 INFO mapreduce.Job: map 100% reduce 36%
...
15/08/07 04:56:56 INFO mapreduce.Job: map 100% reduce 98%
15/08/07 04:57:14 INFO mapreduce.Job: map 100% reduce 99%
15/08/07 04:57:32 INFO mapreduce.Job: map 100% reduce 100%
15/08/07 04:57:42 INFO mapreduce.Job: Job job_1438916755596_0002 completed successfully
The whole job took about 20 minutes. The mappers all finished within about 3 minutes, while the reducer took considerably longer.
Job Counters
Killed map tasks=1
Launched map tasks=66
Launched reduce tasks=1
Data-local map tasks=51
Rack-local map tasks=15
Total time spent by all maps in occupied slots (ms)=3658033
Total time spent by all reduces in occupied slots (ms)=1116041
Total time spent by all map tasks (ms)=3658033
Total time spent by all reduce tasks (ms)=1116041
Total vcore-seconds taken by all map tasks=3658033
Total vcore-seconds taken by all reduce tasks=1116041
Total megabyte-seconds taken by all map tasks=3745825792
Total megabyte-seconds taken by all reduce tasks=1142825984
Map-Reduce Framework
Map input records=388738323
Map output records=388738323
Map output bytes=4664859876
Map output materialized bytes=5442336912
Input split bytes=7670
Combine input records=0
Combine output records=0
Reduce input groups=371667599
Reduce shuffle bytes=5442336912
Reduce input records=388738323
Reduce output records=388738323
Spilled Records=1527999455
Shuffled Maps =65
Failed Shuffles=0
Merged Map outputs=65
GC time elapsed (ms)=59949
CPU time spent (ms)=2838000
Physical memory (bytes) snapshot=17842167808
Virtual memory (bytes) snapshot=46181015552
Total committed heap usage (bytes)=13626769408
The counters show 388738323 input records in total and 388738323 records in the final reduce output. The two match, so at least the record count is correct.
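The same counters also give the per-record size of the intermediate data, which is useful when estimating buffer usage. A small sketch using only the counter values printed above (the class name is made up for illustration): Map output bytes divided by Map output records comes out at exactly 12 bytes, which matches an 8-byte LongWritable key plus a 4-byte IntWritable value. The roughly 2 extra bytes per record in the materialized size are presumably per-record framing such as length prefixes; that interpretation is my assumption, not something the counters state.

```java
// Derives per-record sizes from the job counters of the default run.
class RecordSize {
    static final long MAP_OUTPUT_RECORDS = 388_738_323L;
    static final long MAP_OUTPUT_BYTES = 4_664_859_876L;
    static final long MATERIALIZED_BYTES = 5_442_336_912L;

    // Serialized key + value bytes per record.
    static long payloadBytesPerRecord() {
        return MAP_OUTPUT_BYTES / MAP_OUTPUT_RECORDS;
    }

    // Extra bytes per record in the files actually written for the shuffle
    // (integer division; a few hundred bytes of remainder are ignored).
    static long overheadBytesPerRecord() {
        return (MATERIALIZED_BYTES - MAP_OUTPUT_BYTES) / MAP_OUTPUT_RECORDS;
    }

    public static void main(String[] args) {
        System.out.println("payload bytes/record  = " + payloadBytesPerRecord());
        System.out.println("overhead bytes/record = " + overheadBytesPerRecord());
    }
}
```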
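Increasing the merge width
That is, changing io.sort.factor, the number of streams merged at once, to cut down on repeated merge passes that write to and read from disk. The larger this factor, the fewer merge rounds are needed.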
conf.setInt("io.sort.factor", 100);
Following the example in the book, set it to 100 first.
15/08/07 08:29:31 INFO mapreduce.Job: Running job: job_1438916755596_0005
15/08/07 08:29:38 INFO mapreduce.Job: Job job_1438916755596_0005 running in uber mode : false
15/08/07 08:29:38 INFO mapreduce.Job: map 0% reduce 0%
15/08/07 08:29:54 INFO mapreduce.Job: map 2% reduce 0%
...
15/08/07 08:30:42 INFO mapreduce.Job: map 34% reduce 0%
15/08/07 08:30:43 INFO mapreduce.Job: map 35% reduce 4%
15/08/07 08:30:46 INFO mapreduce.Job: map 36% reduce 5%
15/08/07 08:30:49 INFO mapreduce.Job: map 37% reduce 7%
15/08/07 08:30:52 INFO mapreduce.Job: map 38% reduce 9%
15/08/07 08:30:55 INFO mapreduce.Job: map 38% reduce 11%
15/08/07 08:30:57 INFO mapreduce.Job: map 39% reduce 11%
15/08/07 08:30:59 INFO mapreduce.Job: map 41% reduce 11%
...
15/08/07 08:32:51 INFO mapreduce.Job: map 98% reduce 27%
15/08/07 08:32:53 INFO mapreduce.Job: map 99% reduce 27%
15/08/07 08:32:54 INFO mapreduce.Job: map 100% reduce 28%
15/08/07 08:32:57 INFO mapreduce.Job: map 100% reduce 30%
15/08/07 08:32:59 INFO mapreduce.Job: map 100% reduce 32%
15/08/07 08:33:02 INFO mapreduce.Job: map 100% reduce 44%
15/08/07 08:33:05 INFO mapreduce.Job: map 100% reduce 67%
15/08/07 08:33:21 INFO mapreduce.Job: map 100% reduce 68%
15/08/07 08:33:39 INFO mapreduce.Job: map 100% reduce 69%
...
15/08/07 08:43:03 INFO mapreduce.Job: map 100% reduce 99%
15/08/07 08:43:18 INFO mapreduce.Job: map 100% reduce 100%
15/08/07 08:43:27 INFO mapreduce.Job: Job job_1438916755596_0005 completed successfully
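The total time was 14 min. The mapper time is roughly the same as before: the data totals about 4 GB and 66 mappers were launched, so each mapper was assigned about 60 MB (the mapper output does not inflate either, because the input is numbers as text while the intermediate output is LongWritable). With the default io.sort.mb of 100 MB, data of this size can be sorted entirely in memory without an external merge, so io.sort.factor has no effect on the map phase. The reducer time, however, dropped noticeably. The main load of this MR job is on the reducer side, which has to merge the data produced by the mappers, meaning at least 66 mapper outputs. With the original io.sort.factor of 10 this cannot be done in one pass and takes at least two merge rounds (first round 66 -> 7, second round 7 -> 1). With the parameter raised to 100, a single round is enough.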
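The round counts in this estimate can be reproduced with a very simplified model of a multi-way merge, in which every round merges groups of up to `factor` segments until one segment is left. This is only a sketch of that model (the class and method names are hypothetical); Hadoop's actual merger may size its passes differently, so the result is an estimate rather than a description of the real implementation.

```java
// Simplified model of multi-way merging, not Hadoop's actual Merger logic.
class MergeRounds {
    // Number of rounds needed to reduce `segments` sorted runs to one,
    // merging at most `factor` runs into one per group in each round.
    static int rounds(int segments, int factor) {
        if (segments < 1 || factor < 2) {
            throw new IllegalArgumentException("need segments >= 1 and factor >= 2");
        }
        int rounds = 0;
        while (segments > 1) {
            segments = (segments + factor - 1) / factor; // ceil(segments / factor)
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println("66 segments, factor 10:  " + rounds(66, 10) + " rounds");
        System.out.println("66 segments, factor 100: " + rounds(66, 100) + " rounds");
    }
}
```

Under this model a factor of 10 needs two rounds for 66 map outputs, and any factor of 66 or more needs only one.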
Job Counters
Killed map tasks=1
Launched map tasks=66
Launched reduce tasks=1
Data-local map tasks=57
Rack-local map tasks=9
Total time spent by all maps in occupied slots (ms)=3727216
Total time spent by all reduces in occupied slots (ms)=775881
Total time spent by all map tasks (ms)=3727216
Total time spent by all reduce tasks (ms)=775881
Total vcore-seconds taken by all map tasks=3727216
Total vcore-seconds taken by all reduce tasks=775881
Total megabyte-seconds taken by all map tasks=3816669184
Total megabyte-seconds taken by all reduce tasks=794502144
Map-Reduce Framework
Map input records=388738323
Map output records=388738323
Map output bytes=4664859876
Map output materialized bytes=5442336912
Input split bytes=7670
Combine input records=0
Combine output records=0
Reduce input groups=371667599
Reduce shuffle bytes=5442336912
Reduce input records=388738323
Reduce output records=388738323
Spilled Records=1165028413
Shuffled Maps =65
Failed Shuffles=0
Merged Map outputs=65
GC time elapsed (ms)=56981
CPU time spent (ms)=2405160
Physical memory (bytes) snapshot=17848528896
Virtual memory (bytes) snapshot=46269399040
Total committed heap usage (bytes)=13616283648
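From the counters, the reducer needed 1116 s (about 18 min) with the default parameters and only 775 s (about 13 min) after adjusting io.sort.factor. That is a substantial improvement: the time dropped by 30%. Since we estimated two merge rounds before and only one now, with all other parameters unchanged, one merge round can be inferred to take about (1116 - 775) s = 341 s.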
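The same arithmetic can be done on the exact millisecond counters (Total time spent by all reduce tasks) instead of the rounded seconds. A tiny sketch, with a made-up class name, using only values from the two counter dumps above:

```java
// Reducer time before and after raising io.sort.factor, from the job counters.
class ReducerSpeedup {
    static final long DEFAULT_MS = 1_116_041L;    // default configuration
    static final long FACTOR_100_MS = 775_881L;   // io.sort.factor = 100

    static long savedMs() {
        return DEFAULT_MS - FACTOR_100_MS;
    }

    static double savedFraction() {
        return (double) savedMs() / DEFAULT_MS;
    }

    public static void main(String[] args) {
        System.out.printf("saved %d ms (%.1f%% of the default reducer time)%n",
                savedMs(), savedFraction() * 100);
    }
}
```

The exact difference is about 340 s; the 341 s in the text comes from subtracting the already rounded values.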
Spilled Records
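By the standard definition, this is the number of records spilled to disk by the mappers and reducers while they work. Spilling means writing from some buffer to disk, such as the sort buffer after map or the merge buffer on the reducer side. With the default configuration, (Spilled Records) / (Reduce output records) is about 3.93, close to 4; after adjusting io.sort.factor
the ratio is about 2.997, close to 3. In other words, the number of records written to disk along the way dropped by the number of output records, which is exactly what one merge round writes, so this also suggests that the parameter change did remove one merge round.
But there is a problem: the number of spilled records is three times the number of integers being sorted. By the earlier analysis, if map-side memory can hold the whole mapper output, the map phase does only one full spill, whose total equals the record count. The reducer side should also need only one merge, so the ratio should be 2. I do not know which other writes are also counted in Spilled Records.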
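For reference, the two ratios quoted here can be recomputed directly from the counter dumps of the first two runs. A minimal sketch (the class name is hypothetical):

```java
import java.util.Locale;

// Ratio of Spilled Records to Reduce output records for the first two runs.
class SpillRatio {
    static final long REDUCE_OUTPUT_RECORDS = 388_738_323L;
    static final long SPILLED_DEFAULT = 1_527_999_455L;     // default configuration
    static final long SPILLED_FACTOR_100 = 1_165_028_413L;  // io.sort.factor = 100

    static double ratio(long spilledRecords) {
        return (double) spilledRecords / REDUCE_OUTPUT_RECORDS;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "default:    %.2f", ratio(SPILLED_DEFAULT)));
        System.out.println(String.format(Locale.ROOT, "factor=100: %.2f", ratio(SPILLED_FACTOR_100)));
    }
}
```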
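Increasing mapper-side sort memory
That is, changing the value of io.sort.mb. Since the mappers are already quick, this parameter probably will not help much. Raising the buffer size on its own usually fails with an OutOfMemory error, because the default configuration gives the JVM a maximum of 200 MB. The JVM memory limits therefore have to be raised as well, namely mapred.child.java.opts
and mapreduce.map.memory.mb.
The former carries the heap-size argument passed to the JVM; the latter describes roughly how much the whole JVM process will occupy (including processes it spawns), so the latter must be larger than the former. The three should satisfy io.sort.mb < mapred.child.java.opts < mapreduce.map.memory.mb.
The code to adjust the parameters: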
conf.setInt("io.sort.mb", 500); // set io.sort.mb = 500MB
conf.set("mapred.child.java.opts", "-Xmx800m"); // JVM HEAP = 800MB, default = 200MB
This sets the map output buffer to 500 MB and raises the map-side JVM heap accordingly.
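The ordering constraint between the three settings can be checked before submitting a job. The following is a sketch of such a check in plain Java; the class and its methods are hypothetical and not part of Hadoop, and it only understands a plain "-Xmx<N>m" option. The container size of 1024 MB used in main is what the counters of these runs imply (megabyte-seconds divided by vcore-seconds for the map tasks).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-flight check: io.sort.mb < JVM heap < container memory.
class MapMemoryCheck {
    private static final Pattern XMX_MB = Pattern.compile("-Xmx(\\d+)m");

    // Extracts the heap size in MB from options such as "-Xmx800m".
    static int heapMb(String javaOpts) {
        Matcher m = XMX_MB.matcher(javaOpts);
        if (!m.find()) {
            throw new IllegalArgumentException("no -Xmx<N>m option in: " + javaOpts);
        }
        return Integer.parseInt(m.group(1));
    }

    static boolean consistent(int ioSortMb, String javaOpts, int containerMb) {
        int heap = heapMb(javaOpts);
        return ioSortMb < heap && heap < containerMb;
    }

    public static void main(String[] args) {
        // Settings used in this experiment: fits.
        System.out.println(consistent(500, "-Xmx800m", 1024));
        // 500 MB sort buffer with the default 200 MB heap: does not fit.
        System.out.println(consistent(500, "-Xmx200m", 1024));
    }
}
```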
15/08/07 10:50:15 INFO mapreduce.Job: Job job_1438916755596_0008 running in uber mode : false
15/08/07 10:50:15 INFO mapreduce.Job: map 0% reduce 0%
...
15/08/07 10:51:15 INFO mapreduce.Job: map 34% reduce 0%
15/08/07 10:51:16 INFO mapreduce.Job: map 37% reduce 4%
...
15/08/07 10:53:11 INFO mapreduce.Job: map 99% reduce 15%
15/08/07 10:53:12 INFO mapreduce.Job: map 100% reduce 15%
...
15/08/07 11:07:17 INFO mapreduce.Job: map 100% reduce 100%
15/08/07 11:07:26 INFO mapreduce.Job: Job job_1438916755596_0008 completed successfully
Job Counters
Killed map tasks=1
Launched map tasks=66
Launched reduce tasks=1
Data-local map tasks=57
Rack-local map tasks=9
Total time spent by all maps in occupied slots (ms)=3330624
Total time spent by all reduces in occupied slots (ms)=981770
Total time spent by all map tasks (ms)=3330624
Total time spent by all reduce tasks (ms)=981770
Total vcore-seconds taken by all map tasks=3330624
Total vcore-seconds taken by all reduce tasks=981770
Total megabyte-seconds taken by all map tasks=3410558976
Total megabyte-seconds taken by all reduce tasks=1005332480
Map-Reduce Framework
Map input records=388738323
Map output records=388738323
Map output bytes=4664859876
Map output materialized bytes=5442336912
Input split bytes=7670
Combine input records=0
Combine output records=0
Reduce input groups=371667599
Reduce shuffle bytes=5442336912
Reduce input records=388738323
Reduce output records=388738323
Spilled Records=891105922
Shuffled Maps =65
Failed Shuffles=0
Merged Map outputs=65
GC time elapsed (ms)=217041
CPU time spent (ms)=2437680
Physical memory (bytes) snapshot=50441203712
Virtual memory (bytes) snapshot=89002385408
Total committed heap usage (bytes)=44534071296
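From the data, the total time was about 17 min and (Spilled Records) / (Reduce output records) = 2.29, so giving the mappers more memory does have some effect (a mapper may have produced more data than the default 100 MB buffer, although by estimate it should only occupy about 67 MB), and the spilled count drops noticeably. Total time spent by all map tasks is also somewhat lower than in the previous two runs.
Increasing reducer-side merge memory
In this MR job, this should be the setting with the most potential to speed things up. Since there is only one reducer, we can give it more memory, say 5 GB, so that it can hold most of the mapper output and the merge can happen in memory. The memory used for merging can be set either as a ratio or as an absolute value; here only the ratio is used. Because the ratio is relative to the JVM heap size, both kinds of parameters have to be changed together.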
conf.setInt("mapreduce.reduce.memory.mb", 5500); // JVM process & its sub processes
conf.set("mapreduce.reduce.java.opts", "-Xmx5000m"); // JVM max heap size
conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.9f); // using percentage
conf.setFloat("mapreduce.reduce.shuffle.merge.percent", 0.9f); // merge / total
conf.setInt("mapred.inmem.merge.threshold", 0); // disable value threshold
Run output
15/08/07 14:29:54 INFO mapreduce.Job: Job job_1438916755596_0009 running in uber mode : false
15/08/07 14:29:54 INFO mapreduce.Job: map 0% reduce 0%
15/08/07 14:30:12 INFO mapreduce.Job: map 2% reduce 0%
15/08/07 14:30:13 INFO mapreduce.Job: map 4% reduce 0%
15/08/07 14:30:14 INFO mapreduce.Job: map 7% reduce 0%
15/08/07 14:30:15 INFO mapreduce.Job: map 11% reduce 0%
...
15/08/07 14:33:36 INFO mapreduce.Job: map 98% reduce 19%
15/08/07 14:33:42 INFO mapreduce.Job: map 99% reduce 19%
15/08/07 14:33:49 INFO mapreduce.Job: map 100% reduce 19%
...
15/08/07 14:35:31 INFO mapreduce.Job: map 100% reduce 33%
15/08/07 14:36:37 INFO mapreduce.Job: map 100% reduce 39%
...
15/08/07 14:44:23 INFO mapreduce.Job: map 100% reduce 99%
15/08/07 14:44:38 INFO mapreduce.Job: map 100% reduce 100%
15/08/07 14:44:45 INFO mapreduce.Job: Job job_1438916755596_0009 completed successfully
Job Counters
Killed map tasks=1
Launched map tasks=66
Launched reduce tasks=1
Data-local map tasks=63
Rack-local map tasks=3
Total time spent by all maps in occupied slots (ms)=3736173
Total time spent by all reduces in occupied slots (ms)=4700526
Total time spent by all map tasks (ms)=3736173
Total time spent by all reduce tasks (ms)=783421
Total vcore-seconds taken by all map tasks=3736173
Total vcore-seconds taken by all reduce tasks=783421
Total megabyte-seconds taken by all map tasks=3825841152
Total megabyte-seconds taken by all reduce tasks=4308815500
Map-Reduce Framework
Map input records=388738323
Map output records=388738323
Map output bytes=4664859876
Map output materialized bytes=5442336912
Input split bytes=7670
Combine input records=0
Combine output records=0
Reduce input groups=371667599
Reduce shuffle bytes=5442336912
Reduce input records=388738323
Reduce output records=388738323
Spilled Records=1165028413
Shuffled Maps =65
Failed Shuffles=0
Merged Map outputs=65
GC time elapsed (ms)=51656
CPU time spent (ms)=2575510
Physical memory (bytes) snapshot=21428072448
Virtual memory (bytes) snapshot=51386806272
Total committed heap usage (bytes)=16239820800
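The total time was about 15 min, second only to the 14 min from adjusting io.sort.factor.
However, (Spilled Records) / (Reduce output records) = 2.99, which is very similar to the io.sort.factor
run. Something is still off here, so let's look at the reducer-side log: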
2015-08-08 04:28:55,296 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-08-08 04:28:55,360 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-08-08 04:28:55,360 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ReduceTask metrics system started
2015-08-08 04:28:55,371 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2015-08-08 04:28:55,371 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1439006457627_0002, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@411e6b7e)
2015-08-08 04:28:55,463 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2015-08-08 04:28:55,902 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /tmp/hadoop/nm-local-dir/usercache/ubuntu/appcache/application_1439006457627_0002
2015-08-08 04:28:56,675 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2015-08-08 04:28:57,117 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2015-08-08 04:28:57,160 INFO [main] org.apache.hadoop.mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@38ed6f20
2015-08-08 04:28:57,178 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager: memoryLimit=1932735232, maxSingleShuffleLimit=483183808, mergeThreshold=1739461632, ioSortFactor=10, memToMemMergeOutputsThreshold=10
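The last line shows some of the parameters received by the MergeManagerImpl class:
memoryLimit=1932735232, about 1.9 GB
maxSingleShuffleLimit=483183808, about 480 MB
mergeThreshold=1739461632, about 1.7 GB
ioSortFactor=10, the default value
memToMemMergeOutputsThreshold=10, ignored for now
We can see that memoryLimit * 0.9 = 1.7 GB is close to mergeThreshold, so mapreduce.reduce.shuffle.merge.percent
did take effect (that is, how full the input buffer gets before merging and writing out begins). But we clearly gave the reducer task 5 GB of memory in the program, so why is the limit computed from 1.9 GB? Did the mapreduce.reduce.java.opts
parameter not take effect? After rerunning the job, running ps on the machine executing the reducer task showed that the JVM startup arguments did include the memory settings. Since the logged values are in doubt, the next step is to look at the MergeManagerImpl
class and see how it computes them.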
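Before reading the source, a quick sanity check shows that the logged numbers are consistent with the 2 GB cap mentioned in the parameter section. The sketch below is not copied from MergeManagerImpl; it assumes that the limit is computed as min(max heap, Integer.MAX_VALUE) times the buffer fraction in float arithmetic, that the heap is the 5000 MB requested via -Xmx5000m, and that the per-fetch limit is 0.25 of memoryLimit (an assumed default). The class and method names are made up for illustration.

```java
// Reproduces the values in the MergerManager log line under the stated
// assumptions; a sketch, not Hadoop's actual code.
class ShuffleLimits {
    // long * float is evaluated in float arithmetic, which is part of the
    // assumed formula and explains the exact rounded values in the log.
    static long memoryLimit(long maxHeapBytes, float inputBufferPercent) {
        return (long) (Math.min(maxHeapBytes, Integer.MAX_VALUE) * inputBufferPercent);
    }

    static long mergeThreshold(long memoryLimit, float mergePercent) {
        return (long) (memoryLimit * mergePercent);
    }

    static long maxSingleShuffleLimit(long memoryLimit, float singleShufflePercent) {
        return (long) (memoryLimit * singleShufflePercent);
    }

    public static void main(String[] args) {
        long heap = 5000L * 1024 * 1024;           // -Xmx5000m, assumed exact
        long limit = memoryLimit(heap, 0.9f);      // shuffle.input.buffer.percent
        System.out.println("memoryLimit=" + limit);
        System.out.println("maxSingleShuffleLimit=" + maxSingleShuffleLimit(limit, 0.25f));
        System.out.println("mergeThreshold=" + mergeThreshold(limit, 0.9f));
    }
}
```

Under these assumptions the three values come out identical to the log, which suggests that the 5 GB heap was in effect but was clamped to Integer.MAX_VALUE before the 0.9 fraction was applied.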