Methods of the Mapper class in the source code
/**
 * The Context passed on to the {@link Mapper} implementations.
 */
public abstract class Context implements MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
}
The Context is what map output goes through: once a map call finishes, data is written via the Context to the reduce phase (or the next stage).
/**
 * Called once at the beginning of the task.
 */
protected void setup(Context context) throws IOException, InterruptedException {
  // NOTHING
}
Called once when the task starts.
/**
 * Called once for each key/value pair in the input split. Most applications
 * should override this, but the default is the identity function.
 * (The key and value here are the input pair.)
 */
@SuppressWarnings("unchecked")
protected void map(KEYIN key, VALUEIN value, Context context)
    throws IOException, InterruptedException {
  // Write the output key/value pair; the context manages the output
  context.write((KEYOUT) key, (VALUEOUT) value);
}
Handles the core business logic of the entire map phase.
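To see the contrast between the default identity map and a typical override, here is a plain-Java sketch with no Hadoop dependency; the class and method names are illustrative only, not Hadoop's API:

```java
import java.util.Map;

// Sketch: the default map passes the pair through unchanged; an
// override transforms it before "writing" it out.
public class MapOverrideSketch {
    // Mirrors the default: context.write((KEYOUT) key, (VALUEOUT) value)
    static Map.Entry<String, String> identityMap(String key, String value) {
        return Map.entry(key, value);
    }

    // A typical override: transform the value before emitting it
    static Map.Entry<String, String> upperCaseMap(String key, String value) {
        return Map.entry(key, value.toUpperCase());
    }

    public static void main(String[] args) {
        System.out.println(identityMap("k", "hadoop"));  // k=hadoop
        System.out.println(upperCaseMap("k", "hadoop")); // k=HADOOP
    }
}
```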
/**
 * Called once at the end of the task.
 */
protected void cleanup(Context context) throws IOException, InterruptedException {
  // NOTHING
}
Called once when the task ends.
/**
 * Expert users can override this method for more complete control over the
 * execution of the Mapper.
 * @param context
 * @throws IOException
 */
public void run(Context context) throws IOException, InterruptedException {
  // Initialize data (set up collections, load tables, etc.)
  setup(context);
  try {
    while (context.nextKeyValue()) {
      // Core business logic
      map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
  } finally {
    // Final teardown: close streams, release resources
    cleanup(context);
  }
}
}
This defines the exact order in which the map lifecycle methods are executed.
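The call order enforced by run() can be demonstrated with a plain-Java sketch (no Hadoop dependency): setup once, map once per input pair, cleanup once, with the try/finally guaranteeing cleanup even on failure. The names mirror Hadoop's but this is an illustration, not the real API:

```java
import java.util.*;

// Sketch of Mapper.run()'s lifecycle: records which method is
// called, in what order, as the run loop drives the input.
public class MapperRunSketch {
    static List<String> calls = new ArrayList<>();

    static void setup()                 { calls.add("setup"); }
    static void map(String k, String v) { calls.add("map:" + k); }
    static void cleanup()               { calls.add("cleanup"); }

    // Mirrors Mapper.run(Context): the while loop stands in for
    // context.nextKeyValue(), and finally guarantees cleanup.
    static void run(Iterator<Map.Entry<String, String>> input) {
        setup();
        try {
            while (input.hasNext()) {
                Map.Entry<String, String> kv = input.next();
                map(kv.getKey(), kv.getValue()); // core business logic
            }
        } finally {
            cleanup(); // close streams, release resources
        }
    }

    public static void main(String[] args) {
        run(Map.of("k1", "v1").entrySet().iterator());
        System.out.println(calls); // [setup, map:k1, cleanup]
    }
}
```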
The Reducer class
/**
 * The Context passed on to the {@link Reducer} implementations.
 */
public abstract class Context implements ReduceContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
}
Responsible for writing out the data.
/**
 * Called once at the start of the task.
 */
protected void setup(Context context) throws IOException, InterruptedException {
  // NOTHING
}
Called at the start of the task; initialization work goes here.
/**
 * This method is called once for each key. Most applications will define
 * their reduce class by overriding this method. The default implementation
 * is an identity function.
 */
@SuppressWarnings("unchecked")
protected void reduce(KEYIN key, Iterable<VALUEIN> values, Context context)
    throws IOException, InterruptedException {
  for (VALUEIN value : values) {
    context.write((KEYOUT) key, (VALUEOUT) value);
  }
}
The concrete business logic of the Reducer goes here.
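A typical override aggregates all values for one key instead of passing them through, as in word count. Here is a plain-Java sketch with no Hadoop dependency; the names are illustrative, and the aggregated pair is returned instead of being written via a Context:

```java
import java.util.*;

// Sketch of a typical reduce override: sum every value that arrived
// for one key, mirroring reduce(KEYIN, Iterable<VALUEIN>, Context).
public class SumReduceSketch {
    static Map.Entry<String, Integer> reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v; // aggregate instead of the identity pass-through
        }
        return Map.entry(key, sum); // stands in for context.write(key, sum)
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> out = reduce("hadoop", List.of(1, 1, 1));
        System.out.println(out.getKey() + "=" + out.getValue()); // hadoop=3
    }
}
```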
/**
 * Called once at the end of the task.
 */
protected void cleanup(Context context) throws IOException, InterruptedException {
  // NOTHING
}
Teardown work, such as closing streams.
/**
 * Advanced application writers can use the
 * {@link #run(org.apache.hadoop.mapreduce.Reducer.Context)} method to
 * control how the reduce task works.
 */
public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  try {
    while (context.nextKey()) {
      reduce(context.getCurrentKey(), context.getValues(), context);
      // If a backup store is used, reset it
      Iterator<VALUEIN> iter = context.getValues().iterator();
      if (iter instanceof ReduceContext.ValueIterator) {
        ((ReduceContext.ValueIterator<VALUEIN>) iter).resetBackupStore();
      }
    }
  } finally {
    cleanup(context);
  }
}
Strings all of the methods together: setup once, then reduce per key, then cleanup.
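Putting the map and reduce lifecycles side by side, the whole pipeline can be sketched in plain Java (no Hadoop dependency): map over every input line, group by key as the shuffle does, then call reduce once per key with all of its values. This is a toy word count; the structure mirrors the framework, but all names are illustrative:

```java
import java.util.*;

// Sketch of the full map -> shuffle -> reduce flow as a toy word count.
public class WordCountSketch {
    static Map<String, Integer> run(List<String> lines) {
        // "map": emit (word, 1) for every token in the input
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                emitted.add(Map.entry(word, 1));
            }
        }
        // "shuffle": group the emitted values by key, sorted by key,
        // which is how the framework hands them to reduce
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : emitted) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                   .add(kv.getValue());
        }
        // "reduce": called once per key with all of its values
        Map<String, Integer> result = new LinkedHashMap<>();
        grouped.forEach((key, values) -> {
            int sum = 0;
            for (int v : values) sum += v;
            result.put(key, sum);
        });
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("a b a"))); // {a=2, b=1}
    }
}
```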