Hadoop Source Code Explained: DBOutputFormat

1. Class description

An OutputFormat that sends the reduce output to a SQL table.

DBOutputFormat accepts <key,value> pairs, where the key's type implements the DBWritable interface. The returned RecordWriter writes only the key to the database, using a batch SQL query; the value is not written.
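To make "batch SQL query" more concrete: in JDBC, a batch of this kind is typically one parameterized INSERT whose placeholders are filled once per record. The following is a minimal standalone sketch of how such a statement could be assembled from a table name and a list of columns. It is not the Hadoop source; the class InsertQuerySketch and the helper buildInsert are hypothetical names chosen for illustration.

```java
import java.util.StringJoiner;

/** Standalone sketch, not the Hadoop source: builds a parameterized INSERT. */
final class InsertQuerySketch {

    /** Hypothetical helper that joins column names and one "?" per column. */
    static String buildInsert(String table, String... fieldNames) {
        if (fieldNames == null || fieldNames.length == 0) {
            throw new IllegalArgumentException("at least one field is required");
        }
        StringJoiner columns = new StringJoiner(",", " (", ")");
        StringJoiner marks = new StringJoiner(",", " VALUES (", ")");
        for (String name : fieldNames) {
            columns.add(name);
            marks.add("?"); // one JDBC placeholder per column
        }
        return "INSERT INTO " + table + columns + marks;
    }

    public static void main(String[] args) {
        // prints: INSERT INTO employees (id,name) VALUES (?,?)
        System.out.println(buildInsert("employees", "id", "name"));
    }
}
```

With a statement like this prepared once, each key would only need to bind its own column values before being added to the batch, which is consistent with the description above that only the key reaches the database.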

2. Class source

3. Method details

3.1 The setOutput() method
  • Method description

Initializes the reduce part of the job with the appropriate output settings.

  • Method source
  /**
   * @param job The job
   * @param tableName The table to insert data into
   * @param fieldNames The field names in the table.
   */
  public static void setOutput(Job job, String tableName, 
      String... fieldNames) throws IOException {
    if(fieldNames.length > 0 && fieldNames[0] != null) {
      // Column names were supplied: configure the table, then record the names.
      DBConfiguration dbConf = setOutput(job, tableName);
      dbConf.setOutputFieldNames(fieldNames);
    } else {
      if (fieldNames.length > 0) {
        // The first name is null: delegate to the overload that takes a field count.
        setOutput(job, tableName, fieldNames.length);
      }
      else { 
        // No field names at all: reject the call.
        throw new IllegalArgumentException(
          "Field names must be greater than 0");
      }
    }
  }

Note that the output can span multiple columns, which is why the parameter is declared as varargs: String... fieldNames.
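The three branches of setOutput() depend on how the varargs array arrives: with real names, with null entries only, or empty. The standalone sketch below mirrors only the conditions from the method source and reports which branch would be taken. It uses no Hadoop classes, and VarargsBranchSketch and describe are hypothetical names introduced for illustration.

```java
/** Standalone sketch of the varargs branching in setOutput(); no Hadoop classes. */
final class VarargsBranchSketch {

    /** Hypothetical helper: reports which branch the same conditions select. */
    static String describe(String... fieldNames) {
        if (fieldNames.length > 0 && fieldNames[0] != null) {
            // Corresponds to the branch that records the field names.
            return "names:" + String.join(",", fieldNames);
        } else if (fieldNames.length > 0) {
            // Corresponds to the branch that passes only fieldNames.length.
            return "count:" + fieldNames.length;
        }
        // Corresponds to the branch that rejects an empty argument list.
        throw new IllegalArgumentException("Field names must be greater than 0");
    }

    public static void main(String[] args) {
        System.out.println(describe("id", "name"));   // prints names:id,name
        System.out.println(describe(new String[3]));  // prints count:3
    }
}
```

Passing an array of nulls, such as new String[3], is therefore a way to state how many columns there are without naming them, while calling the method with no field arguments at all ends in the IllegalArgumentException.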