Hadoop Source Code Explained: The DBOutputFormat Class

Original post by 说文科技 on the 51CTO blog, 2021-07-07 15:38:13. © Copyright belongs to the author; contact the author for permission before republishing.

1. Class description

An OutputFormat that sends the reduce output to a SQL table.

DBOutputFormat accepts <key,value> pairs, where the key's type implements the DBWritable interface. The returned RecordWriter writes only the key to the database, using a batched SQL statement; the value is not written.
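To make the "batch SQL query" more concrete, here is a minimal sketch of the kind of parameterized INSERT statement such a writer could prepare once and then reuse for every key. The class `InsertQuerySketch`, the helper `buildInsert`, and the table and column names are hypothetical and exist only for illustration; this is not Hadoop's actual implementation.

```java
public class InsertQuerySketch {

    // Hypothetical helper: builds "INSERT INTO table (f1,f2) VALUES (?,?)".
    // One "?" placeholder is emitted per field.
    public static String buildInsert(String table, String... fieldNames) {
        if (fieldNames == null || fieldNames.length == 0) {
            throw new IllegalArgumentException("at least one field is required");
        }
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table);
        // The column list is only emitted when the names are actually known.
        if (fieldNames[0] != null) {
            sql.append(" (").append(String.join(",", fieldNames)).append(")");
        }
        sql.append(" VALUES (");
        for (int i = 0; i < fieldNames.length; i++) {
            sql.append(i == 0 ? "?" : ",?");
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        // Named columns.
        System.out.println(buildInsert("word_count", "word", "total"));
        // Two columns whose names are unknown (array of nulls).
        System.out.println(buildInsert("word_count", new String[2]));
    }
}
```

With a statement of this shape, each key would bind its own values to the `?` placeholders, and the accumulated rows could then be sent to the database together rather than one at a time.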

2. Class source

3. Method details

3.1 The `setOutput()` method

Method description

Initializes the reduce part of the job with the appropriate output settings.

Method source

  /**
   * @param job The job
   * @param tableName The table to insert data into
   * @param fieldNames The field names in the table.
   */
  public static void setOutput(Job job, String tableName, 
      String... fieldNames) throws IOException {
    if(fieldNames.length > 0 && fieldNames[0] != null) {
      // Names were supplied: store them in the job's DB configuration.
      DBConfiguration dbConf = setOutput(job, tableName);
      dbConf.setOutputFieldNames(fieldNames);
    } else {
      if (fieldNames.length > 0) {
        // Names are null but their count is known: pass only the count.
        setOutput(job, tableName, fieldNames.length);
      }
      else { 
        // No fields at all: the output cannot be configured.
        throw new IllegalArgumentException(
          "Field names must be greater than 0");
      }
    }
  }

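Note that the output may span several columns, which is why the parameter is declared as varargs, `String... fieldNames`. If the first name is non-null, the names are stored in the job's `DBConfiguration`; if names are passed but are null, only their count is used; passing no names at all throws an `IllegalArgumentException`.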
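The three branches of the method source are easier to see with a small stand-in that runs without Hadoop on the classpath. The sketch below copies only the branching conditions from the source above; the class `SetOutputBranchesSketch`, the method `describe`, and the table and column names are hypothetical, and the returned strings merely label which branch was taken.

```java
import java.util.Arrays;

public class SetOutputBranchesSketch {

    // Hypothetical stand-in that mirrors only the branching of setOutput().
    public static String describe(String tableName, String... fieldNames) {
        if (fieldNames.length > 0 && fieldNames[0] != null) {
            // Corresponds to storing the field names in the configuration.
            return tableName + ": named fields " + Arrays.toString(fieldNames);
        } else if (fieldNames.length > 0) {
            // Corresponds to the overload that takes only a field count.
            return tableName + ": " + fieldNames.length + " unnamed fields";
        } else {
            // Same message as in the method source.
            throw new IllegalArgumentException("Field names must be greater than 0");
        }
    }

    public static void main(String[] args) {
        // Varargs call with explicit column names.
        System.out.println(describe("word_count", "word", "total"));
        // An array of nulls: only the number of columns is known.
        System.out.println(describe("word_count", new String[2]));
        try {
            // No field arguments at all: the varargs array is empty.
            describe("word_count");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The varargs signature lets a caller list columns inline, while an explicitly constructed array of nulls still reaches the count-only branch.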