1、Environment

  • spark 2.4.0
  • scala 2.11.8
  • jdk 1.8
  • maven
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
    <version>2.4.0</version>
</dependency>
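
For reference, a minimal Scala sketch (spark-shell style) of using this dependency as a streaming source; the broker address localhost:9092 and the topic name test are placeholder assumptions, not part of the original setup:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("output-mode-demo")   // illustrative application name
  .master("local[*]")
  .getOrCreate()

// Read a Kafka topic as an unbounded streaming DataFrame.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder broker
  .option("subscribe", "test")                           // placeholder topic
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")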

2、Source code

package org.apache.spark.sql.streaming;

import org.apache.spark.annotation.InterfaceStability;
import org.apache.spark.sql.catalyst.streaming.InternalOutputModes;

/**
 * OutputMode describes what data will be written to a streaming sink when there is
 * new data available in a streaming DataFrame/Dataset.
 *
 * @since 2.0.0
 */
@InterfaceStability.Evolving
public class OutputMode {

  /**
   * OutputMode in which only the new rows in the streaming DataFrame/Dataset will be
   * written to the sink. This output mode can only be used in queries that do not
   * contain any aggregation.
   *
   * @since 2.0.0
   */
  public static OutputMode Append() {
    return InternalOutputModes.Append$.MODULE$;
  }

  /**
   * OutputMode in which all the rows in the streaming DataFrame/Dataset will be written
   * to the sink every time there are some updates. This output mode can only be used in queries
   * that contain aggregations.
   *
   * @since 2.0.0
   */
  public static OutputMode Complete() {
    return InternalOutputModes.Complete$.MODULE$;
  }

  /**
   * OutputMode in which only the rows that were updated in the streaming DataFrame/Dataset will
   * be written to the sink every time there are some updates. If the query doesn't contain
   * aggregations, it will be equivalent to `Append` mode.
   *
   * @since 2.1.1
   */
  public static OutputMode Update() {
    return InternalOutputModes.Update$.MODULE$;
  }
}
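
These three factory methods are what DataStreamWriter.outputMode accepts; the equivalent lowercase strings "append", "complete" and "update" can be passed instead. A minimal sketch, assuming df is a streaming DataFrame (for example the Kafka source from section 1) and using the console sink purely for illustration:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.{OutputMode, StreamingQuery}

// Start a streaming query with an explicit OutputMode object.
def startWithMode(df: DataFrame, mode: OutputMode): StreamingQuery = {
  df.writeStream
    .outputMode(mode)      // OutputMode.Append(), OutputMode.Complete() or OutputMode.Update()
    .format("console")     // console sink, used here only for demonstration
    .start()
}

// The string form is equivalent, e.g. df.writeStream.outputMode("update")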

3、Differences

  • Append: the default mode. Each row is output exactly once (assuming a fault-tolerant sink); only the rows newly appended to the result table in the current batch are emitted. Works with queries built from select, where, map, flatMap, filter, join, and so on.
  • Complete: after every trigger, the entire up-to-date result table is emitted. Only supported for queries that contain aggregations.
  • Update (available since Spark 2.1.1): only the rows of the result table that were updated in the current batch are emitted; see the Scala sketch after this list for the three modes side by side.
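
A minimal sketch contrasting the three modes, assuming the Kafka-sourced df from section 1; the grouping column and the console sink are illustrative choices only:

import org.apache.spark.sql.streaming.OutputMode

// An aggregated result table: running counts per value.
// Append is not allowed for this query, but Complete and Update are.
val counts = df.groupBy("value").count()

// Complete: every trigger re-emits the whole up-to-date result table.
val completeQuery = counts.writeStream
  .outputMode(OutputMode.Complete())
  .format("console")
  .start()

// Update: every trigger would emit only the rows whose counts changed in that batch.
// val updateQuery = counts.writeStream.outputMode(OutputMode.Update()).format("console").start()

// Append: only valid without aggregation, e.g. a plain filter over the source rows.
// val appendQuery = df.where("value IS NOT NULL").writeStream
//   .outputMode(OutputMode.Append()).format("console").start()

completeQuery.awaitTermination()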

 

4、References

http://spark.apache.org/docs/2.4.0/structured-streaming-programming-guide.html#output-modes