Table of Contents

  • Flink series articles
  • I. Maven dependencies
  • II. Sink examples
  • 1. JDBC example
  • 1) Maven dependencies
  • 2) Implementation
  • 3) Verification
  • 2. Kafka example
  • 1) Maven dependencies
  • 2) Implementation
  • 3) Verification
  • 3. Redis example
  • 1) API introduction
  • 2) Maven dependencies
  • 3) Implementation
  • 4) Verification
  • 4. MySQL example
  • 1) Maven dependencies
  • 2) Implementation
  • 3) Verification
  • 5. DistributedCache example
  • 1) Maven dependencies
  • 2) Implementation
  • 3) Verification
  • 6. Broadcast variable example
  • 1) Implementation
  • 2) Verification


This article gives a detailed introduction to six Flink sink approaches: JDBC, MySQL, Kafka, Redis, the distributed cache, and broadcast variables, together with their implementation code and verification steps.
This article assumes that a working Kafka environment and a working Redis environment are available.
The article is organized into two parts: the Maven dependencies and the sink implementation examples.

I. Maven dependencies

All of the examples below use the following Maven dependencies, unless stated otherwise.

<properties>
        <encoding>UTF-8</encoding>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <java.version>1.8</java.version>
        <scala.version>2.12</scala.version>
        <flink.version>1.12.0</flink.version>
    </properties>

    <dependencies>
            <dependency>
            <groupId>jdk.tools</groupId>
            <artifactId>jdk.tools</artifactId>
            <version>1.8</version>
            <scope>system</scope>
            <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-scala-bridge_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java-bridge_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner-blink_2.12</artifactId>
            <version>${flink.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-common</artifactId>
            <version>${flink.version}</version>
        </dependency>

        <!-- logging -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.7</version>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>1.2.17</version>
            <scope>runtime</scope>
        </dependency>

        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.2</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-common</artifactId>
			<version>3.1.4</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-client</artifactId>
			<version>3.1.4</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs</artifactId>
			<version>3.1.4</version>
		</dependency>
    </dependencies>

II. Sink examples

This part covers Flink sinks, with examples that sink to an RDBMS (via JDBC), Kafka, Redis, MySQL, and so on.

1. JDBC example

This example inserts data into MySQL via JDBC.

1) Maven dependencies

<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-connector-jdbc_2.12</artifactId>
	<version>${flink.version}</version>
</dependency>
<dependency>
	<groupId>mysql</groupId>
	<artifactId>mysql-connector-java</artifactId>
	<version>5.1.38</version>
	<!--<version>8.0.20</version> -->
</dependency>

2) Implementation

  • user bean
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

/**
 * @author alanchan
 *
 */
@Data
@AllArgsConstructor
@NoArgsConstructor
public class User {
	private int id;
	private String name;
	private String pwd;
	private String email;
	private int age;
	private double balance;
}
  • Implementation
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.source_transformation_sink.bean.User;

/**
 * @author alanchan
 *
 */
public class JDBCDemo {

	/**
	 * @param args
	 * @throws Exception
	 */
	public static void main(String[] args) throws Exception {
		// env
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);

		// source
		DataStream<User> userDS = env.fromElements(new User(1, "alanchanchn", "vx", "alan.chan.chn@163.com", 19, 800));

		// transformation

		// sink
//		public static <T> SinkFunction<T> sink(String sql, JdbcStatementBuilder<T> statementBuilder, JdbcConnectionOptions connectionOptions) {
//			return sink(sql, statementBuilder, JdbcExecutionOptions.defaults(), connectionOptions);
//		}
		String sql = "INSERT INTO `user` (`id`, `name`, `pwd`, `email`, `age`, `balance`) VALUES (null, ?, ?, ?, ?, ?);";
		String driverName = "com.mysql.jdbc.Driver";
		String url = "jdbc:mysql://192.168.10.44:3306/test?useUnicode=true&characterEncoding=UTF-8&useSSL=false";
		String name = "root";
		String pw = "root";

		// 1. write the sink with an anonymous class
//		userDS.addSink(JdbcSink.sink(sql, new JdbcStatementBuilder<User>() {
//
//			@Override
//			public void accept(PreparedStatement ps, User value) throws SQLException {
//				ps.setString(1, value.getName());
//				ps.setString(2, value.getPwd());
//				ps.setString(3, value.getEmail());
//				ps.setInt(4, value.getAge());
//				ps.setDouble(5, value.getBalance());
//			}
//			// (String url, String driverName, String username, String password
//		}, new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
//				.withDriverName(driverName)
//				.withUrl(url)
//				.withUsername(name)
//				.withPassword(pw)
//				.build()));

		// 2. write the sink with a lambda
		userDS.addSink(JdbcSink.sink(sql, (ps, value) -> {
			ps.setString(1, value.getName());
			ps.setString(2, value.getPwd());
			ps.setString(3, value.getEmail());
			ps.setInt(4, value.getAge());
			ps.setDouble(5, value.getBalance());
		}, new JdbcConnectionOptions.JdbcConnectionOptionsBuilder().withDriverName(driverName).withUrl(url).withUsername(name).withPassword(pw).build()));

		// execute
		env.execute();
	}

}

3) Verification

The data has been inserted into MySQL and can be checked directly in the user table; a minimal read-back sketch follows.
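The following is a minimal plain-JDBC sketch that reuses the connection settings from the example above; it assumes the test.user table already exists with an auto-increment id column, since the INSERT statement passes null for id.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// quick read-back check for the rows written by JDBCDemo
public class JdbcSinkCheck {
	public static void main(String[] args) throws Exception {
		Class.forName("com.mysql.jdbc.Driver");
		String url = "jdbc:mysql://192.168.10.44:3306/test?useUnicode=true&characterEncoding=UTF-8&useSSL=false";
		try (Connection conn = DriverManager.getConnection(url, "root", "root");
				Statement st = conn.createStatement();
				ResultSet rs = st.executeQuery("SELECT id, name, email, age, balance FROM `user`")) {
			while (rs.next()) {
				System.out.println(rs.getInt("id") + ", " + rs.getString("name") + ", " + rs.getString("email")
						+ ", " + rs.getInt("age") + ", " + rs.getDouble("balance"));
			}
		}
	}
}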


2. Kafka example

1) Maven dependencies

<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-connector-kafka_2.12</artifactId>
			<version>${flink.version}</version>
		</dependency>
		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-sql-connector-kafka_2.12</artifactId>
			<version>${flink.version}</version>
		</dependency>

2) Implementation

import java.util.Properties;

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.api.common.serialization.SimpleStringSchema;

/**
 * @author alanchan
 *
 */
public class SinkKafka {

	/**
	 * @param args
	 * @throws Exception
	 */
	public static void main(String[] args) throws Exception {
		// env
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
		env.setRuntimeMode(RuntimeExecutionMode.AUTOMATIC);

		// source
		// prepare the Kafka connection properties
		Properties props = new Properties();
		// broker address(es)
		props.setProperty("bootstrap.servers", "server1:9092");
		// consumer group id
		props.setProperty("group.id", "flink");
		// latest: if a committed offset exists, resume from it; otherwise start from the latest messages
		// earliest: if a committed offset exists, resume from it; otherwise start from the earliest messages
		props.setProperty("auto.offset.reset", "latest");

		// start a background thread that checks the Kafka partitions every 5s (dynamic partition discovery)
		props.setProperty("flink.partition-discovery.interval-millis", "5000");
		// enable auto-commit of offsets
		props.setProperty("enable.auto.commit", "true");
		// auto-commit interval
		props.setProperty("auto.commit.interval.ms", "2000");
		// create the FlinkKafkaConsumer (the Kafka source) with the connection properties
		FlinkKafkaConsumer<String> kafkaSource = new FlinkKafkaConsumer<String>("t_kafkasource", new SimpleStringSchema(), props);
		// add the Kafka source to the environment
		DataStream<String> kafkaDS = env.addSource(kafkaSource);

		// transformation
		// keep only records that contain "alan"
		SingleOutputStreamOperator<String> etlDS = kafkaDS.filter(new FilterFunction<String>() {
			@Override
			public boolean filter(String value) throws Exception {
				return value.contains("alan");
			}
		});

		// sink
		etlDS.print();

		Properties props2 = new Properties();
		props2.setProperty("bootstrap.servers", "server1:9092");
		FlinkKafkaProducer<String> kafkaSink = new FlinkKafkaProducer<>("t_kafkasink", new SimpleStringSchema(), props2);
		etlDS.addSink(kafkaSink);

		// execute
		env.execute();
	}

}

3) Verification

  • The logic of this example is as follows:
    1. A FlinkKafkaConsumer is used as the data source.
    2. The stream is transformed (filtered) into a SingleOutputStreamOperator<String> etlDS.
    3. etlDS is sunk to Kafka so the result can be inspected.
  • The verification steps are as follows:
    1. Start Kafka and start a console producer for the topic t_kafkasource:
kafka-console-producer.sh --broker-list server1:9092 --topic t_kafkasource
[alanchan@server1 onekeystart]$ kafka-console-producer.sh --broker-list server1:9092 --topic t_kafkasource

2. Start the application.

3. Open a console consumer for the topic t_kafkasink and watch the data:

kafka-console-consumer.sh --bootstrap-server server1:9092 --topic t_kafkasink --from-beginning

[alanchan@server1 testdata]$ kafka-console-consumer.sh --bootstrap-server server1:9092 --topic t_kafkasink --from-beginning

4. Type a few lines into the Kafka producer console:

[alanchan@server1 onekeystart]$ kafka-console-producer.sh --broker-list server1:9092 --topic t_kafkasource
>i am alanchanchn alan
>i like flink alan
>and you?
>

5. Observe the application console output: the filtered records containing "alan" are printed by etlDS.print().

6. Observe the Kafka consumer console output:

[alanchan@server1 testdata]$ kafka-console-consumer.sh --bootstrap-server server1:9092 --topic t_kafkasink --from-beginning
i am alanchan [e][end]
i am alanchanchn alan
i like flink alan

These steps show that the data produced into Kafka has been sunk to the target Kafka topic as expected.

3. Redis example

1) API introduction

Flink provides a dedicated RedisSink for writing to Redis. It is convenient to use and saves us from having to worry about performance details; the rest of this section describes how to use RedisSink.

The core of RedisSink is the RedisMapper interface: to use it, you write your own Redis mapper class that implements the interface's three methods, listed below (a minimal sketch follows the list).

  • 1. getCommandDescription(): specifies which Redis data structure to use and the (additional) key name; the data-structure type is set through RedisCommand.
  • 2. String getKeyFromData(T data): extracts, from a record, the key of the key/value pair to write.
  • 3. String getValueFromData(T data): extracts, from a record, the value of the key/value pair to write.
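As an illustration only, here is a minimal, hedged sketch of how the interface could be implemented and wired to a RedisSink. It assumes a Redis connector providing RedisSink is on the classpath (for example the Apache Bahir connector, org.apache.bahir:flink-connector-redis_2.11) and a Redis instance at the hypothetical address 192.168.10.41:6379; adjust both to your environment.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.redis.RedisSink;
import org.apache.flink.streaming.connectors.redis.common.config.FlinkJedisPoolConfig;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommand;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisCommandDescription;
import org.apache.flink.streaming.connectors.redis.common.mapper.RedisMapper;

public class RedisSinkDemo {

	public static void main(String[] args) throws Exception {
		StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		// a small (word, count) stream to write out
		DataStream<Tuple2<String, Integer>> ds = env.fromElements(Tuple2.of("alan", 1), Tuple2.of("flink", 2));

		// Redis connection (hypothetical host/port)
		FlinkJedisPoolConfig conf = new FlinkJedisPoolConfig.Builder().setHost("192.168.10.41").setPort(6379).build();

		ds.addSink(new RedisSink<>(conf, new WordCountRedisMapper()));

		env.execute();
	}

	// writes each record as a field/value of the Redis hash "wordcount" via HSET
	public static class WordCountRedisMapper implements RedisMapper<Tuple2<String, Integer>> {

		@Override
		public RedisCommandDescription getCommandDescription() {
			// data-structure type (HSET = hash) and the name of the hash key
			return new RedisCommandDescription(RedisCommand.HSET, "wordcount");
		}

		@Override
		public String getKeyFromData(Tuple2<String, Integer> data) {
			// field of the hash entry
			return data.f0;
		}

		@Override
		public String getValueFromData(Tuple2<String, Integer> data) {
			// value of the hash entry
			return String.valueOf(data.f1);
		}
	}
}

If the job runs against a real Redis instance, the written fields can be inspected with HGETALL wordcount in redis-cli.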

5. DistributedCache example

Flink provides a distributed cache that can be used to share external static data: a file (for example on HDFS) is registered on the ExecutionEnvironment, and every parallel instance of a function can then read a local copy of it through getRuntimeContext().getDistributedCache().

1) Maven dependencies

<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-common</artifactId>
			<version>3.1.4</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-client</artifactId>
			<version>3.1.4</version>
		</dependency>
		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-hdfs</artifactId>
			<version>3.1.4</version>
		</dependency>

2) Implementation

import java.io.File;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.commons.io.FileUtils;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.operators.MapOperator;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;

import org.apache.flink.api.java.tuple.Tuple4;

/**
 * @author alanchan
 *
 */
public class DistributedCacheSink {

	/**
	 * @param args
	 * @throws Exception
	 */
	public static void main(String[] args) throws Exception {
		// env
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
		// Source
		// register the distributed cache file
		env.registerCachedFile("hdfs://server2:8020//flinktest/words/goodsDistributedCacheFile", "goodsDistributedCacheFile");

		// order data set (id, name, goodsid)
		DataSource<Tuple3<Integer, String, Integer>> ordersDS = env
				.fromCollection(Arrays.asList(Tuple3.of(1, "alanchanchn", 1), Tuple3.of(2, "alanchan", 4), Tuple3.of(3, "alan", 123)));

		// Transformation
		// join ordersDS (id, name, goodsid) with the distributed-cache data in goodsDistributedCacheFile (goodsid, goodsname) to get records of the form (id, name, goodsid, goodsname)
		MapOperator<Tuple3<Integer, String, Integer>, Tuple4<Integer, String, Integer, String>> result = ordersDS

				// public abstract class RichMapFunction<IN, OUT> extends AbstractRichFunction
				// implements MapFunction<IN, OUT> {
				// @Override
				// public abstract OUT map(IN value) throws Exception;
				// }

				.map(new RichMapFunction<Tuple3<Integer, String, Integer>, Tuple4<Integer, String, Integer, String>>() {

					// holds the cached data; adapt the structure to your actual application
					Map<Integer, String> goodsMap = new HashMap<>();

					// read the cached file and load it into a local map
					@Override
					public void open(Configuration parameters) throws Exception {
						// load the distributed cache file
						File file = getRuntimeContext().getDistributedCache().getFile("goodsDistributedCacheFile");
						List<String> goodsList = FileUtils.readLines(file);
						for (String str : goodsList) {
							String[] arr = str.split(",");
							goodsMap.put(Integer.parseInt(arr[0]), arr[1]);
						}
					}

					// join the data and emit the desired record layout
					@Override
					public Tuple4<Integer, String, Integer, String> map(Tuple3<Integer, String, Integer> value) throws Exception {
						// use the data from the distributed cache file
						// return (id, name, goodsid, goodsname)
						return new Tuple4<>(value.f0, value.f1, value.f2, goodsMap.get(value.f2));
					}
				});

		// Sink
		result.print();

	}

}

3) Verification

  • Verification steps:
    1. Prepare the distributed cache file and its contents, and upload the file to HDFS (a hypothetical example is shown below).
    2. Run the program and check the output.
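A hypothetical cache file, in the goodsid,goodsname format that the open() method above parses, together with an upload command for the path registered via registerCachedFile, might look like this (the contents below are made up for illustration):

# goodsDistributedCacheFile (hypothetical contents, format: goodsid,goodsname)
1,spark
4,hadoop
123,flink

# upload to the registered HDFS path
hdfs dfs -put goodsDistributedCacheFile hdfs://server2:8020/flinktest/words/

Running the program then prints each order enriched with the goods name, as (id, name, goodsid, goodsname) tuples.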

6. Broadcast variable example

Flink supports broadcast variables. A data set can be broadcast to each TaskManager, where it is kept in memory and can be used by every SubTask/task running there. This avoids a great deal of shuffling, because the data does not have to be shipped to the cluster nodes repeatedly. For example, in a join, one of the data sets can be broadcast and kept in TaskManager memory, so tasks can read it directly from memory instead of triggering heavy shuffles that would degrade cluster performance.



This example implements the same kind of enrichment as the distributed cache example above, but with a broadcast variable instead. It is fairly simple, and the logic is essentially the same as the distributed cache version.

1) Implementation

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.operators.MapOperator;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;

import org.apache.flink.api.java.tuple.Tuple4;

/**
 * @author alanchan
 *
 */
public class BroadcastSink {

	/**
	 * @param args
	 * @throws Exception 
	 */
	public static void main(String[] args) throws Exception {
		// env
		ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

		// Source
		// student data set (student id, name)
		DataSource<Tuple2<Integer, String>> studentDS = env.fromCollection(Arrays.asList(Tuple2.of(1, "alan"), Tuple2.of(2, "alanchan"), Tuple2.of(3, "alanchanchn")));

		// score data set (student id, subject, score)
		DataSource<Tuple3<Integer, String, Integer>> scoreDS = env
				.fromCollection(Arrays.asList(Tuple3.of(1, "chinese", 50), Tuple3.of(1, "math", 90), Tuple3.of(1, "english", 90), Tuple3.of(2, "math", 70), Tuple3.of(3, "art", 86)));

		// Transformation
		// broadcast the studentDS (student id, name) collection to the memory of every TaskManager
		// then join scoreDS (student id, subject, score) with the broadcast studentDS (student id, name) to get records of the form (student id, name, subject, score)
		MapOperator<Tuple3<Integer, String, Integer>, Tuple4<Integer, String, String, Integer>> result = scoreDS
				.map(new RichMapFunction<Tuple3<Integer, String, Integer>, Tuple4<Integer, String, String, Integer>>() {
					
					Map<Integer, String> studentsMap = new HashMap<>();

					@Override
					public void open(Configuration parameters) throws Exception {
						// get the broadcast data
						List<Tuple2<Integer, String>> studentList = getRuntimeContext().getBroadcastVariable("studentsInfo");
						for (Tuple2<Integer, String> tuple : studentList) {
							studentsMap.put(tuple.f0, tuple.f1);
						}

					}

					@Override
					public Tuple4<Integer, String, String, Integer> map(Tuple3<Integer, String, Integer> value) throws Exception {
						// use the broadcast data
						Integer stuID = value.f0;
						String stuName = studentsMap.getOrDefault(stuID, "");

						return new Tuple4<>(stuID, stuName, value.f1, value.f2);
					}
				}).withBroadcastSet(studentDS, "studentsInfo");
		// Sink
		result.print();
	}

}

2) Verification

Run the application; the console output contains one (student id, name, subject, score) tuple for each score record.

This concludes the detailed walkthrough of the six Flink sink approaches: JDBC, MySQL, Kafka, Redis, the distributed cache, and broadcast variables.