8.1 Overview

In this chapter we walk through common operations of Flink's DataStream API with examples. The goal is simply to understand what each operation does and roughly how it is used. The material is basic, but real-world application development frequently revolves around these operators.

Thanks, everyone, for the likes and comments on this series of beginner tutorials. Much appreciated!

8.2 Basic Operations

8.2.1 print: write data to the console

This operation is straightforward: it prints the contents of a DataStream to the console.


import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Flink example: print the contents of a DataStreamSource
 * @author smileyan
 */
public class PrintDemo {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStreamSource<Integer> dataStream = env.fromElements(1, 2, 3, 4, 5, 6);
        dataStream.print();
        env.execute();
    }
}

In the printed output, the number before the > symbol identifies the parallel subtask that produced the record, much like a thread name when tasks run on multiple threads.

8.2.2 map: take one element and produce one element

In this example, we apply a map operation to a known DataStream: for every string in the stream, we return its uppercase form.


import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Example 1: Flink map
 * @author smileyan
 */
public class MapDemo {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> fixedInput = env.fromElements(
                "apple",
                "banana",
                "cherry",
                "date",
                "elderberry");

        DataStream<String> uppercased = fixedInput.map(String::toUpperCase);

        uppercased.print();

        env.execute("Flink Map Example with Fixed Data");
    }
}

8.2.3 flatMap: take one element and produce zero, one, or more elements

Example 1: the input data consists of three lists; for each list, the flatMap operation emits the sum of its numbers.


import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

import java.util.Arrays;
import java.util.List;

/**
 * Example 1: Flink flatMap (sum of each list)
 * @author smileyan
 */
public class FlatMapDemo {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<List<Integer>> fixedInput = env.fromElements(
                Arrays.asList(1, 2, 3, 4),
                Arrays.asList(5, 6),
                Arrays.asList(7, 8, 9, 10, 11));

        SingleOutputStreamOperator<Integer> sums = fixedInput.flatMap(new FlatMapFunction<List<Integer>, Integer>() {
            @Override
            public void flatMap(List<Integer> integers, Collector<Integer> collector) throws Exception {
                int sum = 0;
                for (Integer integer : integers) {
                    sum += integer;
                }
                collector.collect(sum);
            }
        });

        sums.print();

        env.execute("Flink FlatMap Example for Sums");
    }

}

Example 2: this example splits a string into individual words, emitting each word as a separate element of the output stream, and prints them.


import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;


/**
 * Example 2: Flink flatMap (word splitting)
 * @author smileyan
 */
public class FlatMapDemo2 {

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> dataStream = env.fromElements("Hello world! Hello Flink!");

        SingleOutputStreamOperator<String> output = dataStream.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public void flatMap(String value, Collector<String> out) {
                for (String word : value.split(" ")) {
                    out.collect(word);
                }
            }
        });

        output.print();

        env.execute("Flink FlatMap Example for Word Splitting");
    }

}
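The "zero, one, or more" part of flatMap's contract is worth dwelling on: unlike map, flatMap may drop an element entirely. The plain-Java sketch below (using java.util.stream rather than Flink, so it runs without any dependency; the class and input data are hypothetical) shows the same semantics, where an empty line contributes zero output elements:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FlatMapSemantics {
    // Splits each line into words; an empty line contributes zero elements,
    // mirroring Flink's FlatMapFunction contract (emit 0..n outputs per input).
    static List<String> splitWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> line.isEmpty() ? Stream.empty()
                                                : Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitWords(Arrays.asList("Hello world", "", "Flink")));
        // prints [Hello, world, Flink] -- the empty line produced no output
    }
}
```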

8.2.4 filter: evaluate a boolean function for each element and keep the elements for which it returns true

In this example, we filter the input data, keeping only the even numbers, and print them to the console.


import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Filter example
 * @author smileyan
 */
public class FilterDemo {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Integer> fixedInput = env.fromElements(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

        DataStream<Integer> evenNumbers = fixedInput.filter(number -> number % 2 == 0);

        evenNumbers.print();

        env.execute("Flink Filter Example");
    }
}

8.2.5 keyBy: grouping

In this example, we group a number of Tuple2 records by their first field, apply flatMap after grouping, and print the result. In the output, the number to the left of > is the parallel subtask index; since keyBy routes all records with the same key to the same subtask, that number also tells you which group a record belongs to.


import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

/**
 * keyBy example
 * @author smileyan
 */
public class KeyByDemo {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStreamSource<Tuple2<Integer, String>> items = env.fromElements(Tuple2.of(1, "hello"),
                Tuple2.of(2, "welcome"), Tuple2.of(1, "flink"), Tuple2.of(2, "to"), Tuple2.of(2, "China"));

        SingleOutputStreamOperator<String> output = items.keyBy(tuple -> tuple.f0)
                .flatMap(new FlatMapFunction<Tuple2<Integer, String>, String>() {
                    @Override
                    public void flatMap(Tuple2<Integer, String> tuple, Collector<String> collector) throws Exception {
                        collector.collect(tuple.f1 + " ");
                    }
                });

        output.print();

        env.execute("Flink KeyBy Example");
    }
}
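Why do equal keys always print with the same subtask index? Flink routes each record by hashing its key (the real implementation applies a murmur hash to the key's hashCode and maps it through key groups). The simplified plain-Java sketch below (a hypothetical helper, not Flink API; it uses a bare hashCode modulo parallelism) only illustrates the guarantee that matters: the subtask index is a pure function of the key, so equal keys always land on the same subtask.

```java
import java.util.Arrays;
import java.util.List;

public class KeyRoutingSketch {
    // Simplified stand-in for Flink's key-group routing:
    // equal keys always map to the same "subtask" index.
    static int subtaskFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        List<Integer> keys = Arrays.asList(1, 2, 1, 2, 2);
        for (Integer k : keys) {
            System.out.println("key=" + k + " -> subtask " + subtaskFor(k, parallelism));
        }
        // key=1 always maps to one subtask index, key=2 always to another
    }
}
```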

8.2.6 window

In this example, we use fixed data in which every element has two fields. We first group by the first field, then divide each group into count windows of size 3. A window fires only once 3 elements for its key have arrived, so elements left in an incomplete window (here, the fourth element for key 2) produce no output.


import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Window example (countWindow + reduce)
 * @author smileyan
 */
public class WindowDemo {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Tuple2<Integer, String>> fixedInput = env.fromElements(
                Tuple2.of(1, "apple"),
                Tuple2.of(2, "banana"),
                Tuple2.of(1, "cherry"),
                Tuple2.of(2, "elderberry"),
                Tuple2.of(1, "firefly"),
                Tuple2.of(2, "giraffe"),
                Tuple2.of(2, "hippopotamus")
        );

        SingleOutputStreamOperator<Tuple2<Integer, String>> reduced = fixedInput
                .keyBy(v -> v.f0)
                .countWindow(3)
                .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + ";" + b.f1))
                .name("hello window");

        reduced.print();

        env.execute("Flink Window Example");
    }
}
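To make the countWindow(3) behaviour concrete without running a cluster, here is a plain-Java sketch (a hypothetical helper, not Flink API) that buffers elements per key and "fires" a window every 3 elements. Note that key 2's last element never appears in the output because its window stays incomplete:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountWindowSketch {
    // Buffers elements per key and fires a window every `size` elements,
    // concatenating the buffered values. A leftover partial window emits
    // nothing, mirroring countWindow(size) in the Flink example above.
    static List<String> process(List<Map.Entry<Integer, String>> input, int size) {
        Map<Integer, List<String>> buffers = new HashMap<>();
        List<String> fired = new ArrayList<>();
        for (Map.Entry<Integer, String> e : input) {
            List<String> buf = buffers.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
            buf.add(e.getValue());
            if (buf.size() == size) {
                fired.add(e.getKey() + ": " + String.join(";", buf));
                buf.clear();
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> data = List.of(
                Map.entry(1, "apple"), Map.entry(2, "banana"),
                Map.entry(1, "cherry"), Map.entry(2, "elderberry"),
                Map.entry(1, "firefly"), Map.entry(2, "giraffe"),
                Map.entry(2, "hippopotamus"));
        process(data, 3).forEach(System.out::println);
        // prints:
        // 1: apple;cherry;firefly
        // 2: banana;elderberry;giraffe
        // key 2's "hippopotamus" sits in an incomplete window and is never emitted
    }
}
```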

8.2.7 reduce

The window example above already used a reduce operation; here is a simpler, standalone example. reduce combines multiple elements of a stream, generally turning many elements into one accumulated value.

In this example we group the given data by key and merge each group step by step with a rolling reduce. The effect is shown below:


import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ReduceDemo {
    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Tuple2<Integer, String>> groupInput = env.fromElements(
                Tuple2.of(1, "apple"),
                Tuple2.of(2, "banana"),
                Tuple2.of(1, "cherry"),
                Tuple2.of(2, "elderberry"),
                Tuple2.of(1, "firefly"),
                Tuple2.of(2, "giraffe"),
                Tuple2.of(2, "hippopotamus")
        );

        SingleOutputStreamOperator<String> output = groupInput.keyBy(v -> v.f0)
                .reduce(new ReduceFunction<Tuple2<Integer, String>>() {
                    @Override
                    public Tuple2<Integer, String> reduce(Tuple2<Integer, String> result, Tuple2<Integer, String> t1) throws Exception {
                        return Tuple2.of(result.f0, result.f1 + ";" + t1.f1);
                    }
                })
                .map(v -> v.f1);

        output.print();

        env.execute();
    }
}

The output is:

12> banana
9> apple
12> banana;elderberry
9> apple;cherry
12> banana;elderberry;giraffe
9> apple;cherry;firefly
12> banana;elderberry;giraffe;hippopotamus

As you can see, this is a rolling process. The official Flink documentation describes reduce as: "A 'rolling' reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value." In other words, if the source is unbounded, for example a stream of Kafka messages, the rolling process never ends and simply keeps folding new elements into the concatenation.
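The same rolling behaviour can be reproduced without Flink. The plain-Java sketch below (a hypothetical helper, not Flink API) keeps one reduced value per key and emits every intermediate result, matching the per-key concatenation pattern in the output above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RollingReduceSketch {
    // Emits one value per input element: the first element for a key is
    // emitted as-is; each later one is folded into the last reduced value,
    // mirroring Flink's rolling reduce on a keyed stream.
    static List<String> process(List<Map.Entry<Integer, String>> input) {
        Map<Integer, String> state = new HashMap<>();
        List<String> emitted = new ArrayList<>();
        for (Map.Entry<Integer, String> e : input) {
            String merged = state.merge(e.getKey(), e.getValue(), (acc, v) -> acc + ";" + v);
            emitted.add(merged);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> data = List.of(
                Map.entry(1, "apple"), Map.entry(2, "banana"),
                Map.entry(1, "cherry"), Map.entry(2, "elderberry"),
                Map.entry(1, "firefly"), Map.entry(2, "giraffe"),
                Map.entry(2, "hippopotamus"));
        process(data).forEach(System.out::println);
        // prints the same sequence of merged values as the Flink job,
        // without the subtask prefixes
    }
}
```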

8.4 Summary

This chapter introduced the commonly used operators of the Flink DataStream API, walking through each operator's usage with code. It is the last foundational chapter of this series; in the final two chapters, I hope to use real development scenarios as examples for somewhat larger designs and implementations. That content will be more varied, but it mainly extends the basic building blocks covered so far.