Flink with lambda expressions

Numeric string to number: 1 string => 1 number (map)

streamSource.map(Integer::valueOf)

StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();

DataStreamSource<String> streamSource = environment.fromCollection(Arrays.asList("123", "852"));

// Step-by-step simplification, from a full lambda to a method reference:
// MapFunction<String, Integer> mapFunction = (String data) -> {
//     return Integer.valueOf(data);
// };

// MapFunction<String, Integer> mapFunction = (String data) -> Integer.valueOf(data);

// MapFunction<String, Integer> mapFunction = (data) -> Integer.valueOf(data);

// MapFunction<String, Integer> mapFunction = Integer::valueOf;

SingleOutputStreamOperator<Integer> streamOperator = streamSource.map(Integer::valueOf);

streamOperator.print();

environment.execute();
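
For reference, a self-contained version of this first example might look like the sketch below. The class name and the main-method wrapper are my own additions, and the imports assume the standard DataStream API packages.

import java.util.Arrays;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Hypothetical class name; only the imports and the main wrapper are added here.
public class MapLambdaJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();

        // Two numeric strings as the bounded input
        DataStreamSource<String> streamSource = environment.fromCollection(Arrays.asList("123", "852"));

        // Method reference: each String is parsed into an Integer
        SingleOutputStreamOperator<Integer> streamOperator = streamSource.map(Integer::valueOf);

        streamOperator.print();

        environment.execute();
    }
}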

JSON to Java object

streamSource.map(value -> objectMapper.readValue(value, AlertLog.class))

StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();

Properties properties = new Properties();
properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "62.234.60.20:9092, 62.234.60.20:9093");
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "flink_kafka_lambda");
DataStreamSource<String> streamSource = environment.addSource(new FlinkKafkaConsumer<String>("alert_log", new SimpleStringSchema(), properties));

ObjectMapper objectMapper = new ObjectMapper();

// Step-by-step simplification, from a full lambda to the final one-liner:
// MapFunction<String, AlertLog> mapFunction = (value) -> {
//     return objectMapper.readValue(value, AlertLog.class);
// };

// MapFunction<String, AlertLog> mapFunction = (value) -> objectMapper.readValue(value, AlertLog.class);

SingleOutputStreamOperator<AlertLog> streamOperator = streamSource.map((value) -> objectMapper.readValue(value, AlertLog.class));

streamOperator.print();

environment.execute();
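
The AlertLog class itself is not shown; its fields depend on the JSON produced to the alert_log topic. Purely as an illustration, with made-up field names, it could be a plain POJO that both Jackson and Flink can handle:

// Hypothetical AlertLog POJO; the real field names must match the JSON in the alert_log topic.
public class AlertLog {
    public String level;     // assumed field
    public String message;   // assumed field
    public long timestamp;   // assumed field

    public AlertLog() {
    }

    @Override
    public String toString() {
        return "AlertLog{level=" + level + ", message=" + message + ", timestamp=" + timestamp + "}";
    }
}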

flatMap flattens data: 1 string => n numbers (flatMap)

flatMap takes a composite piece of data and breaks it into multiple indivisible pieces, i.e. it flattens the data.

Requirement: break each string received above into individual numbers.
1 string => 1 number (map)
1 string => n numbers (flatMap), which flattens the data and improves processing efficiency

streamSource.flatMap(new FlatMapFunction<String, Long>() { ... })

StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();

environment.setParallelism(1);

DataStreamSource<String> streamSource = environment.fromCollection(
        Arrays.asList("85,95,74", "25,32,54", "51,62,84")
);

SingleOutputStreamOperator<Long> streamOperator = streamSource.flatMap(new FlatMapFunction<String, Long>() {
    @Override
    public void flatMap(String s, Collector<Long> collector) throws Exception {
        // Split each comma-separated string and emit every number as its own element
        String[] strArray = s.split(",");
        for (String s1 : strArray) {
            collector.collect(Long.decode(s1));
        }
    }
});

streamOperator.print();

environment.execute();
85
95
74
25
32
54
51
62
84
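
The anonymous FlatMapFunction can also be written as a lambda, but because the Collector<Long> type parameter is erased at compile time, Flink cannot infer the output type on its own and needs a returns(...) hint (Types is org.apache.flink.api.common.typeinfo.Types). A minimal sketch of the same logic:

// Lambda version of the flatMap above; returns(Types.LONG) supplies the output type
// that is lost to erasure when a lambda is used instead of an anonymous class.
SingleOutputStreamOperator<Long> streamOperator = streamSource
        .flatMap((String s, Collector<Long> collector) -> {
            for (String part : s.split(",")) {
                collector.collect(Long.decode(part));
            }
        })
        .returns(Types.LONG);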

keyBy + tumbling window + processing time

streamSource.keyBy(...) + keyedStream.window(TumblingProcessingTimeWindows.of(...))

StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();

DataStreamSource<String> streamSource = environment.addSource(new SourceFunction<String>() {
    @Override
    public void run(SourceContext<String> sourceContext) throws Exception {
        while (true) {
            // Pick a random word and emit "1000,<word>" every 2 seconds
            String[] strs = {"口", "㐅", "工", "米"};
            String word = strs[new Random().nextInt(4)];

            String data = "1000" + "," + word;

            TimeUnit.SECONDS.sleep(2);

            System.out.println("Generated data  ==> " + data);

            sourceContext.collect(data);
        }
    }

    @Override
    public void cancel() {
    }
});

// Key by the word (the second field of each record)
KeyedStream<String, String> keyedStream = streamSource.keyBy(word -> word.split(",")[1]);

// 8-second tumbling window based on processing time
WindowedStream<String, String, TimeWindow> windowedStream = keyedStream.window(TumblingProcessingTimeWindows.of(Time.of(8, TimeUnit.SECONDS)));

windowedStream.process(new ProcessWindowFunction<String, Object, String, TimeWindow>() {
    @Override
    public void process(String s, ProcessWindowFunction<String, Object, String, TimeWindow>.Context context, Iterable<String> iterable, Collector<Object> collector) throws Exception {
        System.out.println("Data collected by the current 8s window -> " + iterable);
    }
});

environment.execute();
Generated data  ==> 1000,口
Data collected by the current 8s window -> [1000,口]
Generated data  ==> 1000,工
Generated data  ==> 1000,工
Generated data  ==> 1000,口
Generated data  ==> 1000,工
Data collected by the current 8s window -> [1000,口]
Data collected by the current 8s window -> [1000,工, 1000,工, 1000,工]
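
The window function can also emit a result downstream through the Collector instead of only printing inside the operator. A small sketch along those lines (the output string format is my own choice) that sends the per-key element count of each window to print():

// Emit "<key>:<count>" for every finished window instead of printing inside process().
windowedStream.process(new ProcessWindowFunction<String, String, String, TimeWindow>() {
    @Override
    public void process(String key,
                        ProcessWindowFunction<String, String, String, TimeWindow>.Context context,
                        Iterable<String> elements,
                        Collector<String> out) throws Exception {
        long count = 0;
        for (String ignored : elements) {
            count++;
        }
        out.collect(key + ":" + count + " (window end " + context.window().getEnd() + ")");
    }
}).print();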

keyBy + tumbling window + event time

environment.setStreamTimeCharacteristic(...) + streamSource.assignTimestampsAndWatermarks(...) + watermarks.keyBy(...) + keyedStream.window(TumblingEventTimeWindows.of(...))

StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();

// Step 1: set the time characteristic to event time
environment.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStreamSource<String> streamSource = environment.addSource(new SourceFunction<String>() {

    private long time = 1000;

    @Override
    public void run(SourceContext<String> sourceContext) throws Exception {
        while (true) {
            // Emit "<timestamp>,<word>" every 2 seconds, advancing the timestamp by 2000 ms each time
            String[] strs = {"口", "㐅", "工", "米"};
            String word = strs[new Random().nextInt(4)];

            String data = time + "," + word;

            time += 2000;

            TimeUnit.SECONDS.sleep(2);

            System.out.println("Generated data  ==> " + data);

            sourceContext.collect(data);
        }
    }

    @Override
    public void cancel() {
    }
});

// Step 2: extract the event-time field (must be a long) and attach watermarks derived from the timestamp carried in each record
SingleOutputStreamOperator<String> watermarks = streamSource.assignTimestampsAndWatermarks(new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(0)) {
    @Override
    public long extractTimestamp(String s) {
        String strTime = s.split(",")[0];
        return Long.decode(strTime);
    }
});

KeyedStream<String, String> keyedStream = watermarks.keyBy(word -> word.split(",")[1]);

// 8-second tumbling window based on event time
WindowedStream<String, String, TimeWindow> windowedStream = keyedStream.window(TumblingEventTimeWindows.of(Time.of(8, TimeUnit.SECONDS)));

windowedStream.process(new ProcessWindowFunction<String, Object, String, TimeWindow>() {
    @Override
    public void process(String s, ProcessWindowFunction<String, Object, String, TimeWindow>.Context context, Iterable<String> iterable, Collector<Object> collector) throws Exception {
        System.out.println("Data collected by the current 8s window -> " + iterable);
    }
});

environment.execute();
Generated data  ==> 1000,㐅
Generated data  ==> 3000,㐅
Generated data  ==> 5000,㐅
Generated data  ==> 7000,工
Generated data  ==> 9000,工
Data collected by the current 8s window -> [7000,工]
Data collected by the current 8s window -> [1000,㐅, 3000,㐅, 5000,㐅]
Generated data  ==> 11000,工
Generated data  ==> 13000,工
Generated data  ==> 15000,口
Generated data  ==> 17000,㐅
Data collected by the current 8s window -> [15000,口]
Data collected by the current 8s window -> [9000,工, 11000,工, 13000,工]
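
In Flink 1.11+ the TimeCharacteristic/BoundedOutOfOrdernessTimestampExtractor combination is deprecated; the same timestamp and watermark assignment is usually expressed with WatermarkStrategy, and from 1.12 on setStreamTimeCharacteristic is no longer needed because event time is the default. A rough sketch of the equivalent assignment:

// Equivalent watermark assignment with the newer WatermarkStrategy API (Flink 1.11+).
// The generated timestamps are strictly increasing, so forMonotonousTimestamps() suffices;
// forBoundedOutOfOrderness(Duration.ofSeconds(0)) would behave the same way here.
SingleOutputStreamOperator<String> watermarks = streamSource.assignTimestampsAndWatermarks(
        WatermarkStrategy.<String>forMonotonousTimestamps()
                .withTimestampAssigner((event, previousTimestamp) -> Long.decode(event.split(",")[0]))
);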

Flink aggregation operators

Flink requires the element type to be a POJO class

Flink analyzes data types that do not fall into any other category and tries to handle them as POJO types. A type is treated as a POJO data type if it meets the following conditions:

  • The POJO class must be public and defined as a standalone top-level class, not an inner class;
  • The POJO class must have a public no-argument constructor;
  • All fields of the POJO class must be public, or have public getter and setter methods;
  • The field types of the POJO class must be supported by Flink.
package com.itszt23.flink;

import lombok.Data;

@Data
public class People {
    public String className;

    public String name;

    public int javaScore;

    public People() {}

    public People(String className, String name, int javaScore) {
        this.className = className;
        this.name = name;
        this.javaScore = javaScore;
    }
}
StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();

environment.setParallelism(1);

// Flink requires a POJO here; one of the POJO rules is that the class must be a standalone,
// independently defined public class, so People is declared as a separate POJO class.
DataStreamSource<People> streamSource = environment.fromCollection(
        Arrays.asList(
                new People("小将", "问天", 100),
                new People("小将", "铁心", 200),
                new People("小将", "问雅", 300),
                new People("神兵", "天晶剑", 300)
        )
);

KeyedStream<People, String> keyedStream = streamSource.keyBy(People::getClassName);

SingleOutputStreamOperator<People> javaScore = keyedStream.sum("javaScore");

javaScore.print();

environment.execute();
People(className=小将, name=问天, javaScore=100)
People(className=小将, name=问天, javaScore=300)
People(className=小将, name=问天, javaScore=600)
People(className=神兵, name=天晶剑, javaScore=300)
Replacing sum with min:

SingleOutputStreamOperator<People> javaScore = keyedStream.min("javaScore");
People(className=小将, name=问天, javaScore=100)
People(className=小将, name=问天, javaScore=100)
People(className=小将, name=问天, javaScore=100)
People(className=神兵, name=天晶剑, javaScore=300)
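
As the min output shows, min only updates the aggregated field (javaScore) and keeps the other fields from the first record of each key (name stays 问天). If the whole record holding the minimum value is needed, minBy can be used instead; a minimal sketch:

// minBy forwards the entire record that carries the minimum javaScore per key,
// while min only tracks the minimum value of that single field.
SingleOutputStreamOperator<People> minByScore = keyedStream.minBy("javaScore");
minByScore.print();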