Flink with Lambda Expressions
Numeric string to number: one string => one number (map)
streamSource.map(Integer::valueOf)
StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<String> streamSource = environment.fromCollection(Arrays.asList("123", "852"));
// Refinement steps, from a full lambda down to a method reference:
// MapFunction<String, Integer> mapFunction = (String data) -> {
//     return Integer.valueOf(data);
// };
// MapFunction<String, Integer> mapFunction = (String data) -> Integer.valueOf(data);
// MapFunction<String, Integer> mapFunction = (data) -> Integer.valueOf(data);
// MapFunction<String, Integer> mapFunction = Integer::valueOf;
SingleOutputStreamOperator<Integer> streamOperator = streamSource.map(Integer::valueOf);
streamOperator.print();
environment.execute();
JSON to Java object
streamSource.map(...)
StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
Properties properties = new Properties();
properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "62.234.60.20:9092,62.234.60.20:9093");
properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "flink_kafka_lambda");
DataStreamSource<String> streamSource = environment.addSource(
        new FlinkKafkaConsumer<String>("alert_log", new SimpleStringSchema(), properties));
ObjectMapper objectMapper = new ObjectMapper();
// Refinement steps:
// MapFunction<String, AlertLog> mapFunction = (value) -> {
//     return objectMapper.readValue(value, AlertLog.class);
// };
// MapFunction<String, AlertLog> mapFunction = (value) -> objectMapper.readValue(value, AlertLog.class);
SingleOutputStreamOperator<AlertLog> streamOperator =
        streamSource.map((value) -> objectMapper.readValue(value, AlertLog.class));
streamOperator.print();
environment.execute();
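The lambda above closes over the ObjectMapper, so the mapper is serialized and shipped as part of the function's closure. A common alternative, sketched below for the same AlertLog pipeline, is a RichMapFunction that builds the mapper in open(), once per parallel task:

SingleOutputStreamOperator<AlertLog> streamOperator = streamSource.map(
        new RichMapFunction<String, AlertLog>() {
            // transient: created per task in open() instead of being serialized with the function
            private transient ObjectMapper objectMapper;

            @Override
            public void open(Configuration parameters) {
                objectMapper = new ObjectMapper();
            }

            @Override
            public AlertLog map(String value) throws Exception {
                return objectMapper.readValue(value, AlertLog.class);
            }
        });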
flatMap, flattening data: one string => n numbers (flatMap)
flatMap takes a composite value and breaks it apart into multiple indivisible values, i.e. it flattens the data.
Requirement: split each string received above into individual numbers.
one string => one number (map)
one string => n numbers (flatMap), which flattens the data; a lambda version is sketched after the output below
streamSource.flatMap(new FlatMapFunction<String, Long>() {...})
StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
environment.setParallelism(1);
DataStreamSource<String> streamSource = environment.fromCollection(
        Arrays.asList("85,95,74", "25,32,54", "51,62,84")
);
SingleOutputStreamOperator<Long> streamOperator = streamSource.flatMap(new FlatMapFunction<String, Long>() {
    @Override
    public void flatMap(String s, Collector<Long> collector) throws Exception {
        String[] strArray = s.split(",");
        for (String s1 : strArray) {
            collector.collect(Long.decode(s1));
        }
    }
});
streamOperator.print();
environment.execute();
85
95
74
25
32
54
51
62
84
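An anonymous FlatMapFunction is used above rather than a lambda because the Collector<Long> parameter loses its generic type to Java erasure, so Flink cannot infer the output type on its own. A lambda still works if the type hint is given explicitly; a minimal sketch against the same streamSource:

SingleOutputStreamOperator<Long> streamOperator = streamSource
        .flatMap((String s, Collector<Long> collector) -> {
            for (String s1 : s.split(",")) {
                collector.collect(Long.decode(s1));
            }
        })
        // type hint required: the lambda's Collector<Long> is erased at compile time
        .returns(Types.LONG);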
keyBy + tumbling window + processing time
streamSource.keyBy(...)
keyedStream.window(...)
StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<String> streamSource = environment.addSource(new SourceFunction<String>() {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> sourceContext) throws Exception {
        String[] strs = {"口", "㐅", "工", "米"};
        Random random = new Random();
        while (running) {
            String word = strs[random.nextInt(4)];
            String data = "1000" + "," + word;
            TimeUnit.SECONDS.sleep(2);
            System.out.println("Generated data ==> " + data);
            sourceContext.collect(data);
        }
    }

    @Override
    public void cancel() {
        running = false; // lets run() exit when the job is cancelled
    }
});
KeyedStream<String, String> keyedStream = streamSource.keyBy(word -> word.split(",")[1]);
WindowedStream<String, String, TimeWindow> windowedStream =
        keyedStream.window(TumblingProcessingTimeWindows.of(Time.of(8, TimeUnit.SECONDS)));
windowedStream.process(new ProcessWindowFunction<String, Object, String, TimeWindow>() {
    @Override
    public void process(String s, Context context, Iterable<String> iterable, Collector<Object> collector) throws Exception {
        System.out.println("Data collected by the current 8s window -> " + iterable);
    }
});
environment.execute();
Generated data ==> 1000,口
Data collected by the current 8s window -> [1000,口]
Generated data ==> 1000,工
Generated data ==> 1000,工
Generated data ==> 1000,口
Generated data ==> 1000,工
Data collected by the current 8s window -> [1000,口]
Data collected by the current 8s window -> [1000,工, 1000,工, 1000,工]
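ProcessWindowFunction buffers every element of the window in state before process() fires. When the per-window logic is a simple combination, the lambda-friendly reduce keeps just one running value per key instead; a sketch, assuming the same keyedStream and summing the numeric field of each record:

keyedStream
        .window(TumblingProcessingTimeWindows.of(Time.of(8, TimeUnit.SECONDS)))
        // reduce merges elements pairwise as they arrive, so the window state holds
        // a single running value per key rather than the full element list
        .reduce((a, b) -> {
            long sum = Long.parseLong(a.split(",")[0]) + Long.parseLong(b.split(",")[0]);
            return sum + "," + b.split(",")[1]; // the key part is identical within a keyed window
        })
        .print();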
keyBy + tumbling window + event time
environment.setStreamTimeCharacteristic(...)
streamSource.assignTimestampsAndWatermarks(...)
watermarks.keyBy(...)
keyedStream.window(...)
StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
// step 1: switch the time semantics to event time
environment.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
DataStreamSource<String> streamSource = environment.addSource(new SourceFunction<String>() {
    private long time = 1000;
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> sourceContext) throws Exception {
        String[] strs = {"口", "㐅", "工", "米"};
        Random random = new Random();
        while (running) {
            String word = strs[random.nextInt(4)];
            String data = time + "," + word;
            time += 2000;
            TimeUnit.SECONDS.sleep(2);
            System.out.println("Generated data ==> " + data);
            sourceContext.collect(data);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
});
// step 2: extract the event-time field (must be a long) from each record and use it to generate watermarks
SingleOutputStreamOperator<String> watermarks = streamSource.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<String>(Time.seconds(0)) {
            @Override
            public long extractTimestamp(String s) {
                String strTime = s.split(",")[0];
                return Long.decode(strTime);
            }
        });
KeyedStream<String, String> keyedStream = watermarks.keyBy(word -> word.split(",")[1]);
WindowedStream<String, String, TimeWindow> windowedStream =
        keyedStream.window(TumblingEventTimeWindows.of(Time.of(8, TimeUnit.SECONDS)));
windowedStream.process(new ProcessWindowFunction<String, Object, String, TimeWindow>() {
    @Override
    public void process(String s, Context context, Iterable<String> iterable, Collector<Object> collector) throws Exception {
        System.out.println("Data collected by the current 8s window -> " + iterable);
    }
});
environment.execute();
Generated data ==> 1000,㐅
Generated data ==> 3000,㐅
Generated data ==> 5000,㐅
Generated data ==> 7000,工
Generated data ==> 9000,工
Data collected by the current 8s window -> [7000,工]
Data collected by the current 8s window -> [1000,㐅, 3000,㐅, 5000,㐅]
Generated data ==> 11000,工
Generated data ==> 13000,工
Generated data ==> 15000,口
Generated data ==> 17000,㐅
Data collected by the current 8s window -> [15000,口]
Data collected by the current 8s window -> [9000,工, 11000,工, 13000,工]
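Both setStreamTimeCharacteristic and BoundedOutOfOrdernessTimestampExtractor are deprecated in newer Flink releases; from Flink 1.12 on, event time is the default and watermarks are configured through WatermarkStrategy. A sketch of the equivalent assignment, with Duration.ZERO mirroring the Time.seconds(0) bound above:

// no setStreamTimeCharacteristic call is needed on Flink 1.12+
SingleOutputStreamOperator<String> watermarks = streamSource.assignTimestampsAndWatermarks(
        WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ZERO)
                .withTimestampAssigner((event, previousTimestamp) -> Long.parseLong(event.split(",")[0])));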
Flink aggregation operators
Field-name aggregations such as sum("javaScore") require the element type to be a POJO.
Flink analyzes every type that does not fall into any other category and tries to handle it as a POJO. Flink treats a type as a POJO if it satisfies the following conditions:
- the class is public and defined standalone (it must not be a non-static inner class);
- the class has a public no-argument constructor;
- every field is either public or reachable through public getter and setter methods;
- every field type is supported by Flink.
package com.itszt23.flink;

import lombok.Data;

@Data
public class People {
    public String className;
    public String name;
    public int javaScore;

    public People() {}

    public People(String className, String name, int javaScore) {
        this.className = className;
        this.name = name;
        this.javaScore = javaScore;
    }
}
StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
environment.setParallelism(1);
// Flink requires a POJO for field-name aggregation; one of the POJO rules is that the class
// must be a standalone, public top-level class, which is why People is defined separately above.
DataStreamSource<People> streamSource = environment.fromCollection(
        Arrays.asList(
                new People("小将", "问天", 100),
                new People("小将", "铁心", 200),
                new People("小将", "问雅", 300),
                new People("神兵", "天晶剑", 300)
        )
);
KeyedStream<People, String> keyedStream = streamSource.keyBy(People::getClassName);
SingleOutputStreamOperator<People> javaScore = keyedStream.sum("javaScore");
javaScore.print();
environment.execute();
People(className=小将, name=问天, javaScore=100)
People(className=小将, name=问天, javaScore=300)
People(className=小将, name=问天, javaScore=600)
People(className=神兵, name=天晶剑, javaScore=300)
Replacing sum with min:
SingleOutputStreamOperator<People> javaScore = keyedStream.min("javaScore");
People(className=小将, name=问天, javaScore=100)
People(className=小将, name=问天, javaScore=100)
People(className=小将, name=问天, javaScore=100)
People(className=神兵, name=天晶剑, javaScore=300)
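Note that min, like sum, replaces only the aggregated field and keeps the remaining fields from the first record seen for that key, which is why every 小将 row above still prints name=问天. To carry along the whole record that owns the minimum, use minBy; a minimal sketch:

// minBy emits the complete element holding the minimal javaScore,
// so the name field matches the record that actually has the minimum
SingleOutputStreamOperator<People> minByScore = keyedStream.minBy("javaScore");
minByScore.print();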