Flink operators: keyBy, min, max, minBy, maxBy, reduce, split, select
keyBy
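DataStream -> KeyedStream: logically splits a stream into disjoint partitions, such that all elements with the same key go to the same partition. Internally this is implemented with hash partitioning.
1. keyBy repartitions the data; 2. different keys may land in the same partition, because the assignment is derived from a hash of the key.
!! Note: a partition does not necessarily hold only a single key.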
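The two points above can be illustrated with a minimal sketch in plain Java. It is only an approximation: `partitionFor` is a hypothetical helper that takes the key's hash modulo the parallelism, whereas Flink's real assignment is more involved (keys are first mapped to key groups). The sketch only shows that one key always maps to the same partition while several keys can share one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyPartitionSketch {
    // Simplified stand-in for hash partitioning (an assumption, not Flink's actual formula)
    public static int partitionFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Groups the given keys by the partition they are assigned to
    public static Map<Integer, List<String>> assign(List<String> keys, int parallelism) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (String key : keys) {
            partitions.computeIfAbsent(partitionFor(key, parallelism), p -> new ArrayList<>()).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("sensor_1", "sensor_6", "sensor_7", "sensor_9");
        // Four keys, two partitions: at least one partition must hold more than one key
        System.out.println(assign(keys, 2));
    }
}
```

With more distinct keys than partitions, sharing is unavoidable, which is why a partition cannot be assumed to contain a single key.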
Rolling aggregation operators
sum, min, max, minBy, maxBy: each aggregates over every sub-stream (one per key) of a KeyedStream
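import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// SensorReading is the author's POJO (id, timeStamp, temperature); its definition is not shown in this post
public class TransformAPI {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> inputStream =
                env.readTextFile("D:\\Tool\\Idea2020\\FlinkTest\\src\\main\\resources\\file.txt");
        // Parse each comma-separated line into a SensorReading
        DataStream<SensorReading> dataStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] splits = s.split(",");
                return new SensorReading(splits[0], Long.parseLong(splits[1]), Double.parseDouble(splits[2]));
            }
        });
        // The stream must be grouped by key before it can be aggregated
        // KeyedStream<SensorReading, Tuple> keyedStream = dataStream.keyBy("id");
        KeyedStream<SensorReading, String> keyedStream = dataStream.keyBy(SensorReading::getId);
        // Rolling aggregation: running maximum of the temperature field per key
        SingleOutputStreamOperator<SensorReading> temperature = keyedStream.max("temperature");
        temperature.print();
        env.execute();
    }
}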
Contents of file.txt
sensor_1,1547718199,12.5
sensor_6,1547718200,22.5
sensor_7,1547718201,32.5
sensor_9,1547718202,42.5
sensor_1,1547718111,50.1
sensor_1,1547234239,45.8
Output
6> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
8> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
5> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
7> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}
5> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
5> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=50.1}
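In the output above, every sensor_1 line carries the same timeStamp even though the temperature changes. That matches how max behaves: it updates only the aggregated field and keeps the other fields of the first record seen for that key, whereas maxBy returns the whole record that holds the maximum. The following plain-Java sketch imitates both behaviours for a single key. `Reading` is a hypothetical stand-in for the author's SensorReading, and the input order is the arrival order implied by the output, not the file order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MaxVsMaxBySketch {
    // Hypothetical stand-in for the author's SensorReading POJO
    public static final class Reading {
        public final String id;
        public final long timeStamp;
        public final double temperature;

        public Reading(String id, long timeStamp, double temperature) {
            this.id = id;
            this.timeStamp = timeStamp;
            this.temperature = temperature;
        }

        @Override
        public String toString() {
            return "Reading{id='" + id + "', timeStamp=" + timeStamp + ", temperature=" + temperature + "}";
        }
    }

    // max-like: keep the first record and overwrite only the temperature field
    public static List<Reading> rollingMax(List<Reading> input) {
        List<Reading> emitted = new ArrayList<>();
        Reading state = null;
        for (Reading r : input) {
            state = (state == null)
                    ? r
                    : new Reading(state.id, state.timeStamp, Math.max(state.temperature, r.temperature));
            emitted.add(state);
        }
        return emitted;
    }

    // maxBy-like: keep the entire record that has the largest temperature so far
    public static List<Reading> rollingMaxBy(List<Reading> input) {
        List<Reading> emitted = new ArrayList<>();
        Reading state = null;
        for (Reading r : input) {
            if (state == null || r.temperature > state.temperature) {
                state = r;
            }
            emitted.add(state);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Reading> sensor1 = Arrays.asList(
                new Reading("sensor_1", 1547234239L, 45.8),
                new Reading("sensor_1", 1547718199L, 12.5),
                new Reading("sensor_1", 1547718111L, 50.1));
        rollingMax(sensor1).forEach(r -> System.out.println("max:   " + r));
        rollingMaxBy(sensor1).forEach(r -> System.out.println("maxBy: " + r));
    }
}
```

If the timestamp belonging to the hottest reading matters, maxBy is the operator to reach for.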
Reduce
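KeyedStream -> DataStream: an aggregation over a keyed stream that combines the current element with the previous aggregated result to produce a new value. The returned stream contains the result of every aggregation step, not only the final result.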
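The "one result per step" behaviour can be sketched in plain Java as follows. `rollingReduce` is a hypothetical helper that handles a single key; it is not Flink API, only an imitation of what a reduce function does to the elements of one key as they arrive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class RollingReduceSketch {
    // Combines each element with the previous result and emits every intermediate result
    public static <T> List<T> rollingReduce(List<T> input, BinaryOperator<T> reducer) {
        List<T> emitted = new ArrayList<>();
        T acc = null;
        boolean first = true;
        for (T value : input) {
            // The first element is emitted unchanged; later ones are merged into the running result
            acc = first ? value : reducer.apply(acc, value);
            first = false;
            emitted.add(acc);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Double> temps = Arrays.asList(45.8, 12.5, 50.1);
        // Running maximum: three inputs produce three outputs
        System.out.println(rollingReduce(temps, (a, b) -> Math.max(a, b))); // [45.8, 45.8, 50.1]
    }
}
```

Three inputs yield three outputs; a consumer that only wants the final value has to pick the last one itself.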
split
Note: this operation is usually followed by a select.
DataStream -> SplitStream: splits one DataStream into two or more DataStreams according to some criterion.
select
SplitStream -> DataStream: retrieves one or more DataStreams from a SplitStream.
import java.util.Collections;

import org.apache.flink.streaming.api.collector.selector.OutputSelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SplitStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TransformSpilt {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> inputStream =
                env.readTextFile("D:\\Tool\\Idea2020\\FlinkTest\\src\\main\\resources\\file.txt");
        // Run the operators defined below with parallelism 1
        env.setParallelism(1);
        // Parse each comma-separated line into a SensorReading
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] splits = line.split(",");
            return new SensorReading(splits[0], Long.parseLong(splits[1]), Double.parseDouble(splits[2]));
        });
        // Split the stream: tag each reading as "high" (above 30 degrees) or "low"
        SplitStream<SensorReading> splitStream = dataStream.split(new OutputSelector<SensorReading>() {
            @Override
            public Iterable<String> select(SensorReading sensorReading) {
                return (sensorReading.getTemperature() > 30
                        ? Collections.singletonList("high")
                        : Collections.singletonList("low"));
            }
        });
        // Select the tagged sub-streams; several tags can be selected at once
        DataStream<SensorReading> highTempStream = splitStream.select("high");
        DataStream<SensorReading> lowTempStream = splitStream.select("low");
        DataStream<SensorReading> allTempStream = splitStream.select("high", "low");
        highTempStream.print("****************high****************8");
        lowTempStream.print("****************low****************8");
        allTempStream.print("****************all****************8");
        env.execute();
    }
}
Output:
****************low****************8> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
****************all****************8> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
****************high****************8> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
****************all****************8> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
****************low****************8> SensorReading{id='sensor_1', timeStamp=1547718199, temperature=12.5}
****************all****************8> SensorReading{id='sensor_1', timeStamp=1547718199, temperature=12.5}
****************high****************8> SensorReading{id='sensor_1', timeStamp=1547718111, temperature=50.1}
****************all****************8> SensorReading{id='sensor_1', timeStamp=1547718111, temperature=50.1}
****************high****************8> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
****************all****************8> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
****************high****************8> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}
****************all****************8> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}
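The output above shows each record twice: once under its own tag and once under "all", because selecting several names returns every element carrying any of them. A minimal plain-Java sketch of that tagging logic follows. `tagFor` and `select` are hypothetical helpers that imitate the OutputSelector and SplitStream.select on a list of temperatures; they are not Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SplitSelectSketch {
    // Plays the role of the OutputSelector: one tag per element
    public static String tagFor(double temperature) {
        return temperature > 30 ? "high" : "low";
    }

    // Plays the role of select: keep elements whose tag is among the requested names
    public static List<Double> select(List<Double> temperatures, String... names) {
        Set<String> wanted = new HashSet<>(Arrays.asList(names));
        List<Double> selected = new ArrayList<>();
        for (double t : temperatures) {
            if (wanted.contains(tagFor(t))) {
                selected.add(t);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Temperatures from file.txt, in file order
        List<Double> temps = Arrays.asList(12.5, 22.5, 32.5, 42.5, 50.1, 45.8);
        System.out.println("high: " + select(temps, "high"));
        System.out.println("low:  " + select(temps, "low"));
        System.out.println("all:  " + select(temps, "high", "low"));
    }
}
```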
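Connect and CoMap
Limitation: connect can only join two streams, but the two streams may have different types.
DataStream, DataStream -> ConnectedStreams: connects two data streams while preserving their types. After connect, the two streams are merely placed inside one stream; internally each keeps its own data and form unchanged, and the two remain independent of each other.
CoMap, CoFlatMap
ConnectedStreams -> DataStream: applied to a ConnectedStreams; works like map and flatMap, applying a separate map or flatMap function to each of the two streams inside the ConnectedStreams.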
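import java.util.Collections;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.collector.selector.OutputSelector;
import org.apache.flink.streaming.api.datastream.ConnectedStreams;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.datastream.SplitStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.CoMapFunction;

public class TransformSpilt {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> inputStream =
                env.readTextFile("D:\\Tool\\Idea2020\\FlinkTest\\src\\main\\resources\\file.txt");
        // Run the operators defined below with parallelism 1
        env.setParallelism(1);
        // Parse each comma-separated line into a SensorReading
        DataStream<SensorReading> dataStream = inputStream.map(line -> {
            String[] splits = line.split(",");
            return new SensorReading(splits[0], Long.parseLong(splits[1]), Double.parseDouble(splits[2]));
        });
        // Split the stream: tag each reading as "high" (above 30 degrees) or "low"
        SplitStream<SensorReading> splitStream = dataStream.split(new OutputSelector<SensorReading>() {
            @Override
            public Iterable<String> select(SensorReading sensorReading) {
                return (sensorReading.getTemperature() > 30
                        ? Collections.singletonList("high")
                        : Collections.singletonList("low"));
            }
        });
        DataStream<SensorReading> highTempStream = splitStream.select("high");
        DataStream<SensorReading> lowTempStream = splitStream.select("low");
        DataStream<SensorReading> allTempStream = splitStream.select("high", "low");
        highTempStream.print("****************high****************8");
        lowTempStream.print("****************low****************8");
        allTempStream.print("****************all****************8");
        // Merge with connect; the two streams may have different data types.
        // Convert highTempStream to a Tuple2, connect it with lowTempStream, then output status information
        DataStream<Tuple2<String, Double>> warnStream =
                highTempStream.map(new MapFunction<SensorReading, Tuple2<String, Double>>() {
                    @Override
                    public Tuple2<String, Double> map(SensorReading sensorReading) throws Exception {
                        return new Tuple2<>(sensorReading.getId(), sensorReading.getTemperature());
                    }
                });
        // Connect two streams of different types
        ConnectedStreams<Tuple2<String, Double>, SensorReading> connectStreams =
                warnStream.connect(lowTempStream);
        // map1 and map2 return different tuple types, so the output type is their common supertype, Object
        SingleOutputStreamOperator<Object> resultStream = connectStreams.map(
                new CoMapFunction<Tuple2<String, Double>, SensorReading, Object>() {
                    @Override
                    public Object map1(Tuple2<String, Double> value) throws Exception {
                        // Called for elements of the first (high temperature) stream
                        return new Tuple3<>(value.f0, value.f1, "high temp warning");
                    }
                    @Override
                    public Object map2(SensorReading sensorReading) throws Exception {
                        // Called for elements of the second (low temperature) stream
                        return new Tuple2<>(sensorReading.getId(), "normal");
                    }
                });
        resultStream.print("**********合流*********");
        env.execute();
    }
}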
Result:
****************all****************8> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
****************high****************8> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
****************low****************8> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
****************all****************8> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
****************all****************8> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}
****************high****************8> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}
****************low****************8> SensorReading{id='sensor_1', timeStamp=1547718199, temperature=12.5}
****************all****************8> SensorReading{id='sensor_1', timeStamp=1547718199, temperature=12.5}
****************all****************8> SensorReading{id='sensor_1', timeStamp=1547718111, temperature=50.1}
****************high****************8> SensorReading{id='sensor_1', timeStamp=1547718111, temperature=50.1}
****************all****************8> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
****************high****************8> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
**********合流*********> (sensor_1,45.8,high temp warning)
**********合流*********> (sensor_6,normal)
**********合流*********> (sensor_7,32.5,high temp warning)
**********合流*********> (sensor_1,normal)
**********合流*********> (sensor_1,50.1,high temp warning)
**********合流*********> (sensor_9,42.5,high temp warning)
Union
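Union can merge multiple streams, but they must all have the same data type.
DataStream -> DataStream: performs a union of two or more DataStreams, producing a new DataStream that contains all elements of all the input DataStreams.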
DataStream<SensorReading> unionStream = highTempStream.union(lowTempStream, allTempStream);
unionStream.print("**********union*********");
Result:
**********union*********> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
**********union*********> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
**********union*********> SensorReading{id='sensor_1', timeStamp=1547718111, temperature=50.1}
**********union*********> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}
**********union*********> SensorReading{id='sensor_1', timeStamp=1547718199, temperature=12.5}
**********union*********> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
**********union*********> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
**********union*********> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
**********union*********> SensorReading{id='sensor_1', timeStamp=1547718111, temperature=50.1}
**********union*********> SensorReading{id='sensor_1', timeStamp=1547718199, temperature=12.5}
**********union*********> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
**********union*********> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}
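The result above contains 12 lines for 6 input records: every reading is in either highTempStream or lowTempStream and also in allTempStream, and union keeps all of them without removing duplicates. A minimal plain-Java sketch of that behaviour follows. The `union` helper is hypothetical and simply concatenates lists, so it does not reproduce the interleaved arrival order seen in the real output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class UnionSketch {
    // Merges same-typed inputs and keeps every element, duplicates included
    @SafeVarargs
    public static <T> List<T> union(List<T> first, List<T>... others) {
        List<T> merged = new ArrayList<>(first);
        for (List<T> other : others) {
            merged.addAll(other);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Double> high = Arrays.asList(32.5, 42.5, 50.1, 45.8);
        List<Double> low = Arrays.asList(12.5, 22.5);
        List<Double> all = Arrays.asList(12.5, 22.5, 32.5, 42.5, 50.1, 45.8);
        List<Double> merged = union(high, low, all);
        // 4 + 2 + 6 elements: each temperature appears twice
        System.out.println(merged.size() + " elements: " + merged);
    }
}
```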
shuffle
Repartitioning
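import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TransformShuffer {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> inputStream =
                env.readTextFile("D:\\Tool\\Idea2020\\FlinkTest\\src\\main\\resources\\file.txt");
        // Run the operators defined below with parallelism 4
        env.setParallelism(4);
        // Print the records before repartitioning; the prefix shows the subtask index
        inputStream.print("input");
        // Redistribute the records randomly across the downstream subtasks
        DataStream<String> shuffle = inputStream.shuffle();
        shuffle.print("shuffer");
        env.execute();
    }
}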
Output:
input:1> sensor_1,1547234239,45.8
input:1> sensor_9,1547718202,42.5
shuffer:2> sensor_1,1547234239,45.8
shuffer:2> sensor_1,1547718111,50.1
shuffer:2> sensor_7,1547718201,32.5
shuffer:2> sensor_6,1547718200,22.5
shuffer:2> sensor_9,1547718202,42.5
input:3> sensor_6,1547718200,22.5
shuffer:2> sensor_1,1547718199,12.5
input:2> sensor_7,1547718201,32.5
input:2> sensor_1,1547718111,50.1
input:2> sensor_1,1547718199,12.5
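As a rough sketch of what shuffle does, the plain-Java code below picks a random downstream subtask for each record. The `assign` helper and its fixed seed are assumptions made for illustration and repeatability; it imitates the idea of random repartitioning and is not Flink's implementation.

```java
import java.util.Arrays;
import java.util.Random;

public class ShuffleSketch {
    // Picks a random target subtask index in [0, parallelism) for each record
    public static int[] assign(int recordCount, int parallelism, long seed) {
        Random random = new Random(seed);
        int[] targets = new int[recordCount];
        for (int i = 0; i < recordCount; i++) {
            targets[i] = random.nextInt(parallelism);
        }
        return targets;
    }

    public static void main(String[] args) {
        // Six records spread over four subtasks; the seed only makes the run repeatable
        System.out.println(Arrays.toString(assign(6, 4, 42L)));
    }
}
```

Because the target is chosen independently per record, nothing guarantees an even spread over a handful of records; several of them can easily end up on the same subtask.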