Flink operators: keyBy, min, max, minBy, maxBy, reduce, split, select

keyBy

DataStream -> KeyedStream: logically splits a stream into disjoint partitions, where each partition contains the elements with the same key. Internally this is implemented with hash partitioning.

1. keyBy repartitions the stream; 2. Different keys may end up in the same partition, because the assignment is based on hashing.

Note: it does not follow that one partition holds only a single key.

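As a minimal sketch (it reuses the dataStream of SensorReading built in the example below), keying by the sensor id guarantees that all readings of one sensor are processed in the same partition, while several different ids may still land in the same partition:

// Sketch only: dataStream is the DataStream<SensorReading> from the example below
KeyedStream<SensorReading, String> keyedById = dataStream.keyBy(SensorReading::getId);
// Records with the same id always go to the same parallel subtask;
// the subtask index printed as a prefix can repeat across different ids
keyedById.print("keyed");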

Rolling aggregation operators

sum, min, max, minBy, maxBy: aggregations applied to each keyed sub-stream of a KeyedStream.
public class TransformAPI {

    public static void main(String[] args) throws Exception{
        StreamExecutionEnvironment env =StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> inputStream =
                env.readTextFile("D:\\Tool\\Idea2020\\FlinkTest\\src\\main\\resources\\file.txt");

        //Group by key first, then aggregate
        DataStream<SensorReading> dataStream = inputStream.map(new MapFunction<String, SensorReading>() {
            @Override
            public SensorReading map(String s) throws Exception {
                String[] splits = s.split(",");
                return new SensorReading(splits[0], Long.parseLong(splits[1]), Double.parseDouble(splits[2]));
            }
        });

//        KeyedStream<SensorReading, Tuple> keyedStream = dataStream.keyBy("id");
        KeyedStream<SensorReading, String> keyedStream = dataStream.keyBy(SensorReading::getId);

        //Rolling aggregation
//        SingleOutputStreamOperator<SensorReading> resultStream = keyedStream.max("temperature");


        SingleOutputStreamOperator<SensorReading> temperature = keyedStream.max("temperature");

        temperature.print();
        env.execute();
        
    }
}
Contents of file.txt:
sensor_1,1547718199,12.5
sensor_6,1547718200,22.5
sensor_7,1547718201,32.5
sensor_9,1547718202,42.5
sensor_1,1547718111,50.1
sensor_1,1547234239,45.8
Output:
6> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
8> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
5> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
7> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}
5> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
5> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=50.1}
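Note that max("temperature") only updates the temperature field; the other fields (here the timestamp) keep the values of the first record seen for that key, which is exactly what the output above shows for sensor_1. If the complete record holding the maximum temperature is needed, maxBy can be used instead. A minimal sketch, reusing the keyedStream from the example above:

//Sketch: maxBy returns the whole record that carries the maximum temperature,
//so the timestamp belongs to the hottest reading rather than to the first record
SingleOutputStreamOperator<SensorReading> maxByTemp = keyedStream.maxBy("temperature");
maxByTemp.print();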

Reduce

KeyedStream -> DataStream: an aggregation over a keyed stream that combines the current element with the previously aggregated value and produces a new value. The returned stream contains the result of every aggregation step, not just the final result.
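For example, a reduce that keeps the running maximum temperature but always carries the latest timestamp could look like this. A minimal sketch, reusing the keyedStream from the example above and assuming a getTimeStamp() getter on SensorReading:

//Sketch: combine the previous aggregate with the newly arrived record,
//keeping the running maximum temperature and the most recent timestamp
SingleOutputStreamOperator<SensorReading> reduced = keyedStream.reduce(
        (current, latest) -> new SensorReading(
                current.getId(),
                latest.getTimeStamp(),   // assumed getter, matching the timeStamp field above
                Math.max(current.getTemperature(), latest.getTemperature())));
reduced.print();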

split

Note: this operation is usually followed by a select.

DataStream -> SplitStream: splits one DataStream into two or more DataStreams according to some criteria.


select

SplitStream -> DataStream: selects one or more DataStreams from a SplitStream.


public class TransformSplit {
    public static void main(String[] args) throws Exception{

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        DataStream<String> inputStream =
                env.readTextFile("D:\\Tool\\Idea2020\\FlinkTest\\src\\main\\resources\\file.txt");

        DataStream<SensorReading> dataStream = inputStream.map(line -> {
                String[] splits = line.split(",");
                return new SensorReading(splits[0], Long.parseLong(splits[1]), Double.parseDouble(splits[2]));
        });
        //Split the stream by temperature
        SplitStream<SensorReading> splitStream = dataStream.split(new OutputSelector<SensorReading>() {
            @Override
            public Iterable<String> select(SensorReading sensorReading) {

                return (sensorReading.getTemperature() > 30 ? Collections.singletonList("high"):Collections.singletonList("low"));
            }
        });

        DataStream<SensorReading> highTempStream = splitStream.select("high");
        DataStream<SensorReading> lowTempStream = splitStream.select("low");
        DataStream<SensorReading> allTempStream = splitStream.select("high", "low");

        highTempStream.print("****************high****************8");
        lowTempStream.print("****************low****************8");
        allTempStream.print("****************all****************8");

        env.execute();
    }
}
Output:
****************low****************8> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
****************all****************8> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
****************high****************8> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
****************all****************8> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
****************low****************8> SensorReading{id='sensor_1', timeStamp=1547718199, temperature=12.5}
****************all****************8> SensorReading{id='sensor_1', timeStamp=1547718199, temperature=12.5}
****************high****************8> SensorReading{id='sensor_1', timeStamp=1547718111, temperature=50.1}
****************all****************8> SensorReading{id='sensor_1', timeStamp=1547718111, temperature=50.1}
****************high****************8> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
****************all****************8> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
****************high****************8> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}
****************all****************8> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}

Connect and CoMap

Limitation: only two streams can be connected, but their types may differ.

DataStream, DataStream -> ConnectedStreams: connects two data streams while preserving their types. After connect, the two streams are merely placed into one stream; internally each keeps its own data and form unchanged, and the two streams remain independent of each other.


CoMap, CoFlatMap

ConnectedStreams -> DataStream: operates on a ConnectedStreams; the functionality is the same as map and flatMap, applying a separate map/flatMap to each of the two streams inside the ConnectedStreams.


public class TransformConnect {
    public static void main(String[] args) throws Exception{

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);
        DataStream<String> inputStream =
                env.readTextFile("D:\\Tool\\Idea2020\\FlinkTest\\src\\main\\resources\\file.txt");

        DataStream<SensorReading> dataStream = inputStream.map(line -> {
                String[] splits = line.split(",");
                return new SensorReading(splits[0], Long.parseLong(splits[1]), Double.parseDouble(splits[2]));
        });
        //Split the stream by temperature
        SplitStream<SensorReading> splitStream = dataStream.split(new OutputSelector<SensorReading>() {
            @Override
            public Iterable<String> select(SensorReading sensorReading) {

                return (sensorReading.getTemperature() > 30 ? Collections.singletonList("high"):Collections.singletonList("low"));
            }
        });

        DataStream<SensorReading> highTempStream = splitStream.select("high");
        DataStream<SensorReading> lowTempStream = splitStream.select("low");
        DataStream<SensorReading> allTempStream = splitStream.select("high", "low");

        highTempStream.print("****************high****************8");
        lowTempStream.print("****************low****************8");
        allTempStream.print("****************all****************8");

        //Connect streams: the two streams may have different data types
        //Convert highTempStream into a Tuple2, then connect it with lowTempStream and emit status info
        DataStream<Tuple2<String, Double>> warnStream =
                highTempStream.map(new MapFunction<SensorReading, Tuple2<String, Double>>() {

            @Override
            public Tuple2<String, Double> map(SensorReading sensorReading) throws Exception {

                return new Tuple2<>(sensorReading.getId(),sensorReading.getTemperature());
            }
        });
        //Connect two streams of different types
        ConnectedStreams<Tuple2<String, Double>, SensorReading> connectStreams =
                warnStream.connect(lowTempStream);
        //The common output type of the two map functions here is Object
        SingleOutputStreamOperator<Object> resultStream = connectStreams.map(new CoMapFunction<Tuple2<String, Double>, SensorReading, Object>() {
            @Override
            public Object map1(Tuple2<String, Double> value) throws Exception {

                return new Tuple3<>(value.f0,value.f1,"high temp warning");
            }
            @Override
            public Object map2(SensorReading sensorReading) throws Exception {
                return new Tuple2<>(sensorReading.getId(),"normal");
            }
        });

        resultStream.print("**********合流*********");


        env.execute();
    }
}

Output:
****************all****************8> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
****************high****************8> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
****************low****************8> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
****************all****************8> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
****************all****************8> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}
****************high****************8> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}
****************low****************8> SensorReading{id='sensor_1', timeStamp=1547718199, temperature=12.5}
****************all****************8> SensorReading{id='sensor_1', timeStamp=1547718199, temperature=12.5}
****************all****************8> SensorReading{id='sensor_1', timeStamp=1547718111, temperature=50.1}
****************high****************8> SensorReading{id='sensor_1', timeStamp=1547718111, temperature=50.1}
****************all****************8> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
****************high****************8> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
**********合流*********> (sensor_1,45.8,high temp warning)
**********合流*********> (sensor_6,normal)
**********合流*********> (sensor_7,32.5,high temp warning)
**********合流*********> (sensor_1,normal)
**********合流*********> (sensor_1,50.1,high temp warning)
**********合流*********> (sensor_9,42.5,high temp warning)
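CoFlatMap works the same way, only with a CoFlatMapFunction, so each side can emit zero or more records per input. A minimal sketch, reusing the connectStreams from the example above (Collector is org.apache.flink.util.Collector):

//Sketch: flatMap1 handles the high-temperature tuples, flatMap2 the low-temperature readings
SingleOutputStreamOperator<String> coFlatMapped = connectStreams.flatMap(
        new CoFlatMapFunction<Tuple2<String, Double>, SensorReading, String>() {
            @Override
            public void flatMap1(Tuple2<String, Double> value, Collector<String> out) {
                //emit one warning line per high-temperature tuple
                out.collect(value.f0 + " high temp warning " + value.f1);
            }
            @Override
            public void flatMap2(SensorReading sensorReading, Collector<String> out) {
                //emit a simple status line for each low-temperature reading
                out.collect(sensorReading.getId() + " normal");
            }
        });
coFlatMapped.print("**********CoFlatMap*********");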

Union

Can merge multiple streams, but the data types must be the same.

DataStream -> DataStream: performs a union of two or more DataStreams, producing a new DataStream that contains all elements of the input streams.


DataStream<SensorReading> unionStream = highTempStream.union(lowTempStream, allTempStream);
unionStream.print("**********union*********");

Output:
**********union*********> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
**********union*********> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
**********union*********> SensorReading{id='sensor_1', timeStamp=1547718111, temperature=50.1}
**********union*********> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}
**********union*********> SensorReading{id='sensor_1', timeStamp=1547718199, temperature=12.5}
**********union*********> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
**********union*********> SensorReading{id='sensor_1', timeStamp=1547234239, temperature=45.8}
**********union*********> SensorReading{id='sensor_9', timeStamp=1547718202, temperature=42.5}
**********union*********> SensorReading{id='sensor_1', timeStamp=1547718111, temperature=50.1}
**********union*********> SensorReading{id='sensor_1', timeStamp=1547718199, temperature=12.5}
**********union*********> SensorReading{id='sensor_6', timeStamp=1547718200, temperature=22.5}
**********union*********> SensorReading{id='sensor_7', timeStamp=1547718201, temperature=32.5}

shuffle

Repartitioning: elements are redistributed randomly across the downstream partitions.

public class TransformShuffle {

    public static void main(String[] args) throws Exception{
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4);
        DataStream<String> inputStream =
                env.readTextFile("D:\\Tool\\Idea2020\\FlinkTest\\src\\main\\resources\\file.txt");

        inputStream.print("input");

        DataStream<String> shuffle = inputStream.shuffle();

        shuffle.print("shuffle");
        env.execute();
    }
}

Output:
input:1> sensor_1,1547234239,45.8
input:1> sensor_9,1547718202,42.5
shuffle:2> sensor_1,1547234239,45.8
shuffle:2> sensor_1,1547718111,50.1
shuffle:2> sensor_7,1547718201,32.5
shuffle:2> sensor_6,1547718200,22.5
shuffle:2> sensor_9,1547718202,42.5
input:3> sensor_6,1547718200,22.5
shuffle:2> sensor_1,1547718199,12.5
input:2> sensor_7,1547718201,32.5
input:2> sensor_1,1547718111,50.1
input:2> sensor_1,1547718199,12.5