1. Introduction

FlinkCEP is the Complex Event Processing (CEP) library implemented on top of Flink. It lets you detect event patterns in an endless stream of events, giving you the chance to get hold of what matters in your data. It is commonly used for things such as risk-control rules over user app operation logs. Below, we walk through a complete example driven by one requirement: raise an alert when a user fails to log in 3 times in a row within 10 seconds.

1.1. Overall requirement and data-flow diagram

(figure: overall requirement and data-flow diagram)

2. Official example

The official code example is as follows:

DataStream<Event> input = ...

Pattern<Event, ?> pattern = Pattern.<Event>begin("start").where(
        new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) {
                return event.getId() == 42;
            }
        }
    ).next("middle").subtype(SubEvent.class).where(
        new SimpleCondition<SubEvent>() {
            @Override
            public boolean filter(SubEvent subEvent) {
                return subEvent.getVolume() >= 10.0;
            }
        }
    ).followedBy("end").where(
         new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) {
                return event.getName().equals("end");
            }
         }
    );

PatternStream<Event> patternStream = CEP.pattern(input, pattern);

DataStream<Alert> result = patternStream.process(
    new PatternProcessFunction<Event, Alert>() {
        @Override
        public void processMatch(
                Map<String, List<Event>> pattern,
                Context ctx,
                Collector<Alert> out) throws Exception {
            out.collect(createAlertFrom(pattern));
        }
    });

2.1. Takeaways from the official example

CEP programming follows three steps.
a) Define the pattern sequence

Pattern.<Class>begin("patternName").API...

Custom pattern rules are almost always built following this template.

The APIs that can be chained afterwards are documented in the official FlinkCEP documentation (Event Processing (CEP) | Apache Flink).

b) Apply the pattern sequence to a stream

CEP.pattern(inputDataStream,pattern)

CEP.pattern() is the fixed entry point:

the first argument is the stream the pattern is applied to;

the second argument is the custom pattern itself.
c) Extract the matched data and emit output

Process the stream produced in step b) with the process API: extend PatternProcessFunction and override processMatch(Map<String, List<Event>> pattern, Context ctx, Collector<Alert> out).

The first parameter holds the matched data: each Map key is a "patternName" defined in step a), and its value is the list of events that matched under that name;

The second parameter, Context, gives access to time attributes and to side outputs (used, for example, for timed-out partial matches);

The third parameter collects whatever the function needs to emit downstream.

3. The requirement in detail

Below, user operation log lines are read from a socket to simulate a live stream, and CEP matching is run over them.

The following code flattens each incoming line into a JavaBean. This section walks through the code snippet by snippet; the complete demo is listed in a later section.

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

/**
 * Parallelism is set to 1 so that watermarks advance and trigger the computation
 */
env.setParallelism(1);

DataStreamSource<String> socketTextStream = env.socketTextStream("localhost", 8888);

SingleOutputStreamOperator<UserLoginLog> dataStream = socketTextStream.flatMap(new MyFlatMapFunction())
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<UserLoginLog>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                                .withTimestampAssigner((SerializableTimestampAssigner<UserLoginLog>) (element, recordTimestamp) -> element.getLoginTime())
                );
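
A quick plain-Java sketch (hypothetical helper class, not Flink API) of what forBoundedOutOfOrderness does and why parallelism matters here: each channel's watermark trails the maximum event time it has seen by the bound (Flink's BoundedOutOfOrdernessWatermarks additionally subtracts 1 ms), and a downstream operator takes the minimum over its input channels, so with parallelism > 1 an idle channel can hold the watermark back.

```java
// Hypothetical sketch of bounded-out-of-orderness watermarking; the -1 ms
// mirrors Flink's BoundedOutOfOrdernessWatermarks implementation.
public class WatermarkSketch {

    // Per-channel watermark: trails the max event time seen by (bound + 1) ms.
    public static long channelWatermark(long[] eventTimes, long boundMillis) {
        long max = Long.MIN_VALUE;
        for (long t : eventTimes) max = Math.max(max, t);
        return max - boundMillis - 1;
    }

    // An operator's watermark is the minimum over all its input channels,
    // so one silent channel freezes event-time progress for the whole operator.
    public static long operatorWatermark(long[] channelWatermarks) {
        long min = Long.MAX_VALUE;
        for (long w : channelWatermarks) min = Math.min(min, w);
        return min;
    }

    public static void main(String[] args) {
        long wm = channelWatermark(new long[]{1645177352000L, 1645177353000L}, 1000);
        System.out.println(wm); // prints 1645177351999
        System.out.println(operatorWatermark(new long[]{wm, Long.MIN_VALUE}));
    }
}
```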

3.1. Using begin.where.next.where.next

/**
 * Emit only when there are 3 consecutive failed logins within 10 s (contiguity is enforced)
 */
Pattern<UserLoginLog, UserLoginLog> wherePatternOne = Pattern.<UserLoginLog>begin("start").where(new SimpleCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value) throws Exception {
                return 1 == value.getLoginStatus();
            }
        }).next("second").where(new IterativeCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value, Context<UserLoginLog> ctx) throws Exception {

                return 1 == value.getLoginStatus();
            }
        }).next("third").where(new SimpleCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value) throws Exception {

                return 1 == value.getLoginStatus();
            }
        }).within(Time.seconds(10));

As configured above, a match starts at an event whose login status is failed; if the immediately following second and third events are also failures, the match is emitted.

//With the log lines below as input, the emitted loginIds are: 11111, 11112, 11113, 11116, 11117, 11121

{"loginId":11111,"loginTime":1645177352000,"loginStatus":1,"userName":"aaron"}
{"loginId":11112,"loginTime":1645177353000,"loginStatus":1,"userName":"aaron"}
{"loginId":11113,"loginTime":1645177354000,"loginStatus":1,"userName":"aaron"}
{"loginId":11116,"loginTime":1645177355000,"loginStatus":1,"userName":"aaron"}
{"loginId":11117,"loginTime":1645177356000,"loginStatus":1,"userName":"aaron"}
{"loginId":11118,"loginTime":1645177357000,"loginStatus":1,"userName":"aaron"}
{"loginId":11119,"loginTime":1645177358000,"loginStatus":1,"userName":"aaron"}
{"loginId":11120,"loginTime":1645177359000,"loginStatus":0,"userName":"aaron"}
{"loginId":11121,"loginTime":1645177360000,"loginStatus":1,"userName":"aaron"}
{"loginId":11122,"loginTime":1645177361000,"loginStatus":1,"userName":"aaron"}
{"loginId":11123,"loginTime":1645177362000,"loginStatus":1,"userName":"aaron"}
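
The strict-contiguity matching above can be checked with a plain-Java simulation (a hypothetical helper, not the Flink engine): scan for three adjacent events that are all failures, with the first-to-last timestamps within 10 s, and emit the first ("start") event's loginId.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of next/next strict contiguity over a time-ordered log.
public class StrictTripleSim {
    // events[i] = {loginId, loginTimeMillis, loginStatus (1 = failed)}
    public static List<Integer> firstIdsOfStrictTriples(long[][] events) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + 2 < events.length; i++) {
            boolean allFailed = events[i][2] == 1 && events[i + 1][2] == 1 && events[i + 2][2] == 1;
            boolean within10s = events[i + 2][1] - events[i][1] <= 10_000;
            if (allFailed && within10s) {
                starts.add((int) events[i][0]); // emit the "start" event's loginId
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        long[][] log = {
            {11111, 1645177352000L, 1}, {11112, 1645177353000L, 1},
            {11113, 1645177354000L, 1}, {11116, 1645177355000L, 1},
            {11117, 1645177356000L, 1}, {11118, 1645177357000L, 1},
            {11119, 1645177358000L, 1}, {11120, 1645177359000L, 0},
            {11121, 1645177360000L, 1}, {11122, 1645177361000L, 1},
            {11123, 1645177362000L, 1},
        };
        System.out.println(firstIdsOfStrictTriples(log));
        // prints [11111, 11112, 11113, 11116, 11117, 11121]
    }
}
```

The successful login 11120 breaks every triple that would contain it, which is why 11118 and 11119 never start a match here.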

3.1.1. Output illustrated

(figure: matching output for the strict-contiguity pattern)

3.2. Using begin.times

/**
 * Emit when there are 3 failed logins within 10 s; contiguity is not enforced
 */
Pattern<UserLoginLog, UserLoginLog> wherePatternTwo = Pattern.<UserLoginLog>begin("start").where(new IterativeCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value, Context<UserLoginLog> ctx) throws Exception {
                return 1 == value.getLoginStatus();
            }
        }).times(3).within(Time.seconds(10));

As configured above, a match starts at a failed login; if a second and a third failure appear within the 10 second window, the match is emitted even when successful logins occur in between. In essence, the failures need not be contiguous.

//With the log lines below as input, the emitted loginIds are: 11111, 11112, 11113, 11116, 11117, 11118, 11119, 11121

{"loginId":11111,"loginTime":1645177352000,"loginStatus":1,"userName":"aaron"}
{"loginId":11112,"loginTime":1645177353000,"loginStatus":1,"userName":"aaron"}
{"loginId":11113,"loginTime":1645177354000,"loginStatus":1,"userName":"aaron"}
{"loginId":11116,"loginTime":1645177355000,"loginStatus":1,"userName":"aaron"}
{"loginId":11117,"loginTime":1645177356000,"loginStatus":1,"userName":"aaron"}
{"loginId":11118,"loginTime":1645177357000,"loginStatus":1,"userName":"aaron"}
{"loginId":11119,"loginTime":1645177358000,"loginStatus":1,"userName":"aaron"}
{"loginId":11120,"loginTime":1645177359000,"loginStatus":0,"userName":"aaron"}
{"loginId":11121,"loginTime":1645177360000,"loginStatus":1,"userName":"aaron"}
{"loginId":11122,"loginTime":1645177361000,"loginStatus":1,"userName":"aaron"}
{"loginId":11123,"loginTime":1645177362000,"loginStatus":1,"userName":"aaron"}
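
The relaxed contiguity of times(3) can likewise be simulated in plain Java (again a hypothetical helper, not Flink): a failed login starts a match if at least two more failures follow within the 10 s window, and successes in between are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of times(3) with relaxed contiguity over a time-ordered log.
public class RelaxedTripleSim {
    // events[i] = {loginId, loginTimeMillis, loginStatus (1 = failed)}
    public static List<Integer> startIdsOfRelaxedTriples(long[][] events) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i < events.length; i++) {
            if (events[i][2] != 1) continue;        // a match can only start at a failure
            int failures = 0;
            for (int j = i; j < events.length && events[j][1] - events[i][1] <= 10_000; j++) {
                if (events[j][2] == 1) failures++;  // successes in between are ignored
            }
            if (failures >= 3) starts.add((int) events[i][0]);
        }
        return starts;
    }

    public static void main(String[] args) {
        long[][] log = {
            {11111, 1645177352000L, 1}, {11112, 1645177353000L, 1},
            {11113, 1645177354000L, 1}, {11116, 1645177355000L, 1},
            {11117, 1645177356000L, 1}, {11118, 1645177357000L, 1},
            {11119, 1645177358000L, 1}, {11120, 1645177359000L, 0},
            {11121, 1645177360000L, 1}, {11122, 1645177361000L, 1},
            {11123, 1645177362000L, 1},
        };
        System.out.println(startIdsOfRelaxedTriples(log));
        // prints [11111, 11112, 11113, 11116, 11117, 11118, 11119, 11121]
    }
}
```

Now 11118 and 11119 do start matches, because the success 11120 is skipped; 11122 does not, since only two failures remain after it.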

3.2.1. Output illustrated

(figure: matching output for the times(3) pattern)

3.3. Using begin.times.consecutive

/**
 * Emit only when there are 3 consecutive failed logins within 10 s; consecutive() enforces contiguity
 */
Pattern<UserLoginLog, UserLoginLog> wherePatternThree = Pattern.<UserLoginLog>begin("start").where(new IterativeCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value, Context<UserLoginLog> ctx) throws Exception {
                return 1 == value.getLoginStatus();
            }
        }).times(3).consecutive().within(Time.seconds(10));

Compared with 3.2, the only addition is consecutive(), which makes this pattern behave exactly like the one in 3.1.

//With the log lines below as input, the emitted loginIds are: 11111, 11112, 11113, 11116, 11117, 11121

{"loginId":11111,"loginTime":1645177352000,"loginStatus":1,"userName":"aaron"}
{"loginId":11112,"loginTime":1645177353000,"loginStatus":1,"userName":"aaron"}
{"loginId":11113,"loginTime":1645177354000,"loginStatus":1,"userName":"aaron"}
{"loginId":11116,"loginTime":1645177355000,"loginStatus":1,"userName":"aaron"}
{"loginId":11117,"loginTime":1645177356000,"loginStatus":1,"userName":"aaron"}
{"loginId":11118,"loginTime":1645177357000,"loginStatus":1,"userName":"aaron"}
{"loginId":11119,"loginTime":1645177358000,"loginStatus":1,"userName":"aaron"}
{"loginId":11120,"loginTime":1645177359000,"loginStatus":0,"userName":"aaron"}
{"loginId":11121,"loginTime":1645177360000,"loginStatus":1,"userName":"aaron"}
{"loginId":11122,"loginTime":1645177361000,"loginStatus":1,"userName":"aaron"}
{"loginId":11123,"loginTime":1645177362000,"loginStatus":1,"userName":"aaron"}

4. Complete demo code

4.1. pom file

<properties>
        <flink.version>1.14.3</flink.version>
        <hadoop.version>2.7.5</hadoop.version>
        <scala.binary.version>2.11</scala.binary.version>
        <kafka.version>2.4.0</kafka.version>
        <redis.version>3.3.0</redis.version>
        <lombok.version>1.18.6</lombok.version>
        <fastjson.verson>1.2.72</fastjson.verson>
        <jdk.version>1.8</jdk.version>

    </properties>
    <dependencyManagement>
        <dependencies>
            <!--hadoop dependencies-->
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>${hadoop.version}</version>
                <exclusions>
                    <exclusion>
                        <groupId>org.slf4j</groupId>
                        <artifactId>slf4j-log4j12</artifactId>
                    </exclusion>
                    <exclusion>
                        <groupId>log4j</groupId>
                        <artifactId>log4j</artifactId>
                    </exclusion>
                    <exclusion>
                        <groupId>org.slf4j</groupId>
                        <artifactId>slf4j-api</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            <!--flink dependencies-->
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-clients_${scala.binary.version}</artifactId>
                <version>${flink.version}</version>
                <exclusions>
                    <exclusion>
                        <groupId>org.slf4j</groupId>
                        <artifactId>slf4j-api</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-java</artifactId>
                <version>${flink.version}</version>
                <exclusions>
                    <exclusion>
                        <groupId>log4j</groupId>
                        <artifactId>*</artifactId>
                    </exclusion>
                    <exclusion>
                        <groupId>org.slf4j</groupId>
                        <artifactId>slf4j-log4j12</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
                <version>${flink.version}</version>
                <exclusions>
                    <exclusion>
                        <groupId>log4j</groupId>
                        <artifactId>*</artifactId>
                    </exclusion>
                    <exclusion>
                        <groupId>org.slf4j</groupId>
                        <artifactId>slf4j-log4j12</artifactId>
                    </exclusion>
                    <exclusion>
                        <groupId>com.google.code.findbugs</groupId>
                        <artifactId>jsr305</artifactId>
                    </exclusion>
                    <exclusion>
                        <groupId>org.apache.flink</groupId>
                        <artifactId>force-shading</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-statebackend-rocksdb_${scala.binary.version}</artifactId>
                <version>${flink.version}</version>
                <exclusions>
                    <exclusion>
                        <groupId>log4j</groupId>
                        <artifactId>*</artifactId>
                    </exclusion>
                    <exclusion>
                        <groupId>org.slf4j</groupId>
                        <artifactId>slf4j-log4j12</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-runtime-web_${scala.binary.version}</artifactId>
                <version>${flink.version}</version>
                <scope>provided</scope>
            </dependency>
            <!--kafka dependencies-->
            <dependency>
                <groupId>org.apache.kafka</groupId>
                <artifactId>kafka-clients</artifactId>
                <version>${kafka.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
                <version>${flink.version}</version>
                <exclusions>
                    <exclusion>
                        <groupId>log4j</groupId>
                        <artifactId>*</artifactId>
                    </exclusion>
                    <exclusion>
                        <groupId>org.slf4j</groupId>
                        <artifactId>slf4j-log4j12</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            <!--redis dependency-->
            <dependency>
                <groupId>redis.clients</groupId>
                <artifactId>jedis</artifactId>
                <version>${redis.version}</version>
            </dependency>
            <!--lombok-->
            <dependency>
                <groupId>org.projectlombok</groupId>
                <artifactId>lombok</artifactId>
                <version>${lombok.version}</version>
                <scope>provided</scope>
            </dependency>
            <dependency>
                <groupId>com.alibaba</groupId>
                <artifactId>fastjson</artifactId>
                <version>${fastjson.verson}</version>
            </dependency>

            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-cep_${scala.binary.version}</artifactId>
                <version>${flink.version}</version>
            </dependency>

        </dependencies>
    </dependencyManagement>

4.2. The UserLoginLog class

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@AllArgsConstructor
@NoArgsConstructor
public class UserLoginLog {
    /**
     * Login id
     */
    private int loginId;
    /**
     * Login time (epoch milliseconds)
     */
    private long loginTime;
    /**
     * Login status: 1 = failed, 0 = succeeded
     */
    private int loginStatus;
    /**
     * User name
     */
    private String userName;
}

4.3. The MyFlatMapFunction class

import com.alibaba.fastjson.JSONObject;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang.StringUtils;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

@Slf4j
public class MyFlatMapFunction implements FlatMapFunction<String, UserLoginLog> {

    /**
     * The core method of the FlatMapFunction. Takes an element from the input data set and
     * transforms it into zero, one, or more elements.
     *
     * @param value The input value.
     * @param out   The collector for returning result values.
     * @throws Exception This method may throw exceptions. Throwing an exception will cause the
     *                   operation to fail and may trigger recovery.
     */
    @Override
    public void flatMap(String value, Collector<UserLoginLog> out) throws Exception {
        if (StringUtils.isNotBlank(value)) {
            UserLoginLog userLoginLog = JSONObject.parseObject(value, UserLoginLog.class);
            out.collect(userLoginLog);
        }
    }
}

4.4. The MyPatternProcessFunction class

import lombok.extern.slf4j.Slf4j;
import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.util.Collector;

import java.util.List;
import java.util.Map;

@Slf4j
public class MyPatternProcessFunction extends PatternProcessFunction<UserLoginLog, UserLoginLog> {
    /**
     * Generates resulting elements given a map of detected pattern events. The events are
     * identified by their specified names.
     *
     * <p>{@link Context#timestamp()} in this case returns the time of the
     * last element that was assigned to the match, resulting in this partial match being finished.
     *
     * @param match map containing the found pattern. Events are identified by their names.
     * @param ctx   enables access to time features and emitting results through side outputs
     * @param out   Collector used to output the generated elements
     * @throws Exception This method may throw exceptions. Throwing an exception will cause the
     *                   operation to fail and may trigger recovery.
     */
    @Override
    public void processMatch(Map<String, List<UserLoginLog>> match, Context ctx, Collector<UserLoginLog> out) throws Exception {

        List<UserLoginLog> start = match.get("start");

        out.collect(start.get(0));
    }
}

4.5. The main class

import lombok.extern.slf4j.Slf4j;
import org.apache.flink.api.common.eventtime.*;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.IterativeCondition;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

import java.time.Duration;

@Slf4j
public class CepLearning {
    public static void main(String[] args) {

        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        /**
         * Parallelism is set to 1 so that watermarks advance and trigger the computation
         */
        env.setParallelism(1);

        DataStreamSource<String> socketTextStream = env.socketTextStream("localhost", 8888);

        SingleOutputStreamOperator<UserLoginLog> dataStream = socketTextStream.flatMap(new MyFlatMapFunction())
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<UserLoginLog>forBoundedOutOfOrderness(Duration.ofSeconds(1))
                                .withTimestampAssigner((SerializableTimestampAssigner<UserLoginLog>) (element, recordTimestamp) -> element.getLoginTime())
                );


        /**
         * Emit only when there are 3 consecutive failed logins within 10 s (contiguity is enforced)
         */
        Pattern<UserLoginLog, UserLoginLog> wherePatternOne = Pattern.<UserLoginLog>begin("start").where(new SimpleCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value) throws Exception {
                return 1 == value.getLoginStatus();
            }
        }).next("second").where(new IterativeCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value, Context<UserLoginLog> ctx) throws Exception {

                return 1 == value.getLoginStatus();
            }
        }).next("third").where(new SimpleCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value) throws Exception {

                return 1 == value.getLoginStatus();
            }
        }).within(Time.seconds(10));

        /**
         * Emit when there are 3 failed logins within 10 s; contiguity is not enforced
         */
        Pattern<UserLoginLog, UserLoginLog> wherePatternTwo = Pattern.<UserLoginLog>begin("start").where(new IterativeCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value, Context<UserLoginLog> ctx) throws Exception {
                return 1 == value.getLoginStatus();
            }
        }).times(3).within(Time.seconds(10));

        /**
         * Emit only when there are 3 consecutive failed logins within 10 s; consecutive() enforces contiguity
         */
        Pattern<UserLoginLog, UserLoginLog> wherePatternThree = Pattern.<UserLoginLog>begin("start").where(new IterativeCondition<UserLoginLog>() {
            @Override
            public boolean filter(UserLoginLog value, Context<UserLoginLog> ctx) throws Exception {
                return 1 == value.getLoginStatus();
            }
        }).times(3).consecutive().within(Time.seconds(10));


        PatternStream<UserLoginLog> patternStream = CEP.pattern(dataStream, wherePatternOne);

        PatternStream<UserLoginLog> patternStream1 = CEP.pattern(dataStream, wherePatternTwo);

        PatternStream<UserLoginLog> patternStream2 = CEP.pattern(dataStream, wherePatternThree);


        SingleOutputStreamOperator<UserLoginLog> process = patternStream.process(new MyPatternProcessFunction());
        SingleOutputStreamOperator<UserLoginLog> process1 = patternStream1.process(new MyPatternProcessFunction());
        SingleOutputStreamOperator<UserLoginLog> process2 = patternStream2.process(new MyPatternProcessFunction());

        process.print("resultOutPut");

        process1.print("resultOutPutTwo");

        process2.print("resultOutPutThree");

        try {
            env.execute();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }


}