By the time I finished writing this article it was already 23:57. A late-night binge on writing, haha.

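Background

As everyone knows, the major e-commerce apps all place an icon navigation area (金刚区) on their home page, and that area counts as a "gold spot" for traffic, or even a "diamond spot".

The screenshot below shows the home page of one e-commerce app. Its navigation area consists of 10 icons. The group behind the app has many business lines but only a limited number of icon slots, so deciding which icons to keep has to rest on data.

Today I'll use this as a simulated exercise: take one metric to evaluate how an icon is being clicked, and use it to decide whether the icon stays or goes.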

[Image: home page screenshot of the e-commerce app, showing the icon navigation area]

 

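Business requirement

A user's visit to the e-commerce app can be treated as one request sid, which you can also think of as a session. Within one sid the user may click several icons, and may click the same icon more than once. The business wants to see, in real time, how many distinct request sids each icon has received.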
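Before bringing in Kafka and Flink, it may help to pin the metric down in plain Java. This is only a sketch of the definition, not the streaming job: the class name `DistinctSidPerIcon` and the `Click` type are made up for illustration, and everything is held in memory.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DistinctSidPerIcon {

    /** One click event: which request (sid) clicked which icon. Hypothetical type. */
    static final class Click {
        final String sid;
        final String icon;

        Click(String sid, String icon) {
            this.sid = sid;
            this.icon = icon;
        }
    }

    /** For each icon, counts the distinct sids that clicked it. */
    public static Map<String, Integer> count(Iterable<Click> clicks) {
        // Collect the set of sids per icon; a Set drops repeated clicks.
        Map<String, Set<String>> sidsByIcon = new LinkedHashMap<>();
        for (Click c : clicks) {
            sidsByIcon.computeIfAbsent(c.icon, k -> new HashSet<>()).add(c.sid);
        }
        // Reduce each set to its size.
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : sidsByIcon.entrySet()) {
            result.put(e.getKey(), e.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Click> clicks = Arrays.asList(
                new Click("s1", "超市"),
                new Click("s1", "超市"),   // same sid, same icon: counted once
                new Click("s1", "生鲜"),
                new Click("s2", "超市"));
        System.out.println(count(clicks)); // {超市=2, 生鲜=1}
    }
}
```

Holding every sid in a set does not scale to a live stream, which is why the job further down keeps one small piece of state per (sid, icon) instead.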

 

Simulating data with Kafka

Anyone who knows me knows my style: less talk, more work. I won't waste your time, so straight to the code.

    // Imports this method needs. JSONObject is assumed to be fastjson's
    // com.alibaba.fastjson.JSONObject; producer and topic are fields of the enclosing class.
    import java.util.Arrays;
    import java.util.List;
    import java.util.Random;
    import java.util.UUID;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public void run() {
        List<String> icons = Arrays.asList("超市", "数码电器", "服饰", "生鲜", "到家", "充值缴费", "领豆", "领券", "值钱", "plus会员");
        Random random = new Random();
        int messageNo = 0;
        try {
            while (true) {
                // Simulate one user request sid
                String sid = UUID.randomUUID().toString();
                // Simulate 1 to 3 icon clicks within the same request
                int clicks = random.nextInt(3) + 1;
                for (int i = 0; i < clicks; i++) {
                    // Simulate which icon the user clicked
                    String icon = icons.get(random.nextInt(icons.size()));
                    // Build the JSON log record
                    JSONObject jsonObject = new JSONObject();
                    jsonObject.put("sid", sid);
                    jsonObject.put("icon", icon);
                    jsonObject.put("event_time", System.currentTimeMillis());
                    String messageStr = jsonObject.toJSONString();
                    producer.send(new ProducerRecord<String, String>(topic, "Message", messageStr));
                    messageNo++;
                    Thread.sleep(6000);
                    // Print progress every 100 messages
                    if (messageNo % 100 == 0) {
                        System.out.println("Sent " + messageNo + " messages");
                    }
                    // Stop after 10,000 messages; returning here leaves both loops,
                    // and the finally block closes the producer
                    if (messageNo == 10000) {
                        return;
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            producer.close();
        }
    }
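The records above carry `event_time` as epoch milliseconds, while the job's output further down has `dt` and `hour` fields. The article doesn't show that conversion, so here is one way it could be done with `java.time`. The class name `EventTimeBuckets` is made up, and the time zone is my assumption: the article never states one, and Asia/Shanghai is used purely for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class EventTimeBuckets {
    // Assumption: the source does not name a time zone; Asia/Shanghai is illustrative.
    static final ZoneId ZONE = ZoneId.of("Asia/Shanghai");
    static final DateTimeFormatter DT = DateTimeFormatter.ofPattern("yyyyMMdd");
    static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("HH");

    /** Day bucket such as "20190913" for an epoch-millisecond timestamp. */
    public static String dt(long epochMillis) {
        return DT.format(Instant.ofEpochMilli(epochMillis).atZone(ZONE));
    }

    /** Hour-of-day bucket, zero padded, such as "23". */
    public static String hour(long epochMillis) {
        return HOUR.format(Instant.ofEpochMilli(epochMillis).atZone(ZONE));
    }

    public static void main(String[] args) {
        long eventTime = 1568390220000L; // 2019-09-13T15:57:00Z
        System.out.println(dt(eventTime) + " " + hour(eventTime)); // 20190913 23
    }
}
```

Both formatters are immutable and thread-safe, so sharing them as static fields is fine even if the conversion runs inside a parallel map function.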

 

Core Flink code

Note: the code uses state for real-time deduplication, so checkpointing is needed to persist that state while the job is running.

Checkpoint configuration

    env.enableCheckpointing(60000); // how often to checkpoint (ms)
    env.getCheckpointConfig().setCheckpointTimeout(60000); // checkpoint timeout (ms)
    env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500); // at least 500 ms between two checkpoints
    env.getCheckpointConfig().setFailOnCheckpointingErrors(false); // tolerate checkpoint failures instead of failing the task
    env.getCheckpointConfig().setMaxConcurrentCheckpoints(1); // do not trigger another checkpoint while one is in progress
    env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE); // exactly-once guarantee
    StateBackend fsStateBackend = new FsStateBackend("file:///Users/huzechen/Downloads/FinkStudy/src/main/resources");
    env.setStateBackend(fsStateBackend); // this is my local environment, so RocksDB is not configured
    env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION); // keep the externalized checkpoint when the job is cancelled
    env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);

Core deduplication code

    // Input fields, inferred from how they are used below: f0 = sid, f1 = icon, f2 = dt, f3 = hour.
    // Keying by (sid, icon) gives each pair its own state slot.
    keyBy(0, 1).process(new ProcessFunction<Tuple4<String, String, String, String>, Tuple4<String, String, String, Long>>() {
        private ValueState<Tuple2<String, String>> state;

        @Override
        public void open(Configuration parameters) throws Exception {
            // State expires one day after it was written; expired values are never returned
            StateTtlConfig ttlConfig = StateTtlConfig
                    .newBuilder(org.apache.flink.api.common.time.Time.days(1))
                    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
                    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                    .build();
            ValueStateDescriptor<Tuple2<String, String>> stateDescriptor = new ValueStateDescriptor<Tuple2<String, String>>("myState", TypeInformation.of(new TypeHint<Tuple2<String, String>>() {}));
            stateDescriptor.enableTimeToLive(ttlConfig);
            state = getRuntimeContext().getState(stateDescriptor);
        }

        @Override
        public void processElement(Tuple4<String, String, String, String> value, Context ctx, Collector<Tuple4<String, String, String, Long>> out) throws Exception {
            // Only the first click of a (sid, icon) pair is emitted, with a count of 1
            if (state.value() == null) {
                out.collect(new Tuple4<>(value.f1, value.f2, value.f3, 1L));
                state.update(new Tuple2<>(value.f0, value.f1));
            }
        }
    });
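If the interplay between "emit only when the state is empty" and the TTL isn't obvious, the following plain-Java model may help. It is a sketch of the behaviour, not Flink code: `FirstSeenFilter` is a made-up class, the map stands in for keyed state, and the clock is passed in by hand so the expiry is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

class FirstSeenFilter {
    private final long ttlMillis;
    // Stand-in for keyed state: key -> time the entry was written.
    private final Map<String, Long> writtenAt = new HashMap<>();

    FirstSeenFilter(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /**
     * Returns true when (sid, icon) has no live entry, meaning the click
     * should be counted. A repeat within the TTL returns false and does
     * not refresh the entry, because nothing is written in that case.
     */
    public boolean firstSeen(String sid, String icon, long nowMillis) {
        String key = sid + '\u0000' + icon; // separator that cannot appear in a UUID
        Long ts = writtenAt.get(key);
        boolean live = ts != null && nowMillis - ts < ttlMillis;
        if (live) {
            return false;
        }
        writtenAt.put(key, nowMillis);
        return true;
    }

    public static void main(String[] args) {
        FirstSeenFilter filter = new FirstSeenFilter(1000);
        System.out.println(filter.firstSeen("s1", "超市", 0));    // true
        System.out.println(filter.firstSeen("s1", "超市", 500));  // false
        System.out.println(filter.firstSeen("s1", "生鲜", 500));  // true
        System.out.println(filter.firstSeen("s1", "超市", 1000)); // true
    }
}
```

One consequence worth noticing: with a one-day TTL, a sid that clicks the same icon at 22:00 and again at 23:00 is counted only in the 22 bucket.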

Window aggregation and printing

    // Key by (icon, dt, hour), sum the count field over 10-second tumbling windows,
    // then format each result as JSON and print it
    .keyBy(0, 1, 2).window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
            .sum(3).map(new MapFunction<Tuple4<String, String, String, Long>, String>() {
        @Override
        public String map(Tuple4<String, String, String, Long> o) throws Exception {
            JSONObject jsonObject = new JSONObject();
            jsonObject.put("icon", o.f0);
            jsonObject.put("dt", o.f1);
            jsonObject.put("hour", o.f2);
            jsonObject.put("sid_num", o.f3);
            return jsonObject.toJSONString();
        }
    }).print();
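What the tumbling window plus `sum(3)` does can be reproduced by hand: each record lands in the window its timestamp falls into, and counts are added per key within that window. The sketch below assumes windows aligned to the epoch with no offset; `TumblingSum` and its method names are made up, and timestamps are passed in explicitly rather than read from a processing-time clock.

```java
import java.util.Map;
import java.util.TreeMap;

class TumblingSum {

    /** Start of the tumbling window (aligned to the epoch) that contains the timestamp. */
    public static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    /**
     * Sums counts per (window start, key). Record i is described by
     * times[i], keys[i] and counts[i]. Result keys look like "10000|icon".
     */
    public static Map<String, Long> sum(long[] times, String[] keys, long[] counts, long sizeMillis) {
        Map<String, Long> totals = new TreeMap<>();
        for (int i = 0; i < times.length; i++) {
            String bucket = windowStart(times[i], sizeMillis) + "|" + keys[i];
            totals.merge(bucket, counts[i], Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        long[] times = {1_000L, 9_000L, 11_000L};
        String[] keys = {"超市", "超市", "超市"};
        long[] counts = {1L, 1L, 1L};
        // Two records fall into window [0, 10000), one into [10000, 20000)
        System.out.println(sum(times, keys, counts, 10_000L)); // {0|超市=2, 10000|超市=1}
    }
}
```

Each window's total is independent of the previous one, so every printed `sid_num` covers only the records that arrived in that 10-second window.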

Sample output

    4> {"dt":"20190913","sid_num":43,"hour":"23","icon":"plus会员"}
    6> {"dt":"20190913","sid_num":4,"hour":"22","icon":"服饰"}
    5> {"dt":"20190913","sid_num":36,"hour":"23","icon":"数码电器"}
    3> {"dt":"20190913","sid_num":5,"hour":"22","icon":"领券"}
    4> {"dt":"20190913","sid_num":37,"hour":"23","icon":"领券"}
    2> {"dt":"20190913","sid_num":41,"hour":"23","icon":"生鲜"}
    8> {"dt":"20190913","sid_num":44,"hour":"23","icon":"值钱"}
    1> {"dt":"20190913","sid_num":3,"hour":"22","icon":"到家"}
    8> {"dt":"20190913","sid_num":4,"hour":"22","icon":"充值缴费"}
    2> {"dt":"20190913","sid_num":38,"hour":"23","icon":"到家"}
    4> {"dt":"20190913","sid_num":33,"hour":"23","icon":"领豆"}
    3> {"dt":"20190913","sid_num":2,"hour":"22","icon":"超市"}
    6> {"dt":"20190913","sid_num":32,"hour":"23","icon":"充值缴费"}
    2> {"dt":"20190913","sid_num":6,"hour":"22","icon":"数码电器"}
    1> {"dt":"20190913","sid_num":7,"hour":"22","icon":"值钱"}
    6> {"dt":"20190913","sid_num":37,"hour":"23","icon":"服饰"}
    3> {"dt":"20190913","sid_num":6,"hour":"22","icon":"领豆"}
    6> {"dt":"20190913","sid_num":2,"hour":"22","icon":"plus会员"}
    3> {"dt":"20190913","sid_num":37,"hour":"23","icon":"超市"}
    3> {"dt":"20190913","sid_num":6,"hour":"22","icon":"生鲜"}