题目一:

用户-视频互动表tb_user_video_log

id

uid

video_id

start_time

end_time

if_follow

if_like

if_retweet

comment_id

1

101

2001

2021-09-01 10:00:00

2021-09-01 10:00:20

0

1

1

NULL

2

105

2002

2021-09-10 11:00:00

2021-09-10 11:00:30

1

0

1

NULL

3

101

2001

2021-10-01 10:00:00

2021-10-01 10:00:20

1

1

1

NULL

4

102

2001

2021-10-01 10:00:00

2021-10-01 10:00:15

0

0

1

NULL

5

103

2001

2021-10-01 11:00:50

2021-10-01 11:01:15

1

1

0

1732526

6

106

2002

2021-10-01 10:59:05

2021-10-01 11:00:05

2

0

0

NULL

(uid-用户ID, video_id-视频ID, start_time-开始观看时间, end_time-结束观看时间, if_follow-是否关注, if_like-是否点赞, if_retweet-是否转发, comment_id-评论ID)

短视频信息表tb_video_info

id

video_id

author

tag

duration

release_time

1

2001

901

影视

30

2021-01-01 07:00:00

2

2002

901

美食

60

2021-01-01 07:00:00

3

2003

902

旅游

90

2020-01-01 07:00:00

4

2004

902

美女

90

2020-01-01 08:00:00

(video_id-视频ID, author-创作者ID, tag-类别标签, duration-视频时长, release_time-发布时间)

问题:计算2021年里每个创作者每月的涨粉率及截止当月的总粉丝量 

  • 涨粉率=(加粉量 - 掉粉量) / 播放量。结果按创作者ID、总粉丝量升序排序。
  • if_follow-是否关注为1表示用户观看视频中关注了视频创作者,为0表示此次互动前后关注状态未发生变化,为2表示本次观看过程中取消了关注。

输出示例

示例数据的输出结果如下

author

month

fans_growth_rate

total_fans

901

2021-09

0.500

1

901

2021-10

0.250

2

解释:

示例数据中表tb_user_video_log里只有视频2001和2002的播放记录,都来自创作者901,播放时间在2021年9月和10月;其中9月里加粉量为1,掉粉量为0,播放量为2,因此涨粉率为0.500(保留3位小数);其中10月里加粉量为2,掉份量为1,播放量为4,因此涨粉率为0.250,截止当前总粉丝数为2。

 思路:

第一步:计算每人每月的粉丝量,播放量(按照人和月份分组)

粉丝量:1对应的减去-1对应的粉丝和

播放量:count(*)或者count(1),

涨粉率=粉丝量/播放量

第二步:对每月的粉丝量累计sum得到截至粉丝量(计算每个人的粉丝量,按照月累计即可所以partition by人order by月)

select author,month,round(fans_growth_number/counts,3) as fans_growth_rate,
sum(fans_growth_number) over (partition by author order by month) as total_fans
from(select author,DATE_FORMAT(start_time,'%Y-%m') as month,
    sum(case when if_follow = 1 then 1
    when if_follow = 2 then -1
    else 0 end) as fans_growth_number,
     count(*) as counts
    from tb_user_video_log as  t1
    inner join tb_video_info as  t2
    on t1.video_id = t2.video_id
    where year(start_time)=2021
    group by author,month) t
order by author,total_fans

题目二:

现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):

id

uid

exam_id

start_time

submit_time

score

1

1001

9001

2020-01-01 09:01:01

2020-01-01 09:21:59

90

2

1002

9001

2020-01-20 10:01:01

2020-01-20 10:10:01

89

3

1002

9001

2020-02-01 12:11:01

2020-02-01 12:31:01

83

4

1003

9001

2020-03-01 19:01:01

2020-03-01 19:30:01

75

5

1004

9001

2020-03-01 12:01:01

2020-03-01 12:11:01

60

6

1003

9001

2020-03-01 12:01:01

2020-03-01 12:41:01

90

7

1002

9001

2020-05-02 19:01:01

2020-05-02 19:32:00

90

8

1001

9002

2020-01-02 19:01:01

2020-01-02 19:59:01

69

9

1004

9002

2020-02-02 12:01:01

2020-02-02 12:20:01

99

10

1003

9002

2020-02-02 12:01:01

2020-02-02 12:31:01

68

11

1001

9002

2020-02-02 12:01:01

2020-02-02 12:43:01

81

12

1001

9002

2020-03-02 12:11:01

(NULL)

(NULL)

请输出每份试卷每月作答数和截止当月的作答总数。
由示例数据结果输出如下:

exam_id

start_month

month_cnt

cum_exam_cnt

9001

202001

2

2

9001

202002

1

3

9001

202003

3

6

9001

202005

1

7

9002

202001

1

1

9002

202002

3

4

9002

202003

1

5

解释:试卷9001在202001、202002、202003、202005共4个月有被作答记录,每个月被作答数分别为2、1、3、1,截止当月累积作答总数为2、3、6、7。

思路:

select exam_id,
    start_month,
    month_cnt,
    sum(month_cnt) over(partition by exam_id order by start_month) as cum_exam_cnt
from (
    select
        exam_id,
        date_format(start_time,'%Y%m') as start_month,
        count(*) as month_cnt
    from exam_record
    group by exam_id,date_format(start_time,'%Y%m')
)t

步骤:

1.根据试卷和日期进行分组,并且进行计数

2.对计数后的进行求和,按月累计就是order by 月,每份试卷就是partition by 试卷

题目三:

现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分):

id

uid

exam_id

start_time

submit_time

score

1

1001

9001

2020-01-01 09:01:01

2020-01-01 09:21:59

90

2

1002

9001

2020-01-20 10:01:01

2020-01-20 10:10:01

89

3

1002

9001

2020-02-01 12:11:01

2020-02-01 12:31:01

83

4

1003

9001

2020-03-01 19:01:01

2020-03-01 19:30:01

75

5

1004

9001

2020-03-01 12:01:01

2020-03-01 12:11:01

60

6

1003

9001

2020-03-01 12:01:01

2020-03-01 12:41:01

90

7

1002

9001

2020-05-02 19:01:01

2020-05-02 19:32:00

90

8

1001

9002

2020-01-02 19:01:01

2020-01-02 19:59:01

69

9

1004

9002

2020-02-02 12:01:01

2020-02-02 12:20:01

99

10

1003

9002

2020-02-02 12:01:01

2020-02-02 12:31:01

68

11

1001

9002

2020-01-02 19:01:01

2020-02-02 12:43:01

81

12

1001

9002

2020-03-02 12:11:01

(NULL)

(NULL)

请输出自从有用户作答记录以来,每月的试卷作答记录中月活用户数、新增用户数、截止当月的单月最大新增用户数、截止当月的累积用户数。结果按月份升序输出。
由示例数据结果输出如下:

start_month

mau

month_add_uv

max_month_add_uv

cum_sum_uv

202001

2

2

2

2

202002

4

2

2

4

202003

3

0

2

4

202005

1

0

2

4

month

1001

1002

1003

1004

202001

1

1

 

 

202002

1

1

1

1

202003

1

 

1

1

202005

 

1

 

 

由上述矩阵可以看出,2020年1月有2个用户活跃(mau=2),当月新增用户数为2;

2020年2月有4个用户活跃,当月新增用户数为2,最大单月新增用户数为2,当前累积用户数为4。

思路1:

第一步选出活跃用户:

select date_format(start_time,'%Y%m') as month,count(distinct uid) as 月活用户数
from exam_record
group by month

java stream 计算各个项的总和 java累计求和_数据

第二步选出新增用户:

select date_format(start_time,'%Y%m') dt,count(distinct uid) as 新增 from exam_record
    where (date_format(start_time,'%Y%m'),uid) in
    (
    select min(date_format(start_time,'%Y%m')) as dt,uid
    from exam_record
    group by uid
    )
 group by dt

java stream 计算各个项的总和 java累计求和_ide_02

第三步:使用月份左连接

select * from
(
select date_format(start_time,'%Y%m') as month,count(distinct uid) as 月活用户数
from exam_record
group by month
) a
left join
(select date_format(start_time,'%Y%m') dt,count(distinct uid) as 新增 from exam_record
    where (date_format(start_time,'%Y%m'),uid) in
    (
    select min(date_format(start_time,'%Y%m')) as dt,uid
    from exam_record
    group by uid
    )
 group by dt
) b
on a.month = b.dt
 
第四步:计算最大和累计
select a.month,a.月活用户数,ifnull(b.新增,0),
max(ifnull(b.新增,0)) over(order by a.month),
sum(ifnull(b.新增,0)) over (order by a.month)
from
(
select date_format(start_time,'%Y%m') as month,count(distinct uid) as 月活用户数
from exam_record
group by month
) a
left join
(select date_format(start_time,'%Y%m') dt,count(distinct uid) as 新增 from exam_record
    where (date_format(start_time,'%Y%m'),uid) in
    (
    select min(date_format(start_time,'%Y%m')) as dt,uid
    from exam_record
    group by uid
    )
 group by dt
) b
on a.month = b.dt

 

java stream 计算各个项的总和 java累计求和_数据_03

思路二:

1.首先用窗口函数构造辅助列new_day(当月登录是否为最早登陆月):

 

java stream 计算各个项的总和 java累计求和_升序_04

2.计算每个月的new-day就是新增数,对新增计算max,sum分别为最大和累计

SELECT start_month ,#每个月     
    COUNT(DISTINCT uid) mau, #月活用户数   
    count(new_day) month_add_uv, #新增用户
    MAX(count(new_day))OVER(ORDER BY start_month) max_month_add_uv, #截止当月的单月最大新增用户数
    SUM(count(new_day))OVER(ORDER BY start_month) cum_sum_uv
FROM (
    SELECT *,DATE_FORMAT(start_time,'%Y%m') start_month,
    IF(start_time=MIN(start_time)OVER(PARTITION BY uid),1,null) new_day
    FROM exam_record
    )t1
GROUP BY start_month;

 

java stream 计算各个项的总和 java累计求和_数据_05

思路三:

1.对每位用户的月份进行排序,排名为1的月份就是新增月份

select
      uid, date_format(start_time, '%Y%m') as start_month
    , row_number() over(partition by uid order by start_time) as rn
    from exam_record

注意:这里使用row_number,dense_rank,rank均可

 

java stream 计算各个项的总和 java累计求和_ide_06

2.对每个月的1计数得到的就是每个月的新增

select start_month,
count(distinct uid) as mau
, count(if(rn=1, uid, null)) as month_add_uv
from (
    select
      uid, date_format(start_time, '%Y%m') as start_month
    , row_number() over(partition by uid order by start_time) as rn
    from exam_record
) t
group by start_month

 

java stream 计算各个项的总和 java累计求和_ide_07

3.然后加入max,sum

select
  start_month
, count(distinct uid) as mau
, count(if(rn=1, uid, null)) as month_add_uv
, max(count(if(rn=1, uid, null))) over(order by start_month) as max_month_add_uv
, sum(count(if(rn=1, uid, null))) over(order by start_month) as cum_sum_uv
from (
    select
      uid, date_format(start_time, '%Y%m') as start_month
    , row_number() over(partition by uid order by start_time) as rn
    from exam_record
) t
group by start_month