窗口函数之聚合类应用
前面我们已经介绍过窗口函数的基本知识点。接下来我们分享一下具体的一些应用。本次重点分享聚合型窗口函数应用。函数名([expr]) over(partition by <要分列的组> order by <要排序的列> rows between < 数据范围 >)
聚合型窗口函数应用
注意:avg()均值函数,sum()求和函数,max()最大值函数,min()最小值函数,count()计数函数等窗口函数的使用有特殊要求。
- over中可以写分组、排序,但是分组(按组计算)只能用partition by,排序(逐步计算)使用的是order by。
- 对单个字段去重用:distinct,他的执行在窗口函数执行之后执行,
- 对多字段合并的去重:需要再建立一次新的查询再使用group by。group by在窗口函数之前先执行。
SELECT
count(vip_num) over() as `记录次数`,
sum(vip_num) over() as `会员数`,
sum(rank_score) over() as `业绩`
FROM hdp_defaultdb.shishou_xx
WHERE partition_dt = '20221029'
AND substr(order_date,1,10) >= '2022-10-01'
注释:代码查询了where条件下该表的记录次数,会员数,业绩,没有进行分组与排序,直接聚合
SELECT
DISTINCT
city as `城市`,
count(vip_num) over(partition by city) as `记录次数`,
sum(vip_num) over(partition by city) as `会员数`,
sum(rank_score) over(partition by city) as `业绩`
FROM hdp_defaultdb.shishou_xx
WHERE partition_dt= '20221020'
AND substr(order_date,1,10) >= '2022-10-01'
查询结果:
注释:关键使用了distinct,原因如果不使用distinct去重,窗口函数当order by和窗口从句都缺失,窗口规范默认为:rows between unbounded preceding and nubounded following,即全表的每行记录一起分组计算后,每个行结果都会输出结果,案例中武汉就会输出72行相同的结果。窗口函数用了city字段进行分组,如果要对多个字段进行分组,用英文状态逗号隔开即可(如sum(vip_num) over(partition by city,nane))
SELECT
DISTINCT
city as `城市`,
substr(order_date,1,10) as `日期`,
count(vip_num) over(partition by city order by substr(order_date,1,10)) as `记录次数`,
sum(vip_num) over(partition by city order by substr(order_date,1,10)) as `会员数`,
sum(rank_score) over(partition by city order by substr(order_date,1,10)) as `业绩`
FROM hdp_defaultdb.shishou_xx
WHERE partition_dt= '20221029'
AND substr(order_date,1,10) >= '2022-10-01'
查询结果:得到了按照城市分组后,再按照日期进行累加的结果
注释:当order by 后面缺少窗口从句条件,窗口规范默认为:rows between unbounded preceding and current row,即从起始行到当前行的的分组聚合。
SELECT
DISTINCT
city as `城市`,
substr(order_date,1,10) as `日期`,
count(vip_num) over(partition by city,substr(order_date,1,10) ) as `每日记录次数`,
sum(vip_num) over(partition by city,substr(order_date,1,10)) as `每日会员数`,
sum(vip_num) over(partition by city order by substr(order_date,1,10)) as `累计会员`
FROM hdp_defaultdb.shishou_xx
WHERE partition_dt= '20221029'
AND substr(order_date,1,10) >= '2022-10-27'
注释:按照城市、日期分组,计算每日记录数,每日会员数,累计会员数。
聚合型窗口函数的使用案例
数据源
-- 数据字段说明:用户名(username),购买日期(buydate),购买价格(buycost)
张飞,2019-01-01,10
关羽,2019-01-02,15
张飞,2019-02-03,23
关羽,2019-01-04,29
张飞,2019-01-05,46
张飞,2019-04-06,42
关羽,2019-01-07,50
张飞,2019-01-08,55
刘备,2019-04-08,62
刘备,2019-04-09,68
诸葛亮,2019-05-10,12
刘备,2019-04-11,75
诸葛亮,2019-06-12,80
刘备,2019-04-13,94
查询应用
-- 查询购买明细,及合计总购买次数、总购买金额。
select username,buydate,buycost, -- 购买明细
count(*) over() c, --合计购买次数
sum(buycost) over() s -- 合计购买金额
from demo01;
-- 查询购买明细,及每个用户的合计购买金额。
select username,buydate,buycost, -- 购买明细
sum(buycost) over(partition by username) s -- 每个用户的合计购买金额
from demo01;
-- 查询购买明细,及每个用户每个月购买总额。
select username,buydate,buycost, -- 购买明细
sum(buycost) over(partition by username,month(buydate)) s -- 每个用户每个月购买总额
from demo01;
-- 查询购买明细,及按时间升序的累计的购买金额and累计购买次数。
select username,buydate,buycost, -- 购买明细
sum(buycost) over(order by buydate asc) s, -- 累计的购买金额
count(*) over(order by buydate asc) c -- 累计购买次数
from demo01;
-- 查询按月累计的购买金额及累计购买次数。
select distinct substr(buydate,1,7) t,
sum(buycost) over(order by substr(buydate,1,7) asc) s, -- 按月累计的购买金额
count(*) over(order by substr(buydate,1,7) asc) c -- 按月累计购买次数
from demo01;
-- 统计每个用户每月的购买总额,和截止到当月的累计购买总额。
select a.username,a.d,max(a.v) v,max(a.sv) sv from
(select username,substr(buydate,1,7) d,
sum(buycost) over(partition by username,month(buydate)) v,
sum(buycost) over(partition by username order by month(buydate) asc) sv
from demo01 ) a
group by a.username,a.d;
-- 查询购买明细,及每个用户按月累计的购买金额and按月累计的购买次数。
select username,buydate,buycost, -- 购买明细
sum(buycost) over(partition by username order by month(buydate) asc) s, -- 每个用户按月累计的购买金额
count(*) over(partition by username order by month(buydate) asc) c --按月累计的购买次数
from demo01;
-- 查询在2019年4月份,购买过产品的顾客及每个顾客购买次数。
select distinct username,
count(*) over(partition by username) as c
from demo01 where substr(buydate,1,7)="2019-04";
-- 统计每个用户的总消费额、消费次数、平均消费金额,单次最高消费金额和最低消费金额。
select distinct username,
sum(buycost) over (partition by username),
count(buydate) over (partition by username),
round(avg(buycost) over (partition by username),2),
max(buycost) over (partition by username),
min(buycost) over (partition by username)
from demo01;