窗口函数之聚合类应用

前面我们已经介绍过窗口函数的基本知识点。接下来我们分享一下具体的一些应用。本次重点分享聚合型窗口函数应用。函数名([expr]) over(partition by <要分列的组> order by <要排序的列> rows between < 数据范围 >)

聚合型窗口函数应用

注意:avg()均值函数,sum()求和函数,max()最大值函数,min()最小值函数,count()计数函数等窗口函数的使用有特殊要求。

  • over中可以写分组、排序,但是分组(按组计算)只能用partition by,排序(逐步计算)使用的是order by。
  • 对单个字段去重用:distinct,他的执行在窗口函数执行之后执行,
  • 对多字段合并的去重:需要再建立一次新的查询再使用group by。group by在窗口函数之前先执行。
SELECT 
  count(vip_num) over()  as `记录次数`,
  sum(vip_num) over() as `会员数`,
  sum(rank_score) over() as `业绩`
 FROM hdp_defaultdb.shishou_xx
 WHERE partition_dt = '20221029'
 AND substr(order_date,1,10) >= '2022-10-01'

注释:代码查询了where条件下该表的记录次数,会员数,业绩,没有进行分组与排序,直接聚合

SELECT 
  DISTINCT
  city as `城市`,
  count(vip_num) over(partition by city)  as `记录次数`,
  sum(vip_num) over(partition by city) as `会员数`,
  sum(rank_score) over(partition by city) as `业绩`
 FROM hdp_defaultdb.shishou_xx
 WHERE partition_dt= '20221020'
 AND substr(order_date,1,10) >= '2022-10-01'

查询结果:

窗口函数之聚合型窗口函数的应用_窗口函数

注释:关键使用了distinct,原因如果不使用distinct去重,窗口函数当order by和窗口从句都缺失,窗口规范默认为:rows between unbounded preceding and  nubounded following,即全表的每行记录一起分组计算后,每个行结果都会输出结果,案例中武汉就会输出72行相同的结果。窗口函数用了city字段进行分组,如果要对多个字段进行分组,用英文状态逗号隔开即可(如sum(vip_num) over(partition by city,nane))

SELECT 
  DISTINCT
  city as `城市`,
  substr(order_date,1,10) as `日期`,
  count(vip_num) over(partition by city order by substr(order_date,1,10))  as `记录次数`,
  sum(vip_num) over(partition by city order by substr(order_date,1,10)) as `会员数`,
  sum(rank_score) over(partition by city order by substr(order_date,1,10)) as `业绩`
 FROM hdp_defaultdb.shishou_xx
 WHERE partition_dt= '20221029'
 AND substr(order_date,1,10) >= '2022-10-01'

查询结果:得到了按照城市分组后,再按照日期进行累加的结果

窗口函数之聚合型窗口函数的应用_字段_02

注释:当order by 后面缺少窗口从句条件,窗口规范默认为:rows between unbounded preceding and current row,即从起始行到当前行的的分组聚合。

SELECT 
  DISTINCT
  city as `城市`,
  substr(order_date,1,10) as `日期`,
  count(vip_num) over(partition by city,substr(order_date,1,10) )  as `每日记录次数`,
  sum(vip_num) over(partition by city,substr(order_date,1,10)) as `每日会员数`,
  sum(vip_num) over(partition by city order by substr(order_date,1,10)) as `累计会员`
 FROM hdp_defaultdb.shishou_xx
 WHERE partition_dt= '20221029'
 AND substr(order_date,1,10) >= '2022-10-27'

窗口函数之聚合型窗口函数的应用_字段_03

注释:按照城市、日期分组,计算每日记录数,每日会员数,累计会员数。

聚合型窗口函数的使用案例

数据源

-- 数据字段说明:用户名(username),购买日期(buydate),购买价格(buycost)

张飞,2019-01-01,10
关羽,2019-01-02,15
张飞,2019-02-03,23
关羽,2019-01-04,29
张飞,2019-01-05,46
张飞,2019-04-06,42
关羽,2019-01-07,50
张飞,2019-01-08,55
刘备,2019-04-08,62
刘备,2019-04-09,68
诸葛亮,2019-05-10,12
刘备,2019-04-11,75
诸葛亮,2019-06-12,80
刘备,2019-04-13,94

查询应用

-- 查询购买明细,及合计总购买次数、总购买金额。
select username,buydate,buycost, -- 购买明细
 count(*) over() c, --合计购买次数
 sum(buycost) over() s -- 合计购买金额 
 from demo01;
 
-- 查询购买明细,及每个用户的合计购买金额。
select username,buydate,buycost, -- 购买明细 
 sum(buycost) over(partition by username) s -- 每个用户的合计购买金额
 from demo01;
 
-- 查询购买明细,及每个用户每个月购买总额。
select username,buydate,buycost, -- 购买明细 
 sum(buycost) over(partition by username,month(buydate)) s -- 每个用户每个月购买总额
 from demo01;
 
-- 查询购买明细,及按时间升序的累计的购买金额and累计购买次数。
select username,buydate,buycost, -- 购买明细 
 sum(buycost) over(order by buydate asc) s, -- 累计的购买金额
 count(*) over(order by buydate asc) c -- 累计购买次数 
 from demo01;
 
-- 查询按月累计的购买金额及累计购买次数。
select distinct substr(buydate,1,7) t,
 sum(buycost) over(order by substr(buydate,1,7) asc) s, -- 按月累计的购买金额
 count(*) over(order by substr(buydate,1,7) asc) c --  按月累计购买次数 
 from demo01; 
 
-- 统计每个用户每月的购买总额,和截止到当月的累计购买总额。
select a.username,a.d,max(a.v) v,max(a.sv) sv from
(select username,substr(buydate,1,7) d,
 sum(buycost) over(partition by username,month(buydate)) v,
 sum(buycost) over(partition by username order by month(buydate) asc) sv
 from demo01 ) a 
group by a.username,a.d;
 
-- 查询购买明细,及每个用户按月累计的购买金额and按月累计的购买次数。
select username,buydate,buycost, -- 购买明细 
 sum(buycost) over(partition by username order by month(buydate) asc) s, -- 每个用户按月累计的购买金额 
 count(*) over(partition by username order by month(buydate) asc) c --按月累计的购买次数
  from demo01;
 
-- 查询在2019年4月份,购买过产品的顾客及每个顾客购买次数。
select distinct username,
 count(*) over(partition by username) as c 
 from demo01 where substr(buydate,1,7)="2019-04";
 
-- 统计每个用户的总消费额、消费次数、平均消费金额,单次最高消费金额和最低消费金额。
select distinct username,
 sum(buycost) over (partition by username),
 count(buydate) over (partition by username),
 round(avg(buycost) over (partition by username),2),
 max(buycost) over (partition by username),
 min(buycost) over (partition by username)
 from demo01;