0 前言
最近刷完了LeetCode上所有的SQL题目,总结了一下下,个人感觉,可以把数据分析面试过程中遇到的SQL查询部分的常见题型分成8个大的类别,这些类别的题目在LC上都能找到对应的题目进行练习。由于SQL题型不像算法题目那样有自己的tag,于是我就手动整理了一份EXCEL,上面有LC142道SQL题(截止到2021/01/02)的题型解释和难度说明。(手动整理,可能会有疏漏)。
如下图所示
题型整理(部分)
8个主要题型分别为:
- TopN问题
- 中位数问题
- 连续问题
- 累计(区间)计算问题
- 条件求和问题
- 行转列问题
- 递归问题
- 笛卡尔积问题
进阶题型会比普通的查询更难一点,对应的进阶技巧可以参考这篇文章:
Ocean:进击的小菜鸡之SQL进阶学习zhuanlan.zhihu.com
1 TopN
TopN问题是指,求某一字段的分组排名在前N名的记录。这里的N通常会取3/2/1不等,比如说公司不同部门的薪水前三名,不同游戏类型的最近三天的记录等等。
例题1-1
- 解法一
利用窗口函数解决TopN问题
select
t.Department, t.Employee, t.Salary
from
(
select
b.Name as Department,
a.Name as Employee,
a.Salary,
dense_rank() over (partition by DepartmentId order by Salary desc) as rk
from
Employee a join Department b on a.DepartmentId = b.Id
) t
where t.rk <= 3
- 解法二
对于这种分组内取前几名的问题,可以先group by
然后用having count()
来筛选,比如这题,找每个部门的工资前三名,那么先在子查询中用Employee
和自己做连接,连接条件是【部门相同但是工资比我高】,那么接下来按照having count(Salary) <= 2
来筛选的原理是:如果【跟我一个部门而且工资比我高的人数】不超过2
个,那么我一定是部门工资前三,这样内层查询可以查询出所有符合要求的员工ID
select
d.name as Department, e.Name as Employee, e.Salary
from
Employee e left join Department d on e.DepartmentId = d.Id
where
e.id
in
(
select e1.Id
from Employee e1 left join Employee e2
on e1.Salary < e2.Salary and e1.DepartmentId = e2.DepartmentId
group by e1.Id having count(distinct e2.Salary) <= 2
)
and
e.DepartmentId in (select Id from Department)
order by
d.Id, Salary desc
- 解法三
与解法二相似,不过是把自连接拆到了where子句上面,<=2改成了<3
select
d.Name as Department,
e.Name as Employee,
e.Salary
from
Employee e join Department d on e.DepartmentId = d.Id
where
(
select count(distinct e1.Salary)
from Employee e1
where e.Salary < e1.Salary
and e1.DepartmentId = e.DepartmentId
) < 3
order by d.Id, Salary desc
- 解法四
利用变量赋值累加,可以得到连续出现的次数
select distinct Num as ConsecutiveNums
from
(
select
Num,
case when @prev = Num then @count := @count + 1
when (@prev := Num) is not null then @count := 1
end as CNT
from
Logs, (select @prev := null,@count := null) as t
) temp
where temp.CNT >= 3
例题1-2
- 用topN的思路将结果Union起来即可
把不同字段下top1的name和title用Union All的方式组合起来
(
select name results
from (select name,rank() over(order by count(1) desc,name) rk
from Movie_Rating m join Users using(user_id) group by 1) t
where rk = 1
)
union all
(
select title results
from (select title,rank() over(order by avg(rating) desc,title) rk
from Movie_Rating m join Movies using(movie_id)
where created_at like '2020-02%' group by 1) t
where rk = 1
)
- 默认分组取第一
在MySQL中分组后可以用排序的方式取分组排名第一的值
(
select name results
from Movie_Rating m
join Users u on m.user_id = u.user_id
group by 1 order by count(1) desc,name limit 1
)
union all
(
select title results
from Movie_Rating m
join Movies mo on m.movie_id = mo.movie_id
where created_at like '2020-02%'
group by 1 order by avg(rating) desc,1 limit 1
)
2 中位数
中位数是大家都熟悉的概念,但是在SQL题目,查询某个字段值为该字段中位数的记录并不是很好求,需要根据中位数的定义特性进行求取,有时候也可以用辅助列帮忙。题目通常为:求数字的中位数、求薪资的中位数等等。
- 窗口函数
如果一笔工资的排序,卡在公司工资出现次数的和这个次数+1之间,则说明这一笔工资是中位数
with t as(
select
Id,Company,Salary,
row_number() over(partition by Company order by Salary) as rk,
count(Salary) over(partition by Company) as cnt
from
Employee
)
select
distinct Id,Company,Salary
from
t
where
rk between cnt/2.0 and cnt/2.0+1
order by
Company,Salary
- 自连接分组,子句过滤
一笔工资,如果比他大的数量大于等于总数的一半,且比他小的数量大于等于总数的一半,那么这一笔工资刚好就是工资的中位数。
select
min(e1.Id) id,e1.Company,e1.Salary
from
Employee e1 join Employee e2 using(Company)
group by
2,3
having
SUM(CASE WHEN e1.Salary >= e2.Salary THEN 1 ELSE 0 END) >= COUNT(1)/2
and
SUM(CASE WHEN e1.Salary <= e2.Salary THEN 1 ELSE 0 END) >= COUNT(1)/2
order by 2,3
例题2-2
- 解法一
利用中位数的数学特性,数字升序对其频数累计求和,数字降序对其频数累计求和,两列累计频数皆大于总频数一半的那些数字做一个平均即可。如数字1,2,3对应频数1,1,1。无论升降序频数累计皆是1,2,3,对应数字123排序为:123,321。总频数一半为1.5,满足条件的只有2,对应的数字为2,平均之后也是2。
with t as (
select
number,
(sum(frequency) over())/2 med,
sum(frequency) over(order by number) asc_cum,
sum(frequency) over(order by number desc) des_cum
from
Numbers
)
select avg(number) as median from t where asc_cum >= med and des_cum >= med
- 解法二
SELECT
AVG(NUMBER) AS MEDIAN
FROM
(
SELECT
*,
SUM(FREQUENCY) OVER(ORDER BY NUMBER) AS S1,
SUM(FREQUENCY) OVER(ORDER BY NUMBER DESC) AS S2
FROM
NUMBERS
) AS A
WHERE
S1 BETWEEN S2 - FREQUENCY AND S2 + FREQUENCY
3 连续
连续问题是指某些具有连续特性的记录,比如说连续出现的数字,连续登录的客户等等。
例题3-1
- 解法一
直接用暴力解法,三表关联,找出一样的数字即可,注意要加一个distinct
select
distinct l1.Num as ConsecutiveNums
from
Logs l1, Logs l2, Logs l3
where
l1.Id = l2.Id - 1
and
l1.Id = l3.Id - 2
and
l1.Num = l2.Num
and
l2.Num = l3.Num
- 解法二
考虑到对id进行的排序后的辅助列1,减去,用Num分组后再对id进行排序的辅助列2,会得到一个diff列,这个diff列具有一个性质,即,若Num对应的diff值相同且出现三次及以上,那么就说明这个Num是连续出现的。
select
distinct Num as ConsecutiveNums
from
(
select Num, count(*) as n
from
(
select
Id,Num,
(row_number() over(order by Id) - row_number() over(partition by Num order by Id)) as diff
from
Logs
) as a
group by
Num, diff
) as b
where
n >= 3
- 解法三
利用窗口函数求出diff后进行分组,再用having子句过滤
select distinct Num as ConsecutiveNums
from(
select Num,Id + 1 - row_number() over(partition by Num) diff
from Logs
) t
group by Num,diff
having count(1) >=3
例题3-2
窗口函数
min代表首次登陆,lead-min代表第二次登陆日期减去首次登陆日期,若为1则存在连续登录。
with t as(
select
player_id,
lead(event_date,1) over(partition by player_id order by event_date) -
min(event_date) over(partition by player_id order by event_date) as diff
from
Activity
)
select round((select count(1) from t where diff = 1)/(select count(distinct player_id) from Activity),2) fraction
- 左连接
选出首次登录日期后,将原表连接登录表,条件是存在第二天登录的日期,对满足条件的date取一个平均即可
select
round(avg(b.event_date is not null),2) fraction
from
(
select
player_id,min(event_date) dt
from
Activity
group by
player_id
) a
left join
Activity b
on
a.player_id = b.player_id and a.dt = b.event_date - 1
- 子查询
筛选出存在首次登录后第二天也登录的日期即可,注意in的用法不局限于一个字段,两个字段一样也可以用
select
round(
count(distinct player_id)/
(select count(distinct player_id) from Activity)
,2) fraction
from
Activity
where
(player_id,event_date)
in
(
select
player_id,date(min(event_date)+1)
from
Activity
group by
player_id
)
例题3-3
PS:本题难度较大
- 解法一
将不同状态的记录做联合查询,再利用subdate将不同状态的日期和序号做差,用差进行分类即可。
select type as period_state, min(date) as start_date, max(date) as end_date
from
(
select type, date, subdate(date,row_number()over(partition by type order by date)) as diff
from
(
select 'failed' as type, fail_date as date from Failed
union all
select 'succeeded' as type, success_date as date from Succeeded
) a
)a
where date between '2019-01-01' and '2019-12-31'
group by type,diff
order by start_date
作者:nei-zang-mei-liao
链接:https://leetcode-cn.com/problems/report-contiguous-dates/solution/meng-xin-jie-fa-yong-chuang-kou-han-shu-he-subdate/
来源:力扣(LeetCode)
- 解法二
除了利用时间作差,也可以用两个窗口函数,一个计算全部记录的排序,一个记录分组后的排序,两个排序作差
with t1 as(
select success_date d,'succeeded' p from Succeeded where year(success_date)=2019
union all
select fail_date d,'failed' p from Failed where year(fail_date)=2019
order by 1
)
select p period_state,min(d) start_date, max(d) end_date
from
(
select
*,
row_number() over() rk1,
rank() over(partition by p order by d) rk2
from
t1
order by
d
) t
group by 1,rk1-rk2
4 累计(区间)计算
所谓累计计算或者是区间计算,就是指在特定的区间内进行求和。常见的累计区间有:从过去到现在,或者是最近的几天内。
例题4-1
- 解法一
通过带条件的自连接,将每一行都进行分组求和,其结果就是累计求和
select
a.player_id,a.event_date,sum(b.games_played) games_played_so_far
from
activity a join activity b on a.player_id=b.player_id and a.event_date>=b.event_date
group by
1,2
order by
1,2
- 解法二
要实现累计求和,其实窗口函数就能直接达到这个效果
select
player_id,event_date,sum(games_played)
over(partition by player_id order by event_date) games_played_so_far
from
Activity
group by
1,2
例题4-2
- 解法一
用join on date diff的方式,可以将每一条记录对应的前6条记录找出,然后再计算平均值
select
a.visited_on,
sum(b.amount) as amount,
round(sum(b.amount)/7,2) as average_amount
from
(
select distinct visited_on from Customer
where visited_on >= (select date(min(visited_on)+6) from Customer)
) a
left join
Customer b
on
datediff(a.visited_on,b.visited_on) between 0 and 6
group by
a.visited_on
order by
a.visited_on
- 解法二
用窗口函数的frame子句,可以直接指定要计算的范围
select * from
(
select
visited_on,
sum(amt) over(rows between 6 preceding and current row) amount,
round(avg(amt) over(rows between 6 preceding and current row),2) average_amount
from
(select visited_on,sum(amount) amt from Customer group by visited_on) t1
) t2
where visited_on >= (select date(min(visited_on)+6) from Customer)
5 条件求和
条件求和就是说,计算和的时候需要满足特定的条件才能相加。形式常为sum if,或者是sum case when。
例题5-1
解法一
容易想到用sum中带一个if条件,将股票不同的操作状态产生的和进行相减.
select
stock_name,
sum(if(operation='Sell',price,0)) -
sum(if(operation='Buy',price,0)) capital_gain_loss
from
Stocks
group by
1
- 解法二
对解法一进行优化
select
stock_name,
sum(if(operation='Buy',-price,price)) capital_gain_loss
from
Stocks
group by
1
例题5-2
- 解法一
利用groupby找出每个用户的最小日期即为登录日期,然后用datediff找出最近90天的登录用户即可
select
login_date,count(1) user_count
from
(
select user_id,min(activity_date) login_date from Traffic
where activity = 'login' group by user_id
) t
where
datediff('2019-06-30',login_date) <= 90
group by
login_date
order by
login_date
- 解法二
窗口函数求出每个用户的首次登录日期然后做筛选,再求登录用户数量
with t as(
select
user_id,activity_date,
rank() over(partition by user_id order by activity_date) rk
from
Traffic
where
activity = 'login'
)
select
activity_date login_date,
count(distinct user_id) as user_count
from
t
where
rk = 1 and datediff('2019-06-30',activity_date) <= 90
group by
1
order by
1
例题5-3
此题巨难,涉及的知识点包括条件求和,连续和递归。
Drivers table:
+-----------+------------+
| driver_id | join_date |
+-----------+------------+
| 10 | 2019-12-10 |
| 8 | 2020-1-13 |
| 5 | 2020-2-16 |
| 7 | 2020-3-8 |
| 4 | 2020-5-17 |
| 1 | 2020-10-24 |
| 6 | 2021-1-5 |
+-----------+------------+
Rides table:
+---------+---------+--------------+
| ride_id | user_id | requested_at |
+---------+---------+--------------+
| 6 | 75 | 2019-12-9 |
| 1 | 54 | 2020-2-9 |
| 10 | 63 | 2020-3-4 |
| 19 | 39 | 2020-4-6 |
| 3 | 41 | 2020-6-3 |
| 13 | 52 | 2020-6-22 |
| 7 | 69 | 2020-7-16 |
| 17 | 70 | 2020-8-25 |
| 20 | 81 | 2020-11-2 |
| 5 | 57 | 2020-11-9 |
| 2 | 42 | 2020-12-9 |
| 11 | 68 | 2021-1-11 |
| 15 | 32 | 2021-1-17 |
| 12 | 11 | 2021-1-19 |
| 14 | 18 | 2021-1-27 |
+---------+---------+--------------+
AcceptedRides table:
+---------+-----------+---------------+---------------+
| ride_id | driver_id | ride_distance | ride_duration |
+---------+-----------+---------------+---------------+
| 10 | 10 | 63 | 38 |
| 13 | 10 | 73 | 96 |
| 7 | 8 | 100 | 28 |
| 17 | 7 | 119 | 68 |
| 20 | 1 | 121 | 92 |
| 5 | 7 | 42 | 101 |
| 2 | 4 | 6 | 38 |
| 11 | 8 | 37 | 43 |
| 15 | 8 | 108 | 82 |
| 12 | 8 | 38 | 34 |
| 14 | 1 | 90 | 74 |
+---------+-----------+---------------+---------------+
Result table:
+-------+----------------+----------------+
| month | active_drivers | accepted_rides |
+-------+----------------+----------------+
| 1 | 2 | 0 |
| 2 | 3 | 0 |
| 3 | 4 | 1 |
| 4 | 4 | 0 |
| 5 | 5 | 0 |
| 6 | 5 | 1 |
| 7 | 5 | 1 |
| 8 | 5 | 1 |
| 9 | 5 | 0 |
| 10 | 6 | 0 |
| 11 | 6 | 2 |
| 12 | 6 | 1 |
+-------+----------------+----------------+
By the end of January --> two active drivers (10, 8) and no accepted rides.
By the end of February --> three active drivers (10, 8, 5) and no accepted rides.
By the end of March --> four active drivers (10, 8, 5, 7) and one accepted ride (10).
By the end of April --> four active drivers (10, 8, 5, 7) and no accepted rides.
By the end of May --> five active drivers (10, 8, 5, 7, 4) and no accepted rides.
By the end of June --> five active drivers (10, 8, 5, 7, 4) and one accepted ride (13).
By the end of July --> five active drivers (10, 8, 5, 7, 4) and one accepted ride (7).
By the end of August --> five active drivers (10, 8, 5, 7, 4) and one accepted ride (17).
By the end of Septemeber --> five active drivers (10, 8, 5, 7, 4) and no accepted rides.
By the end of October --> six active drivers (10, 8, 5, 7, 4, 1) and no accepted rides.
By the end of November --> six active drivers (10, 8, 5, 7, 4, 1) and two accepted rides (20, 5).
By the end of December --> six active drivers (10, 8, 5, 7, 4, 1) and one accepted ride (2).
- 解答
recursive是递归表,生成每个月。
t1是活跃司机表,找出2020年每个月累积的活跃司机再加上19年的司机数量即是题目所需的司机数量。
t2是找出每个月的所有接受的行程。 最后合并两个表就行~
with
recursive t(m) as(
select 1 m
union all
select m + 1 from t where m < 12
),
t1 as(
select
t.m month,
sum(if(cnt is null,0,cnt)) over(order by t.m) +
(select count(1) from Drivers where year(join_date)<2020) active_drivers
from
t
left join
(select month(join_date) m,count(distinct driver_id) cnt
from Drivers where year(join_date)=2020 group by 1) tmp
on
t.m = tmp.m
),
t2 as(
select
m,count(a.ride_id) accepted_rides
from
t
left join
Rides r
on
t.m = month(requested_at) and year(requested_at) = 2020
left join
AcceptedRides a on a.ride_id = r.ride_id
group by
1
)
select month,active_drivers,accepted_rides from t1 join t2 on t1.month = t2.m
- 解法二
2020年的司机数量均加上2019的数量太麻烦了,可以直接把2019年的司机数量归类到2020年的1月去
with recursive mon_time as
(
select 1 as month
union all
select month + 1 from mon_time where month <= 11
)
select m.month,
sum(ifnull(d.num_driver,0)) over(order by m.month) as active_drivers,
ifnull(num_rides,0) as accepted_rides
from mon_time m
left join
(
select if(year(join_date)<'2020',1,month(join_date)) as month_driver,
count(driver_id) as num_driver
from drivers
where year(join_date)<='2020'
group by if(year(join_date)<'2020',1,month(join_date))
)d on m.month = month_driver
left join
(
select month(requested_at) as month_rides,
count(ride_id) as num_rides
from rides
where ride_id in (select ride_id from acceptedrides)
and year(requested_at) = '2020'
group by month(requested_at)
)r on m.month = month_rides
group by m.month
order by m.month
6 行转列
行转列就是指把符合某一个条件的值另起一个字段来输出。
例题6-1
- 解答
只需要把每个id在符合月份的条件下输出即可。
select
id,
max(if(month='Jan' ,revenue,null)) as 'Jan_Revenue',
max(if(month='Feb' ,revenue,null)) as 'Feb_Revenue',
max(if(month='Mar' ,revenue,null)) as 'Mar_Revenue',
max(if(month='Apr' ,revenue,null)) as 'Apr_Revenue',
max(if(month='May' ,revenue,null)) as 'May_Revenue',
max(if(month='Jun' ,revenue,null)) as 'Jun_Revenue',
max(if(month='Jul' ,revenue,null)) as 'Jul_Revenue',
max(if(month='Aug' ,revenue,null)) as 'Aug_Revenue',
max(if(month='Sep' ,revenue,null)) as 'Sep_Revenue',
max(if(month='Oct' ,revenue,null)) as 'Oct_Revenue',
max(if(month='Nov' ,revenue,null)) as 'Nov_Revenue',
max(if(month='Dec' ,revenue,null)) as 'Dec_Revenue'
from
Department
group by
id
用case when也可以达到一样的效果。
例题6-2
- 解答
先进行表连接,把每个item的所属种类找出,然后根据种类进行group by,基于逻辑函数if判断出周内每一天的销量,再求和即可。
select
item_category as CATEGORY,
sum(if(w=0,quantity,0)) as MONDAY,
sum(if(w=1,quantity,0)) as TUESDAY,
sum(if(w=2,quantity,0)) as WEDNESDAY,
sum(if(w=3,quantity,0)) as THURSDAY,
sum(if(w=4,quantity,0)) as FRIDAY,
sum(if(w=5,quantity,0)) as SATURDAY,
sum(if(w=6,quantity,0)) as SUNDAY
from
(select item_category,quantity,weekday(order_date) w
from Items i left join Orders o on i.item_id = o.item_id) t
group by
1
order by
1
7 递归
递归常用于补齐某个字段所有会出现的可能性,如1,3,5可以用递归补齐2和4。
例题7-1
- 解答
在已经145的情况下要求出2和3,可以用递归的方式生成1-5然后再用带join和条件筛选即可找出2和3.
with recursive t(n) as(
select 1
union all
select 1+n from t where n < (select max(customer_id) from Customers)
)
select
n ids
from
t left join Customers c on t.n = c.customer_id
where
c.customer_id is null
例题7-2
此题是LC的SQL里面最难的几道题之一,考点涉及递归、联合查询、条件求和。
- 解答
第一个recursive表找出0到max的所有可能的cnt,第二个临时表找出所有交易次数的次数,然后两边连接即可
with
recursive b(n) as(
select 0
union all
select n+1 from b where n <
(select max(cnt) from (select user_id,transaction_date ,count(1) cnt from Transactions group by 1,2 ) t )
),
a as(
select cnt,count(1) t
from
(
select
sum(amount>0) cnt
from
(
select user_id,visit_date transaction_date,-1 amount
from Visits
union all
select * from Transactions
) t1
group by
user_id,transaction_date
) t2
group by 1
)
select
b.n transactions_count,ifnull(a.t,0) visits_count
from
b left join a on b.n = a.cnt
8 笛卡尔积
所谓笛卡尔积就是指a,b两表连接时不做任何的join操作,然a表的每一行记录都有b表的所有记录。
例题8-1
要求写一段 SQL 语句,查询出每个学生参加每一门科目测试的次数,结果按 student_id
和 subject_name
排序。
查询结构格式如下所示:
Students table:
+------------+--------------+
| student_id | student_name |
+------------+--------------+
| 1 | Alice |
| 2 | Bob |
| 13 | John |
| 6 | Alex |
+------------+--------------+
Subjects table:
+--------------+
| subject_name |
+--------------+
| Math |
| Physics |
| Programming |
+--------------+
Examinations table:
+------------+--------------+
| student_id | subject_name |
+------------+--------------+
| 1 | Math |
| 1 | Physics |
| 1 | Programming |
| 2 | Programming |
| 1 | Physics |
| 1 | Math |
| 13 | Math |
| 13 | Programming |
| 13 | Physics |
| 2 | Math |
| 1 | Math |
+------------+--------------+
Result table:
+------------+--------------+--------------+----------------+
| student_id | student_name | subject_name | attended_exams |
+------------+--------------+--------------+----------------+
| 1 | Alice | Math | 3 |
| 1 | Alice | Physics | 2 |
| 1 | Alice | Programming | 1 |
| 2 | Bob | Math | 1 |
| 2 | Bob | Physics | 0 |
| 2 | Bob | Programming | 1 |
| 6 | Alex | Math | 0 |
| 6 | Alex | Physics | 0 |
| 6 | Alex | Programming | 0 |
| 13 | John | Math | 1 |
| 13 | John | Physics | 1 |
| 13 | John | Programming | 1 |
+------------+--------------+--------------+----------------+
结果表需包含所有学生和所有科目(即便测试次数为0):
Alice 参加了 3 次数学测试, 2 次物理测试,以及 1 次编程测试;
Bob 参加了 1 次数学测试, 1 次编程测试,没有参加物理测试;
Alex 啥测试都没参加;
John 参加了数学、物理、编程测试各 1 次。
- 解答
要计算出每一位同学对应每一个科目的考试次数,显然就需要同学对科目做笛卡尔积。
select
st.student_id,st.student_name,su.subject_name,
count(e.subject_name) attended_exams
from
Students st join Subjects su left join Examinations e
on
st.student_id = e.student_id and su.subject_name = e.subject_name
group by
1,3
order by
1,3
例题8-2
编写一段 SQL 查询每个产品每年的总销售额,并包含 product_id, product_name 以及 report_year 等信息。
销售年份的日期介于 2018 年到 2020 年之间。你返回的结果需要按 product_id 和 report_year 排序。
查询结果格式如下例所示:
Product table:
+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 1 | LC Phone |
| 2 | LC T-Shirt |
| 3 | LC Keychain |
+------------+--------------+
Sales table:
+------------+--------------+-------------+---------------------+
| product_id | period_start | period_end | average_daily_sales |
+------------+--------------+-------------+---------------------+
| 1 | 2019-01-25 | 2019-02-28 | 100 |
| 2 | 2018-12-01 | 2020-01-01 | 10 |
| 3 | 2019-12-01 | 2020-01-31 | 1 |
+------------+--------------+-------------+---------------------+
Result table:
+------------+--------------+-------------+--------------+
| product_id | product_name | report_year | total_amount |
+------------+--------------+-------------+--------------+
| 1 | LC Phone | 2019 | 3500 |
| 2 | LC T-Shirt | 2018 | 310 |
| 2 | LC T-Shirt | 2019 | 3650 |
| 2 | LC T-Shirt | 2020 | 10 |
| 3 | LC Keychain | 2019 | 31 |
| 3 | LC Keychain | 2020 | 31 |
+------------+--------------+-------------+--------------+
LC Phone 在 2019-01-25 至 2019-02-28 期间销售,该产品销售时间总计35天。销售总额 35*100 = 3500。
LC T-shirt 在 2018-12-01 至 2020-01-01 期间销售,该产品在2018年、2019年、2020年的销售时间分别是31天、365天、1天,2018年、2019年、2020年的销售总额分别是31*10=310、365*10=3650、1*10=10。
LC Keychain 在 2019-12-01 至 2020-01-31 期间销售,该产品在2019年、2020年的销售时间分别是:31天、31天,2019年、2020年的销售总额分别是31*1=31、31*1=31。
用联合查询和笛卡尔积的方式列出商品对应的所有可能年份,然后根据period字段和年份去做判断,找出商品存在的那几年,最后根据每一年的结束值减去每一年的起始值得到天数,再乘以商品售价即可。
- 根据年份数量进行判断,再用年份做差进行条件求和
with t as(
select
s.product_id,p.product_name,y,
date(concat(y,'-01','-01')) ys,
date(concat(y,'-12','-31')) ye,
period_start ps,
period_end pe,
average_daily_sales a,
count(1) over(partition by product_id) c
from
Product p join Sales s using(product_id),
(select '2018' y union all select '2019' y union all select '2020' y) t
where
y between year(period_start) and year(period_end)
group by
1,3
order by
1,3
)
select
product_id,product_name,y report_year,
case when c = 1 then datediff(pe,ps)*a+a
when y = left(ps,4) then datediff(ye,ps)*a+a
when y = left(pe,4) then datediff(pe,ys)*a+a
else 365 * a end as total_amount
from t
- 直接用if函数找出每一年的起始与结束,用datediff找出起始和结束相差的天数+1即可
select
s.PRODUCT_ID, PRODUCT_NAME, date_format(bound, '%Y') REPORT_YEAR,
(datediff(
if(bound < period_end, bound, period_end),
if(makedate(year(bound), 1) > period_start, makedate(year(bound), 1), period_start)
) + 1) * average_daily_sales TOTAL_AMOUNT
from product p join (
select '2018-12-31' bound
union all
select '2019-12-31' bound
union all
select '2020-12-31' bound
) bounds join sales s
on
p.product_id = s.product_id
and year(bound) between year(period_start) and year(period_end)
order by s.product_id, report_year