SQL server sum求和小数点后面的数学不见了 sql求和的列很多_SQL


0 前言

最近刷完了LeetCode上所有的SQL题目,总结了一下下,个人感觉,可以把数据分析面试过程中遇到的SQL查询部分的常见题型分成8个大的类别,这些类别的题目在LC上都能找到对应的题目进行练习。由于SQL题型不像算法题目那样有自己的tag,于是我就手动整理了一份EXCEL,上面有LC142道SQL题(截止到2021/01/02)的题型解释和难度说明。(手动整理,可能会有疏漏)。

如下图所示


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_窗口函数_02

题型整理(部分)


8个主要题型分别为:

  1. TopN问题
  2. 中位数问题
  3. 连续问题
  4. 累计(区间)计算问题
  5. 条件求和问题
  6. 行转列问题
  7. 递归问题
  8. 笛卡尔积问题

进阶题型会比普通的查询更难一点,对应的进阶技巧可以参考这篇文章:

Ocean:进击的小菜鸡之SQL进阶学习zhuanlan.zhihu.com


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_中位数_03


1 TopN

TopN问题是指,求某一字段的分组排名在前N名的记录。这里的N通常会取3/2/1不等,比如说公司不同部门的薪水前三名,不同游戏类型的最近三天的记录等等。

例题1-1


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_SQL_04


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_中位数_05


  • 解法一

利用窗口函数解决TopN问题


select 
    t.Department, t.Employee, t.Salary
from 
(
    select 
        b.Name as Department,
        a.Name as Employee,
        a.Salary, 
        dense_rank() over (partition by DepartmentId order by Salary desc) as rk 
    from 
        Employee a join Department b on a.DepartmentId = b.Id 
) t
where t.rk <= 3


  • 解法二

对于这种分组内取前几名的问题,可以先group by然后用having count()来筛选,比如这题,找每个部门的工资前三名,那么先在子查询中用Employee和自己做连接,连接条件是【部门相同但是工资比我高】,那么接下来按照having count(Salary) <= 2来筛选的原理是:如果【跟我一个部门而且工资比我高的人数】不超过2个,那么我一定是部门工资前三,这样内层查询可以查询出所有符合要求的员工ID


select 
    d.name as Department, e.Name as Employee, e.Salary
from 
    Employee e left join Department d on e.DepartmentId = d.Id 
where 
    e.id 
in 
(
    select e1.Id 
    from Employee e1 left join Employee e2
    on e1.Salary < e2.Salary and e1.DepartmentId = e2.DepartmentId
    group by e1.Id having count(distinct e2.Salary) <= 2
)
and 
    e.DepartmentId in (select Id from Department)
order by
    d.Id, Salary desc


  • 解法三

与解法二相似,不过是把自连接拆到了where子句上面,<=2改成了<3


select 
    d.Name as Department,
    e.Name as Employee,
    e.Salary
from
    Employee e join Department d on e.DepartmentId = d.Id 
where  
(
    select count(distinct e1.Salary)
    from Employee e1 
    where e.Salary < e1.Salary
    and e1.DepartmentId = e.DepartmentId
) < 3
order by d.Id, Salary desc


  • 解法四

利用变量赋值累加,可以得到连续出现的次数


select distinct Num as ConsecutiveNums
from 
(
      select 
         Num, 
         case when @prev = Num then @count := @count + 1
              when (@prev := Num) is not null then @count := 1
              end as CNT
      from 
         Logs, (select @prev := null,@count := null) as t
) temp
where temp.CNT >= 3


例题1-2


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_sql sum求和 值小于不显示_06


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_中位数_07


  • 用topN的思路将结果Union起来即可

把不同字段下top1的name和title用Union All的方式组合起来


(
    select name results
    from (select name,rank() over(order by count(1) desc,name) rk
          from Movie_Rating m join Users using(user_id) group by 1) t
    where rk = 1
)
union all
(
    select title results
    from (select title,rank() over(order by avg(rating) desc,title) rk
          from Movie_Rating m join Movies using(movie_id)
          where created_at like '2020-02%' group by 1) t
    where rk = 1
)


  • 默认分组取第一

在MySQL中分组后可以用排序的方式取分组排名第一的值


(
    select name results 
    from Movie_Rating m
    join Users u on m.user_id = u.user_id
    group by 1 order by count(1) desc,name limit 1
)
union all
(
    select title results 
    from Movie_Rating m
    join Movies mo on m.movie_id = mo.movie_id
    where created_at like '2020-02%'
    group by 1 order by avg(rating) desc,1 limit 1
)


2 中位数

中位数是大家都熟悉的概念,但是在SQL题目,查询某个字段值为该字段中位数的记录并不是很好求,需要根据中位数的定义特性进行求取,有时候也可以用辅助列帮忙。题目通常为:求数字的中位数、求薪资的中位数等等。


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_SQL_08


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_sql sum求和 值小于不显示_09


  • 窗口函数

如果一笔工资的排序,卡在公司工资出现次数的和这个次数+1之间,则说明这一笔工资是中位数


with t as(
    select
        Id,Company,Salary,
        row_number() over(partition by Company order by Salary) as rk,
        count(Salary) over(partition by Company) as cnt
    from
        Employee
)
select
    distinct Id,Company,Salary
from
    t
where
    rk between cnt/2.0 and cnt/2.0+1
order by
    Company,Salary


  • 自连接分组,子句过滤

一笔工资,如果比他大的数量大于等于总数的一半,且比他小的数量大于等于总数的一半,那么这一笔工资刚好就是工资的中位数。


select  
    min(e1.Id) id,e1.Company,e1.Salary
from 
    Employee e1 join Employee e2 using(Company)
group by 
    2,3
having 
    SUM(CASE WHEN e1.Salary >= e2.Salary THEN 1 ELSE 0 END) >= COUNT(1)/2 
and 
    SUM(CASE WHEN e1.Salary <= e2.Salary THEN 1 ELSE 0 END) >= COUNT(1)/2
order by 2,3


例题2-2


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_SQL_10


  • 解法一

利用中位数的数学特性,数字升序对其频数累计求和,数字降序对其频数累计求和,两列累计频数皆大于总频数一半的那些数字做一个平均即可。如数字1,2,3对应频数1,1,1。无论升降序频数累计皆是1,2,3,对应数字123排序为:123,321。总频数一半为1.5,满足条件的只有2,对应的数字为2,平均之后也是2。


with t as (
    select 
            number,
            (sum(frequency) over())/2 med,
            sum(frequency) over(order by number) asc_cum,
            sum(frequency) over(order by number desc) des_cum
    from 
            Numbers
)
select avg(number) as median from t where asc_cum >= med and des_cum >= med


  • 解法二
SELECT 
    AVG(NUMBER) AS MEDIAN
FROM 
    (
        SELECT 
            *, 
            SUM(FREQUENCY) OVER(ORDER BY NUMBER) AS S1, 
            SUM(FREQUENCY) OVER(ORDER BY NUMBER DESC) AS S2
        FROM 
            NUMBERS
    ) AS A
WHERE 
    S1 BETWEEN S2 - FREQUENCY AND S2 + FREQUENCY


3 连续

连续问题是指某些具有连续特性的记录,比如说连续出现的数字,连续登录的客户等等。

例题3-1


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_sql sum求和 值小于不显示_11


  • 解法一

直接用暴力解法,三表关联,找出一样的数字即可,注意要加一个distinct


select
    distinct l1.Num as ConsecutiveNums
from
    Logs l1, Logs l2, Logs l3
where
    l1.Id = l2.Id - 1
and
    l1.Id = l3.Id - 2
and
    l1.Num = l2.Num
and
    l2.Num = l3.Num


  • 解法二

考虑到对id进行的排序后的辅助列1,减去,用Num分组后再对id进行排序的辅助列2,会得到一个diff列,这个diff列具有一个性质,即,若Num对应的diff值相同且出现三次及以上,那么就说明这个Num是连续出现的。


select
    distinct Num as ConsecutiveNums
from
    (
        select Num, count(*) as n
        from
            (
                select 
                    Id,Num,
                    (row_number() over(order by Id) - row_number() over(partition by Num order by Id)) as diff
                from
                    Logs
            ) as a
        group by
            Num, diff
    ) as b
where
    n >= 3


  • 解法三

利用窗口函数求出diff后进行分组,再用having子句过滤


select distinct Num as ConsecutiveNums
from(
    select Num,Id + 1 - row_number() over(partition by Num) diff
    from Logs
) t
group by Num,diff
having count(1) >=3


例题3-2


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_中位数_12


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_SQL_13


窗口函数

min代表首次登陆,lead-min代表第二次登陆日期减去首次登陆日期,若为1则存在连续登录。


with t as(
select
    player_id,
    lead(event_date,1) over(partition by player_id order by event_date) -
    min(event_date) over(partition by player_id order by event_date) as diff
from
    Activity
)
select round((select count(1) from t where diff = 1)/(select count(distinct player_id) from Activity),2) fraction


  • 左连接

选出首次登录日期后,将原表连接登录表,条件是存在第二天登录的日期,对满足条件的date取一个平均即可


select
    round(avg(b.event_date is not null),2) fraction
from
    (
        select
            player_id,min(event_date) dt
        from
            Activity
        group by
            player_id
    ) a
left join
    Activity b
on
    a.player_id = b.player_id and a.dt = b.event_date - 1


  • 子查询

筛选出存在首次登录后第二天也登录的日期即可,注意in的用法不局限于一个字段,两个字段一样也可以用


select
    round(
        count(distinct player_id)/
        (select count(distinct player_id) from Activity)
        ,2) fraction
from
    Activity
where
    (player_id,event_date) 
in
    (
        select
            player_id,date(min(event_date)+1)
        from
            Activity
        group by
            player_id
    )


例题3-3

PS:本题难度较大


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_中位数_14


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_sql sum求和 值小于不显示_15


  • 解法一

将不同状态的记录做联合查询,再利用subdate将不同状态的日期和序号做差,用差进行分类即可。


select type as period_state, min(date) as start_date, max(date) as end_date
from
(
    select type, date, subdate(date,row_number()over(partition by type order by date)) as diff
    from
    (
        select 'failed' as type, fail_date as date from Failed
        union all
        select 'succeeded' as type, success_date as date from Succeeded
    ) a
)a
where date between '2019-01-01' and '2019-12-31'
group by type,diff
order by start_date

作者:nei-zang-mei-liao
链接:https://leetcode-cn.com/problems/report-contiguous-dates/solution/meng-xin-jie-fa-yong-chuang-kou-han-shu-he-subdate/
来源:力扣(LeetCode)


  • 解法二

除了利用时间作差,也可以用两个窗口函数,一个计算全部记录的排序,一个记录分组后的排序,两个排序作差


with t1 as(
    select success_date d,'succeeded' p from Succeeded where year(success_date)=2019
    union all
    select fail_date d,'failed' p from Failed where year(fail_date)=2019
    order by 1
)
select p period_state,min(d) start_date, max(d) end_date
from
    (    
        select
            *,
            row_number() over() rk1,
            rank() over(partition by p order by d) rk2
        from
            t1
        order by
            d
    ) t 
group by 1,rk1-rk2


4 累计(区间)计算

所谓累计计算或者是区间计算,就是指在特定的区间内进行求和。常见的累计区间有:从过去到现在,或者是最近的几天内。

例题4-1


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_sql sum求和 值小于不显示_16


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_中位数_17


  • 解法一

通过带条件的自连接,将每一行都进行分组求和,其结果就是累计求和


select 
    a.player_id,a.event_date,sum(b.games_played) games_played_so_far
from
    activity a join activity b on a.player_id=b.player_id and a.event_date>=b.event_date
group by 
    1,2
order by 
    1,2


  • 解法二

要实现累计求和,其实窗口函数就能直接达到这个效果


select
    player_id,event_date,sum(games_played) 
    over(partition by player_id order by event_date) games_played_so_far 
from
    Activity
group by
    1,2


例题4-2


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_sql sum求和 值小于不显示_18


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_sql sum求和 值小于不显示_19


  • 解法一

用join on date diff的方式,可以将每一条记录对应的前6条记录找出,然后再计算平均值


select
    a.visited_on,
    sum(b.amount) as amount,
    round(sum(b.amount)/7,2) as average_amount
from
    (
        select distinct visited_on from Customer 
        where visited_on >= (select date(min(visited_on)+6) from Customer)
    ) a
left join
    Customer b
on
    datediff(a.visited_on,b.visited_on) between 0 and 6
group by
    a.visited_on
order by
    a.visited_on


  • 解法二

用窗口函数的frame子句,可以直接指定要计算的范围


select * from
(
    select
        visited_on,
        sum(amt) over(rows between 6 preceding and current row) amount,
        round(avg(amt) over(rows between 6 preceding and current row),2) average_amount
    from
        (select visited_on,sum(amount) amt from Customer group by visited_on) t1
) t2
where visited_on >= (select date(min(visited_on)+6) from Customer)


5 条件求和

条件求和就是说,计算和的时候需要满足特定的条件才能相加。形式常为sum if,或者是sum case when。

例题5-1


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_SQL_20


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_窗口函数_21


解法一

容易想到用sum中带一个if条件,将股票不同的操作状态产生的和进行相减.


select
    stock_name,
    sum(if(operation='Sell',price,0)) - 
    sum(if(operation='Buy',price,0)) capital_gain_loss
from
    Stocks
group by
    1


  • 解法二

对解法一进行优化


select
    stock_name,
    sum(if(operation='Buy',-price,price)) capital_gain_loss
from
    Stocks
group by
    1


例题5-2


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_中位数_22


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_窗口函数_23


  • 解法一

利用groupby找出每个用户的最小日期即为登录日期,然后用datediff找出最近90天的登录用户即可


select
    login_date,count(1) user_count
from
    (
        select user_id,min(activity_date) login_date from Traffic 
        where activity = 'login' group by user_id
    ) t
where 
    datediff('2019-06-30',login_date) <= 90
group by
    login_date
order by
    login_date


  • 解法二

窗口函数求出每个用户的首次登录日期然后做筛选,再求登录用户数量


with t as(
    select
        user_id,activity_date,
        rank() over(partition by user_id order by activity_date) rk
    from
        Traffic
    where
        activity = 'login'
)
select 
    activity_date login_date,
    count(distinct user_id) as user_count
from 
    t 
where 
    rk = 1 and datediff('2019-06-30',activity_date) <= 90
group by 
    1
order by 
    1


例题5-3

此题巨难,涉及的知识点包括条件求和,连续和递归。


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_中位数_24


Drivers table:
+-----------+------------+
| driver_id | join_date  |
+-----------+------------+
| 10        | 2019-12-10 |
| 8         | 2020-1-13  |
| 5         | 2020-2-16  |
| 7         | 2020-3-8   |
| 4         | 2020-5-17  |
| 1         | 2020-10-24 |
| 6         | 2021-1-5   |
+-----------+------------+

Rides table:
+---------+---------+--------------+
| ride_id | user_id | requested_at |
+---------+---------+--------------+
| 6       | 75      | 2019-12-9    |
| 1       | 54      | 2020-2-9     |
| 10      | 63      | 2020-3-4     |
| 19      | 39      | 2020-4-6     |
| 3       | 41      | 2020-6-3     |
| 13      | 52      | 2020-6-22    |
| 7       | 69      | 2020-7-16    |
| 17      | 70      | 2020-8-25    |
| 20      | 81      | 2020-11-2    |
| 5       | 57      | 2020-11-9    |
| 2       | 42      | 2020-12-9    |
| 11      | 68      | 2021-1-11    |
| 15      | 32      | 2021-1-17    |
| 12      | 11      | 2021-1-19    |
| 14      | 18      | 2021-1-27    |
+---------+---------+--------------+

AcceptedRides table:
+---------+-----------+---------------+---------------+
| ride_id | driver_id | ride_distance | ride_duration |
+---------+-----------+---------------+---------------+
| 10      | 10        | 63            | 38            |
| 13      | 10        | 73            | 96            |
| 7       | 8         | 100           | 28            |
| 17      | 7         | 119           | 68            |
| 20      | 1         | 121           | 92            |
| 5       | 7         | 42            | 101           |
| 2       | 4         | 6             | 38            |
| 11      | 8         | 37            | 43            |
| 15      | 8         | 108           | 82            |
| 12      | 8         | 38            | 34            |
| 14      | 1         | 90            | 74            |
+---------+-----------+---------------+---------------+

Result table:
+-------+----------------+----------------+
| month | active_drivers | accepted_rides |
+-------+----------------+----------------+
| 1     | 2              | 0              |
| 2     | 3              | 0              |
| 3     | 4              | 1              |
| 4     | 4              | 0              |
| 5     | 5              | 0              |
| 6     | 5              | 1              |
| 7     | 5              | 1              |
| 8     | 5              | 1              |
| 9     | 5              | 0              |
| 10    | 6              | 0              |
| 11    | 6              | 2              |
| 12    | 6              | 1              |
+-------+----------------+----------------+

By the end of January --> two active drivers (10, 8) and no accepted rides.
By the end of February --> three active drivers (10, 8, 5) and no accepted rides.
By the end of March --> four active drivers (10, 8, 5, 7) and one accepted ride (10).
By the end of April --> four active drivers (10, 8, 5, 7) and no accepted rides.
By the end of May --> five active drivers (10, 8, 5, 7, 4) and no accepted rides.
By the end of June --> five active drivers (10, 8, 5, 7, 4) and one accepted ride (13).
By the end of July --> five active drivers (10, 8, 5, 7, 4) and one accepted ride (7).
By the end of August --> five active drivers (10, 8, 5, 7, 4) and one accepted ride (17).
By the end of Septemeber --> five active drivers (10, 8, 5, 7, 4) and no accepted rides.
By the end of October --> six active drivers (10, 8, 5, 7, 4, 1) and no accepted rides.
By the end of November --> six active drivers (10, 8, 5, 7, 4, 1) and two accepted rides (20, 5).
By the end of December --> six active drivers (10, 8, 5, 7, 4, 1) and one accepted ride (2).


  • 解答

recursive是递归表,生成每个月。

t1是活跃司机表,找出2020年每个月累积的活跃司机再加上19年的司机数量即是题目所需的司机数量。

t2是找出每个月的所有接受的行程。 最后合并两个表就行~


with 
recursive t(m) as(
    select 1 m 
    union all 
    select m + 1 from t where m < 12
),
t1 as(
    select 
        t.m month,
        sum(if(cnt is null,0,cnt)) over(order by t.m) + 
        (select count(1) from Drivers where year(join_date)<2020) active_drivers
    from 
        t 
    left join 
        (select month(join_date) m,count(distinct driver_id) cnt 
         from Drivers where year(join_date)=2020 group by 1) tmp
    on 
        t.m = tmp.m
),
t2 as(
    select 
        m,count(a.ride_id) accepted_rides 
    from 
        t 
    left join 
        Rides r 
    on 
        t.m = month(requested_at) and year(requested_at) = 2020
    left join 
        AcceptedRides a on a.ride_id = r.ride_id 
    group by
        1
)
select month,active_drivers,accepted_rides from t1 join t2 on t1.month = t2.m


  • 解法二

2020年的司机数量均加上2019的数量太麻烦了,可以直接把2019年的司机数量归类到2020年的1月去


with recursive mon_time as 
(
    select 1 as month
    union all 
    select month + 1 from mon_time where month <= 11
)
select  m.month,
        sum(ifnull(d.num_driver,0)) over(order by m.month) as active_drivers,
        ifnull(num_rides,0) as accepted_rides
from mon_time m 
left join 
(
    select if(year(join_date)<'2020',1,month(join_date)) as month_driver,
    count(driver_id) as num_driver
    from drivers
    where year(join_date)<='2020'
    group by if(year(join_date)<'2020',1,month(join_date))
)d  on m.month = month_driver
left join
(
    select month(requested_at) as month_rides,
    count(ride_id) as num_rides
    from rides 
    where ride_id in (select ride_id from acceptedrides)
    and year(requested_at) = '2020'
    group by month(requested_at)
)r on m.month = month_rides
group by m.month
order by m.month


6 行转列

行转列就是指把符合某一个条件的值另起一个字段来输出。

例题6-1


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_中位数_25


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_窗口函数_26


  • 解答

只需要把每个id在符合月份的条件下输出即可。


select 
    id,
    max(if(month='Jan' ,revenue,null)) as 'Jan_Revenue',
    max(if(month='Feb' ,revenue,null)) as 'Feb_Revenue',
    max(if(month='Mar' ,revenue,null)) as 'Mar_Revenue',
    max(if(month='Apr' ,revenue,null)) as 'Apr_Revenue',
    max(if(month='May' ,revenue,null)) as 'May_Revenue',
    max(if(month='Jun' ,revenue,null)) as 'Jun_Revenue',
    max(if(month='Jul' ,revenue,null)) as 'Jul_Revenue',
    max(if(month='Aug' ,revenue,null)) as 'Aug_Revenue',
    max(if(month='Sep' ,revenue,null)) as 'Sep_Revenue',
    max(if(month='Oct' ,revenue,null)) as 'Oct_Revenue',
    max(if(month='Nov' ,revenue,null)) as 'Nov_Revenue',
    max(if(month='Dec' ,revenue,null)) as 'Dec_Revenue'
from 
    Department
group by 
    id


用case when也可以达到一样的效果。

例题6-2


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_中位数_27


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_SQL_28


  • 解答

先进行表连接,把每个item的所属种类找出,然后根据种类进行group by,基于逻辑函数if判断出周内每一天的销量,再求和即可。


select 
    item_category as CATEGORY,
    sum(if(w=0,quantity,0)) as MONDAY,
    sum(if(w=1,quantity,0)) as TUESDAY,
    sum(if(w=2,quantity,0)) as WEDNESDAY,
    sum(if(w=3,quantity,0)) as THURSDAY,
    sum(if(w=4,quantity,0)) as FRIDAY,
    sum(if(w=5,quantity,0)) as SATURDAY,
    sum(if(w=6,quantity,0)) as SUNDAY
from
    (select item_category,quantity,weekday(order_date) w 
     from Items i left join Orders o on i.item_id = o.item_id) t
group by
    1
order by
    1


7 递归

递归常用于补齐某个字段所有会出现的可能性,如1,3,5可以用递归补齐2和4。

例题7-1


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_中位数_29


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_中位数_30


  • 解答

在已经145的情况下要求出2和3,可以用递归的方式生成1-5然后再用带join和条件筛选即可找出2和3.


with recursive t(n) as(
    select 1
    union all
    select 1+n from t where n < (select max(customer_id) from Customers)
)
select
    n ids
from
    t left join Customers c on t.n = c.customer_id
where
    c.customer_id is null


例题7-2

此题是LC的SQL里面最难的几道题之一,考点涉及递归、联合查询、条件求和。


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_sql sum求和 值小于不显示_31


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_sql sum求和 值小于不显示_32


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_窗口函数_33


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_窗口函数_34


  • 解答

第一个recursive表找出0到max的所有可能的cnt,第二个临时表找出所有交易次数的次数,然后两边连接即可


with
recursive b(n) as(
    select 0
    union all
    select n+1 from b where n < 
    (select max(cnt) from (select user_id,transaction_date ,count(1) cnt from Transactions group by 1,2 ) t )
),
a as(
    select cnt,count(1) t 
    from
        (
            select
                sum(amount>0) cnt
            from
                (
                    select user_id,visit_date transaction_date,-1 amount 
                    from Visits
                    union all
                    select * from Transactions
                ) t1
            group by
                user_id,transaction_date
        ) t2
    group by 1
)
select
    b.n transactions_count,ifnull(a.t,0) visits_count
from
    b left join a on b.n = a.cnt


8 笛卡尔积

所谓笛卡尔积就是指a,b两表连接时不做任何的join操作,然a表的每一行记录都有b表的所有记录。

例题8-1

要求写一段 SQL 语句,查询出每个学生参加每一门科目测试的次数,结果按 student_idsubject_name 排序。

查询结构格式如下所示:


Students table:
+------------+--------------+
| student_id | student_name |
+------------+--------------+
| 1          | Alice        |
| 2          | Bob          |
| 13         | John         |
| 6          | Alex         |
+------------+--------------+
Subjects table:
+--------------+
| subject_name |
+--------------+
| Math         |
| Physics      |
| Programming  |
+--------------+
Examinations table:
+------------+--------------+
| student_id | subject_name |
+------------+--------------+
| 1          | Math         |
| 1          | Physics      |
| 1          | Programming  |
| 2          | Programming  |
| 1          | Physics      |
| 1          | Math         |
| 13         | Math         |
| 13         | Programming  |
| 13         | Physics      |
| 2          | Math         |
| 1          | Math         |
+------------+--------------+
Result table:
+------------+--------------+--------------+----------------+
| student_id | student_name | subject_name | attended_exams |
+------------+--------------+--------------+----------------+
| 1          | Alice        | Math         | 3              |
| 1          | Alice        | Physics      | 2              |
| 1          | Alice        | Programming  | 1              |
| 2          | Bob          | Math         | 1              |
| 2          | Bob          | Physics      | 0              |
| 2          | Bob          | Programming  | 1              |
| 6          | Alex         | Math         | 0              |
| 6          | Alex         | Physics      | 0              |
| 6          | Alex         | Programming  | 0              |
| 13         | John         | Math         | 1              |
| 13         | John         | Physics      | 1              |
| 13         | John         | Programming  | 1              |
+------------+--------------+--------------+----------------+
结果表需包含所有学生和所有科目(即便测试次数为0):
Alice 参加了 3 次数学测试, 2 次物理测试,以及 1 次编程测试;
Bob 参加了 1 次数学测试, 1 次编程测试,没有参加物理测试;
Alex 啥测试都没参加;
John  参加了数学、物理、编程测试各 1 次。


  • 解答

要计算出每一位同学对应每一个科目的考试次数,显然就需要同学对科目做笛卡尔积。


select
    st.student_id,st.student_name,su.subject_name,
    count(e.subject_name) attended_exams
from
    Students st join Subjects su left join Examinations e
on
    st.student_id = e.student_id and su.subject_name = e.subject_name
group by
    1,3
order by
    1,3


例题8-2


SQL server sum求和小数点后面的数学不见了 sql求和的列很多_SQL_35


编写一段 SQL 查询每个产品每年的总销售额,并包含 product_id, product_name 以及 report_year 等信息。

销售年份的日期介于 2018 年到 2020 年之间。你返回的结果需要按 product_id 和 report_year 排序。

查询结果格式如下例所示:


Product table:
+------------+--------------+
| product_id | product_name |
+------------+--------------+
| 1          | LC Phone     |
| 2          | LC T-Shirt   |
| 3          | LC Keychain  |
+------------+--------------+

Sales table:
+------------+--------------+-------------+---------------------+
| product_id | period_start | period_end  | average_daily_sales |
+------------+--------------+-------------+---------------------+
| 1          | 2019-01-25   | 2019-02-28  | 100                 |
| 2          | 2018-12-01   | 2020-01-01  | 10                  |
| 3          | 2019-12-01   | 2020-01-31  | 1                   |
+------------+--------------+-------------+---------------------+

Result table:
+------------+--------------+-------------+--------------+
| product_id | product_name | report_year | total_amount |
+------------+--------------+-------------+--------------+
| 1          | LC Phone     |    2019     | 3500         |
| 2          | LC T-Shirt   |    2018     | 310          |
| 2          | LC T-Shirt   |    2019     | 3650         |
| 2          | LC T-Shirt   |    2020     | 10           |
| 3          | LC Keychain  |    2019     | 31           |
| 3          | LC Keychain  |    2020     | 31           |
+------------+--------------+-------------+--------------+
LC Phone 在 2019-01-25 至 2019-02-28 期间销售,该产品销售时间总计35天。销售总额 35*100 = 3500。
LC T-shirt 在 2018-12-01 至 2020-01-01 期间销售,该产品在2018年、2019年、2020年的销售时间分别是31天、365天、1天,2018年、2019年、2020年的销售总额分别是31*10=310、365*10=3650、1*10=10。
LC Keychain 在 2019-12-01 至 2020-01-31 期间销售,该产品在2019年、2020年的销售时间分别是:31天、31天,2019年、2020年的销售总额分别是31*1=31、31*1=31。


用联合查询和笛卡尔积的方式列出商品对应的所有可能年份,然后根据period字段和年份去做判断,找出商品存在的那几年,最后根据每一年的结束值减去每一年的起始值得到天数,再乘以商品售价即可。

  • 根据年份数量进行判断,再用年份做差进行条件求和
with t as(
    select 
        s.product_id,p.product_name,y,
        date(concat(y,'-01','-01')) ys,
        date(concat(y,'-12','-31')) ye,
        period_start ps,
        period_end pe, 
        average_daily_sales a,
        count(1) over(partition by product_id) c
    from 
        Product p join Sales s using(product_id),
        (select '2018' y union all select '2019' y union all select '2020' y) t
    where 
        y between year(period_start) and year(period_end)
    group by 
        1,3  
    order by
        1,3 
)
select
    product_id,product_name,y report_year,
    case when c = 1 then datediff(pe,ps)*a+a
         when y = left(ps,4) then datediff(ye,ps)*a+a
         when y = left(pe,4) then datediff(pe,ys)*a+a
         else 365 * a end as total_amount
from t


  • 直接用if函数找出每一年的起始与结束,用datediff找出起始和结束相差的天数+1即可
select 
    s.PRODUCT_ID, PRODUCT_NAME, date_format(bound, '%Y') REPORT_YEAR,
    (datediff(
        if(bound < period_end, bound, period_end), 
        if(makedate(year(bound), 1) > period_start, makedate(year(bound), 1), period_start)
    ) + 1) * average_daily_sales TOTAL_AMOUNT
from product p join (
    select '2018-12-31' bound
    union all
    select '2019-12-31' bound
    union all
    select '2020-12-31' bound
) bounds join sales s
on 
    p.product_id = s.product_id 
    and year(bound) between year(period_start) and year(period_end)
order by s.product_id, report_year