MySQL8.0开始像Oracle一样支持窗口函数了,先来跑个SQL,感受一下窗口函数的魅力。
mysql> create table numbers(val int); mysql> insert into numbers values(1),(1),(2),(3),(3),(3),(4),(4),(5); mysql> SELECT -> val, -> ROW_NUMBER() OVER w AS 'row_number', -> RANK() OVER w AS 'rank', -> DENSE_RANK() OVER w AS 'dense_rank', -> CUME_DIST() OVER w AS 'cume_dist', -> PERCENT_RANK() OVER w AS 'percent_rank' -> FROM numbers -> WINDOW w AS (ORDER BY val); +------+------------+------+------------+--------------------+--------------+ | val | row_number | rank | dense_rank | cume_dist | percent_rank | +------+------------+------+------------+--------------------+--------------+ | 1 | 1 | 1 | 1 | 0.2222222222222222 | 0 | | 1 | 2 | 1 | 1 | 0.2222222222222222 | 0 | | 2 | 3 | 3 | 2 | 0.3333333333333333 | 0.25 | | 3 | 4 | 4 | 3 | 0.6666666666666666 | 0.375 | | 3 | 5 | 4 | 3 | 0.6666666666666666 | 0.375 | | 3 | 6 | 4 | 3 | 0.6666666666666666 | 0.375 | | 4 | 7 | 7 | 4 | 0.8888888888888888 | 0.75 | | 4 | 8 | 7 | 4 | 0.8888888888888888 | 0.75 | | 5 | 9 | 9 | 5 | 1 | 1 | +------+------------+------+------------+--------------------+--------------+ 9 rows in set (0.00 sec)
有了窗口函数,以前一些复制的SQL,可以很简单的实现了,下面分别介绍几种常用的窗口函数。
row_number()
用来求满足特定分组的唯一行数据,它跟group by不同,group by不能返回重复的分组。
例:求每个部门dept_no员工号emp_no最大的员工数据
(1)方法1:使用窗口函数
第一步: mysql> select t.*,row_number() over(partition by t.dept_no order by t.emp_no desc) rn from t_group t; +--------+---------+------------+------------+----+ | emp_no | dept_no | from_date | to_date | rn | +--------+---------+------------+------------+----+ | 31112 | d002 | 1986-12-01 | 1993-12-10 | 1 | | 10004 | d004 | 1986-12-01 | 9999-01-01 | 1 | | 50449 | d005 | 1986-12-01 | 9999-01-01 | 1 | | 40983 | d005 | 1986-12-01 | 9999-01-01 | 2 | | 30970 | d005 | 1986-12-01 | 2017-03-29 | 3 | | 24007 | d005 | 1986-12-01 | 9999-01-01 | 4 | | 22744 | d006 | 1986-12-01 | 9999-01-01 | 1 | | 49667 | d007 | 1986-12-01 | 9999-01-01 | 1 | | 48317 | d008 | 1986-12-01 | 1989-01-11 | 1 | | 46554 | d008 | 1986-12-01 | 1992-05-27 | 2 | +--------+---------+------------+------------+----+ 10 rows in set (0.01 sec) 第二步: mysql> select a.* from (select t.*,row_number() over(partition by t.dept_no order by t.emp_no desc) rn from t_group t)a where a.rn=1; +--------+---------+------------+------------+----+ | emp_no | dept_no | from_date | to_date | rn | +--------+---------+------------+------------+----+ | 31112 | d002 | 1986-12-01 | 1993-12-10 | 1 | | 10004 | d004 | 1986-12-01 | 9999-01-01 | 1 | | 50449 | d005 | 1986-12-01 | 9999-01-01 | 1 | | 22744 | d006 | 1986-12-01 | 9999-01-01 | 1 | | 49667 | d007 | 1986-12-01 | 9999-01-01 | 1 | | 48317 | d008 | 1986-12-01 | 1989-01-11 | 1 | +--------+---------+------------+------------+----+ 6 rows in set (0.00 sec)
(2)方法2:使用传统变量@方法
第一步: mysql> select t.*, if(@dept_no=t.dept_no, @rn:=@rn+1, @rn:=1)as rn, @dept_no:=t.dept_no as calc_dept_no from (select * from t_group t order by t.dept_no,t.emp_no desc)t, (select @rn:=0 rn, @dept_no:='')b; +--------+---------+------------+------------+------+--------------+ | emp_no | dept_no | from_date | to_date | rn | calc_dept_no | +--------+---------+------------+------------+------+--------------+ | 31112 | d002 | 1986-12-01 | 1993-12-10 | 1 | d002 | | 10004 | d004 | 1986-12-01 | 9999-01-01 | 1 | d004 | | 50449 | d005 | 1986-12-01 | 9999-01-01 | 1 | d005 | | 40983 | d005 | 1986-12-01 | 9999-01-01 | 2 | d005 | | 30970 | d005 | 1986-12-01 | 2017-03-29 | 3 | d005 | | 24007 | d005 | 1986-12-01 | 9999-01-01 | 4 | d005 | | 22744 | d006 | 1986-12-01 | 9999-01-01 | 1 | d006 | | 49667 | d007 | 1986-12-01 | 9999-01-01 | 1 | d007 | | 48317 | d008 | 1986-12-01 | 1989-01-11 | 1 | d008 | | 46554 | d008 | 1986-12-01 | 1992-05-27 | 2 | d008 | +--------+---------+------------+------------+------+--------------+ 10 rows in set, 5 warnings (0.00 sec) 第二步: mysql> select * from ( -> select t.*, if(@dept_no=t.dept_no, @rn:=@rn+1, @rn:=1)as rn, @dept_no:=t.dept_no as calc_dept_no from (select * from t_group t order by t.dept_no,t.emp_no desc)t, (select @rn:=0 rn, @dept_no:='')b)c where c.rn=1; +--------+---------+------------+------------+------+--------------+ | emp_no | dept_no | from_date | to_date | rn | calc_dept_no | +--------+---------+------------+------------+------+--------------+ | 31112 | d002 | 1986-12-01 | 1993-12-10 | 1 | d002 | | 10004 | d004 | 1986-12-01 | 9999-01-01 | 1 | d004 | | 50449 | d005 | 1986-12-01 | 9999-01-01 | 1 | d005 | | 22744 | d006 | 1986-12-01 | 9999-01-01 | 1 | d006 | | 49667 | d007 | 1986-12-01 | 9999-01-01 | 1 | d007 | | 48317 | d008 | 1986-12-01 | 1989-01-11 | 1 | d008 | +--------+---------+------------+------------+------+--------------+ 6 rows in set, 5 warnings (0.00 sec)
rank()
同一个分组中,如果排序的列值相同,编号相同;
列值不同,编号跳跃增加,如两个并列第1,那么下一个名次的编号就是3;
如下按照dept_no分组,dept_no='d005'列对应to_date='9999-01-01'有三个1,那么to_date='2017-03-29'跳跃增加的编号就是4;
mysql> select t.*, rank() over(partition by t.dept_no order by t.to_date desc) rn from t_group t; +--------+---------+------------+------------+----+ | emp_no | dept_no | from_date | to_date | rn | +--------+---------+------------+------------+----+ | 31112 | d002 | 1986-12-01 | 1993-12-10 | 1 | | 10004 | d004 | 1986-12-01 | 9999-01-01 | 1 | | 24007 | d005 | 1986-12-01 | 9999-01-01 | 1 | | 40983 | d005 | 1986-12-01 | 9999-01-01 | 1 | | 50449 | d005 | 1986-12-01 | 9999-01-01 | 1 | | 30970 | d005 | 1986-12-01 | 2017-03-29 | 4 | | 22744 | d006 | 1986-12-01 | 9999-01-01 | 1 | | 49667 | d007 | 1986-12-01 | 9999-01-01 | 1 | | 46554 | d008 | 1986-12-01 | 1992-05-27 | 1 | | 48317 | d008 | 1986-12-01 | 1989-01-11 | 2 | +--------+---------+------------+------------+----+ 10 rows in set (0.00 sec)
dense_rank()
同一个分组中,如果排序的列值相同,编号相同;
列值不同,编号顺序增加,如两个并列第1,那么下一个名次的编号就是2;
mysql> select t.*, dense_rank() over(partition by t.dept_no order by t.to_date desc) rn from t_group t; +--------+---------+------------+------------+----+ | emp_no | dept_no | from_date | to_date | rn | +--------+---------+------------+------------+----+ | 31112 | d002 | 1986-12-01 | 1993-12-10 | 1 | | 10004 | d004 | 1986-12-01 | 9999-01-01 | 1 | | 24007 | d005 | 1986-12-01 | 9999-01-01 | 1 | | 40983 | d005 | 1986-12-01 | 9999-01-01 | 1 | | 50449 | d005 | 1986-12-01 | 9999-01-01 | 1 | | 30970 | d005 | 1986-12-01 | 2017-03-29 | 2 | | 22744 | d006 | 1986-12-01 | 9999-01-01 | 1 | | 49667 | d007 | 1986-12-01 | 9999-01-01 | 1 | | 46554 | d008 | 1986-12-01 | 1992-05-27 | 1 | | 48317 | d008 | 1986-12-01 | 1989-01-11 | 2 | +--------+---------+------------+------------+----+ 10 rows in set (0.00 sec)
lag(expr,offset,default) over() 返回当前行列对应的上一行数据
lead(expr,offset,default) over() 返回当前行列对应的下一行数据
mysql> select s.*, lag(s.salary,1,null) over(partition by s.emp_no order by s.to_date asc) rn from salaries s where s.emp_no=10001; +--------+--------+------------+------------+-------+ | emp_no | salary | from_date | to_date | rn | +--------+--------+------------+------------+-------+ | 10001 | 60117 | 1986-06-26 | 1987-06-26 | NULL | | 10001 | 62102 | 1987-06-26 | 1988-06-25 | 60117 | | 10001 | 66074 | 1988-06-25 | 1989-06-25 | 62102 | | 10001 | 66596 | 1989-06-25 | 1990-06-25 | 66074 | | 10001 | 66961 | 1990-06-25 | 1991-06-25 | 66596 | | 10001 | 71046 | 1991-06-25 | 1992-06-24 | 66961 | | 10001 | 74333 | 1992-06-24 | 1993-06-24 | 71046 | | 10001 | 75286 | 1993-06-24 | 1994-06-24 | 74333 | | 10001 | 75994 | 1994-06-24 | 1995-06-24 | 75286 | | 10001 | 76884 | 1995-06-24 | 1996-06-23 | 75994 | | 10001 | 80013 | 1996-06-23 | 1997-06-23 | 76884 | | 10001 | 81025 | 1997-06-23 | 1998-06-23 | 80013 | | 10001 | 81097 | 1998-06-23 | 1999-06-23 | 81025 | | 10001 | 84917 | 1999-06-23 | 2000-06-22 | 81097 | | 10001 | 85112 | 2000-06-22 | 2001-06-22 | 84917 | | 10001 | 85097 | 2001-06-22 | 2002-06-22 | 85112 | | 10001 | 88958 | 2002-06-22 | 9999-01-01 | 85097 | +--------+--------+------------+------------+-------+ 17 rows in set (0.00 sec) mysql> select s.*, lead(s.salary,1,0) over(partition by s.emp_no order by s.to_date asc) rn from salaries s where s.emp_no=10001; +--------+--------+------------+------------+-------+ | emp_no | salary | from_date | to_date | rn | +--------+--------+------------+------------+-------+ | 10001 | 60117 | 1986-06-26 | 1987-06-26 | 62102 | | 10001 | 62102 | 1987-06-26 | 1988-06-25 | 66074 | | 10001 | 66074 | 1988-06-25 | 1989-06-25 | 66596 | | 10001 | 66596 | 1989-06-25 | 1990-06-25 | 66961 | | 10001 | 66961 | 1990-06-25 | 1991-06-25 | 71046 | | 10001 | 71046 | 1991-06-25 | 1992-06-24 | 74333 | | 10001 | 74333 | 1992-06-24 | 1993-06-24 | 75286 | | 10001 | 75286 | 1993-06-24 | 1994-06-24 | 75994 | | 10001 | 75994 | 1994-06-24 | 1995-06-24 | 76884 | | 10001 | 76884 | 1995-06-24 | 1996-06-23 | 80013 | | 10001 | 80013 | 1996-06-23 | 1997-06-23 | 81025 | | 10001 | 81025 | 1997-06-23 | 1998-06-23 | 81097 | | 10001 | 81097 | 1998-06-23 | 1999-06-23 | 84917 | | 10001 | 84917 | 1999-06-23 | 2000-06-22 | 85112 | | 10001 | 85112 | 2000-06-22 | 2001-06-22 | 85097 | | 10001 | 85097 | 2001-06-22 | 2002-06-22 | 88958 | | 10001 | 88958 | 2002-06-22 | 9999-01-01 | 0 | +--------+--------+------------+------------+-------+ 17 rows in set (0.00 sec)
sum()、min()、max()增强
rows between unbounded preceding and unbounded following 从开始到最后一行
rows between unbounded preceding and current row 从开始到目前行
mysql> select s.*, -> sum(s.salary) over(partition by s.emp_no order by s.to_date asc rows between unbounded preceding and unbounded following) s1, -> sum(s.salary) over(partition by s.emp_no order by s.to_date asc rows between unbounded preceding and current row) s2 -> from salaries s where s.emp_no=10001; +--------+--------+------------+------------+---------+---------+ | emp_no | salary | from_date | to_date | s1 | s2 | +--------+--------+------------+------------+---------+---------+ | 10001 | 60117 | 1986-06-26 | 1987-06-26 | 1281612 | 60117 | | 10001 | 62102 | 1987-06-26 | 1988-06-25 | 1281612 | 122219 | | 10001 | 66074 | 1988-06-25 | 1989-06-25 | 1281612 | 188293 | | 10001 | 66596 | 1989-06-25 | 1990-06-25 | 1281612 | 254889 | | 10001 | 66961 | 1990-06-25 | 1991-06-25 | 1281612 | 321850 | | 10001 | 71046 | 1991-06-25 | 1992-06-24 | 1281612 | 392896 | | 10001 | 74333 | 1992-06-24 | 1993-06-24 | 1281612 | 467229 | | 10001 | 75286 | 1993-06-24 | 1994-06-24 | 1281612 | 542515 | | 10001 | 75994 | 1994-06-24 | 1995-06-24 | 1281612 | 618509 | | 10001 | 76884 | 1995-06-24 | 1996-06-23 | 1281612 | 695393 | | 10001 | 80013 | 1996-06-23 | 1997-06-23 | 1281612 | 775406 | | 10001 | 81025 | 1997-06-23 | 1998-06-23 | 1281612 | 856431 | | 10001 | 81097 | 1998-06-23 | 1999-06-23 | 1281612 | 937528 | | 10001 | 84917 | 1999-06-23 | 2000-06-22 | 1281612 | 1022445 | | 10001 | 85112 | 2000-06-22 | 2001-06-22 | 1281612 | 1107557 | | 10001 | 85097 | 2001-06-22 | 2002-06-22 | 1281612 | 1192654 | | 10001 | 88958 | 2002-06-22 | 9999-01-01 | 1281612 | 1281612 | +--------+--------+------------+------------+---------+---------+ 17 rows in set (0.00 sec) mysql> select s.*, -> min(s.salary) over(partition by s.emp_no order by s.to_date desc rows between unbounded preceding and unbounded following) m1, -> min(s.salary) over(partition by s.emp_no order by s.to_date desc rows between unbounded preceding and current row) m2, -> min(s.salary) over(partition by s.emp_no) m3 -> from salaries s where s.emp_no=10001; +--------+--------+------------+------------+-------+-------+-------+ | emp_no | salary | from_date | to_date | m1 | m2 | m3 | +--------+--------+------------+------------+-------+-------+-------+ | 10001 | 88958 | 2002-06-22 | 9999-01-01 | 60117 | 88958 | 60117 | | 10001 | 85097 | 2001-06-22 | 2002-06-22 | 60117 | 85097 | 60117 | | 10001 | 85112 | 2000-06-22 | 2001-06-22 | 60117 | 85097 | 60117 | | 10001 | 84917 | 1999-06-23 | 2000-06-22 | 60117 | 84917 | 60117 | | 10001 | 81097 | 1998-06-23 | 1999-06-23 | 60117 | 81097 | 60117 | | 10001 | 81025 | 1997-06-23 | 1998-06-23 | 60117 | 81025 | 60117 | | 10001 | 80013 | 1996-06-23 | 1997-06-23 | 60117 | 80013 | 60117 | | 10001 | 76884 | 1995-06-24 | 1996-06-23 | 60117 | 76884 | 60117 | | 10001 | 75994 | 1994-06-24 | 1995-06-24 | 60117 | 75994 | 60117 | | 10001 | 75286 | 1993-06-24 | 1994-06-24 | 60117 | 75286 | 60117 | | 10001 | 74333 | 1992-06-24 | 1993-06-24 | 60117 | 74333 | 60117 | | 10001 | 71046 | 1991-06-25 | 1992-06-24 | 60117 | 71046 | 60117 | | 10001 | 66961 | 1990-06-25 | 1991-06-25 | 60117 | 66961 | 60117 | | 10001 | 66596 | 1989-06-25 | 1990-06-25 | 60117 | 66596 | 60117 | | 10001 | 66074 | 1988-06-25 | 1989-06-25 | 60117 | 66074 | 60117 | | 10001 | 62102 | 1987-06-26 | 1988-06-25 | 60117 | 62102 | 60117 | | 10001 | 60117 | 1986-06-26 | 1987-06-26 | 60117 | 60117 | 60117 | +--------+--------+------------+------------+-------+-------+-------+ 17 rows in set (0.00 sec) mysql> select s.*, -> max(s.salary) over(partition by s.emp_no order by s.to_date asc rows between unbounded preceding and unbounded following) m1, -> max(s.salary) over(partition by s.emp_no order by s.to_date asc rows between unbounded preceding and current row) m2, -> max(s.salary) over(partition by s.emp_no) m3 -> from salaries s where s.emp_no=10001; +--------+--------+------------+------------+-------+-------+-------+ | emp_no | salary | from_date | to_date | m1 | m2 | m3 | +--------+--------+------------+------------+-------+-------+-------+ | 10001 | 60117 | 1986-06-26 | 1987-06-26 | 88958 | 60117 | 88958 | | 10001 | 62102 | 1987-06-26 | 1988-06-25 | 88958 | 62102 | 88958 | | 10001 | 66074 | 1988-06-25 | 1989-06-25 | 88958 | 66074 | 88958 | | 10001 | 66596 | 1989-06-25 | 1990-06-25 | 88958 | 66596 | 88958 | | 10001 | 66961 | 1990-06-25 | 1991-06-25 | 88958 | 66961 | 88958 | | 10001 | 71046 | 1991-06-25 | 1992-06-24 | 88958 | 71046 | 88958 | | 10001 | 74333 | 1992-06-24 | 1993-06-24 | 88958 | 74333 | 88958 | | 10001 | 75286 | 1993-06-24 | 1994-06-24 | 88958 | 75286 | 88958 | | 10001 | 75994 | 1994-06-24 | 1995-06-24 | 88958 | 75994 | 88958 | | 10001 | 76884 | 1995-06-24 | 1996-06-23 | 88958 | 76884 | 88958 | | 10001 | 80013 | 1996-06-23 | 1997-06-23 | 88958 | 80013 | 88958 | | 10001 | 81025 | 1997-06-23 | 1998-06-23 | 88958 | 81025 | 88958 | | 10001 | 81097 | 1998-06-23 | 1999-06-23 | 88958 | 81097 | 88958 | | 10001 | 84917 | 1999-06-23 | 2000-06-22 | 88958 | 84917 | 88958 | | 10001 | 85112 | 2000-06-22 | 2001-06-22 | 88958 | 85112 | 88958 | | 10001 | 85097 | 2001-06-22 | 2002-06-22 | 88958 | 85112 | 88958 | | 10001 | 88958 | 2002-06-22 | 9999-01-01 | 88958 | 88958 | 88958 | +--------+--------+------------+------------+-------+-------+-------+ 17 rows in set (0.00 sec)
还有很多没有提到的窗口函数,可以参考手册。
CUME_DIST()
FIRST_VALUE()
LAST_VALUE()
NTH_VALUE()
NTILE()
参考链接
12.21.1 Window Function Descriptions