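MySQL: The GROUP BY Statement in Detail
1. The GROUP BY Statement
The GROUP BY statement groups the result set by one or more columns, and aggregate functions such as COUNT, SUM, and AVG can then be applied to each group. Its general syntax is: select column_name, function(column_name) from table_name where column_name operator value group by column_name;
Here we use the employee_tbl table to work through some examples. First, the table is created and populated with the following code.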
create table employee_tbl(
id int not null,
name char(10) not null default '',
date datetime not null,
singin tinyint(4) not null default '0',
primary key(id)
)engine=InnoDB default charset=utf8;
insert into employee_tbl values
('1', '小明', '2016-04-22 15:25:33', '1'),
('2', '小王', '2016-04-20 15:25:47', '3'),
('3', '小丽', '2016-04-19 15:26:02', '2'),
('4', '小王', '2016-04-07 15:26:14', '4'),
('5', '小明', '2016-04-11 15:26:40', '4'),
('6', '小明', '2016-04-04 15:26:54', '2');
The contents of employee_tbl are as follows.
+----+------+---------------------+--------+
| id | name | date | singin |
+----+------+---------------------+--------+
| 1 | 小明 | 2016-04-22 15:25:33 | 1 |
| 2 | 小王 | 2016-04-20 15:25:47 | 3 |
| 3 | 小丽 | 2016-04-19 15:26:02 | 2 |
| 4 | 小王 | 2016-04-07 15:26:14 | 4 |
| 5 | 小明 | 2016-04-11 15:26:40 | 4 |
| 6 | 小明 | 2016-04-04 15:26:54 | 2 |
+----+------+---------------------+--------+
The GROUP BY statement can group employee_tbl by name and count how many records each person has. The code and result are as follows.
select name, COUNT(*) from employee_tbl group by name;
+------+----------+
| name | COUNT(*) |
+------+----------+
| 小明 | 3 |
| 小王 | 2 |
| 小丽 | 1 |
+------+----------+
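The other aggregate functions mentioned earlier can be combined in the same query. The following is a minimal sketch on the same employee_tbl table; the column aliases records, singin_total and singin_avg are illustrative names that do not appear in the original example.

```sql
-- One row per name, with several aggregates computed over that person's rows.
select name,
       count(*)    as records,       -- number of rows in the group
       sum(singin) as singin_total,  -- total of singin within the group
       avg(singin) as singin_avg     -- mean of singin within the group
from employee_tbl
group by name;
```

For 小明, whose singin values are 1, 4 and 2, this gives 3 records, a total of 7, and an average of about 2.33.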
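WITH ROLLUP adds an overall summary row on top of the per-group statistics; in that row the grouping column is shown as NULL. The code and result are as follows.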
select name, SUM(singin) as singin_count from employee_tbl group by name with rollup;
+------+--------------+
| name | singin_count |
+------+--------------+
| 小丽 | 2 |
| 小明 | 7 |
| 小王 | 7 |
| NULL | 16 |
+------+--------------+
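COALESCE can replace that NULL with a readable label. select coalesce(a, b, c) returns its first non-NULL argument: a if a is not NULL; otherwise b if b is not NULL; otherwise c. If a, b and c are all NULL, it returns NULL. The code and result are as follows (the label '总数' means "total").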
select coalesce(name, '总数'), SUM(singin) as singin_count from employee_tbl group by name with rollup;
+------------------------+--------------+
| coalesce(name, '总数') | singin_count |
+------------------------+--------------+
| 小丽 | 2 |
| 小明 | 7 |
| 小王 | 7 |
| 总数 | 16 |
+------------------------+--------------+
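The COALESCE rule can be checked directly with literal arguments, as in the sketch below. One caveat: COALESCE cannot tell the ROLLUP summary row apart from a group whose name is genuinely NULL. That cannot happen here because name is declared not null, but in other tables the GROUPING() function, available from MySQL 8.0.1, is one way to handle it. The alias label is an illustrative name, and the second query assumes a MySQL 8.0.1 or later server.

```sql
-- COALESCE returns its first non-NULL argument.
select coalesce(null, 'b', 'c');    -- 'b'
select coalesce(null, null, 'c');   -- 'c'
select coalesce('a', 'b', 'c');     -- 'a'
select coalesce(null, null, null);  -- NULL

-- MySQL 8.0.1+ only: GROUPING(name) is 1 on the ROLLUP summary row and 0 on
-- ordinary group rows, so the label is applied only to the summary row.
select if(grouping(name), '总数', name) as label,
       sum(singin) as singin_count
from employee_tbl
group by name with rollup;
```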
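2. Using GROUP BY with Aggregate Functions
Still using the employee_tbl table, let us first run select * from employee_tbl group by name; and see what it returns. Note that this query only runs when the ONLY_FULL_GROUP_BY SQL mode is disabled; with that mode enabled (the default since MySQL 5.7.5), MySQL rejects it with error 1055 because id, date and singin are neither grouped nor aggregated. With the mode disabled, the result looks like this.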
+----+------+---------------------+--------+
| id | name | date | singin |
+----+------+---------------------+--------+
| 1 | 小明 | 2016-04-22 15:25:33 | 1 |
| 2 | 小王 | 2016-04-20 15:25:47 | 3 |
| 3 | 小丽 | 2016-04-19 15:26:02 | 2 |
+----+------+---------------------+--------+
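Compared with the original employee_tbl table, 小明 and 小王 had 3 and 2 records respectively, yet after select * ... group by name each of them is left with just 1 record. Why is that? In effect, after GROUP BY runs we can picture a virtual (purely imagined) table like the following.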
+----+------+---------------------+--------+
| id | name | date | singin |
+----+------+---------------------+--------+
| 1 | | 2016-04-22 15:25:33 | 1 |
| 5 | 小明 | 2016-04-11 15:26:40 | 4 |
| 6 | | 2016-04-04 15:26:54 | 2 |
+----+------+---------------------+--------+
| 2 | 小王 | 2016-04-20 15:25:47 | 3 |
| 4 | | 2016-04-07 15:26:14 | 4 |
+----+------+---------------------+--------+
| 3 | 小丽 | 2016-04-19 15:26:02 | 2 |
+----+------+---------------------+--------+
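In other words, records with the same name are merged into a single row whose other cells each hold several values. With select *, only one value is taken from each such cell. In the run shown above it was the first one, but MySQL does not guarantee which value it picks. Aggregate functions, by contrast, process all the values in such a cell. With that in mind, let us look at the following problem.
First we create the department table with the following code.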
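Before turning to that problem, the "cell with several values" picture can be made visible on employee_tbl itself. The sketch below uses GROUP_CONCAT to list every value in each group and ANY_VALUE (available from MySQL 5.7.5) to state explicitly that any one value is acceptable. The aliases ids, singin_values and some_id are illustrative names.

```sql
-- Show every value that the virtual table holds for each name.
select name,
       group_concat(id order by id)     as ids,            -- e.g. 1,5,6 for 小明
       group_concat(singin order by id) as singin_values   -- e.g. 1,4,2 for 小明
from employee_tbl
group by name;

-- Runs even with ONLY_FULL_GROUP_BY enabled: ANY_VALUE says explicitly that
-- an arbitrary id from each group is fine.
select name, any_value(id) as some_id
from employee_tbl
group by name;
```

Both queries are accepted with ONLY_FULL_GROUP_BY enabled, since every selected column is either the grouping column or wrapped in a function that collapses the group to a single value.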
create database leetcode;
use leetcode;
create table department_1179 (
id int,
revenue int,
month varchar(11) not null,
primary key(id, month)
)engine=InnoDB default charset=utf8;
insert into department_1179 values
(1, 8000, 'Jan'),
(2, 9000, 'Jan'),
(3, 10000, 'Feb'),
(1, 7000, 'Feb'),
(1, 6000, 'Mar');
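To reformat the department table into the shape of the expected result table, rows must be turned into columns. Let us first try the following code and see what it does.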
use leetcode;
select id,
(case when month='Jan' then revenue end) as Jan_Revenue,
(case when month='Feb' then revenue end) as Feb_Revenue,
(case when month='Mar' then revenue end) as Mar_Revenue,
(case when month='Apr' then revenue end) as Apr_Revenue,
(case when month='May' then revenue end) as May_Revenue,
(case when month='Jun' then revenue end) as Jun_Revenue,
(case when month='Jul' then revenue end) as Jul_Revenue,
(case when month='Aug' then revenue end) as Aug_Revenue,
(case when month='Sep' then revenue end) as Sep_Revenue,
(case when month='Oct' then revenue end) as Oct_Revenue,
(case when month='Nov' then revenue end) as Nov_Revenue,
(case when month='Dec' then revenue end) as Dec_Revenue
from department_1179 group by id order by id;
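This produces a wrong result: for id=1, Jan_Revenue and Mar_Revenue both come back as NULL. Without an aggregate function, each case when expression is evaluated against only one row of the group (for id=1 the group holds the Feb, Jan and Mar rows, and the row used here is the Feb one), so the remaining rows are never examined. With ONLY_FULL_GROUP_BY enabled, MySQL rejects this query outright. An aggregate function is therefore needed, for example sum(case when month='Jan' then revenue end): for id=1 it goes through the Feb, Jan and Mar rows, finds the matching Jan row, and returns its revenue. The code is as follows.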
use leetcode;
select id,
sum(case when month='Jan' then revenue end) as Jan_Revenue,
sum(case when month='Feb' then revenue end) as Feb_Revenue,
sum(case when month='Mar' then revenue end) as Mar_Revenue,
sum(case when month='Apr' then revenue end) as Apr_Revenue,
sum(case when month='May' then revenue end) as May_Revenue,
sum(case when month='Jun' then revenue end) as Jun_Revenue,
sum(case when month='Jul' then revenue end) as Jul_Revenue,
sum(case when month='Aug' then revenue end) as Aug_Revenue,
sum(case when month='Sep' then revenue end) as Sep_Revenue,
sum(case when month='Oct' then revenue end) as Oct_Revenue,
sum(case when month='Nov' then revenue end) as Nov_Revenue,
sum(case when month='Dec' then revenue end) as Dec_Revenue
from department_1179 group by id order by id;
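To check the pivot against the five sample rows inserted above, a shortened version restricted to the three months that actually occur in the data can be used. This is only a sketch for verification; the expected values in the comments are worked out by hand from the sample data, using the fact that SUM over a group with no matching rows returns NULL.

```sql
use leetcode;
-- Same technique as the full query, limited to Jan, Feb and Mar.
select id,
       sum(case when month='Jan' then revenue end) as Jan_Revenue,
       sum(case when month='Feb' then revenue end) as Feb_Revenue,
       sum(case when month='Mar' then revenue end) as Mar_Revenue
from department_1179
group by id
order by id;
-- id=1: Jan 8000, Feb 7000,  Mar 6000
-- id=2: Jan 9000, Feb NULL,  Mar NULL
-- id=3: Jan NULL, Feb 10000, Mar NULL
```

Because the primary key (id, month) allows at most one row per department and month, SUM here simply picks out that single revenue value; MAX or MIN would return the same result.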
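3. GROUP BY with the HAVING Clause
The WHERE clause filters records, but note that it filters individual rows before aggregation (it takes effect before GROUP BY and HAVING), whereas the HAVING clause filters the groups after aggregation. A HAVING condition after GROUP BY is therefore a convenient way to filter on aggregate values, something that would otherwise require a subquery. Take the following problem as an example.
For this problem we clearly need to group by class first, and then use a HAVING clause to check whether the number of distinct students taking each class is at least 5. The code is as follows.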
select class from courses group by class Having count(distinct student) >= 5;
Note that WHERE cannot be used here: WHERE filters rows before GROUP BY runs, so the per-class count does not exist yet at that point.
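Since the courses table is not defined in this article, the difference between the two clauses is sketched below on employee_tbl instead. The cutoff date and the threshold of 4 are arbitrary values chosen for illustration, and the alias singin_count follows the earlier examples.

```sql
-- WHERE runs first and drops individual rows (ids 4 and 6 are before the cutoff);
-- HAVING runs after grouping and drops whole groups by their aggregate value.
select name, sum(singin) as singin_count
from employee_tbl
where date >= '2016-04-10'
group by name
having sum(singin) >= 4;
-- Remaining sums: 小明 = 1 + 4 = 5, 小王 = 3, 小丽 = 2, so only 小明 is returned.

-- Putting the aggregate in WHERE is rejected by MySQL
-- (ERROR 1111: Invalid use of group function):
-- select name from employee_tbl where sum(singin) >= 4 group by name;
```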