关于exists代替in

exists对外表用loop逐条查询,每次查询都会查看exists的条件语句,当exists里的条件语句能够返回记录行时(无论记录行是的多少,只要能返回),条件就为真,返回当前loop到的这条记录,反之如果exists里的条件语句不能返回记录行,则当前loop到的这条记录被丢弃,exists的条件就像一个bool条件,当能返回结果集则为true,不能返回结果集则为false。

假设现在有如下两条SQL语句:

select * from A where exists (select * from B where B.id = A.id);
select * from A where A.id in (select id from B);

查询1可以转化以下伪代码,便于理解:

j = 0;
for (i = 0; i < count(A); i++) {
  a = get_record(A, i); // 从A表逐条获取记录
  if (B.id == a.id) // 如果子条件成立
    result[j++] = a;
}
return result;

大概就是这么个意思,其实可以看到,查询1主要是用到了B表的索引,A表如何对查询的效率影响不大。

假设B表的所有id为1,2,3,查询2可以转换为:

select * from A where A.id = 1 or A.id = 2 or A.id = 3;

这个好理解了,这里主要是用到了A的索引,B表如何对查询影响不大。

如果查询的两个表大小相当,那么用in和exists差别不大。

如果两个表中一个较小,一个是大表,则子查询表大的用exists,子查询表小的用in。

实验

准备数据

t_dept表

mysql> create table t_dept like departments;
mysql> insert into t_dept select * from departments;
mysql> alter table t_dept drop index dept_name;

mysql> show index from t_dept;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_dept |          0 | PRIMARY  |            1 | dept_no     | A         |           9 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)

mysql> select count(*) from t_dept;
+----------+
| count(*) |
+----------+
|        9 |
+----------+
1 row in set (0.00 sec)

dept_emp表:

mysql> show index from dept_emp;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| dept_emp |          0 | PRIMARY  |            1 | emp_no      | A         |      300188 |     NULL | NULL   |      | BTREE      |         |               |
| dept_emp |          0 | PRIMARY  |            2 | dept_no     | A         |      331570 |     NULL | NULL   |      | BTREE      |         |               |
| dept_emp |          1 | dept_no  |            1 | dept_no     | A         |           8 |     NULL | NULL   |      | BTREE      |         |               |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

mysql> select count(*) from dept_emp;
+----------+
| count(*) |
+----------+
|   331603 |
+----------+
1 row in set (0.07 sec)

A为大表,B为小表

使用in查询数据:

mysql> explain select * from dept_emp where dept_no in (select dept_no from t_dept);
+----+-------------+----------+------------+-------+---------------+---------+---------+--------------------------+-------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys | key     | key_len | ref                      | rows  | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------+---------+---------+--------------------------+-------+----------+-------------+
|  1 | SIMPLE      | t_dept   | NULL       | index | PRIMARY       | PRIMARY | 4       | NULL                     |     9 |   100.00 | Using index |
|  1 | SIMPLE      | dept_emp | NULL       | ref   | dept_no       | dept_no | 4       | employees.t_dept.dept_no | 41446 |   100.00 | NULL        |
+----+-------------+----------+------------+-------+---------------+---------+---------+--------------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.01 sec)

使用in查询数据相当于下面的join:

mysql> explain select de.* from dept_emp de,t_dept d where de.dept_no=d.dept_no;
+----+-------------+-------+------------+-------+---------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref                 | rows  | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+---------------------+-------+----------+-------------+
|  1 | SIMPLE      | d     | NULL       | index | PRIMARY       | PRIMARY | 4       | NULL                |     9 |   100.00 | Using index |
|  1 | SIMPLE      | de    | NULL       | ref   | dept_no       | dept_no | 4       | employees.d.dept_no | 41446 |   100.00 | NULL        |
+----+-------------+-------+------------+-------+---------------+---------+---------+---------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

使用exists查询数据:

mysql> explain select * from dept_emp de where exists (select d.dept_no from t_dept d where d.dept_no=de.dept_no);
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref                  | rows   | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
|  1 | PRIMARY            | de    | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                 | 331570 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | d     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.de.dept_no |      1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

dept_emp为大表,t_dept为小表,可见使用in效率更高,扫描的行数更少。

A为小表,B为大表

使用in查询数据:

mysql> explain select * from t_dept d where d.dept_no in (select de.dept_no from dept_emp de);
+----+-------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+----------------------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref                 | rows  | filtered | Extra                      |
+----+-------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+----------------------------+
|  1 | SIMPLE      | d     | NULL       | ALL  | PRIMARY       | NULL    | NULL    | NULL                |     9 |   100.00 | NULL                       |
|  1 | SIMPLE      | de    | NULL       | ref  | dept_no       | dept_no | 4       | employees.d.dept_no | 41446 |   100.00 | Using index; FirstMatch(d) |
+----+-------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+----------------------------+
2 rows in set, 1 warning (0.00 sec)

使用exists查询数据:

mysql> explain select * from t_dept d where exists (select de.dept_no from dept_emp de where d.dept_no=de.dept_no);
+----+--------------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys | key     | key_len | ref                 | rows  | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+-------------+
|  1 | PRIMARY            | d     | NULL       | ALL  | NULL          | NULL    | NULL    | NULL                |     9 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | de    | NULL       | ref  | dept_no       | dept_no | 4       | employees.d.dept_no | 41446 |   100.00 | Using index |
+----+--------------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

t_dept为小表,dept_emp为大表,可见使用in和使用exists效率差不多,扫描的行数一样,为什么?按照之前的分析,子查询表大的用exists效率不应该更高吗?

因为MySQL对in语句做了大量的优化,具体做了啥优化呢?查询的最后显示有一个警告,答案就在这个警告中:

mysql> show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                      |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `employees`.`d`.`dept_no` AS `dept_no`,`employees`.`d`.`dept_name` AS `dept_name` from `employees`.`t_dept` `d` semi join (`employees`.`dept_emp` `de`) where (`employees`.`de`.`dept_no` = `employees`.`d`.`dept_no`) |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)