【MySQL】exists与in的比较

原创

morris131 2023-04-23 10:15:52 博主文章分类：MySQL ©著作权

文章标签 mysql 数据库 sql 索引性能优化 文章分类 HarmonyOS 后端开发

©著作权归作者所有：来自51CTO博客作者morris131的原创作品，请联系作者获取转载授权，否则将追究法律责任

关于exists代替in

exists对外表用loop逐条查询，每次查询都会查看exists的条件语句，当exists里的条件语句能够返回记录行时（无论记录行是的多少，只要能返回），条件就为真，返回当前loop到的这条记录，反之如果exists里的条件语句不能返回记录行，则当前loop到的这条记录被丢弃，exists的条件就像一个bool条件，当能返回结果集则为true，不能返回结果集则为false。

假设现在有如下两条SQL语句：

select * from A where exists (select * from B where B.id = A.id);
select * from A where A.id in (select id from B);

查询1可以转化以下伪代码，便于理解：

j = 0;
for (i = 0; i < count(A); i++) {
　　a = get_record(A, i); // 从A表逐条获取记录
　　if (B.id == a.id) // 如果子条件成立
　　　　result[j++] = a;
}
return result;

大概就是这么个意思，其实可以看到，查询1主要是用到了B表的索引，A表如何对查询的效率影响不大。

假设B表的所有id为1,2,3，查询2可以转换为：

select * from A where A.id = 1 or A.id = 2 or A.id = 3;

这个好理解了，这里主要是用到了A的索引，B表如何对查询影响不大。

如果查询的两个表大小相当，那么用in和exists差别不大。

如果两个表中一个较小，一个是大表，则子查询表大的用exists，子查询表小的用in。

实验

准备数据

t_dept表

mysql> create table t_dept like departments;
mysql> insert into t_dept select * from departments;
mysql> alter table t_dept drop index dept_name;

mysql> show index from t_dept;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_dept |          0 | PRIMARY  |            1 | dept_no     | A         |           9 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)

mysql> select count(*) from t_dept;
+----------+
| count(*) |
+----------+
|        9 |
+----------+
1 row in set (0.00 sec)

dept_emp表：

mysql> show index from dept_emp;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| dept_emp |          0 | PRIMARY  |            1 | emp_no      | A         |      300188 |     NULL | NULL   |      | BTREE      |         |               |
| dept_emp |          0 | PRIMARY  |            2 | dept_no     | A         |      331570 |     NULL | NULL   |      | BTREE      |         |               |
| dept_emp |          1 | dept_no  |            1 | dept_no     | A         |           8 |     NULL | NULL   |      | BTREE      |         |               |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

mysql> select count(*) from dept_emp;
+----------+
| count(*) |
+----------+
|   331603 |
+----------+
1 row in set (0.07 sec)

A为大表，B为小表

使用in查询数据：

mysql> explain select * from dept_emp where dept_no in (select dept_no from t_dept);
+----+-------------+----------+------------+-------+---------------+---------+---------+--------------------------+-------+----------+-------------+
| id | select_type | table    | partitions | type  | possible_keys | key     | key_len | ref                      | rows  | filtered | Extra       |
+----+-------------+----------+------------+-------+---------------+---------+---------+--------------------------+-------+----------+-------------+
|  1 | SIMPLE      | t_dept   | NULL       | index | PRIMARY       | PRIMARY | 4       | NULL                     |     9 |   100.00 | Using index |
|  1 | SIMPLE      | dept_emp | NULL       | ref   | dept_no       | dept_no | 4       | employees.t_dept.dept_no | 41446 |   100.00 | NULL        |
+----+-------------+----------+------------+-------+---------------+---------+---------+--------------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.01 sec)

使用in查询数据相当于下面的join：

mysql> explain select de.* from dept_emp de,t_dept d where de.dept_no=d.dept_no;
+----+-------------+-------+------------+-------+---------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref                 | rows  | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+---------------------+-------+----------+-------------+
|  1 | SIMPLE      | d     | NULL       | index | PRIMARY       | PRIMARY | 4       | NULL                |     9 |   100.00 | Using index |
|  1 | SIMPLE      | de    | NULL       | ref   | dept_no       | dept_no | 4       | employees.d.dept_no | 41446 |   100.00 | NULL        |
+----+-------------+-------+------------+-------+---------------+---------+---------+---------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)

使用exists查询数据：

mysql> explain select * from dept_emp de where exists (select d.dept_no from t_dept d where d.dept_no=de.dept_no);
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref                  | rows   | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
|  1 | PRIMARY            | de    | NULL       | ALL    | NULL          | NULL    | NULL    | NULL                 | 331570 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | d     | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | employees.de.dept_no |      1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

dept_emp为大表，t_dept为小表，可见使用in效率更高，扫描的行数更少。

A为小表，B为大表

使用in查询数据：

mysql> explain select * from t_dept d where d.dept_no in (select de.dept_no from dept_emp de);
+----+-------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+----------------------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref                 | rows  | filtered | Extra                      |
+----+-------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+----------------------------+
|  1 | SIMPLE      | d     | NULL       | ALL  | PRIMARY       | NULL    | NULL    | NULL                |     9 |   100.00 | NULL                       |
|  1 | SIMPLE      | de    | NULL       | ref  | dept_no       | dept_no | 4       | employees.d.dept_no | 41446 |   100.00 | Using index; FirstMatch(d) |
+----+-------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+----------------------------+
2 rows in set, 1 warning (0.00 sec)

使用exists查询数据：

mysql> explain select * from t_dept d where exists (select de.dept_no from dept_emp de where d.dept_no=de.dept_no);
+----+--------------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys | key     | key_len | ref                 | rows  | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+-------------+
|  1 | PRIMARY            | d     | NULL       | ALL  | NULL          | NULL    | NULL    | NULL                |     9 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | de    | NULL       | ref  | dept_no       | dept_no | 4       | employees.d.dept_no | 41446 |   100.00 | Using index |
+----+--------------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)

t_dept为小表，dept_emp为大表，可见使用in和使用exists效率差不多，扫描的行数一样，为什么？按照之前的分析，子查询表大的用exists效率不应该更高吗？

因为MySQL对in语句做了大量的优化，具体做了啥优化呢？查询的最后显示有一个警告，答案就在这个警告中：

mysql> show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                                                                                                                                      |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select `employees`.`d`.`dept_no` AS `dept_no`,`employees`.`d`.`dept_name` AS `dept_name` from `employees`.`t_dept` `d` semi join (`employees`.`dept_emp` `de`) where (`employees`.`de`.`dept_no` = `employees`.`d`.`dept_no`) |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)