关于exists代替in
exists对外表用loop逐条查询,每次查询都会查看exists的条件语句,当exists里的条件语句能够返回记录行时(无论记录行是的多少,只要能返回),条件就为真,返回当前loop到的这条记录,反之如果exists里的条件语句不能返回记录行,则当前loop到的这条记录被丢弃,exists的条件就像一个bool条件,当能返回结果集则为true,不能返回结果集则为false。
假设现在有如下两条SQL语句:
select * from A where exists (select * from B where B.id = A.id);
select * from A where A.id in (select id from B);
查询1可以转化以下伪代码,便于理解:
j = 0;
for (i = 0; i < count(A); i++) {
a = get_record(A, i); // 从A表逐条获取记录
if (B.id == a.id) // 如果子条件成立
result[j++] = a;
}
return result;
大概就是这么个意思,其实可以看到,查询1主要是用到了B表的索引,A表如何对查询的效率影响不大。
假设B表的所有id为1,2,3,查询2可以转换为:
select * from A where A.id = 1 or A.id = 2 or A.id = 3;
这个好理解了,这里主要是用到了A的索引,B表如何对查询影响不大。
如果查询的两个表大小相当,那么用in和exists差别不大。
如果两个表中一个较小,一个是大表,则子查询表大的用exists,子查询表小的用in。
实验
准备数据
t_dept表
mysql> create table t_dept like departments;
mysql> insert into t_dept select * from departments;
mysql> alter table t_dept drop index dept_name;
mysql> show index from t_dept;
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_dept | 0 | PRIMARY | 1 | dept_no | A | 9 | NULL | NULL | | BTREE | | |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)
mysql> select count(*) from t_dept;
+----------+
| count(*) |
+----------+
| 9 |
+----------+
1 row in set (0.00 sec)
dept_emp表:
mysql> show index from dept_emp;
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| dept_emp | 0 | PRIMARY | 1 | emp_no | A | 300188 | NULL | NULL | | BTREE | | |
| dept_emp | 0 | PRIMARY | 2 | dept_no | A | 331570 | NULL | NULL | | BTREE | | |
| dept_emp | 1 | dept_no | 1 | dept_no | A | 8 | NULL | NULL | | BTREE | | |
+----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)
mysql> select count(*) from dept_emp;
+----------+
| count(*) |
+----------+
| 331603 |
+----------+
1 row in set (0.07 sec)
A为大表,B为小表
使用in查询数据:
mysql> explain select * from dept_emp where dept_no in (select dept_no from t_dept);
+----+-------------+----------+------------+-------+---------------+---------+---------+--------------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+---------+---------+--------------------------+-------+----------+-------------+
| 1 | SIMPLE | t_dept | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 9 | 100.00 | Using index |
| 1 | SIMPLE | dept_emp | NULL | ref | dept_no | dept_no | 4 | employees.t_dept.dept_no | 41446 | 100.00 | NULL |
+----+-------------+----------+------------+-------+---------------+---------+---------+--------------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.01 sec)
使用in查询数据相当于下面的join:
mysql> explain select de.* from dept_emp de,t_dept d where de.dept_no=d.dept_no;
+----+-------------+-------+------------+-------+---------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+---------------------+-------+----------+-------------+
| 1 | SIMPLE | d | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 9 | 100.00 | Using index |
| 1 | SIMPLE | de | NULL | ref | dept_no | dept_no | 4 | employees.d.dept_no | 41446 | 100.00 | NULL |
+----+-------------+-------+------------+-------+---------------+---------+---------+---------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
使用exists查询数据:
mysql> explain select * from dept_emp de where exists (select d.dept_no from t_dept d where d.dept_no=de.dept_no);
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
| 1 | PRIMARY | de | NULL | ALL | NULL | NULL | NULL | NULL | 331570 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | d | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.de.dept_no | 1 | 100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)
dept_emp为大表,t_dept为小表,可见使用in效率更高,扫描的行数更少。
A为小表,B为大表
使用in查询数据:
mysql> explain select * from t_dept d where d.dept_no in (select de.dept_no from dept_emp de);
+----+-------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+----------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+----------------------------+
| 1 | SIMPLE | d | NULL | ALL | PRIMARY | NULL | NULL | NULL | 9 | 100.00 | NULL |
| 1 | SIMPLE | de | NULL | ref | dept_no | dept_no | 4 | employees.d.dept_no | 41446 | 100.00 | Using index; FirstMatch(d) |
+----+-------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+----------------------------+
2 rows in set, 1 warning (0.00 sec)
使用exists查询数据:
mysql> explain select * from t_dept d where exists (select de.dept_no from dept_emp de where d.dept_no=de.dept_no);
+----+--------------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+-------------+
| 1 | PRIMARY | d | NULL | ALL | NULL | NULL | NULL | NULL | 9 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | de | NULL | ref | dept_no | dept_no | 4 | employees.d.dept_no | 41446 | 100.00 | Using index |
+----+--------------------+-------+------------+------+---------------+---------+---------+---------------------+-------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)
t_dept为小表,dept_emp为大表,可见使用in和使用exists效率差不多,扫描的行数一样,为什么?按照之前的分析,子查询表大的用exists效率不应该更高吗?
因为MySQL对in语句做了大量的优化,具体做了啥优化呢?查询的最后显示有一个警告,答案就在这个警告中:
mysql> show warnings;
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Note | 1003 | /* select#1 */ select `employees`.`d`.`dept_no` AS `dept_no`,`employees`.`d`.`dept_name` AS `dept_name` from `employees`.`t_dept` `d` semi join (`employees`.`dept_emp` `de`) where (`employees`.`de`.`dept_no` = `employees`.`d`.`dept_no`) |
+-------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)