Mysql 的 sql扫描行预估

转载

mob6454cc6d5f87 2024-09-06 23:30:20

文章标签 Mysql 的 sql扫描行预估 mysql MySQL 字段重启 文章分类 MySQL 数据库

　　MySQL8.0有一种复合索引可以跳过扫描范围访问方法，什么意思呢？简单的说，对于复合索引(a,b,c)，我们以往常说“where a=xx and b=xx”是可以用到索引，但“where b=xx and c=xx”是用不到索引的。但MySQL8.0中，后者也是可以用到索引了。

　　我们建一张测试表，其测试数据是104万，如下：

root@localhost|test>select version();
+-----------+
| version() |
+-----------+
| 8.0.19    |
+-----------+
1 row in set (0.00 sec)

root@localhost|test>show create table t\G
*************************** 1. row ***************************
       Table: t
Create Table: CREATE TABLE `t` (
  `id` int NOT NULL AUTO_INCREMENT,
  `dept` tinyint DEFAULT NULL comment '部门ID,1~10',
  `name` varchar(30) COLLATE utf8mb4_bin DEFAULT NULL comment '姓名',
  `create_time` datetime DEFAULT NULL comment '注册时间',
  PRIMARY KEY (`id`),
  KEY `idx_muti` (`create_time`,`name`,`dept`)
) ENGINE=InnoDB AUTO_INCREMENT=1441756 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)

root@localhost|test>select count(1) from t;
+----------+
| count(1) |
+----------+
|  1048576 |
+----------+
1 row in set (0.11 sec)

　　从上可看到这个表有一个复合索引idx_muti (create_time,name,dept)，我们尝试直接查dept字段看是否能用到索引

root@localhost|test>explain select count(1) from t  where dept=5;
+----+-------------+-------+------------+-------+---------------+----------+---------+------+---------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref  | rows    | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+---------+----------+--------------------------+
|  1 | SIMPLE      | t     | NULL       | index | idx_muti      | idx_muti | 131     | NULL | 1045864 |    10.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+---------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

root@localhost|test>select count(1) from t  where dept=5;        
+----------+
| count(1) |
+----------+
|   105271 |
+----------+
1 row in set (1.16 sec)

　　从执行计划可以看到对于复合索引(a,b,c)，我们再也不需要非要先使用(a,b)为前提才能使用c了，感觉真的方便了不少！

　　慢着，这真正执行的耗时感觉有点不太对，我们尝试不走索引看看

root@localhost|test>explain select count(1) from t ignore index(idx_muti) where dept=5;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | t     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1045864 |    10.00 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

root@localhost|test>select count(1) from t ignore index(idx_muti) where dept=5;         
+----------+
| count(1) |
+----------+
|   105271 |
+----------+
1 row in set (0.70 sec)

全表扫描的耗时居然比使用过索的耗时还要少，难道是因为innodb的缓存？

　　我们在my.cnf文件加入“innodb_buffer_pool_load_at_startup=0;”和“innodb_buffer_pool_dump_at_shutdown=0;”两个参数然后重启MySQL，再执行上述全表扫描的SQL，发现其执行时间都是在0.7秒左右，始终比使用索引还要快。

　　我们打开官档，官档的大体意思就是说，对于(a,b)的复合索引，如果“where b>xxx”的查询条件，那么先获取a字段的不同值，然后通过不同的值去构造出“where a=xx and b>xxx”，一直把a的不同值构造出来执行。

跳过扫描范围访问方法

请考虑以下情形：

CREATE TABLE t1 (f1 INT NOT NULL, f2 INT NOT NULL, PRIMARY KEY(f1, f2));
INSERT INTO t1 VALUES
  (1,1), (1,2), (1,3), (1,4), (1,5),
  (2,1), (2,2), (2,3), (2,4), (2,5);
INSERT INTO t1 SELECT f1, f2 + 5 FROM t1;
INSERT INTO t1 SELECT f1, f2 + 10 FROM t1;
INSERT INTO t1 SELECT f1, f2 + 20 FROM t1;
INSERT INTO t1 SELECT f1, f2 + 40 FROM t1;
ANALYZE TABLE t1;

EXPLAIN SELECT f1, f2 FROM t1 WHERE f2 > 40;

为了执行该查询，MySQL可以选择索引扫描以获取所有行（索引包括要选择的所有列），然后应用子句中的f2 > 40 条件WHERE以产生最终结果集。

范围扫描比全索引扫描更有效，但是在这种情况下不能使用，因为f1在第一个索引列上没有条件。然而，如MySQL的8.0.13的，优化器可以执行多个范围扫描，一个用于的每个值f1，使用一种称为跳跃扫描方法是类似于松散索引扫描

在第一个索引部分的不同值f1（索引前缀）之间跳过。
f2 > 40对其余索引部分上的条件，对每个不同的前缀值执行子范围扫描。

对于前面显示的数据集，算法的运行方式如下：

获取第一个关键部分的第一个不同值（f1 = 1）。
根据第一和第二关键部分（f1 = 1 AND f2 > 40）构造范围。
执行范围扫描。
获取第一个关键部分的下一个不同值（f1 = 2）。
根据第一和第二关键部分（f1 = 2 AND f2 > 40）构造范围。
执行范围扫描。

使用此策略可减少访问的行数，因为MySQL会跳过不符合每个构造范围的行。此跳过扫描访问方法适用于以下情况：

表T具有至少一个复合索引，其关键部分的形式为（[A_1，...，A_ k，] B_1，...，B_ m，C [，D_1，...，D_ n]）。关键部分A和D可能为空，但B和C必须为非空。
该查询仅引用一个表。
查询不使用GROUP BY或 DISTINCT。
该查询仅引用索引中的列。
A_1，...，A_上的谓词k必须是相等谓词，并且它们必须是常量。这包括 IN()操作员。
该查询必须是一个联合查询。即， AND的OR 条件： (cond1(key_part1) OR cond2(key_part1)) AND (cond1(key_part2) OR ...) AND ...
C上必须有范围条件。
D列上的条件是允许的。D上的条件必须与C上的范围条件结合使用。

　　从官档的说明来看，这种跳过扫描范围访问方法适用于复合索引(a,b)中，a的索引基数比较小的（就是a字段的值大部份相同，例如性名、年龄、类型、状态之类），如果其索引基数比较大，则全表扫描可能比这种跳过扫描范围访问要更快些。

　　以下我们验证下复合索引(a,b)中，a的索引基数比较小的情况。

　　在实验前，我们先重启一次MySQL

#删除原来的索引
root@localhost|test>drop index idx_muti on t;
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

#新建一个索引，以dept字段作为前缀字段。
root@localhost|test>create index idx_muti on t(dept, create_time);
Query OK, 0 rows affected (4.75 sec)
Records: 0  Duplicates: 0  Warnings: 0

root@localhost|test>explain select count(1) from t where create_time>='2020-01-01' and create_time<'2020-02-01';
+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+----------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref  | rows   | filtered | Extra                                  |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+----------------------------------------+
|  1 | SIMPLE      | t     | NULL       | range | idx_muti      | idx_muti | 8       | NULL | 116183 |   100.00 | Using where; Using index for skip scan |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+----------------------------------------+
1 row in set, 1 warning (0.00 sec)

root@localhost|test>select count(1) from t where create_time>='2020-01-01' and create_time<'2020-02-01';        
+----------+
| count(1) |
+----------+
|    75932 |
+----------+
1 row in set (0.08 sec)

#重启MySQL
root@localhost|test>system systemctl restart mysql 

#尝试使用全表扫描的方法
root@localhost|test>explain select count(1) from t ignore index(idx_muti) where create_time>='2020-01-01' and create_time<'2020-02-01';
ERROR 2006 (HY000): MySQL server has gone away        #因为上面重启了mysql，因此此处链接丢失了需要重连
No connection. Trying to reconnect...
Connection id:    8
Current database: test

+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | t     | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 1045864 |    11.11 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.03 sec)

root@localhost|test>select count(1) from t ignore index(idx_muti) where create_time>='2020-01-01' and create_time<'2020-02-01';        
+----------+
| count(1) |
+----------+
|    75932 |
+----------+
1 row in set (0.71 sec)

　　可见此时使用索引的效率更高。因此使用新特性时要注意其适用范围，否则可能会适得其反。

官档参考：https://dev.mysql.com/doc/refman/8.0/en/range-optimization.html#range-access-skip-scan

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。