简介
做开发和运维的朋友,对MySQL数据库的复合索引一定不会陌生,可是要创建一个合理的,并且高效的复合索引,你就得了解复合索引的运行原理,你的了解什么是最左前缀原则。
复合索引运行原理
在MySQL数据库中,如果一个应用频繁地使用相同的几个字段查询结果,可以考虑建立这几个字段的联合索引来提高查询效率,可是在MySQL数据库中,复合索引是如何查找到应用需要的记录的呢,先来看一副复合索引原理图
图中的复合索引的字段顺序为(age,first_name,second_name),从图中可以看出,B+tree索引树中的根节点和枝节点的数据,都是age,而叶子节点则存储了age,first_name,second_name3个字段信息,并且存储方式也按照age,first_name,second_name三个字段排好序的,age先排好,再按照first_name排序,最后按照second_name排好。
举个例子,如果应用想好查询(age=1 and first_name='黄' and second_name='安')的记录,复合索引是如何搜索的呢,首先会从根节点开始寻找age=1的所有叶子节点,然后在叶子节点内部,通过二分法匹配出所有的age=1 and first_name='黄' and second_name='安'的记录,并找到对应的主键ID,最后回表返回完整的记录。
最左匹配原则
对于复合索引,总是从索引的最左边的字段开始,接着往后,中间不能跳过。例如创建了多列索引(age,first_name,second_name),会先匹配age字段,再匹配first_name字段,再匹配second_name字段的,中间不能跳过。MySQL数据库会一直向右匹配直到遇到范围查询(>、
估计有朋友看到上面的话,有点懵,没事,下面用实际案例才解释最左匹配原则。
创建测试表和记录
创建测试表t_test3
Create Table: CREATE TABLE `t_test3` ( `id` int(11) NOT NULL, `age` int(11) DEFAULT NULL, `first_name` char(20) DEFAULT NULL, `second_name` char(20) DEFAULT NULL, `address` char(20) DEFAULT NULL, `intime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf81 row in set (0.20 sec)insert into t_test3(id,age,first_name,second_name,address) values(1,1,'李','安','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(2,1,'李','邦','朝阳2');insert into t_test3(id,age,first_name,second_name,address) values(3,1,'李','当','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(4,1,'李','安','朝阳我说的是');insert into t_test3(id,age,first_name,second_name,address) values(5,1,'黄','安','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(6,2,'黄','邦','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(7,2,'李','安','朝阳d');insert into t_test3(id,age,first_name,second_name,address) values(8,2,'黄','当','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(9,3,'李','安','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(10,3,'李','邦','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(11,4,'黄','安','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(12,4,'李','当','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(13,5,'黄','安','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(14,6,'黄','当','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(15,6,'李','邦','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(16,6,'李','邦','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(17,7,'黄','安','朝阳');insert into t_test3(id,age,first_name,second_name,address) values(18,7,null,'安','朝阳');
创建3个测试索引
alter table t_test3 add index idx_t_test3_age(age);alter table t_test3 add index idx_t_test3_age_first_name(age,first_name);alter table t_test3 add index idx_t_test3_age_first_name_second_name(age,first_name,second_name);
测试大于号(>)
测试SQL语句的执行计划(age=1 and first_name>'李' and second_name='安')
mysql> explain select * from t_test3 where age=1 and first_name>'李' and second_name='安';+----+-------------+---------+------------+-------+-----------------------------------------------------------------------------------+----------------------------+---------+------+------+----------+------------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+---------+------------+-------+-----------------------------------------------------------------------------------+----------------------------+---------+------+------+----------+------------------------------------+| 1 | SIMPLE | t_test3 | NULL | range | idx_t_test3_age,idx_t_test3_age_first_name,idx_t_test3_age_first_name_second_name | idx_t_test3_age_first_name | 66 | NULL | 1 | 10.00 | Using index condition; Using where |+----+-------------+---------+------------+-------+-----------------------------------------------------------------------------------+----------------------------+---------+------+------+----------+------------------------------------+1 row in set, 1 warning (0.00 sec)
从执行计划可以看出,最终使用的索引是idx_t_test3_age_first_name,这就证明当MySQL数据库开始搜索age=1之后,紧接着搜索first_name>'李',由于碰到了大于号(>),索引搜索就停止了。
测试大于号(
测试SQL语句的执行计划(age=1 and first_name
mysql> explain select * from t_test3 where age=1 and first_name
从执行计划可以看出,最终使用的索引是idx_t_test3_age_first_name,这就证明当MySQL数据库开始搜索age=1之后,紧接着搜索first_name
测试like
测试SQL语句的执行计划(age=1 and first_name like '李%' and second_name='安')
mysql> explain select * from t_test3 where age=1 and first_name like '李%' and second_name='安';+----+-------------+---------+------------+-------+-----------------------------------------------------------------------------------+----------------------------+---------+------+------+----------+------------------------------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+---------+------------+-------+-----------------------------------------------------------------------------------+----------------------------+---------+------+------+----------+------------------------------------+| 1 | SIMPLE | t_test3 | NULL | range | idx_t_test3_age,idx_t_test3_age_first_name,idx_t_test3_age_first_name_second_name | idx_t_test3_age_first_name | 66 | NULL | 4 | 10.00 | Using index condition; Using where |+----+-------------+---------+------------+-------+-----------------------------------------------------------------------------------+----------------------------+---------+------+------+----------+------------------------------------+1 row in set, 1 warning (0.00 sec)
从执行计划可以看出,最终使用的索引是idx_t_test3_age_first_name,这就证明当MySQL数据库开始搜索age=1之后,紧接着搜索first_name like '李%',由于碰到了like,索引搜索就停止了,不过like用法会比较特殊,如果like后面的字符串中,通配符放在首位,则不会走索引,如果不放在首位,则会走索引。
下面来测试一下,通配符放在首位的情况
mysql> explain select * from t_test3 where age=1 and first_name like '%李%' and second_name='安';+----+-------------+---------+------------+------+-----------------------------------------------------------------------------------+-----------------+---------+-------+------+----------+-------------+| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |+----+-------------+---------+------------+------+-----------------------------------------------------------------------------------+-----------------+---------+-------+------+----------+-------------+| 1 | SIMPLE | t_test3 | NULL | ref | idx_t_test3_age,idx_t_test3_age_first_name,idx_t_test3_age_first_name_second_name | idx_t_test3_age | 5 | const | 5 | 5.56 | Using where |+----+-------------+---------+------------+------+-----------------------------------------------------------------------------------+-----------------+---------+-------+------+----------+-------------+1 row in set, 1 warning (0.00 sec)
从执行计划中可以看出,最终使用的索引是idx_t_test3_age。
上面就是复合索引运行原理。
复合索引列选择原则
1.复合索引的索引列,尽量选择区分度高的列作为索引,区分度的公式是 count(distinct col)/count(*),表示字段不重复的比例,比例越大我们扫描的记录数越少,唯一键的区分度是 1,而一些状态、性别字段可能在大数据面前区分度就是 0。一般需要 join 的字段都要求区分度 0.1 以上,即平均 1 条扫描 10 条记录。
2.尽可能减少复合索引列的个数
存在在线上系统表,开发人员在(orderid,merid,orderdate)上创建了复合索引,可实际上orderid基本就不会重复,实际上只需要在orderid上创建索引,即可满足应用查询要求。