使用or会导致sql无法使用索引。可以使用union进行改写
SELECT
`member_id`
FROM
`feel_member`
WHERE
(
`account` = 'leader'
OR `mobile` = 'leader'
OR `email` = 'leader'
OR `name` = 'leader'
)
AND `type` = 1
AND `closed` = 0
LIMIT
1;
以select * from film_actor where c1 = “xxx” and c2 = "xxx"为例:
- 当c1列和c2列选择性较高时,按照c1和c2条件进行查询性能高且返回数据集较小,再对两个数据量较小的数据集求交集的操作成本也比较低,最终整个语句查询高效;
- 当c1列或c2列选择性较差且统计信息不准时,比如整表数据量1000万,按照c2列条件返回800万数据,按照c1列返回100条数据,此时按照c2列条件进行索引扫描+聚集索引查询的操作成本极高(可能是整表扫描的百倍消耗),对100条数据和800万数据求交集的成本也极高,最终导致整条SQL需要消耗大量CPU和IO资源,且相应时间超长,而如果值使用c1列的索引,查询消耗资源少且性能较高。
即使用select * from film_actor where c1 = “xxx” union all select * from film_actor where c2 = "xxx"往往更好。
改写:
SELECT
`member_id`
FROM
`feel_member`
WHERE
`account` = 'leader'
AND `type` = 1
AND `closed` = 0
union all
(SELECT
`member_id`
FROM
`feel_member`
WHERE
`mobile` = 'leader'
AND `type` = 1
AND `closed` = 0 )
union all
SELECT
`member_id`
FROM
`feel_member`
WHERE
`email` = 'leader'
AND `type` = 1
AND `closed` = 0
union all
SELECT
`member_id`
FROM
`feel_member`
WHERE
`name` = 'leader'
AND `type` = 1
AND `closed` = 0
LIMIT
1;
union all 和 union的区别:
显示结果不同,union会自动压缩重复的结果,union all会把所有的结果全部显示出来。union会涉及排序所以性能较union all差一些。
- 单列索引和复合索引的选择
问题SQL:
mysql>explain SELECT
product_subject_rule_id,
property_type,
property_value,
product_subject_id,
submited_num,
created_at,
updated_at,
enabled
FROM
product_subject_rule
WHERE
(
product_subject_id = 14684
AND submited_num = 1
AND enabled = 1
);
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-wzybOl6w-1599375296795)(index_files/62a83599-71ec-424c-8e9e-d4106327cf37.png)]
可以看到执行计划是全表扫描,性能并不好,加索引可以提高性能。但是sql中where过滤条件有三个需要怎样判断加单列的索引还是复合索引比较好呢?单列索引应该加在哪个列上呢?
复合索引相比单列索引会占用更多的字段也相应的会浪费更多的资源。所以在索引选择上需要考虑待加索引的列的对于整个SQL来讲的过滤性如何。
mysql>select count(*) from product_subject_rule
+--------------------+
| count(*) |
+--------------------+
| 73768 |
+--------------------+
返回行数:[1],耗时:18 ms.
mysql>select count(distinct product_subject_id) from product_subject_rule;
+----------------------------------------------+
| count(distinct product_subject_id) |
+----------------------------------------------+
| 14682 |
+----------------------------------------------+
返回行数:[1],耗时:33 ms.
可以看到总数据量为7万行左右,我们的目标列去重后为1万5千行左右。过滤性还是很好的。或者使用group by可以更加直观的看出来过滤性能。
mysql>select product_subject_id,count(*) from product_subject_rule group by product_subject_id order by count(*) desc limit 10;
+------------------------------+--------------------+
| product_subject_id | count(*) |
+------------------------------+--------------------+
| 14634 | 95 |
| 26 | 82 |
| 14630 | 60 |
| 14657 | 45 |
| 14627 | 29 |
| 14650 | 26 |
| 31 | 25 |
| 14658 | 21 |
| 14659 | 20 |
| 27 | 17 |
+------------------------------+--------------------+
返回行数:[50],耗时:44 ms.
现在基本可以确定product_subject_id是索引列的候选列之一,接下来确定一下是否需要复合索引
mysql>select submited_num,count(*) from product_subject_rule group by submited_num order by count(*) desc limit 10;
+------------------------+--------------------+
| submited_num | count(*) |
+------------------------+--------------------+
| 1 | 73212 |
| 2 | 137 |
| 3 | 77 |
| 4 | 62 |
| 5 | 41 |
| 6 | 37 |
| 8 | 24 |
| 7 | 23 |
| 9 | 18 |
| 10 | 16 |
+------------------------+--------------------+
返回行数:[10],耗时:34 ms.
mysql>select enabled,count(*) from product_subject_rule group by enabled order by count(*) desc limit 10;
+-------------------+--------------------+
| enabled | count(*) |
+-------------------+--------------------+
| 1 | 73371 |
| 0 | 397 |
+-------------------+--------------------+
返回行数:[2],耗时:40 ms.
可以看到存在比较严重的数据倾斜,成为索引列其效果也不是很理想
还有一个简单的办法就是分别测试带剩余条件和不带剩余条件的查询返回数据量的对比,如果相差不大那么说明其过滤性并不是很好,也就不需要成为索引列。
如下所示:
mysql>SELECT
count(*)
FROM
product_subject_rule
WHERE
(
product_subject_id = 14684
AND submited_num = 1
AND enabled = 1
);
+--------------------+
| count(*) |
+--------------------+
| 1 |
+--------------------+
返回行数:[1],耗时:28 ms.
mysql>SELECT
count(*)
FROM
product_subject_rule
WHERE
(
product_subject_id = 14684
);
+--------------------+
| count(*) |
+--------------------+
| 1 |
+--------------------+
返回行数:[1],耗时:19 ms.
#前后结果没变化也可以判断出来
最终选择增加单列索引,
alter table `newtest-product`.`product_subject_rule` add index `pro_sub_rul_idx` (`product_subject_id`)