在一条单表查询语句真正执行之前,MySQL的查询优化器会找出执行该语句所有可能使用的方案,对比之后找出成本最低的方案,这个成本最低的方案就是所谓的执行计划,之后才会调用存储引擎提供的接口真正的执行查询。

准备数据

employees数据库来自MySQL官方示例数据库employees。

mysql> use employees;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> create table t_emp like employees;
Query OK, 0 rows affected (0.03 sec)

mysql> insert into t_emp select * from employees;
Query OK, 300024 rows affected (2.57 sec)
Records: 300024  Duplicates: 0  Warnings: 0

mysql> alter table t_emp add index idx_hire_date(hire_date);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> alter table t_emp add index idx_birth_date(birth_date);
Query OK, 0 rows affected (0.01 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show index from t_emp;
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name       | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| t_emp |          0 | PRIMARY        |            1 | emp_no      | A         |      299645 |     NULL | NULL   |      | BTREE      |         |               |
| t_emp |          1 | idx_hire_date  |            1 | hire_date   | A         |        5590 |     NULL | NULL   |      | BTREE      |         |               |
| t_emp |          1 | idx_birth_date |            1 | birth_date  | A         |        4770 |     NULL | NULL   |      | BTREE      |         |               |
+-------+------------+----------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

待分析的SQL:

mysql> explain select * from t_emp where hire_date > '1990-11-20' and birth_date > '1840-11-20';
+----+-------------+-------+------------+------+------------------------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys                | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+------------------------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | t_emp | NULL       | ALL  | idx_hire_date,idx_birth_date | NULL | NULL    | NULL | 299645 |    25.00 | Using where |
+----+-------------+-------+------------+------+------------------------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

为什么走的是全表扫描,而不是索引idx_hire_date或idx_birth_date呢?

成本的计算过程

  1. 根据搜索条件,找出所有可能使用的索引
  2. 计算全表扫描的代价
  3. 计算使用不同索引执行查询的代价
  4. 对比各种执行方案的代价,找出成本最低的那个

计算全表扫描的代价

全表扫描即将聚簇索引从对应的页面加载到内存,然后检测记录是否满足条件计算。

计算公式:页面数 x 1 + 记录数 x 0.2

页面数和记录数如何获得?查看表的统计信息。

mysql> show table status like 't_emp'\G;
*************************** 1. row ***************************
           Name: t_emp
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 299645
 Avg_row_length: 50
    Data_length: 15220736
Max_data_length: 0
   Index_length: 9469952
      Data_free: 2097152
 Auto_increment: NULL
    Create_time: 2021-08-19 07:19:44
    Update_time: 2021-08-19 07:19:31
     Check_time: NULL
      Collation: latin1_swedish_ci
       Checksum: NULL
 Create_options:
        Comment:
1 row in set (0.00 sec)

从上面可以看到Rows: 299645,记录数为299645,数据的大小为Data_length=15220736
,页面数=数据大小/16/1024=929。

从下面的统计表的数据中也可以直接看出记录数和页面数:

mysql> select * from mysql.innodb_table_stats where table_name='t_emp'\G;
*************************** 1. row ***************************
           database_name: employees
              table_name: t_emp
             last_update: 2021-08-19 07:19:44
                  n_rows: 299645
    clustered_index_size: 929
sum_of_other_index_sizes: 578
1 row in set (0.00 sec)

因此全表扫描的总成本为:929x1.0+299645x0.2=60858。

来看一下MySQL算的是多少:

mysql> explain format=json  select * from t_emp where hire_date > '1990-11-20' and birth_date > '1840-11-20';
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "60858.00"
    },
    "table": {
      "table_name": "t_emp",
      "access_type": "ALL",
      "possible_keys": [
        "idx_hire_date",
        "idx_birth_date"
      ],
      "rows_examined_per_scan": 299645,
      "rows_produced_per_join": 74910,
      "filtered": "25.00",
      "cost_info": {
        "read_cost": "45875.85",
        "eval_cost": "14982.15",
        "prefix_cost": "60858.00",
        "data_read_per_join": "3M"
      },
      "used_columns": [
        "emp_no",
        "birth_date",
        "first_name",
        "last_name",
        "gender",
        "hire_date"
      ],
      "attached_condition": "((`employees`.`t_emp`.`hire_date` > '1990-11-20') and (`employees`.`t_emp`.`birth_date` > '1840-11-20'))"
    }
  }
} |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

计算使用索引idx_hire_date的代价

计算二级索引成本的大步骤为:

  1. 计算搜索二级索引树中满足条件的记录
  2. 回表查询

搜索二级索引树代价计算:

  • IO成本:范围区间占用页面数量
    ,查询优化器粗暴的认为读取索引的一个范围区间的I/O成本和读取一个页面是相同的。

CPU成本:需要回表的记录数。

如何计算idx_hire_date在hire_date > '1990-11-20’这个范围区间中包含多少二级索引记录

  • 当左边界记录和右边界记录相隔较小时
    ,直接遍历这些页面的PAGE HEADER中的PAGE_N_RECS字段(记录该页面有多少条数据)可以获得精确值。
  • 当左边界记录和右边界记录相隔较大时
    ,从左边界所在页面向右读取10个页面,计算平均每个页面中包含多少记录。再乘以左边界和右边界之间的页面数量即可
    ,左边界和右边界之间的页面数量可以从B+树中的父节点获取(若跨越太多页面则需要递归)。

这里我们可以用count(*)计算一下:

mysql> select count(*) from t_emp where hire_date > '1990-11-20';
+----------+
| count(*) |
+----------+
|   112374 |
+----------+
1 row in set (0.05 sec)

搜索二级索引树的成本:1x1.0+112374x0.2=22475.8

回表查询代价计算
:回表查询聚簇索引需加载的页面数量(用于计算IO成本)
,设计MySQL的大叔评估回表操作的I/O成本依旧很豪放,他们认为每次回表操作都相当于访问一个页面
,回表查询定位页面后,需要定位记录并且判断其他过滤条件(用于计算CPU成本)。

回表的成本:112374x1.0+112374x0.2=134848.8

总成本:22475.8+134848.8=157324.6

再来看一下MySQL算的是多少:

mysql> explain format=json  select * from t_emp force index(idx_hire_date) where hire_date > '1990-11-20' and birth_date > '1840-11-20';
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "209751.81"
    },
    "table": {
      "table_name": "t_emp",
      "access_type": "range",
      "possible_keys": [
        "idx_hire_date"
      ],
      "key": "idx_hire_date",
      "used_key_parts": [
        "hire_date"
      ],
      "key_length": "3",
      "rows_examined_per_scan": 149822,
      "rows_produced_per_join": 49935,
      "filtered": "33.33",
      "index_condition": "(`employees`.`t_emp`.`hire_date` > '1990-11-20')",
      "cost_info": {
        "read_cost": "199764.68",
        "eval_cost": "9987.13",
        "prefix_cost": "209751.81",
        "data_read_per_join": "2M"
      },
      "used_columns": [
        "emp_no",
        "birth_date",
        "first_name",
        "last_name",
        "gender",
        "hire_date"
      ],
      "attached_condition": "(`employees`.`t_emp`.`birth_date` > '1840-11-20')"
    }
  }
} |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

为什么和我们算的不一样?因为MySQL使用的是统计数据,计算出来的记录数为149822。

总成本=1 x 1.0 + 149822 x 0.2 + 149822 x 1.0 + 149822 x 0.2 = 209751.8

计算使用索引idx_birth_date的代价

二级索引idx_birth_date的成本的计算方式类似。

记录数:

mysql> select count(*) from t_emp where birth_date > '1840-11-20';
+----------+
| count(*) |
+----------+
|   300024 |
+----------+
1 row in set (0.08 sec)

二级索引扫描成本:1 x 1.0 + 300024 x 0.2=60005.8

回表成本:300024 x 1.0 + 300024 x 0.2=360,028.8

总成本:420034.6

通过对比各种执行方案的代价,找出成本最低的那个为全表扫描

看下MySQL算出来的成本:

mysql> explain format=json  select * from t_emp force index(idx_birth_date) where hire_date > '1990-11-20' and birth_date > '1840-11-20';
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "209751.81"
    },
    "table": {
      "table_name": "t_emp",
      "access_type": "range",
      "possible_keys": [
        "idx_birth_date"
      ],
      "key": "idx_birth_date",
      "used_key_parts": [
        "birth_date"
      ],
      "key_length": "3",
      "rows_examined_per_scan": 149822,
      "rows_produced_per_join": 49935,
      "filtered": "33.33",
      "index_condition": "(`employees`.`t_emp`.`birth_date` > '1840-11-20')",
      "cost_info": {
        "read_cost": "199764.68",
        "eval_cost": "9987.13",
        "prefix_cost": "209751.81",
        "data_read_per_join": "2M"
      },
      "used_columns": [
        "emp_no",
        "birth_date",
        "first_name",
        "last_name",
        "gender",
        "hire_date"
      ],
      "attached_condition": "(`employees`.`t_emp`.`hire_date` > '1990-11-20')"
    }
  }
} |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)

这个索引可以看出MySQL的统计数据很不准确。

optimizer_trace

上面只能看到MySQL选择的方案的成本,要想看到MySQL对SQL的执行过程可以使用optimizer_trace来查看。

参数介绍:

  • QUERY:跟踪语句的文本。
  • TRACE:跟踪,JSON格式。
  • MISSING_BYTES_BEYOND_MAX_MEM_SIZE:每个记住的跟踪都是一个字符串,随着优化的进行扩展并将其附加数据。该optimizer_trace_max_mem_size 变量设置所有当前记忆的跟踪所使用的内存总量的限制。如果达到此限制,则当前跟踪不会扩展(因此是不完整的),并且该MISSING_BYTES_BEYOND_MAX_MEM_SIZE列显示该跟踪丢失的字节数。
  • INSUFFICIENT_PRIVILEGES:如果跟踪的查询使用SQL SECURITY值为的视图或存储的例程DEFINER,则可能是拒绝了除定义者之外的其他用户查看查询的跟踪。在这种情况下,跟踪显示为空,INSUFFICIENT_PRIVILEGES值为1。否则值为0。

我们可以对SQL进行optimizer_trace :

-- optimizer_trace默认不开启,开启会影响性能,用完记得关闭
mysql> show variables like '%optimizer_trace%';
+------------------------------+----------------------------------------------------------------------------+
| Variable_name                | Value                                                                      |
+------------------------------+----------------------------------------------------------------------------+
| optimizer_trace              | enabled=off,one_line=off                                                   |
| optimizer_trace_features     | greedy_search=on,range_optimizer=on,dynamic_range=on,repeated_subselect=on |
| optimizer_trace_limit        | 1                                                                          |
| optimizer_trace_max_mem_size | 16384                                                                      |
| optimizer_trace_offset       | -1                                                                         |
+------------------------------+----------------------------------------------------------------------------+
5 rows in set (0.01 sec)

-- 设置开启optimizer_trace
mysql> set optimizer_trace="enabled=on";
Query OK, 0 rows affected (0.00 sec)

mysql> explain select * from t_emp where hire_date > '1990-11-20' and birth_date > '1840-11-20';
+----+-------------+-------+------------+------+------------------------------+------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys                | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+------+------------------------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | t_emp | NULL       | ALL  | idx_hire_date,idx_birth_date | NULL | NULL    | NULL | 299468 |    25.00 | Using where |
+----+-------------+-------+------------+------+------------------------------+------+---------+------+--------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

mysql> select * from INFORMATION_SCHEMA.OPTIMIZER_TRACE\G;
*************************** 1. row ***************************
                            QUERY: explain select * from t_emp where hire_date > '1990-11-20' and birth_date > '1840-11-20'
                            TRACE: {
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `t_emp`.`emp_no` AS `emp_no`,`t_emp`.`birth_date` AS `birth_date`,`t_emp`.`first_name` AS `first_name`,`t_emp`.`last_name` AS `last_name`,`t_emp`.`gender` AS `gender`,`t_emp`.`hire_date` AS `hire_date` from `t_emp` where ((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))"
          }
        ]
      }
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))"
                }
              ]
            }
          },
          {
            "substitute_generated_columns": {
            }
          },
          {
            "table_dependencies": [
              {
                "table": "`t_emp`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ]
              }
            ]
          },
          {
            "ref_optimizer_key_uses": [
            ]
          },
          {
            "rows_estimation": [
              {
                "table": "`t_emp`",
                "range_analysis": {
                  "table_scan": {
                    "rows": 299645,
                    "cost": 60860
                  },
                  "potential_range_indexes": [
                    {
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "idx_hire_date",
                      "usable": true,
                      "key_parts": [
                        "hire_date",
                        "emp_no"
                      ]
                    },
                    {
                      "index": "idx_birth_date",
                      "usable": true,
                      "key_parts": [
                        "birth_date",
                        "emp_no"
                      ]
                    }
                  ],
                  "setup_range_conditions": [
                  ],
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  },
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "idx_hire_date",
                        "ranges": [
                          "0x748d0f < hire_date"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 149822,
                        "cost": 179787,
                        "chosen": false,
                        "cause": "cost"
                      },
                      {
                        "index": "idx_birth_date",
                        "ranges": [
                          "0x74610e < birth_date"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 149822,
                        "cost": 179787,
                        "chosen": false,
                        "cause": "cost"
                      }
                    ],
                    "analyzing_roworder_intersect": {
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    }
                  }
                }
              }
            ]
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`t_emp`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 299645,
                      "access_type": "scan",
                      "resulting_rows": 74911,
                      "cost": 60858,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 74911,
                "cost_for_plan": 60858,
                "chosen": true
              }
            ]
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))",
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [
                {
                  "table": "`t_emp`",
                  "attached": "((`t_emp`.`hire_date` > '1990-11-20') and (`t_emp`.`birth_date` > '1840-11-20'))"
                }
              ]
            }
          },
          {
            "refine_plan": [
              {
                "table": "`t_emp`"
              }
            ]
          }
        ]
      }
    },
    {
      "join_explain": {
        "select#": 1,
        "steps": [
        ]
      }
    }
  ]
}
MISSING_BYTES_BEYOND_MAX_MEM_SIZE: 0
          INSUFFICIENT_PRIVILEGES: 0
1 row in set (0.00 sec)

细心的你肯定会发现通过optimizer_trace打印的idx_hire_date成本与explain打印的不一致,差别在于回表的CPU成本。

explain:总成本=1 x 1.0 + 149822 x 0.2 + 149822 x 1.0 + 149822 x 0.2 = 209751.8

optimizer_trace:总成本=1 x 1.0 + 149822 x 0.2 + 149822 x 1.0 = 179787.4