1.背景

内存使用情况,决定着MySQL的性能,内存使用率过高会使系统响应时间变长,严重时内存耗尽还会出现OOM的情况。

2.资源检查

在操作系统层面,可以通过free命令查看系统内存资源使用情况,通过top -c命令查看进程使用内存占用情况。
root:~# free -mt

total        used        free      shared  buff/cache   available
Mem:          16046       14928         201          13         917         753
Swap:             0           0           0
Total:        16046       14928         201

free命令显示内存占用情况:
总内存: 16046MB ≈ 16GB
已用内存: 14928MB ≈ 14.9GB
可用内存: free + buff/cache = 1118MB ≈ 1.1GB
即可用内存比例: 1.1GB / 16GB = 6.8%,通常我们系统监控内存低于10%就会告警。

查看系统进程,检查内存占用情况。

top - 16:11:01 up 71 days,  7:23,  1 user,  load average: 0.09, 0.07, 0.20
Tasks: 188 total,   1 running, 187 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.1 us,  0.6 sy,  0.0 ni, 97.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 16431688 total,   208716 free, 15291936 used,   931036 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   766408 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                      
  6626 mysql     20   0 16.064g 0.014t   6436 S  12.0 91.0  66694:13 /usr/local/mysql/bin/m

3、共享内存

mysql> select VARIABLE_NAME, VARIABLE_VALUE, concat(VARIABLE_VALUE/1024/1024,' MB') AS VARIABLE_VALUE_MB from information_schema.SESSION_VARIABLES where variable_name in ('innodb_buffer_pool_size','innodb_log_buffer_size','innodb_additional_mem_pool_size','key_buffer_size','query_cache_size');
+-------------------------+----------------+-------------------+
| VARIABLE_NAME           | VARIABLE_VALUE | VARIABLE_VALUE_MB |
+-------------------------+----------------+-------------------+
| KEY_BUFFER_SIZE         | 33554432       | 32 MB             |
| QUERY_CACHE_SIZE        | 1048576        | 1 MB              |
| INNODB_LOG_BUFFER_SIZE  | 67108864       | 64 MB             |
| INNODB_BUFFER_POOL_SIZE | 10737418240    | 10240 MB          |
+-------------------------+----------------+-------------------+
4 rows in set, 1 warning (0.00 sec)

检查了Innodb buffer的内存参数设置值10240MB = 10G, 占总内存 10G/16GB = 62.5%,该值设置在合理的范围内,详细的参数介绍可以参考官方文档,当系统内存严重不足时, 快速恢复可以降低共享内存,调整该参数后,内存会立马释放:

mysql> set global innodb_buffer_pool_size=8589934592;
Query OK, 0 rows affected (0.00 sec)

mysql> select VARIABLE_NAME, VARIABLE_VALUE, concat(VARIABLE_VALUE/1024/1024,' MB') AS VARIABLE_VALUE_MB from information_schema.SESSION_VARIABLES where variable_name in ('innodb_buffer_pool_size','innodb_log_buffer_size','innodb_additional_mem_pool_size','key_buffer_size','query_cache_size');
+-------------------------+----------------+-------------------+
| VARIABLE_NAME           | VARIABLE_VALUE | VARIABLE_VALUE_MB |
+-------------------------+----------------+-------------------+
| KEY_BUFFER_SIZE         | 33554432       | 32 MB             |
| QUERY_CACHE_SIZE        | 1048576        | 1 MB              |
| INNODB_LOG_BUFFER_SIZE  | 67108864       | 64 MB             |
| INNODB_BUFFER_POOL_SIZE | 8589934592     | 8192 MB           |
+-------------------------+----------------+-------------------+
4 rows in set, 1 warning (0.00 sec)

mysql>  system free -mt
              total        used        free      shared  buff/cache   available
Mem:          16046       13020        2081          13         944        2666
Swap:             0           0           0
Total:        16046       13020        2081

这里我们为了快速释放系统内存,调整了INNODB_BUFFER_POOL_SIZE值后,可用内存恢复2G。

Session私有内存

共享内存中介绍的内存空间是实例创建时即分配的内存空间,并且是所有连接共享的。而出现 OOM 异常的实例通常都是由于下面各个连接私有的内存造成的。

mysql>  select VARIABLE_NAME, VARIABLE_VALUE, concat(VARIABLE_VALUE/1024/1024,' MB') AS VARIABLE_VALUE_MB from information_schema.SESSION_VARIABLES where variable_name in ('read_buffer_size','read_rnd_buffer_size','sort_buffer_size','join_buffer_size','binlog_cache_size','tmp_table_size');
+----------------------+----------------+-------------------+
| VARIABLE_NAME        | VARIABLE_VALUE | VARIABLE_VALUE_MB |
+----------------------+----------------+-------------------+
| SORT_BUFFER_SIZE     | 33554432       | 32 MB             |
| READ_RND_BUFFER_SIZE | 33554432       | 32 MB             |
| READ_BUFFER_SIZE     | 16777216       | 16 MB             |
| BINLOG_CACHE_SIZE    | 32768          | 0.03125 MB        |
| TMP_TABLE_SIZE       | 67108864       | 64 MB             |
| JOIN_BUFFER_SIZE     | 134217728      | 128 MB            |
+----------------------+----------------+-------------------+
6 rows in set, 1 warning (0.00 sec)

这里的私有内存JOIN_BUFFER_SIZE=128MB, 默认值是256KB。用于普通索引扫描、范围索引扫描和不使用索引而执行全表扫描的联接的缓冲区的最小大小。通常获得快速连接的最佳方法是添加索引。在无法添加索引时,增加join_buffer_size的值,以获得更快的完全连接。为两个表之间的每个完整连接分配一个连接缓冲区。对于没有使用索引的几个表之间的复杂联接,可能需要多个联接缓冲区。

除非使用块嵌套循环或批处理键访问算法,否则设置大于保存每个匹配行所需的缓冲区不会有任何好处,并且所有连接至少分配最小的大小,因此在全局将该变量设置为大值时要小心。最好保持全局设置较小,只在执行大型连接的会话中将会话设置更改为较大的值。如果全局大小大于使用它的大多数查询所需要的大小,那么内存分配时间可能会导致显著的性能下降。

通过检查私有内存,我们发现这是的JOIN_BUFFER_SIZE全局设置较大。

###4、打开内存监控方式
MySQL5.7版本通过performance_schema可以方便的查看内存占用情况, 前提是要打开监控,执行如下SQL语句,打开内存监控。

mysql> update performance_schema.setup_instruments set enabled = 'yes' where name like 'memory%';
mysql> select * from performance_schema.setup_instruments where name like 'memory%innodb%' limit 20;
+-------------------------------------------+---------+-------+
| NAME                                      | ENABLED | TIMED |
+-------------------------------------------+---------+-------+
| memory/innodb/adaptive hash index         | YES     | NO    |
| memory/innodb/buf_buf_pool                | YES     | NO    |
| memory/innodb/dict_stats_bg_recalc_pool_t | YES     | NO    |
| memory/innodb/dict_stats_index_map_t      | YES     | NO    |
| memory/innodb/dict_stats_n_diff_on_level  | YES     | NO    |
| memory/innodb/other                       | YES     | NO    |
| memory/innodb/row_log_buf                 | YES     | NO    |
| memory/innodb/row_merge_sort              | YES     | NO    |
| memory/innodb/std                         | YES     | NO    |
| memory/innodb/trx_sys_t::rw_trx_ids       | YES     | NO    |
| memory/innodb/partitioning                | YES     | NO    |
| memory/innodb/api0api                     | YES     | NO    |
| memory/innodb/btr0btr                     | YES     | NO    |
| memory/innodb/btr0bulk                    | YES     | NO    |
| memory/innodb/btr0cur                     | YES     | NO    |
| memory/innodb/btr0pcur                    | YES     | NO    |
| memory/innodb/btr0sea                     | YES     | NO    |
| memory/innodb/buf0buf                     | YES     | NO    |
| memory/innodb/buf0dblwr                   | YES     | NO    |
| memory/innodb/buf0dump                    | YES     | NO    |
+-------------------------------------------+---------+-------+
20 rows in set (0.00 sec)

该命令是在线打开内存统计,所以只会统计打开后新增的内存对象,打开前的内存对象不会统计,建议您打开后等待一段时间再执行后续步骤,便于找出内存使用高的线程。

5、内存相关表

这里我们查看performance_schema相关的内存监控表有哪些,分别可以统计哪些信息。

mysql> show tables like '%memory%';
+-----------------------------------------+
| Tables_in_performance_schema (%memory%) |
+-----------------------------------------+
| memory_summary_by_account_by_event_name |
| memory_summary_by_host_by_event_name    |
| memory_summary_by_thread_by_event_name  |
| memory_summary_by_user_by_event_name    |
| memory_summary_global_by_event_name     |
+-----------------------------------------+
5 rows in set (0.00 sec)

以上表的监控统计,分别统计线程、帐户、用户、主机间接执行内存操作等信息,每个内存汇总表都有一个或多个分组列,用于指示表如何聚合事件,参考如下表介绍:

(一)统计事件消耗内存
mysql> select event_name, SUM_NUMBER_OF_BYTES_ALLOC  from performance_schema.memory_summary_global_by_event_name order by SUM_NUMBER_OF_BYTES_ALLOC desc LIMIT 10;
+---------------------------------------+---------------------------+
| event_name                            | SUM_NUMBER_OF_BYTES_ALLOC |
+---------------------------------------+---------------------------+
| memory/sql/JOIN_CACHE                 |           966665202302976 |
| memory/memory/HP_PTRS                 |           304457132043176 |
| memory/innodb/mem0mem                 |            29273314618616 |
| memory/sql/thd::main_mem_root         |            18376092762472 |
| memory/sql/Filesort_buffer::sort_keys |             3155343016712 |
| memory/sql/String::value              |             2708659513792 |
| memory/sql/test_quick_select          |             2146347475648 |
| memory/sql/QUICK_RANGE_SELECT::alloc  |             1961041015680 |
| memory/mysys/IO_CACHE                 |             1463097599496 |
| memory/sql/TABLE                      |              635194922117 |
+---------------------------------------+---------------------------+
10 rows in set (0.01 sec)

注意到 “memory/sql/JOIN_CACHE” 消耗的内存最大。

(二)统计线程消耗内存
mysql> select thread_id, event_name, SUM_NUMBER_OF_BYTES_ALLOC from performance_schema.memory_summary_by_thread_by_event_name order by SUM_NUMBER_OF_BYTES_ALLOC desc limit 10; 
+-----------+-----------------------+---------------------------+
| thread_id | event_name            | SUM_NUMBER_OF_BYTES_ALLOC |
+-----------+-----------------------+---------------------------+
|   1922626 | memory/innodb/mem0mem |             1984233347055 |
|   1922439 | memory/innodb/mem0mem |             1404615671548 |
|   1954681 | memory/innodb/mem0mem |             1375641196768 |
|   1922431 | memory/innodb/mem0mem |             1350354644688 |
|   1954682 | memory/innodb/mem0mem |             1099479913383 |
|   1922625 | memory/innodb/mem0mem |             1097551130366 |
|   2686170 | memory/innodb/mem0mem |              992829979036 |
|   1922433 | memory/innodb/mem0mem |              874412348141 |
|   1922438 | memory/innodb/mem0mem |              863348539942 |
|   1922432 | memory/innodb/mem0mem |              754779357792 |
+-----------+-----------------------+---------------------------+
10 rows in set (0.02 sec)

上面统计结果发现到 “memory/innodb/mem0mem” 事件消耗的内存最多。

(三)统计账户消耗内存
mysql> select USER, HOST, EVENT_NAME, SUM_NUMBER_OF_BYTES_ALLOC from performance_schema.memory_summary_by_account_by_event_name order by SUM_NUMBER_OF_BYTES_ALLOC desc limit 10; 
+-------------------+---------------+-----------------------+---------------------------+
| USER              | HOST          | EVENT_NAME            | SUM_NUMBER_OF_BYTES_ALLOC |
+-------------------+---------------+-----------------------+---------------------------+
| c**********       | 192.168.*.*   | memory/sql/JOIN_CACHE |           638622579556352 |
| c**********       | 192.168.*.*   | memory/sql/JOIN_CACHE |           276949456912384 |
| b**********       | 192.168.*.*   | memory/memory/HP_PTRS |           166067384571544 |
| b**********       | 192.168.*.*   | memory/memory/HP_PTRS |            76145767762936 |
| b**********       | 192.168.*.*   | memory/sql/JOIN_CACHE |            42176612401152 |
| z**********       | 192.168.*.*   | memory/memory/HP_PTRS |            30219030003816 |
| s**********       | 192.168.*.*   | memory/innodb/mem0mem |            16114310343537 |
| b**********       | 192.168.*.*   | memory/sql/JOIN_CACHE |             9070736048128 |
| c**********       | 192.168.*.*   | memory/innodb/mem0mem |             4787044880366 |
| a**********       | 192.168.*.*   | memory/memory/HP_PTRS |             4764584763968 |
+-------------------+---------------+-----------------------+---------------------------+
10 rows in set (0.16 sec)

这里把相关敏感信息脱敏了,从上面发现用户 c* 占用内存最大。 事件还是memory/sql/JOIN_CACHE

(四)统计主机消耗内存
mysql > select  HOST, EVENT_NAME, SUM_NUMBER_OF_BYTES_ALLOC from performance_schema.memory_summary_by_host_by_event_name order by SUM_NUMBER_OF_BYTES_ALLOC desc limit 10; 

+---------------+-------------------------------+---------------------------+
| HOST          | EVENT_NAME                    | SUM_NUMBER_OF_BYTES_ALLOC |
+---------------+-------------------------------+---------------------------+
| 192.168.*.*   | memory/sql/JOIN_CACHE         |           681148560703488 |
| 192.168.*.*   | memory/sql/JOIN_CACHE         |           286124345917440 |
| 192.168.*.*   | memory/memory/HP_PTRS         |           166777472915704 |
| 192.168.*.*   | memory/memory/HP_PTRS         |            76851688652536 |
| 192.168.*.*   | memory/memory/HP_PTRS         |            30227900416896 |
| 192.168.*.*   | memory/innodb/mem0mem         |            16117796446781 |
| 192.168.*.*   | memory/memory/HP_PTRS         |             7457591548256 |
| 192.168.*.*   | memory/innodb/mem0mem         |             6596914688112 |
| 192.168.*.*   | memory/sql/thd::main_mem_root |             5979929501808 |
| 192.168.*.*   | memory/sql/thd::main_mem_root |             4795924383312 |
+---------------+-------------------------------+---------------------------+
10 rows in set (0.07 sec)

通过上面的主机,也能快速定位是哪台主机占用内存大,有必要时可以重启该主机的应用。

(五)统计用户消耗内存
mysql> select  USER, EVENT_NAME, SUM_NUMBER_OF_BYTES_ALLOC from performance_schema.memory_summary_by_user_by_event_name order by SUM_NUMBER_OF_BYTES_ALLOC desc limit 10; 
+-------------------+-------------------------------+---------------------------+
| USER              | EVENT_NAME                    | SUM_NUMBER_OF_BYTES_ALLOC |
+-------------------+-------------------------------+---------------------------+
| c**************** | memory/sql/JOIN_CACHE         |           915572036468736 |
| b**************** | memory/memory/HP_PTRS         |           242213179169888 |
| b**************** | memory/sql/JOIN_CACHE         |            51247348449280 |
| z**************** | memory/memory/HP_PTRS         |            30225123465960 |
| b**************** | memory/memory/HP_PTRS         |            21863256050496 |
| s**************** | memory/innodb/mem0mem         |            16116655699153 |
| e**************** | memory/sql/thd::main_mem_root |             9554995144176 |
| c**************** | memory/innodb/mem0mem         |             6360931505366 |
| a**************** | memory/memory/HP_PTRS         |             4765617785408 |
| s**************** | memory/sql/thd::main_mem_root |             3074066885016 |
+-------------------+-------------------------------+---------------------------+
10 rows in set (0.06 sec)

6、解决方式:

找到问题事件或线程后,您可以排查业务代码和环境,解决内存高的问题。上面统计结果发现到 “memory/sql/JOIN_CACHE” 事件消耗的内存最大。调整全局JOIN_BUFFER_SIZE=32MB,再观察内存占用情况。

mysql> set global join_buffer_size=33554432;
Query OK, 0 rows affected (0.00 sec)
mysql> quit

mysql> select VARIABLE_NAME, VARIABLE_VALUE, concat(VARIABLE_VALUE/1024/1024,' MB') AS VARIABLE_VALUE_MB from information_schema.SESSION_VARIABLES where variable_name in ('read_buffer_size','read_rnd_buffer_size','sort_buffer_size','join_buffer_size','binlog_cache_size','tmp_table_size');
+----------------------+----------------+-------------------+
| VARIABLE_NAME        | VARIABLE_VALUE | VARIABLE_VALUE_MB |
+----------------------+----------------+-------------------+
| SORT_BUFFER_SIZE     | 33554432       | 32 MB             |
| READ_RND_BUFFER_SIZE | 33554432       | 32 MB             |
| READ_BUFFER_SIZE     | 16777216       | 16 MB             |
| BINLOG_CACHE_SIZE    | 32768          | 0.03125 MB        |
| TMP_TABLE_SIZE       | 67108864       | 64 MB             |
| JOIN_BUFFER_SIZE     | 33554432       | 32 MB             |
+----------------------+----------------+-------------------+
6 rows in set, 1 warning (0.00 sec)

mysql> select * from information_schema.processlist where COMMAND='sleep' and Time>1000 order by Time desc;

这里会话参数调整后,需同时调整/etc/my.cnf的配置,下次服务启动永久生效,另外之前连接的会话线程由于已分配了该buffer大小,调整后内存并不会马上释放。