mysql 8.0.x单机、主从、MGR监控语句
监控列表
1.1. 数据库监控
告警项 | 表数据量及表空间碎片 |
告警表达式 | SELECT table_schema,SUM(data_length+index_length)/1024/1024 AS total_mb,SUM(data_length)/1024/1024 AS data_mb,SUM(index_length)/1024/1024 AS index_mb, SUM(data_free)/1024/1024 AS free_mb,COUNT(*) AS tables_num,CURDATE() AS today FROM information_schema.tables where table_schema not in (‘mysql’,‘sys’,‘information_schema’,‘performance_schema’) GROUP BY table_schema ORDER BY total_mb desc\G; |
告警阈值 | |
严重阈值 | |
补充说明 | 异常增长阈值根据业务确定。 非系统数据库大小,空间碎片程度。 记录数据库大小,是否存在异常增长。 查询数据库碎片程度,及时进行释放。 |
告警项 | 当前并发数 |
告警表达式 | SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = ‘Threads_running’; |
告警阈值 | |
严重阈值 | |
补充说明 | 超出当前稳定数量的20%告警,50%严重 当前并发执行stmt/command的数量,一般小于连接数,出现过多情况及时进行处理; Thread_running突然飙高的诱因: 1 客户端连接暴增; 2 系统性能瓶颈,如CPU,IO或者mem swap; 3 异常sql; |
告警项 | 当前连接数 |
告警表达式 | SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = ‘Threads_connected’; |
告警阈值 | |
严重阈值 | |
补充说明 | 超出当前稳定数量的20%告警,50%严重 当前连接数量,连接异常增多时,及时进行处理; |
告警项 | innodb****缓存命中率 |
告警表达式 | SELECT round(V2.VARIABLE_VALUE/(V1.VARIABLE_VALUE+v2.VARIABLE_VALUE) *100,1) FROM ( SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = ‘INNODB_BUFFER_POOL_READS’ ) AS v1, ( SELECT VARIABLE_VALUE FROM performance_schema.global_status WHERE VARIABLE_NAME = ‘INNODB_BUFFER_POOL_READ_REQUESTS’ ) AS v2; |
告警阈值 | 95 |
严重阈值 | 90 |
补充说明 | INNODB_BUFFER_POOL_READS:缓冲池无法满足的请求数 INNODB_BUFFER_POOL_READ_REQUESTS:从内存中读取的请求数,低于99%需增加innodb buffer pool |
告警项 | 当前锁等待 |
告警表达式 | SELECT IFNULL(SUM(TRX_LOCK_MEMORY_BYTES),0) AS lock_memory_bytes_total, IFNULL(SUM(TRX_ROWS_LOCKED),0) AS rows_locked_total, IFNULL(SUM(TRX_ROWS_MODIFIED),0) AS rows_modified_total,IFNULL(SUM(IF(TRX_WAIT_STARTED IS NULL, 0, 1)), 0) AS row_locks_current_waits FROM INFORMATION_SCHEMA.INNODB_TRX order by 4; |
告警阈值 | 10 |
严重阈值 | 20 |
补充说明 | 锁等待 |
告警项 | 所有数据库使用****buffer pool 大小 |
告警表达式 | select sum(allocated/1024) as G from sys.innodb_buffer_stats_by_schema; |
告警阈值 | 85 |
严重阈值 | 90 |
补充说明 | 数据库内存占用 |
告警项 | io****等待使用按用户排序 |
告警表达式 | select event_name,latency from sys.user_summary_by_file_io_type limit 10 order by max_latency; |
告警阈值 | 100ms |
严重阈值 | 500ms |
补充说明 | IO等待 |
告警项 | 等待事件平均等待时间排序 |
告警表达式 | select event_class,avg_latency from sys.wait_classes_global_by_avg_latency; |
告警阈值 | 60s |
严重阈值 | 120s |
补充说明 | 等待事件 |
告警项 | 每秒产生慢查询的数量 |
告警表达式 | show global status like ‘Slow_queries’; |
告警阈值 | 10 |
严重阈值 | 20 |
补充说明 | 慢查询 |
告警项 | 数据状态 |
告警表达式 | mysql_up{} |
告警阈值 | |
严重阈值 | Down |
补充说明 | 可用性 |
1.2. 主从复制监控
告警项 | 主从复制IO线程是否运行 |
告警表达式 | show replica status like ‘Slave_IO_Running’ |
告警阈值 | |
严重阈值 | NO |
补充说明 |
告警项 | 主从复制SQL线程是否运行 |
告警表达式 | show replica status like ‘Slave_SQL_Running’ |
告警阈值 | |
严重阈值 | NO |
补充说明 |
告警项 | 主从延迟 |
告警表达式 | show replica status like ‘Seconds_Behind_Master’ |
告警阈值 | 100 |
严重阈值 | 200 |
补充说明 |
1.3. MGR监控
告警项 | 本节点MGR状态 |
告警表达式 | select MEMBER_STATE from performance_schema.replication_group_members where member_id=@@server_uuid\G; |
告警阈值 | |
严重阈值 | offline |
补充说明 | 可用性 |
告警项 | 本节点SQL线程状态 |
告警表达式 | select SERVICE_STATE from performance_schema.replication_applier_status where CHANNEL_NAME = ‘group_replication_applier’\G |
告警阈值 | |
严重阈值 | offline |
补充说明 | 可用性 |
告警项 | 本节点SQL线程状态 |
告警表达式 | select SERVICE_STATE from performance_schema.replication_connection_status where CHANNEL_NAME =‘group_replication_applier’\G |
告警阈值 | |
严重阈值 | offline |
补充说明 | 可用性 |
告警项 | 本节点IO线程错误编号 |
告警表达式 | select LAST_ERROR_NUMBER from performance_schema.replication_connection_status where CHANNEL_NAME =‘group_replication_applier’\G |
告警阈值 | |
严重阈值 | >0 |
补充说明 |
告警项 | 本节点SQL线程事务重试次数 |
告警表达式 | select COUNT_TRANSACTIONS_RETRIES from performance_schema.replication_applier_status where CHANNEL_NAME = ‘group_replication_applier’\G |
告警阈值 | 3 |
严重阈值 | 5 |
补充说明 |
告警项 | 本节点应用落后事务数量 |
告警表达式 | select COUNT_TRANSACTIONS_IN_QUEUE from performance_schema.replication_group_member_stats where member_id=@@server_uuid\G |
告警阈值 | 10 |
严重阈值 | 20 |
补充说明 |
告警项 | 本节点冲突事务数量 |
告警表达式 | select COUNT_CONFLICTS_DETECTED from performance_schema.replication_group_member_stats where member_id=@@server_uuid\G |
告警阈值 | 5 |
严重阈值 | 10 |
补充说明 |