问题背景

最近客户的一套12c RAC环境 突然频繁报ORA-07445 一般对于7445和600后缀参数无法定位问题,需要进一步查看trc获取线索,对于生产环境有条件的最好提SR确认一下影响

问题调查

定位alert日志中的报错

Thu Jun 13 19:05:12 2024
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x7FFF0072FFA7] [PC:0x1BD286A, kkxmsiag()+1402] [flags: 0x0, count: 1]
Errors in file /app/oracle/rdbms/diag/rdbms/node/node1/trace/node1_m001_38381.trc  (incident=31589):
ORA-07445: exception encountered: core dump [kkxmsiag()+1402] [SIGSEGV] [ADDR:0x7FFF0072FFA7] [PC:0x1BD286A] [Address not mapped to object] []
Incident details in: /app/oracle/rdbms/diag/rdbms/node/node1/incident/incdir_31589/node1_m001_38381_i31589.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jun 13 19:05:14 2024
Dumping diagnostic data in directory=[cdmp_20240613190514], requested by (instance=1, osid=38381 (M001)), summary=[incident=31589].
Thu Jun 13 19:05:15 2024
Sweep [inc][31589]: completed
Sweep [inc2][31589]: completed
Thu Jun 13 19:06:55 2024
Thread 1 cannot allocate new log, sequence 283572
Checkpoint not complete
  Current log# 1 seq# 283571 mem# 0: +DATA/node/ONLINELOG/group_1.2906.1026755307
  Current log# 1 seq# 283571 mem# 1: +DATA/node/ONLINELOG/group_1.3181.1026755307
Thu Jun 13 19:06:58 2024
Thread 1 advanced to log sequence 283572 (LGWR switch)
  Current log# 2 seq# 283572 mem# 0: +DATA/node/ONLINELOG/group_2.1634.1026755323
  Current log# 2 seq# 283572 mem# 1: +DATA/node/ONLINELOG/group_2.2617.1026755323
Thu Jun 13 19:06:58 2024
Archived Log entry 585312 added for thread 1 sequence 283571 ID 0xa399d642 dest 1:
Thu Jun 13 19:07:13 2024
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x7FFF0072FFA7] [PC:0x1BD286A, kkxmsiag()+1402] [flags: 0x0, count: 1]
Errors in file /app/oracle/rdbms/diag/rdbms/node/node1/trace/node1_m001_44416.trc  (incident=30894):
ORA-07445: exception encountered: core dump [kkxmsiag()+1402] [SIGSEGV] [ADDR:0x7FFF0072FFA7] [PC:0x1BD286A] [Address not mapped to object] []
Incident details in: /app/oracle/rdbms/diag/rdbms/node/node1/incident/incdir_30894/node1_m001_44416_i30894.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jun 13 19:07:15 2024
Dumping diagnostic data in directory=[cdmp_20240613190715], requested by (instance=1, osid=44416 (M001)), summary=[incident=30894].
Thu Jun 13 19:07:16 2024
Sweep [inc][30894]: completed
Sweep [inc2][30894]: completed
Thu Jun 13 19:07:43 2024
Thread 1 cannot allocate new log, sequence 283573
Checkpoint not complete

Thu Jun 13 19:08:13 2024
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x7FFF0072FFA7] [PC:0x1BD286A, kkxmsiag()+1402] [flags: 0x0, count: 1]
Errors in file /app/oracle/rdbms/diag/rdbms/node/node1/trace/node1_m001_47612.trc  (incident=31446):
ORA-07445: exception encountered: core dump [kkxmsiag()+1402] [SIGSEGV] [ADDR:0x7FFF0072FFA7] [PC:0x1BD286A] [Address not mapped to object] []
Incident details in: /app/oracle/rdbms/diag/rdbms/node/node1/incident/incdir_31446/node1_m001_47612_i31446.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jun 13 19:08:14 2024
Dumping diagnostic data in directory=[cdmp_20240613190814], requested by (instance=1, osid=47612 (M001)), summary=[incident=31446].
Thu Jun 13 19:08:17 2024
Sweep [inc][31446]: completed
Sweep [inc2][31446]: completed
Thu Jun 13 19:18:46 2024
Thread 1 advanced to log sequence 283574 (LGWR switch)
  Current log# 5 seq# 283574 mem# 0: +DATA/node/ONLINELOG/group_5.2690.1026755165
  Current log# 5 seq# 283574 mem# 1: +DATA/node/ONLINELOG/group_5.1653.1026755165
Thu Jun 13 19:18:46 2024
Archived Log entry 585319 added for thread 1 sequence 283573 ID 0xa399d642 dest 1:
Thu Jun 13 19:19:15 2024
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x7FFF0072FFA7] [PC:0x1BD286A, kkxmsiag()+1402] [flags: 0x0, count: 1]
Errors in file /app/oracle/rdbms/diag/rdbms/node/node1/trace/node1_m001_77095.trc  (incident=29622):
ORA-07445: exception encountered: core dump [kkxmsiag()+1402] [SIGSEGV] [ADDR:0x7FFF0072FFA7] [PC:0x1BD286A] [Address not mapped to object] []
Incident details in: /app/oracle/rdbms/diag/rdbms/node/node1/incident/incdir_29622/node1_m001_77095_i29622.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Thu Jun 13 19:19:17 2024
Dumping diagnostic data in directory=[cdmp_20240613191917], requested by (instance=1, osid=77095 (M001)), summary=[incident=29622].
Thu Jun 13 19:19:19 2024
Sweep [inc][29622]: completed
Sweep [inc2][29622]: completed
Thu Jun 13 19:21:15 2024
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x7FFF0072FFA7] [PC:0x1BD286A, kkxmsiag()+1402] [flags: 0x0, count: 1]
Errors in file /app/oracle/rdbms/diag/rdbms/node/node1/trace/node1_m001_82122.trc  (incident=29623):
ORA-07445: exception encountered: core dump [kkxmsiag()+1402] [SIGSEGV] [ADDR:0x7FFF0072FFA7] [PC:0x1BD286A] [Address not mapped to object] []
Incident details in: /app/oracle/rdbms/diag/rdbms/node/node1/incident/incdir_29623/node1_m001_82122_i29623.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

查看 node1_m001_38381_i31589.trc的相关内容 可以直接定位到一条Current SQL 并且是由SYS.DBMS_AUTO_REPORT_INTERNAL调用

[oracle@NODB1 trace]$ more /app/oracle/rdbms/diag/rdbms/node/node1/incident/incdir_31589/node1_m001_38381_i31589.trc
…………
Starting SQL statement dump
SQL Information
user_id=0 user_name=SYS module=MMON_SLAVE action=Automatic Report Flush
sql_id=0w26sk6t6gq98 plan_hash_value=1388734953 problem_type=4
[TOC00140]
----- Current SQL Statement for this session (sql_id=0w26sk6t6gq98) -----
SELECT XMLTYPE(DBMS_REPORT.GET_REPORT_WITH_SUMMARY(:B1 )) FROM DUAL
[TOC00141]
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
  object      line  object
  handle    number  name
0xa5975250        62  package body SYS.DBMS_AUTO_REPORT_INTERNAL
0xa5975250       683  package body SYS.DBMS_AUTO_REPORT_INTERNAL
0xa5975250       827  package body SYS.DBMS_AUTO_REPORT_INTERNAL
0x313cbc948         1  anonymous block
[TOC00141-END]
[TOC00140-END]
sql_text_length=68
sql=SELECT XMLTYPE(DBMS_REPORT.GET_REPORT_WITH_SUMMARY(:B1 )) FROM DUAL
…………

目前通过初步调查获取的已知信息如下 利用这些信息去MOS上查询是否有相关类型问题


1,数据库版本为12.1.0.2。
2,OS平台为Linux x86_64。
3,相关SQL为
SELECT XMLTYPE(DBMS_REPORT.GET_REPORT_WITH_SUMMARY(:B1 )) FROM DUAL

4,报错信息为
ORA-07445: exception encountered: core dump [kkxmsiag()+1402]

5,ACTION 为 Automatic Report Flush

经过查询,这次的问题与MOS上 这篇文档高度吻合Automatic Report Flush =

ORA-7445[opifcr] or ORA-7445[kkxmsiag] or ORA-7445[evaopn2] or ORA-7445[koxsh2copy] or ORA-7445[plsql_run] Etc. (Doc ID 2309250.1) 在这里插入图片描述

在这里插入图片描述 可以看到导致本次ORA-07445的原因是MMON (Manageability Monitor) Slave 进程执行的自动报告刷新(Automatic Report Flush)可能会导致 ORA-600或ORA-7445错误 是一个12c版本已知的 Bug 23342649 - ORA-600[17099] IN M001 PROCESS DURING AUTOMATIC REPORT FLUSH TASK 并且在19c的版本也可能出现此bug

解决方案

处理方法也很简单 通过设置隐含参数来禁用 Automatic Report Flush

SQL> alter system set "_report_capture_cycle_time"=0; /* Default is 60 seconds */

相关拓展补充

此外该文档还进行了一个补充High CPU Usage and/or Frequent Occurrences of ORA-12850 or ORA-12751 For Monitor Queries by MMON From 12.1 (Doc ID 2102131.1)

官方说明

There is a new feature in 12C called "Automatic Report Capturing Feature". 
As part of this feature, some monitoring SQLs are executed by MMON_SLAVE to identify the resource-intensive SQLs 
and generate the SQL Monitoring report automatically for those SQLs. 
Those SQLs consume little more CPU and it is expected behavior being a new feature. 
Such monitoring queries can be identified from (G)V$SQLSTATS.

 

However, If the CPU consumption is significantly high then it is not an expected behavior 
and could be due to optimizer choosing suboptimal plan for the SQL statements.

This can happen due to Adaptive Optimization, a new feature in 12c.

12C 中有一项新功能,称为“自动报告捕获功能”。作为此功能的一部分,MMON_SLAVE执行一些监控 SQL,以识别资源密集型 SQL, 并自动为这些 SQL 生成 SQL 监控报告。 这些 SQL 消耗的 CPU 很少,预期行为是一项新功能。 可以从 (G)V$SQLSTATS 中识别此类监视查询。

但是,如果 CPU 消耗非常高,则这不是预期行为,可能是由于优化器为 SQL 语句选择了次优计划。

这可能是由于自适应优化(12c 中的一项新功能)而发生的。

如果是因为 12.1 中的新特性,MMON 消耗了更多的 CPU 从两个MMON_SLAVE执行与监视器相关的查询时,CPU 消耗和花费的时间过高,如下所示:

WITH MONITOR_DATA AS (SELECT INST_ID, KEY, NVL2(PX_QCSID, NULL, STATUS)STATUS,
FIRST_REFRESH_TIME, LAST_REFRESH_TIME, REFRESH_COUNT, PROCESS_NAME, SID, SQL_ID,
SQL_EXEC_START, SQL_EXEC_ID, DBOP_NAME, DBOP_EXEC_ID, SQL_PLAN_HASH_VALUE,
SESSION_SERIAL#, SQL_TEXT, IS_FULL_SQLTEXT, PX_SERVER#, PX_SERVER_GROUP, PX_SERVER_SET,
PX_QCINST_ID, PX_QCSID, CASE WHEN ELAPSED_TIME < (CPU_TIME+ APPLICATION_WAIT_TIME+ CONCURRENCY_WAIT_TIME+ CL
...;

当消耗CPU过高时就会出现 ORA-12850 的报错

Thu Sep 08 04:00:41 2016
Errors in file /app/oracle/diag/rdbms/dbname/dbinstance/trace/dbinstance_m002_14490.trc:
ORA-12850: Could not allocate slaves on all specified instances: 3 needed, 2 allocated

当不使用并行性时,可能会出现 ORA-12751。

解决该问题的思路也是通过这条隐含参数来禁用 Automatic Report Capturing Feature

SQL> alter system set "_report_capture_cycle_time"=0; /* Default is 60 seconds */