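Several AIX + 12.1.0.2 RAC environments I support have recently shown the same symptoms under the GRID installation directory: a huge number of audit files, and individual trace files of enormous size. Judging from the contents of those audit and trace files, they map fairly directly to known 12.1.0.2 bugs. The worst case I saw had more than 500,000 audit files, and the other three databases had between 150,000 and 200,000. In one case the oraagent process's trace file had grown past 30 GB. If nobody notices, this can fill the file system holding the GRID home, or use up its inodes so that no new files can be created, and either can in turn disrupt the cluster. A directory holding too many files can also cause memory leaks in the processes that write logs into it (for example, see the Oracle blog post "A memory leak caused by too many files in the log/trace directory": https://blogs.oracle.com/database4cn/%E4%B8%80%E4%B8%AA%E7%94%B1%E4%BA%8Elogtrace-%E7%9B%AE%E5%BD%95%E6%96%87%E4%BB%B6%E5%A4%AA%E5%A4%9A%E5%AF%BC%E8%87%B4%E7%9A%84%E5%86%85%E5%AD%98%E6%B3%84%E9%9C%B2%E3%80%82 ).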
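Because the risk comes from file count as much as from file size, a simple periodic count can flag a directory before its inodes run out. The sketch below is a minimal example, assuming a POSIX shell; the function names `count_files` and `check_file_count` and the threshold you pass in are hypothetical, not part of any Oracle tool. It counts recursively with `find`, which never expands a glob and so never hits the argument-list limit.

```shell
#!/bin/sh
# Hypothetical helper: count regular files under a directory (recursively).
# find walks the directory itself, so no shell glob is expanded and
# "Argument list too long" cannot occur.
count_files() {
    # AIX wc pads its output with spaces; strip them for clean comparisons
    find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: warn (return 1) when the count exceeds a limit.
# The limit is an arbitrary example value chosen by the caller.
check_file_count() {
    dir=$1
    limit=$2
    n=$(count_files "$dir")
    if [ "$n" -gt "$limit" ]; then
        echo "WARN: $dir holds $n files (limit $limit)"
        return 1
    fi
    echo "OK: $dir holds $n files"
    return 0
}
```

For example, `check_file_count /u01/app/12.1.0/grid/rdbms/audit 100000` could run from the grid user's crontab (the 100000 limit is only illustrative). On AIX, `df -g` for the file system also reports inode usage in its Iused and %Iused columns.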

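With this many trace/audit files, even a plain rm fails with "Argument list too long", so they have to be removed bit by bit, which wastes a lot of time. Purging them on a schedule is the better option. So far I have not seen these issues directly cause a cluster or database failure, so treat them as a latent risk.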
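A scheduled purge avoids the "Argument list too long" error by letting `find` hand file names to `rm` in batches instead of expanding `*.aud` in the shell. This is a minimal sketch, assuming a POSIX `find` that accepts `-exec ... {} +`; the function name `purge_old_files` and the retention period are hypothetical choices, not an Oracle-provided procedure.

```shell
#!/bin/sh
# Minimal sketch: delete files matching a name pattern that were last
# modified more than N days ago. "-exec rm -f {} +" passes names to rm
# in batches, unlike "rm *.aud", which expands every name at once.
purge_old_files() {
    dir=$1       # e.g. the GRID rdbms/audit directory (caller's choice)
    pattern=$2   # e.g. '*.aud'; quote it so the shell does not expand it
    days=$3      # files modified within the last N days are kept
    find "$dir" -type f -name "$pattern" -mtime +"$days" -exec rm -f {} +
}
```

For example, `purge_old_files /u01/app/12.1.0/grid/rdbms/audit '*.aud' 30` could run nightly from the grid user's crontab; 30 days is only an example and should follow your audit-retention policy. If the local `find` rejects `{} +`, piping `find ... -print` into `xargs rm -f` is a common fallback, since these audit file names contain no spaces. For trace files, target only rotated segments: removing a file that a process still holds open does not free its space until the process closes it.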

The known bugs that make 12.1.0.2 clusterware environments generate large volumes of audit/trace files are:

1. The oraagent trace file becomes very large, reaching tens of GB. Reference: ora.asm resource related gimh messages flood the agent trace (Doc ID 2256687.1). The content mostly looks like this:

From my environment:
*** TRACE SEGMENT RENAMED TO /u01/app/12.1.0.2/grid_base/diag/crs/XXXXXdb1/crs/trace/crsd_oraagent_oracle.trc_1.trc ***

Trace file /u01/app/12.1.0.2/grid_base/diag/crs/XXXXXdb1/crs/trace/crsd_oraagent_oracle.trc
Oracle Database 12c Clusterware Release 12.1.0.2.0 - Production Copyright 1996, 2014 Oracle. All rights reserved.
2017-06-26 14:27:30.459123 : USRTHRD:3655: {0:19:2} Gimh::destructor gimh_dest_query_ctx rc=0
2017-06-26 14:27:24.766877 :CLSDYNAM:3655: [ora.XXXXXdb.db]{0:19:2} [check] Gimh::check OH /u01/app/oracle/product/12.1.0.2/db_1 SID XXXXXdb1
From the MOS note:
2017-03-13 16:37:21.499773 :CLSDYNAM:1543: [ ora.asm]{0:9:3} [check] Gimh::check OH /ee/oracle/crshome SID +ASM2
2017-03-13 16:37:21.499989 : USRTHRD:1543: {0:9:3} Gimh::destructor gimh_dest_query_ctx rc=0
2017-03-13 16:37:21.500062 : USRTHRD:1543: {0:9:3} Gimh::destructor gimh_dest_inst_ctx rc=0
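Before purging a large oraagent trace, it can help to check whether it really is dominated by the Gimh::check / Gimh::destructor messages shown above. The sketch below assumes a POSIX shell and simply counts lines containing `Gimh::`; the function name `gimh_ratio` is hypothetical, and a high ratio is only a hint that the file matches this note, not proof.

```shell
#!/bin/sh
# Hypothetical helper: print "<lines with Gimh::>/<total lines>" for a
# trace file, to judge whether the Gimh messages make up most of it.
gimh_ratio() {
    f=$1
    total=$(wc -l < "$f" | tr -d ' ')
    # grep -c prints 0 (and exits 1) when nothing matches; output is still usable
    gimh=$(grep -c 'Gimh::' "$f")
    echo "$gimh/$total"
}
```

For example, `gimh_ratio /u01/app/12.1.0.2/grid_base/diag/crs/XXXXXdb1/crs/trace/crsd_oraagent_oracle.trc`. On a file of tens of GB, expect the scan to take a while.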

2. The ASM LMHB process trace file becomes very large; the ones I hit were around 10 GB. Reference: Huge ASM LMHB trace: kjgcr_ServiceGCR: KJGCR_METRICS: Local metric check number of RT processes (Doc ID 2137683.1).

Trace content from the MOS note:

*** 2016-05-13 10:59:08.112
kjgcr_ServiceGCR: KJGCR_METRICS: Local metric check number of RT processes, id 16 failed
*** 2016-05-13 10:59:08.112
kjgcr_DoPreAction: KJGCR_PREACTION? - id 12
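To spot this case early, one option is to list LMHB trace files above a size threshold. This is a minimal sketch under two assumptions you should verify on your system: that the trace files follow the usual `<instance>_lmhb_<pid>.trc` naming, and that you pass the ASM instance's trace directory yourself. The function name `find_big_lmhb` is hypothetical.

```shell
#!/bin/sh
# Minimal sketch: list LMHB trace files larger than a given number of MB.
# Assumes the "<instance>_lmhb_<pid>.trc" naming convention.
find_big_lmhb() {
    dir=$1
    min_mb=$2
    # POSIX find compares -size +Nc in bytes, so convert MB to bytes first
    min_bytes=$((min_mb * 1024 * 1024))
    find "$dir" -type f -name '*_lmhb_*.trc' -size +"${min_bytes}c"
}
```

For instance, calling it with a threshold of 1024 lists LMHB traces over 1 GB; `grep -c kjgcr_ServiceGCR` on a listed file then shows how much of it is the message from this note.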

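3. Very many audit files, for both the ASM instance and the database instance. The large number of ASM audit files most likely corresponds to these MOS notes: Bug 16539062 - ASM agent connects to ASM instance every seconds in Oracle Restart configuration. (Doc ID 16539062.8) and Many audit files are generated on Oracle Restart after upgrading to 12c (Doc ID 2137020.1). My ASM audit trace looks like this (I did not keep a record of the DB audit files):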

$ ls |wc -l
547173
$ pwd
/u01/app/12.1.0/grid/rdbms/audit
$ cat +ASM2_ora_18940050_20160430002737726773143795.aud
Audit file /u01/app/12.1.0/grid/rdbms/audit/+ASM2_ora_18940050_20160430002737726773143795.aud
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
ORACLE_HOME = /u01/app/12.1.0/grid
System name: AIX
Node name: AAAAdb2
Release: 1
Version: 6
Machine: 00FA0FE04C00
Instance name: +ASM2
Redo thread mounted by this instance: 0 <none>
Oracle process number: 34
Unix process pid: 18940050, image: (TNS V1-V3)

Sat Apr 30 00:27:37 2016 +08:00
LENGTH : '142'
ACTION :[7] 'CONNECT'
DATABASE USER:[1] '/'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[4] 'grid'
CLIENT TERMINAL:[0] ''
STATUS:[1] '0'
DBID:[0] ''

Sat Apr 30 00:27:37 2016 +08:00
LENGTH : '212'
ACTION :[76] 'select g.state into :STATE from v$asm_diskgroup_stat g where g.name = :NAME'
DATABASE USER:[1] '/'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[4] 'grid'
CLIENT TERMINAL:[0] ''
STATUS:[1] '0'
DBID:[0] ''
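To see how fast the audit files accumulate, and how many come from the agent polling v$asm_diskgroup_stat as in the file above, the names and contents can be summarized without any glob expansion. This sketch assumes file names shaped like `+ASM2_ora_<pid>_<YYYYMMDDhhmmss...>.aud`, as in the listing above; the function names `aud_per_day` and `count_agent_polls` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count audit files per instance and per day.
# Assumes names like +ASM2_ora_<pid>_<timestamp>.aud, where the last
# "_"-separated field starts with YYYYMMDD.
aud_per_day() {
    ls "$1" | awk -F_ '/\.aud$/ {
        inst = $0
        sub(/_ora_.*/, "", inst)      # instance name = text before "_ora_"
        day = substr($NF, 1, 8)       # YYYYMMDD from the timestamp field
        cnt[inst " " day]++
    }
    END { for (k in cnt) print k, cnt[k] }' | sort
}

# Hypothetical helper: count audit files that contain the agent's
# v$asm_diskgroup_stat query (-F treats "$" literally).
count_agent_polls() {
    find "$1" -type f -name '*.aud' \
        -exec grep -l -F 'v$asm_diskgroup_stat' {} + | wc -l | tr -d ' '
}
```

For example, `aud_per_day /u01/app/12.1.0/grid/rdbms/audit` prints one line per instance and day, such as `+ASM2 20160430 <count>`; a steady, very high daily count together with a high `count_agent_polls` result fits the agent behavior described in the notes above.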