RAC ASM instances crash with ORA-29702 when multiple ASM instances start(ed) [ID 733262.1]


 

Modified 21-JAN-2011     Type PROBLEM     Status PUBLISHED

 


In this Document
  Symptoms   Cause   Solution


Applies to:

Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 11.1.0.7 - Release: 10.1 to 11.1


Information in this document applies to any platform.


***Checked for relevance on 21-Jan-2011***

Symptoms

On a new RAC installation, either 1) not all 3 ASM instances can be started, or 2) once they are started, some or all of the ASM instances crash.



1) If attempting to restart an ASM instance:



    srvctl reports:


$ ./srvctl start asm -n wlrzd4447
 PRKS-1009 : Failed to start ASM instance "+ASM2" on node "wlrzd4447", [PRKS-
 1009 : Failed to start ASM instance "+ASM2" on node "wlrzd4447", [CRS-0215:
 Could not start resource 'ora.wlrzd4447.ASM2.asm'.]]
 [PRKS-1009 : Failed to start ASM instance "+ASM2" on node "wlrzd4447", [CRS-
 0215: Could not start resource 'ora.wlrzd4447.ASM2.asm'.]]


    sqlplus reports:



SQL> startup
 ORA-03113: end-of-file on communication channel

2) The alert log of any ASM instance that crashes shows:

ORA-29702: error occurred in Cluster Group Service operation
 LMON: terminating instance due to error 29702

 




Cause

Multiple NICs that are not bonded are configured for the cluster interconnect.

In the alert log of any ASM instance, look for more than one entry of the form 'Interface type 1 ...
configured from OCR for use as a cluster interconnect':

Mon Aug 18 22:57:28 2008
 Starting ORACLE instance (normal)
 LICENSE_MAX_SESSION = 0
 LICENSE_SESSIONS_WARNING = 0
 Interface type 1 eth2 xxx.xxx.x.x configured from OCR for use as a cluster interconnect
 Interface type 1 berth34 xxx.xxx.xxx.x configured from OCR for use as a cluster interconnect
 Interface type 1 eth0 xx.xx.xx.x configured from OCR for use as a public interface
 Picked latch-free SCN scheme 3
 Using LOG_ARCHIVE_DEST_1 parameter default value as /opt/oracle/product/asm/10.2.0/dbs/arch
 Autotune of undo retention is turned off.
 LICENSE_MAX_USERS = 0
 SYS auditing is disabled
 ksdpec: called for event 13740 prior to event group initialization
 Starting up ORACLE RDBMS Version: 10.2.0.4.0.


Note: IP addresses have been replaced by 'x' characters for privacy.


This causes peer-to-peer failures between the clustered ASM instances, because the two NICs used
for the cluster interconnect are not bonded.
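
The alert log check described above can be scripted. The following is a minimal sketch, not part of the original note: the function name count_interconnects is hypothetical, and the alert log path passed to it is an assumption that depends on the local installation. It counts the distinct interface names that the instance logged as cluster interconnects; a result greater than 1 matches the condition described here.

```shell
# Count the distinct interfaces an ASM instance registered as cluster
# interconnects, based on the startup messages in its alert log.
# Usage: count_interconnects <alert_log_file>   (the path is site-specific)
count_interconnects() {
    # Field 4 of "Interface type 1 <name> <ip> configured ..." is the
    # interface name; sort -u removes repeats from earlier startups.
    grep 'configured from OCR for use as a cluster interconnect' "$1" |
        awk '{ print $4 }' | sort -u | wc -l | tr -d ' '
}
```
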


Solution

In this case there was no intent to use more than one NIC for the Oracle cluster interconnect, so the steps for properly bonding NICs for the cluster interconnect are not discussed here.

To implement the solution, please execute the following steps:

1)  Follow the article below to first query all devices configured for the interconnect
     and then delete the unnecessary device:

     Note 283684.1 (How to Change Interconnect/Public Interface IP Subnet in a 10g Cluster)

     Example:


   1a) Delete the berth34 interface
          $ cd /opt/oracle/product/crs/10.2.0/bin
          $ ./oifcfg getif
               eth0 xx.xx.xx.x global public
               eth2 xxx.xxx.x.x global cluster_interconnect
               berth34 xxx.xxx.xxx.x global cluster_interconnect
          $ ./oifcfg delif -global berth34
          $ ./oifcfg getif
               eth0 xx.xx.xx.x global public
               eth2 xxx.xxx.x.x global cluster_interconnect



      Note: IP addresses have been replaced by 'x' characters for privacy.
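
To spot the extra device quickly, the getif output can be filtered by interface type. This is a minimal sketch under the assumption that the output has the four columns shown in the example (name, subnet, scope, type); the function name list_interconnects is hypothetical.

```shell
# Print the name of every interface registered as a cluster interconnect.
# Reads `oifcfg getif` output on stdin: <name> <subnet> <scope> <type>
list_interconnects() {
    awk '$NF == "cluster_interconnect" { print $1 }'
}

# Example, run from the CRS bin directory used above:
#   ./oifcfg getif | list_interconnects
# More than one name in the output means the interconnect is spread
# across several interfaces.
```
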


2)  Stop the ASM instances on all nodes.



3)  Start the ASM instances in order (node1, node2, and then node3).
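
The stop and ordered start can be sketched with srvctl as follows. This is an illustration only: the function name restart_asm and the SRVCTL override are hypothetical, and the node names in the usage line are placeholders for the real node names of the cluster. It assumes srvctl is in the PATH and that nothing still depends on the ASM instances when they are stopped.

```shell
# Stop ASM on every listed node first, then start the instances one at
# a time in the order given. Set SRVCTL="echo srvctl" for a dry run
# that only prints the commands.
restart_asm() {
    srvctl_cmd=${SRVCTL:-srvctl}
    for node in "$@"; do
        # A failed stop (for example, instance already down) is not fatal.
        $srvctl_cmd stop asm -n "$node"
    done
    for node in "$@"; do
        # Abort if an instance fails to start, so the order is preserved.
        $srvctl_cmd start asm -n "$node" || return 1
    done
}

# Placeholder node names; substitute the real ones:
#   restart_asm node1 node2 node3
```
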