OS: CentOS 7.9

Database: Oracle 11.2.0.4

Environment: single-instance database on ASM

Problem description: after GI was installed successfully, DBCA failed while creating the database with errors CRS-0259, PRCR-1006, and PRCR-1071, as shown below:

[Screenshot: DBCA error dialog reporting CRS-0259 / PRCR-1006 / PRCR-1071]

Check the DBCA trace log (typically under $ORACLE_BASE/cfgtoollogs/dbca/<DBNAME>/):

[oracle@histest orcl]$ tail -3000 trace.log | grep PRCR
[Thread-95] [ 2022-07-17 09:27:28.345 CST ] [HASIDBRegistrationStep.executeImpl:253] Exception while registering with HAS
PRCR-1006 : Failed to add resource ora.orcl.db for orcl
PRCR-1071 : Failed to register or update resource ora.orcl.db
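
To see the surrounding exception stack rather than just the PRCR lines, grepping with a little context helps (plain grep options, nothing Oracle-specific):

[oracle@histest orcl]$ grep -B 2 -A 5 'PRCR-1006' trace.log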

Oracle tracks this as Bug 11886915: CRS-0259 WHEN REGISTERING THE DATABASE WITH ORACLE RESTART.

Resolution:

[grid@histest ~]$ crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'histest'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'histest'
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'histest'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'histest' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'histest' succeeded
CRS-2679: Attempting to clean 'ora.DATA.dg' on 'histest'
CRS-2681: Clean of 'ora.DATA.dg' on 'histest' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'histest'
CRS-2677: Stop of 'ora.asm' on 'histest' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'histest'
CRS-2677: Stop of 'ora.cssd' on 'histest' succeeded
CRS-2673: Attempting to stop 'ora.evmd' on 'histest'
CRS-2677: Stop of 'ora.evmd' on 'histest' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'histest' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Restarting HAS then failed:

[grid@histest ~]$ crsctl start has
CRS-4124: Oracle High Availability Services startup failed.
CRS-4000: Command Start failed, or completed with errors.
The /var/log/messages log showed:
Jul 17 09:46:08 histest su: (to grid) root on pts/1
Jul 17 09:46:08 histest dbus[980]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)
Jul 17 09:46:08 histest dbus[980]: [system] Successfully activated service 'org.freedesktop.problems'
Jul 17 09:48:17 histest su: (to oracle) root on pts/2
Jul 17 09:49:54 histest root: exec /u01/app/grid/product/11.2.0/grid/perl/bin/perl -I/u01/app/grid/product/11.2.0/grid/perl/lib /u01/app/grid/product/11.2.0/grid/bin/crswrapexece.pl /u01/app/grid/product/11.2.0/grid/crs/install/s_crsconfig_histest_env.txt /u01/app/grid/product/11.2.0/grid/bin/ohasd.bin "reboot"
Jul 17 09:50:01 histest systemd: Started Session 11 of user root.
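
Beyond /var/log/messages, the 11.2 ohasd trace usually carries more detail on why startup stalls; the path below assumes this environment's GRID_HOME and hostname:

[grid@histest ~]$ tail -50 /u01/app/grid/product/11.2.0/grid/log/histest/ohasd/ohasd.log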

The CRS-4124/CRS-4000 startup failure was worked through as follows.

1. Check the permissions on /u01/app/grid/product/11.2.0/grid/perl/bin/perl

Background: this error is known to occur when CRS/HAS is started manually after a system reboot (Doc ID 1624661.1). No reboot had happened here, but the note says the ownership of the perl binary under GRID_HOME can get changed for some reason; it should be owned by the grid user, not the oracle user.

[root@histest log]# ll /u01/app/grid/product/11.2.0/grid/perl/bin/perl
-rwxr-xr-x 1 grid oinstall 1424555 Jul 21 2011 /u01/app/grid/product/11.2.0/grid/perl/bin/perl

As shown above, the perl binary is owned by grid:oinstall (the note lists its permissions as 700 rather than the 755 seen here), so this file was ruled out as the cause.
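
Had the ownership actually drifted to oracle, the fix per Doc ID 1624661.1 would amount to simply restoring it (a sketch; run as root):

[root@histest log]# chown grid:oinstall /u01/app/grid/product/11.2.0/grid/perl/bin/perl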

2. Several online posts suggest fixing this by changing the ownership of /var/tmp/.oracle/npohasd.

[root@histest .oracle]# pwd
/var/tmp/.oracle
[root@histest .oracle]# ll
total 0
prw-r--r-- 1 grid oinstall 0 Jul 17 08:51 npohasd
[root@histest .oracle]# chown -R root:oinstall npohasd
[root@histest .oracle]# ll
total 0
prw-r--r-- 1 root oinstall 0 Jul 17 08:51 npohasd

Result: HAS still failed to start after the change, so the ownership was reverted to grid:oinstall.
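
For completeness, the revert is just the chown in reverse:

[root@histest .oracle]# chown grid:oinstall /var/tmp/.oracle/npohasd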

3. MOS Doc ID 1612325.1 turned out to describe a very similar scenario.

The note attributes the failure to:

Permission issue. Relinking the binaries and restarting the server brought init.ohasd up fine, but ohasd and the other daemons would not start and no sockets were created. When the OS runs S96ohasd, it waits for init.ohasd to write the pipe. What happened here is that init.ohasd was started, then all the socket files were removed manually; when ohasd is started again, it waits forever because those socket files are gone.

Solution:

Clear all sockets under /var/tmp/.oracle or /tmp/.oracle if any and then open two terminals of the same node, where stack is not coming up.
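
The note does not show the clearing step itself. A cautious sketch (moving the socket files aside rather than deleting them keeps the step reversible; check /tmp/.oracle as well if it exists):

[root@leo ~]# mkdir /var/tmp/.oracle_bak
[root@leo ~]# mv /var/tmp/.oracle/* /var/tmp/.oracle_bak/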

Session 1:

[root@leo ~]# /u01/app/grid/product/11.2.0/grid/bin/crsctl start has
CRS-4123: Oracle High Availability Services has been started.

Immediately after issuing start has in session 1, run the following in session 2; once HAS reports that it has started, press CTRL+C to terminate the dd command.

[root@leo .oracle]# dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1
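
Why the dd trick helps: npohasd is a named pipe (note the leading p in the ls output earlier). Opening a FIFO for writing blocks until another process opens it for reading, and the handshake between init.ohasd and ohasd startup appears to stall exactly there; dd stands in as the reader, unblocking the writer so startup can continue. The file type is easy to confirm:

[root@leo .oracle]# file /var/tmp/.oracle/npohasd
/var/tmp/.oracle/npohasd: fifo (named pipe)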

Check the CRS stack status:

Initial state:

[grid@leo ~]$ ps -ef|grep d.bin
grid 2080 1 0 10:32 ? 00:00:00 /u01/app/grid/product/11.2.0/grid/bin/ohasd.bin reboot
grid 2857 2799 0 10:49 pts/2 00:00:00 grep --color=auto d.bin

After HAS started:

[grid@leo ~]$ ps -ef|grep d.bin
grid 2080 1 0 10:32 ? 00:00:01 /u01/app/grid/product/11.2.0/grid/bin/ohasd.bin reboot
grid 3124 1 0 11:00 ? 00:00:00 /u01/app/grid/product/11.2.0/grid/bin/oraagent.bin
grid 3139 1 0 11:00 ? 00:00:00 /u01/app/grid/product/11.2.0/grid/bin/evmd.bin
grid 3141 1 0 11:00 ? 00:00:00 /u01/app/grid/product/11.2.0/grid/bin/tnslsnr LISTENER -inherit
grid 3176 3139 0 11:00 ? 00:00:00 /u01/app/grid/product/11.2.0/grid/bin/evmlogger.bin -o /u01/app/grid/product/11.2.0/grid/evm/log/evmlogger.info -l /u01/app/grid/product/11.2.0/grid/evm/log/evmlogger.log
grid 3183 1 0 11:00 ? 00:00:00 /u01/app/grid/product/11.2.0/grid/bin/cssdagent
grid 3206 1 0 11:00 ? 00:00:00 /u01/app/grid/product/11.2.0/grid/bin/ocssd.bin
grid 3241 2799 0 11:00 pts/2 00:00:00 grep --color=auto d.bin

Confirm the cluster resource status:

[grid@leo ~]$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE leo
ora.LISTENER.lsnr
ONLINE ONLINE leo
ora.asm
ONLINE ONLINE leo Started
ora.ons
OFFLINE OFFLINE leo
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.cssd
1 ONLINE ONLINE leo
ora.diskmon
1 OFFLINE OFFLINE
ora.evmd
1 ONLINE ONLINE leo
ora.orcl.db
1 ONLINE ONLINE leo Open
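
As an extra sanity check, crsctl check has should now report the stack online (CRS-4638 is the expected 11.2 message):

[grid@leo ~]$ crsctl check has
CRS-4638: Oracle High Availability Services is online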

Once the stack was back up, re-register the database.

[oracle@histest ~]$ srvctl add database -d orcl -o /u01/app/oracle/product/11.2.0/db_1
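
To confirm the registration took, srvctl can print back the stored configuration (a quick check, nothing more):

[oracle@histest ~]$ srvctl config database -d orcl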

DBCA was then rerun and the database was created with no further errors.

References:

https://blog.csdn.net/Evils798/article/details/8692898

http://blog.itpub.net/7728585/viewspace-1806208/