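1. OHASD Overview
Oracle Restart is a new feature in Oracle 11g. Before getting into it, let's first look at the Oracle 11g RAC processes, which an earlier blog post covers:
Oracle 11gR2 RAC Process Overview
Oracle 11gR2 reclassifies the CRSD resources into two groups: Local Resources and Cluster Resources.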
[grid@rac2 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.FRA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.OCRVOTING.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac1
ora.cvu
1 ONLINE ONLINE rac1
ora.oc4j
1 ONLINE ONLINE rac1
ora.rac1.vip
1 ONLINE ONLINE rac1
ora.rac2.vip
1 ONLINE ONLINE rac2
ora.scan1.vip
1 ONLINE ONLINE rac1
ora.sdd.db
1 ONLINE ONLINE rac1 Open
2 ONLINE ONLINE rac2 Open
[grid@rac2 ~]$
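As mentioned in an earlier blog post, Oracle's commands are layered:
Oracle RAC Common Maintenance Tools and Commands
http://www.cndba.cn/Dave/article/1015
Mapping the two together:
Local Resources belong to the application layer,
Cluster Resources belong to the cluster layer.
The Oracle Restart discussed here is a way of managing the Cluster Resources.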
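When Oracle 10g RAC is installed, running root.sh appends entries for three processes to the end of /etc/inittab: CRSD, CSSD, and EVMD (through the init.crsd, init.cssd, and init.evmd scripts). As a result, Clusterware starts automatically every time the system boots. If the EVMD or CRSD process fails, the system restarts that process automatically; if the CSSD process fails, the node reboots immediately.
In Oracle 11gR2, by contrast, only ohasd is written to /etc/inittab.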
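A quick way to confirm this on a given host is to look for the init.ohasd script name in /etc/inittab. The helper below is a minimal sketch: the function name is made up for illustration, and it assumes the host still uses /etc/inittab (on systems with a different init mechanism the entry may live elsewhere).

```shell
# Hypothetical helper: succeed if an inittab-style file has an active
# (non-commented) line that mentions the init.ohasd script.
# Usage: has_ohasd_entry [file]   (defaults to /etc/inittab)
has_ohasd_entry() {
    # Drop comment lines first, then search for the script name.
    grep -v '^[[:space:]]*#' "${1:-/etc/inittab}" 2>/dev/null | grep -q 'init\.ohasd'
}
```

For example, `has_ohasd_entry && echo "ohasd is registered in inittab"`.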
The official documentation describes OHASD as follows:
Oracle High Availability Services Daemon (OHASD): This process anchors the lower part of the Oracle Clusterware stack, which consists of processes that facilitate cluster operations.
Use the following command to view the resources managed by OHASD:
[grid@rac2 ~]$ crsctl stat res -init -t
[grid@rac2 ~]$ ps -ef|grep ohasd
root 1057 1 0 Dec21 ? 00:00:00 /bin/sh /etc/init.d/init.ohasd run
root 2274 1 0 Dec21 ? 00:22:53 /u01/app/grid/11.2.0/bin/ohasd.bin reboot
2. Oracle Restart Overview
2.1 Overview
The official documentation:
About Oracle Restart
http://docs.oracle.com/cd/E11882_01/server.112/e10595/restart001.htm
Oracle Restart improves the availability of your Oracle database. When you install Oracle Restart, various Oracle components can be automatically restarted after a hardware or software failure or whenever your database host computer restarts.
--In other words, once Oracle Restart is installed, the components it manages are started again automatically after a hardware or software problem or after the host reboots.
The components managed by Oracle Restart are listed in the following table:
Component | Notes |
Database instance | Oracle Restart can accommodate multiple databases on a single host computer. |
Oracle Net listener | - |
Database services | Does not include the default service created upon installation because it is automatically managed by Oracle Database, and does not include any default services created during database creation. |
Oracle Automatic Storage Management (Oracle ASM) instance | - |
Oracle ASM disk groups | Restarting a disk group means mounting it. |
Oracle Notification Services (ONS) | In a standalone server environment, ONS can be used in Oracle Data Guard installations for automating failover of connections between primary and standby database through Fast Application Notification (FAN). ONS is a service for sending FAN events to integrated clients upon failover. |
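Oracle Restart periodically checks and monitors the status of these components. If a component fails, Oracle Restart shuts it down and restarts it.
Oracle Restart is used only in non-clustered (standalone server) environments. In an Oracle RAC environment, Oracle Clusterware provides the automatic restart functionality.
For a non-clustered environment, you only need to install Oracle Grid Infrastructure. During installation, choose "Install Grid Infrastructure Software Only", then run the following script to configure Oracle Restart: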
$GRID_HOME/crs/install/roothas.pl
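If Oracle Restart is installed first and the database is then created with DBCA, DBCA automatically adds the database to the Oracle Restart configuration. When DBCA starts the database, dependencies are established between the database and other components (such as disk groups), and Oracle Restart begins managing the database.
Once Oracle Restart is installed, certain create operations automatically create the Oracle component and add it to the Oracle Restart configuration. These operations are shown in the following table: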
Create Operation | Created Component Automatically Added to Oracle Restart Configuration? |
Create a database with OUI or DBCA | Yes |
Create a database with the CREATE DATABASE SQL statement | No |
Create an Oracle ASM instance with OUI, DBCA, or ASMCA | Yes |
Create a disk group (any method) | Yes |
Add a listener with NETCA | Yes |
Create a database service with SRVCTL | Yes |
Create a database service by modifying the SERVICE_NAMES initialization parameter | No |
Create a database service with DBMS_SERVICE.CREATE_SERVICE | No |
Create a standby database | No |
Similarly, some delete/drop/remove operations automatically update the Oracle Restart configuration, as the following table shows:
Operation | Deleted Component Automatically Removed from Oracle Restart Configuration? |
Delete a database with DBCA | Yes |
Delete a database by removing database files with operating system commands | No |
Delete a listener with NETCA | Yes |
Drop an Oracle ASM disk group (any method) | Yes |
Delete a database service with SRVCTL | Yes |
Delete a database service by any other means | No |
Oracle Restart is managed by the OHASD process, which is why section 1 introduced OHASD. On a standalone server, OHASD manages Oracle Restart and does not need the CRSD process. The components that OHASD can manage are:
1. CSSD: This is used for Group Services as it was in previous releases (when it was installed using "localconfig add")
2. ASM Instance: if Automatic Storage Management is used.
3. ASM Disk Groups: if Automatic Storage Management is used.
4. Listeners
5. Database Instances
6. Database Services
7. ONS/EONS: Used for automatic failover of connections using Fast Application Notification (FAN) in a Data Guard environment
OHASD is a background daemon that starts and monitors the Oracle Restart processes. It is initialized by the /etc/init.d/ohasd script and started by running ohasd.bin as the root user.
Using Oracle Restart has the following benefits:
1. Automatic resource startup at boot time without using shell scripts or the Oracle supplied dbstart and dbshut scripts.
2. Resources are started in the correct sequence based on dependencies in the OLR (Oracle Local Registry).
3. Resources are also monitored by ohasd for availability and may be restarted in place if they fail.
4. Role managed services for Data Guard.
5. Consistency of command line interfaced tools using crsctl and srvctl as is done with clusters.
2.2 Managing the Oracle Restart Stack with CRSCTL
The official documentation:
Stopping and Restarting Oracle Restart for Maintenance Operations
http://docs.oracle.com/cd/E11882_01/server.112/e10595/restart004.htm
CRSCTL Command Reference
http://docs.oracle.com/cd/E11882_01/server.112/e25494/restart006.htm
CRSCTL command options:
Command | Description |
check | Displays the Oracle Restart status. |
config | Displays the Oracle Restart configuration. |
disable | Disables automatic restart of Oracle Restart. |
enable | Enables automatic restart of Oracle Restart. |
start | Starts Oracle Restart. |
stop | Stops Oracle Restart. |
Note:
The following operations must be run as the root user.
2.2.1 Manually stopping Oracle Restart: crsctl stop has [-f]
Note: this command affects only the current server.
[root@rac1 bin]# ./crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'rac1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.OCRVOTING.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.sdd.db' on 'rac1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'rac1'
CRS-2673: Attempting to stop 'ora.oc4j' on 'rac1'
CRS-2673: Attempting to stop 'ora.cvu' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'rac1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.rac1.vip' on 'rac1'
CRS-2677: Stop of 'ora.rac1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.rac1.vip' on 'rac2'
CRS-2677: Stop of 'ora.scan1.vip' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.scan1.vip' on 'rac2'
CRS-2676: Start of 'ora.scan1.vip' on 'rac2' succeeded
CRS-2676: Start of 'ora.rac1.vip' on 'rac2' succeeded
CRS-2672: Attempting to start 'ora.LISTENER_SCAN1.lsnr' on 'rac2'
CRS-2677: Stop of 'ora.sdd.db' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'rac1'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'rac1'
CRS-2676: Start of 'ora.LISTENER_SCAN1.lsnr' on 'rac2' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'rac1' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'rac1' succeeded
CRS-2677: Stop of 'ora.oc4j' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.oc4j' on 'rac2'
CRS-2677: Stop of 'ora.cvu' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cvu' on 'rac2'
CRS-2676: Start of 'ora.cvu' on 'rac2' succeeded
CRS-2677: Stop of 'ora.OCRVOTING.dg' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2676: Start of 'ora.oc4j' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'rac1'
CRS-2677: Stop of 'ora.ons' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'rac1'
CRS-2677: Stop of 'ora.net1.network' on 'rac1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac1' has completed
CRS-2677: Stop of 'ora.crsd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'rac1'
CRS-2673: Attempting to stop 'ora.evmd' on 'rac1'
CRS-2673: Attempting to stop 'ora.asm' on 'rac1'
CRS-2677: Stop of 'ora.evmd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'rac1' succeeded
CRS-2677: Stop of 'ora.asm' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'rac1'
CRS-2677: Stop of 'ora.cssd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'rac1'
CRS-2677: Stop of 'ora.gipcd' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac1'
CRS-2677: Stop of 'ora.gpnpd' on 'rac1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@rac1 bin]#
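Note:
This test was run in an Oracle 11gR2 RAC environment. Running the command on node 1 stopped only the processes on node 1 and relocated the related resources to node 2, which confirms the statement above that the command affects only the current server.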
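To see at a glance which resources moved, the stop log can be filtered for the CRS-2676 ("Start of ... succeeded") messages, since in this transcript those are the resources that came up on the other node. The function below is a rough sketch with a made-up name; it assumes the log was saved to a file or is piped in, and it relies only on the quoted resource and node names in each message.

```shell
# Hypothetical helper: read `crsctl stop has` output on stdin and print
# "<resource> <node>" for every resource that was started somewhere
# while the local stack was going down.
list_relocated() {
    # CRS-2676 lines look like: Start of '<resource>' on '<node>' succeeded
    grep '^CRS-2676:' | sed "s/^[^']*'\([^']*\)'[^']*'\([^']*\)'.*/\1 \2/"
}
```

For example, `list_relocated < stop_has.log`, where stop_has.log is whatever file the output was captured in.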
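2.2.2 Starting HAS
[root@rac1 bin]# ./crsctl start has
CRS-4123: Oracle High Availability Services has been started.
[root@rac1 bin]#
The output shows only that HAS was started, but the resources managed by Oracle Restart are in fact started afterwards. You can verify this with the crs_stat command. Process startup in Oracle 11g is fairly slow, so be patient.
[root@rac1 u01]# sh crs_stat.sh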
Name Target State Host
------------------------------ ------------------- -------
ora.DATA.dg ONLINE ONLINE rac1
ora.FRA.dg ONLINE ONLINE rac1
ora.LISTENER.lsnr ONLINE ONLINE rac1
ora.LISTENER_SCAN1.lsnr ONLINE ONLINE rac2
ora.OCRVOTING.dg ONLINE ONLINE rac1
ora.asm ONLINE ONLINE rac1
ora.cvu ONLINE ONLINE rac2
ora.gsd OFFLINE OFFLINE
ora.net1.network ONLINE ONLINE rac1
ora.oc4j ONLINE ONLINE rac2
ora.ons ONLINE ONLINE rac1
ora.rac1.ASM1.asm ONLINE ONLINE rac1
ora.rac1.LISTENER_RAC1.lsnr ONLINE ONLINE rac1
ora.rac1.gsd OFFLINE OFFLINE
ora.rac1.ons ONLINE ONLINE rac1
ora.rac1.vip ONLINE ONLINE rac1
ora.rac2.ASM2.asm ONLINE ONLINE rac2
ora.rac2.LISTENER_RAC2.lsnr ONLINE ONLINE rac2
ora.rac2.gsd OFFLINE OFFLINE
ora.rac2.ons ONLINE ONLINE rac2
ora.rac2.vip ONLINE ONLINE rac2
ora.scan1.vip ONLINE ONLINE rac2
ora.sdd.db ONLINE ONLINE rac2
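Because startup is slow, it can be handy to poll instead of re-running the check by hand. The following is a minimal sketch of such a loop; the function name, the default retry count, and the default interval are all assumptions chosen for illustration. It simply re-runs a command until its output contains a given pattern.

```shell
# Hypothetical helper: re-run a command until its output matches a pattern.
# Usage: wait_for_output "<command>" "<pattern>" [max_tries] [sleep_seconds]
wait_for_output() {
    cmd="$1"
    pattern="$2"
    max_tries="${3:-30}"   # assumed default: 30 attempts
    pause="${4:-10}"       # assumed default: 10 seconds between attempts
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # Succeed as soon as the pattern shows up in the command output.
        if sh -c "$cmd" 2>/dev/null | grep -q "$pattern"; then
            return 0
        fi
        try=$((try + 1))
        sleep "$pause"
    done
    return 1   # gave up without ever seeing the pattern
}
```

With crsctl on the PATH, `wait_for_output "crsctl check has" "CRS-4638"` would wait for the "is online" message. Note that this only tells you the HAS stack itself is up; individual resources may still be starting.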
2.2.3 Disabling HAS (Restart) autostart after a server reboot
[root@rac1 bin]# ./crsctl disable has
CRS-4621: Oracle High Availability Services autostart is disabled.
[root@rac1 bin]#
2.2.4 Viewing the HAS (Restart) configuration
[root@rac1 bin]# ./crsctl config has
CRS-4621: Oracle High Availability Services autostart is disabled.
2.2.5 Enabling HAS (Restart) autostart after a server reboot
[root@rac1 bin]# ./crsctl enable has
CRS-4622: Oracle High Availability Services autostart is enabled.
--Check the HAS configuration to verify the effect of the previous command:
[root@rac1 bin]# ./crsctl config has
CRS-4622: Oracle High Availability Services autostart is enabled.
[root@rac1 bin]#
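In a script, the two message codes seen above (CRS-4621 for disabled, CRS-4622 for enabled) are enough to tell the autostart setting apart. The function below is a small sketch with an invented name; it reads the `crsctl config has` output from stdin so that it makes no assumption about where crsctl is installed.

```shell
# Hypothetical helper: map `crsctl config has` output (on stdin)
# to one of: enabled, disabled, unknown.
has_autostart_state() {
    out=$(cat)
    case "$out" in
        *CRS-4622*) echo enabled ;;    # "autostart is enabled"
        *CRS-4621*) echo disabled ;;   # "autostart is disabled"
        *)          echo unknown ;;    # anything else, including errors
    esac
}
```

For example, `./crsctl config has | has_autostart_state`.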
2.2.6 Viewing the current status of Restart
[root@rac1 bin]# ./crsctl check has
CRS-4638: Oracle High Availability Services is online
2.2.7 Viewing the status of the resources managed by OHASD in Oracle Restart
[root@rac1 bin]# ./crsctl stat res -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.FRA.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.LISTENER.lsnr
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.OCRVOTING.dg
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.asm
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
ora.gsd
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
ora.net1.network
ONLINE ONLINE rac1
ONLINE ONLINE rac2
ora.ons
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac2
ora.cvu
1 ONLINE ONLINE rac2
ora.oc4j
1 ONLINE ONLINE rac2
ora.rac1.vip
1 ONLINE ONLINE rac1
ora.rac2.vip
1 ONLINE ONLINE rac2
ora.scan1.vip
1 ONLINE ONLINE rac2
ora.sdd.db
1 ONLINE ONLINE rac1 Open
2 ONLINE ONLINE rac2 Open
[root@rac1 bin]#
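With a long listing like this, it is easy to miss a resource that is not ONLINE. The awk filter below is a rough sketch, not an official tool: the function name is made up, and it assumes the `crsctl stat res -t` layout shown above (a resource name line starting with `ora.`, followed by rows of TARGET, STATE, and SERVER, with cluster resources carrying a leading instance number).

```shell
# Hypothetical helper: read `crsctl stat res -t` output on stdin and print
# "<resource> <server> <target> <state>" for every row whose STATE is not ONLINE.
list_not_online() {
    awk '
        /^ora\./ { res = $1; next }     # remember the current resource name
        {
            # Cluster resource rows begin with an instance number; skip it.
            i = ($1 ~ /^[0-9]+$/) ? 2 : 1
            target = $i; state = $(i + 1); server = $(i + 2)
            # Only rows whose TARGET is ONLINE/OFFLINE are status rows.
            if ((target == "ONLINE" || target == "OFFLINE") && state != "ONLINE")
                print res, (server == "" ? "-" : server), target, state
        }
    '
}
```

For example, `./crsctl stat res -t | list_not_online`. In the listing above this would report the two ora.gsd rows, which are OFFLINE on both nodes.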
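2.3 Managing Restart (OHASD) with SRVCTL
You can manage Oracle Restart manually with SRVCTL, adding components to or removing them from the Oracle Restart configuration. When you manually add a component to Oracle Restart and enable it with SRVCTL, Oracle Restart begins managing that component and restarts it when necessary.
The official documentation: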
SRVCTL Command Reference for Oracle Restart
http://docs.oracle.com/cd/E11882_01/server.112/e25494/restart005.htm
Configuring Oracle Restart
http://docs.oracle.com/cd/E11882_01/server.112/e10595/restart002.htm
The main SRVCTL commands are:
Command | Description |
add | Adds a component to the Oracle Restart configuration. |
config | Displays the Oracle Restart configuration for a component. |
disable | Disables management by Oracle Restart for a component. |
enable | Reenables management by Oracle Restart for a component. |
getenv | Displays environment variables in the Oracle Restart configuration for a database, Oracle ASM instance, or listener. |
modify | Modifies the Oracle Restart configuration for a component. |
remove | Removes a component from the Oracle Restart configuration. |
setenv | Sets environment variables in the Oracle Restart configuration for a database, Oracle ASM instance, or listener. |
start | Starts the specified component. |
status | Displays the running status of the specified component. |
stop | Stops the specified component. |
unsetenv | Unsets environment variables in the Oracle Restart configuration for a database, Oracle ASM instance, or listener. |
--This example adds the database with the DB_UNIQUE_NAME dbcrm:
srvctl add database -d dbcrm -o /u01/app/oracle/product/11.2.0/dbhome_1
--This example adds the same database and also establishes a dependency between the database and the disk groups DATA and RECOVERY.
srvctl add database -d dbcrm -o /u01/app/oracle/product/11.2.0/dbhome_1 -a "DATA,RECOVERY"
--The following command adds a listener (named LISTENER) running out of the database Oracle home and listening on TCP port 1522:
srvctl add listener -p TCP:1522 -o /u01/app/oracle/product/11.2.0/dbhome_1
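When several databases need to be registered by hand, it can help to generate the command first and review it before running it. The function below is only a sketch with an invented name: it prints the `srvctl add database` command line built from the -d, -o, and -a options used in the examples above and does not execute anything.

```shell
# Hypothetical helper: print (do not run) the srvctl command that would
# register a database with Oracle Restart.
# Usage: print_add_database <db_unique_name> <oracle_home> [diskgroup_list]
print_add_database() {
    cmd="srvctl add database -d $1 -o $2"
    # -a takes a comma-separated list of disk groups the database depends on.
    if [ -n "$3" ]; then
        cmd="$cmd -a \"$3\""
    fi
    echo "$cmd"
}
```

For example, `print_add_database dbcrm /u01/app/oracle/product/11.2.0/dbhome_1 DATA,RECOVERY` prints the second command shown above. After actually running it, `srvctl config database -d dbcrm` can be used to check the result.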
Note the config option of srvctl, which displays the configuration of the given resource:
[grid@rac1 ~]$ srvctl config asm -a
ASM home: /u01/app/grid/11.2.0
ASM listener: LISTENER
ASM is enabled.
[grid@rac1 ~]$
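srvctl is used very frequently. As section 2.1 explains, some operations automatically add the related resource to Restart so that it is monitored, while others do not, in which case the resource has to be added manually.
One last point: if Restart is already installed and the host name changes, Oracle Restart must be reconfigured. For details, see MOS note [ID 986740.1].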