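An important new component in 11gR2 Grid Infrastructure (GI) is OHAS (Oracle High Availability Services), together with a number of related components and resources. The figure below shows the startup relationships among the GI components in 11gR2.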

[Figure: startup relationships among the GI components in 11gR2]

 

OHAS

The differences between 10g and 11gR2 show up mainly in how the cluster is started and how resources are managed.

 

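Cluster startup in 10g

In 10g the cluster management software is CRS. From a startup point of view, a 10g cluster is started by the last three entries (h1, h2 and h3) of the /etc/inittab file shown below. The database version used here is Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production.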

 

# cat /etc/inittab

ap::sysinit:/sbin/autopush -f /etc/iu.ap

sp::sysinit:/sbin/soconfig -f /etc/sock2path

smf::sysinit:/lib/svc/bin/svc.startd    >/dev/msglog 2<>/dev/msglog </dev/console

p3:s1234:powerfail:/usr/sbin/shutdown -y -i5 -g0 >/dev/msglog 2<>/dev/msglog

h1:3:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null

h2:3:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null

h3:3:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
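
Each inittab entry has four colon-separated fields: id, runlevels, action and process. As a minimal sketch, the three clusterware entries can be pulled out of such a file with awk. The helper name list_crs_entries is hypothetical and is not an Oracle tool.

```shell
#!/bin/sh
# Sketch: list the clusterware entries of an inittab-style file.
# list_crs_entries is a hypothetical helper, not part of Oracle CRS.
list_crs_entries() {
    # $1: path to an inittab-style file (defaults to /etc/inittab)
    file=${1:-/etc/inittab}
    # Fields are id:runlevels:action:process. Skip comment lines and keep
    # entries whose process field runs one of the three init.* scripts.
    awk -F: '
        /^#/ { next }
        $4 ~ /init\.(evmd|cssd|crsd)/ {
            split($4, cmd, " ")
            printf "%s runlevel=%s action=%s script=%s arg=%s\n", $1, $2, $3, cmd[1], cmd[2]
        }' "$file"
}
```

Called as `list_crs_entries /etc/inittab`, it prints one line per entry, which makes it easy to see that init.cssd is invoked with fatal while the other two scripts are invoked with run, all under the respawn action at runlevel 3.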

 

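Although init invokes the three scripts above at the same time, the daemons depend on one another. ocssd.bin has to start and be working properly first; only then can crsd.bin start and work properly; evmd.bin is started and verified last.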

 

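init.cssd: starts the ocssd.bin daemon and the other CSS-level daemons, and thereby forms the cluster.

init.crsd: starts the crsd.bin daemon and calls the racg module to start the corresponding resources, which brings up the cluster's application resources.

init.evmd: starts the evmd.bin daemon, which publishes events for the cluster nodes.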

 

[oracle@webdb1 ~]$ ls -l /etc/inittab
-rw-r--r-- 1 root root 1869 Jan 23  2013 /etc/inittab

[oracle@webdb1 ~]$ ls -l /etc/init.d/init.cssd
-r-xr-xr-x 1 root root 55166 Jan 23  2013 /etc/init.d/init.cssd

 

 

Next, look at the content of each script. Only the excerpts that show the main functionality are listed.

(1) init.cssd script

...............................................................................................................

ORA_CRS_HOME=/opt/oracle/product/CRS

ORACLE_USER=oracle

 

ORACLE_HOME=$ORA_CRS_HOME

 

export ORACLE_HOME

export ORA_CRS_HOME

export ORACLE_USER

 

# Set DISABLE_OPROCD to false. Platforms that do not ship an oprocd

# binary should override this below.

DISABLE_OPROCD=false

# Default OPROCD timeout values defined here, so that it can be

# over-ridden as needed by a platform.

# default Timout of 1000 ms and a margin of 500ms

OPROCD_DEFAULT_TIMEOUT=1000

OPROCD_DEFAULT_MARGIN=500

# default Timeout for other actions

OPROCD_CHECK_TIMEOUT=2000

OPROCD_STOP_TIMEOUT=2000

OPROCD_DEFAULT_HISTORGRAM=

 

# Incase /bin/hostname is not present in a particular platform, we

# may have to do something different.

HOSTN=/bin/hostname

EXPRN=/usr/bin/expr

CUT=/usr/bin/cut

AWK='/bin/awk'

ECHO='echo'

 

TR=/bin/tr

#solaris on amd and SPARC has issue with /bin/tr

[ 'SunOS' = `/bin/uname` ] && TR=/usr/xpg4/bin/tr

#on Linux tr is at /usr/bin/tr

[ 'Linux' = `/bin/uname` ] && TR=/usr/bin/tr

 

 

 

#If the hostname is an IP address, let hostname

#remain as IP address

HOST=`$HOSTN`

len1=`$EXPRN "$HOST" : '.*'`

len2=`$EXPRN match $HOST '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*'`

 

# Strip off domain name in case /bin/hostname returns

# FQDN hostname

if [ $len1 != $len2 ]; then

 HOST=`$ECHO $HOST | $CUT -d'.' -f1 `

fi

 

HOST=`$ECHO $HOST | $TR '[:upper:]' '[:lower:]'`

 

# Default Location for commands on most platforms

PS='/bin/ps'

# ps -e is expected to search for all processes on the box and provide

# terse binary name output so that column count does not truncate binary

# names and confuse grep.

PSE='/bin/ps -e'

PSEF='/bin/ps -ef'

HEAD='/bin/head'

GREP='/bin/grep'

KILL='/bin/kill'

KILLTERM='/bin/kill -TERM'

KILLDIE='/bin/kill -9'

KILLCHECK="/bin/kill -0 $$"

SLEEP='/bin/sleep'

NULL='/dev/null'

............................................................

As shown, the script first defines the environment variables the clusterware uses and the operating system commands it needs.
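
The host-name handling in the excerpt above (keep an IP address as it is, otherwise strip the domain, then lower-case) can be sketched as a small function. This is only an illustration: normalize_host is a hypothetical name, and the portable `expr STRING : REGEX` form is used here in place of the `expr match` form found in the script.

```shell
#!/bin/sh
# Sketch of the host-name normalization shown in the excerpt.
# normalize_host is a hypothetical name; the real script works on $HOST inline.
normalize_host() {
    host=$1
    # Total length of the name. expr exits non-zero when it prints 0,
    # so the status is ignored on purpose.
    len_all=$(expr "$host" : '.*') || :
    # Length of a leading dotted-quad match; 0 when the name is not an IP address.
    len_ip=$(expr "$host" : '[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*') || :
    # Anything that is not entirely an IP address is cut at the first dot,
    # which drops the domain part of a fully qualified name.
    if [ "$len_all" != "$len_ip" ]; then
        host=$(echo "$host" | cut -d. -f1)
    fi
    # Lower-case the result, as the script does with tr.
    echo "$host" | tr '[:upper:]' '[:lower:]'
}
```

For example, `normalize_host WebDB1.example.com` yields the short lower-case name, while a dotted-quad address passes through unchanged. The normalized name matters because it is later used in paths such as $ORA_CRS_HOME/log/$HOST/cssd.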

...............................................................................................................

 

PLATFORM=`$UNAME`

 

MAXFILE=65536

 

case $PLATFORM in

Linux)

 LD_LIBRARY_PATH=$ORA_CRS_HOME/lib

       export LD_LIBRARY_PATH

       FAST_REBOOT="/sbin/reboot -n -f & $SLEEP 1 ; $ECHO b > /proc/sysrq-trigger"

       HEAD='/usr/bin/head'

...............................................................................................................

HP-UX) MACH_HARDWARE=`/bin/uname -m`

...............................................................................................................

     LD_LIBRARY_PATH=$ORA_CRS_HOME/lib:$NMAPIDIR_64:/usr/lib:$LD_LIBRARY_PATH

       export LD_LIBRARY_PATH

       # Presence of this file indicates that vendor clusterware is installed

       SKGXNLIB=${NMAPIDIR_64}/libnmapi2.${SO_EXT}

       if [ -f $SKGXNLIB ]; then

         USING_VC=1

       fi

...............................................................................................................

SunOS) MACH_HARDWARE=`/bin/uname -i`

 ARCH=`/usr/bin/isainfo -b`

       CLUSTERDIR=/opt/ORCLcluster

 LD_LIBRARY_PATH=$ORA_CRS_HOME/lib:$CLUSTERDIR/lib:/usr/lib:/usr/ucblib:$LD_LIBRARY_PATH

       LD_LIBRARY_PATH_64=$ORA_CRS_HOME/lib:$CLUSTERDIR/lib:/usr/lib:/usr/ucblib:$LD_LIBRARY_PATH_64

       if [ "${MACH_HARDWARE}${ARCH}" = "i86pc64" ]; then

           LD_LIBRARY_PATH=$ORA_CRS_HOME/lib:$CLUSTERDIR/lib:/usr/lib/amd64:/usr/ucblib/amd64:$LD_LIBRARY_PATH

           LD_LIBRARY_PATH_64=$ORA_CRS_HOME/lib:$CLUSTERDIR/lib:/usr/lib/amd64:/usr/ucblib/amd64:$LD_LIBRARY_PATH_64

...............................................................................................................

As shown, the script sets the appropriate environment variables for each operating system.

...............................................................................................................

'stop')

    $LOGMSG "Oracle CSSD being stopped"

 

    # disable CSS startup until the next boot

    $ID/init.cssd norun

 

    # shutdown the OPROCD process if it is running

    if [ ! -f $NOOPROCD ]; then

       $OPROCD stop -t $OPROCD_STOP_TIMEOUT 2>$NULL

    fi

 

    # No steps are necessary for shutting down clsomon. It will go down

    # automatically when CSS is shutdown.

 

    # Shut down oclsvmon if it is up.

    if [ ! -f $NOCLSVMON ]; then

      $EVAL $FINDCLSVMON | $AWK '{ print $2 ; }' | $XARGS $KILLTERM > $NULL 2>&1

    fi

 

    # Invalidate init.cssd fatal pidfiles.

    $ECHO "stopped" > $CSSFBOOT

 

    $TOUCH $NOOPROCD

    $TOUCH $NOCLSVMON

    $TOUCH $NOCLSOMON

 

    # Now tell it to shut down.

    if [ -x "$CRSCTL" ]; then

      $CRSCTL stop crs

    fi

 

    $ECHO "Shutdown has begun. The daemons should exit soon."

    ;;

 

'run')

    # Foreground run, for single instance or single-node installs only.

    # If this is used in a cluster install, RDBMS datafile corruption is

    # likely.

 

    # Run the startcheck to see whether we should continue

    $ID/init.cssd startcheck

    while [ "$?" != "0" ]; do

      $SLEEP $RUNRECHECKTIME

      $ID/init.cssd startcheck

    done

 

    cd $ORA_CRS_HOME/log/$HOST/cssd

 

    # If there is an old corefile by such a collision prone name, then

    # rename it to something safe.

    if [ -f ./core ]; then

      $MVF ./core "$UNIQUECORE"

    fi

 

    # Arguments. By default none.

    OCSSD_ARGS=

    

    $ORA_CRS_HOME/bin/ocssd $OCSSD_ARGS

    ;;

 

'fatal')

    # This action is invoked to start the CSS daemon in cluster mode,

    # and one or more of its accompanying daemons oprocd or clsvmon or clsomon

    # This respawn wrapper is done in lieu of adding new entries to inittab.

 

    # Check to see if we are supposed to run this boot.

    $ID/init.cssd startcheck

    while [ "$?" != "0" ]; do

      $SLEEP $RUNRECHECKTIME

      $ID/init.cssd startcheck

    done

 

    # See discussion in LocalFence

$EVAL $CLEANREBOOTLOCK

..........................................................................................................

    $ECHO "See documentation at the top of $0 about supported commands."

    exit 1;

    ;;

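..........................................................................................................

init.cssd decides what to do from the argument it receives. When the argument is fatal, it starts the cssd daemon and the other related daemons in the normal cluster mode.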
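
Both the run and fatal branches begin with the same pattern: call startcheck, and keep sleeping and retrying until it succeeds. A minimal sketch of that loop follows. The helper name wait_until_ok is hypothetical, and the returned attempt count is an addition made for illustration.

```shell
#!/bin/sh
# Sketch of the "retry startcheck until it succeeds" loop used by the
# run and fatal branches. wait_until_ok is a hypothetical helper.
wait_until_ok() {
    # $1: seconds to sleep between attempts; remaining args: the check command
    interval=$1
    shift
    attempts=1
    # Re-run the check until it exits with status 0.
    until "$@"; do
        sleep "$interval"
        attempts=$((attempts + 1))
    done
    # Report how many attempts were needed (not done by the original script).
    echo "$attempts"
}
```

In terms of the excerpt, the equivalent call would be `wait_until_ok "$RUNRECHECKTIME" "$ID/init.cssd" startcheck`. Note that the loop has no upper bound, so a check that never succeeds blocks the start indefinitely, which matches the behavior of the loop shown above.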

 

 

(2) init.crsd script

ORA_CRS_HOME=/opt/oracle/product/CRS

ORACLE_HOME=$ORA_CRS_HOME

export ORA_CRS_HOME

export ORACLE_HOME

 

ORACLE_USER=oracle

 

UMASK=/bin/umask

SED=/bin/sed

CAT=/bin/cat

LOGMSG="/bin/logger -puser.err"

ECHO=/bin/echo

.............................................................

Defines the environment variables and operating system commands that crsd needs.

---------------------------------------------------------------------------------------------------------------------------

case $PLATFORM in

Linux)

    SCRDIR=/etc/oracle/scls_scr/$HOST

    ID=/etc/init.d

    LOGGER="/usr/bin/logger"

    if [ ! -f "$LOGGER" ]; then

      LOGGER="/bin/logger"

    fi

    LOGMSG="$LOGGER -puser.err"

 

    if [ ! -f "$UMASK" ]; then

      UMASK=umask

......................................................................................................................................................

OSF1)  

    ID=/sbin/init.d

    # No restriction in opening files on TRU64. Refer b7623099.

    MAXFILE=unlimited

    ;;

*)  /bin/echo "ERROR: Unknown Operating System"

    exit -1

    ;;

esac

....................................................................................

Sets different environment variables depending on the platform.
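
The per-platform branch can be sketched as a function that maps a platform name to the init-script directory. Only the two platforms visible in the init.crsd excerpt are covered, and init_dir_for is a hypothetical helper rather than anything in the Oracle script.

```shell
#!/bin/sh
# Sketch: choose the init-script directory per platform, as the case
# statement in the excerpt does. init_dir_for is a hypothetical helper;
# only the platforms visible in the excerpt are handled.
init_dir_for() {
    case $1 in
    Linux)
        echo /etc/init.d
        ;;
    OSF1)
        echo /sbin/init.d
        ;;
    *)
        # Same message as the excerpt, sent to stderr here.
        echo "ERROR: Unknown Operating System" >&2
        return 1
        ;;
    esac
}
```

A caller would typically pass the output of uname, for example `ID=$(init_dir_for "$(uname)") || exit 1`. Returning a non-zero status for an unknown platform mirrors the way the original script refuses to continue.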

......................................................................................................................................................

 

case $1 in

'home')

    $ECHO $ORA_CRS_HOME

    exit 0;

    ;;

'stop')

    [ -r $PIDFILE ] && crspid=`$CAT $PIDFILE`

    $LOGMSG "Oracle CRSD $crspid set to stop"

 

    # Indicate that the next time we start up, it may be an initial startup.

    $ECHO "stopped" > $CRSDBOOT

    

    $LOGMSG "Oracle CRSD $crspid shutdown completed"

    ;;

'run') # foreground run out of init

.....................................................................................................................................................

    $ECHO "Manual invocation of $0 is not supported."

    ;;

esac

....................................................................

The action depends on the argument. The run argument starts the crsd.bin daemon.
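
The argument dispatch can be sketched as follows, including the way the stop branch leaves a "stopped" marker so that the next start may be treated as an initial startup. The names crsd_action, CRS_HOME_DEMO and BOOT_MARKER_DEMO are hypothetical and exist only for this illustration; the real script uses $ORA_CRS_HOME and $CRSDBOOT.

```shell
#!/bin/sh
# Sketch of the argument dispatch in init.crsd. crsd_action, CRS_HOME_DEMO
# and BOOT_MARKER_DEMO are hypothetical names used for illustration only.
CRS_HOME_DEMO=${CRS_HOME_DEMO:-/opt/oracle/product/CRS}
BOOT_MARKER_DEMO=${BOOT_MARKER_DEMO:-${TMPDIR:-/tmp}/crsdboot.demo}

crsd_action() {
    case $1 in
    home)
        # Report the clusterware home.
        echo "$CRS_HOME_DEMO"
        ;;
    stop)
        # Record the deliberate stop, so the next start may be treated
        # as an initial startup.
        echo "stopped" > "$BOOT_MARKER_DEMO"
        ;;
    *)
        # Anything else is rejected in this sketch.
        echo "Manual invocation with '$1' is not supported." >&2
        return 1
        ;;
    esac
}
```

The sketch leaves out the run branch on purpose, because the excerpt elides what it does beyond being a foreground run started from init.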

 

 

(3) init.evmd script

ORA_CRS_HOME=/opt/oracle/product/CRS

ORACLE_USER=oracle

 

ORACLE_HOME=$ORA_CRS_HOME

export ORACLE_HOME

export ORA_CRS_HOME

 

CAT=/bin/cat

RMF="/bin/rm -f"

LOGMSG="/bin/logger -puser.err"

ECHO=/bin/echo

KILL=/bin/kill

..............................................................................

Defines the environment variables and operating system commands that evmd needs.

 

case $PLATFORM in

Linux)

       ID=/etc/init.d

       LOGGER="/usr/bin/logger"

       if [ ! -f "$LOGGER" ];then

        LOGGER="/bin/logger"

       fi

       LOGMSG="$LOGGER -puser.err"

       SU="/bin/su -l"

 

       ;;

HP-UX)

       ID=/sbin/init.d

       ;;

.....................................................................................................................................................

       ;;

esac

.......................................................................

Sets different environment variables depending on the platform.

....................................................................................................................................................

case $1 in

'home')

    $ECHO $ORA_CRS_HOME

    exit 0;

    ;;

'user')

    $ECHO $ORACLE_USER

    exit 0;

    ;;

'stop')

    $LOGMSG "Oracle EVMD set to stop"

      

    ;;

'run') # foreground run out of init

The action depends on the argument. The run argument starts the evmd.bin daemon.

 

 

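(4) Summary

After reading init.cssd, init.crsd and init.evmd, it becomes clear that the three scripts share the same basic structure: they first define variables and operating system commands, then set the appropriate environment variables for the operating system platform, and finally decide what to do based on the argument they receive. This design also creates problems for the clusterware. If the content or the permissions of a script are changed for any reason, the cluster may fail to start, and the failure is hard to diagnose. Keeping all of the operations in plain scripts is also a security concern. For these reasons, the way the cluster is started changed starting with version 11.2.0.2.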
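
One way to notice a changed permission before it prevents a start is to compare a script's mode with the expected value. The sketch below assumes a Linux system, since `stat -c` is the GNU coreutils form, and check_mode is a hypothetical helper. The expected mode 555 is taken from the `-r-xr-xr-x` listing of init.cssd shown earlier and may differ on other installations.

```shell
#!/bin/sh
# Sketch: detect a changed permission mode on a startup script.
# check_mode is a hypothetical helper; `stat -c` is the GNU coreutils
# form, so this assumes Linux.
check_mode() {
    # $1: file to inspect, $2: expected octal mode (for example 555)
    actual=$(stat -c '%a' "$1") || return 2
    if [ "$actual" != "$2" ]; then
        echo "$1: mode is $actual, expected $2" >&2
        return 1
    fi
}
```

For example, `check_mode /etc/init.d/init.cssd 555` returns 0 when the mode is unchanged and prints a message otherwise. It checks the mode only; detecting content changes would need a separate comparison against a known-good copy or checksum.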