大家好,这里是 Lucifer三思而后行,专注于提升数据库运维效率。


目录

  • 前言
  • 环境安装
  • 问题重现
  • 时区检查
  • 修改时区
  • 问题分析
  • 问题解决
  • 深究根源
  • 问题一
  • 问题二
  • 问题三
  • 写在最后
  • 往期精彩文章推荐


前言

昨天遇到一个问题,Oracle RAC 安装完之后,客户反馈数据库查询的时间不对,经分析原来是系统时区与客户所属时区不一致(看来是安装操作系统时选错了时区),需要修改系统时区。

本以为是一个很简单的操作,没想到踩坑了,修改系统时区之后,查询发现以下问题(已解决):

## 使用 TNS 连接数据库查询时间时区为:上海 +0800
$ sqlplus system/oracle@lucifer

SQL> select SYSTIMESTAMP from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
21-AUG-24 01.19.41.557307 PM +08:00

## 使用本地连接数据库查询时间:越南 +0700
$ sqlplus / as sysdba

SQL> select SYSTIMESTAMP from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
21-AUG-24 12.20.25.136498 PM +07:00

于是把问题重现了一下进行分享,希望大家能够在遇到这种情况时可以及时避坑。

环境安装

首先用 Oracle一键安装脚本 快速部署一套测试环境:

./OracleShellInstall -n rocky9 `# RAC 主机名前缀`\
-hn rocky9-01,rocky9-02 `# RAC 主机名`\
-cn rocky9-cls `# RAC 集群名称`\
-sn rocky9-scan `# RAC SCAN 名称`\
-rp oracle `# 主机 root 用户密码`\
-lf ens33 `# 主机网卡名称`\
-pf ens34 `# 主机心跳网卡名称`\
-ri 192.168.6.160,192.168.6.161 `# RAC 公网 IP`\
-vi 192.168.6.162,192.168.6.163 `# RAC 虚拟 IP`\
-si 192.168.6.165 `# RAC SCAN IP`\
-od /dev/sdb `# OCR 磁盘盘符名称`\
-dd /dev/sdc `# DATA 磁盘盘符名称`\
-o lucifer `# 数据库名称`\
-dp 'Passw0rd#PST' `# sys/system 用户密码`\
-ds AL32UTF8 `# 数据库字符集`\
-ns AL16UTF16 `# 国家字符集`\
-redo 100 `# 在线重做日志大小(M)`\
-opd Y `# 是否优化数据库`

耗时 45 分钟左右,下面重现下问题。

问题重现

时区检查

当前环境我的系统时区是 Asia/Shanghai (CST, +0800)

## 节点一
[root@rocky9-01:/root]# ll /etc/localtime 
lrwxrwxrwx. 1 root root 35 Aug 15 10:17 /etc/localtime -> ../usr/share/zoneinfo/Asia/Shanghai
[root@rocky9-01:/root]# timedatectl 
               Local time: Wed 2024-08-21 11:56:42 CST
           Universal time: Wed 2024-08-21 03:56:42 UTC
                 RTC time: Wed 2024-08-21 03:56:45
                Time zone: Asia/Shanghai (CST, +0800)
System clock synchronized: no
              NTP service: active
          RTC in local TZ: no

## 节点二
[root@rocky9-02:/root]# ll /etc/localtime 
lrwxrwxrwx. 1 root root 35 Aug 15 10:17 /etc/localtime -> ../usr/share/zoneinfo/Asia/Shanghai
[root@rocky9-02:/root]# timedatectl 
               Local time: Wed 2024-08-21 11:57:40 CST
           Universal time: Wed 2024-08-21 03:57:40 UTC
                 RTC time: Wed 2024-08-21 03:57:43
                Time zone: Asia/Shanghai (CST, +0800)
System clock synchronized: no
              NTP service: active
          RTC in local TZ: no

查看数据库时间:

-- 使用 TNS 连接数据库查询时间时区为:上海 +0800
SQL> select SYSTIMESTAMP from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
21-AUG-24 11.59.33.946355 AM +08:00

-- 使用本地连接数据库查询时间:上海 +0800
SQL> select SYSTIMESTAMP from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
21-AUG-24 11.59.33.946355 AM +08:00

修改之前可以发现两个方式查询的时间和时区是一致的。

修改时区

首先停止数据库以及集群服务:

## 关闭数据库
[oracle@rocky9-01:/home/oracle]$ srvctl stop db -d lucifer
## 关闭集群以及自启(两个节点)
[root@rocky9-01:/root]# crsctl stop crs
[root@rocky9-01:/root]# crsctl disable crs
[root@rocky9-02:/root]# crsctl stop crs
[root@rocky9-02:/root]# crsctl disable crs

手动修改操作系统时区:

## 修改时区为越南(两个节点)
## 节点一
[root@rocky9-01:/root]# timedatectl set-timezone "Asia/Ho_Chi_Minh"
[root@rocky9-01:/root]# timedatectl 
               Local time: Wed 2024-08-21 11:09:43 +07
           Universal time: Wed 2024-08-21 04:09:43 UTC
                 RTC time: Wed 2024-08-21 04:09:46
                Time zone: Asia/Ho_Chi_Minh (+07, +0700)
System clock synchronized: no
              NTP service: active
          RTC in local TZ: no
[root@rocky9-01:/root]# ll /etc/localtime 
lrwxrwxrwx 1 root root 38 Aug 21 11:09 /etc/localtime -> ../usr/share/zoneinfo/Asia/Ho_Chi_Minh

## 节点二
[root@rocky9-02:/root]# timedatectl set-timezone "Asia/Ho_Chi_Minh"
[root@rocky9-02:/root]# timedatectl 
               Local time: Wed 2024-08-21 11:09:28 +07
           Universal time: Wed 2024-08-21 04:09:28 UTC
                 RTC time: Wed 2024-08-21 04:09:31
                Time zone: Asia/Ho_Chi_Minh (+07, +0700)
System clock synchronized: no
              NTP service: active
          RTC in local TZ: no
[root@rocky9-02:/root]# ll /etc/localtime 
lrwxrwxrwx 1 root root 38 Aug 21 11:09 /etc/localtime -> ../usr/share/zoneinfo/Asia/Ho_Chi_Minh

可以发现时区已经修改成功,但是为了确保起见,重启一下两台服务器主机:

[root@rocky9-01:/root]# reboot
[root@rocky9-02:/root]# reboot

重启完成后,再次检查操作系统时区:

## 节点一
[root@rocky9-01:/root]# timedatectl 
               Local time: Wed 2024-08-21 12:10:03 +07
           Universal time: Wed 2024-08-21 05:10:03 UTC
                 RTC time: Wed 2024-08-21 05:10:03
                Time zone: Asia/Ho_Chi_Minh (+07, +0700)
System clock synchronized: no
              NTP service: active
          RTC in local TZ: no

## 节点二
[root@rocky9-02:/root]# timedatectl 
               Local time: Wed 2024-08-21 12:10:07 +07
           Universal time: Wed 2024-08-21 05:10:07 UTC
                 RTC time: Wed 2024-08-21 05:10:07
                Time zone: Asia/Ho_Chi_Minh (+07, +0700)
System clock synchronized: no
              NTP service: active
          RTC in local TZ: no

确保没有问题后启动集群和数据库:

## 启动集群(两个节点)
[root@rocky9-01:/root]# crsctl start crs
[root@rocky9-02:/root]# crsctl start crs
## 打开数据库
[oracle@rocky9-01:/home/oracle]$ srvctl start db -d lucifer

正常启动:

[root@rocky9-01:/root]# crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  ONLINE       rocky9-01                                    
               ONLINE  ONLINE       rocky9-02                                    
ora.LISTENER.lsnr
               ONLINE  ONLINE       rocky9-01                                    
               ONLINE  ONLINE       rocky9-02                                    
ora.OCR.dg
               ONLINE  ONLINE       rocky9-01                                    
               ONLINE  ONLINE       rocky9-02                                    
ora.asm
               ONLINE  ONLINE       rocky9-01                Started             
               ONLINE  ONLINE       rocky9-02                Started             
ora.gsd
               OFFLINE OFFLINE      rocky9-01                                    
               OFFLINE OFFLINE      rocky9-02                                    
ora.net1.network
               ONLINE  ONLINE       rocky9-01                                    
               ONLINE  ONLINE       rocky9-02                                    
ora.ons
               ONLINE  ONLINE       rocky9-01                                    
               ONLINE  ONLINE       rocky9-02                                    
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       rocky9-02                                    
ora.cvu
      1        ONLINE  ONLINE       rocky9-01                                    
ora.lucifer.db
      1        ONLINE  ONLINE       rocky9-01                Open                
      2        ONLINE  ONLINE       rocky9-02                Open                
ora.oc4j
      1        ONLINE  ONLINE       rocky9-01                                    
ora.rocky9-01.vip
      1        ONLINE  ONLINE       rocky9-01                                    
ora.rocky9-02.vip
      1        ONLINE  ONLINE       rocky9-02                                    
ora.scan1.vip
      1        ONLINE  ONLINE       rocky9-02

验证一下修改后的数据库时间:

-- 使用 TNS 连接数据库查询时间时区为:上海 +0800
SQL> select SYSTIMESTAMP from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
21-AUG-24 01.26.09.483623 PM +08:00

-- 使用本地连接数据库查询时间:越南 +0700
SQL> select SYSTIMESTAMP from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
21-AUG-24 12.26.21.182588 PM +07:00

问题成功复现了。

问题分析

通过以上情况可以发现,本地的时区是正确的,但是使用 TNS 连接的时区是不对的,因为通过 TNS 连接数据库查询 SYSTIMESTAMP 是取的监听时间,也就是说监听可能存在问题。

这里可以参考 MOS 文档中的 point 9 & 11Dates & Calendars - Frequently Asked Questions (Doc ID 227334.1)

9) Why is my SYSDATE / SYSTIMESTAMP time not the same as my system clock on Unix?
To debug situations in which you have a unexplained difference between the oracle SYSDATE / SYSTIMESTAMP and the system time you see on Unix, use the following method:

telnet/ssh to the Unix box
connect using sqlplus in the telnet session:

1) once through the listener using a tnsnames alias
$sqlplus user/password @ [tnsnames alias]
SQL>select to_char(sysdate,'DD-MON-YY HH24:MI:SS') from dual;

2) once through a "local" ORACLE_SID connection
$env | egrep 'ORACLE_SID'
$sqlplus user/password
SQL>select to_char(sysdate,'DD-MON-YY HH24:MI:SS') from dual;

Check that the time in the banner of sqlplus ( SQL*Plus: Release 10.1.0.4.0 - Production on Wo Jan 11 15:05:46 2006 ) is reflecting the time based on the current TZ set in the Unix (!) session.

If the results are different this means that the listener is started with a different TZ setting than you current user environment. To resolve this simply stop and start listener with the TZ setting you want to use.
Make sure you double check what listener you are using when having multiple listeners. For more information about the TZ variable please see your OS documentation.

11)How do I see & set the TZ environment variable in a RAC environment
note 1390015.1 Incorrect SYSDATE shown when connected via Listener in RAC

Note: If you stop and start a RAC database and/or listener "manual" on Unix (= with sqlplus or lsnrctl) then the above setting is NOT used but the TZ setting of the OS user that starts the database/listener, see point 9)

How to use multiple timezones for current_timestamp ( NOT for sysdate !) please see Note 1531653.1 How to use multiple timezones with one Oracle RAC database .

检查监听的时间是否正确:

[grid@rocky9-01:/home/grid]$ lsnrctl stat

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 21-AUG-2024 12:28:52

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                21-AUG-2024 13:13:32
Uptime                    0 days 0 hr. 15 min. 20 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/rocky9-01/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.6.160)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.6.162)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "lucifer" has 1 instance(s).
  Instance "lucifer1", status READY, has 1 handler(s) for this service...
Service "luciferXDB" has 1 instance(s).
  Instance "lucifer1", status READY, has 1 handler(s) for this service...
The command completed successfully

通过上面的输出可以发现一个很诡异的现象:

## 查看监听的时间是 12:30,也就是越南时间
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 21-AUG-2024 12:30:35

## 然而监听启动时间是 13:13,也就是北京时间,明显是错误的
Start Date                21-AUG-2024 13:13:32

为什么监听启动时间会是错误的呢?为了尽快解决问题,通过搜索 MOS 发现了一个文档有相关解决方案:How To Change Timezone for Grid Infrastructure (Doc ID 1209444.1)

其中有一段描述与我们遇到的情况极其相似:

Oracle RAC 修改系统时区避坑指南(深挖篇)_SQL

可以看到从 11.2.0.2 版本开始,TZ 时区环境变量需要读取 $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt 文件中的 TZ 变量值,这也会影响监听的启动时间。

检查文件中对应的 TZ 变量值:

## 节点一
[grid@rocky9-01:/home/grid]$ cd $ORACLE_HOME/crs/install
[grid@rocky9-01:/u01/app/11.2.0/grid/crs/install]$ grep TZ s_crsconfig_rocky9-01_env.txt 
TZ=Asia/Shanghai

## 节点二
[grid@rocky9-02:/u01/app/11.2.0/grid/crs/install]$ grep TZ s_crsconfig_rocky9-02_env.txt 
TZ=Asia/Shanghai

当前配置文件的 TZ 值明显为修改之前的时区,看来是符合的。

问题解决

既然知道了解决方案,那解决就很简单了,修改 $GRID_HOME/crs/install/s_crsconfig_<nodename>_env.txt 文件中的 TZ 变量为正确的值即可:

## 节点一
## 备份配置文件
[grid@rocky9-01:/u01/app/11.2.0/grid/crs/install]$ cp s_crsconfig_rocky9-01_env.txt s_crsconfig_rocky9-01_env.txt20240821
## 修改 TZ 值
[grid@rocky9-01:/u01/app/11.2.0/grid/crs/install]$ sed -i 's/^TZ=.*/TZ=Asia\/Ho_Chi_Minh/' s_crsconfig_rocky9-01_env.txt
[grid@rocky9-01:/u01/app/11.2.0/grid/crs/install]$ grep TZ s_crsconfig_rocky9-01_env.txt
TZ=Asia/Ho_Chi_Minh

## 节点二
## 备份配置文件
[grid@rocky9-02:/u01/app/11.2.0/grid/crs/install]$ cp s_crsconfig_rocky9-02_env.txt s_crsconfig_rocky9-02_env.txt20240821
## 修改 TZ 值
[grid@rocky9-02:/u01/app/11.2.0/grid/crs/install]$ sed -i 's/^TZ=.*/TZ=Asia\/Ho_Chi_Minh/' s_crsconfig_rocky9-02_env.txt
[grid@rocky9-02:/u01/app/11.2.0/grid/crs/install]$ grep TZ s_crsconfig_rocky9-02_env.txt 
TZ=Asia/Ho_Chi_Minh

修改后需要重启数据库和集群才能生效,最好重启主机:

## 关闭数据库
[oracle@rocky9-01:/home/oracle]$ srvctl stop db -d lucifer
## 关闭集群以及恢复自启(两个节点)
[root@rocky9-01:/root]# crsctl stop crs
[root@rocky9-01:/root]# crsctl enable crs
[root@rocky9-02:/root]# crsctl stop crs
[root@rocky9-02:/root]# crsctl enable crs
[root@rocky9-01:/root]# reboot
[root@rocky9-02:/root]# reboot

重启后检查监听:

## 系统时间
[grid@rocky9-01:/home/grid]$ date
Wed Aug 21 01:07:01 PM +07 2024
## 监听时间
[grid@rocky9-01:/home/grid]$ lsnrctl stat

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 21-AUG-2024 13:07:02

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                21-AUG-2024 13:05:03
Uptime                    0 days 0 hr. 1 min. 59 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /u01/app/grid/diag/tnslsnr/rocky9-01/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.6.160)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.6.162)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "lucifer" has 1 instance(s).
  Instance "lucifer1", status READY, has 1 handler(s) for this service...
Service "luciferXDB" has 1 instance(s).
  Instance "lucifer1", status READY, has 1 handler(s) for this service...
The command completed successfully

可以看到监听时间和系统时间已经保持一致,再次验证数据库时间:

-- 使用 TNS 连接数据库查询时间时区为:越南 +0700
SQL> select SYSTIMESTAMP from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
21-AUG-24 01.08.20.889733 PM +07:00

-- 使用本地连接数据库查询时间:越南 +0700
SQL> select SYSTIMESTAMP from dual;

SYSTIMESTAMP
---------------------------------------------------------------------------
21-AUG-24 01.08.24.437260 PM +07:00

时区一致,与客户确认后,已恢复正常。

深究根源

问题解决,就结束了吗?不,还想深究下 3 个问题:

  1. 这个 txt 配置文件是如何生成的?
  2. txt 配置文件中的 TZ 值是如何获取的?
  3. 监听为什么会受这个文件影响?

问题一

这个配置文件是什么时候生成的?通过分析 Grid 安装日志,大致算是搞清楚了,下面记录一下分析过程,通过倒退的方式。

首先在 $GRID_HOME/crs/install/s_crsconfig_lib.pm 文件中有一个函数:

sub s_createConfigEnvFile
#---------------------------------------------------------------------
# Function: Create s_crsconfig_$HOST_env.txt file for Time Zone
# Args    : none
# Notes   : Valid <env_file> format
#           (Please keep this in sync with has/utl/crswrapexec.pl)
#             * Empty lines: lines with all white space
#             * Comments: line starts with #.
#             * <key>=<value>
#             * <key> is all non-whitespace characters on the left of the
#               first "=" character.
#             * <value> is everything on the right of the first "=" character
#               (including whitespaces).
#             * Surrounding double-quote (") won't be stripped.
#             * Key with blank <value> ('') will be undefined.
#               (e.g: Hello=, Hello will be undefined)
#---------------------------------------------------------------------
{
   my $env_file = catfile($ORA_CRS_HOME, 'crs', 'install',
                          's_crsconfig_' . $HOST . '_env.txt');

   open (ENVFILE, ">$env_file") or die "Can't create $env_file: $!";

   print ENVFILE "### This file can be used to modify the NLS_LANG environment"
               . " variable, which determines the charset to be used for messages.\n"
               . "### For example, a new charset can be configured by setting"
               . " NLS_LANG=JAPANESE_JAPAN.UTF8 \n"
               . "### Do not modify this file except to change NLS_LANG,"
               . " or under the direction of Oracle Support Services\n\n";

   # get TZ
   if ($CFG->defined_param('TZ')) {
      my $tz = $CFG->params('TZ');
      $tz    =~ s/'//g; # remove single quotes
      print ENVFILE "TZ=" . $tz . "\n";
   }
    ...
    ...
    ...
}

我这里截取了其中重要的一部分,也就是创建 s_crsconfig_$HOST_env.txt 文件的代码,这段代码的主要目的是生成一个环境变量配置文件 s_crsconfig_<HOST>_env.txt,其中包含头注释信息,并且会写入 TZ(时区)变量的值。

当然,上面只是定义了一个函数,并没有在这个文件中被调用,调用函数的文件是 $GRID_HOME/crs/install/crsconfig_lib.pm,对应的代码段为:

# 导入名为 s_crsconfig_lib 的模块,也就是上面的 s_crsconfig_lib.pm 文件
use s_crsconfig_lib;

# 调用 s_createConfigEnvFile 函数创建 s_crsconfig_$HOST_env.txt 文件
sub run_env_setup_modules
{
    ...
    ...
    ...

    # create s_crsconfig_$HOST_env.txt file
    s_createConfigEnvFile ();
}

顺藤摸瓜,找到调用 run_env_setup_modules 函数的文件是 $GRID_HOME/crs/install/rootcrs.pl,对应的代码块是:

# 导入名为 crsconfig_lib 的模块,也就是上面的 crsconfig_lib.pm 文件
use crsconfig_lib;

# 调用 run_env_setup_modules 函数
# run directory creation, script instantiation, files creation/permissions
# modules
run_env_setup_modules ();

那么 rootcrs.pl 文件是在哪里以及什么时候调用的呢?对 root.sh 执行过程比较熟悉的朋友应该已经知道了,不熟悉的朋友也没关系,继续顺藤摸瓜就行。

调用 rootcrs.pl 的文件是 $GRID_HOME/crs/config/rootconfig.sh,对应的代码块是:

HA_CONFIG=false

# 这里 ROOTCRSPL 指向的就是 crs/install/rootcrs.pl 文件
ROOTCRSPL="$ORACLE_HOME/perl/bin/perl -I$ORACLE_HOME/perl/lib -I$ORACLE_HOME/crs/install $ORACLE_HOME/crs/install/rootcrs.pl"

# 当 HA_CONFIG 为 false 时调用了 `rootcrs.pl` 文件
if [ "$HA_CONFIG" = "true" ]; then
  ROOTSCRIPT=$ROOTHASPL
else
  ROOTSCRIPT=$ROOTCRSPL
fi

接着就是 rootconfig.sh$GRID_HOME/root.sh 文件调用,对应的代码块是:

/u01/app/11.2.0/grid/crs/config/rootconfig.sh

到这里,生成 s_crsconfig_$HOST_env.txt 文件的整个流程就很清晰了,大概如下:

  1. 安装 Grid 时执行 $GRID_HOME/root.sh
  2. root.sh 脚本调用 $GRID_HOME/crs/config/rootconfig.sh 脚本
  3. rootconfig.sh 脚本调用 $GRID_HOME/crs/install/rootcrs.pl 脚本
  4. rootcrs.pl 脚本调用 $GRID_HOME/crs/install/crsconfig_lib.pm 文件
  5. crsconfig_lib.pm 文件调用 $GRID_HOME/crs/install/s_crsconfig_lib.pm 文件
  6. s_crsconfig_lib.pm 文件中定义了函数创建 s_crsconfig_$HOST_env.txt 文件。

第一个问题算是彻底研究明白了。

问题二

因为前面知道 s_crsconfig_$HOST_env.txt 文件的创建函数了,所以只要看下创建时 TZ 值是如何获取就知道了。

查看创建函数获取 TZ 变量值的代码段:

# get TZ
   if ($CFG->defined_param('TZ')) {
      my $tz = $CFG->params('TZ');
      $tz    =~ s/'//g; # remove single quotes
      print ENVFILE "TZ=" . $tz . "\n";
   }

这段代码的主要作用是检查配置对象 $CFG 中是否定义了 TZ 参数,并在存在时获取其值,去除其中的单引号,然后将其以 TZ=<value> 的格式写入到文件中。

也就是说需要知道 $CFG 对象是如何定义的,这个对象是在 $GRID_HOME/crs/install/rootcrs.pl 脚本中定义的,对应的代码段是:

# pull all parameters defined in crsconfig_params and s_crsconfig_defs (if
# it exists) as variables in Perl
my $paramfile_default = catfile (dirname ($0), "crsconfig_params");

our $PARAM_FILE_PATH = $paramfile_default;

# Read the config files and set up the configuration data for
# subsequent processing
my $cfg =
  crsconfig_lib->new(IS_SIHA             => FALSE,
                     paramfile           => $PARAM_FILE_PATH,
                     osdfile             => $defsfile,
                     addfile             => $addparams,
                     crscfg_trace        => TRUE,
                     CRSDelete           => $g_delete,
                     DEBUG               => $DEBUG,
                     HAS_USER            => $SUPERUSER,
                     HOST                => $HOST,
                     UPGRADE             => $UPGRADE,
                     UNLOCK		 => $g_unlock,
                     unlock_crshome      => $unlock_crshome,
                     CRSPatch            => $g_patch,
                     DOWNGRADE           => $DOWNGRADE,
                     oldcrshome          => $oldcrshome,
                     oldcrsver           => $oldcrsver,
                     force               => $g_force,
                     deinstall           => $g_deinstall,
                     keepdg              => $g_keepdg,
                     lastnode            => $g_lastnode,
                     REMOTENODE          => $REMOTENODE,
                     destcrshome         => $destcrshome
                     );

可以看到获取参数值的对应文件为当前目录下的 crsconfig_params 文件,查看文件中是否存在 TZ 相关的参数:

## 以节点一为例
[grid@rocky9-01:/u01/app/11.2.0/grid/crs/install]$ grep TZ crsconfig_params
#    dpham      03/17/10 - Add TZ variable (9462081
TZ=Asia/Shanghai

📢 注意:这里的 TZ 值还是 Asia/Shanghai,为什么没有影响到数据库的运行?因为这里仅用于安装时生成 s_crsconfig_$HOST_env.txt 文件所用,所以不修改也无关大雅。

这个文件是由 crsconfig_params.sbs 生成,其中 TZ 参数:

[grid@rocky9-01:/u01/app/11.2.0/grid/crs/install]$ grep TZ crsconfig_params.sbs 
#    dpham      03/17/10 - Add TZ variable (9462081
TZ=%oracle_install_crs_Timezone%

可以看到 TZ 变量值是通过 %oracle_install_crs_Timezone% 获取。

在 Grid 安装过程中会生成一个静默文件 $GRID_HOME/inventory/response/oracle.crs_Complete.rsp,其中记录了 Grid 安装时获取到的 %oracle_install_crs_Timezone% 值:

#-------------------------------------------------------------------------------
#Name       : oracle_install_crs_Timezone
#Datatype   : String
#Description: 
#-------------------------------------------------------------------------------
oracle_install_crs_Timezone="Asia/Shanghai"

这个文件的值是来源于 $GRID_HOME/inventory/globalvariables/oracle.crs/globalvariables.xml,对应的安装日志中也有相关记录:

## $GRID_HOME/inventory/globalvariables/oracle.crs/globalvariables.xml
<VAR NAME="oracle_install_crs_Timezone" TYPE="STRING" VALUE="Asia/Shanghai" ADVISE="T" COMPUTE_AT_CLONE="F" CALC_REQD="F" CLASS="oracle.sysman.oii.oiis.OiisVariable" EXTENDS_FROM="" ALLOW_ASSIGNMENT="T" HANDLE_ERRORS="F" VAL_REQD="F" SECURE="F" SUPPRESS_ON_SILENT="F" DESC_ID=""/>

## $GRID_HOME/cfgtoollogs/oui/installActions2024-08-15_04-05-22PM.log
INFO: The default response file generated with recorded values is /u01/app/11.2.0/grid/inventory/response/oracle.crs_Complete.rsp
INFO: -destinationFile option was not provided. Saving the default response file as /u01/app/11.2.0/grid/inventory/response/oracle.crs_Complete.rsp
INFO: Recording the installation in file /u01/app/11.2.0/grid/inventory/response/oracle.crs_Complete.rsp.

INFO: INFO: globalpropLocation= /soft/grid/install/../stage/globalvariables/globalvar.xml
INFO: INFO: oracle.install.tb.globalvarpath= /soft/grid/install/../stage/globalvariables/globalvar.xml

INFO: Reading global variables from file /soft/grid/install/../stage/globalvariables/globalvar.xml

INFO: ----------------------------------------------------------------------------------------------------
INFO:  PROPERTY                                               VALUE                                      
INFO: ----------------------------------------------------------------------------------------------------
INFO:  oracle_install_crs_Timezone                            Asia/Shanghai           
INFO: adding the variable oracle_install_crs_Timezone to command line args table
INFO: Setting variable 'oracle_install_crs_Timezone' to 'Asia/Shanghai'. Received the value from the command line.

## $GRID_HOME/install/root_rocky9-01_2024-08-15_16-14-17.log
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params

## $GRID_HOME/cfgtoollogs/crsconfig/rootcrs_rocky9-01.log
2024-08-15 16:14:17: The configuration parameter file /u01/app/11.2.0/grid/crs/install/crsconfig_params is valid
2024-08-15 16:14:17: ### Printing the configuration values from files:
2024-08-15 16:14:17:    /u01/app/11.2.0/grid/crs/install/crsconfig_params
2024-08-15 16:14:17: paramfile=/u01/app/11.2.0/grid/crs/install/crsconfig_params

TZ 的值是通过 /soft/grid/install/../stage/globalvariables/globalvar.xml 文件读取的,所以只需要搞清楚 globalvar.xml 的值是如何注入的就知道了。

这个可以在安装软件解压后的文件中找到相关定义:

## stage/globalvariables/variable.properties
variables_xmls=globalvar.xml

## stage/globalvariables/globalvar.xml
<!-- GLOBAL VARIABLES FOR VENDOR CLUSTERWARE INSTALL -->
      <VAR NAME="oracle_install_crs_Timezone" TYPE="STRING" VALUE="" ALLOW_ASSIGNMENT="T"/>

在软件包刚解压时,oracle_install_crs_Timezone 对应的 VALUE 值为空,在安装过程中被写入,至于如何写入的,这个在安装日志中没有找到,怀疑可能是核心 jar 包中获取写入的。

问题三

至于最后一个问题,监听为什么会受这个文件影响?这个就需要追溯到监听是在哪一步启动了,在前几天写过一篇:Oracle RAC 集群启动顺序 就有这个答案:

Oracle RAC 修改系统时区避坑指南(深挖篇)_数据库查询_02

监听是由 oraagent 服务启动,所以需要查看 bin/oraagent 脚本文件,对应的代码段为:

case $0 in
*.bin) 
    ORASYM=/u01/app/11.2.0/grid/bin/`basename $0 .bin`
    ;;
*)     
    ORASYM=$0.bin
    ;;
esac

exec $ORASYM "$@"

这个文件是为了找 oraagent.bin 执行,看起来并没有相关 TZ 配置文件的调用,但是在脚本中我发现了:

*ohasd*)
    CRSWRAPEXECE="/u01/app/11.2.0/grid/bin/crswrapexece.pl"
    ENV_FILE="${ORA_CRS_HOME}/crs/install/s_crsconfig_${MY_HOST}_env.txt"
    export ENV_FILE

由此看来,这个文件是在调用 ohasd 时调用生效,而 ohasd 是最初启动的进程:

Oracle RAC 修改系统时区避坑指南(深挖篇)_数据库查询_03

查看 $GRID_HOME/log/rocky9-01/client/crswrapexece.log 日志也可以看到:

20-Aug-24 12:46 Executed cmd: /u01/app/11.2.0/grid/bin/crswrapexece.pl /u01/app/11.2.0/grid/crs/install/s_crsconfig_rocky9-01_env.txt /u01/app/11.2.0/grid/bin/ohasd.bin reboot
20-Aug-24 12:46 executing "/u01/app/11.2.0/grid/bin/ohasd.bin reboot"

这样的话,一切就说的通了,在 ohasd 服务启动时,会去获取 s_crsconfig_${MY_HOST}_env.txt 文件作为环境变量文件 ENV_FILE,然后生效该环境变量文件,自然也就生效了 TZ 时区变量,后续启动监听时也会读取 TZ 变量值。

写在最后

虽然还是有一些疑问❓没有得到解答,比如 TZ 到底是如何从操作系统层面获取并赋值的?但是通过这一次深挖,确实对 Grid 的安装启动流程了解更加清晰了,希望能对大家也有所帮助!



感谢您的阅读,这里是 Lucifer三思而后行,欢迎 点赞+关注,我会持续分享数据库知识、运维技巧。