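The migration described here was not done with a migration tool. The original server's root filesystem could no longer be mounted, but the data directory was intact, so I copied the entire data directory to a new server built from a template and started the service using the template's installed version with the original instance data. Shortly after startup, however, the service stopped by itself. What follows is the whole troubleshooting process, and in the end it turned out that I had probably been looking in the wrong direction from the very start. Fortunately this was only a test environment, and an older copy of the data was still available for temporary connections. Had it been production... you know how that goes.
The new server is a DM (Dameng) installation template with the DM software already installed. The copy of the data directory itself is omitted here. After copying, the directories referenced in dm.ini and dm.ctl must be changed to match the current environment. dm.ini can be edited directly; dm.ctl is first converted to text with dmctlcvt (TYPE=1), edited, and then converted back (TYPE=2):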
sed -i 's#/dmdata/DAMENG#/dm/dmdata/DAMENG#g' /dm/dmdata/DAMENG/dm.ini
./dmctlcvt TYPE=1  SRC=/dm/dmdata/DAMENG/dm.ctl DEST=/dm/dmbak/dmctl.txt
sed -i 's#/dmdata/DAMENG#/dm/dmdata/DAMENG#g' /dm/dmbak/dmctl.txt
./dmctlcvt TYPE=2  SRC=/dm/dmbak/dmctl.txt  DEST=/dm/dmdata/DAMENG/dm.ctl
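One thing to watch with the sed commands above: because the new prefix /dm/dmdata/DAMENG contains the old prefix /dmdata/DAMENG, running the same substitution a second time would produce /dm/dm/dmdata/DAMENG. A minimal sketch of a re-runnable variant, assuming GNU sed and paths without regex metacharacters or '#'; rewrite_prefix is a hypothetical helper name, not a DM tool:

```shell
#!/bin/sh
# rewrite_prefix FILE OLD NEW   (hypothetical helper, not shipped with DM)
# Rewrites OLD to NEW in FILE in place and is safe to run more than once.
rewrite_prefix() {
    file=$1
    old=$2
    new=$3
    # Step 1: fold any path that was already rewritten back to OLD.
    # Step 2: apply OLD -> NEW.
    # Together the two steps give the same result on the first and on later runs.
    # Assumes OLD and NEW contain no '#' and no regex metacharacters.
    sed -i -e "s#${new}#${old}#g" -e "s#${old}#${new}#g" "$file"
}
```

For example, `rewrite_prefix /dm/dmdata/DAMENG/dm.ini /dmdata/DAMENG /dm/dmdata/DAMENG` would stand in for the first sed command, and the same call on /dm/dmbak/dmctl.txt for the second.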
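Copy the new dm.key file to the bin directory, then start the server. Startup failed with the message below, which suggested changing the CHECK_SVR_VERSION parameter to 0.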

[Screenshot: startup failure message suggesting CHECK_SVR_VERSION be set to 0]

After changing the parameter the service started, but it stopped on its own shortly afterwards.

[Screenshot: service stopping shortly after startup]

The log shows the following:
2023-04-24 17:09:01.536 [INFO] database P0000145570 T0000000000000145570  version info: enterprise
2023-04-24 17:09:01.543 [INFO] database P0000145570 T0000000000000145570  Database's huge_with_delta is 1, and rlog_gen_for_huge is 0!
2023-04-24 17:09:01.543 [INFO] database P0000145570 T0000000000000145570  os_sema2_create_low, create and inc sema success, key:109211175, sem_id:262144, sem_value:1!
2023-04-24 17:09:01.546 [INFO] database P0000145570 T0000000000000145570  DM Database Server x64 V8 1-2-98-22.11.18-174995-10040-ENT  startup...
2023-04-24 17:09:01.547 [INFO] database P0000145570 T0000000000000145570  INI parameter ROLLSEG_POOLS changed, the original value 19, new value 1
2023-04-24 17:09:01.633 [WARNING] database P0000145570 T0000000000000145570  fail to load libgeos_c.so.1.13.3, /home/dmdba/dmdbms/bin/libgeos_c.so.1.13.3: cannot open shared object file: No such file or directory
2023-04-24 17:09:01.636 [WARNING] database P0000145570 T0000000000000145570  fail to load libproj.so, /home/dmdba/dmdbms/bin/libproj.so: cannot open shared object file: No such file or directory
2023-04-24 17:09:01.637 [WARNING] database P0000145570 T0000000000000145570  fail to load libxqilla.so, libnsl.so.1: cannot open shared object file: No such file or directory
2023-04-24 17:09:05.052 [WARNING] database P0000145570 T0000000000000145570  fail to load libgssapi_krb5.so, /home/dmdba/dmdbms/bin/libgssapi_krb5.so: cannot open shared object file: No such file or directory
... (many intermediate lines omitted)
2023-04-24 17:09:05.208 [INFO] database P0000145570 T0000000000000145570  backup control file /dm/dmdata/DAMENG/dm.ctl to file /dm/dmdata/DAMENG/dm_20230424170905_208590.ctl
2023-04-24 17:09:05.213 [INFO] database P0000145570 T0000000000000145570  backup control file /dm/dmdata/DAMENG/dm.ctl to file /dm/dmdata/DAMENG/ctl_bak/dm_20230424170905_210110.ctl succeed
2023-04-24 17:09:05.213 [INFO] database P0000145570 T0000000000000145570  local instance name is DMSERVER, mode is NORMAL, status is OPEN.
2023-04-24 17:09:05.213 [INFO] database P0000145570 T0000000000000145570  SYSTEM IS READY.
2023-04-24 17:09:05.213 [INFO] database P0000145570 T0000000000000145570  set g_dw_stat from UNDEFINED to NONE success, g_dw_recover_stop is 0
2023-04-24 17:09:06.207 [INFO] database P0000145570 T0000000000000145620  trx4_min_tid_collect set min_active_id_opt, min_active_id: 204526929, first_tid: 204526918
2023-04-24 17:10:00.030 [FATAL] database P0000145570 T0000000000000145927  Fail to find file in current system. tsid:7, fileid:0
2023-04-24 17:10:00.030 [FATAL] database P0000145570 T0000000000000145927  code = -1, dm_sys_halt now!!!
2023-04-24 17:10:00.030 [INFO] database P0000145570 T0000000000000145927  total 2 rfil opened!
2023-04-24 17:10:00.031 [FATAL] database P0000145570 T0000000000000145927  sigterm_handler receive signal 8
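Because of the earlier version prompt, and because the log contained no ERROR entries, I went down the path of a database version mismatch. I tried three versions in turn: the first two behaved the same way (the service stopped by itself after startup), and the last one would not start at all.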

[Screenshots omitted]


The screenshot for the last version was taken after installing it in my own environment.

[Screenshot: the last version, installed in my own environment]


This last version reported "upgrade is not allowed", so I set it aside for now.

[Screenshot: "upgrade is not allowed" message]

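In the end I went back to the 98 version, the one closest to the original. Since having run a higher version might make the software treat the data as already upgraded, I re-copied the entire data directory, updated the paths in the configuration files again, and restarted. The result was the same as before, so I collected the core file for analysis: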
!#%&*^$@[499744]: select task.guid, task.threadname, task.jobguid from bi_schd_taskqueue task where ( task.runstate = 1 or task.runstate = 0 ) and task.servername = ? and task.jobguid in ( select guid from bi_schd_jobqueue where ( runstate = 0 or runstate = 1 ) and moduleid = 'nc.bs.smart.db.TempTableSchdTask' and ? >= starttime )            
!#%&*^$@[499743]:
SELECT BIGDATEDIFF(SECOND,START_TIME,GETDATE()) run_second
from V$INSTANCE

-- This only shows that thread 499744 triggered the halt, but the table in that statement is not a business table, so I had no lead yet
Thread 1 (LWP 499744):
#0  0x000000000155c597 in assert_fun ()
#1  0x00000000015723e8 in sigterm_handler ()
#2  0x00007f374194b930 in ?? ()
#3  0x0000000000000007 in ?? ()
#4  0x0000000000000000 in ?? ()
The end of the log was still the same as before:
2023-04-24 17:21:19.701 [INFO] database P0000149138 T0000000000000149138  backup control file /dm/dmdata/DAMENG/dm.ctl to file /dm/dmdata/DAMENG/ctl_bak/dm_20230424172119_697466.ctl succeed
2023-04-24 17:21:19.701 [INFO] database P0000149138 T0000000000000149138  local instance name is DMSERVER, mode is NORMAL, status is OPEN.
2023-04-24 17:21:19.701 [INFO] database P0000149138 T0000000000000149138  SYSTEM IS READY.
2023-04-24 17:21:19.701 [INFO] database P0000149138 T0000000000000149138  set g_dw_stat from UNDEFINED to NONE success, g_dw_recover_stop is 0
2023-04-24 17:21:20.695 [INFO] database P0000149138 T0000000000000149161  trx4_min_tid_collect set min_active_id_opt, min_active_id: 204527924, first_tid: 204527920
-- Pay close attention to the next line. The actual cause is already written there, but I did not realize it at first!!!
2023-04-24 17:22:24.259 [FATAL] database P0000149138 T0000000000000149480  Fail to find file in current system. tsid:7, fileid:0
2023-04-24 17:22:24.259 [FATAL] database P0000149138 T0000000000000149480  code = -1, dm_sys_halt now!!!
2023-04-24 17:22:24.259 [INFO] database P0000149138 T0000000000000149480  total 2 rfil opened!
2023-04-24 17:22:24.259 [FATAL] database P0000149138 T0000000000000149480  sigterm_handler receive signal 8
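In fact the log had been pointing at the problem all along: the first FATAL line is the reason the server stopped. But what does it actually mean? It only clicked when an engineer from the vendor suggested looking at ts_id=7 in the dm.ctl file: tsid:7 in the log corresponds to ts_id=7 in dm.ctl. So I checked what the file actually contained: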
cat /dm/dmbak/dmctl.txt
#===============================================
# table space name
ts_name=nnc_data01
 # table space ID
ts_id=7
# table space status
ts_state=0
# table space cache
ts_cache=NORMAL
# DSC node number
ts_nth=0
# DSC optimized node number
ts_opt_node=0
# table space create time
ts_create_time=DATETIME '2022-3-16 13:36:27'
# table space modify time
ts_modify_time=DATETIME '2022-3-16 13:37:0'
# table space encrypt flag
ts_encrypt_flag=0
# table space copy num
ts_copy_num=0
# table space region size flag
ts_size_flag=0
#-----------------------------------------------
#===============================================
# table space name
ts_name=nnc_index01
 # table space ID
ts_id=8
# table space status
ts_state=0
# table space cache
ts_cache=NORMAL
# DSC node number
ts_nth=0
# DSC optimized node number
ts_opt_node=0
# table space create time
ts_create_time=DATETIME '2022-3-16 13:36:27'
# table space modify time
ts_modify_time=DATETIME '2022-3-16 13:37:46'
# table space encrypt flag
ts_encrypt_flag=0
# table space copy num
ts_copy_num=0
# table space region size flag
ts_size_flag=0
#-----------------------------------------------
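The tablespaces with ts_id=7 and ts_id=8 in the file have no corresponding DBF files in the current data directory. They were most likely tablespaces created from the database template and deleted again later.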
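Mapping the tsid from the FATAL line back to a tablespace name can be done without scrolling through the whole text dump. A minimal sketch, assuming the layout shown in the excerpt above (a ts_name= line followed by the ts_id= line of the same tablespace); ts_name_for_id is a hypothetical helper name:

```shell
#!/bin/sh
# ts_name_for_id FILE ID   (hypothetical helper)
# Prints the ts_name of the tablespace whose ts_id equals ID in a
# dmctlcvt text dump. Relies on ts_name= appearing before ts_id=
# within each tablespace section, as in the excerpt.
ts_name_for_id() {
    awk -v id="$2" '
        /^ts_name=/ { name = substr($0, 9) }                 # text after "ts_name="
        /^ts_id=/   { if (substr($0, 7) == id) print name }  # text after "ts_id="
    ' "$1"
}
```

With the dump from this article, `ts_name_for_id /dm/dmbak/dmctl.txt 7` should print nnc_data01, which can then be compared against the DBF files actually present in the data directory.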

[Screenshot: data directory contents]

So I edited /dm/dmbak/dmctl.txt, commented out the entire ts_id=7 and ts_id=8 sections, regenerated the dm.ctl control file, and started the service again.
vim /dm/dmbak/dmctl.txt
#===============================================
# table space name
#ts_name=nnc_data01
 # table space ID
#ts_id=7
# table space status
#ts_state=0
# table space cache
#ts_cache=NORMAL
# DSC node number
#ts_nth=0
# DSC optimized node number
#ts_opt_node=0
# table space create time
#ts_create_time=DATETIME '2022-3-16 13:36:27'
# table space modify time
#ts_modify_time=DATETIME '2022-3-16 13:37:0'
# table space encrypt flag
#ts_encrypt_flag=0
# table space copy num
#ts_copy_num=0
# table space region size flag
#ts_size_flag=0
#-----------------------------------------------
#===============================================
# table space name
#ts_name=nnc_index01
 # table space ID
#ts_id=8
# table space status
#ts_state=0
# table space cache
#ts_cache=NORMAL
# DSC node number
#ts_nth=0
# DSC optimized node number
#ts_opt_node=0
# table space create time
#ts_create_time=DATETIME '2022-3-16 13:36:27'
# table space modify time
#ts_modify_time=DATETIME '2022-3-16 13:37:46'
# table space encrypt flag
#ts_encrypt_flag=0
# table space copy num
#ts_copy_num=0
# table space region size flag
#ts_size_flag=0
#-----------------------------------------------
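The manual edit above can also be scripted, which helps when the dump is long. A minimal sketch, assuming every tablespace section starts with a line of '=' characters after a '#', as in the listing; comment_out_ts is a hypothetical helper name, and its output should be reviewed before it is fed back to dmctlcvt:

```shell
#!/bin/sh
# comment_out_ts FILE "ID ID ..."   (hypothetical helper)
# Prints FILE with every key=value line commented out in the sections
# whose ts_id is in the given list. Sections are buffered because
# ts_id appears after ts_name within a section.
comment_out_ts() {
    awk -v ids="$2" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) drop[a[i]] = 1 }
        function flush(   i) {
            for (i = 1; i <= cnt; i++) {
                line = buf[i]
                # Only touch lines that are not blank and not already comments.
                if (hit && line != "" && line !~ /^[ \t]*#/) line = "#" line
                print line
            }
            cnt = 0; hit = 0
        }
        /^#=+$/   { flush() }                 # a new section starts: emit the previous one
                  { buf[++cnt] = $0 }
        /^ts_id=/ { id = substr($0, 7); if (id in drop) hit = 1 }
        END       { flush() }
    ' "$1"
}
```

Usage might look like `comment_out_ts /dm/dmbak/dmctl.txt "7 8" > /dm/dmbak/dmctl.fixed.txt` (the output file name is only an example), followed by a diff against the original text dump.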
Regenerate the dm.ctl control file:
./dmctlcvt TYPE=2  SRC=/dm/dmbak/dmctl.txt  DEST=/dm/dmdata/DAMENG/dm.ctl

After starting the service the process runs normally and no longer stops by itself:
systemctl start DmServiceDMSERVER

ps -ef |grep dmserver |grep -v grep
dmdba   537156   1 21 12:50  ?  00:53:23 /home/dmdba/dmdbms/bin/dmserver path=/dm/dmdata/DAMENG/dm.ini -noconsole
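So learning to read the logs really matters: look at ERROR entries first, and if there are none, the first FATAL line is critical!!! The error message also has to be matched against what is in the configuration file and the control file. That is how you find the cause faster. That is all for this ops write-up; I hope it helps anyone who needs it.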
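That reading order can be sketched as a small shell function. It assumes the log marks severity as [ERROR] and [FATAL], as in the excerpts above, and a grep that supports -m (GNU and BSD grep do); first_error_line is a hypothetical helper name:

```shell
#!/bin/sh
# first_error_line LOGFILE   (hypothetical helper)
# Prints the first [ERROR] line of the log; if there is none,
# prints the first [FATAL] line instead.
first_error_line() {
    grep -m 1 -F '[ERROR]' "$1" || grep -m 1 -F '[FATAL]' "$1"
}
```

Run against the instance log from this case, it would have printed the "Fail to find file in current system. tsid:7, fileid:0" line straight away.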