Analyzing and Fixing a PostgreSQL Startup Failure

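Many companies run open-source operating systems such as CentOS together with open-source databases such as MySQL or PostgreSQL. Unexpected errors often come up during installation and use, causing the installation or the startup to fail.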

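For example, gcc or a newer library may have to be installed first, or a security control may block the software from installing or running properly. Sometimes a system that has been running fine suddenly reports an error after a restart, and the cause is hard to pin down.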

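Over the past two days, a customer system that uses a PostgreSQL database in a primary/standby setup ran into data synchronization problems between primary and standby after a power outage, and the project manager asked me to help analyze and fix it. Being lazy, I picked up the primary/standby test and development environment that I had set up for their project team in 2017. That environment had been running continuously for a year and a half without any problems, and I wanted to use it to reproduce the customer's production failure. One test was to kill the primary's process directly; the other was to force a power-off and see whether the database would start normally and resume primary/standby synchronization automatically. Things did not go as planned: after the process was killed directly, the database would not start.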
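The first test can be sketched roughly as follows. This is only an illustration under assumptions: the helper name postmaster_pid is hypothetical, and the data directory /home/postgresql_data and the unit name postgresql-9.5 are taken from the startup log quoted later in this post. It relies on PostgreSQL writing the postmaster's PID to the first line of postmaster.pid in the data directory.

```shell
#!/bin/sh
# Print the postmaster PID, which PostgreSQL stores on the first line of
# postmaster.pid inside the data directory (hypothetical helper name).
postmaster_pid() {
    head -n 1 "$1/postmaster.pid"
}

# Possible usage on a TEST primary only (destructive, so left commented out):
#   kill -9 "$(postmaster_pid /home/postgresql_data)"   # simulate a crash
#   systemctl start postgresql-9.5                      # try to start it again
#   journalctl -u postgresql-9.5 -e                     # read that unit's log
```

Filtering the journal by unit with -u keeps unrelated messages out of the way, which may make the actual startup error easier to spot than in the full journalctl -xe output.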

After going through the error output for a while, I noticed ERROR: Unable to open policy //etc/selinux/targeted/policy/policy.30. in the journal:

[root@DB1 postgresql_data]# journalctl -xe

9 27 12:43:59 DB1 dbus[784]: [system] Activating service name='org.fedoraproject.Setroubleshootd' (using servicehelper)

9 27 12:43:59 DB1 dbus-daemon[784]: dbus[784]: [system] Activating service name='org.fedoraproject.Setroubleshootd' (using servicehelper)

9 27 12:43:59 DB1 dbus-daemon[784]: ERROR: policydb version 30 does not match my version range 15-29

9 27 12:43:59 DB1 dbus-daemon[784]: ERROR: Unable to open policy //etc/selinux/targeted/policy/policy.30.

9 27 12:43:59 DB1 python[9716]: detected unhandled Python exception in '/usr/sbin/setroubleshootd'

9 27 12:43:59 DB1 abrt-server[9721]: Not saving repeating crash in '/usr/sbin/setroubleshootd'

9 27 12:43:59 DB1 dbus-daemon[784]: Traceback (most recent call last):

9 27 12:43:59 DB1 dbus-daemon[784]: File "/usr/sbin/setroubleshootd", line 30, in <module>

9 27 12:43:59 DB1 dbus-daemon[784]: from setroubleshoot.util import log_debug

9 27 12:43:59 DB1 dbus-daemon[784]: File "/usr/lib64/python2.7/site-packages/setroubleshoot/util.py", line 291, in <module>

9 27 12:43:59 DB1 dbus-daemon[784]: from sepolicy import get_all_file_types

9 27 12:43:59 DB1 dbus-daemon[784]: File "/usr/lib64/python2.7/site-packages/sepolicy/__init__.py", line 798, in <module>

9 27 12:43:59 DB1 dbus-daemon[784]: raise e

9 27 12:43:59 DB1 dbus-daemon[784]: ValueError: Failed to read //etc/selinux/targeted/policy/policy.30 policy file

9 27 12:43:59 DB1 dbus[784]: [system] Activated service 'org.fedoraproject.Setroubleshootd' failed: Launch helper exited with unknown return code 1

9 27 12:43:59 DB1 dbus-daemon[784]: dbus[784]: [system] Activated service 'org.fedoraproject.Setroubleshootd' failed: Launch helper exited with unknown return code 1

9 27 12:43:59 DB1 dbus-daemon[784]: dbus[784]: [system] Activating service name='org.fedoraproject.Setroubleshootd' (using servicehelper)

9 27 12:43:59 DB1 dbus[784]: [system] Activating service name='org.fedoraproject.Setroubleshootd' (using servicehelper)

9 27 12:43:59 DB1 fprintd[9127]: ** Message: No devices in use, exit

9 27 12:43:59 DB1 dbus-daemon[784]: ERROR: policydb version 30 does not match my version range 15-29

9 27 12:43:59 DB1 dbus-daemon[784]: ERROR: Unable to open policy //etc/selinux/targeted/policy/policy.30.
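The messages above come from the SELinux troubleshooting service rather than from PostgreSQL itself. Before changing anything, it may help to look at which mode SELinux is configured for and which policy files exist. The following is a minimal sketch, assuming the standard /etc/selinux/config location; the helper name selinux_config_mode is hypothetical.

```shell
#!/bin/sh
# Print the SELINUX= mode from a config file (default: /etc/selinux/config).
# Hypothetical helper; the SELINUXTYPE= line does not match the pattern.
selinux_config_mode() {
    conf="${1:-/etc/selinux/config}"
    # Keep the last uncommented SELINUX= line in case there are several.
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$conf" | tail -n 1
}

# Possible usage on the affected host (left commented out):
#   getenforce                         # mode of the running system
#   selinux_config_mode                # mode configured for the next boot
#   ls /etc/selinux/targeted/policy/   # which policy.NN files are present
```

Note that editing SELINUX= in that file only takes effect after a reboot, whereas setenforce 0 switches a running system to permissive mode immediately.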


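Fixing SELinux properly is fairly involved and time was short, so I took the crude route and simply set SELINUX=disabled. When I then restarted PostgreSQL, it reported a permission problem on the data directory instead: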

9 27 12:37:26 DB1 systemd[1]: Starting PostgreSQL 9.5 database server...

9 27 12:37:26 DB1 pg_ctl[3560]: < 2018-09-27 12:37:26.211 CST >FATAL:  data directory "/home/postgresql_data" has group or world access

9 27 12:37:26 DB1 pg_ctl[3560]: < 2018-09-27 12:37:26.211 CST >DETAIL:  Permissions should be u=rwx (0700).

9 27 12:37:27 DB1 pg_ctl[3560]: pg_ctl: could not start server

9 27 12:37:27 DB1 pg_ctl[3560]: Examine the log output.

9 27 12:37:27 DB1 systemd[1]: postgresql-9.5.service: control process exited, code=exited status=1

9 27 12:37:27 DB1 systemd[1]: Failed to start PostgreSQL 9.5 database server.

9 27 12:37:27 DB1 systemd[1]: Unit postgresql-9.5.service entered failed state.

9 27 12:37:27 DB1 systemd[1]: postgresql-9.5.service failed.

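The error message is quite specific: the permissions on the PostgreSQL data directory had been changed and were no longer 0700 (access for the owner only).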

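After resetting the permissions with chmod 700 -R and restarting, the problem was solved and the database started successfully. This database had been running for nearly two years without incident, and yet this time it reported this error.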
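The check and the fix can be sketched as below. This is a minimal sketch, assuming GNU stat as shipped with CentOS; the helper names dir_mode and restrict_to_owner are hypothetical, and the path in the usage comment is the data directory named in the log above.

```shell
#!/bin/sh
# Print a directory's permission bits in octal (GNU stat syntax).
dir_mode() {
    stat -c '%a' "$1"
}

# Reset a data directory to owner-only access if its mode is not 700.
restrict_to_owner() {
    if [ "$(dir_mode "$1")" != "700" ]; then
        chmod 700 "$1"
    fi
}

# Possible usage (left commented out):
#   dir_mode /home/postgresql_data            # inspect the current mode
#   restrict_to_owner /home/postgresql_data   # fix it if needed
#   systemctl start postgresql-9.5
```

The FATAL message refers to the data directory itself, so changing the mode of that one directory should be enough to satisfy the check. The recursive form used here also works, but it additionally sets the owner-execute bit on every file underneath.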
