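1. At first /var/log/ha-log reported that kkmail_postgresql could not be started. I found that the service file no longer existed under /etc/init.d/; it had turned into a temporary file. So I assumed that removing this service from /etc/ha.d/haresources would be enough.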

ResourceManager(default)[3839]: 2018/05/01_18:07:52 ERROR: Cannot locate resource script kkmail_postgresql
ResourceManager(default)[3839]: 2018/05/01_18:07:52 ERROR: Cannot locate resource script kkmail_postgresql
ResourceManager(default)[3839]: 2018/05/01_18:07:52 ERROR: Cannot locate resource script kkmail_postgresql
ResourceManager(default)[3839]: 2018/05/01_18:07:52 CRIT: Giving up resources due to failure of kkmail_postgresql

Note: /etc/ha.d/haresources originally listed the services kkmail_nginx kkmail_mysqld kkmail_app kkmail_postgresql.
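
The "Cannot locate resource script" error can be caught before a failover by checking that every service named in haresources still has a script on disk. Below is a minimal sketch, assuming heartbeat searches /etc/ha.d/resource.d and /etc/init.d for resource scripts; the function name missing_resource_scripts is made up for this note, and the service names are passed by hand rather than parsed from the file.

```shell
#!/bin/sh
# Hypothetical helper: print every name in "$2..." that has no executable
# script in any directory of the colon-separated list "$1".
missing_resource_scripts() {
    search_dirs=$1
    shift
    for name in "$@"; do
        found=no
        old_ifs=$IFS
        IFS=:
        for dir in $search_dirs; do
            if [ -x "$dir/$name" ]; then
                found=yes
            fi
        done
        IFS=$old_ifs
        [ "$found" = yes ] || echo "$name"
    done
}

# Example call (directories are an assumption about this heartbeat setup):
#   missing_resource_scripts /etc/ha.d/resource.d:/etc/init.d \
#       kkmail_nginx kkmail_mysqld kkmail_app kkmail_postgresql
```

An empty result means all listed scripts were found; any name printed is one that heartbeat would probably fail to locate.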



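2. After removing it and rebooting the server, things still did not work. /var/log/ha-log now reported that the app service failed to start, and starting the app service by hand failed as well.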

ResourceManager(default)[6339]: 2018/05/01_18:44:47 ERROR: Return code 2 from /etc/init.d/kkmail_app
ResourceManager(default)[6339]: 2018/05/01_18:44:47 CRIT: Giving up resources due to failure of kkmail_app
[root@hlt1 app]# /etc/init.d/kkmail_app start
authenticator: started
rulefilter: started
receiver: started
dispatcher: started
sizequerier: started
proxy_monitor: start failure exit_code=  1


We use supervisor for process monitoring. Several processes would not start, proxy_monitor among them.

[root@hlt1 app]# /usr/local/kkmail/app/engine/bin/supervisorctl -c /usr/local/kkmail/app/conf/supervisord.ini 
authenticator                    STOPPED    Not started
dispatcher                       STOPPED    Not started
ldap_monitor                     FATAL      Exited too quickly (process log may have details)
operation                        RUNNING    pid 26746, uptime 0:00:12
proxy_monitor                    STOPPED    Not started
receiver                         STOPPED    Not started
rulefilter                       STOPPED    Not started
search_manager                   BACKOFF    Exited too quickly (process log may have details)
service_listener                 FATAL      Exited too quickly (process log may have details)
sizequerier                      STOPPED    Not started
smtp_monitor                     FATAL      Exited too quickly (process log may have details)
task_monitor                     FATAL      Exited too quickly (process log may have details)
kkmail_clamd                      STARTING   
kkmail> restart all
kkmail_clamd: stopped
operation: stopped
sizequerier: started
authenticator: started
rulefilter: started
receiver: started
dispatcher: started
ldap_monitor: ERROR (abnormal termination)
proxy_monitor: ERROR (abnormal termination)
service_listener: ERROR (abnormal termination)
operation: started
smtp_monitor: ERROR (abnormal termination)
task_monitor: ERROR (abnormal termination)
search_manager: ERROR (abnormal termination)
kkmail_clamd: started
kkmail> exit


Note:
At first it would not seem to open, with the following error:
Error: Another program is already listening on a port that one of our HTTP servers is configured to use.  Shut this program down first before starting supervisord.
For help, use /usr/local/kkmail/app/engine/bin/supervisord -h
I then ran ps -ef |grep supervisord, did a kill -9 on the pid, and after that this command worked: /usr/local/kkmail/app/engine/bin/supervisord -c /usr/local/kkmail/app/conf/supervisord.ini
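
kill -9 worked here, but SIGKILL gives supervisord no chance to stop its child processes. A gentler variant, which is not what was done above, is to send SIGTERM first and fall back to SIGKILL only if the process is still there after a timeout. A minimal sketch; the function name stop_gracefully and the 10 second default are made up for this note.

```shell
#!/bin/sh
# Hypothetical helper: ask pid "$1" to exit with SIGTERM, poll for up to
# "$2" seconds (default 10), then fall back to SIGKILL.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}
    # If the signal cannot be delivered, assume the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$timeout" -gt 0 ]; do
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# Example call, with the pid taken from the ps -ef output:
#   stop_gracefully 12345 10
```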



3. To find out why it would not start, first turn on logging and take a look.

[root@hlt1 store]# cat /usr/local/kkmail/app/conf/supervisord.conf.d/proxy_monitor.ini 
; proxy_monitor
;
[program:proxy_monitor]
command=/usr/local/kkmail/app/exec/proxy_monitor
autostart=false
autorestart=true
startsecs=0
user=kkmail
stdout_logfile=NONE
stdout_logfile=/usr/local/kkmail/app/log/proxy_monitor.log   # enable this line
;stdout_logfile_maxbytes=50MB
;stdout_logfile_backups=5
stderr_logfile=NONE
stderr_logfile=/usr/local/kkmail/app/log/proxy_monitor.err   # enable this line
;stderr_logfile_maxbytes=50MB
;stderr_logfile_backups=5
[root@hlt1 store]# cat /usr/local/kkmail/app/log/proxy_monitor.log
2018-05-01 19:10:47 Proxy_Monitor[ERROR]: Traceback (most recent call last):
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:   File "/usr/local/kkmail/service/python/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:     result = self._run(*self.args, **self.kwargs)
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:   File "/usr/local/kkmail/app/src/proxy_monitor.py", line 49, in check_open_subprocess
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:   File "/usr/local/kkmail/app/src/lib/ModProxyRule.py", line 47, in is_proxy_open
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:   File "/usr/local/kkmail/app/src/lib/ModProxyRule.py", line 15, in get_proxy_cache
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:   File "/usr/local/kkmail/app/engine/lib/python2.7/site-packages/redis/client.py", line 1231, in ttl
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:     return self.execute_command('TTL', name)
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:   File "/usr/local/kkmail/app/engine/lib/python2.7/site-packages/redis/client.py", line 673, in execute_command
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:     connection.send_command(*args)
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:   File "/usr/local/kkmail/app/engine/lib/python2.7/site-packages/redis/connection.py", line 610, in send_command
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:     self.send_packed_command(self.pack_command(*args))
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:   File "/usr/local/kkmail/app/engine/lib/python2.7/site-packages/redis/connection.py", line 585, in send_packed_command
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:     self.connect()
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:   File "/usr/local/kkmail/app/engine/lib/python2.7/site-packages/redis/connection.py", line 489, in connect
2018-05-01 19:10:47 Proxy_Monitor[ERROR]:     raise ConnectionError(self._error_message(e))
2018-05-01 19:10:47 Proxy_Monitor[ERROR]: ConnectionError: Error 111 connecting to unix socket: /usr/local/kkmail/data/redis/redis.sock. Connection refused.
2018-05-01 19:10:47 Proxy_Monitor[ERROR]: <Greenlet at 0x7fba5cc4de10: check_open_subprocess> failed with ConnectionError
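
On Linux, error 111 is ECONNREFUSED. For a unix socket that usually means the socket file exists but no process is listening on it, for example a stale redis.sock left behind by a redis that is no longer running; a missing file would normally give "No such file or directory" instead. The sketch below only classifies the path; the function name socket_state is made up for this note, and the redis-cli call in the comment assumes redis-cli is on the PATH of this install.

```shell
#!/bin/sh
# Hypothetical helper: print "socket" if "$1" is a unix socket file,
# "not-a-socket" if something else exists there, "missing" otherwise.
socket_state() {
    if [ -S "$1" ]; then
        echo socket
    elif [ -e "$1" ]; then
        echo not-a-socket
    else
        echo missing
    fi
}

# Example calls:
#   socket_state /usr/local/kkmail/data/redis/redis.sock
#   redis-cli -s /usr/local/kkmail/data/redis/redis.sock ping
# A "socket" result together with a refused ping points at a stale socket
# file, i.e. redis itself is not running.
```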



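4. The error above looks like it is caused by redis not having started successfully. I then restarted mysql and app in turn, and after that restarted all processes in supervisor.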

[root@hlt1 supervisord.conf.d]# /usr/local/kkmail/app/engine/bin/supervisorctl -c /usr/local/kkmail/app/conf/supervisord.ini 
authenticator                    RUNNING    pid 35317, uptime 0:00:15
dispatcher                       RUNNING    pid 35431, uptime 0:00:14
ldap_monitor                     RUNNING    pid 35295, uptime 0:00:15
operation                        RUNNING    pid 35299, uptime 0:00:15
proxy_monitor                    RUNNING    pid 35503, uptime 0:00:13
receiver                         RUNNING    pid 35404, uptime 0:00:14
rulefilter                       RUNNING    pid 35375, uptime 0:00:15
search_manager                   RUNNING    pid 35298, uptime 0:00:15
service_listener                 RUNNING    pid 35297, uptime 0:00:15
sizequerier                      RUNNING    pid 35474, uptime 0:00:14
smtp_monitor                     RUNNING    pid 35301, uptime 0:00:15
task_monitor                     STARTING   
kkmail_clamd                      STARTING   
kkmail> restart all
ldap_monitor: stopped
proxy_monitor: stopped
kkmail_clamd: stopped
service_listener: stopped
sizequerier: stopped
authenticator: stopped
rulefilter: stopped
search_manager: stopped
receiver: stopped
operation: stopped
smtp_monitor: stopped
task_monitor: stopped
dispatcher: stopped
sizequerier: started
authenticator: started
rulefilter: started
receiver: started
dispatcher: started
ldap_monitor: started
proxy_monitor: started
service_listener: started
search_manager: started
operation: started
smtp_monitor: started
kkmail_clamd: started
task_monitor: started



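5. These problems were solved, but after rebooting the server redis still did not start at boot, and neither did apache (chkconfig shows nothing wrong). When redis does not come up, app cannot start, and when app cannot start, the mount fails.
Later I left only the kkmail_nginx service in /etc/ha.d/haresources, and after a reboot the mount worked. I then started app and mysqld by hand.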


Note:

As for why redis and apache are in the stopped state after boot, the pid files are worth checking.
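
One way to check a pid file is to see whether the pid stored in it still belongs to a live process. Some init scripts refuse to start a service when a leftover pid file is present, so a "stale" result is a possible explanation for a service that stays down after a reboot. A minimal sketch; the function name pidfile_state is made up for this note and the pid file paths in the comment are placeholders, not the real paths on these servers.

```shell
#!/bin/sh
# Hypothetical helper: print "running", "stale" or "no-pidfile" for the
# pid file "$1". Run as root: kill -0 also fails for processes owned by
# another user when the caller lacks permission.
pidfile_state() {
    if [ ! -r "$1" ]; then
        echo no-pidfile
        return 0
    fi
    pid=$(head -n 1 "$1" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*)
            # Empty or non-numeric content cannot name a live process.
            echo stale
            return 0
            ;;
    esac
    if kill -0 "$pid" 2>/dev/null; then
        echo running
    else
        echo stale
    fi
}

# Example calls (placeholder paths):
#   pidfile_state /path/to/redis.pid
#   pidfile_state /path/to/httpd.pid
```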



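6. The software on both servers was upgraded to the latest version, and switching back and forth by stopping the ha service tested OK.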