Problem: during a full deployment of BlueKing Community Edition, installing bkdata fails with "databus.service.consul start failed."
[root@paas-1 install]# ./bkcec start bkdata
[192.168.50.117]20181212-091416 72 starting bkdata(ALL) on host: 192.168.50.115
"-":23: bad minute
errors in crontab file, can't install.
[192.168.50.117]20181212-091427 79 going to init snapshot data. this may take a while.
E
======================================================================
ERROR: init_snapshot_config (databus.tests.DatabusHealthTestCase)
------------------------------------------------------------------------------------------------------------------------------
Troubleshooting steps:
Running dig databus.service.consul resolves normally.
Running ./bkcec start bkdata databus reports "ERROR: init_snapshot_config (databus.tests.DatabusHealthTestCase)".
After ./bkcec stop bkdata, ran ./bkcec install bkdata 1 (removes the previous environment and overwrite-installs).
Ran ./bkcec initdata bkdata (initializes bkdata).
Ran ./bkcec start bkdata; it reports "databus.service.consul start failed." again.
Note: cat .bk_install.step shows the installation progress.
Resolution: restarting cmdb fixed the error.
Root cause:
bkdata failed to fetch the base business (biz) information from cmdb, which produced the error.
There is also a script bug:
(On the bkdata host, run vim /data/bkce/bkdata/dataapi/databus/tests.py and comment out the code that references the "update_bizid" field; this will be fixed in the next release.)
tests.py after the lines are commented out (screenshot not reproduced here):
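The manual vim edit can also be scripted. Below is a hypothetical sed one-liner that prefixes every line mentioning update_bizid with a Python comment marker; it is shown here against a small stand-in file rather than the real tests.py, since the exact file contents vary by version:

```shell
# Stand-in sample for /data/bkce/bkdata/dataapi/databus/tests.py
cat > /tmp/tests_sample.py <<'EOF'
def update_reserved_dataid(self):
    blueking_bizid = utils.get_blueking_bizid()
    self.update_bizid(blueking_bizid)
EOF
# Comment out every line that references update_bizid, preserving indentation
sed -i 's/^\([[:space:]]*\)\(.*update_bizid.*\)$/\1# \2/' /tmp/tests_sample.py
cat /tmp/tests_sample.py
```

Only the line calling update_bizid is commented out; the surrounding function definition is left intact.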
Other helpful commands:
Find the controller (central control) host:
[root@paas-1 install]# cat /data/install/.controller_ip
192.168.50.117
View logs:
[root@paas-1 install]# cd /data/bkce/logs/
[root@paas-1 logs]# ll
Load the ssh helper variables ($ denotes a shell variable; see cat /data/install/utils.fc):
[root@paas-1 install]# source utils.fc
[root@paas-1 install]# ssh $BKDATA_IP
#After logging in via ssh you can run ifconfig to check the host's IP. utils.fc is a script file; sourcing it lets you log in to hosts by service name instead of by IP.
[root@rbtnode1 install]# ssh $FTA_IP
List files with details:
[root@rbtnode1 bkdata]# ls -lsrt
Tail the log:
[root@rbtnode1 bkdata]# tail -f kernel.log
Check resource usage:
[root@rbtnode1 bkdata]# top
top - 09:35:26 up 18:53, 1 user, load average: 17.46, 12.08, 10.62
Tasks: 361 total, 1 running, 359 sleeping, 1 stopped, 0 zombie
%Cpu(s): 64.0 us, 14.2 sy, 0.0 ni, 20.8 id, 0.2 wa, 0.0 hi, 0.9 si, 0.0 st
KiB Mem : 16267340 total, 2454172 free, 12075284 used, 1737884 buff/cache
KiB Swap: 6160380 total, 6160380 free, 0 used. 3669524 avail Mem
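This capture shows a 1-minute load average of 17.46 with 64% user CPU and little free memory, which matches the resource-exhaustion cause described further below. Load average is only meaningful relative to the number of CPU cores; a quick comparison:

```shell
# Compare the 1-minute load average against the core count; a sustained
# load well above the number of cores means the host is CPU-saturated.
cores=$(nproc)
load1=$(uptime | sed 's/.*load average[s]*: *//' | cut -d, -f1)
echo "1-min load $load1 on $cores cores"
```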
Check the scheduled tasks (crontab): this confirms the services are being kept alive, and the configuration shows whether a cluster leader has been elected.
If the crontab contains garbled entries, clear it and then reinstall crontab with [root@rbtnode1 install]# ./bkcec install cron 1; the entries are written back automatically when the services start.
[root@rbtnode1 ~]# crontab -l
* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch consul >/dev/null 2>&1
* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch nginx >/dev/null 2>&1
* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch zk >/dev/null 2>&1
* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch rabbitmq >/dev/null 2>&1
* * * * * /usr/local/gse/agent/bin/gsectl watch
* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch paas_agent >/dev/null 2>&1
* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch es >/dev/null 2>&1
* * * * * export INSTALL_PATH=/data/bkce; /data/bkce/bin/process_watch kafka >/dev/null 2>&1
*/10 * * * * /data/bkce/bkdata/dataapi/bin/update_cc_cache.sh
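The '"-":23: bad minute' error at the top of this article means crond rejected a crontab line whose first (minute) field is invalid, which is exactly the garbled-entry case that ./bkcec install cron 1 repairs. A simplified, hypothetical validator for the minute field (real cron accepts a few more forms, such as named steps on ranges):

```shell
# Print "valid"/"invalid" for the minute field (first column) of a
# crontab line. Simplified pattern: *, */N, or digit lists/ranges.
check_minute() {
  first=$(printf '%s\n' "$1" | awk '{print $1}')
  if printf '%s\n' "$first" | grep -Eq '^(\*(/[0-9]+)?|[0-9]+([,-][0-9]+)*(/[0-9]+)?)$'; then
    echo valid
  else
    echo invalid
  fi
}
check_minute '* * * * * /data/bkce/bin/process_watch consul'    # valid
check_minute '*/10 * * * * /data/bkce/bkdata/dataapi/bin/update_cc_cache.sh'  # valid
check_minute '- a line like the one crond rejected'             # invalid
```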
Check processes:
[root@rbtnode1 ~]# ps -ef |grep bkdata
[root@rbtnode1 ~]# ps -ef |grep gse_agent
Note: the script-bug fix above is mainly for the following problem, encountered while initializing bkdata during BlueKing deployment:
Fix: on the bkdata host, run vim /data/bkce/bkdata/dataapi/databus/tests.py and comment out the code that references the "update_bizid" field.
Why: if left in place, that code consumes a large amount of host resources; the resulting performance bottleneck keeps the BlueKing services from starting.
[root@paas-1 install]# ./bkcec initdata bkdata
initdata for bkdata()
[192.168.50.117]20181212-101752 153 exec initdata_bkdata on 192.168.50.115
[192.168.50.115]20181212-101755 103 start to make migration for bkdata ...
[192.168.50.115]20181212-101755 111 on-migrate ... /data/bkce/bkdata/dataapi/on_migrate
[192.168.50.115]20181212-101757 9 init dataserver zk config
[192.168.50.115]20181212-101757 12 create topic
[192.168.50.115]20181212-101758 15 run trt migration
System check identified some issues:
WARNINGS:
trt.TrtResultTableField.field_index: (fields.W122) 'max_length' is ignored when used with IntegerField
HINT: Remove 'max_length' from field
Operations to perform:
Apply all migrations: trt
Running migrations:
No migrations to apply.
Your models have changes that are not yet reflected in a migration, and so won't be applied.
Run 'manage.py makemigrations' to make new migrations, and then re-run 'manage.py migrate' to apply them.
[192.168.50.115]20181212-101801 18 insert reserved dataid
E=================set reserved dataid========================================
======================================================================
ERROR: update_reserved_dataid (databus.tests.DatabusHealthTestCase)
------------------------------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
File "/data/bkce/bkdata/dataapi/databus/tests.py", line 46, in update_reserved_dataid
blueking_bizid = utils.get_blueking_bizid()
File "/data/bkce/bkdata/dataapi/databus/init/utils.py", line 19, in get_blueking_bizid
raise Exception('Failed to get application id of BlueKing. The response is error %s' % json.dumps(ret))
Exception: Failed to get application id of BlueKing. The response is error {"message": "Component request third-party system [CC] interface [get_app_list] error: Status Code: 404, Error Message: Third-party system does not find this interface, please try again later or contact component developer to handle this", "code": 1306201, "data": null, "result": false, "request_id": "47ef124353824f7a898900c0defc93e1"}
----------------------------------------------------------------------
Ran 1 test in 0.759s
FAILED (errors=1)
[192.168.50.115]20181212-101804 21 running 'update_reserved_dataid' for databus health test failed.
[192.168.50.115]20181212-101804 130 migrate failed for bkdata(dataapi)
[192.168.50.117]20181212-101803 453 create database bksuite_common
[192.168.50.117]20181212-101803 455 add version info to db
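The root of this failure is visible in the JSON the ESB returned inside the traceback: "result": false with code 1306201 (the CC interface get_app_list was not found). A quick way to pull the result flag out of such a response without extra tooling (the message is abridged here):

```shell
# Abridged copy of the ESB response from the traceback above;
# grep out the result flag to see success/failure at a glance.
ret='{"message": "Component request third-party system [CC] interface [get_app_list] error: ...", "code": 1306201, "data": null, "result": false}'
printf '%s\n' "$ret" | grep -o '"result": *[a-z]*'
```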
Environment details:
[root@paas-1 install]# cd /data/src/
[root@paas-1 src]# grep . VERSION */VERSION */*/VERSION
VERSION:4.1.16
cmdb/VERSION:0.0.42
fta/VERSION:4.1.12
gse/VERSION:3.2.12
job/VERSION:4.3.3
license/VERSION:3.1.4
open_paas/VERSION:3.0.83
paas_agent/VERSION:3.0.8
bkdata/dataapi/VERSION:1.2.105
bkdata/databus/VERSION:1.2.23
bkdata/monitor/VERSION:0.2.6
[root@paas-1 src]#
Note: this article is a personal summary of my recent study of BlueKing, for reference only.