Ubuntu 18: installing the full set of OpenStack components with kolla-ansible
Continuously updated.
Preparation
- TLS handshake process
- RFC 8446 - The Transport Layer Security (TLS) Protocol Version 1.3
- Certificate authority - Wikipedia
- OpenStack Docs: Advanced Configuration
- Self-signed SSL certificates and how to trust them · Tech Adventures by Tarun Lalwani
- 4.14. Using Shared System Certificates Red Hat Enterprise Linux 7 | Red Hat Customer Portal
Deployment
- The official kolla-ansible documentation, which unlocks kolla-ansible's advanced usage: OpenStack Docs: Welcome to Kolla-Ansible's documentation!
- Official quick-start guide: OpenStack Docs: Quick Start
Deployment environment overview
ansible -i ~/kolla-config/kolla-config/multinode/multinode all -m shell -a "lsb_release -a && ip route | grep ens"
localhost | CHANGED | rc=0 >>
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
default via 10.10.1.254 dev ens33 proto static
10.10.1.0/24 dev ens33 proto kernel scope link src 10.10.1.241 No LSB modules are available.
10.10.1.243 | CHANGED | rc=0 >>
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
default via 10.10.1.254 dev ens33 proto static
10.10.1.0/24 dev ens33 proto kernel scope link src 10.10.1.243 No LSB modules are available.
10.10.1.241 | CHANGED | rc=0 >>
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
default via 10.10.1.254 dev ens33 proto static
10.10.1.0/24 dev ens33 proto kernel scope link src 10.10.1.241 No LSB modules are available.
10.10.1.242 | CHANGED | rc=0 >>
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
default via 10.10.1.254 dev ens33 proto static
10.10.1.0/24 dev ens33 proto kernel scope link src 10.10.1.242 No LSB modules are available.
Deployment process
Download kolla and kolla-ansible
Create a virtualenv and install ansible
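The exact commands for these two steps were not captured in these notes; a minimal sketch (the branch, paths, and virtualenv name are my assumptions):
# Clone kolla and kolla-ansible; pick the branch matching your target release
git clone https://github.com/openstack/kolla ~/kolla
git clone https://github.com/openstack/kolla-ansible ~/kolla-ansible
# Create a virtualenv for the deployment tooling
python3 -m venv ~/deploy
source ~/deploy/bin/activate
pip install -U pip 'ansible>=2.9,<2.10'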
- Modify the globals.yml and inventory files
My globals.yml and inventory files are here:
kolla-config/kolla-config/multinode at master · albertjone/kolla-config · GitHub
Generate passwords
cp ~/kolla-ansible/etc/kolla/passwords.yml /etc/kolla/passwords.yml
./kolla-ansible/tools/generate_passwords.py
Initialize the nodes (bootstrap-servers)
apt install sshpass -y
kolla-ansible -i ~/kolla-config/kolla-config/multinode/multinode bootstrap-servers
Generate certificates (certificates)
./kolla-ansible/tools/kolla-ansible -i ~/kolla-config/kolla-config/multinode/multinode certificates
Pre-deployment checks (prechecks)
./kolla-ansible/tools/kolla-ansible -i ~/kolla-config/kolla-config/multinode/multinode prechecks
Start the deployment
cd kolla-ansible/tools
./kolla-ansible -i ../../multinode deploy
Problems
SSH with a password does not work
The error:
TASK [Gathering Facts] ************************************************************************************************************************************************************************
fatal: [10.10.1.241]: FAILED! => {"msg": "Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this. Please add this host's fingerprint to your known_hosts file to manage this host."}
Solution
(build) root@steveguan-1:~# ssh root@10.10.1.241
The authenticity of host '10.10.1.241 (10.10.1.241)' can't be established.
ECDSA key fingerprint is SHA256:DKscpTjQfSK4mT+DmZPU0BKAq80ORJUE+94BSwF5foE.
Are you sure you want to continue connecting (yes/no)? yes
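Accepting the fingerprint manually works, one host at a time; alternatively (my addition, not from the original notes), host key checking can be disabled for the bootstrap run:
# Disable Ansible host key checking for this shell session (use with care)
export ANSIBLE_HOST_KEY_CHECKING=False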
ERROR! 'listen' is not a valid attribute for a HandlerTaskInclude
Running prechecks produced the following error:
(build) root@steveguan-1:~# ./kolla-ansible/tools/kolla-ansible -i ~/kolla-config/kolla-config/multinode/multinode prechecks
Pre-deployment checking : ansible-playbook -i /root/kolla-config/kolla-config/multinode/multinode -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla -e kolla_action=precheck /root/kolla-ansible/ansible/site.yml
[DEPRECATION WARNING]: The TRANSFORM_INVALID_GROUP_CHARS settings is set to allow bad characters in group names by default, this will change, but still be user configurable on deprecation.
This feature will be removed in version 2.10. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
ERROR! 'listen' is not a valid attribute for a HandlerTaskInclude
The error appears to be in '/root/kolla-ansible/ansible/roles/mariadb/handlers/main.yml': line 66, column 3, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
- name: Restart MariaDB on existing cluster members
^ here
Per https://github.com/ansible/ansible/issues/56580, the fix is to downgrade ansible:
pip install ansible==2.9
Failed to start mariadb
When the run reached mariadb, MariaDB failed to start:
TASK [mariadb : Fail on existing but stopped cluster] *****************************************************************************************************************
fatal: [10.10.1.241]: FAILED! => {"changed": false, "msg": "MariaDB cluster exists but is stopped. Please start it using kolla-ansible mariadb_recovery"}
fatal: [10.10.1.242]: FAILED! => {"changed": false, "msg": "MariaDB cluster exists but is stopped. Please start it using kolla-ansible mariadb_recovery"}
fatal: [10.10.1.243]: FAILED! => {"changed": false, "msg": "MariaDB cluster exists but is stopped. Please start it using kolla-ansible mariadb_recovery"}
ansible -i ~/kolla-config/kolla-config/multinode/multinode compute -m shell -a "docker volume rm mariadb"
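Removing the volume wipes the stale cluster state so the next deploy bootstraps MariaDB from scratch (this destroys any existing database contents). The gentler route suggested by the error message itself, and used again later in this post, is:
./kolla-ansible/tools/kolla-ansible -i ~/kolla-config/kolla-config/multinode/multinode mariadb_recovery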
Failed to start rabbitmq
RUNNING HANDLER [rabbitmq : Waiting for rabbitmq to start on first node] **********************************************************************************************
fatal: [10.10.1.241]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "outward_rabbitmq", "rabbitmqctl", "wait", "/var/lib/rabbitmq/mnesia/rabbitmq.pid"], "delta": "0:00:58.742540", "end": "2020-08-02 16:36:49.006539", "msg": "non-zero return code", "rc": 69, "start": "2020-08-02 16:35:50.263999", "stderr": "Error: unable to perform an operation on node 'rabbit@steveguan-1'. Please see diagnostics information and suggestions below.\n\nMost common reasons for this are:\n\n * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)\n * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)\n * Target node is not running\n\nIn addition to the diagnostics info below:\n\n * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more\n * Consult server logs on node rabbit@steveguan-1\n * If target node is configured to use long node names, don't forget to use --longnames with CLI tools\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit@steveguan-1']\n\nrabbit@steveguan-1:\n * connected to epmd (port 4371) on steveguan-1\n * epmd reports: node 'rabbit' not running at all\n no other nodes on steveguan-1\n * suggestion: start the node\n\nCurrent node details:\n * node name: 'rabbitmqcli-64-rabbit@steveguan-1'\n * effective user's home directory: /var/lib/rabbitmq\n * Erlang cookie hash: zhg6nXg1QG3gM3mMbAwjgw==", "stderr_lines": ["Error: unable to perform an operation on node 'rabbit@steveguan-1'. Please see diagnostics information and suggestions below.", "", "Most common reasons for this are:", "", " * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)", " * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)", " * Target node is not running", "", "In addition to the diagnostics info below:", "", " * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more", " * Consult server logs on node rabbit@steveguan-1", " * If target node is configured to use long node names, don't forget to use --longnames with CLI tools", "", "DIAGNOSTICS", "===========", "", "attempted to contact: ['rabbit@steveguan-1']", "", "rabbit@steveguan-1:", " * connected to epmd (port 4371) on steveguan-1", " * epmd reports: node 'rabbit' not running at all", " no other nodes on steveguan-1", " * suggestion: start the node", "", "Current node details:", " * node name: 'rabbitmqcli-64-rabbit@steveguan-1'", " * effective user's home directory: /var/lib/rabbitmq", " * Erlang cookie hash: zhg6nXg1QG3gM3mMbAwjgw=="], "stdout": "Waiting for pid file '/var/lib/rabbitmq/mnesia/rabbitmq.pid' to appear\npid is 26\nWaiting for erlang distribution on node 'rabbit@steveguan-1' while OS process '26' is running\nWaiting for applications 'rabbit_and_plugins' to start on node 'rabbit@steveguan-1'", "stdout_lines": ["Waiting for pid file '/var/lib/rabbitmq/mnesia/rabbitmq.pid' to appear", "pid is 26", "Waiting for erlang distribution on node 'rabbit@steveguan-1' while OS process '26' is running", "Waiting for applications 'rabbit_and_plugins' to start on node 'rabbit@steveguan-1'"]}
ansible -i ~/kolla-config/kolla-config/multinode/multinode compute -m shell -a "docker exec -u root -it rabbitmq rabbitmq-plugins disable rabbitmq_prometheus"
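A quick check that the plugin is really off after the restart (my addition):
# List rabbitmq plugins and confirm prometheus is no longer enabled
ansible -i ~/kolla-config/kolla-config/multinode/multinode compute -m shell -a "docker exec rabbitmq rabbitmq-plugins list | grep prometheus"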
/usr/sbin/haproxy -W -db -p /run/haproxy.pid -f /etc/haproxy/haproxy.cfg
/usr/sbin/haproxy -db -p /run/haproxy.pid -f /etc/haproxy/haproxy.cfg
/usr/share/elasticsearch/bin/elasticsearch
docker restart elasticsearch && docker exec -u elasticsearch -it elasticsearch bash
docker restart elasticsearch && docker ps | grep elasticsearch
mkdir /usr/share/elasticsearch/config &&
cp /etc/elasticsearch/{elasticsearch.yml,jvm.options,log4j2.properties} /usr/share/elasticsearch/config && /usr/share/elasticsearch/bin/elasticsearch
export ES_PATH_CONF=/etc/elasticsearch; /usr/share/elasticsearch/bin/elasticsearch
mkdir /usr/share/elasticsearch/config
ls -sR /etc/elasticsearch /usr/share/elasticsearch/config
Invalid index name
"reason": "Invalid index name [.kibana], already exists as alias"
The error above can also be reproduced with the curl commands below.
References:
- What is an Elasticsearch Index? | Elastic Blog
- List all indices | Elasticsearch Reference [7.6] | Elastic
- Delete Index | Elasticsearch Reference [6.0] | Elastic
curl -H 'Content-Type: application/json' \
-X PUT https://10.10.1.205:9200/.kibana \
-d '{"index.mapper.dynamic": "true"}'
# get one index
curl -H 'Content-Type: application/json' \
-X GET https://10.10.1.205:9200/.kibana
# get all alias
curl -H 'Content-Type: application/json' \
-X GET https://10.10.1.205:9200/_alias
# get index alias
curl -H 'Content-Type: application/json' \
-X GET https://10.10.1.205:9200/.kibana/_alias
# check index exists
curl -I https://10.10.1.205:9200/.kibana?pretty
# remove index
curl -H 'Content-Type: application/json' \
-X DELETE https://10.10.1.205:9200/.kibana
curl -H 'Content-Type: application/json' \
-X DELETE https://10.10.1.205:9200/_all
# for test
curl -H 'Content-Type: application/json' \
-X PUT https://10.10.1.205:9200/.xiaojue \
-d '{"index.mapper.dynamic": "true"}'
Self-Signed Certificates failed
Bug #1875561 "Self-Signed Certificates failed" : Bugs : kolla-ansible. According to the Red Hat documentation (Chapter 5. Using shared system certificates, Red Hat Enterprise Linux 8 | Red Hat Customer Portal), the approach in OpenStack Docs: Advanced Configuration should work, but after following those instructions both inside the containers and on my own CentOS test system, curl still cannot fetch data without -k.
./kolla-ansible -i /root/multinode deploy -t kibana
kolla-toolbox cannot reach elasticsearch on port 9200
Inside the kolla-toolbox container, curl -k https://10.10.1.205:9200/.kibana fails.
The cause was that the following file existed at deploy time (Docker injects these proxy settings as environment variables into every container it creates, so in-container requests to internal addresses were routed to the proxy):
cat /root/.docker/config.json
{
"proxies":
{
"default":
{
"httpProxy": "http://10.10.1.100:1087",
"httpsProxy": "http://10.10.1.100:1087",
"noProxy": "*.test.example.com,.example2.com"
}
}
}
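A quick way to confirm a container inherited proxy variables (my addition; kolla_toolbox is just an example container):
# Inspect a running container's environment for injected proxy settings
docker inspect kolla_toolbox --format '{{.Config.Env}}' | tr ' ' '\n' | grep -i proxy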
Solution: remove the proxies section from /root/.docker/config.json, then destroy and redeploy:
cd kolla-ansible/tools
./kolla-ansible -i ../../multinode destroy --yes-i-really-really-mean-it
./kolla-ansible -i ../../multinode deploy
failed TASK [kibana : Change kibana config to set index as defaultIndex]
Error message:
fatal: [10.10.1.201]: FAILED! => {"action": "uri", "changed": false, "connection": "close", "content": "{\"error\":{\"root_cause\":[{\"type\":\"illegal_argument_exception\",\"reason\":\"Rejecting mapping update to [.kibana_1] as the final mapping would have more than 1 type: [doc, config]\"}],\"type\":\"illegal_argument_exception\",\"reason\":\"Rejecting mapping update to [.kibana_1] as the final mapping would have more than 1 type: [doc, config]\"},\"status\":400}", "content_length": "343", "content_type": "application/json; charset=UTF-8", "elapsed": 0, "json": {"error": {"reason": "Rejecting mapping update to [.kibana_1] as the final mapping would have more than 1 type: [doc, config]", "root_cause": [{"reason": "Rejecting mapping update to [.kibana_1] as the final mapping would have more than 1 type: [doc, config]", "type": "illegal_argument_exception"}], "type": "illegal_argument_exception"}, "status": 400}, "msg": "Status code was 400 and not [200, 201]: HTTP Error 400: Bad Request", "redirected": false, "status": 400, "url": "https://10.10.1.205:9200/.kibana/config/*"}
# get version
(kolla-toolbox)[root@k-node1 /]# curl https://10.10.1.205:9200 -k
{
"name" : "10.10.1.201",
"cluster_name" : "kolla_logging",
"cluster_uuid" : "iAFzv7jlSsm0eWnT3s3iaw",
"version" : {
"number" : "6.8.8",
"build_flavor" : "oss",
"build_type" : "rpm",
"build_hash" : "2f4c224",
"build_date" : "2020-03-18T23:22:18.622755Z",
"build_snapshot" : false,
"lucene_version" : "7.7.2",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
# create test index
curl -H 'Content-Type: application/json' \
-X PUT \
http://172.16.50.247:9200/.test/
# get kibana index info
curl -H 'Content-Type: application/json' \
-X GET \
http://172.16.50.247:9200/.kibana/
curl -H 'Content-Type: application/json' \
-X GET \
http://172.16.50.247:9200/.test/
# get .kibana config
curl -H 'Content-Type: application/json' \
-X GET \
https://10.10.1.205:9200/.kibana/config/* \
-k
{
"_index": ".kibana_1",
"_type": "config",
"_id": "*",
"found": false
}
# get .kibana index-pattern
curl -H 'Content-Type: application/json' \
-X GET \
https://10.10.1.205:9200/.kibana/index-pattern/
# set default kibana index-pattern
curl -H 'Content-Type: application/json' \
-X PUT \
https://10.10.1.205:9200/.kibana/config/* \
-d '{"defaultIndex": "flog-*"}' -k
curl -H 'Content-Type: application/json' \
-X PUT \
https://10.10.1.205:9200/.kibana/_settings \
-d '{
"changes": {
"defaultIndex": "flog-*"
}
}'
curl -H 'Content-Type: application/json' \
-X PUT \
https://10.10.1.205:9200/.kibana/_mapping \
-d '
{
"properties": {
"defaultIndex": {
"type": "keyword",
"index": false
}
}
}'
curl -H 'Content-Type: application/json' \
-X PUT \
https://10.10.1.205:9200/.kibana/settings/defaultIndex \
-d '{"value": "flog-*"}' -k
curl -XGET "https://10.10.1.205:9200/_cat/indices"
curl -XGET "https://10.10.1.205:9200/.kibana"
ironic fails
TASK [ironic : Copying ironic-agent kernel and initramfs (iPXE)] ******************************************************************************************************************************
task path: /root/kolla-ansible/ansible/roles/ironic/tasks/config.yml:171
<10.10.1.201> ESTABLISH SSH CONNECTION FOR USER: root
<10.10.1.201> SSH: EXEC sshpass -d12 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/d5eb36b79d 10.10.1.201 '/bin/sh -c '"'"'echo ~root && sleep 0'"'"''
<10.10.1.201> (0, b'/root\n', b'')
<10.10.1.201> ESTABLISH SSH CONNECTION FOR USER: root
<10.10.1.201> SSH: EXEC sshpass -d12 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/d5eb36b79d 10.10.1.201 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir /root/.ansible/tmp/ansible-tmp-1589006790.8853176-117770-146080183204827 && echo ansible-tmp-1589006790.8853176-117770-146080183204827="` echo /root/.ansible/tmp/ansible-tmp-1589006790.8853176-117770-146080183204827 `" ) && sleep 0'"'"''
<10.10.1.201> (0, b'ansible-tmp-1589006790.8853176-117770-146080183204827=/root/.ansible/tmp/ansible-tmp-1589006790.8853176-117770-146080183204827\n', b'')
<10.10.1.201> ESTABLISH SSH CONNECTION FOR USER: root
<10.10.1.201> SSH: EXEC sshpass -d12 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/d5eb36b79d 10.10.1.201 '/bin/sh -c '"'"'rm -f -r /root/.ansible/tmp/ansible-tmp-1589006790.8853176-117770-146080183204827/ > /dev/null 2>&1 && sleep 0'"'"''
<10.10.1.201> (0, b'', b'')
The full traceback is:
Traceback (most recent call last):
File "/root/deploy/lib/python3.7/site-packages/ansible/plugins/action/copy.py", line 464, in run
source = self._find_needle('files', source)
File "/root/deploy/lib/python3.7/site-packages/ansible/plugins/action/__init__.py", line 1178, in _find_needle
return self._loader.path_dwim_relative_stack(path_stack, dirname, needle)
File "/root/deploy/lib/python3.7/site-packages/ansible/parsing/dataloader.py", line 327, in path_dwim_relative_stack
raise AnsibleFileNotFound(file_name=source, paths=[to_native(p) for p in search])
ansible.errors.AnsibleFileNotFound: Could not find or access '/etc/kolla/config/ironic/ironic-agent.kernel' on the Ansible Controller.
If you are using a module and expect the file to exist on the remote, see the remote_src option
failed: [10.10.1.201] (item=ironic-agent.kernel) => {
"ansible_loop_var": "item",
"changed": false,
"invocation": {
"dest": "/etc/kolla/ironic-ipxe/ironic-agent.kernel",
"mode": "0660",
"module_args": {
"dest": "/etc/kolla/ironic-ipxe/ironic-agent.kernel",
"mode": "0660",
"src": "/etc/kolla/config/ironic/ironic-agent.kernel"
},
"src": "/etc/kolla/config/ironic/ironic-agent.kernel"
},
"item": "ironic-agent.kernel",
"msg": "Could not find or access '/etc/kolla/config/ironic/ironic-agent.kernel' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"
}
Solution, per OpenStack Docs: Ironic in Kolla:
mkdir /etc/kolla/config/ironic/
curl https://tarballs.openstack.org/ironic-python-agent/coreos/files/coreos_production_pxe.vmlinuz \
-o /etc/kolla/config/ironic/ironic-agent.kernel -L
curl https://tarballs.openstack.org/ironic-python-agent/coreos/files/coreos_production_pxe_image-oem.cpio.gz \
-o /etc/kolla/config/ironic/ironic-agent.initramfs -L
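Once the kernel and initramfs are in place, the failed part can be re-run (re-running by tag is my assumption, mirroring the -t kibana usage above):
# Verify the agent images exist, then redeploy only ironic
ls -lh /etc/kolla/config/ironic/
./kolla-ansible/tools/kolla-ansible -i ~/kolla-config/kolla-config/multinode/multinode deploy -t ironic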
murano fails
"docker", "exec", "murano_api", "murano", "--os-username", "admin", "--os-password", "uxlNttg9tklN88M5FH7SaWNq6UXZ5sBOIsyHZnh0", "--os-project-name", "admin", "--os-cacert", "/etc/pki/ca-trust/source/anchors/kolla-customca-haproxy-internal.crt", "--os-auth-url", "https://10.10.1.205:35357", "--murano-url", "https://10.10.1.201:8082", "package-list"
The database went down
See Two-Node Clusters — Galera Cluster Documentation
docker exec -u mysql -it mariadb galera_new_cluster
docker exec -u mysql -it mariadb mysqld_safe --wsrep-recover
/usr/bin/mysqld_safe
docker exec -u mysql -it mariadb bash
docker exec -u root -it mariadb bash
docker exec -u root -it mariadb \
mysql -u root -p3rWYF9UTz9hkegcSOeyjtuvCWKDAFhaIXXOLlvmw
docker exec -u root -it mariadb \
mysql -V
tail -f /var/lib/docker/volumes/kolla_logs/_data/mariadb/mariadb.log
show status like "%wsrep%";
+-------------------------------+----------------------+
| Variable_name | Value |
+-------------------------------+----------------------+
| wsrep_applier_thread_count | 0 |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 0 |
| wsrep_cluster_state_uuid | |
| wsrep_cluster_status | Disconnected |
| wsrep_connected | OFF |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 18446744073709551615 |
| wsrep_provider_name | |
| wsrep_provider_vendor | |
| wsrep_provider_version | |
| wsrep_ready | OFF |
| wsrep_rollbacker_thread_count | 0 |
| wsrep_thread_count | 0 |
+-------------------------------+----------------------+
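For comparison (general Galera knowledge, not from this log): a healthy node reports wsrep_connected=ON, wsrep_ready=ON, wsrep_cluster_status=Primary, and wsrep_cluster_size equal to the node count. A non-interactive check ($DB_ROOT_PASS is a placeholder for the value in passwords.yml):
# Dump wsrep status from outside the container
docker exec mariadb mysql -uroot -p"$DB_ROOT_PASS" -e 'SHOW STATUS LIKE "wsrep_%";'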
docker exec -u root -it mariadb mysql -V
ansible -i ~/multinode chrony -m shell -a "docker ps | grep mariadb"
cd ~/kolla-ansible/tools
./kolla-ansible -i ~/multinode mariadb_recovery
OVS fails to install
The full traceback is:
WARNING: The below traceback may *not* be related to the actual failure.
File "/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py", line 1024, in main
File "/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py", line 747, in recreate_or_restart_container
File "/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py", line 765, in start_container
File "/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py", line 571, in pull_image
File "/usr/local/lib/python2.7/dist-packages/docker/api/image.py", line 415, in pull
self._raise_for_status(response)
File "/usr/local/lib/python2.7/dist-packages/docker/api/client.py", line 263, in _raise_for_status
raise create_api_error_from_http_exception(e)
File "/usr/local/lib/python2.7/dist-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
raise cls(e, response=response, explanation=explanation)
fatal: [10.10.1.201]: FAILED! => {
"changed": true,
"invocation": {
"module_args": {
"action": "recreate_or_restart_container",
"api_version": "auto",
"auth_email": null,
"auth_password": null,
"auth_registry": "10.10.1.201:4000",
"auth_username": null,
"cap_add": [],
"client_timeout": 120,
"command": null,
"detach": true,
"dimensions": {},
"environment": {
"KOLLA_CONFIG_STRATEGY": "COPY_ALWAYS"
},
"graceful_timeout": 10,
"image": "10.10.1.201:4000/kolla/centos-source-ovsdpdk-db:fe6fd8dc5d",
"labels": {},
"name": "ovsdpdk_db",
"privileged": false,
"remove_on_exit": true,
"restart_policy": "unless-stopped",
"restart_retries": 10,
"security_opt": [],
"state": "running",
"tls_cacert": null,
"tls_cert": null,
"tls_key": null,
"tls_verify": false,
"tty": false,
"volumes": [
"/etc/kolla/ovsdpdk-db/:/var/lib/kolla/config_files/:ro",
"/etc/localtime:/etc/localtime:ro",
"",
"/run/openvswitch:/run/openvswitch:shared",
"kolla_logs:/var/log/kolla/",
"ovsdpdk_db:/var/lib/openvswitch/"
],
"volumes_from": null
}
},
"msg": "'Traceback (most recent call last):\\n File \"/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py\", line 1024, in main\\n File \"/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py\", line 747, in recreate_or_restart_container\\n File \"/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py\", line 765, in start_container\\n File \"/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py\", line 571, in pull_image\\n File \"/usr/local/lib/python2.7/dist-packages/docker/api/image.py\", line 415, in pull\\n self._raise_for_status(response)\\n File \"/usr/local/lib/python2.7/dist-packages/docker/api/client.py\", line 263, in _raise_for_status\\n raise create_api_error_from_http_exception(e)\\n File \"/usr/local/lib/python2.7/dist-packages/docker/errors.py\", line 31, in create_api_error_from_http_exception\\n raise cls(e, response=response, explanation=explanation)\\nNotFound: 404 Client Error: Not Found (\"manifest for 10.10.1.201:4000/kolla/centos-source-ovsdpdk-db:fe6fd8dc5d not found: manifest unknown: manifest unknown\")\\n'"
}
Because CentOS currently has no package source for ovs-dpdk, CentOS-based kolla images for it cannot be built for now.
Solution:
Use the Ubuntu images instead:
docker pull kolla/ubuntu-source-ovsdpdk:8.0.3
docker pull kolla/ubuntu-source-ovsdpdk-db:8.0.3
docker pull kolla/ubuntu-source-ovsdpdk-vswitchd:8.0.3
docker tag kolla/ubuntu-source-ovsdpdk:8.0.3 10.10.1.201:4000/kolla/centos-source-ovsdpdk:fe6fd8dc5d
docker tag kolla/ubuntu-source-ovsdpdk-db:8.0.3 10.10.1.201:4000/kolla/centos-source-ovsdpdk-db:fe6fd8dc5d
docker tag kolla/ubuntu-source-ovsdpdk-vswitchd:8.0.3 10.10.1.201:4000/kolla/centos-source-ovsdpdk-vswitchd:fe6fd8dc5d
docker push 10.10.1.201:4000/kolla/centos-source-ovsdpdk:fe6fd8dc5d
docker push 10.10.1.201:4000/kolla/centos-source-ovsdpdk-db:fe6fd8dc5d
docker push 10.10.1.201:4000/kolla/centos-source-ovsdpdk-vswitchd:fe6fd8dc5d
The outward_rabbitmq container keeps restarting on a single node
Enter a healthy instance of the container and inspect the process:
docker exec -u root -it outward_rabbitmq bash
ps -ef | cat
/usr/lib64/erlang/erts-10.7.1/bin/beam.smp \
-W w -A 64 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -P 1048576 -t 5000000 \
-stbt db -zdbbl 128000 -K true -B i \
-- -root /usr/lib64/erlang -progname erl \
-- -home /var/lib/rabbitmq -epmd_port 4371 \
-- -pa \
/usr/lib/rabbitmq/lib/rabbitmq_server-3.8.3/ebin \
-noshell -noinput -s rabbit boot -sname rabbit@k-node1 -boot start_sasl \
-conf /etc/rabbitmq/rabbitmq.conf -conf_dir /var/lib/rabbitmq/config \
-conf_script_dir /usr/lib/rabbitmq/bin \
-conf_schema_dir /var/lib/rabbitmq/schema \
-conf_advanced /etc/rabbitmq/advanced.config \
-kernel inet_default_connect_options [{nodelay,true}] \
-kernel inetrc '/etc/rabbitmq/erl_inetrc' \
-sasl errlog_type error -sasl sasl_error_logger false \
-rabbit lager_log_root "/var/log/kolla/outward_rabbitmq" \
-rabbit lager_default_file "/var/log/kolla/outward_rabbitmq/rabbit@k-node1.log" \
-rabbit lager_upgrade_file "/var/log/kolla/outward_rabbitmq/rabbit@k-node1_upgrade.log" \
-rabbit feature_flags_file "/var/lib/rabbitmq/mnesia/rabbit@k-node1-feature_flags" \
-rabbit enabled_plugins_file "/etc/rabbitmq/enabled_plugins" \
-rabbit plugins_dir "/usr/lib/rabbitmq/plugins:/usr/lib/rabbitmq/lib/rabbitmq_server-3.8.3/plugins" \
-rabbit plugins_expand_dir "/var/lib/rabbitmq/mnesia/rabbit@k-node1-plugins-expand" \
-os_mon start_cpu_sup false \
-os_mon start_disksup false \
-os_mon start_memsup false \
-mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@k-node1" \
-ra data_dir "/var/lib/rabbitmq/mnesia/rabbit@k-node1/quorum" \
-kernel inet_dist_listen_min 25674 -kernel inet_dist_listen_max 25674 --
The container's start command:
/usr/sbin/rabbitmq-server
Manually starting the service inside the container
Edit the failing container's start file /etc/kolla/outward_rabbitmq/config.json as follows:
{
"command": "/usr/sbin/rabbitmq-server",
"command": "sleep infinity", # added line; the later duplicate key takes effect, so the container idles instead of crash-looping
...
}
Restart the container:
docker restart outward_rabbitmq
Enter the container and start the service by hand:
docker exec -u root -it outward_rabbitmq bash
/usr/sbin/rabbitmq-server
This captures part of the failure log:
## ## RabbitMQ 3.8.3
## ##
########## Copyright (c) 2007-2020 Pivotal Software, Inc.
###### ##
########## Licensed under the MPL 1.1. Website: https://rabbitmq.com
Doc guides: https://rabbitmq.com/documentation.html
Support: https://rabbitmq.com/contact.html
Tutorials: https://rabbitmq.com/getstarted.html
Monitoring: https://rabbitmq.com/monitoring.html
Logs: /var/log/kolla/outward_rabbitmq/rabbit@k-node2.log
/var/log/kolla/outward_rabbitmq/rabbit@k-node2_upgrade.log
Config file(s): /etc/rabbitmq/rabbitmq.conf
Starting broker...
BOOT FAILED
===========
Error description:
init:do_boot/3
init:start_em/1
rabbit:start_it/1 line 484
rabbit:broker_start/1 line 360
rabbit:start_loaded_apps/2 line 613
app_utils:manage_applications/6 line 126
lists:foldl/3 line 1263
rabbit:'-handle_app_error/1-fun-0-'/3 line 736
throw:{could_not_start,rabbitmq_prometheus,
{rabbitmq_prometheus,
{bad_return,
{{rabbit_prometheus_app,start,[normal,[]]},
{'EXIT',
{{could_not_start_listener,
[{port,15692},{protocol,'http/prometheus'}],
{shutdown,
{failed_to_start_child,ranch_acceptors_sup,
{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}}}},
{gen_server,call,
[rabbit_web_dispatch_registry,
{add,rabbitmq_prometheus_tcp,
[{port,15692},{protocol,'http/prometheus'}],
#Fun<rabbit_web_dispatch.0.73002970>,
[{'_',[],
[{[<<"metrics">>],[],rabbit_prometheus_handler,[]},
{[<<"metrics">>,registry],
[],rabbit_prometheus_handler,[]}]}],
{[],"RabbitMQ Prometheus"}},
infinity]}}}}}}}
Log file(s) (may contain more information):
/var/log/kolla/outward_rabbitmq/rabbit@k-node2.log
/var/log/kolla/outward_rabbitmq/rabbit@k-node2_upgrade.log
{"init terminating in do_boot",{could_not_start,rabbitmq_prometheus,{rabbitmq_prometheus,{bad_return,{{rabbit_prometheus_app,start,[normal,[]]},{'EXIT',{{could_not_start_listener,[{port,15692},{protocol,'http/prometheus'}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}}}},{gen_server,call,[rabbit_web_dispatch_registry,{add,rabbitmq_prometheus_tcp,[{port,15692},{protocol,'http/prometheus'}],#Fun<rabbit_web_dispatch.0.73002970>,[{'_',[],[{[<<"metrics">>],[],rabbit_prometheus_handler,[]},{[<<"metrics">>,registry],[],rabbit_prometheus_handler,[]}]}],{[],"RabbitMQ Prometheus"}},infinity]}}}}}}}}
init terminating in do_boot ({could_not_start,rabbitmq_prometheus,{rabbitmq_prometheus,{bad_return,{{_},{_}}}}})
Crash dump is being written to: /var/log/kolla/outward_rabbitmq/erl_crash.dump...done
Ensure RabbitMQ users exist fails
TASK [service-rabbitmq : nova-cell | Ensure RabbitMQ users exist] ****************************************************************************************************************************************************************************
task path: /root/kolla-ansible/ansible/roles/service-rabbitmq/tasks/main.yml:15
failed: [10.10.1.201 -> 10.10.1.201] (item={'user': 'openstack', 'vhost': '/'}) => {
"action": "rabbitmq_user",
"ansible_loop_var": "item",
"attempts": 5,
"changed": false,
"cmd": "/usr/sbin/rabbitmqctl -q -n rabbit list_users",
"invocation": {
"module_args": {
"configure_priv": ".*",
"force": false,
"node": "rabbit",
"password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
"permissions": [
{
"configure_priv": ".*",
"read_priv": ".*",
"vhost": "/",
"write_priv": ".*"
}
],
"read_priv": ".*",
"state": "present",
"tags": null,
"update_password": "always",
"user": "openstack",
"vhost": "/",
"write_priv": ".*"
}
},
"item": {
"password": "fNn7OUGWBRKdFh1RmLzpf2R34ItnBMoWzD5QJjQz",
"user": "openstack",
"vhost": "/"
},
"msg": "Error: this command requires the 'rabbit' app to be running on the target node. Start it with 'rabbitmqctl start_app'.\nArguments given:\n\t-q -n rabbit list_users\n\n\u001b[1mUsage\u001b[0m\n\nrabbitmqctl [--node <node>] [--longnames] [--quiet] list_users [--no-table-headers] [--timeout <timeout>]",
"rc": 64,
"stderr": "Error: this command requires the 'rabbit' app to be running on the target node. Start it with 'rabbitmqctl start_app'.\nArguments given:\n\t-q -n rabbit list_users\n\n\u001b[1mUsage\u001b[0m\n\nrabbitmqctl [--node <node>] [--longnames] [--quiet] list_users [--no-table-headers] [--timeout <timeout>]\n",
"stderr_lines": [
"Error: this command requires the 'rabbit' app to be running on the target node. Start it with 'rabbitmqctl start_app'.",
"Arguments given:",
"\t-q -n rabbit list_users",
"",
"\u001b[1mUsage\u001b[0m",
"",
"rabbitmqctl [--node <node>] [--longnames] [--quiet] list_users [--no-table-headers] [--timeout <timeout>]"
],
"stdout": "",
"stdout_lines": []
}
Based on the experience from the problem below, the conclusion is that both RabbitMQ deployments had the prometheus plugin enabled, and the resulting port conflict kept RabbitMQ from starting.
Solution
ansible -i ~/multinode baremetal -m shell -a \
"docker stop rabbitmq &&
docker restart outward_rabbitmq &&
docker exec -u root -it outward_rabbitmq rabbitmq-plugins disable rabbitmq_prometheus &&
docker restart outward_rabbitmq rabbitmq &&
docker ps | grep rabbitmq"
ansible -i ~/multinode baremetal -m shell -a "docker ps | grep rabbitmq"
docker exec -u root -it rabbitmq bash
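After disabling the plugin, at most one beam process per host should be listening on the Prometheus port (my addition):
# Confirm port 15692 now has a single listener, if any
ansible -i ~/multinode baremetal -m shell -a "lsof -i :15692 || true"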
TASK [keystone : Creating admin project, user, role, service, and endpoint] hangs indefinitely
The keystone logs show RabbitMQ cannot be reached:
2020-05-21 12:49:47.391 381 ERROR oslo.messaging._drivers.impl_rabbit [req-0a338d12-e01c-40ba-9292-6a521b784dfa - - - - -] Connection failed: [Errno 111] Connection refused (retrying in 0 seconds): ConnectionRefusedError: [Errno 111] Connection refused
2020-05-21 12:49:47.399 381 ERROR oslo.messaging._drivers.impl_rabbit [req-0a338d12-e01c-40ba-9292-6a521b784dfa - - - - -] Connection failed: [Errno 111] Connection refused (retrying in 28.0 seconds): ConnectionRefusedError: [Errno 111] Connection refused
Check the RabbitMQ containers' status:
(deploy) root@k-node1:/proc/net# ansible -i ~/multinode baremetal -m shell -a "docker ps -a | grep rabbitmq "
[DEPRECATION WARNING]: The TRANSFORM_INVALID_GROUP_CHARS settings is set to allow bad characters in group names by default, this will change, but still be user configurable on deprecation.
This feature will be removed in version 2.10. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
[DEPRECATION WARNING]: Distribution Ubuntu 16.04 on host 10.10.1.201 should use /usr/bin/python3, but is using /usr/bin/python for backward compatibility with prior Ansible releases. A
future Ansible release will default to using the discovered platform python for this host. See https://docs.ansible.com/ansible/2.9/reference_appendices/interpreter_discovery.html for more
information. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
10.10.1.201 | CHANGED | rc=0 >>
e3d58381fe1c 10.10.1.201:4000/kolla/centos-source-rabbitmq:fe6fd8dc5d "dumb-init --single-…" 21 hours ago Up 3 hours outward_rabbitmq
02fa4f75a28d 10.10.1.201:4000/kolla/centos-source-rabbitmq:fe6fd8dc5d "dumb-init --single-…" 21 hours ago Up 3 hours rabbitmq
[DEPRECATION WARNING]: Distribution Ubuntu 16.04 on host 10.10.1.202 should use /usr/bin/python3, but is using /usr/bin/python for backward compatibility with prior Ansible releases. A
future Ansible release will default to using the discovered platform python for this host. See https://docs.ansible.com/ansible/2.9/reference_appendices/interpreter_discovery.html for more
information. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
10.10.1.202 | CHANGED | rc=0 >>
47b195c087b0 10.10.1.201:4000/kolla/centos-source-rabbitmq:fe6fd8dc5d "dumb-init --single-…" 20 hours ago Up 3 hours outward_rabbitmq
432e1692998f 10.10.1.201:4000/kolla/centos-source-rabbitmq:fe6fd8dc5d "dumb-init --single-…" 21 hours ago Up 3 minutes rabbitmq
Check whether the RabbitMQ service is listening:
lsof -i :5672
Check the RabbitMQ log on node 2:
vi /var/lib/docker/volumes/kolla_logs/_data/rabbitmq/log/crash.log
2020-05-21 13:57:40 =ERROR REPORT====
Mnesia('rabbit@k-node1'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@k-node2'}
2020-05-21 13:57:40 =ERROR REPORT====
Failed to start Ranch listener rabbit_web_dispatch_sup_15692 in ranch_tcp:listen([{cacerts,'...'},{key,'...'},{cert,'...'},{port,15692}]) for reason eaddrinuse (address already in use)
2020-05-21 13:57:40 =SUPERVISOR REPORT====
Supervisor: {<0.604.0>,ranch_listener_sup}
Context: start_error
Reason: {listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}
Offender: [{pid,undefined},{id,ranch_acceptors_sup},{mfargs,{ranch_acceptors_sup,start_link,[rabbit_web_dispatch_sup_15692,ranch_tcp]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]
2020-05-21 13:57:40 =CRASH REPORT====
crasher:
initial call: supervisor:ranch_acceptors_sup/1
pid: <0.606.0>
registered_name: []
exception exit: {{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse},[{ranch_acceptors_sup,listen_error,5,[{file,"src/ranch_acceptors_sup.erl"},{line,66}]},{ranch_acceptors_sup,init,1,[{file,"src/ranch_acceptors_sup.erl"},{line,44}]},{supervisor,init,1,[{file,"supervisor.erl"},{line,295}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,374}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,342}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [<0.604.0>,rabbit_web_dispatch_sup,<0.597.0>]
message_queue_len: 0
messages: []
links: [<0.604.0>]
dictionary: [{logger,error_logger}]
trap_exit: true
status: running
heap_size: 987
stack_size: 27
reductions: 1356
neighbours:
2020-05-21 13:57:40 =ERROR REPORT====
** Generic server rabbit_web_dispatch_registry terminating
** Last message in was {add,rabbitmq_prometheus_tcp,[{port,15692},{protocol,'http/prometheus'}],#Fun<rabbit_web_dispatch.0.73002970>,[{'_',[],[{[<<"metrics">>],[],rabbit_prometheus_handler,[]},{[<<"metrics">>,registry],[],rabbit_prometheus_handler,[]}]}],{[],"RabbitMQ Prometheus"}}
** When Server state == undefined
** Reason for termination ==
** {{could_not_start_listener,[{port,15692},{protocol,'http/prometheus'}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}}}},[{rabbit_web_dispatch_sup,check_error,2,[{file,"src/rabbit_web_dispatch_sup.erl"},{line,141}]},{rabbit_web_dispatch_registry,handle_call,3,[{file,"src/rabbit_web_dispatch_registry.erl"},{line,75}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,661}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,690}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
** Client <0.603.0> stacktrace
** [{gen,do_call,4,[{file,"gen.erl"},{line,167}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,219}]},{rabbit_web_dispatch,register_context_handler,5,[{file,"src/rabbit_web_dispatch.erl"},{line,35}]},{rabbit_prometheus_app,start_listener,1,[{file,"src/rabbit_prometheus_app.erl"},{line,83}]},{rabbit_prometheus_app,'-start_configured_listener/0-lc$^0/1-0-',1,[{file,"src/rabbit_prometheus_app.erl"},{line,57}]},{rabbit_prometheus_app,start,2,[{file,"src/rabbit_prometheus_app.erl"},{line,32}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,277}]}]
2020-05-21 13:57:40 =CRASH REPORT====
crasher:
initial call: rabbit_web_dispatch_registry:init/1
pid: <0.599.0>
registered_name: rabbit_web_dispatch_registry
exception exit: {{could_not_start_listener,[{port,15692},{protocol,'http/prometheus'}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}}}},[{rabbit_web_dispatch_sup,check_error,2,[{file,"src/rabbit_web_dispatch_sup.erl"},{line,141}]},{rabbit_web_dispatch_registry,handle_call,3,[{file,"src/rabbit_web_dispatch_registry.erl"},{line,75}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,661}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,690}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [rabbit_web_dispatch_sup,<0.597.0>]
message_queue_len: 0
messages: []
links: [<0.598.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 1598
stack_size: 27
reductions: 749
neighbours:
2020-05-21 13:57:40 =SUPERVISOR REPORT====
Supervisor: {local,rabbit_web_dispatch_sup}
Context: child_terminated
Reason: {could_not_start_listener,[{port,15692},{protocol,'http/prometheus'}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}}}}
Offender: [{pid,<0.599.0>},{id,rabbit_web_dispatch_registry},{mfargs,{rabbit_web_dispatch_registry,start_link,[]}},{restart_type,transient},{shutdown,5000},{child_type,worker}]
2020-05-21 13:57:40 =CRASH REPORT====
crasher:
initial call: application_master:init/4
pid: <0.602.0>
registered_name: []
exception exit: {{bad_return,{{rabbit_prometheus_app,start,[normal,[]]},{'EXIT',{{could_not_start_listener,[{port,15692},{protocol,'http/prometheus'}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}}}},{gen_server,call,[rabbit_web_dispatch_registry,{add,rabbitmq_prometheus_tcp,[{port,15692},{protocol,'http/prometheus'}],#Fun<rabbit_web_dispatch.0.73002970>,[{'_',[],[{[<<"metrics">>],[],rabbit_prometheus_handler,[]},{[<<"metrics">>,registry],[],rabbit_prometheus_handler,[]}]}],{[],"RabbitMQ Prometheus"}},infinity]}}}}},[{application_master,init,4,[{file,"application_master.erl"},{line,138}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [<0.601.0>]
message_queue_len: 1
messages: [{'EXIT',<0.603.0>,normal}]
links: [<0.601.0>,<0.44.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 376
stack_size: 27
reductions: 227
neighbours:
Later, checking what was occupying port 15692, I found:
root@k-node1:~# lsof -i :15692
lsof: no pwd entry for UID 42439
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
lsof: no pwd entry for UID 42439
beam.smp 11708 42439 93u IPv4 200265 0t0 TCP *:15692 (LISTEN)
Since there are two RabbitMQ deployments, my guess was that both had started the prometheus listener, so this instance failed to start because of the port conflict.
After temporarily stopping outward_rabbitmq, rabbitmq became usable.
TASK [ovs-dpdk : Install ovs-dpdkctl service and config] failed
TASK [neutron : Copying over openvswitch_agent.ini] failed
TASK [neutron : Copying over openvswitch_agent.ini] ******************************************************************************************************************************************************************************************
task path: /root/kolla-ansible/ansible/roles/neutron/tasks/config.yml:172
<10.10.1.201> ESTABLISH SSH CONNECTION FOR USER: root
<10.10.1.201> SSH: EXEC sshpass -d11 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/d5eb36b79d 10.10.1.201 '/bin/sh -c '"'"'echo ~root && sleep 0'"'"''
<10.10.1.201> (0, b'/root\n', b'')
<10.10.1.201> ESTABLISH SSH CONNECTION FOR USER: root
<10.10.1.201> SSH: EXEC sshpass -d11 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/d5eb36b79d 10.10.1.201 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir /root/.ansible/tmp/ansible-tmp-1590113556.592888-38224-240706786989573 && echo ansible-tmp-1590113556.592888-38224-240706786989573="` echo /root/.ansible/tmp/ansible-tmp-1590113556.592888-38224-240706786989573 `" ) && sleep 0'"'"''
<10.10.1.201> (0, b'ansible-tmp-1590113556.592888-38224-240706786989573=/root/.ansible/tmp/ansible-tmp-1590113556.592888-38224-240706786989573\n', b'')
<10.10.1.201> ESTABLISH SSH CONNECTION FOR USER: root
<10.10.1.201> SSH: EXEC sshpass -d11 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/d5eb36b79d 10.10.1.201 '/bin/sh -c '"'"'rm -f -r /root/.ansible/tmp/ansible-tmp-1590113556.592888-38224-240706786989573/ > /dev/null 2>&1 && sleep 0'"'"''
<10.10.1.201> (0, b'', b'')
fatal: [10.10.1.201]: FAILED! => {
"msg": "An unhandled exception occurred while templating '{{ 'tunnel' | kolla_address }}'. Error was a <class 'kolla_ansible.exception.FilterError'>, original message: Interface 'dpdk_bridge' not present on host '10.10.1.201'"
}
The likely cause: TASK [ovs-dpdk : Binds the interface to the target driver specifed in the config] completed, but its output shows the binding actually failed:
changed: [10.10.1.202] => {
"changed": true,
"cmd": [
"/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh",
"bind_nics"
],
"delta": "0:00:00.094136",
"end": "2020-05-28 15:33:07.105387",
"invocation": {
"module_args": {
"_raw_params": "/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh bind_nics",
"_uses_shell": false,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"stdin_add_newline": true,
"strip_empty_ends": true,
"warn": true
}
},
"rc": 0,
"start": "2020-05-28 15:33:07.011251",
"stderr": "++ realpath /etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh\n+ FULL_PATH=/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh\n+ CONFIG_FILE=/etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf\n+ SERVICE_FILE=/etc/systemd/system/ovs-dpdkctl.service\n+ BRIDGE_SERVICE_FILE=/etc/systemd/system/ovs-dpdk-bridge.service\n+ '[' 1 -ge 1 ']'\n+ func=bind_nics\n+ shift\n+ eval 'bind_nics '\n++ bind_nics\n+++ list_dpdk_nics\n++++ get_value ovs port_mappings\n++++ crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ovs port_mappings\n++++ tr , '\\n'\n++++ cut -d : -f 1\n+++ for nic in '$(get_value ovs port_mappings | tr '\\'','\\'' '\\''\\n'\\'' | cut -d : -f 1)'\n+++ echo ens34\n++ for nic in '$(list_dpdk_nics)'\n+++ get_value ens34 address\n+++ crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 address\n++ device_address=0000:02:02.0\n+++ get_driver_by_address 0000:02:02.0\n+++ ls /sys/bus/pci/devices/0000:02:02.0/driver -al\n+++ awk '{n=split($NF,a,\"/\"); print a[n]}'\nls: cannot access '/sys/bus/pci/devices/0000:02:02.0/driver': No such file or directory\n++ current_driver=\n+++ get_value ens34 driver\n+++ crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 driver\n++ target_driver=uio_pci_generic\n++ '[' '' '!=' uio_pci_generic ']'\n++ set_value ens34 old_driver\n++ crudini --set /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 old_driver ''\n++ unbind_nic 0000:02:02.0\n++ echo 0000:02:02.0\n/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh: line 106: /sys/bus/pci/drivers//unbind: Permission denied\n++ echo\n++ bind_nic 0000:02:02.0 uio_pci_generic\n++ echo uio_pci_generic\n++ echo 0000:02:02.0\n/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh: line 102: echo: write error: No such device\n+ set +o xtrace",
"stderr_lines": [
"++ realpath /etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh",
"+ FULL_PATH=/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh",
"+ CONFIG_FILE=/etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf",
"+ SERVICE_FILE=/etc/systemd/system/ovs-dpdkctl.service",
"+ BRIDGE_SERVICE_FILE=/etc/systemd/system/ovs-dpdk-bridge.service",
"+ '[' 1 -ge 1 ']'",
"+ func=bind_nics",
"+ shift",
"+ eval 'bind_nics '",
"++ bind_nics",
"+++ list_dpdk_nics",
"++++ get_value ovs port_mappings",
"++++ crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ovs port_mappings",
"++++ tr , '\\n'",
"++++ cut -d : -f 1",
"+++ for nic in '$(get_value ovs port_mappings | tr '\\'','\\'' '\\''\\n'\\'' | cut -d : -f 1)'",
"+++ echo ens34",
"++ for nic in '$(list_dpdk_nics)'",
"+++ get_value ens34 address",
"+++ crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 address",
"++ device_address=0000:02:02.0",
"+++ get_driver_by_address 0000:02:02.0",
"+++ ls /sys/bus/pci/devices/0000:02:02.0/driver -al",
"+++ awk '{n=split($NF,a,\"/\"); print a[n]}'",
"ls: cannot access '/sys/bus/pci/devices/0000:02:02.0/driver': No such file or directory",
"++ current_driver=",
"+++ get_value ens34 driver",
"+++ crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 driver",
"++ target_driver=uio_pci_generic",
"++ '[' '' '!=' uio_pci_generic ']'",
"++ set_value ens34 old_driver",
"++ crudini --set /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 old_driver ''",
"++ unbind_nic 0000:02:02.0",
"++ echo 0000:02:02.0",
"/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh: line 106: /sys/bus/pci/drivers//unbind: Permission denied",
"++ echo",
"++ bind_nic 0000:02:02.0 uio_pci_generic",
"++ echo uio_pci_generic",
"++ echo 0000:02:02.0",
"/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh: line 102: echo: write error: No such device",
"+ set +o xtrace"
],
"stdout": "",
"stdout_lines": []
}
Based on the code around line 102 of /etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh, what it actually does is:
(deploy) root@k-node1:~# crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ovs port_mappings | tr ',' '\n' | cut -d : -f 1
ens34
(deploy) root@k-node1:~# crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 address
0000:02:02.0
(deploy) root@k-node1:~# ls /sys/bus/pci/devices/0000\:02\:01.0/driver -al | awk '{n=split($NF,a,"/"); print a[n]}'
e1000
(deploy) root@k-node1:~# crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 old_driver
e1000
$device_address=0000:02:02.0
$current_driver=e1000
$target_driver=uio_pci_generic
echo 0000:02:02.0 > /sys/bus/pci/drivers/e1000/unbind
echo > /sys/bus/pci/devices/0000:02:02.0/driver_override
/sys/bus/pci/drivers/e1000/bind
/sys/bus/pci/drivers/e1000/unbind
/sys/bus/pci/devices/0000:02:02.0/driver_override
/sys/bus/pci/devices/0000:02:02.0/driver not found
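For reference, the manual rebind the script is effectively attempting (a sketch of the standard driver_override procedure; the unbind step only works when the device currently has a driver bound, which is exactly what is missing here):
# Unbind 0000:02:02.0 from e1000, then bind it to uio_pci_generic
echo 0000:02:02.0 > /sys/bus/pci/drivers/e1000/unbind
echo uio_pci_generic > /sys/bus/pci/devices/0000:02:02.0/driver_override
echo 0000:02:02.0 > /sys/bus/pci/drivers/uio_pci_generic/bind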
The config file /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf contains:
[ovs]
bridge_mappings = physnet1:dpdk_bridge
port_mappings = ens34:dpdk_bridge
cidr_mappings = dpdk_bridge:192.168.115.202/24
ovs_coremask = 0x1
pmd_coremask = 0x2
ovs_mem_channels = 4
ovs_socket_mem = 1024
dpdk_interface_driver = uio_pci_generic
hugepage_mountpoint = /dev/hugepages
physical_port_policy = named
pci_whitelist = -w 0000:02:02.0
[ens33]
address = 0000:02:01.0
driver =e1000
[ens34]
address = 0000:02:02.0
driver =uio_pci_generic
old_driver =
RUNNING HANDLER [ovs-dpdk : Ensuring ovsdpdk bridges are properly setup named] also went wrong, but did not abort:
ok: [10.10.1.202] => {
"changed": false,
"cmd": [
"docker",
"exec",
"ovsdpdk_db",
"/bin/sh",
"-c",
"CONFIG_FILE=/var/lib/kolla/config_files/ovs-dpdkctl.conf /var/lib/kolla/config_files/ovs-dpdkctl.sh init"
],
"delta": "0:00:00.383534",
"end": "2020-05-28 18:33:38.817271",
"invocation": {
"module_args": {
"_raw_params": "docker exec ovsdpdk_db /bin/sh -c 'CONFIG_FILE=/var/lib/kolla/config_files/ovs-dpdkctl.conf /var/lib/kolla/config_files/ovs-dpdkctl.sh init'\n",
"_uses_shell": false,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"stdin_add_newline": true,
"strip_empty_ends": true,
"warn": true
}
},
"rc": 0,
"start": "2020-05-28 18:33:38.433737",
"stderr": "++ realpath /var/lib/kolla/config_files/ovs-dpdkctl.sh\n+ FULL_PATH=/var/lib/kolla/config_files/ovs-dpdkctl.sh\n+ CONFIG_FILE=/var/lib/kolla/config_files/ovs-dpdkctl.conf\n+ SERVICE_FILE=/etc/systemd/system/ovs-dpdkctl.service\n+ BRIDGE_SERVICE_FILE=/etc/systemd/system/ovs-dpdk-bridge.service\n+ '[' 1 -ge 1 ']'\n+ func=init\n+ shift\n+ eval 'init '\n++ init\n++ init_ovs_db\n++ ovs-vsctl init\n+++ get_value ovs pmd_coremask\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs pmd_coremask\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n+++ get_value ovs ovs_coremask\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs ovs_coremask\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n+++ get_value ovs ovs_mem_channels\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs ovs_mem_channels\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n+++ get_value ovs ovs_socket_mem\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs ovs_socket_mem\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n+++ get_value ovs hugepage_mountpoint\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs hugepage_mountpoint\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n+++ get_value ovs pci_whitelist\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs pci_whitelist\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n++ ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask= other_config:dpdk-init=True other_config:dpdk-lcore-mask= other_config:dpdk-mem-channels= other_config:dpdk-socket-mem= other_config:dpdk-hugepage-dir= 'other_config:dpdk-extra= --proc-type primary '\novs-vsctl: other_config:pmd-cpu-mask=: argument does not end in \"=\" followed by a value.\n++ init_ovs_bridges\n+++ get_value ovs bridge_mappings\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs bridge_mappings\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n++ raw_bridge_mappings=\n++ bridge_mappings=(${raw_bridge_mappings//,/ })\n++ init_ovs_interfaces\n++ pci_port_pairs=\n+++ list_dpdk_nics\n++++ get_value ovs port_mappings\n++++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs port_mappings\n++++ cut -d : -f 1\n++++ tr , '\\n'\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n+++ echo\n+++ sort\n++ pci_port_pairs=\n++ dpdk_port_number=0\n+ set +o xtrace",
"stderr_lines": [
"++ realpath /var/lib/kolla/config_files/ovs-dpdkctl.sh",
"+ FULL_PATH=/var/lib/kolla/config_files/ovs-dpdkctl.sh",
"+ CONFIG_FILE=/var/lib/kolla/config_files/ovs-dpdkctl.conf",
"+ SERVICE_FILE=/etc/systemd/system/ovs-dpdkctl.service",
"+ BRIDGE_SERVICE_FILE=/etc/systemd/system/ovs-dpdk-bridge.service",
"+ '[' 1 -ge 1 ']'",
"+ func=init",
"+ shift",
"+ eval 'init '",
"++ init",
"++ init_ovs_db",
"++ ovs-vsctl init",
"+++ get_value ovs pmd_coremask",
"+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs pmd_coremask",
"/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
"+++ get_value ovs ovs_coremask",
"+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs ovs_coremask",
"/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
"+++ get_value ovs ovs_mem_channels",
"+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs ovs_mem_channels",
"/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
"+++ get_value ovs ovs_socket_mem",
"+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs ovs_socket_mem",
"/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
"+++ get_value ovs hugepage_mountpoint",
"+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs hugepage_mountpoint",
"/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
"+++ get_value ovs pci_whitelist",
"+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs pci_whitelist",
"/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
"++ ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask= other_config:dpdk-init=True other_config:dpdk-lcore-mask= other_config:dpdk-mem-channels= other_config:dpdk-socket-mem= other_config:dpdk-hugepage-dir= 'other_config:dpdk-extra= --proc-type primary '",
"ovs-vsctl: other_config:pmd-cpu-mask=: argument does not end in \"=\" followed by a value.",
"++ init_ovs_bridges",
"+++ get_value ovs bridge_mappings",
"+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs bridge_mappings",
"/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
"++ raw_bridge_mappings=",
"++ bridge_mappings=(${raw_bridge_mappings//,/ })",
"++ init_ovs_interfaces",
"++ pci_port_pairs=",
"+++ list_dpdk_nics",
"++++ get_value ovs port_mappings",
"++++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs port_mappings",
"++++ cut -d : -f 1",
"++++ tr , '\\n'",
"/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
"+++ echo",
"+++ sort",
"++ pci_port_pairs=",
"++ dpdk_port_number=0",
"+ set +o xtrace"
],
"stdout": "",
"stdout_lines": []
}
Install crudini inside the container:
docker exec -u root -it ovsdpdk_db bash
curl http://archive.ubuntu.com/ubuntu/pool/universe/c/crudini/crudini_0.7-1_amd64.deb -o crudini_0.7-1_amd64.deb
curl http://archive.ubuntu.com/ubuntu/pool/universe/p/python-iniparse/python-iniparse_0.4-2.2_all.deb -o python-iniparse_0.4-2.2_all.deb
curl http://archive.ubuntu.com/ubuntu/pool/main/s/six/python-six_1.11.0-2_all.deb -o python-six_1.11.0-2_all.deb
dpkg -i python-six_1.11.0-2_all.deb
dpkg -i python-iniparse_0.4-2.2_all.deb
dpkg -i crudini_0.7-1_amd64.deb
I checked all the Ethernet controllers on k-node1:
for item in $(lspci | grep Ethernet | awk '{print $1}'); \
do echo $item; \
ls /sys/bus/pci/devices/0000:$item; done
02:01.0
acpi_index config device driver_override irq local_cpus net remove resource resource4 subsystem_device vendor
broken_parity_status consistent_dma_mask_bits dma_mask_bits enable label modalias numa_node rescan resource0 rom subsystem_vendor
class d3cold_allowed driver firmware_node local_cpulist msi_bus power reset resource2 subsystem uevent
02:02.0
acpi_index config device enable label modalias power reset resource2 subsystem uevent
broken_parity_status consistent_dma_mask_bits dma_mask_bits firmware_node local_cpulist msi_bus remove resource resource4 subsystem_device vendor
class d3cold_allowed driver_override irq local_cpus numa_node rescan resource0 rom subsystem_vendor
One NIC is missing its driver entry.
Checking all the Ethernet controllers on k-node2:
root@k-node2:~# for item in $(lspci | grep Ethernet | awk '{print $1}'); \
> do echo $item; \
> ls /sys/bus/pci/devices/0000:$item; done
02:01.0
acpi_index config device driver_override irq local_cpus net remove resource resource4 subsystem_device vendor
broken_parity_status consistent_dma_mask_bits dma_mask_bits enable label modalias numa_node rescan resource0 rom subsystem_vendor
class d3cold_allowed driver firmware_node local_cpulist msi_bus power reset resource2 subsystem uevent
02:02.0
acpi_index config device driver_override irq local_cpus net remove resource resource4 subsystem_device vendor
broken_parity_status consistent_dma_mask_bits dma_mask_bits enable label modalias numa_node rescan resource0 rom subsystem_vendor
class d3cold_allowed driver firmware_node local_cpulist msi_bus power reset resource2 subsystem uevent
Every NIC on k-node2 has a driver entry.
After reading linux - writing in /sys/bus/pci/… fails - Stack Overflow, I inspected the problematic NIC on k-node1:
(deploy) root@k-node1:~/kolla-ansible# lspci -v -s 0000:02:02.0
02:02.0 Ethernet controller: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 01)
DeviceName: Ethernet1
Subsystem: VMware PRO/1000 MT Single Port Adapter
Physical Slot: 34
Flags: 66MHz, medium devsel, IRQ 16
Memory at fc020000 (64-bit, non-prefetchable) [size=128K]
Memory at fc050000 (64-bit, non-prefetchable) [size=64K]
I/O ports at 1040 [size=64]
Expansion ROM at fc080000 [disabled] [size=64K]
Capabilities: [dc] Power Management version 2
Capabilities: [e4] PCI-X non-bridge device
Kernel modules: e1000
And the corresponding, healthy NIC on k-node2:
root@k-node2:~# lspci -v -s 0000:02:02.0
02:02.0 Ethernet controller: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 01)
DeviceName: Ethernet1
Subsystem: VMware PRO/1000 MT Single Port Adapter
Physical Slot: 34
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 16
Memory at fc020000 (64-bit, non-prefetchable) [size=128K]
Memory at fc050000 (64-bit, non-prefetchable) [size=64K]
I/O ports at 1040 [size=64]
Expansion ROM at fc080000 [disabled] [size=64K]
Capabilities: [dc] Power Management version 2
Capabilities: [e4] PCI-X non-bridge device
Kernel driver in use: e1000
Kernel modules: e1000
Comparing the two, the k-node1 device is missing the "bus master" flag and the "Kernel driver in use: e1000" line, so no kernel driver is bound to it. Also check whether the uio_pci_generic module is loaded:
lsmod | grep uio_pci_generic
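Two hedged fixes, following the Stack Overflow thread above: load uio_pci_generic if it is missing, or re-bind the stock e1000 driver through the standard sysfs interface (device address and driver name come from the lspci output; this assumes the device is merely unbound, not faulty):
modprobe uio_pci_generic
# or: tell the PCI core which driver may claim the device, then re-run matching
echo e1000 > /sys/bus/pci/devices/0000:02:02.0/driver_override
echo 0000:02:02.0 > /sys/bus/pci/drivers_probe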
murano: The Keystone service is temporarily unavailable. (HTTP 503)
docker exec murano_api murano \
--os-username admin \
--os-password uxlNttg9tklN88M5FH7SaWNq6UXZ5sBOIsyHZnh0 \
--os-project-name admin \
--os-auth-url https://10.10.1.205:35357 \
--os-cacert /etc/pki/ca-trust/source/anchors/kolla-customca-haproxy-internal.crt \
--murano-url https://10.10.1.205:8082 package-list
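Since HAProxy fronts keystone here, probing the auth endpoint directly (same URL as --os-auth-url; -k because of the self-signed certificate) shows whether any keystone backend is up:
curl -k https://10.10.1.205:35357/v3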
A few checks across the nodes (docker volumes everywhere, mariadb container and config on the control nodes):
ansible -i multinode all -m shell -a "docker volume ls"
ansible -i /root/multinode control -m shell -a "docker ps | grep mariadb"
ansible -i /root/multinode control -m shell \
-a "cat /etc/kolla/mariadb/galera.cnf"
ansible -i /root/multinode control -m shell \
-a "cat /etc/kolla/mariadb/config.json"
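To look at Galera health directly, query the wsrep status inside the mariadb container (a sketch; substitute the database_password from /etc/kolla/passwords.yml):
docker exec mariadb mysql -uroot -p'<database_password>' -e "SHOW STATUS LIKE 'wsrep_cluster%'"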
task [monasca : Wait for Monasca Grafana to load] failed
fatal: [10.10.1.201]: FAILED! => {
"action": "uri",
"attempts": 10,
"cache_control": "no-cache",
"changed": false,
"connection": "close",
"content": "<html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>\n",
"content_type": "text/html",
"elapsed": 0,
"invocation": {
"module_args": {
"attributes": null,
"backup": null,
"body": null,
"body_format": "raw",
"client_cert": null,
"client_key": null,
"content": null,
"creates": null,
"delimiter": null,
"dest": null,
"directory_mode": null,
"follow": false,
"follow_redirects": "safe",
"force": false,
"force_basic_auth": false,
"group": null,
"headers": {},
"http_agent": "ansible-httpget",
"method": "GET",
"mode": null,
"owner": null,
"regexp": null,
"remote_src": null,
"removes": null,
"return_content": false,
"selevel": null,
"serole": null,
"setype": null,
"seuser": null,
"src": null,
"status_code": [
"200"
],
"timeout": 30,
"unix_socket": null,
"unsafe_writes": null,
"url": "https://10.10.1.205:3001/login",
"url_password": null,
"url_username": null,
"use_proxy": true,
"validate_certs": false
}
},
"msg": "Status code was 503 and not [200]: HTTP Error 503: Service Unavailable",
"redirected": false,
"status": 503,
"url": "https://10.10.1.205:3001/login"
}
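The failing check is easy to replay from the deploy host (same URL as the task; -k mirrors validate_certs: false):
curl -k https://10.10.1.205:3001/login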
2020-06-04 07:00:44 CST | ERROR | forwarder | tornado.application(ioloop.py:909) | Exception in callback <bound method Forwarder.flush of <monasca_agent.forwarder.daemon.Forwarder object at 0x7f4642bbda90>>
Traceback (most recent call last):
File "/var/lib/kolla/venv/lib64/python3.6/site-packages/tornado/ioloop.py", line 907, in _run
return self.callback()
File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/forwarder/daemon.py", line 168, in flush
self._post_metrics()
File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/forwarder/daemon.py", line 159, in _post_metrics
self._endpoint.post_metrics(message_batch)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/forwarder/api/monasca_api.py", line 136, in post_metrics
self._post(tenant_group[tenant], tenant)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/forwarder/api/monasca_api.py", line 85, in _post
self._mon_client = self._get_mon_client()
File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/forwarder/api/monasca_api.py", line 140, in _get_mon_client
endpoint = k.get_monasca_url()
File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/common/keystone.py", line 309, in get_monasca_url
catalog = self._init_client().auth_ref.service_catalog
File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/common/keystone.py", line 276, in _init_client
ks = get_client(**self._config)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/common/keystone.py", line 191, in get_client
disc = discover.Discover(session=sess)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneclient/discover.py", line 178, in __init__
authenticated=authenticated)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneclient/_discover.py", line 143, in __init__
authenticated=authenticated)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneclient/_discover.py", line 38, in get_version_data
resp = session.get(url, headers=headers, authenticated=authenticated)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 1123, in get
return self.request(url, 'GET', **kwargs)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 772, in request
auth_headers = self.get_auth_headers(auth)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 1183, in get_auth_headers
return auth.get_headers(self, **kwargs)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/plugin.py", line 95, in get_headers
token = self.get_token(session)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 88, in get_token
return self.get_access(session).auth_token
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 134, in get_access
self.auth_ref = self.get_auth_ref(session)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 206, in get_auth_ref
self._plugin = self._do_create_plugin(session)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 138, in _do_create_plugin
authenticated=False)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 610, in get_discovery
authenticated=authenticated)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/discover.py", line 1452, in get_discovery
disc = Discover(session, url, authenticated=authenticated)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/discover.py", line 536, in __init__
authenticated=authenticated)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/discover.py", line 102, in get_version_data
resp = session.get(url, headers=headers, authenticated=authenticated)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 1123, in get
return self.request(url, 'GET', **kwargs)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 913, in request
resp = send(**kwargs)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 1004, in _send_request
resp = self.session.request(method, url, **kwargs)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/requests/adapters.py", line 416, in send
self.cert_verify(conn, request.url, verify, cert)
File "/var/lib/kolla/venv/lib/python3.6/site-packages/requests/adapters.py", line 228, in cert_verify
"invalid path: {}".format(cert_loc))
OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /var/lib/kolla/venv/lib/python' ~ distro_python_version ~ '/site-packages/certifi/cacert.pem
Note the unrendered Jinja fragment (' ~ distro_python_version ~ ') in the path: the template expression was written into the agent config verbatim instead of being evaluated, so requests cannot locate cacert.pem.
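To find where the unrendered expression originates, grep the monasca role in the kolla-ansible checkout (path assumed from the setup steps earlier):
grep -rn "distro_python_version" ~/kolla-ansible/ansible/roles/monasca/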
task [monasca : List influxdb databases] failed
fatal: [10.10.1.202 -> 10.10.1.202]: FAILED! => {
"changed": false,
"cmd": [
"docker",
"exec",
"influxdb",
"influx",
"-host",
"10.10.1.205",
"-port",
"8086",
"-execute",
"show databases"
],
"delta": "0:00:00.300792",
"end": "2020-06-06 09:28:26.936263",
"invocation": {
"module_args": {
"_raw_params": "docker exec influxdb influx -host 10.10.1.205 -port 8086 -execute 'show databases'",
"_uses_shell": false,
"argv": null,
"chdir": null,
"creates": null,
"executable": null,
"removes": null,
"stdin": null,
"stdin_add_newline": true,
"strip_empty_ends": true,
"warn": true
}
},
"msg": "non-zero return code",
"rc": 1,
"start": "2020-06-06 09:28:26.635471",
"stderr": "Failed to connect to http://10.10.1.205:8086: Get http://10.10.1.205:8086/ping: EOF\nPlease check your connection settings and ensure 'influxd' is running.",
"stderr_lines": [
"Failed to connect to http://10.10.1.205:8086: Get http://10.10.1.205:8086/ping: EOF",
"Please check your connection settings and ensure 'influxd' is running."
],
"stdout": "",
"stdout_lines": []
}
NO MORE HOSTS LEFT *************************************************************************************************************
PLAY RECAP *********************************************************************************************************************
10.10.1.201 : ok=46 changed=0 unreachable=0 failed=0 skipped=27 rescued=0 ignored=0
10.10.1.202 : ok=77 changed=2 unreachable=0 failed=1 skipped=12 rescued=0 ignored=0
localhost : ok=4 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
Running the same command manually on node 2:
root@k-node2:~# docker exec influxdb influx -host 10.10.1.205 -port 8086 -execute 'show databases'
Failed to connect to http://10.10.1.205:8086: Get http://10.10.1.205:8086/ping: EOF
Please check your connection settings and ensure 'influxd' is running.
Check the container start file /etc/kolla/influxdb/config.json; the start command is:
"command": "/usr/bin/influxd -config /etc/influxdb/influxdb.conf",
To start the service by hand for debugging, change command to:
"command": "sleep infinity",
With influxd started manually, lsof showed it listening fine. The real cause turned out to be that HAProxy terminates SSL in front of influxdb, so the client call has to speak SSL as well (-ssl switches influx to https, -unsafeSsl accepts the self-signed certificate):
docker exec influxdb influx -host 10.10.1.205 -port 8086 -unsafeSsl -ssl -execute 'show databases'
TASK [monasca : Enable Monasca Grafana datasource for control plane organisation] failed
fatal: [10.10.1.202]: FAILED! => {
"msg": "The conditional check 'monasca_grafana_datasource_response.status not in [200, 409] or (monasca_grafana_datasource_response.status == 409 and (\"Data source with same name already exists\" not in monasca_grafana_datasource_response.json.message|default(\"\"))' failed. The error was: template error while templating string: unexpected '}', expected ')'. String: {% if monasca_grafana_datasource_response.status not in [200, 409] or (monasca_grafana_datasource_response.status == 409 and (\"Data source with same name already exists\" not in monasca_grafana_datasource_response.json.message|default(\"\")) %} True {% else %} False {% endif %}"
}
The conditional itself is malformed: it is missing one closing parenthesis, so Jinja reaches the closing '}' while still expecting ')'.
The actual response, printed with a debug task:
{
"changed": false,
"msg": "All items completed",
"results": [
{
"action": "uri",
"ansible_loop_var": "item",
"changed": false,
"connection": "close",
"content_length": "55",
"content_type": "application/json; charset=UTF-8",
"date": "Thu, 11 Jun 2020 14:27:34 GMT",
"elapsed": 0,
"failed": false,
"invocation": {
"module_args": {
"attributes": null,
"backup": null,
"body": "{\"name\": \"Monasca API\", \"type\": \"monasca-datasource\", \"access\": \"proxy\", \"url\": \"https://10.10.1.205:8070\", \"isDefault\": true, \"basicAuth\": false, \"jsonData\": {\"keystoneAuth\": true}}",
"body_format": "json",
"client_cert": null,
"client_key": null,
"content": null,
"creates": null,
"delimiter": null,
"dest": null,
"directory_mode": null,
"follow": false,
"follow_redirects": "safe",
"force": false,
"force_basic_auth": true,
"group": null,
"headers": {
"Content-Type": "application/json"
},
"http_agent": "ansible-httpget",
"method": "POST",
"mode": null,
"owner": null,
"password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
"regexp": null,
"remote_src": null,
"removes": null,
"return_content": false,
"selevel": null,
"serole": null,
"setype": null,
"seuser": null,
"src": null,
"status_code": [
"200",
" 409"
],
"timeout": 30,
"unix_socket": null,
"unsafe_writes": null,
"url": "https://10.10.1.205:3001/api/datasources",
"url_password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
"url_username": "grafana_local_admin",
"use_proxy": true,
"user": "grafana_local_admin",
"validate_certs": false
}
},
"item": {
"key": "monasca",
"value": {
"data": {
"access": "proxy",
"basicAuth": false,
"isDefault": true,
"jsonData": {
"keystoneAuth": true
},
"name": "Monasca API",
"type": "monasca-datasource",
"url": "https://10.10.1.205:8070"
},
"enabled": true
}
},
"json": {
"message": "Data source with same name already exists"
},
"msg": "HTTP Error 409: Conflict",
"redirected": false,
"status": 409,
"url": "https://10.10.1.205:3001/api/datasources"
}
]
}
Replaying the POST by hand (body and headers taken from the module_args above; -k because the certificate is self-signed):
curl -k -X POST -u grafana_local_admin:XxrkRhDOHGJZnYAiXnnHt8buvKy9i3e5o5EZFSUP \
-H "Content-Type: application/json" \
-d '{"name": "Monasca API", "type": "monasca-datasource", "access": "proxy", "url": "https://10.10.1.205:8070", "isDefault": true, "basicAuth": false, "jsonData": {"keystoneAuth": true}}' \
https://10.10.1.205:3001/api/datasources
Check the load on each node:
ansible -i ~/multinode chrony -m shell -a "uptime; free -h"
To adjust resources, schedule a shutdown (one-minute delay):
ansible -i ~/multinode chrony -m shell -a "shutdown 1"
Patches, cherry-picked from Gerrit onto the local kolla-ansible checkout:
git fetch ssh://XiaojueGuan@review.opendev.org:29418/openstack/kolla-ansible refs/changes/60/724460/8 && git cherry-pick FETCH_HEAD
git fetch ssh://XiaojueGuan@review.opendev.org:29418/openstack/kolla-ansible refs/changes/17/724217/6 && git cherry-pick FETCH_HEAD
git fetch ssh://XiaojueGuan@review.opendev.org:29418/openstack/kolla-ansible refs/changes/89/726289/2 && git cherry-pick FETCH_HEAD
git fetch ssh://XiaojueGuan@review.opendev.org:29418/openstack/kolla-ansible refs/changes/38/727638/3 && git cherry-pick FETCH_HEAD
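Confirm the four picks landed on top of the local branch:
git log --oneline -4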