Installing OpenStack with the full set of components using kolla-ansible on Ubuntu 18

Continuously updated.

Preparation

Deployment

Deployment environment overview

ansible -i ~/kolla-config/kolla-config/multinode/multinode all -m shell -a "lsb_release -a && ip route | grep ens"

localhost | CHANGED | rc=0 >>
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.4 LTS
Release:	18.04
Codename:	bionic
default via 10.10.1.254 dev ens33 proto static
10.10.1.0/24 dev ens33 proto kernel scope link src 10.10.1.241 No LSB modules are available.

10.10.1.243 | CHANGED | rc=0 >>
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.4 LTS
Release:	18.04
Codename:	bionic
default via 10.10.1.254 dev ens33 proto static
10.10.1.0/24 dev ens33 proto kernel scope link src 10.10.1.243 No LSB modules are available.

10.10.1.241 | CHANGED | rc=0 >>
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.4 LTS
Release:	18.04
Codename:	bionic
default via 10.10.1.254 dev ens33 proto static
10.10.1.0/24 dev ens33 proto kernel scope link src 10.10.1.241 No LSB modules are available.

10.10.1.242 | CHANGED | rc=0 >>
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.4 LTS
Release:	18.04
Codename:	bionic
default via 10.10.1.254 dev ens33 proto static
10.10.1.0/24 dev ens33 proto kernel scope link src 10.10.1.242 No LSB modules are available.

Deployment steps

Download kolla and kolla-ansible

Create a Python virtualenv for Ansible

  1. Edit the globals.yml file and the inventory file.
    My globals.yml and inventory files are at:
    kolla-config/kolla-config/multinode at master · albertjone/kolla-config · GitHub
    (a minimal sketch of the key settings is shown after the commands below)
  2. Generate the passwords:
cp ~/kolla-ansible/etc/kolla/passwords.yml /etc/kolla/passwords.yml

./kolla-ansible/tools/generate_passwords.py
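
For reference, a minimal sketch of the kind of settings globals.yml carries here; network_interface and the VIP are taken from this environment, the rest are common defaults, not a copy of my actual file:

# /etc/kolla/globals.yml (abridged sketch)
kolla_base_distro: "centos"                  # this deploy pulls centos-source-* images
kolla_install_type: "source"
network_interface: "ens33"                   # management NIC on every node
kolla_internal_vip_address: "10.10.1.205"    # haproxy VIP referenced throughout this post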

Bootstrap the nodes (bootstrap-servers)

apt install sshpass -y

kolla-ansible -i ~/kolla-config/kolla-config/multinode/multinode bootstrap-servers

Generate certificates (certificates)

./kolla-ansible/tools/kolla-ansible -i ~/kolla-config/kolla-config/multinode/multinode certificates

Pre-deployment checks (prechecks)

./kolla-ansible/tools/kolla-ansible -i ~/kolla-config/kolla-config/multinode/multinode prechecks

Start the deployment

cd kolla-ansible/tools
./kolla-ansible -i ../../multinode deploy

Problems

SSH password authentication fails

The error:

TASK [Gathering Facts] ************************************************************************************************************************************************************************
fatal: [10.10.1.241]: FAILED! => {"msg": "Using a SSH password instead of a key is not possible because Host Key checking is enabled and sshpass does not support this.  Please add this host's fingerprint to your known_hosts file to manage this host."}

Solution

(build) root@steveguan-1:~# ssh root@10.10.1.241
The authenticity of host '10.10.1.241 (10.10.1.241)' can't be established.
ECDSA key fingerprint is SHA256:DKscpTjQfSK4mT+DmZPU0BKAq80ORJUE+94BSwF5foE.
Are you sure you want to continue connecting (yes/no)? yes
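
Accepting each fingerprint by hand works but does not scale. An alternative (my addition, not from the original run) is to collect all host keys in one shot before running kolla-ansible:

ssh-keyscan -H 10.10.1.241 10.10.1.242 10.10.1.243 >> ~/.ssh/known_hosts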

Running prechecks produced the following error

(build) root@steveguan-1:~# ./kolla-ansible/tools/kolla-ansible -i ~/kolla-config/kolla-config/multinode/multinode prechecks
Pre-deployment checking : ansible-playbook -i /root/kolla-config/kolla-config/multinode/multinode -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla  -e kolla_action=precheck /root/kolla-ansible/ansible/site.yml
[DEPRECATION WARNING]: The TRANSFORM_INVALID_GROUP_CHARS settings is set to allow bad characters in group names by default, this will change, but still be user configurable on deprecation.
This feature will be removed in version 2.10. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
 [WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details

ERROR! 'listen' is not a valid attribute for a HandlerTaskInclude

The error appears to be in '/root/kolla-ansible/ansible/roles/mariadb/handlers/main.yml': line 66, column 3, but may
be elsewhere in the file depending on the exact syntax problem.

The offending line appears to be:


- name: Restart MariaDB on existing cluster members
  ^ here

Per https://github.com/ansible/ansible/issues/56580, the solution is:

pip install ansible==2.9
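
A quick sanity check that the pinned release is the one actually being used (my addition, not in the original notes):

ansible --version | head -1
pip show ansible | grep ^Version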
Failed to start mariadb

But when the deploy reached mariadb, MariaDB failed to start:

TASK [mariadb : Fail on existing but stopped cluster] *****************************************************************************************************************
fatal: [10.10.1.241]: FAILED! => {"changed": false, "msg": "MariaDB cluster exists but is stopped. Please start it using kolla-ansible mariadb_recovery"}
fatal: [10.10.1.242]: FAILED! => {"changed": false, "msg": "MariaDB cluster exists but is stopped. Please start it using kolla-ansible mariadb_recovery"}
fatal: [10.10.1.243]: FAILED! => {"changed": false, "msg": "MariaDB cluster exists but is stopped. Please start it using kolla-ansible mariadb_recovery"}
The fix here was to wipe the stale MariaDB volume on the affected nodes and redeploy:

ansible -i ~/kolla-config/kolla-config/multinode/multinode compute -m shell -a "docker volume rm mariadb"
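
Note that removing the volume throws away the old Galera cluster state (and any data in it) so the next deploy bootstraps a fresh cluster. The gentler path, suggested by the error message itself, is the recovery playbook:

./kolla-ansible/tools/kolla-ansible -i ~/kolla-config/kolla-config/multinode/multinode mariadb_recovery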
Failed to start rabbitmq
RUNNING HANDLER [rabbitmq : Waiting for rabbitmq to start on first node] **********************************************************************************************
fatal: [10.10.1.241]: FAILED! => {"changed": true, "cmd": ["docker", "exec", "outward_rabbitmq", "rabbitmqctl", "wait", "/var/lib/rabbitmq/mnesia/rabbitmq.pid"], "delta": "0:00:58.742540", "end": "2020-08-02 16:36:49.006539", "msg": "non-zero return code", "rc": 69, "start": "2020-08-02 16:35:50.263999", "stderr": "Error: unable to perform an operation on node 'rabbit@steveguan-1'. Please see diagnostics information and suggestions below.\n\nMost common reasons for this are:\n\n * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)\n * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)\n * Target node is not running\n\nIn addition to the diagnostics info below:\n\n * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more\n * Consult server logs on node rabbit@steveguan-1\n * If target node is configured to use long node names, don't forget to use --longnames with CLI tools\n\nDIAGNOSTICS\n===========\n\nattempted to contact: ['rabbit@steveguan-1']\n\nrabbit@steveguan-1:\n  * connected to epmd (port 4371) on steveguan-1\n  * epmd reports: node 'rabbit' not running at all\n                  no other nodes on steveguan-1\n  * suggestion: start the node\n\nCurrent node details:\n * node name: 'rabbitmqcli-64-rabbit@steveguan-1'\n * effective user's home directory: /var/lib/rabbitmq\n * Erlang cookie hash: zhg6nXg1QG3gM3mMbAwjgw==", "stderr_lines": ["Error: unable to perform an operation on node 'rabbit@steveguan-1'. Please see diagnostics information and suggestions below.", "", "Most common reasons for this are:", "", " * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)", " * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)", " * Target node is not running", "", "In addition to the diagnostics info below:", "", " * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more", " * Consult server logs on node rabbit@steveguan-1", " * If target node is configured to use long node names, don't forget to use --longnames with CLI tools", "", "DIAGNOSTICS", "===========", "", "attempted to contact: ['rabbit@steveguan-1']", "", "rabbit@steveguan-1:", "  * connected to epmd (port 4371) on steveguan-1", "  * epmd reports: node 'rabbit' not running at all", "                  no other nodes on steveguan-1", "  * suggestion: start the node", "", "Current node details:", " * node name: 'rabbitmqcli-64-rabbit@steveguan-1'", " * effective user's home directory: /var/lib/rabbitmq", " * Erlang cookie hash: zhg6nXg1QG3gM3mMbAwjgw=="], "stdout": "Waiting for pid file '/var/lib/rabbitmq/mnesia/rabbitmq.pid' to appear\npid is 26\nWaiting for erlang distribution on node 'rabbit@steveguan-1' while OS process '26' is running\nWaiting for applications 'rabbit_and_plugins' to start on node 'rabbit@steveguan-1'", "stdout_lines": ["Waiting for pid file '/var/lib/rabbitmq/mnesia/rabbitmq.pid' to appear", "pid is 26", "Waiting for erlang distribution on node 'rabbit@steveguan-1' while OS process '26' is running", "Waiting for applications 'rabbit_and_plugins' to start on node 'rabbit@steveguan-1'"]}
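The quick fix that worked was to disable the prometheus plugin; the underlying cause (a port conflict between the two RabbitMQ deployments, both binding 15692) is analyzed in the outward_rabbitmq section further down: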
ansible -i ~/kolla-config/kolla-config/multinode/multinode compute -m shell -a "docker exec -u root -it  rabbitmq rabbitmq-plugins disable rabbitmq_prometheus"


Assorted commands used while debugging haproxy and elasticsearch startup inside their containers:

# run haproxy in the foreground with its config to see why it exits
/usr/sbin/haproxy -W -db -p /run/haproxy.pid -f /etc/haproxy/haproxy.cfg
/usr/sbin/haproxy -db -p /run/haproxy.pid -f /etc/haproxy/haproxy.cfg

# run elasticsearch in the foreground
/usr/share/elasticsearch/bin/elasticsearch

docker restart elasticsearch && docker exec -u elasticsearch -it elasticsearch bash

docker restart elasticsearch && docker ps | grep elasticsearch

# the binary looks for its config under /usr/share/elasticsearch/config by default,
# so copy it there (no spaces allowed inside the brace expansion) ...
mkdir /usr/share/elasticsearch/config && \
cp /etc/elasticsearch/{elasticsearch.yml,jvm.options,log4j2.properties} /usr/share/elasticsearch/config && /usr/share/elasticsearch/bin/elasticsearch

# ... or simply point it at /etc/elasticsearch
export ES_PATH_CONF=/etc/elasticsearch; /usr/share/elasticsearch/bin/elasticsearch

mkdir /usr/share/elasticsearch/config
ls -sR /etc/elasticsearch /usr/share/elasticsearch/config
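
Once elasticsearch is up, a quick check that it is actually serving (my addition; -k is needed because of the self-signed certificate):

curl -k https://10.10.1.205:9200/_cluster/health?pretty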

Invalid index name

"reason": "Invalid index name [.kibana], already exists as alias"

The same error can be reproduced with the curl commands below. Useful references:
What is an Elasticsearch Index? | Elastic Blog
List all indices | Elasticsearch Reference [7.6] | Elastic
Delete Index | Elasticsearch Reference [6.0] | Elastic

curl -H 'Content-Type: application/json' \
-X PUT https://10.10.1.205:9200/.kibana \
-d '{"index.mapper.dynamic": "true"}'

# get one index
curl -H 'Content-Type: application/json' \
-X GET https://10.10.1.205:9200/.kibana

# get all alias
curl -H 'Content-Type: application/json' \
-X GET https://10.10.1.205:9200/_alias

# get index alias
curl -H 'Content-Type: application/json' \
-X GET https://10.10.1.205:9200/.kibana/_alias

# check index exists
curl -I https://10.10.1.205:9200/.kibana?pretty

# remove index
curl -H 'Content-Type: application/json' \
-X DELETE https://10.10.1.205:9200/.kibana

curl -H 'Content-Type: application/json' \
-X DELETE https://10.10.1.205:9200/_all
# for test
curl -H 'Content-Type: application/json' \
-X PUT https://10.10.1.205:9200/.xiaojue \
-d '{"index.mapper.dynamic": "true"}'
Self-Signed Certificates failed

Bug #1875561 "Self-Signed Certificates failed" : Bugs : kolla-ansible. Per the Red Hat documentation (Chapter 5. Using shared system certificates, Red Hat Enterprise Linux 8 | Red Hat Customer Portal) and OpenStack Docs: Advanced Configuration, trusting the CA system-wide should take effect, but following the official instructions both inside the containers and on my own CentOS test system, curl still cannot fetch data without -k.

Fix: Gerrit Code Review

./kolla-ansible -i /root/multinode deploy -t kibana
kolla-toolbox cannot curl elasticsearch on port 9200

Inside the kolla-toolbox container, curl -k https://10.10.1.205:9200/.kibana fails.
The cause: the following file existed at deploy time:

cat /root/.docker/config.json

{
 "proxies":
 {
   "default":
   {
     "httpProxy": "http://10.10.1.100:1087",
     "httpsProxy": "http://10.10.1.100:1087",
     "noProxy": "*.test.example.com,.example2.com"
   }
 }
}
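
Docker clients inject the proxies section of ~/.docker/config.json into every container they create as HTTP_PROXY/HTTPS_PROXY environment variables, so in-container requests to 10.10.1.205 were being sent through the unreachable proxy. A sketch of a scoped config that avoids this while keeping the proxy for image pulls (the VIP is listed explicitly, since CIDR support in noProxy depends on the consuming program):

{
 "proxies":
 {
   "default":
   {
     "httpProxy": "http://10.10.1.100:1087",
     "httpsProxy": "http://10.10.1.100:1087",
     "noProxy": "localhost,127.0.0.1,10.10.1.205,*.test.example.com,.example2.com"
   }
 }
}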

Solution:

cd kolla-ansible 
./kolla-ansible -i ../../multinode destroy --yes-i-really-really-mean-it
./kolla-ansible -i ../../multinode deploy
TASK [kibana : Change kibana config to set index as defaultIndex] failed

Error message:

fatal: [10.10.1.201]: FAILED! => {"action": "uri", "changed": false, "connection": "close", "content": "{\"error\":{\"root_cause\":[{\"type\":\"illegal_argument_exception\",\"reason\":\"Rejecting mapping update to [.kibana_1] as the final mapping would have more than 1 type: [doc, config]\"}],\"type\":\"illegal_argument_exception\",\"reason\":\"Rejecting mapping update to [.kibana_1] as the final mapping would have more than 1 type: [doc, config]\"},\"status\":400}", "content_length": "343", "content_type": "application/json; charset=UTF-8", "elapsed": 0, "json": {"error": {"reason": "Rejecting mapping update to [.kibana_1] as the final mapping would have more than 1 type: [doc, config]", "root_cause": [{"reason": "Rejecting mapping update to [.kibana_1] as the final mapping would have more than 1 type: [doc, config]", "type": "illegal_argument_exception"}], "type": "illegal_argument_exception"}, "status": 400}, "msg": "Status code was 400 and not [200, 201]: HTTP Error 400: Bad Request", "redirected": false, "status": 400, "url": "https://10.10.1.205:9200/.kibana/config/*"}
# get version
(kolla-toolbox)[root@k-node1 /]# curl https://10.10.1.205:9200 -k
{
  "name" : "10.10.1.201",
  "cluster_name" : "kolla_logging",
  "cluster_uuid" : "iAFzv7jlSsm0eWnT3s3iaw",
  "version" : {
    "number" : "6.8.8",
    "build_flavor" : "oss",
    "build_type" : "rpm",
    "build_hash" : "2f4c224",
    "build_date" : "2020-03-18T23:22:18.622755Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.2",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

# create a test index
curl -H 'Content-Type: application/json' \
-X PUT \
http://172.16.50.247:9200/.test/


# get kibana index info
curl -H 'Content-Type: application/json' \
-X GET \
http://172.16.50.247:9200/.kibana/

curl -H 'Content-Type: application/json' \
-X GET \
http://172.16.50.247:9200/.test/

# get .kibana config
curl -H 'Content-Type: application/json' \
-X GET \
https://10.10.1.205:9200/.kibana/config/* \
-k
{
  "_index": ".kibana_1",
  "_type": "config",
  "_id": "*",
  "found": false
}


# get .kibana index-pattern
curl -H 'Content-Type: application/json' \
-X GET \
https://10.10.1.205:9200/.kibana/index-pattern/

# set default kibana index-pattern
curl -H 'Content-Type: application/json' \
-X PUT \
https://10.10.1.205:9200/.kibana/config/* \
-d '{"defaultIndex": "flog-*"}' -k


curl -H 'Content-Type: application/json' \
-X PUT \
https://10.10.1.205:9200/.kibana/_settings \
-d '{
    "changes": {
        "defaultIndex": "flog-*"
    }
}'


curl -H 'Content-Type: application/json' \
-X PUT \
https://10.10.1.205:9200/.kibana/_mapping \
-d '
{
  "properties": {
    "defaultIndex": {
      "type": "keyword",
      "index": false
    }
  }
}'




curl -H 'Content-Type: application/json' \
-X PUT \
https://10.10.1.205:9200/.kibana/settings/defaultIndex \
-d '{"value": "flog-*"}' -k


curl -XGET "https://10.10.1.205:9200/_cat/indices"

curl -XGET "https://10.10.1.205:9200/.kibana"
ironic fails
TASK [ironic : Copying ironic-agent kernel and initramfs (iPXE)] ******************************************************************************************************************************
task path: /root/kolla-ansible/ansible/roles/ironic/tasks/config.yml:171
<10.10.1.201> ESTABLISH SSH CONNECTION FOR USER: root
<10.10.1.201> SSH: EXEC sshpass -d12 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/d5eb36b79d 10.10.1.201 '/bin/sh -c '"'"'echo ~root && sleep 0'"'"''
<10.10.1.201> (0, b'/root\n', b'')
<10.10.1.201> ESTABLISH SSH CONNECTION FOR USER: root
<10.10.1.201> SSH: EXEC sshpass -d12 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/d5eb36b79d 10.10.1.201 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir /root/.ansible/tmp/ansible-tmp-1589006790.8853176-117770-146080183204827 && echo ansible-tmp-1589006790.8853176-117770-146080183204827="` echo /root/.ansible/tmp/ansible-tmp-1589006790.8853176-117770-146080183204827 `" ) && sleep 0'"'"''
<10.10.1.201> (0, b'ansible-tmp-1589006790.8853176-117770-146080183204827=/root/.ansible/tmp/ansible-tmp-1589006790.8853176-117770-146080183204827\n', b'')
<10.10.1.201> ESTABLISH SSH CONNECTION FOR USER: root
<10.10.1.201> SSH: EXEC sshpass -d12 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/d5eb36b79d 10.10.1.201 '/bin/sh -c '"'"'rm -f -r /root/.ansible/tmp/ansible-tmp-1589006790.8853176-117770-146080183204827/ > /dev/null 2>&1 && sleep 0'"'"''
<10.10.1.201> (0, b'', b'')
The full traceback is:
Traceback (most recent call last):
  File "/root/deploy/lib/python3.7/site-packages/ansible/plugins/action/copy.py", line 464, in run
    source = self._find_needle('files', source)
  File "/root/deploy/lib/python3.7/site-packages/ansible/plugins/action/__init__.py", line 1178, in _find_needle
    return self._loader.path_dwim_relative_stack(path_stack, dirname, needle)
  File "/root/deploy/lib/python3.7/site-packages/ansible/parsing/dataloader.py", line 327, in path_dwim_relative_stack
    raise AnsibleFileNotFound(file_name=source, paths=[to_native(p) for p in search])
ansible.errors.AnsibleFileNotFound: Could not find or access '/etc/kolla/config/ironic/ironic-agent.kernel' on the Ansible Controller.
If you are using a module and expect the file to exist on the remote, see the remote_src option
failed: [10.10.1.201] (item=ironic-agent.kernel) => {
    "ansible_loop_var": "item",
    "changed": false,
    "invocation": {
        "dest": "/etc/kolla/ironic-ipxe/ironic-agent.kernel",
        "mode": "0660",
        "module_args": {
            "dest": "/etc/kolla/ironic-ipxe/ironic-agent.kernel",
            "mode": "0660",
            "src": "/etc/kolla/config/ironic/ironic-agent.kernel"
        },
        "src": "/etc/kolla/config/ironic/ironic-agent.kernel"
    },
    "item": "ironic-agent.kernel",
    "msg": "Could not find or access '/etc/kolla/config/ironic/ironic-agent.kernel' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"
}

The solution, per OpenStack Docs: Ironic in Kolla:

mkdir /etc/kolla/config/ironic/

 curl https://tarballs.openstack.org/ironic-python-agent/coreos/files/coreos_production_pxe.vmlinuz \
  -o /etc/kolla/config/ironic/ironic-agent.kernel -L
  
curl https://tarballs.openstack.org/ironic-python-agent/coreos/files/coreos_production_pxe_image-oem.cpio.gz \
  -o /etc/kolla/config/ironic/ironic-agent.initramfs -L
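
With the kernel and initramfs in place, the failed part can be retried without a full redeploy, using the same tag filter mechanism as with kibana above (role tag assumed to match the role name):

./kolla-ansible/tools/kolla-ansible -i ~/kolla-config/kolla-config/multinode/multinode deploy -t ironic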
murano fails
"docker", "exec", "murano_api", "murano", "--os-username", "admin", "--os-password", "uxlNttg9tklN88M5FH7SaWNq6UXZ5sBOIsyHZnh0", "--os-project-name", "admin", "--os-cacert", "/etc/pki/ca-trust/source/anchors/kolla-customca-haproxy-internal.crt", "--os-auth-url", "https://10.10.1.205:35357", "--murano-url", "https://10.10.1.201:8082", "package-list"
The database went down

Reference: Two-Node Clusters — Galera Cluster Documentation

# try to bootstrap a new cluster from this node
docker exec -u mysql -it mariadb galera_new_cluster

# recover the last committed sequence number
docker exec -u mysql -it mariadb mysqld_safe --wsrep-recover
/usr/bin/mysqld_safe

docker exec -u mysql -it mariadb bash

docker exec -u root -it mariadb bash

# log in with the database root password from passwords.yml
docker exec -u root -it mariadb \
mysql -u root -p3rWYF9UTz9hkegcSOeyjtuvCWKDAFhaIXXOLlvmw

docker exec -u root -it mariadb \
mysql -V

tail -f /var/lib/docker/volumes/kolla_logs/_data/mariadb/mariadb.log

# inside mysql: check the Galera replication status
show status like "%wsrep%";
 +-------------------------------+----------------------+
| Variable_name                 | Value                |
+-------------------------------+----------------------+
| wsrep_applier_thread_count    | 0                    |
| wsrep_cluster_conf_id         | 18446744073709551615 |
| wsrep_cluster_size            | 0                    |
| wsrep_cluster_state_uuid      |                      |
| wsrep_cluster_status          | Disconnected         |
| wsrep_connected               | OFF                  |
| wsrep_local_bf_aborts         | 0                    |
| wsrep_local_index             | 18446744073709551615 |
| wsrep_provider_name           |                      |
| wsrep_provider_vendor         |                      |
| wsrep_provider_version        |                      |
| wsrep_ready                   | OFF                  |
| wsrep_rollbacker_thread_count | 0                    |
| wsrep_thread_count            | 0                    |
+-------------------------------+----------------------+
 
 
docker exec -u root -it  mariadb mysql -V

ansible -i ~/multinode chrony -m shell -a "docker ps | grep mariadb"
  
cd ~/kolla-ansible/tools 
./kolla-ansible -i ~/multinode  mariadb_recovery
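
Under the hood mariadb_recovery looks at each node's Galera state to decide which node to bootstrap from; the same information can be inspected by hand (volume path assumed from the default docker volume layout):

cat /var/lib/docker/volumes/mariadb/_data/grastate.dat
# the node with safe_to_bootstrap: 1, or failing that the highest seqno, should be started first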
OVS cannot be installed
The full traceback is:
WARNING: The below traceback may *not* be related to the actual failure.
  File "/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py", line 1024, in main
  File "/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py", line 747, in recreate_or_restart_container
  File "/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py", line 765, in start_container
  File "/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py", line 571, in pull_image
  File "/usr/local/lib/python2.7/dist-packages/docker/api/image.py", line 415, in pull
    self._raise_for_status(response)
  File "/usr/local/lib/python2.7/dist-packages/docker/api/client.py", line 263, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/usr/local/lib/python2.7/dist-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
fatal: [10.10.1.201]: FAILED! => {
    "changed": true,
    "invocation": {
        "module_args": {
            "action": "recreate_or_restart_container",
            "api_version": "auto",
            "auth_email": null,
            "auth_password": null,
            "auth_registry": "10.10.1.201:4000",
            "auth_username": null,
            "cap_add": [],
            "client_timeout": 120,
            "command": null,
            "detach": true,
            "dimensions": {},
            "environment": {
                "KOLLA_CONFIG_STRATEGY": "COPY_ALWAYS"
            },
            "graceful_timeout": 10,
            "image": "10.10.1.201:4000/kolla/centos-source-ovsdpdk-db:fe6fd8dc5d",
            "labels": {},
            "name": "ovsdpdk_db",
            "privileged": false,
            "remove_on_exit": true,
            "restart_policy": "unless-stopped",
            "restart_retries": 10,
            "security_opt": [],
            "state": "running",
            "tls_cacert": null,
            "tls_cert": null,
            "tls_key": null,
            "tls_verify": false,
            "tty": false,
            "volumes": [
                "/etc/kolla/ovsdpdk-db/:/var/lib/kolla/config_files/:ro",
                "/etc/localtime:/etc/localtime:ro",
                "",
                "/run/openvswitch:/run/openvswitch:shared",
                "kolla_logs:/var/log/kolla/",
                "ovsdpdk_db:/var/lib/openvswitch/"
            ],
            "volumes_from": null
        }
    },
    "msg": "'Traceback (most recent call last):\\n  File \"/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py\", line 1024, in main\\n  File \"/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py\", line 747, in recreate_or_restart_container\\n  File \"/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py\", line 765, in start_container\\n  File \"/tmp/ansible_kolla_docker_payload_w4s6XR/ansible_kolla_docker_payload.zip/ansible/modules/kolla_docker.py\", line 571, in pull_image\\n  File \"/usr/local/lib/python2.7/dist-packages/docker/api/image.py\", line 415, in pull\\n    self._raise_for_status(response)\\n  File \"/usr/local/lib/python2.7/dist-packages/docker/api/client.py\", line 263, in _raise_for_status\\n    raise create_api_error_from_http_exception(e)\\n  File \"/usr/local/lib/python2.7/dist-packages/docker/errors.py\", line 31, in create_api_error_from_http_exception\\n    raise cls(e, response=response, explanation=explanation)\\nNotFound: 404 Client Error: Not Found (\"manifest for 10.10.1.201:4000/kolla/centos-source-ovsdpdk-db:fe6fd8dc5d not found: manifest unknown: manifest unknown\")\\n'"
}

Because there is currently no ovs-dpdk package source for CentOS, kolla images based on CentOS cannot be built for now.

Solution:
Use the Ubuntu images instead.

docker pull kolla/ubuntu-source-ovsdpdk:8.0.3
docker pull kolla/ubuntu-source-ovsdpdk-db:8.0.3
docker pull kolla/ubuntu-source-ovsdpdk-vswitchd:8.0.3


docker tag kolla/ubuntu-source-ovsdpdk:8.0.3 10.10.1.201:4000/kolla/centos-source-ovsdpdk:fe6fd8dc5d
docker tag kolla/ubuntu-source-ovsdpdk-db:8.0.3 10.10.1.201:4000/kolla/centos-source-ovsdpdk-db:fe6fd8dc5d
docker tag kolla/ubuntu-source-ovsdpdk-vswitchd:8.0.3 10.10.1.201:4000/kolla/centos-source-ovsdpdk-vswitchd:fe6fd8dc5d

docker push 10.10.1.201:4000/kolla/centos-source-ovsdpdk:fe6fd8dc5d
docker push 10.10.1.201:4000/kolla/centos-source-ovsdpdk-db:fe6fd8dc5d
docker push 10.10.1.201:4000/kolla/centos-source-ovsdpdk-vswitchd:fe6fd8dc5d
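
Because the Ubuntu images are retagged under exactly the names the deploy expects (10.10.1.201:4000/kolla/centos-source-ovsdpdk*:fe6fd8dc5d), no globals.yml change is needed; just rerun the deploy after pushing.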
The outward_rabbitmq container keeps restarting on one node

Exec into a healthy container and inspect the running processes:

docker exec -u root -it outward_rabbitmq bash 
ps -ef | cat

/usr/lib64/erlang/erts-10.7.1/bin/beam.smp \
-W w -A 64 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -P 1048576 -t 5000000 \
-stbt db -zdbbl 128000 -K true -B i \
-- -root /usr/lib64/erlang -progname erl \
-- -home /var/lib/rabbitmq -epmd_port 4371 \
-- -pa \
/usr/lib/rabbitmq/lib/rabbitmq_server-3.8.3/ebin \
-noshell -noinput -s rabbit boot -sname rabbit@k-node1 -boot start_sasl \
-conf /etc/rabbitmq/rabbitmq.conf -conf_dir /var/lib/rabbitmq/config \
-conf_script_dir /usr/lib/rabbitmq/bin \
-conf_schema_dir /var/lib/rabbitmq/schema \
-conf_advanced /etc/rabbitmq/advanced.config \
-kernel inet_default_connect_options [{nodelay,true}] \
-kernel inetrc '/etc/rabbitmq/erl_inetrc' \
-sasl errlog_type error -sasl sasl_error_logger false \
-rabbit lager_log_root "/var/log/kolla/outward_rabbitmq" \
-rabbit lager_default_file "/var/log/kolla/outward_rabbitmq/rabbit@k-node1.log" \
-rabbit lager_upgrade_file "/var/log/kolla/outward_rabbitmq/rabbit@k-node1_upgrade.log" \
-rabbit feature_flags_file "/var/lib/rabbitmq/mnesia/rabbit@k-node1-feature_flags" \
-rabbit enabled_plugins_file "/etc/rabbitmq/enabled_plugins" \
-rabbit plugins_dir "/usr/lib/rabbitmq/plugins:/usr/lib/rabbitmq/lib/rabbitmq_server-3.8.3/plugins" \
-rabbit plugins_expand_dir "/var/lib/rabbitmq/mnesia/rabbit@k-node1-plugins-expand" \
-os_mon start_cpu_sup false \
-os_mon start_disksup false \
-os_mon start_memsup false \
-mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@k-node1" \
-ra data_dir "/var/lib/rabbitmq/mnesia/rabbit@k-node1/quorum" \
-kernel inet_dist_listen_min 25674 -kernel inet_dist_listen_max 25674 --

The container's start command:

/usr/sbin/rabbitmq-server
Starting the service by hand inside the container

Edit the failing container's startup file /etc/kolla/outward_rabbitmq/config.json:

{
    "command": "sleep infinity",    <- replaces "/usr/sbin/rabbitmq-server"; keeps the container alive without starting the service
    ...
}

Restart the container:

docker restart outward_rabbitmq

Exec into the container and start the service manually:

docker exec -u root -it outward_rabbitmq bash
/usr/sbin/rabbitmq-server

This yields the failure log:

  ##  ##      RabbitMQ 3.8.3
  ##  ##
  ##########  Copyright (c) 2007-2020 Pivotal Software, Inc.
  ######  ##
  ##########  Licensed under the MPL 1.1. Website: https://rabbitmq.com

  Doc guides: https://rabbitmq.com/documentation.html
  Support:    https://rabbitmq.com/contact.html
  Tutorials:  https://rabbitmq.com/getstarted.html
  Monitoring: https://rabbitmq.com/monitoring.html

  Logs: /var/log/kolla/outward_rabbitmq/rabbit@k-node2.log
        /var/log/kolla/outward_rabbitmq/rabbit@k-node2_upgrade.log

  Config file(s): /etc/rabbitmq/rabbitmq.conf

  Starting broker...
BOOT FAILED
===========

Error description:
    init:do_boot/3
    init:start_em/1
    rabbit:start_it/1 line 484
    rabbit:broker_start/1 line 360
    rabbit:start_loaded_apps/2 line 613
    app_utils:manage_applications/6 line 126
    lists:foldl/3 line 1263
    rabbit:'-handle_app_error/1-fun-0-'/3 line 736
throw:{could_not_start,rabbitmq_prometheus,
       {rabbitmq_prometheus,
        {bad_return,
         {{rabbit_prometheus_app,start,[normal,[]]},
          {'EXIT',
           {{could_not_start_listener,
             [{port,15692},{protocol,'http/prometheus'}],
             {shutdown,
              {failed_to_start_child,ranch_acceptors_sup,
               {listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}}}},
            {gen_server,call,
             [rabbit_web_dispatch_registry,
              {add,rabbitmq_prometheus_tcp,
               [{port,15692},{protocol,'http/prometheus'}],
               #Fun<rabbit_web_dispatch.0.73002970>,
               [{'_',[],
                 [{[<<"metrics">>],[],rabbit_prometheus_handler,[]},
                  {[<<"metrics">>,registry],
                   [],rabbit_prometheus_handler,[]}]}],
               {[],"RabbitMQ Prometheus"}},
              infinity]}}}}}}}
Log file(s) (may contain more information):
   /var/log/kolla/outward_rabbitmq/rabbit@k-node2.log
   /var/log/kolla/outward_rabbitmq/rabbit@k-node2_upgrade.log

{"init terminating in do_boot",{could_not_start,rabbitmq_prometheus,{rabbitmq_prometheus,{bad_return,{{rabbit_prometheus_app,start,[normal,[]]},{'EXIT',{{could_not_start_listener,[{port,15692},{protocol,'http/prometheus'}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}}}},{gen_server,call,[rabbit_web_dispatch_registry,{add,rabbitmq_prometheus_tcp,[{port,15692},{protocol,'http/prometheus'}],#Fun<rabbit_web_dispatch.0.73002970>,[{'_',[],[{[<<"metrics">>],[],rabbit_prometheus_handler,[]},{[<<"metrics">>,registry],[],rabbit_prometheus_handler,[]}]}],{[],"RabbitMQ Prometheus"}},infinity]}}}}}}}}
init terminating in do_boot ({could_not_start,rabbitmq_prometheus,{rabbitmq_prometheus,{bad_return,{{_},{_}}}}})

Crash dump is being written to: /var/log/kolla/outward_rabbitmq/erl_crash.dump...done
Ensure RabbitMQ users exist fails
TASK [service-rabbitmq : nova-cell | Ensure RabbitMQ users exist] ****************************************************************************************************************************************************************************
task path: /root/kolla-ansible/ansible/roles/service-rabbitmq/tasks/main.yml:15
failed: [10.10.1.201 -> 10.10.1.201] (item={'user': 'openstack', 'vhost': '/'}) => {
    "action": "rabbitmq_user",
    "ansible_loop_var": "item",
    "attempts": 5,
    "changed": false,
    "cmd": "/usr/sbin/rabbitmqctl -q -n rabbit list_users",
    "invocation": {
        "module_args": {
            "configure_priv": ".*",
            "force": false,
            "node": "rabbit",
            "password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "permissions": [
                {
                    "configure_priv": ".*",
                    "read_priv": ".*",
                    "vhost": "/",
                    "write_priv": ".*"
                }
            ],
            "read_priv": ".*",
            "state": "present",
            "tags": null,
            "update_password": "always",
            "user": "openstack",
            "vhost": "/",
            "write_priv": ".*"
        }
    },
    "item": {
        "password": "fNn7OUGWBRKdFh1RmLzpf2R34ItnBMoWzD5QJjQz",
        "user": "openstack",
        "vhost": "/"
    },
    "msg": "Error: this command requires the 'rabbit' app to be running on the target node. Start it with 'rabbitmqctl start_app'.\nArguments given:\n\t-q -n rabbit list_users\n\n\u001b[1mUsage\u001b[0m\n\nrabbitmqctl [--node <node>] [--longnames] [--quiet] list_users [--no-table-headers] [--timeout <timeout>]",
    "rc": 64,
    "stderr": "Error: this command requires the 'rabbit' app to be running on the target node. Start it with 'rabbitmqctl start_app'.\nArguments given:\n\t-q -n rabbit list_users\n\n\u001b[1mUsage\u001b[0m\n\nrabbitmqctl [--node <node>] [--longnames] [--quiet] list_users [--no-table-headers] [--timeout <timeout>]\n",
    "stderr_lines": [
        "Error: this command requires the 'rabbit' app to be running on the target node. Start it with 'rabbitmqctl start_app'.",
        "Arguments given:",
        "\t-q -n rabbit list_users",
        "",
        "\u001b[1mUsage\u001b[0m",
        "",
        "rabbitmqctl [--node <node>] [--longnames] [--quiet] list_users [--no-table-headers] [--timeout <timeout>]"
    ],
    "stdout": "",
    "stdout_lines": []
}

Going by the experience with the problem below: both RabbitMQ deployments had the prometheus plugin enabled, and the resulting port conflict prevented rabbitmq from starting.
Solution

ansible -i ~/multinode baremetal -m shell -a \
"docker stop rabbitmq && 
docker restart outward_rabbitmq &&
docker exec -u root -it outward_rabbitmq rabbitmq-plugins disable rabbitmq_prometheus &&
docker restart outward_rabbitmq rabbitmq &&
docker ps | grep rabbitmq"


ansible -i ~/multinode baremetal -m shell -a "docker ps | grep rabbitmq"
docker exec -u root -it rabbitmq bash
TASK [keystone : Creating admin project, user, role, service, and endpoint] hangs forever

The keystone log shows it cannot connect to RabbitMQ:

2020-05-21 12:49:47.391 381 ERROR oslo.messaging._drivers.impl_rabbit [req-0a338d12-e01c-40ba-9292-6a521b784dfa - - - - -] Connection failed: [Errno 111] Connection refused (retrying in 0 seconds): ConnectionRefusedError: [Errno 111] Connection refused
2020-05-21 12:49:47.399 381 ERROR oslo.messaging._drivers.impl_rabbit [req-0a338d12-e01c-40ba-9292-6a521b784dfa - - - - -] Connection failed: [Errno 111] Connection refused (retrying in 28.0 seconds): ConnectionRefusedError: [Errno 111] Connection refused

Check the RabbitMQ containers:

(deploy) root@k-node1:/proc/net# ansible -i ~/multinode baremetal -m shell -a "docker ps -a | grep rabbitmq "
[DEPRECATION WARNING]: The TRANSFORM_INVALID_GROUP_CHARS settings is set to allow bad characters in group names by default, this will change, but still be user configurable on deprecation.
This feature will be removed in version 2.10. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
[DEPRECATION WARNING]: Distribution Ubuntu 16.04 on host 10.10.1.201 should use /usr/bin/python3, but is using /usr/bin/python for backward compatibility with prior Ansible releases. A
future Ansible release will default to using the discovered platform python for this host. See https://docs.ansible.com/ansible/2.9/reference_appendices/interpreter_discovery.html for more
information. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
10.10.1.201 | CHANGED | rc=0 >>
e3d58381fe1c        10.10.1.201:4000/kolla/centos-source-rabbitmq:fe6fd8dc5d                            "dumb-init --single-…"   21 hours ago        Up 3 hours                                               outward_rabbitmq
02fa4f75a28d        10.10.1.201:4000/kolla/centos-source-rabbitmq:fe6fd8dc5d                            "dumb-init --single-…"   21 hours ago        Up 3 hours                                               rabbitmq
[DEPRECATION WARNING]: Distribution Ubuntu 16.04 on host 10.10.1.202 should use /usr/bin/python3, but is using /usr/bin/python for backward compatibility with prior Ansible releases. A
future Ansible release will default to using the discovered platform python for this host. See https://docs.ansible.com/ansible/2.9/reference_appendices/interpreter_discovery.html for more
information. This feature will be removed in version 2.12. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
10.10.1.202 | CHANGED | rc=0 >>
47b195c087b0        10.10.1.201:4000/kolla/centos-source-rabbitmq:fe6fd8dc5d                            "dumb-init --single-…"   20 hours ago        Up 3 hours                                          outward_rabbitmq
432e1692998f        10.10.1.201:4000/kolla/centos-source-rabbitmq:fe6fd8dc5d                            "dumb-init --single-…"   21 hours ago        Up 3 minutes                                        rabbitmq

Check whether the RabbitMQ service is listening:

lsof -i :5672

Check the RabbitMQ crash log on node 2:

vi /var/lib/docker/volumes/kolla_logs/_data/rabbitmq/log/crash.log

2020-05-21 13:57:40 =ERROR REPORT====
Mnesia('rabbit@k-node1'): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, 'rabbit@k-node2'}
2020-05-21 13:57:40 =ERROR REPORT====
Failed to start Ranch listener rabbit_web_dispatch_sup_15692 in ranch_tcp:listen([{cacerts,'...'},{key,'...'},{cert,'...'},{port,15692}]) for reason eaddrinuse (address already in use)
2020-05-21 13:57:40 =SUPERVISOR REPORT====
     Supervisor: {<0.604.0>,ranch_listener_sup}
     Context:    start_error
     Reason:     {listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}
     Offender:   [{pid,undefined},{id,ranch_acceptors_sup},{mfargs,{ranch_acceptors_sup,start_link,[rabbit_web_dispatch_sup_15692,ranch_tcp]}},{restart_type,permanent},{shutdown,infinity},{child_type,supervisor}]

2020-05-21 13:57:40 =CRASH REPORT====
  crasher:
    initial call: supervisor:ranch_acceptors_sup/1
    pid: <0.606.0>
    registered_name: []
    exception exit: {{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse},[{ranch_acceptors_sup,listen_error,5,[{file,"src/ranch_acceptors_sup.erl"},{line,66}]},{ranch_acceptors_sup,init,1,[{file,"src/ranch_acceptors_sup.erl"},{line,44}]},{supervisor,init,1,[{file,"supervisor.erl"},{line,295}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,374}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,342}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
    ancestors: [<0.604.0>,rabbit_web_dispatch_sup,<0.597.0>]
    message_queue_len: 0
    messages: []
    links: [<0.604.0>]
    dictionary: [{logger,error_logger}]
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 27
    reductions: 1356
  neighbours:
2020-05-21 13:57:40 =ERROR REPORT====
** Generic server rabbit_web_dispatch_registry terminating
** Last message in was {add,rabbitmq_prometheus_tcp,[{port,15692},{protocol,'http/prometheus'}],#Fun<rabbit_web_dispatch.0.73002970>,[{'_',[],[{[<<"metrics">>],[],rabbit_prometheus_handler,[]},{[<<"metrics">>,registry],[],rabbit_prometheus_handler,[]}]}],{[],"RabbitMQ Prometheus"}}
** When Server state == undefined
** Reason for termination ==
** {{could_not_start_listener,[{port,15692},{protocol,'http/prometheus'}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}}}},[{rabbit_web_dispatch_sup,check_error,2,[{file,"src/rabbit_web_dispatch_sup.erl"},{line,141}]},{rabbit_web_dispatch_registry,handle_call,3,[{file,"src/rabbit_web_dispatch_registry.erl"},{line,75}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,661}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,690}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
** Client <0.603.0> stacktrace
** [{gen,do_call,4,[{file,"gen.erl"},{line,167}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,219}]},{rabbit_web_dispatch,register_context_handler,5,[{file,"src/rabbit_web_dispatch.erl"},{line,35}]},{rabbit_prometheus_app,start_listener,1,[{file,"src/rabbit_prometheus_app.erl"},{line,83}]},{rabbit_prometheus_app,'-start_configured_listener/0-lc$^0/1-0-',1,[{file,"src/rabbit_prometheus_app.erl"},{line,57}]},{rabbit_prometheus_app,start,2,[{file,"src/rabbit_prometheus_app.erl"},{line,32}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,277}]}]
2020-05-21 13:57:40 =CRASH REPORT====
  crasher:
    initial call: rabbit_web_dispatch_registry:init/1
    pid: <0.599.0>
    registered_name: rabbit_web_dispatch_registry
    exception exit: {{could_not_start_listener,[{port,15692},{protocol,'http/prometheus'}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}}}},[{rabbit_web_dispatch_sup,check_error,2,[{file,"src/rabbit_web_dispatch_sup.erl"},{line,141}]},{rabbit_web_dispatch_registry,handle_call,3,[{file,"src/rabbit_web_dispatch_registry.erl"},{line,75}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,661}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,690}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
    ancestors: [rabbit_web_dispatch_sup,<0.597.0>]
    message_queue_len: 0
    messages: []
    links: [<0.598.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 1598
    stack_size: 27
    reductions: 749
  neighbours:
2020-05-21 13:57:40 =SUPERVISOR REPORT====
     Supervisor: {local,rabbit_web_dispatch_sup}
     Context:    child_terminated
     Reason:     {could_not_start_listener,[{port,15692},{protocol,'http/prometheus'}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}}}}
     Offender:   [{pid,<0.599.0>},{id,rabbit_web_dispatch_registry},{mfargs,{rabbit_web_dispatch_registry,start_link,[]}},{restart_type,transient},{shutdown,5000},{child_type,worker}]

2020-05-21 13:57:40 =CRASH REPORT====
  crasher:
    initial call: application_master:init/4
    pid: <0.602.0>
    registered_name: []
    exception exit: {{bad_return,{{rabbit_prometheus_app,start,[normal,[]]},{'EXIT',{{could_not_start_listener,[{port,15692},{protocol,'http/prometheus'}],{shutdown,{failed_to_start_child,ranch_acceptors_sup,{listen_error,rabbit_web_dispatch_sup_15692,eaddrinuse}}}},{gen_server,call,[rabbit_web_dispatch_registry,{add,rabbitmq_prometheus_tcp,[{port,15692},{protocol,'http/prometheus'}],#Fun<rabbit_web_dispatch.0.73002970>,[{'_',[],[{[<<"metrics">>],[],rabbit_prometheus_handler,[]},{[<<"metrics">>,registry],[],rabbit_prometheus_handler,[]}]}],{[],"RabbitMQ Prometheus"}},infinity]}}}}},[{application_master,init,4,[{file,"application_master.erl"},{line,138}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
    ancestors: [<0.601.0>]
    message_queue_len: 1
    messages: [{'EXIT',<0.603.0>,normal}]
    links: [<0.601.0>,<0.44.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 227
  neighbours:

Later, checking what occupied port 15692 revealed:

root@k-node1:~# lsof -i :15692
lsof: no pwd entry for UID 42439
COMMAND    PID     USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
lsof: no pwd entry for UID 42439
beam.smp 11708    42439   93u  IPv4 200265      0t0  TCP *:15692 (LISTEN)

With two RabbitMQ deployments present, the guess was that both started a prometheus listener, so this one failed to boot because of the port conflict. Temporarily stopping outward_rabbitmq indeed made rabbitmq come back.

TASK [ovs-dpdk : Install ovs-dpdkctl service and config] failed
TASK [neutron : Copying over openvswitch_agent.ini] failed
TASK [neutron : Copying over openvswitch_agent.ini] ******************************************************************************************************************************************************************************************
task path: /root/kolla-ansible/ansible/roles/neutron/tasks/config.yml:172
<10.10.1.201> ESTABLISH SSH CONNECTION FOR USER: root
<10.10.1.201> SSH: EXEC sshpass -d11 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/d5eb36b79d 10.10.1.201 '/bin/sh -c '"'"'echo ~root && sleep 0'"'"''
<10.10.1.201> (0, b'/root\n', b'')
<10.10.1.201> ESTABLISH SSH CONNECTION FOR USER: root
<10.10.1.201> SSH: EXEC sshpass -d11 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/d5eb36b79d 10.10.1.201 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir /root/.ansible/tmp/ansible-tmp-1590113556.592888-38224-240706786989573 && echo ansible-tmp-1590113556.592888-38224-240706786989573="` echo /root/.ansible/tmp/ansible-tmp-1590113556.592888-38224-240706786989573 `" ) && sleep 0'"'"''
<10.10.1.201> (0, b'ansible-tmp-1590113556.592888-38224-240706786989573=/root/.ansible/tmp/ansible-tmp-1590113556.592888-38224-240706786989573\n', b'')
<10.10.1.201> ESTABLISH SSH CONNECTION FOR USER: root
<10.10.1.201> SSH: EXEC sshpass -d11 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o 'User="root"' -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/d5eb36b79d 10.10.1.201 '/bin/sh -c '"'"'rm -f -r /root/.ansible/tmp/ansible-tmp-1590113556.592888-38224-240706786989573/ > /dev/null 2>&1 && sleep 0'"'"''
<10.10.1.201> (0, b'', b'')
fatal: [10.10.1.201]: FAILED! => {
    "msg": "An unhandled exception occurred while templating '{{ 'tunnel' | kolla_address }}'. Error was a <class 'kolla_ansible.exception.FilterError'>, original message: Interface 'dpdk_bridge' not present on host '10.10.1.201'"
}

The likely cause: TASK [ovs-dpdk : Binds the interface to the target driver specifed in the config] completed, but its output shows the binding actually failed:

changed: [10.10.1.202] => {
    "changed": true,
    "cmd": [
        "/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh",
        "bind_nics"
    ],
    "delta": "0:00:00.094136",
    "end": "2020-05-28 15:33:07.105387",
    "invocation": {
        "module_args": {
            "_raw_params": "/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh bind_nics",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true,
            "warn": true
        }
    },
    "rc": 0,
    "start": "2020-05-28 15:33:07.011251",
    "stderr": "++ realpath /etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh\n+ FULL_PATH=/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh\n+ CONFIG_FILE=/etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf\n+ SERVICE_FILE=/etc/systemd/system/ovs-dpdkctl.service\n+ BRIDGE_SERVICE_FILE=/etc/systemd/system/ovs-dpdk-bridge.service\n+ '[' 1 -ge 1 ']'\n+ func=bind_nics\n+ shift\n+ eval 'bind_nics '\n++ bind_nics\n+++ list_dpdk_nics\n++++ get_value ovs port_mappings\n++++ crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ovs port_mappings\n++++ tr , '\\n'\n++++ cut -d : -f 1\n+++ for nic in '$(get_value ovs port_mappings | tr '\\'','\\'' '\\''\\n'\\'' | cut -d : -f 1)'\n+++ echo ens34\n++ for nic in '$(list_dpdk_nics)'\n+++ get_value ens34 address\n+++ crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 address\n++ device_address=0000:02:02.0\n+++ get_driver_by_address 0000:02:02.0\n+++ ls /sys/bus/pci/devices/0000:02:02.0/driver -al\n+++ awk '{n=split($NF,a,\"/\"); print a[n]}'\nls: cannot access '/sys/bus/pci/devices/0000:02:02.0/driver': No such file or directory\n++ current_driver=\n+++ get_value ens34 driver\n+++ crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 driver\n++ target_driver=uio_pci_generic\n++ '[' '' '!=' uio_pci_generic ']'\n++ set_value ens34 old_driver\n++ crudini --set /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 old_driver ''\n++ unbind_nic 0000:02:02.0\n++ echo 0000:02:02.0\n/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh: line 106: /sys/bus/pci/drivers//unbind: Permission denied\n++ echo\n++ bind_nic 0000:02:02.0 uio_pci_generic\n++ echo uio_pci_generic\n++ echo 0000:02:02.0\n/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh: line 102: echo: write error: No such device\n+ set +o xtrace",
    "stderr_lines": [
        "++ realpath /etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh",
        "+ FULL_PATH=/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh",
        "+ CONFIG_FILE=/etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf",
        "+ SERVICE_FILE=/etc/systemd/system/ovs-dpdkctl.service",
        "+ BRIDGE_SERVICE_FILE=/etc/systemd/system/ovs-dpdk-bridge.service",
        "+ '[' 1 -ge 1 ']'",
        "+ func=bind_nics",
        "+ shift",
        "+ eval 'bind_nics '",
        "++ bind_nics",
        "+++ list_dpdk_nics",
        "++++ get_value ovs port_mappings",
        "++++ crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ovs port_mappings",
        "++++ tr , '\\n'",
        "++++ cut -d : -f 1",
        "+++ for nic in '$(get_value ovs port_mappings | tr '\\'','\\'' '\\''\\n'\\'' | cut -d : -f 1)'",
        "+++ echo ens34",
        "++ for nic in '$(list_dpdk_nics)'",
        "+++ get_value ens34 address",
        "+++ crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 address",
        "++ device_address=0000:02:02.0",
        "+++ get_driver_by_address 0000:02:02.0",
        "+++ ls /sys/bus/pci/devices/0000:02:02.0/driver -al",
        "+++ awk '{n=split($NF,a,\"/\"); print a[n]}'",
        "ls: cannot access '/sys/bus/pci/devices/0000:02:02.0/driver': No such file or directory",
        "++ current_driver=",
        "+++ get_value ens34 driver",
        "+++ crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 driver",
        "++ target_driver=uio_pci_generic",
        "++ '[' '' '!=' uio_pci_generic ']'",
        "++ set_value ens34 old_driver",
        "++ crudini --set /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 old_driver ''",
        "++ unbind_nic 0000:02:02.0",
        "++ echo 0000:02:02.0",
        "/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh: line 106: /sys/bus/pci/drivers//unbind: Permission denied",
        "++ echo",
        "++ bind_nic 0000:02:02.0 uio_pci_generic",
        "++ echo uio_pci_generic",
        "++ echo 0000:02:02.0",
        "/etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh: line 102: echo: write error: No such device",
        "+ set +o xtrace"
    ],
    "stdout": "",
    "stdout_lines": []
}

Going by the code around /etc/kolla/ovsdpdk-db/ovs-dpdkctl.sh line 102 (the screenshot here did not survive), what the script effectively does is:

(deploy) root@k-node1:~# crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ovs port_mappings | tr ',' '\n' | cut -d : -f 1
ens34
(deploy) root@k-node1:~# crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 address
0000:02:02.0
(deploy) root@k-node1:~# ls /sys/bus/pci/devices/0000\:02\:01.0/driver -al | awk '{n=split($NF,a,"/"); print a[n]}'
e1000
(deploy) root@k-node1:~# crudini --get /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf ens34 old_driver
e1000

device_address=0000:02:02.0
current_driver=e1000
target_driver=uio_pci_generic

echo 0000:02:02.0 > /sys/bus/pci/drivers/e1000/unbind
echo > /sys/bus/pci/devices/0000:02:02.0/driver_override



# sysfs entries involved:
/sys/bus/pci/drivers/e1000/bind
/sys/bus/pci/drivers/e1000/unbind
/sys/bus/pci/devices/0000:02:02.0/driver_override
/sys/bus/pci/devices/0000:02:02.0/driver    # not found on this host
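
For reference, the complete manual sequence the script is trying to perform, assuming the uio_pci_generic module is available (the "write error: No such device" above is what the bind step returns when the target driver is not loaded or rejects the device):

modprobe uio_pci_generic
echo 0000:02:02.0 > /sys/bus/pci/drivers/e1000/unbind          # detach the kernel driver
echo uio_pci_generic > /sys/bus/pci/devices/0000:02:02.0/driver_override
echo 0000:02:02.0 > /sys/bus/pci/drivers/uio_pci_generic/bind  # attach the DPDK-usable driver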

The config file /etc/kolla/ovsdpdk-db/ovs-dpdkctl.conf contains:

[ovs]
bridge_mappings = physnet1:dpdk_bridge
port_mappings = ens34:dpdk_bridge
cidr_mappings = dpdk_bridge:192.168.115.202/24
ovs_coremask = 0x1
pmd_coremask = 0x2
ovs_mem_channels = 4
ovs_socket_mem = 1024
dpdk_interface_driver = uio_pci_generic
hugepage_mountpoint = /dev/hugepages
physical_port_policy = named
pci_whitelist = -w 0000:02:02.0

[ens33]
address = 0000:02:01.0
driver =e1000

[ens34]
address = 0000:02:02.0
driver =uio_pci_generic
old_driver =

The handler RUNNING HANDLER [ovs-dpdk : Ensuring ovsdpdk bridges are properly setup named] errored but did not abort:

ok: [10.10.1.202] => {
    "changed": false,
    "cmd": [
        "docker",
        "exec",
        "ovsdpdk_db",
        "/bin/sh",
        "-c",
        "CONFIG_FILE=/var/lib/kolla/config_files/ovs-dpdkctl.conf /var/lib/kolla/config_files/ovs-dpdkctl.sh init"
    ],
    "delta": "0:00:00.383534",
    "end": "2020-05-28 18:33:38.817271",
    "invocation": {
        "module_args": {
            "_raw_params": "docker exec ovsdpdk_db /bin/sh -c 'CONFIG_FILE=/var/lib/kolla/config_files/ovs-dpdkctl.conf /var/lib/kolla/config_files/ovs-dpdkctl.sh init'\n",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true,
            "warn": true
        }
    },
    "rc": 0,
    "start": "2020-05-28 18:33:38.433737",
    "stderr": "++ realpath /var/lib/kolla/config_files/ovs-dpdkctl.sh\n+ FULL_PATH=/var/lib/kolla/config_files/ovs-dpdkctl.sh\n+ CONFIG_FILE=/var/lib/kolla/config_files/ovs-dpdkctl.conf\n+ SERVICE_FILE=/etc/systemd/system/ovs-dpdkctl.service\n+ BRIDGE_SERVICE_FILE=/etc/systemd/system/ovs-dpdk-bridge.service\n+ '[' 1 -ge 1 ']'\n+ func=init\n+ shift\n+ eval 'init '\n++ init\n++ init_ovs_db\n++ ovs-vsctl init\n+++ get_value ovs pmd_coremask\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs pmd_coremask\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n+++ get_value ovs ovs_coremask\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs ovs_coremask\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n+++ get_value ovs ovs_mem_channels\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs ovs_mem_channels\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n+++ get_value ovs ovs_socket_mem\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs ovs_socket_mem\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n+++ get_value ovs hugepage_mountpoint\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs hugepage_mountpoint\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n+++ get_value ovs pci_whitelist\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs pci_whitelist\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n++ ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask= other_config:dpdk-init=True other_config:dpdk-lcore-mask= other_config:dpdk-mem-channels= other_config:dpdk-socket-mem= other_config:dpdk-hugepage-dir= 'other_config:dpdk-extra= --proc-type primary  '\novs-vsctl: other_config:pmd-cpu-mask=: argument does not end in \"=\" followed by a value.\n++ init_ovs_bridges\n+++ get_value ovs bridge_mappings\n+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs bridge_mappings\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n++ raw_bridge_mappings=\n++ bridge_mappings=(${raw_bridge_mappings//,/ })\n++ init_ovs_interfaces\n++ pci_port_pairs=\n+++ list_dpdk_nics\n++++ get_value ovs port_mappings\n++++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs port_mappings\n++++ cut -d : -f 1\n++++ tr , '\\n'\n/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found\n+++ echo\n+++ sort\n++ pci_port_pairs=\n++ dpdk_port_number=0\n+ set +o xtrace",
    "stderr_lines": [
        "++ realpath /var/lib/kolla/config_files/ovs-dpdkctl.sh",
        "+ FULL_PATH=/var/lib/kolla/config_files/ovs-dpdkctl.sh",
        "+ CONFIG_FILE=/var/lib/kolla/config_files/ovs-dpdkctl.conf",
        "+ SERVICE_FILE=/etc/systemd/system/ovs-dpdkctl.service",
        "+ BRIDGE_SERVICE_FILE=/etc/systemd/system/ovs-dpdk-bridge.service",
        "+ '[' 1 -ge 1 ']'",
        "+ func=init",
        "+ shift",
        "+ eval 'init '",
        "++ init",
        "++ init_ovs_db",
        "++ ovs-vsctl init",
        "+++ get_value ovs pmd_coremask",
        "+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs pmd_coremask",
        "/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
        "+++ get_value ovs ovs_coremask",
        "+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs ovs_coremask",
        "/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
        "+++ get_value ovs ovs_mem_channels",
        "+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs ovs_mem_channels",
        "/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
        "+++ get_value ovs ovs_socket_mem",
        "+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs ovs_socket_mem",
        "/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
        "+++ get_value ovs hugepage_mountpoint",
        "+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs hugepage_mountpoint",
        "/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
        "+++ get_value ovs pci_whitelist",
        "+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs pci_whitelist",
        "/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
        "++ ovs-vsctl --no-wait set Open_vSwitch . other_config:pmd-cpu-mask= other_config:dpdk-init=True other_config:dpdk-lcore-mask= other_config:dpdk-mem-channels= other_config:dpdk-socket-mem= other_config:dpdk-hugepage-dir= 'other_config:dpdk-extra= --proc-type primary  '",
        "ovs-vsctl: other_config:pmd-cpu-mask=: argument does not end in \"=\" followed by a value.",
        "++ init_ovs_bridges",
        "+++ get_value ovs bridge_mappings",
        "+++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs bridge_mappings",
        "/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
        "++ raw_bridge_mappings=",
        "++ bridge_mappings=(${raw_bridge_mappings//,/ })",
        "++ init_ovs_interfaces",
        "++ pci_port_pairs=",
        "+++ list_dpdk_nics",
        "++++ get_value ovs port_mappings",
        "++++ crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs port_mappings",
        "++++ cut -d : -f 1",
        "++++ tr , '\\n'",
        "/var/lib/kolla/config_files/ovs-dpdkctl.sh: line 14: crudini: command not found",
        "+++ echo",
        "+++ sort",
        "++ pci_port_pairs=",
        "++ dpdk_port_number=0",
        "+ set +o xtrace"
    ],
    "stdout": "",
    "stdout_lines": []
}

The root cause is that crudini is missing from the image. Install it inside the container:

docker exec -u root -it ovsdpdk_db bash

# inside the container: download crudini and its dependencies, then install them in order
curl http://archive.ubuntu.com/ubuntu/pool/universe/c/crudini/crudini_0.7-1_amd64.deb -o crudini_0.7-1_amd64.deb
curl http://archive.ubuntu.com/ubuntu/pool/universe/p/python-iniparse/python-iniparse_0.4-2.2_all.deb -o python-iniparse_0.4-2.2_all.deb
curl http://archive.ubuntu.com/ubuntu/pool/main/s/six/python-six_1.11.0-2_all.deb -o python-six_1.11.0-2_all.deb
dpkg -i python-six_1.11.0-2_all.deb
dpkg -i python-iniparse_0.4-2.2_all.deb
dpkg -i crudini_0.7-1_amd64.deb
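
Back on the host, one of the lookups that failed in the trace above can be re-run to confirm crudini now works (container name and config path taken from the log):

docker exec -u root ovsdpdk_db crudini --get /var/lib/kolla/config_files/ovs-dpdkctl.conf ovs pmd_coremask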

I checked all the Ethernet controllers on k-node1:

# print each Ethernet device's PCI address, then list its sysfs directory
for item in $(lspci | grep Ethernet | awk '{print $1}'); \
do echo $item; \
ls /sys/bus/pci/devices/0000:$item; done


02:01.0
acpi_index            config                    device         driver_override  irq            local_cpus  net        remove  resource   resource4  subsystem_device  vendor
broken_parity_status  consistent_dma_mask_bits  dma_mask_bits  enable           label          modalias    numa_node  rescan  resource0  rom        subsystem_vendor
class                 d3cold_allowed            driver         firmware_node    local_cpulist  msi_bus     power      reset   resource2  subsystem  uevent
02:02.0
acpi_index            config                    device           enable         label          modalias   power   reset      resource2  subsystem         uevent
broken_parity_status  consistent_dma_mask_bits  dma_mask_bits    firmware_node  local_cpulist  msi_bus    remove  resource   resource4  subsystem_device  vendor
class                 d3cold_allowed            driver_override  irq            local_cpus     numa_node  rescan  resource0  rom        subsystem_vendor

One NIC (02:02.0) has no driver entry in sysfs.

Checking all the Ethernet controllers on k-node2:

root@k-node2:~# for item in $(lspci | grep Ethernet | awk '{print $1}'); \
> do echo $item; \
> ls /sys/bus/pci/devices/0000:$item; done
02:01.0
acpi_index            config                    device         driver_override  irq            local_cpus  net        remove  resource   resource4  subsystem_device  vendor
broken_parity_status  consistent_dma_mask_bits  dma_mask_bits  enable           label          modalias    numa_node  rescan  resource0  rom        subsystem_vendor
class                 d3cold_allowed            driver         firmware_node    local_cpulist  msi_bus     power      reset   resource2  subsystem  uevent
02:02.0
acpi_index            config                    device         driver_override  irq            local_cpus  net        remove  resource   resource4  subsystem_device  vendor
broken_parity_status  consistent_dma_mask_bits  dma_mask_bits  enable           label          modalias    numa_node  rescan  resource0  rom        subsystem_vendor
class                 d3cold_allowed            driver         firmware_node    local_cpulist  msi_bus     power      reset   resource2  subsystem  uevent

On k-node2, every NIC has a driver entry.

After some research (linux - writing in /sys/bus/pci/… fails - Stack Overflow), I inspected the problematic NIC on k-node1:

(deploy) root@k-node1:~/kolla-ansible# lspci -v -s 0000:02:02.0
02:02.0 Ethernet controller: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 01)
	DeviceName: Ethernet1
	Subsystem: VMware PRO/1000 MT Single Port Adapter
	Physical Slot: 34
	Flags: 66MHz, medium devsel, IRQ 16
	Memory at fc020000 (64-bit, non-prefetchable) [size=128K]
	Memory at fc050000 (64-bit, non-prefetchable) [size=64K]
	I/O ports at 1040 [size=64]
	Expansion ROM at fc080000 [disabled] [size=64K]
	Capabilities: [dc] Power Management version 2
	Capabilities: [e4] PCI-X non-bridge device
	Kernel modules: e1000

The corresponding, healthy NIC on k-node2:

root@k-node2:~# lspci -v -s 0000:02:02.0
02:02.0 Ethernet controller: Intel Corporation 82545EM Gigabit Ethernet Controller (Copper) (rev 01)
	DeviceName: Ethernet1
	Subsystem: VMware PRO/1000 MT Single Port Adapter
	Physical Slot: 34
	Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 16
	Memory at fc020000 (64-bit, non-prefetchable) [size=128K]
	Memory at fc050000 (64-bit, non-prefetchable) [size=64K]
	I/O ports at 1040 [size=64]
	Expansion ROM at fc080000 [disabled] [size=64K]
	Capabilities: [dc] Power Management version 2
	Capabilities: [e4] PCI-X non-bridge device
	Kernel driver in use: e1000
	Kernel modules: e1000
The difference: k-node1's port has no "Kernel driver in use" line, so the device is not bound to any driver. Also check whether the userspace I/O module used for DPDK is loaded:

lsmod | grep uio_pci_generic
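
A device with no driver symlink is bound to nothing; ovs-dpdkctl is supposed to bind DPDK ports to a userspace driver such as uio_pci_generic. A sketch of doing the binding by hand through sysfs (the PCI address is the problematic port from above; the driver_override file needs a reasonably recent kernel):

# load the userspace I/O module checked by the lsmod above
modprobe uio_pci_generic
# point the device at the driver we want, then ask the kernel to reprobe it
echo uio_pci_generic > /sys/bus/pci/devices/0000:02:02.0/driver_override
echo 0000:02:02.0 > /sys/bus/pci/drivers_probe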
murano: The Keystone service is temporarily unavailable. (HTTP 503)

The error can be reproduced with the murano CLI inside the murano_api container:
docker exec murano_api murano \
--os-username admin \
--os-password \
uxlNttg9tklN88M5FH7SaWNq6UXZ5sBOIsyHZnh0 \
--os-project-name admin \
--os-auth-url https://10.10.1.205:35357 \
--os-cacert /etc/pki/ca-trust/source/anchors/kolla-customca-haproxy-internal.crt \
--murano-url https://10.10.1.205:8082 package-list
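
Since the 503 page is generated by HAProxy rather than Keystone itself, a first check is whether Keystone answers on the internal VIP at all; a healthy endpoint returns a JSON version document (-k skips verification of the self-signed certificate):

curl -k https://10.10.1.205:35357/v3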


List the Docker volumes on every node:

ansible -i multinode all -m shell -a "docker volume ls"

Check the MariaDB containers and their Galera configuration on the control nodes:

ansible -i /root/multinode control -m shell -a "docker ps | grep mariadb"

ansible -i /root/multinode control -m shell \
-a "cat /etc/kolla/mariadb/galera.cnf"

ansible -i /root/multinode control -m shell \
-a "cat /etc/kolla/mariadb/config.json"
task [monasca : Wait for Monasca Grafana to load] failed
fatal: [10.10.1.201]: FAILED! => {
    "action": "uri",
    "attempts": 10,
    "cache_control": "no-cache",
    "changed": false,
    "connection": "close",
    "content": "<html><body><h1>503 Service Unavailable</h1>\nNo server is available to handle this request.\n</body></html>\n",
    "content_type": "text/html",
    "elapsed": 0,
    "invocation": {
        "module_args": {
            "attributes": null,
            "backup": null,
            "body": null,
            "body_format": "raw",
            "client_cert": null,
            "client_key": null,
            "content": null,
            "creates": null,
            "delimiter": null,
            "dest": null,
            "directory_mode": null,
            "follow": false,
            "follow_redirects": "safe",
            "force": false,
            "force_basic_auth": false,
            "group": null,
            "headers": {},
            "http_agent": "ansible-httpget",
            "method": "GET",
            "mode": null,
            "owner": null,
            "regexp": null,
            "remote_src": null,
            "removes": null,
            "return_content": false,
            "selevel": null,
            "serole": null,
            "setype": null,
            "seuser": null,
            "src": null,
            "status_code": [
                "200"
            ],
            "timeout": 30,
            "unix_socket": null,
            "unsafe_writes": null,
            "url": "https://10.10.1.205:3001/login",
            "url_password": null,
            "url_username": null,
            "use_proxy": true,
            "validate_certs": false
        }
    },
    "msg": "Status code was 503 and not [200]: HTTP Error 503: Service Unavailable",
    "redirected": false,
    "status": 503,
    "url": "https://10.10.1.205:3001/login"
}
Around the same time, the monasca-agent forwarder log (/var/lib/docker/volumes/kolla_logs/_data/monasca/agent-forwarder.log) was full of the following error:

2020-06-04 07:00:44 CST | ERROR | forwarder | tornado.application(ioloop.py:909) | Exception in callback <bound method Forwarder.flush of <monasca_agent.forwarder.daemon.Forwarder object at 0x7f4642bbda90>>
Traceback (most recent call last):
  File "/var/lib/kolla/venv/lib64/python3.6/site-packages/tornado/ioloop.py", line 907, in _run
    return self.callback()
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/forwarder/daemon.py", line 168, in flush
    self._post_metrics()
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/forwarder/daemon.py", line 159, in _post_metrics
    self._endpoint.post_metrics(message_batch)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/forwarder/api/monasca_api.py", line 136, in post_metrics
    self._post(tenant_group[tenant], tenant)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/forwarder/api/monasca_api.py", line 85, in _post
    self._mon_client = self._get_mon_client()
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/forwarder/api/monasca_api.py", line 140, in _get_mon_client
    endpoint = k.get_monasca_url()
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/common/keystone.py", line 309, in get_monasca_url
    catalog = self._init_client().auth_ref.service_catalog
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/common/keystone.py", line 276, in _init_client
    ks = get_client(**self._config)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/common/keystone.py", line 191, in get_client
    disc = discover.Discover(session=sess)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneclient/discover.py", line 178, in __init__
    authenticated=authenticated)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneclient/_discover.py", line 143, in __init__
    authenticated=authenticated)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneclient/_discover.py", line 38, in get_version_data
    resp = session.get(url, headers=headers, authenticated=authenticated)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 1123, in get
    return self.request(url, 'GET', **kwargs)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 772, in request
    auth_headers = self.get_auth_headers(auth)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 1183, in get_auth_headers
    return auth.get_headers(self, **kwargs)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/plugin.py", line 95, in get_headers
    token = self.get_token(session)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 88, in get_token
    return self.get_access(session).auth_token
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 134, in get_access
    self.auth_ref = self.get_auth_ref(session)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 206, in get_auth_ref
    self._plugin = self._do_create_plugin(session)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 138, in _do_create_plugin
    authenticated=False)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 610, in get_discovery
    authenticated=authenticated)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/discover.py", line 1452, in get_discovery
"/var/lib/docker/volumes/kolla_logs/_data/monasca/agent-forwarder.log" 4029234L, 322958739C                   1,1           Top
    ks = get_client(**self._config)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/monasca_agent/common/keystone.py", line 191, in get_client
    disc = discover.Discover(session=sess)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneclient/discover.py", line 178, in __init__
    authenticated=authenticated)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneclient/_discover.py", line 143, in __init__
    authenticated=authenticated)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneclient/_discover.py", line 38, in get_version_data
    resp = session.get(url, headers=headers, authenticated=authenticated)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 1123, in get
    return self.request(url, 'GET', **kwargs)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 772, in request
    auth_headers = self.get_auth_headers(auth)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 1183, in get_auth_headers
    return auth.get_headers(self, **kwargs)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/plugin.py", line 95, in get_headers
    token = self.get_token(session)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 88, in get_token
    return self.get_access(session).auth_token
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 134, in get_access
    self.auth_ref = self.get_auth_ref(session)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 206, in get_auth_ref
    self._plugin = self._do_create_plugin(session)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/generic/base.py", line 138, in _do_create_plugin
    authenticated=False)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/identity/base.py", line 610, in get_discovery
    authenticated=authenticated)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/discover.py", line 1452, in get_discovery
    disc = Discover(session, url, authenticated=authenticated)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/discover.py", line 536, in __init__
    authenticated=authenticated)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/discover.py", line 102, in get_version_data
    resp = session.get(url, headers=headers, authenticated=authenticated)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 1123, in get
    return self.request(url, 'GET', **kwargs)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 913, in request
    resp = send(**kwargs)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/keystoneauth1/session.py", line 1004, in _send_request
    resp = self.session.request(method, url, **kwargs)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/requests/adapters.py", line 416, in send
    self.cert_verify(conn, request.url, verify, cert)
  File "/var/lib/kolla/venv/lib/python3.6/site-packages/requests/adapters.py", line 228, in cert_verify
    "invalid path: {}".format(cert_loc))
OSError: Could not find a suitable TLS CA certificate bundle, invalid path: /var/lib/kolla/venv/lib/python' ~ distro_python_version ~ '/site-packages/certifi/cacert.pem
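
Note the literal ' ~ distro_python_version ~ ' in the path: a Jinja expression in the agent configuration was never rendered, so requests is pointed at a CA bundle path that cannot exist. Whether the real certifi bundle is present can be checked inside the container (a sketch; the container name is an assumption):

docker exec monasca_agent_forwarder ls /var/lib/kolla/venv/lib/python3.6/site-packages/certifi/cacert.pem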
task [monasca : List influxdb databases] failed
fatal: [10.10.1.202 -> 10.10.1.202]: FAILED! => {
    "changed": false,
    "cmd": [
        "docker",
        "exec",
        "influxdb",
        "influx",
        "-host",
        "10.10.1.205",
        "-port",
        "8086",
        "-execute",
        "show databases"
    ],
    "delta": "0:00:00.300792",
    "end": "2020-06-06 09:28:26.936263",
    "invocation": {
        "module_args": {
            "_raw_params": "docker exec influxdb influx -host 10.10.1.205 -port 8086 -execute 'show databases'",
            "_uses_shell": false,
            "argv": null,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "stdin": null,
            "stdin_add_newline": true,
            "strip_empty_ends": true,
            "warn": true
        }
    },
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2020-06-06 09:28:26.635471",
    "stderr": "Failed to connect to http://10.10.1.205:8086: Get http://10.10.1.205:8086/ping: EOF\nPlease check your connection settings and ensure 'influxd' is running.",
    "stderr_lines": [
        "Failed to connect to http://10.10.1.205:8086: Get http://10.10.1.205:8086/ping: EOF",
        "Please check your connection settings and ensure 'influxd' is running."
    ],
    "stdout": "",
    "stdout_lines": []
}
NO MORE HOSTS LEFT *************************************************************************************************************

PLAY RECAP *********************************************************************************************************************
10.10.1.201                : ok=46   changed=0    unreachable=0    failed=0    skipped=27   rescued=0    ignored=0
10.10.1.202                : ok=77   changed=2    unreachable=0    failed=1    skipped=12   rescued=0    ignored=0
localhost                  : ok=4    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Running the same command manually on node 2:

root@k-node2:~# docker exec influxdb influx -host 10.10.1.205 -port 8086 -execute 'show databases'
Failed to connect to http://10.10.1.205:8086: Get http://10.10.1.205:8086/ping: EOF
Please check your connection settings and ensure 'influxd' is running.

The start file /etc/kolla/influxdb/config.json shows the container's start command:

"command": "/usr/bin/influxd -config /etc/influxdb/influxdb.conf",

To keep the container alive for debugging, I temporarily changed command to:

"command": "sleep infinity",

Inside the container, lsof looked fine: the service was listening as expected.

It eventually turned out that HAProxy terminates TLS for this endpoint, so the client must connect over SSL:

docker exec influxdb influx -host 10.10.1.205 -port 8086 -unsafeSsl -ssl  -execute 'show databases'
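
Here -ssl tells the influx client to use HTTPS, and -unsafeSsl skips certificate verification, which is necessary because the HAProxy frontend presents the self-signed kolla CA.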
TASK [monasca : Enable Monasca Grafana datasource for control plane organisation] failed
fatal: [10.10.1.202]: FAILED! => {
    "msg": "The conditional check 'monasca_grafana_datasource_response.status not in [200, 409] or (monasca_grafana_datasource_response.status == 409 and (\"Data source with same name already exists\" not in  monasca_grafana_datasource_response.json.message|default(\"\"))' failed. The error was: template error while templating string: unexpected '}', expected ')'. String: {% if monasca_grafana_datasource_response.status not in [200, 409] or (monasca_grafana_datasource_response.status == 409 and (\"Data source with same name already exists\" not in  monasca_grafana_datasource_response.json.message|default(\"\")) %} True {% else %} False {% endif %}"
}
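
The conditional opens three parentheses but closes only two, hence Jinja's complaint about the unexpected '}'. A balanced version of the check would read (a sketch of the obvious fix):

monasca_grafana_datasource_response.status not in [200, 409] or
(monasca_grafana_datasource_response.status == 409 and
("Data source with same name already exists" not in monasca_grafana_datasource_response.json.message | default("")))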

The actual response, printed with a debug task, is:

{
    "changed": false,
    "msg": "All items completed",
    "results": [
        {
            "action": "uri",
            "ansible_loop_var": "item",
            "changed": false,
            "connection": "close",
            "content_length": "55",
            "content_type": "application/json; charset=UTF-8",
            "date": "Thu, 11 Jun 2020 14:27:34 GMT",
            "elapsed": 0,
            "failed": false,
            "invocation": {
                "module_args": {
                    "attributes": null,
                    "backup": null,
                    "body": "{\"name\": \"Monasca API\", \"type\": \"monasca-datasource\", \"access\": \"proxy\", \"url\": \"https://10.10.1.205:8070\", \"isDefault\": true, \"basicAuth\": false, \"jsonData\": {\"keystoneAuth\": true}}",
                    "body_format": "json",
                    "client_cert": null,
                    "client_key": null,
                    "content": null,
                    "creates": null,
                    "delimiter": null,
                    "dest": null,
                    "directory_mode": null,
                    "follow": false,
                    "follow_redirects": "safe",
                    "force": false,
                    "force_basic_auth": true,
                    "group": null,
                    "headers": {
                        "Content-Type": "application/json"
                    },
                    "http_agent": "ansible-httpget",
                    "method": "POST",
                    "mode": null,
                    "owner": null,
                    "password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
                    "regexp": null,
                    "remote_src": null,
                    "removes": null,
                    "return_content": false,
                    "selevel": null,
                    "serole": null,
                    "setype": null,
                    "seuser": null,
                    "src": null,
                    "status_code": [
                        "200",
                        " 409"
                    ],
                    "timeout": 30,
                    "unix_socket": null,
                    "unsafe_writes": null,
                    "url": "https://10.10.1.205:3001/api/datasources",
                    "url_password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
                    "url_username": "grafana_local_admin",
                    "use_proxy": true,
                    "user": "grafana_local_admin",
                    "validate_certs": false
                }
            },
            "item": {
                "key": "monasca",
                "value": {
                    "data": {
                        "access": "proxy",
                        "basicAuth": false,
                        "isDefault": true,
                        "jsonData": {
                            "keystoneAuth": true
                        },
                        "name": "Monasca API",
                        "type": "monasca-datasource",
                        "url": "https://10.10.1.205:8070"
                    },
                    "enabled": true
                }
            },
            "json": {
                "message": "Data source with same name already exists"
            },
            "msg": "HTTP Error 409: Conflict",
            "redirected": false,
            "status": 409,
            "url": "https://10.10.1.205:3001/api/datasources"
        }
    ]
}
The datasource request can also be issued manually with curl (the -d payload was left out here):

curl -X POST -u grafana_local_admin:XxrkRhDOHGJZnYAiXnnHt8buvKy9i3e5o5EZFSUP \
-d   \
https://10.10.1.205:3001/api/datasources
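
For reference, a complete call with the request body taken from the debug output above looks like this (-k because of the self-signed certificate):

curl -k -X POST -u grafana_local_admin:XxrkRhDOHGJZnYAiXnnHt8buvKy9i3e5o5EZFSUP \
-H "Content-Type: application/json" \
-d '{"name": "Monasca API", "type": "monasca-datasource", "access": "proxy", "url": "https://10.10.1.205:8070", "isDefault": true, "basicAuth": false, "jsonData": {"keystoneAuth": true}}' \
https://10.10.1.205:3001/api/datasources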

Check the load on each node:

ansible -i ~/multinode chrony -m shell -a "uptime; free -h"

Adjust (shutdown 1 schedules a shutdown in one minute):

ansible -i ~/multinode chrony -m shell -a "shutdown 1"
Patches: cherry-pick the relevant fixes from Gerrit:
git fetch ssh://XiaojueGuan@review.opendev.org:29418/openstack/kolla-ansible refs/changes/60/724460/8 && git cherry-pick FETCH_HEAD

git fetch ssh://XiaojueGuan@review.opendev.org:29418/openstack/kolla-ansible refs/changes/17/724217/6 && git cherry-pick FETCH_HEAD


git fetch ssh://XiaojueGuan@review.opendev.org:29418/openstack/kolla-ansible refs/changes/89/726289/2 && git cherry-pick FETCH_HEAD

git fetch ssh://XiaojueGuan@review.opendev.org:29418/openstack/kolla-ansible refs/changes/38/727638/3 && git cherry-pick FETCH_HEAD