Purpose of this document

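    This document uses Ansible + shell + Consul to push the node_exporter monitoring agent to many Linux hosts in one batch, install it, add it to system startup, and register it in Consul. This lays the groundwork for deploying Prometheus.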

Scope

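    The procedure applies to CentOS and Red Hat family systems. Because Ansible is used to operate the hosts, each managed Linux host needs Python 2.6 or later. With their default Python, CentOS and RHEL can therefore only be managed from release 6 onward; older releases need a separate Python upgrade. Install wget on every managed host in advance.
    Note:
    For hosts older than release 6 that already carry production workloads, upgrading Python is not recommended. Deploy the software and scripts to those machines manually.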

Environment preparation

    One Ansible server        control node for batch operations

    One httpd server          hosts the software package and script files

    One Consul server         registers node_exporter automatically as a resource

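    Ansible and Consul can run on the same machine or on separate ones. Installing Ansible and Consul themselves is out of scope here; refer to the author's other deployment notes or search online.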


Steps

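    1. Add the managed nodes to the Ansible inventory

    2. Write scripts and push the software and scripts to the servers

    3. Register node_exporter in Consul

    4. Check the registration in Consul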

step1: configure Ansible and the Ansible inventory

This article does not cover installing Ansible; the configuration below assumes Ansible is already installed.

First, stop SSH from asking you to type yes the first time it connects to a managed host.

Set host_key_checking = False in /etc/ansible/ansible.cfg:

# vim /etc/ansible/ansible.cfg

# additional paths to search for roles in, colon separated
#roles_path = /etc/ansible/roles

# uncomment this to disable SSH key host checking
host_key_checking = False

# change the default callback, you can only have one 'stdout' type enabled at a time.
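Editing the file by hand is fine; for scripted setups the same change can be made idempotently. The sketch below is only an illustration: the function name `set_host_key_checking` is hypothetical, it assumes GNU sed (as shipped with CentOS/RHEL), and if the file has no host_key_checking line at all it inserts one under `[defaults]`.

```shell
#!/bin/bash
# Hypothetical helper: ensure an ansible.cfg contains "host_key_checking = False".
# Assumes GNU sed (-i, -E, one-line "a" command).
set_host_key_checking() {
    local cfg=${1:-/etc/ansible/ansible.cfg}
    if grep -Eq '^[#[:space:]]*host_key_checking[[:space:]]*=' "$cfg"; then
        # Uncomment or overwrite the existing (possibly commented) line
        sed -i -E 's/^[#[:space:]]*host_key_checking[[:space:]]*=.*/host_key_checking = False/' "$cfg"
    elif grep -q '^\[defaults\]' "$cfg"; then
        # No such line yet: add it right under the [defaults] section header
        sed -i '/^\[defaults\]/a host_key_checking = False' "$cfg"
    else
        # No [defaults] section at all: append one
        printf '[defaults]\nhost_key_checking = False\n' >> "$cfg"
    fi
}
```

For example `set_host_key_checking /etc/ansible/ansible.cfg`. Ansible also honours the `ANSIBLE_HOST_KEY_CHECKING=False` environment variable, which avoids touching the file at all.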

 

In /etc/ssh/ssh_config, change the StrictHostKeyChecking ask option to StrictHostKeyChecking no:

# vim /etc/ssh/ssh_config

#   CheckHostIP yes
#   AddressFamily any
#   ConnectTimeout 0
StrictHostKeyChecking no
#   IdentityFile ~/.ssh/id_rsa
#   IdentityFile ~/.ssh/id_dsa

 

Edit /etc/ansible/hosts to add the hosts to be managed. This article stores the passwords in the file in plain text.

# vim /etc/ansible/hosts

[node_exporter_group]
node_1   ansible_ssh_host=192.168.111.12   ansible_ssh_user=root    ansible_ssh_pass=yinwan
node_2   ansible_ssh_host=192.168.111.124   ansible_ssh_user=root    ansible_ssh_pass=yinwan

Parameters:

           ansible_ssh_host: IP address or domain name of the managed host

           ansible_ssh_user: user account used to log in to the managed host

           ansible_ssh_pass: password of the ansible_ssh_user account
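With many hosts, typing inventory lines by hand is error-prone. A minimal sketch, assuming a hypothetical plain-text list of `name ip` pairs (one per line) and that all hosts share the same user and password, as in the example above; `gen_inventory` is an illustrative name. Newer Ansible releases also accept the shorter `ansible_host`, `ansible_user` and `ansible_password` variables, but the old names and layout are kept here because auto_consul_zc.sh in step 3 expects exactly this format.

```shell
#!/bin/bash
# Hypothetical helper: print an inventory group in the same layout as
# /etc/ansible/hosts above. Reads "name ip" pairs from stdin.
# Arguments: group name, SSH user, SSH password.
gen_inventory() {
    local group=$1 user=$2 pass=$3
    echo "[${group}]"
    while read -r name ip; do
        [ -z "$name" ] && continue          # skip blank lines
        printf '%s   ansible_ssh_host=%s   ansible_ssh_user=%s    ansible_ssh_pass=%s\n' \
            "$name" "$ip" "$user" "$pass"
    done
}
```

Usage: `gen_inventory node_exporter_group root yinwan < ip_list.txt >> /etc/ansible/hosts`, where ip_list.txt is a hypothetical file of name/IP pairs.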

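Verify the Ansible configuration by running a command on the managed hosts; if everything is set up correctly, each host's hostname is printed.

First, list the managed hosts that Ansible knows about: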

[root@prometheus ~]# ansible all --list 
  hosts (2):
    node_1
    node_2
[root@prometheus ~]#

  

 

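Print the hostname of each managed host. If the hostnames appear, the hosts can be controlled by Ansible and the inventory is configured correctly.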

[root@prometheus ~]# ansible all -m shell -a 'hostname'
node_2 | CHANGED | rc=0 >>
docker_0001
node_1 | CHANGED | rc=0 >>
ole6
[root@prometheus ~]#
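When the inventory grows, scanning ad-hoc output for failures by eye gets tedious. Ansible prints one status line per host (`node_1 | CHANGED | rc=0 >>` above; unreachable or failed hosts show `UNREACHABLE!` or `FAILED`). A rough sketch of a filter that lists hosts whose status is neither CHANGED nor SUCCESS; `failed_hosts` is a hypothetical name, it assumes the default one-line-per-host header format, and command output that itself contains ` | ` can confuse it.

```shell
#!/bin/bash
# Hypothetical helper: read ansible ad-hoc output on stdin and print the
# names of hosts that did not succeed (UNREACHABLE!, FAILED, ...).
failed_hosts() {
    # Header lines look like "host | STATUS ..."; split on " | " and keep
    # lines whose second field starts with an upper-case status word.
    awk -F' [|] ' 'NF >= 2 && $2 ~ /^[A-Z]/ && $2 !~ /^(CHANGED|SUCCESS)/ { print $1 }'
}
```

Usage: `ansible all -m shell -a 'hostname' 2>&1 | failed_hosts` prints nothing when every host answered.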

  

 

 

step2: write scripts and push the software and scripts to the servers

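In this lab the node_exporter package and the scripts are served over HTTP, and the client hosts fetch them with wget. Before pushing anything, set up an HTTP service yourself; the author simply installed httpd with yum and placed the software and scripts in /var/www/html/soft/.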

Scripts:

Location: /var/www/html/soft/

[root@prometheus ~]# ls -l /var/www/html/soft/
total 18300
-rw-r--r-- 1 root root     511 Jul 21 23:47 auto_consul_zc.sh
-rw-r--r-- 1 root root     426 Jul 17 12:48 auto_start_node_exporter.sh
drwxr-xr-x 4 root root      70 Jul 18 22:35 data
-rw-r--r-- 1 root root    2578 Jul 21 22:56 get_soft.sh
-rw-r--r-- 1 root root 9245080 Jul 17 11:44 node_exporter-1.0.0.linux-386.tar.gz
-rw-r--r-- 1 root root 9476268 Jul 17 11:44 node_exporter-1.0.0.linux-amd64.tar.gz
[root@prometheus ~]#

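get_soft.sh: runs on the client hosts; downloads the node_exporter package and the node_exporter start script, then installs node_exporter.

auto_start_node_exporter.sh: starts node_exporter on the client

auto_consul_zc.sh: generates the Consul registration commands from the /etc/ansible/hosts inventory

auto_prometh_server.sh: starts Prometheus, Consul and Alertmanager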

 

# vim get_soft.sh


#!/bin/bash
# Runs on each client (pushed by the ansible script module): download node_exporter and
# its start script from the HTTP server, install node_exporter to /usr/local/node_exporter
# and add the start script to /etc/rc.d/rc.local.
SOFT_URL=http://192.168.111.83/soft    # change to your own HTTP server

# Pick the package that matches the architecture
if uname -a | grep -i _64 &> /dev/null ;then
    echo "64位处理方式"                # "64-bit handling"
    PKG=node_exporter-1.0.0.linux-amd64
else
    echo "32位处理方式"                # "32-bit handling"
    PKG=node_exporter-1.0.0.linux-386
fi

if [ -f /root/${PKG}.tar.gz ];then
    echo " ${PKG}.tar.gz file exist"
elif timeout 20 wget -P /root ${SOFT_URL}/${PKG}.tar.gz;then
    if [ -f /usr/local/node_exporter/node_exporter ];then
        echo "node_exporter file exist , not exec command. "
    else
        # Fetch the start script and register it in rc.local (only once)
        if [ -f /root/auto_start_node_exporter.sh ];then
            echo 'auto_start_node_exporter.sh  exist '
        elif timeout 20 wget -P /root ${SOFT_URL}/auto_start_node_exporter.sh;then
            if grep -qi auto_start_node_exporter /etc/rc.d/rc.local ;then
                echo "auto_start_node_exporter.sh already in rc.local"
            else
                echo '/root/auto_start_node_exporter.sh' >> /etc/rc.d/rc.local
                # CentOS/RHEL 7 only runs rc.local at boot if it is executable
                chmod a+x /etc/rc.d/rc.local
            fi
        else
            echo 'auto_start_node_exporter.sh copy failed !!'
        fi
        # Unpack and move the release directory to /usr/local/node_exporter
        mkdir -p /scripts/soft/
        tar -zxvf /root/${PKG}.tar.gz -C /scripts/soft/ && mv /scripts/soft/${PKG}/ /usr/local/node_exporter
        chmod a+x /root/auto_start_node_exporter.sh
    fi
else
    echo "node_exporter soft_file copy failed !!"
fi

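get_soft.sh chooses between the amd64 and 386 packages by grepping `uname -a`, which searches the whole kernel string. `uname -m` prints only the machine hardware name, so mapping on it is a little more precise. A sketch with a hypothetical `ne_arch` function; only the amd64 and 386 packages are hosted in /var/www/html/soft/ in this article, and the arm64/armv7 entries assume you add the matching node_exporter release files yourself.

```shell
#!/bin/bash
# Hypothetical helper: map a `uname -m` value to the suffix used in
# node_exporter release names (node_exporter-<version>.linux-<arch>.tar.gz).
# With no argument it uses the current machine.
ne_arch() {
    case "${1:-$(uname -m)}" in
        x86_64)              echo amd64 ;;
        i386|i486|i586|i686) echo 386 ;;
        aarch64)             echo arm64 ;;   # assumption: arm64 package hosted too
        armv7l)              echo armv7 ;;   # assumption: armv7 package hosted too
        *)                   return 1 ;;     # unknown architecture: caller decides
    esac
}
```

Inside get_soft.sh this could become `PKG=node_exporter-1.0.0.linux-$(ne_arch)`, stopping early when `ne_arch` returns non-zero.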

# vim auto_start_node_exporter.sh


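#!/bin/bash
# Start node_exporter in the background unless something is already listening on
# TCP port 9100. Uses netstat if it is installed, otherwise ss.
# Matching ":9100 " avoids false hits such as port 19100 or a PID containing 9100.
if which netstat &> /dev/null;then
    if netstat -lnt | grep ':9100 ' &> /dev/null;then
        echo "9100 port exist  netstat !"
    else
        nohup /usr/local/node_exporter/node_exporter &> /dev/null &
    fi
elif which ss &> /dev/null ;then
    if ss -lnt | grep ':9100 ' &> /dev/null;then
        echo "9100 port exist ss ! "
    else
        nohup /usr/local/node_exporter/node_exporter &> /dev/null &
    fi
else
    echo "neither netstat nor ss found, node_exporter not started"
fi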

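The port test in auto_start_node_exporter.sh boils down to "is anything listening on :9100". Both `ss -lnt` and `netstat -lnt` print the local address as `host:port`, so the check can live in one small function that works on either listing. A sketch; `port_listening` is a hypothetical helper that reads the listing on stdin and returns 0 when the port is found.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the socket listing on stdin shows a local
# address ending in :PORT, e.g. "*:9100", "0.0.0.0:9100", ":::9100", "[::]:9100".
port_listening() {
    local port=$1
    # ":PORT" followed by whitespace, so 19100 or 91000 do not match
    grep -Eq ":${port}[[:space:]]"
}
```

Usage: `if ss -lnt | port_listening 9100; then echo running; fi`.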

# vim auto_consul_zc.sh


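#!/bin/bash
# Print one Consul registration command per host in /etc/ansible/hosts.
# Assumes ansible_ssh_host= is the first variable on each host line.
mkdir -p /prometheus
# Drop [group] headers, comments and blank lines; "name ansible_ssh_host IP ..." -> "name IP"
grep -v '\[' /etc/ansible/hosts | grep -v '^[[:space:]]*#' | grep -v '^[[:space:]]*$' | sed 's/=/ /' | awk '{print $1,$3}' > /prometheus/tmp_consul_list.txt
while read host_name host_addr
do
    echo "curl -X PUT -d '{\"id\": \"${host_name}\",\"name\": \"${host_name}\",\"address\": \"${host_addr}\",\"port\": 9100,\"tags\": [\"test\",\"node\",\"linux\"],\"checks\": [{\"http\": \"http://${host_addr}:9100/metrics\", \"interval\": \"5s\"}]}' http://192.168.111.83:8500/v1/agent/service/register"
done < /prometheus/tmp_consul_list.txt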

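auto_consul_zc.sh takes the third column after replacing the first `=`, so it only works when `ansible_ssh_host=` is the first variable on each inventory line. A sketch of a variant that looks the key up by name and prints the same curl commands; `consul_register_cmds` is hypothetical, and `CONSUL_URL` defaults to the address used in this article, so override it for your own Consul agent.

```shell
#!/bin/bash
# Hypothetical helper: print Consul registration commands from an Ansible
# inventory file, finding ansible_ssh_host= anywhere on the host line.
CONSUL_URL=${CONSUL_URL:-http://192.168.111.83:8500/v1/agent/service/register}

consul_register_cmds() {
    awk -v url="$CONSUL_URL" '
        NF == 0 { next }                              # blank line
        /^[ \t]*(#|\[)/ { next }                      # comment or [group] header
        {
            addr = ""
            for (i = 2; i <= NF; i++)                 # "ansible_ssh_host=" is 17 chars
                if ($i ~ /^ansible_ssh_host=/) addr = substr($i, 18)
            if (addr == "") next                      # no explicit address on this line
            printf "curl -X PUT -d '\''{\"id\": \"%s\",\"name\": \"%s\",\"address\": \"%s\",\"port\": 9100,\"tags\": [\"test\",\"node\",\"linux\"],\"checks\": [{\"http\": \"http://%s:9100/metrics\", \"interval\": \"5s\"}]}'\'' %s\n", $1, $1, addr, addr, url
        }' "${1:-/etc/ansible/hosts}"
}
```

Review the output first, then `consul_register_cmds | bash` runs every command instead of copying each line by hand.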

# vim auto_prometh_server.sh


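#!/bin/bash
# Start Prometheus, Consul and Alertmanager in the background on the monitoring server
# (assumes the three binaries are on PATH).
sleep 1
# Prometheus, keeping 90 days of TSDB data
prometheus --config.file="/usr/local/prometheus/prometheus.yml" --storage.tsdb.retention.time=90d &> /dev/null &
# Single-node Consul server with web UI; change -bind to this server's IP
consul agent -server -bootstrap-expect 1 -data-dir=/usr/local/consul_data/data -ui -bind 192.168.111.83 -client 0.0.0.0  &> /dev/null &
# Alertmanager
alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml --cluster.advertise-address=0.0.0.0:9093 &> /dev/null &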


 

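Use Ansible to push node_exporter and the start script in one batch; each host then installs node_exporter and adds the start script to its boot sequence.

Before pushing, make sure every managed node has the wget command; install wget first wherever it is missing.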

[root@prometheus ~]# ansible all -m shell -a 'which wget '
node_1 | CHANGED | rc=0 >>
/usr/bin/wget
node_2 | CHANGED | rc=0 >>
/usr/bin/wget

 

Run /var/www/html/soft/get_soft.sh with Ansible's script module, which copies the local script to each managed host and executes it there:

[root@prometheus ~]# ansible all -m  script -a '/var/www/html/soft/get_soft.sh' 
node_2 | CHANGED => {
    "changed": true,
    "rc": 0,
    "stderr": "Shared connection to 192.168.111.124 closed.\r\n",
    "stderr_lines": [
        "Shared connection to 192.168.111.124 closed."
    ],
    "stdout": "64位处理方式\r\n--2020-07-22 15:09:35--  http://192.168.111.83/soft/node_exporter-1.0.0.linux-amd64.tar.gz\r\nConnecting to 192.168.111.83:80... connected.\r\nHTTP request sent, awaiting response... 200 OK\r\nLength: 9476268 (9.0M) [application/x-gzip]\r\nSaving to: ‘/root/node_exporter-1.0.0.linux-amd64.tar.gz’\r\n\r\n\r 0% [                                       ] 0           --.-K/s              \r100%[======================================>] 9,476,268   --.-K/s   in 0.05s   \r\n\r\n2020-07-22 15:09:35 (188 MB/s) - ‘/root/node_exporter-1.0.0.linux-amd64.tar.gz’ saved [9476268/9476268]\r\n\r\n--2020-07-22 15:09:35--  http://192.168.111.83/soft/auto_start_node_exporter.sh\r\nConnecting to 192.168.111.83:80... connected.\r\nHTTP request sent, awaiting response... 200 OK\r\nLength: 426 [application/x-sh]\r\nSaving to: ‘/root/auto_start_node_exporter.sh’\r\n\r\n\r 0% [                                       ] 0           --.-K/s              \r100%[======================================>] 426         --.-K/s   in 0s      \r\n\r\n2020-07-22 15:09:35 (77.3 MB/s) - ‘/root/auto_start_node_exporter.sh’ saved [426/426]\r\n\r\nnode_exporter-1.0.0.linux-amd64/\r\nnode_exporter-1.0.0.linux-amd64/node_exporter\r\nnode_exporter-1.0.0.linux-amd64/NOTICE\r\nnode_exporter-1.0.0.linux-amd64/LICENSE\r\n",
    "stdout_lines": [
        "64位处理方式",
        "--2020-07-22 15:09:35--  http://192.168.111.83/soft/node_exporter-1.0.0.linux-amd64.tar.gz",
        "Connecting to 192.168.111.83:80... connected.",
        "HTTP request sent, awaiting response... 200 OK",
        "Length: 9476268 (9.0M) [application/x-gzip]",
        "Saving to: ‘/root/node_exporter-1.0.0.linux-amd64.tar.gz’",
        "",
        "",
        " 0% [                                       ] 0           --.-K/s              ",
        "100%[======================================>] 9,476,268   --.-K/s   in 0.05s   ",
        "",
        "2020-07-22 15:09:35 (188 MB/s) - ‘/root/node_exporter-1.0.0.linux-amd64.tar.gz’ saved [9476268/9476268]",
        "",
        "--2020-07-22 15:09:35--  http://192.168.111.83/soft/auto_start_node_exporter.sh",
        "Connecting to 192.168.111.83:80... connected.",
        "HTTP request sent, awaiting response... 200 OK",
        "Length: 426 [application/x-sh]",
        "Saving to: ‘/root/auto_start_node_exporter.sh’",
        "",
        "",
        " 0% [                                       ] 0           --.-K/s              ",
        "100%[======================================>] 426         --.-K/s   in 0s      ",
        "",
        "2020-07-22 15:09:35 (77.3 MB/s) - ‘/root/auto_start_node_exporter.sh’ saved [426/426]",
        "",
        "node_exporter-1.0.0.linux-amd64/",
        "node_exporter-1.0.0.linux-amd64/node_exporter",
        "node_exporter-1.0.0.linux-amd64/NOTICE",
        "node_exporter-1.0.0.linux-amd64/LICENSE"
    ]
}
node_1 | CHANGED => {
    "changed": true,
    "rc": 0,
    "stderr": "Shared connection to 192.168.111.12 closed.\r\n",
    "stderr_lines": [
        "Shared connection to 192.168.111.12 closed."
    ],
    "stdout": "64位处理方式\r\n--2020-07-22 15:09:35--  http://192.168.111.83/soft/node_exporter-1.0.0.linux-amd64.tar.gz\r\nConnecting to 192.168.111.83:80... connected.\r\nHTTP request sent, awaiting response... 200 OK\r\nLength: 9476268 (9.0M) [application/x-gzip]\r\nSaving to: “/root/node_exporter-1.0.0.linux-amd64.tar.gz”\r\n\r\n\r 0% [                                       ] 0           --.-K/s              \r68% [=========================>             ] 6,524,944   31.0M/s              \r100%[======================================>] 9,476,268   27.7M/s   in 0.3s    \r\n\r\n2020-07-22 15:09:35 (27.7 MB/s) - “/root/node_exporter-1.0.0.linux-amd64.tar.gz” saved [9476268/9476268]\r\n\r\n--2020-07-22 15:09:35--  http://192.168.111.83/soft/auto_start_node_exporter.sh\r\nConnecting to 192.168.111.83:80... connected.\r\nHTTP request sent, awaiting response... 200 OK\r\nLength: 426 [application/x-sh]\r\nSaving to: “/root/auto_start_node_exporter.sh”\r\n\r\n\r 0% [                                       ] 0           --.-K/s              \r100%[======================================>] 426         --.-K/s   in 0s      \r\n\r\n2020-07-22 15:09:35 (116 MB/s) - “/root/auto_start_node_exporter.sh” saved [426/426]\r\n\r\nnode_exporter-1.0.0.linux-amd64/\r\nnode_exporter-1.0.0.linux-amd64/node_exporter\r\nnode_exporter-1.0.0.linux-amd64/NOTICE\r\nnode_exporter-1.0.0.linux-amd64/LICENSE\r\n",
    "stdout_lines": [
        "64位处理方式",
        "--2020-07-22 15:09:35--  http://192.168.111.83/soft/node_exporter-1.0.0.linux-amd64.tar.gz",
        "Connecting to 192.168.111.83:80... connected.",
        "HTTP request sent, awaiting response... 200 OK",
        "Length: 9476268 (9.0M) [application/x-gzip]",
        "Saving to: “/root/node_exporter-1.0.0.linux-amd64.tar.gz”",
        "",
        "",
        " 0% [                                       ] 0           --.-K/s              ",
        "68% [=========================>             ] 6,524,944   31.0M/s              ",
        "100%[======================================>] 9,476,268   27.7M/s   in 0.3s    ",
        "",
        "2020-07-22 15:09:35 (27.7 MB/s) - “/root/node_exporter-1.0.0.linux-amd64.tar.gz” saved [9476268/9476268]",
        "",
        "--2020-07-22 15:09:35--  http://192.168.111.83/soft/auto_start_node_exporter.sh",
        "Connecting to 192.168.111.83:80... connected.",
        "HTTP request sent, awaiting response... 200 OK",
        "Length: 426 [application/x-sh]",
        "Saving to: “/root/auto_start_node_exporter.sh”",
        "",
        "",
        " 0% [                                       ] 0           --.-K/s              ",
        "100%[======================================>] 426         --.-K/s   in 0s      ",
        "",
        "2020-07-22 15:09:35 (116 MB/s) - “/root/auto_start_node_exporter.sh” saved [426/426]",
        "",
        "node_exporter-1.0.0.linux-amd64/",
        "node_exporter-1.0.0.linux-amd64/node_exporter",
        "node_exporter-1.0.0.linux-amd64/NOTICE",
        "node_exporter-1.0.0.linux-amd64/LICENSE"
    ]
}
[root@prometheus ~]#


 

 

Check that node_exporter is present on the clients: if /usr/local/node_exporter/node_exporter exists, the installation succeeded.

[root@prometheus ~]# ansible all -m shell -a ' ls -l /usr/local/node_exporter/node_exporter  '
node_1 | CHANGED | rc=0 >>
-rwxr-xr-x 1 3434 3434 19572271 May 26 14:02 /usr/local/node_exporter/node_exporter
node_2 | CHANGED | rc=0 >>
-rwxr-xr-x 1 3434 3434 19572271 May 26 14:02 /usr/local/node_exporter/node_exporter
[root@prometheus ~]#

rc=0 means ls exited successfully, i.e. the file exists and the software is installed.

 

Check that the start script auto_start_node_exporter.sh is present on the clients: if /root/auto_start_node_exporter.sh exists, the start script was fetched successfully.

[root@prometheus ~]# ansible all -m shell -a ' ls -l /root/auto_start_node_exporter.sh  '
node_1 | CHANGED | rc=0 >>
-rwxr-xr-x 1 root root 426 Jul 17 12:48 /root/auto_start_node_exporter.sh
node_2 | CHANGED | rc=0 >>
-rwxr-xr-x 1 root root 426 Jul 17 12:48 /root/auto_start_node_exporter.sh
[root@prometheus ~]#

rc=0 means ls exited successfully, i.e. the start script exists.

 

Check that the start script was added to the boot sequence: if /etc/rc.d/rc.local contains the line /root/auto_start_node_exporter.sh, start-at-boot is configured.

[root@prometheus ~]# ansible all -m shell -a ' cat /etc/rc.d/rc.local  | grep -i  auto_start_node_exporter  '
node_1 | CHANGED | rc=0 >>
/root/auto_start_node_exporter.sh
node_2 | CHANGED | rc=0 >>
/root/auto_start_node_exporter.sh
[root@prometheus ~]#

rc=0 means grep found the /root/auto_start_node_exporter.sh line in /etc/rc.d/rc.local, so start-at-boot is configured.
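The three checks above can also be bundled into one client-side script so that a single Ansible call reports everything. A minimal sketch; `check_node_exporter_install` and its optional prefix argument (used only to try it against a scratch directory) are hypothetical, and the paths are the ones get_soft.sh uses.

```shell
#!/bin/bash
# Hypothetical helper: verify a node_exporter install done by get_soft.sh.
# $1 is an optional path prefix (empty on a real host). Prints one line per
# check and returns non-zero if any check fails.
check_node_exporter_install() {
    local root=${1:-} rc=0
    if [ -x "$root/usr/local/node_exporter/node_exporter" ]; then
        echo "OK   binary"
    else
        echo "FAIL binary"; rc=1
    fi
    if [ -f "$root/root/auto_start_node_exporter.sh" ]; then
        echo "OK   start script"
    else
        echo "FAIL start script"; rc=1
    fi
    # -s: stay quiet if rc.local does not exist
    if grep -qs 'auto_start_node_exporter' "$root/etc/rc.d/rc.local"; then
        echo "OK   rc.local entry"
    else
        echo "FAIL rc.local entry"; rc=1
    fi
    return $rc
}
```

To run it through Ansible, save the function plus a final line `check_node_exporter_install` as a script (for example a hypothetical /var/www/html/soft/check_install.sh) and call `ansible all -m script -a '/var/www/html/soft/check_install.sh'`; a host missing any piece returns a non-zero rc.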

 

Start node_exporter on the clients, then check that port 9100 is listening on each host.

[root@prometheus ~]# ansible all -m shell -a 'nohup /root/auto_start_node_exporter.sh ' 
node_1 | CHANGED | rc=0 >>
nohup: ignoring input
node_2 | CHANGED | rc=0 >>
nohup: ignoring input
[root@prometheus ~]#

This calls the start script on each client to launch node_exporter.

 

Confirm that port 9100 on each client is held by node_exporter.

[root@prometheus ~]# ansible all -m shell -a 'ss -alntup | grep -i 9100 ' 
node_1 | CHANGED | rc=0 >>
tcp    LISTEN     0      128                   :::9100                 :::*      users:(("node_exporter",3913,3))
node_2 | CHANGED | rc=0 >>
tcp    LISTEN     0      128      :::9100                 :::*                   users:(("node_exporter",pid=10833,fd=3)

Port 9100 is listening, so node_exporter is installed and running.

 

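step3: register the clients' node_exporter in Consul

A shell script extracts the managed host information from /etc/ansible/hosts and generates the Consul registration commands automatically.

Consul must already be installed and configured before this step.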

[root@prometheus ~]# source /var/www/html/soft/auto_consul_zc.sh 
curl -X PUT -d '{"id": "node_1","name": "node_1","address": "192.168.111.12","port": 9100,"tags": ["test","node","linux"],"checks": [{"http": "http://192.168.111.12:9100/metrics", "interval": "5s"}]}'  http://192.168.111.83:8500/v1/agent/service/register 
curl -X PUT -d '{"id": "node_2","name": "node_2","address": "192.168.111.124","port": 9100,"tags": ["test","node","linux"],"checks": [{"http": "http://192.168.111.124:9100/metrics", "interval": "5s"}]}'  http://192.168.111.83:8500/v1/agent/service/register

Note:

    The IP address in http://192.168.111.83:8500/v1/agent/service/register is hard-coded in auto_consul_zc.sh; change it to your own Consul server's address.

 

Copy the generated commands and run them.

[root@prometheus ~]# curl -X PUT -d '{"id": "node_1","name": "node_1","address": "192.168.111.12","port": 9100,"tags": ["test","node","linux"],"checks": [{"http": "http://192.168.111.12:9100/metrics", "interval": "5s"}]}'  http://192.168.111.83:8500/v1/agent/service/register 
[root@prometheus ~]# 
[root@prometheus ~]# curl -X PUT -d '{"id": "node_2","name": "node_2","address": "192.168.111.124","port": 9100,"tags": ["test","node","linux"],"checks": [{"http": "http://192.168.111.124:9100/metrics", "interval": "5s"}]}'  http://192.168.111.83:8500/v1/agent/service/register

  

 

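step4: verify the registration in Consul

Open the Consul server's address and port in a web browser and check whether the newly registered servers appear in the UI; if they do, registration succeeded.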

http://192.168.111.83:8500/


If the nodes appear in the web UI, they are registered. You can now add Consul as a service-discovery source in Prometheus to start collecting data.
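The same check can be done from the command line. Consul's HTTP API lists the health checks of a service at `/v1/health/checks/<service>`; with the service names used here (node_1, node_2) and the 5s HTTP check registered in step 3, a healthy node reports `passing`. A sketch of a hypothetical helper that extracts the Status fields from that JSON with grep and sed, assuming jq is not installed.

```shell
#!/bin/bash
# Hypothetical helper: read Consul health-check JSON on stdin and print the
# "Status" value of each check (passing, warning or critical), one per line.
check_statuses() {
    grep -o '"Status"[[:space:]]*:[[:space:]]*"[a-z]*"' | sed 's/.*"\([a-z]*\)"$/\1/'
}
```

Usage: `curl -s http://192.168.111.83:8500/v1/health/checks/node_1 | check_statuses`.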