I. Prometheus Installation and Configuration

See: CentOS 7.5 Prometheus 2.5 + Grafana 5.4 monitoring deployment

II. Consul-Based Service Discovery

1. Overview

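  • Consul is service discovery and configuration sharing software that is distributed, highly available, and supports multiple datacenters.
  • Consul is developed in Go by HashiCorp and released as open source under the Mozilla Public License 2.0.
  • Consul supports health checks, answers service queries over both HTTP and DNS, and offers a key-value store through its HTTP API.
  • Vagrant, the very handy command-line tool for managing virtual machines, is also a HashiCorp product.
  • Consensus is based on the Raft algorithm, which keeps the service highly available. Membership management and message broadcast use a gossip protocol, and ACLs are supported for access control.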

Architecture diagram

2. Download and Install

wget https://releases.hashicorp.com/consul/1.2.4/consul_1.2.4_linux_amd64.zip
unzip consul_1.2.4_linux_amd64.zip -d /app/prometheus/bin/
cd /app/prometheus/bin/
chown -R prometheus:prometheus consul
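
The download step can also be parameterized so that the version is stated once and the archive is checked before it is unpacked. The following is a minimal sketch, not part of the original procedure: the function names are hypothetical, and it assumes the `consul_<version>_SHA256SUMS` file that HashiCorp publishes next to each release archive.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not part of Consul.

# Print the release archive URL for a given Consul version (linux/amd64).
consul_zip_url() {
  local version="$1"
  printf 'https://releases.hashicorp.com/consul/%s/consul_%s_linux_amd64.zip\n' "$version" "$version"
}

# Print the URL of the SHA256SUMS file published alongside the archive.
consul_sums_url() {
  local version="$1"
  printf 'https://releases.hashicorp.com/consul/%s/consul_%s_SHA256SUMS\n' "$version" "$version"
}

# Download the archive, verify its checksum, then unpack it into the target dir.
install_consul() {
  local version="$1" dest="$2" zip="consul_${1}_linux_amd64.zip"
  wget -q "$(consul_zip_url "$version")" "$(consul_sums_url "$version")" || return 1
  # Check only the line that belongs to the archive we downloaded.
  grep " ${zip}\$" "consul_${version}_SHA256SUMS" | sha256sum -c - || return 1
  unzip -o "$zip" -d "$dest"
}
```

With these definitions loaded, `install_consul 1.2.4 /app/prometheus/bin/` would reproduce the wget and unzip steps above with an added integrity check.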

3. Create the systemd Unit File for consul.service

# vim /usr/lib/systemd/system/consul.service
[Unit]
Description=consul
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/app/prometheus/bin/consul agent \
-server -bootstrap-expect 1 \
-bind=0.0.0.0 \
-client=172.16.9.201 \
-data-dir=/app/prometheus/consuld/data/consul \
-node=172.17.9.201 \
-config-dir=/app/prometheus/consuld/conf \
-ui
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
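
Parameter notes

  • --net=host: a Docker option, relevant only when Consul runs in a container (not the case with the systemd unit above). It places the container in the host's network namespace, so no port mappings have to be declared by hand.
  • -server: Consul runs in either server or client mode. Servers are the core of service discovery; clients mainly forward requests to the servers.
  • -advertise: the address this node advertises to the other members of the cluster, typically the host's private IP. Not set in the unit above.
  • -bootstrap-expect: the number of server nodes Consul waits for before it bootstraps the cluster. It is 1 here because this is a single-server setup.
  • -retry-join: the address of a Consul node to join; failed attempts are retried, and the flag may be given several times with different addresses. Not set in the unit above.
  • -client: the address the client interfaces (the HTTP API and the DNS server) bind to. The default is 127.0.0.1.
  • -bind: the address used for communication inside the cluster; every other node in the cluster must be able to reach it. The default is 0.0.0.0.
  • -node: the name of this node in the cluster, which defaults to the hostname. The unit above sets it to 172.17.9.201; this is only a label, so it does not have to match the -client address.
  • allow_stale: a DNS configuration option rather than a command-line flag. When true, any Consul server in the cluster may answer DNS queries; when false, every query goes through the server leader.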
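
The same settings can also be kept in a JSON file inside the `-config-dir` directory instead of on the command line. The sketch below writes such a file; the function name and the file name `server.json` are assumptions made for illustration, and the `dns_config` entry is included only to show where `allow_stale` belongs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a Consul agent config that mirrors the flags
# used in the unit file. The file name server.json is an assumption.
write_consul_config() {
  local conf_dir="$1"
  mkdir -p "$conf_dir" || return 1
  cat > "${conf_dir}/server.json" <<'EOF'
{
  "server": true,
  "bootstrap_expect": 1,
  "bind_addr": "0.0.0.0",
  "client_addr": "172.16.9.201",
  "data_dir": "/app/prometheus/consuld/data/consul",
  "node_name": "172.17.9.201",
  "ui": true,
  "dns_config": { "allow_stale": true }
}
EOF
}
```

After `write_consul_config /app/prometheus/consuld/conf`, the ExecStart line could in principle be reduced to `consul agent -config-dir=/app/prometheus/consuld/conf`. Running `consul validate` on the directory is a reasonable check before restarting the service.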

4. Start the Service

 systemctl daemon-reload
 systemctl start consul.service 
 systemctl enable consul.service 
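
Because the unit uses Type=simple, `systemctl start` returns as soon as the process is spawned, which can be a few seconds before a Raft leader has been elected. A small polling loop against the `/v1/status/leader` endpoint is one way to wait for that. This is a sketch with a hypothetical function name, assuming the HTTP API is reachable on the `-client` address and port 8500.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll Consul's /v1/status/leader endpoint until a
# leader address is reported or the attempts run out.
wait_for_leader() {
  local base_url="$1" tries="${2:-10}" leader i
  for ((i = 0; i < tries; i++)); do
    # The endpoint returns a JSON string such as "172.16.9.201:8300",
    # or "" while no leader has been elected yet.
    leader="$(curl -s "${base_url}/v1/status/leader" | tr -d '"')"
    if [ -n "$leader" ]; then
      echo "$leader"
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, `wait_for_leader http://172.16.9.201:8500 30` prints the leader address once it is known and returns a non-zero status if none appears within 30 attempts.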

5. Check the Running Status

# systemctl status consul.service 
● consul.service - consul
   Loaded: loaded (/usr/lib/systemd/system/consul.service; enabled; vendor preset: disabled)
   Active: active (running) since 二 2018-12-11 15:23:12 CST; 20min ago
     Docs: https://prometheus.io/
 Main PID: 5721 (consul)
   CGroup: /system.slice/consul.service
           └─5721 /app/prometheus/bin/consul agent -server -bootstrap-expect 1 -bind=0.0.0.0 -client=172.16.9.201 -data-dir=/app/prometheus/consuld/data/consul -node=172.17.9.201 -config...

12月 11 15:23:18 prometheus-node1 consul[5721]: 2018/12/11 15:23:18 [INFO] raft: Election won. Tally: 1
12月 11 15:23:18 prometheus-node1 consul[5721]: 2018/12/11 15:23:18 [INFO] raft: Node at 172.16.9.201:8300 [Leader] entering Leader state
12月 11 15:23:18 prometheus-node1 consul[5721]: 2018/12/11 15:23:18 [INFO] consul: cluster leadership acquired
12月 11 15:23:18 prometheus-node1 consul[5721]: 2018/12/11 15:23:18 [INFO] consul: New leader elected: 172.17.9.201
12月 11 15:23:18 prometheus-node1 consul[5721]: 2018/12/11 15:23:18 [INFO] consul: member '172.17.9.201' joined, marking health alive
12月 11 15:23:18 prometheus-node1 consul[5721]: 2018/12/11 15:23:18 [INFO] agent: Synced node info
12月 11 15:23:25 prometheus-node1 consul[5721]: ==> Newer Consul version available: 1.4.0 (currently running: 1.2.4)
12月 11 15:40:21 prometheus-node1 consul[5721]: 2018/12/11 15:40:21 [WARN] agent: Service name "node_exporter" will not be discoverable via DNS due to invalid characters. Val...and dashes.
12月 11 15:40:21 prometheus-node1 consul[5721]: 2018/12/11 15:40:21 [INFO] agent: Synced service "node_exporter"
12月 11 15:42:08 prometheus-node1 consul[5721]: 2018/12/11 15:42:08 [INFO] agent: Synced service "node_exporter"
Hint: Some lines were ellipsized, use -l to show in full.

Because the agent was started with -ui, the Consul web UI is served on the -client address:

http://172.16.9.201:8500/ui/

6. Configure Service Registration in Consul

yum -y install jq

Query the registered services:

# curl -s http://172.16.9.201:8500/v1/catalog/services | jq .
{
  "consul": []
}

Register a service through the HTTP API:

# curl -X PUT -d '{"ID": "node_exporter", "Name": "node_exporter", "Address": "172.16.9.201", "Port": 9100, "Tags": ["lock"], "EnableTagOverride": false}' http://172.16.9.201:8500/v1/agent/service/register
# curl -s http://172.16.9.201:8500/v1/catalog/services | jq .
{
  "consul": [],
  "node_exporter": [
    "lock"
  ]
}
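
The call above registers a single instance whose ID equals the service name. Service IDs must be unique per agent, so registering a second host with the same ID through the same agent would replace the first entry. The sketch below gives each instance its own ID and attaches an HTTP health check. The function names, the ID scheme, and the 15s interval are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the agent service endpoints.

# Build a registration payload with an HTTP health check against /metrics.
service_payload() {
  local id="$1" addr="$2" port="$3" tag="$4"
  printf '{"ID": "%s", "Name": "node_exporter", "Address": "%s", "Port": %s, "Tags": ["%s"], "Check": {"HTTP": "http://%s:%s/metrics", "Interval": "15s"}}\n' \
    "$id" "$addr" "$port" "$tag" "$addr" "$port"
}

# Register one node_exporter instance with the given agent.
register_service() {
  local consul="$1"; shift
  curl -s -X PUT -d "$(service_payload "$@")" "${consul}/v1/agent/service/register"
}

# Remove a service instance by its ID.
deregister_service() {
  local consul="$1" id="$2"
  curl -s -X PUT "${consul}/v1/agent/service/deregister/${id}"
}
```

A second host (here the hypothetical 172.16.9.202) could then be added with `register_service http://172.16.9.201:8500 node_exporter-172.16.9.202 172.16.9.202 9100 lock` and removed again with `deregister_service http://172.16.9.201:8500 node_exporter-172.16.9.202`.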

7. Prometheus Configuration File

# vim prometheus.yml 

global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 172.16.9.201:9093
      - 172.16.9.202:9093

rule_files:
  - /app/prometheus/cfg/rule.yml

scrape_configs:
  - job_name: 'prometheus'
    metrics_path:    /metrics
    honor_labels:    false
    static_configs:
      - targets: ['localhost:9090']
        labels:
          group: 'node'
          service: 'prometheus'
  - job_name: 'prod_discover'
    metrics_path: /metrics
    honor_labels: false
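    consul_sd_configs:
    - server: '172.16.9.201:8500'
      services: ['node_exporter']
      # An empty separator joins the tags with nothing between or around them,
      # so a service with the single tag "lock" yields __meta_consul_tags="lock".
      tag_separator: ''
    relabel_configs:
    # Copy Consul metadata into target labels.
    - source_labels: ['__meta_consul_tags']
      target_label: 'product'
    - source_labels: ['__meta_consul_dc']
      target_label: 'idc'
    - source_labels: ['__meta_consul_service']
      target_label: 'service'
    # Derive the environment from the job name: prod_discover -> prod.
    - source_labels: ['job']
      target_label: 'environment'
      regex:        '(.*)_discover'
      replacement:   '${1}'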
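
The last relabel rule derives the `environment` label from the job name. Prometheus anchors relabel regexes at both ends, so `(.*)_discover` must match the whole job name. The sketch below emulates that one rule in bash to show which values it produces. The function name is hypothetical, and it mirrors only this rule, not relabeling in general.

```shell
#!/usr/bin/env bash
# Hypothetical helper emulating the last relabel rule. The regex is
# anchored on both ends, as Prometheus does for relabel regexes.
relabel_environment() {
  local job="$1"
  if [[ "$job" =~ ^(.*)_discover$ ]]; then
    echo "${BASH_REMATCH[1]}"
  fi
  # No match: print nothing, mirroring Prometheus leaving the label unset.
}

relabel_environment prod_discover   # prints "prod"
relabel_environment prometheus      # prints nothing
```

Before restarting Prometheus, the whole file can be checked with `promtool check config prometheus.yml`, which ships with Prometheus 2.x.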

Open the web UI:

http://172.16.9.201:9090/targets
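
The information shown on the targets page is also available from the Prometheus HTTP API under `/api/v1/targets`, which is convenient for a scripted check. The sketch below lists each active target with its job, instance, and health. The function name is hypothetical, and it relies on the jq tool installed earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "job instance health" for every active target
# reported by the Prometheus targets API.
list_targets() {
  local prom_url="$1"
  curl -s "${prom_url}/api/v1/targets" |
    jq -r '.data.activeTargets[] | "\(.labels.job) \(.labels.instance) \(.health)"'
}
```

Running `list_targets http://172.16.9.201:9090` should show one line per scraped target; a node_exporter instance discovered through Consul appears under the job `prod_discover`.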