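For work reasons, our company uses the kube-prometheus bundle instead of installing Prometheus from binaries, which saves a lot of effort. I still hit quite a few pitfalls while testing, so I am recording them here.

1 Writing Prometheus rules

Rules like these are easy to find online; here I only record how to write them the kube-prometheus way. Edit prometheus-rules.yaml (it is in the manifests/ directory once you have cloned the kube-prometheus repository) and append the following at the bottom, matching the indentation of the existing rules. So far I have rules that check Pod running status and node status; other rules follow the same pattern.

#Check pod status
- alert: pod-status
  annotations:
    message: pod-{{ $labels.pod }} failure
  expr: |
    kube_pod_container_status_running != 1
  for: 1m
  labels:
    severity: warning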

#Check node status
- alert: node-status
  annotations:
    message: node-{{ $labels.node }} failure
  expr: |
    kube_node_status_condition{status="unknown",condition="Ready"} == 1
  for: 1m
  labels:
    severity: warning
Save and exit.
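Before adding an expression as a rule, it can help to run it once against Prometheus and look at which series it returns. Below is a minimal sketch using the Prometheus HTTP API endpoint /api/v1/query. It assumes Prometheus is reachable at http://localhost:9090 (for example by port-forwarding the prometheus-k8s service in the monitoring namespace); the helper names instant_query and matching_series are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable here, e.g. via
#   kubectl -n monitoring port-forward svc/prometheus-k8s 9090
PROM_URL = "http://localhost:9090"


def instant_query(expr, base_url=PROM_URL):
    """Run an instant query through the Prometheus HTTP API (/api/v1/query)."""
    url = base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def matching_series(response, label):
    """Return the given label of every series in an instant-query response.

    Comparison operators such as `!= 1` filter by default (no `bool`
    modifier), so every series returned puts the alert into pending state,
    and it fires if the condition holds for the rule's `for:` duration.
    """
    if response.get("status") != "success":
        raise RuntimeError(response.get("error", "query failed"))
    return [series["metric"].get(label, "") for series in response["data"]["result"]]
```

For example, `matching_series(instant_query("kube_pod_container_status_running != 1"), "pod")` lists the pods that currently have a container that is not running.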
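Next, edit alertmanager-secret.yaml. This file configures how notifications are sent, by email or DingTalk. I use DingTalk: email is configured too but not used, since hardly anyone reads email these days and we simply check alerts in DingTalk. Replace the original contents with the following: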
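apiVersion: v1
data: {}
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 1m # treat an alert as resolved if it is not updated within this time
      smtp_smarthost: 'smtp.abc.net:25' # SMTP server
      smtp_from: 'monitor-admin@abc.net' # sender address
      smtp_auth_username: 'monitor-admin@abc.net' # mailbox login name
      smtp_auth_password: 'Zabbixabc2016' # authorization password
      smtp_require_tls: false # disable TLS (it is enabled by default)
    receivers:
    - name: 'webhook'
      webhook_configs:
      # DingTalk alert endpoint. This service is deployed separately later, because
      # DingTalk cannot parse Alertmanager's default webhook payload; it must be converted first.
      - url: 'http://webhook-dingtalk/dingtalk/send/'
        send_resolved: true
    route:
      group_interval: 1m # wait before notifying about new alerts added to an already-notified group
      group_wait: 10s # wait before sending the first notification for a new group of alerts
      receiver: webhook
      repeat_interval: 1m # wait before re-sending a notification for an alert that is still firing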
type: Opaque

Save and exit.

Now deploy a Pod that relays alerts to DingTalk. Thanks to the author of http://www.mamicode.com/info-detail-2845201.html; I built on that work and customized the alert script to match our company's alert content.

(Figures: the original alert message and the customized alert message.)

The customized script, app.py, is as follows:

```
#!/usr/bin/env python
import io
import json
import logging
import sys

import arrow
import requests
from flask import Flask, request

# Force UTF-8 on stdout/stderr so non-ASCII alert text is logged correctly
sys.stdout = io.TextIOWrapper(sys.stdout.detach(), encoding='utf-8')
sys.stderr = io.TextIOWrapper(sys.stderr.detach(), encoding='utf-8')

app = Flask(__name__)

console = logging.StreamHandler()
fmt = '%(asctime)s - %(filename)s:%(lineno)s - %(name)s - %(message)s'
console.setFormatter(logging.Formatter(fmt))
log = logging.getLogger("flask_webhook_dingtalk")
log.addHandler(console)
log.setLevel(logging.DEBUG)


@app.route('/')
def index():
    return 'Webhook Dingtalk by Billy https://blog.51cto.com/billy98'


def local_time(ts):
    """Convert an Alertmanager timestamp to Asia/Shanghai local time."""
    return arrow.get(ts).to('Asia/Shanghai').format('YYYY-MM-DD HH:mm:ss ZZ')


@app.route('/dingtalk/send/', methods=['POST'])
def hander_session():
    # The DingTalk robot webhook URL is passed as the first command-line argument
    profile_url = sys.argv[1]
    # Alertmanager posts JSON with an "alerts" list; only the first alert is forwarded
    post_data = json.loads(request.get_data().decode('utf-8'))['alerts'][0]
    messa_list = ['### Alert name: Prometheus-alert']
    if post_data['status'].upper() == 'FIRING':
        messa_list.append('**Alert status: firing**')
        messa_list.append('**Alert time: %s**' % local_time(post_data['startsAt']))
    else:
        messa_list.append('**Alert status: resolved**')
        messa_list.append('**Alert time: %s**' % local_time(post_data['startsAt']))
        messa_list.append('**Resolved time: %s**' % local_time(post_data['endsAt']))
    messa_list.append('**Severity: %s**' % post_data['labels']['severity'])
    messa_list.append('**Alert type: %s**' % post_data['labels']['alertname'])
    messa_list.append('**Details: %s**' % post_data['annotations']['message'])
    # Blank line plus "> " renders each item as a quoted paragraph in DingTalk markdown
    messa = ' \n\n > '.join(messa_list)
    status = alert_data(messa, post_data['labels']['alertname'], profile_url)
    log.info(status)
    return status


def alert_data(data, title, profile_url):
    # Let requests serialize the JSON body, so quotes or newlines in the
    # alert text cannot produce an invalid payload
    send_data = {"msgtype": "markdown", "markdown": {"title": title, "text": data}}
    reps = requests.post(url=profile_url, json=send_data)
    return reps.text


if __name__ == '__main__':
    app.debug = False
    app.run(host='0.0.0.0', port=8080)
```

Finally, rebuild the image. The Dockerfile is as follows:

FROM centos:7 as build
MAINTAINER billy98 5884625@qq.com
RUN mkdir /root/.pip
ADD pip.conf /root/.pip/pip.conf

RUN curl -o /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo && \
    yum install -y python36 python36-pip && \
    pip3.6 install flask requests werkzeug arrow
ADD app.py /usr/local/alert-dingtalk.py

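FROM gcr.io/distroless/python3
COPY --from=build /usr/local/alert-dingtalk.py /usr/local/alert-dingtalk.py
COPY --from=build /usr/local/lib64/python3.6/site-packages /usr/local/lib64/python3.6/site-packages
COPY --from=build /usr/local/lib/python3.6/site-packages /usr/local/lib/python3.6/site-packages
ENV PYTHONPATH=/usr/local/lib/python3.6/site-packages:/usr/local/lib64/python3.6/site-packages
EXPOSE 8080
ENTRYPOINT ["python","/usr/local/alert-dingtalk.py"]

Finally, put this into your Kubernetes manifest (dingding.yaml or any other file name) and deploy it. The script reads the DingTalk robot webhook URL from sys.argv[1], so pass that URL as the container's args. Because Alertmanager calls http://webhook-dingtalk/dingtalk/send/, the Service must be named webhook-dingtalk, live in the monitoring namespace, and listen on port 80.

For more on Kubernetes and automated operations, see www.wangshuying.cn, which has more operations know-how.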
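The dingding.yaml manifest itself is not shown above. The sketch below generates one as JSON, which kubectl accepts as well as YAML. It assumes the image was pushed as the hypothetical registry.example.com/webhook-dingtalk:latest and that YOUR_TOKEN is replaced with your robot's access token. The single-replica Deployment, the app label, and the helper name build_manifests are illustrative assumptions, not the original file.

```python
import json

NAMESPACE = "monitoring"
# Must match the host in the webhook URL configured in alertmanager.yaml
NAME = "webhook-dingtalk"
# Hypothetical image name and robot URL; replace with your own
IMAGE = "registry.example.com/webhook-dingtalk:latest"
ROBOT_URL = "https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN"


def build_manifests(image=IMAGE, robot_url=ROBOT_URL):
    """Return a v1 List holding a Deployment and a Service for the relay."""
    labels = {"app": NAME}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": NAME, "namespace": NAMESPACE},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": NAME,
                        "image": image,
                        # No "command" is set, so these args are appended to the
                        # image ENTRYPOINT and app.py sees the URL as sys.argv[1]
                        "args": [robot_url],
                        "ports": [{"containerPort": 8080}],
                    }]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": NAME, "namespace": NAMESPACE},
        "spec": {
            "selector": labels,
            # http://webhook-dingtalk/... has no port, i.e. port 80 -> container 8080
            "ports": [{"port": 80, "targetPort": 8080}],
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [deployment, service]}


print(json.dumps(build_manifests(), indent=2))
```

Redirect the output to a file (for example `python gen_dingding.py > dingding.json`, where gen_dingding.py is a hypothetical name for this script) and apply it with `kubectl apply -f dingding.json`.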