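Solution Overview

This solution shows how to integrate CloudWatch with third-party instant messaging software that supports API operations, such as Feishu and WeChat, using Feishu alarms as the example. In this article, I describe how to send AWS CloudWatch alarm messages to Feishu through Amazon SNS and AWS Lambda.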

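Monitoring Data Flow Diagram

Solution Architecture

In this solution, CloudWatch receives EC2 runtime metrics and monitors them. When an EC2 metric crosses its configured threshold, CloudWatch triggers an alarm event and sends the event message through SNS to a Lambda function. The Lambda function runs user-defined code, which parses the alarm message and sends it to a platform such as Feishu, WeCom (企业微信), a DingTalk bot, or Prometheus.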
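
To make the data flow concrete, here is a minimal sketch of the event shape that SNS hands to the Lambda function for a CloudWatch alarm, and how the fields can be pulled out. The sample values (alarm name, account ID, topic name) are made up for illustration, and `parse_alarm` is a hypothetical helper, not part of the original code.

```python
import json

# Hypothetical sample of the event SNS passes to Lambda for a CloudWatch alarm.
# All values are invented for illustration.
SAMPLE_EVENT = {
    "Records": [{
        "Sns": {
            "Subject": 'ALARM: "demo-cpu-high" in US East (N. Virginia)',
            "Timestamp": "2021-10-28T08:00:00.000Z",
            "TopicArn": "arn:aws:sns:us-east-1:123456789012:demo-topic",
            # The alarm payload arrives as a JSON string, not as a dict
            "Message": json.dumps({
                "AlarmName": "demo-cpu-high",
                "AlarmDescription": None,
                "NewStateValue": "ALARM",
                "OldStateValue": "OK",
                "NewStateReason": "Threshold Crossed: 1 datapoint was greater than the threshold.",
                "Region": "US East (N. Virginia)",
                "Trigger": {"MetricName": "CPUUtilization", "Namespace": "AWS/EC2"},
            }),
        }
    }]
}


def parse_alarm(event):
    """Extract the fields needed for a chat notification from an SNS event."""
    sns = event["Records"][0]["Sns"]
    alarm = json.loads(sns["Message"])
    return {
        "subject": sns["Subject"],
        "time": sns["Timestamp"],
        "alarm_name": alarm["AlarmName"],
        "old_state": alarm["OldStateValue"],
        "new_state": alarm["NewStateValue"],
        "namespace": alarm["Trigger"]["Namespace"],
        "metric": alarm["Trigger"]["MetricName"],
        "reason": alarm["NewStateReason"],
    }


if __name__ == "__main__":
    print(parse_alarm(SAMPLE_EVENT))
```

Note that `AlarmDescription` can be null when the alarm was created without a description, so any code that concatenates it into a message should handle that case.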


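Feishu Setup

Enable the Bot

For a custom enterprise app: once the app is created, click App Features (应用功能), select "Bot" (机器人), and enable the bot.

Version Management and Release

After the version is released and the Feishu administrator approves it, the app is shown as listed (已上架).

View the Custom App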

Create the AWS Lambda Function

Create the Function

Upload the Zip Code Package
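
The handler shown later imports `requests`, which may not be available in the Lambda Python runtime, so the zip usually needs to contain the dependency next to `lambda_function.py`. The following is a rough sketch of building such a package with the standard library. It assumes the dependencies were already installed into the source directory (for example with `pip install requests -t build/`); the directory and file names are placeholders.

```python
import os
import zipfile


def build_lambda_zip(src_dir, zip_path):
    """Zip every file under src_dir, storing paths relative to src_dir.

    zip_path should live outside src_dir so the archive does not include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # lambda_function.py must end up at the root of the archive
                zf.write(full, os.path.relpath(full, src_dir))
    return zip_path


if __name__ == "__main__":
    # Placeholder paths: "build" holds lambda_function.py plus installed packages
    if os.path.isdir("build"):
        print(build_lambda_zip("build", "function.zip"))
```

The resulting `function.zip` is what gets uploaded in the Lambda console.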

Create Amazon SNS

Create a Topic


Create a Subscription

Create a subscription that uses the Lambda protocol.

Feishu Code for CloudWatch Alarms
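
For readers who prefer scripting over the console, the topic and subscription steps can be sketched roughly as below. This is an illustration under assumptions, not the article's method: `wire_topic_to_lambda` is a hypothetical helper, and the two clients are expected to be boto3 clients created by the caller with `boto3.client("sns")` and `boto3.client("lambda")`.

```python
def wire_topic_to_lambda(sns_client, lambda_client, topic_name, function_arn):
    """Create (or reuse) an SNS topic and subscribe a Lambda function to it."""
    # create_topic returns the existing topic ARN if the name is already in use
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]

    # Grant SNS permission to invoke the function for this topic.
    # Calling this twice with the same StatementId raises a conflict error.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="sns-invoke-" + topic_name,
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,
    )

    # Subscribe the function using the "lambda" protocol
    sub = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="lambda",
        Endpoint=function_arn,
    )
    return topic_arn, sub["SubscriptionArn"]


# Usage (requires boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   wire_topic_to_lambda(boto3.client("sns"), boto3.client("lambda"),
#                        "demo-topic", "<your-function-arn>")
```

The CloudWatch alarm then only needs this topic selected as its notification target.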

# Note that the file name must be lambda_function.py
# cat lambda_function.py

# -*- coding: UTF-8 -*-
# author: tengfei.wu
# date: 20211028
# version: V1

import requests
import json
import os

def lambda_handler(event, context):
    # Set Feishu parameters
    data_app = {
        "app_id": "your-id",
        "app_secret": "your-secret"
    }
    chat_name="your Feishu alarm group name"
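    # Get the tenant access token
    try:
        res = requests.post("https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal/", json=data_app)
        # Fail early on a non-200 response instead of leaving access_token undefined
        res.raise_for_status()
        access_token = res.json().get("tenant_access_token")
    except Exception as e:
        # Return a string: an exception object is not JSON serializable
        return {"error": str(e)}
    if not access_token:
        return {"error": "tenant_access_token missing from Feishu response"}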
    headers={
        "Authorization": "Bearer {}".format(access_token),
        "Content-Type": "application/json; charset=utf-8"
    }
    # Get the list of groups (first page only, up to 100 groups)
    params = {
        "page_size": 100,
        "page_token": ""
    }
    group = None
    try:
        res = requests.get("https://open.feishu.cn/open-apis/chat/v4/list", params=params, headers=headers)
        res.raise_for_status()
        data = res.json().get("data") or {}
        for i in data.get("groups") or []:
            if i.get("name") == chat_name:
                group = i
    except Exception as e:
        return {"error": str(e)}
    # Stop here if no group matched, instead of failing on an undefined variable
    if group is None:
        return {"error": "Feishu group not found: " + chat_name}
    # Send the message
    chat_id = group.get("chat_id")

    message = event['Records'][0]['Sns']
    Timestamp = message['Timestamp']
    Subject = message['Subject']
    sns_message = json.loads(message['Message'])
    region = message['TopicArn'].split(':')[-3]
    NewStateReason = sns_message['NewStateReason']

    if "ALARM" in Subject:
        title = '[AI Production] Alarm!!'
    elif "OK" in Subject:
        title = '[AI Production] Recovered!'
    else:
        title = '[AI Production] Abnormal alarm state'

    # AlarmDescription is null when the alarm has no description, so fall back to ''
    content = title \
              + "\n> **Details**" \
              + "\n> **Time**: " + Timestamp \
              + "\n> **Content**: " + Subject \
              + "\n> **State**: {old} => {new}".format(old=sns_message['OldStateValue'], new=sns_message['NewStateValue']) \
              + "\n> " \
              + "\n> **AWS Region**: " + sns_message['Region'] \
              + "\n> **Monitored resource**: " + sns_message['Trigger']['Namespace'] \
              + "\n> **Metric**: " + sns_message['Trigger']['MetricName'] \
              + "\n> " \
              + "\n> **Alarm name**: " + sns_message['AlarmName'] \
              + "\n> **Alarm description**: " + (sns_message.get('AlarmDescription') or '') \
              + "\n> " \
              + "\n> **Alarm details**: " + NewStateReason

    data = {
        "chat_id": chat_id,
        "msg_type": 'text',
        "content": {'text': content}
    }
    print(data)
    try:
        response = requests.post("https://open.feishu.cn/open-apis/message/v4/send/", headers=headers, json=data)
        print(response)
        print(response.json())
    except Exception as e:
        # Return a string: an exception object is not JSON serializable
        return {"error": str(e)}
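
The handler above only looks at the first page of the group list (up to 100 groups). If the bot belongs to more groups, the lookup could follow the pagination cursor, roughly as sketched below. This assumes the list response's `data` object carries `has_more` and `page_token` fields alongside `groups`; `find_group` and `fetch_page` are hypothetical names introduced for illustration.

```python
def find_group(fetch_page, chat_name):
    """Search every page of the group list for a group named chat_name.

    fetch_page(page_token) must return the "data" dict of one response page.
    Returns the matching group dict, or None if no page contains it.
    """
    page_token = ""
    while True:
        data = fetch_page(page_token) or {}
        for group in data.get("groups") or []:
            if group.get("name") == chat_name:
                return group
        # Assumed pagination fields: has_more and page_token
        page_token = data.get("page_token") or ""
        if not data.get("has_more") or not page_token:
            return None


# Usage inside the handler (sketch):
#   def fetch_page(token):
#       res = requests.get("https://open.feishu.cn/open-apis/chat/v4/list",
#                          params={"page_size": 100, "page_token": token},
#                          headers=headers)
#       return res.json().get("data")
#   group = find_group(fetch_page, chat_name)
```

Passing the page fetcher in as a function keeps the search logic testable without network access.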

Feishu Alarm Example

Code Upgrade

For details, see "Feishu Amazon CloudWatch alarm code upgrade" (飞书 Amazon CloudWatch 告警代码升级).

References