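Solution overview
This solution shows how to integrate CloudWatch with third-party instant messaging tools that expose an API, such as Feishu and WeChat, using Feishu alerts as the example. In this article, I describe how to use Amazon SNS and AWS Lambda to send AWS CloudWatch alarm messages to Feishu.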
Monitoring data flow diagram
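Solution architecture
In this solution, CloudWatch receives and monitors EC2 runtime metrics. When an EC2 metric crosses its configured threshold, CloudWatch raises an alarm and sends the event message through SNS to a Lambda function. The Lambda function runs user-defined code that parses the alarm message and forwards it to Feishu, WeCom, a DingTalk bot, or a platform such as Prometheus.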
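The parsing step described above can be sketched as follows. This is a minimal illustration, assuming the event shape that SNS delivers to Lambda, where the CloudWatch alarm arrives as a JSON string in the `Message` field. The helper name `parse_alarm` and every value in the sample event are hypothetical.

```python
import json


def parse_alarm(event):
    """Extract a few CloudWatch alarm fields from an SNS-triggered Lambda event."""
    sns = event["Records"][0]["Sns"]
    # The alarm itself is a JSON document serialized into the "Message" string.
    alarm = json.loads(sns["Message"])
    return {
        "name": alarm["AlarmName"],
        "old_state": alarm["OldStateValue"],
        "new_state": alarm["NewStateValue"],
        "reason": alarm["NewStateReason"],
        "timestamp": sns["Timestamp"],
    }


# Hypothetical sample event, trimmed to the fields read above.
sample_event = {
    "Records": [{
        "Sns": {
            "Timestamp": "2021-10-28T08:00:00.000Z",
            "Message": json.dumps({
                "AlarmName": "demo-cpu-high",
                "OldStateValue": "OK",
                "NewStateValue": "ALARM",
                "NewStateReason": "Threshold Crossed",
            }),
        }
    }]
}

summary = parse_alarm(sample_event)
print(summary["name"], summary["old_state"], "=>", summary["new_state"])
```

Because `Message` is a string rather than a nested object, forgetting the `json.loads` call is a common source of `TypeError` in handlers of this kind.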
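Feishu setup
Enable the bot
Create a custom enterprise app. Once the app is created, open its app features, select “Bot”, and enable the bot, as shown below:
Version management and release
After you publish a version and the Feishu administrator approves it, the app is shown as published, as follows:
View the custom app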
Create the AWS Lambda function
Create the function
Upload the zip code package
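One way to produce the zip package is sketched below. It assumes that `lambda_function.py` and its dependencies already sit in a single build directory, for example after running `pip install requests -t build/`, since the `requests` library is generally not preinstalled in the Lambda Python runtime. The function name `build_package` and the paths are hypothetical, and the zip file is assumed to be written outside the build directory.

```python
import os
import zipfile


def build_package(source_dir, zip_path):
    """Zip every file under source_dir with paths relative to the zip root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # lambda_function.py must end up at the top level of the archive.
                zf.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path
```

A call such as `build_package("build", "function.zip")` would then yield an archive that can be uploaded in the Lambda console, with the handler set to `lambda_function.lambda_handler`.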
Create the Amazon SNS topic
Create the topic
Create the subscription
Create a subscription that uses the Lambda protocol:
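The same subscription can also be created programmatically. The sketch below is an assumption-laden outline rather than the article's own procedure: it expects two boto3-style clients to be passed in, and the function name and the statement ID `AllowSnsInvoke` are hypothetical. When the subscription is created through the API instead of the console, the function typically also needs a resource policy statement that lets SNS invoke it, which is why `add_permission` is called first.

```python
def subscribe_lambda_to_topic(sns_client, lambda_client, topic_arn, function_arn):
    """Allow SNS to invoke the function, then subscribe it to the topic."""
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="AllowSnsInvoke",  # hypothetical statement id
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,
    )
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="lambda",
        Endpoint=function_arn,
        ReturnSubscriptionArn=True,
    )
    return response["SubscriptionArn"]
```

With boto3 installed and credentials configured, the clients would come from `boto3.client("sns")` and `boto3.client("lambda")`. Passing them in as arguments keeps the function easy to exercise with stand-in objects.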
Code for forwarding CloudWatch alarms to Feishu
# Note that the file name must be lambda_function.py
# cat lambda_function.py
# -*- coding: UTF-8 -*-
# author: tengfei.wu
# date: 20211028
# version: V1
import json

import requests


def lambda_handler(event, context):
    # Set Feishu parameters
    data_app = {
        "app_id": "your-id",
        "app_secret": "your-secret"
    }
    chat_name = "your Feishu alarm group name"

    # Get the tenant access token
    try:
        res = requests.post("https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal/", json=data_app, timeout=10)
        res.raise_for_status()
        access_token = res.json().get("tenant_access_token")
    except Exception as e:
        # Return a string: an exception object cannot be serialized by Lambda
        return {"error": str(e)}
    if not access_token:
        return {"error": "no tenant_access_token in the Feishu response"}
    headers = {
        "Authorization": "Bearer {}".format(access_token),
        "Content-Type": "application/json; charset=utf-8"
    }

    # Get the group list and look up the alarm group by name
    params = {
        "page_size": 100,
        "page_token": ""
    }
    group = None
    try:
        res = requests.get("https://open.feishu.cn/open-apis/chat/v4/list", params=params, headers=headers, timeout=10)
        res.raise_for_status()
        data = res.json().get("data") or {}
        for i in data.get("groups") or []:
            if i.get("name") == chat_name:
                group = i
                break
    except Exception as e:
        return {"error": str(e)}
    if group is None:
        return {"error": "Feishu group not found: " + chat_name}

    # Build the message from the SNS record
    chat_id = group.get("chat_id")
    message = event['Records'][0]['Sns']
    Timestamp = message['Timestamp']
    Subject = message.get('Subject') or ''
    # The CloudWatch alarm is a JSON string inside the SNS "Message" field
    sns_message = json.loads(message['Message'])
    trigger = sns_message.get('Trigger') or {}
    NewStateReason = sns_message['NewStateReason']
    # Use the alarm state instead of a substring match on Subject, which
    # would misfire when an alarm name happens to contain "OK"
    new_state = sns_message['NewStateValue']
    if new_state == "ALARM":
        title = '[AI production] Alarm!!'
    elif new_state == "OK":
        title = '[AI production] Recovered!'
    else:
        title = '[AI production] Abnormal alarm state'
    content = title \
        + "\n> **Details**" \
        + "\n> **Time**: " + Timestamp \
        + "\n> **Subject**: " + Subject \
        + "\n> **State**: {old} => {new}".format(old=sns_message['OldStateValue'], new=new_state) \
        + "\n> " \
        + "\n> **AWS Region**: " + sns_message['Region'] \
        + "\n> **Monitored resource**: " + str(trigger.get('Namespace')) \
        + "\n> **Metric**: " + str(trigger.get('MetricName')) \
        + "\n> " \
        + "\n> **Alarm name**: " + sns_message['AlarmName'] \
        + "\n> **Alarm description**: " + str(sns_message.get('AlarmDescription') or '') \
        + "\n> " \
        + "\n> **Alarm details**: " + NewStateReason

    # Send the message to the Feishu group
    data = {
        "chat_id": chat_id,
        "msg_type": 'text',
        "content": {'text': content}
    }
    print(data)
    try:
        response = requests.post("https://open.feishu.cn/open-apis/message/v4/send/", headers=headers, json=data, timeout=10)
        print(response)
        print(response.json())
    except Exception as e:
        return {"error": str(e)}
    return {"status": response.status_code}
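To try the handler without waiting for a real alarm, a test event can be pasted into the Lambda console. The sketch below builds one. It is only an approximation of what SNS delivers, limited to the fields the handler reads. The helper name `make_test_event` and all values in it (alarm name, account ID, topic ARN, region label) are hypothetical.

```python
import json


def make_test_event(alarm_name, old_state, new_state):
    """Build a minimal SNS-style event carrying a CloudWatch alarm payload."""
    alarm = {
        "AlarmName": alarm_name,
        "AlarmDescription": "created manually for testing",
        "OldStateValue": old_state,
        "NewStateValue": new_state,
        "NewStateReason": "Threshold Crossed: 1 datapoint was above the threshold.",
        "Region": "US East (N. Virginia)",
        "Trigger": {"Namespace": "AWS/EC2", "MetricName": "CPUUtilization"},
    }
    return {
        "Records": [{
            "EventSource": "aws:sns",
            "Sns": {
                "Timestamp": "2021-10-28T08:00:00.000Z",
                "Subject": '{}: "{}" in US East (N. Virginia)'.format(new_state, alarm_name),
                "TopicArn": "arn:aws:sns:us-east-1:123456789012:demo-topic",
                # SNS carries the alarm as a JSON string, not a nested object.
                "Message": json.dumps(alarm),
            },
        }]
    }


test_event = make_test_event("demo-cpu-high", "OK", "ALARM")
print(json.dumps(test_event, indent=2))
```

The printed JSON can be saved as a test event in the console, or the dictionary can be passed directly as `lambda_handler(test_event, None)` in a local session that has `requests` installed and valid Feishu credentials.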