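Contents
- 1 Overview
- 2 Preview
- 3 Component installation
- 3.1 metrics-server
- 3.2 Prometheus and other components
- 4 Key files explained
- Alerting configuration: alertmanager.yaml
- Grafana deployment and configuration: grafana.yaml
- Front-end domain access: ingress.yaml
- Additional Prometheus configuration: prometheus-additional.yaml
- Prometheus deployment and alerting rules: prometheus.yaml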
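1 Overview
This article uses a set of components built around Prometheus to monitor every Kubernetes component as well as business (application) workloads.
- metrics-server collects resource metrics for consumers inside the cluster, such as kubectl top and the Horizontal Pod Autoscaler (HPA)
- prometheus-operator deploys Prometheus, which stores the monitoring data
- kube-state-metrics exposes data about the Kubernetes resource objects in the cluster
- node_exporter collects data from every node in the cluster
- Prometheus scrapes the apiserver, scheduler, controller-manager, and kubelet
- Alertmanager handles alerting
- Grafana visualizes the data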
2 Preview
3 Component installation
3.1 metrics-server
# Deploy
git clone https://github.com/ct1150/k8s-metrics-server.git
cd k8s-metrics-server/
kubectl apply -f ./
# Check status
kubectl get pods -n kube-system|grep metrics-server
metrics-server-5897d76755-84pc5 1/1 Running 0 36d
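Once metrics-server is Running, kubectl top nodes should return CPU and memory figures, and the HPA can scale workloads on them. As an illustration only, here is a minimal HPA sketch; the Deployment name web, the replica bounds, and the 70% CPU target are hypothetical values, and autoscaling/v2 assumes Kubernetes 1.23 or later (older clusters use autoscaling/v2beta2).

```yaml
# Hypothetical HPA that relies on the resource metrics served by metrics-server
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web               # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web             # hypothetical Deployment to scale
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
```

Utilization targets are computed against the containers' CPU requests, so the target Deployment must set resources.requests.cpu for this to work.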
3.2 Prometheus and other components
git clone https://github.com/ct1150/k8s-prometheus.git
cd k8s-prometheus
kubectl apply -f ./
| File | Description |
|---|---|
| alertmanager-deploy.yaml | Alertmanager deployment |
| alertmanager.yaml | Alertmanager configuration file; must be converted into a Secret |
| blackexporter-configmap.yaml | Blackbox exporter configuration, mainly used to monitor domain names |
| black-exporter.yaml | Blackbox exporter deployment |
| grafana.yaml | Grafana (front-end dashboards) deployment |
| ingress.yaml | Creates the domain names for the front-end UIs |
| kube-k8s-service.yaml | Creates Services for the individual components |
| kube-servicemonitor.yaml | Prometheus service discovery |
| kube-state-metrics.yaml | Deploys kube-state-metrics to export cluster metrics |
| monitoring-namespace.yaml | Creates the monitoring namespace |
| node_exporter.yaml | Deploys node_exporter to collect node data |
| prometheus-additional-configs.yaml | Additional Prometheus configuration |
| prometheus-additional.yaml | Additional Prometheus configuration |
| prometheus-operator.yaml | Deploys prometheus-operator |
| prometheus.yaml | Deploys Prometheus |
| README.md | |
| wechat.tmpl | WeChat alert template |
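- Generate the additional Prometheus configuration
# kubectl 1.18+ takes --dry-run=client; older versions accept only the bare --dry-run flag
kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring --dry-run=client -o yaml > additional-scrape-configs.yaml
kubectl apply -f additional-scrape-configs.yaml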
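The default Prometheus configuration cannot cover every need, so the following scrape jobs were added (based on Prometheus Kubernetes service discovery and the blackbox exporter). Each trigger label is derived from an annotation on the object: Prometheus replaces characters such as ".", "/" and "-" with underscores, so the annotation prometheus.io/scrape becomes __meta_kubernetes_<role>_annotation_prometheus_io_scrape.

| job name | Trigger label(s) | Required value | Purpose |
|---|---|---|---|
| kubernetes-ingresses | `__meta_kubernetes_ingress_annotation_prometheus_io_scrape` | true | Ingress HTTP monitoring |
| kubernetes-service-http-probe | `__meta_kubernetes_service_annotation_prometheus_io_scrape`, `__meta_kubernetes_service_annotation_prometheus_io_http_probe` | true (both) | Service HTTP monitoring |
| kubernetes-service-tcp-probe | `__meta_kubernetes_service_annotation_prometheus_io_scrape`, `__meta_kubernetes_service_annotation_prometheus_io_tcp_probe` | true (both) | Service TCP monitoring |
| kubernetes-app-metrics | `__meta_kubernetes_service_annotation_prometheus_io_scrape`, `__meta_kubernetes_service_annotation_prometheus_io_app_metrics` | true (both) | Custom metrics collection (endpoint-based) |
| kubernetes-pods | `__meta_kubernetes_pod_annotation_prometheus_io_scrape` | true | Custom metrics collection (pod-based) |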
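To make the table concrete, here is a sketch of what a job such as kubernetes-service-tcp-probe typically looks like as an entry in prometheus-additional.yaml. It is not copied from the repository: the blackbox exporter address blackbox-exporter.monitoring.svc:9115 and the module name tcp_connect are assumptions and must match whatever black-exporter.yaml and blackexporter-configmap.yaml actually define.

```yaml
# Sketch of one entry in prometheus-additional.yaml (a list of scrape configs)
- job_name: kubernetes-service-tcp-probe
  metrics_path: /probe
  params:
    module: [tcp_connect]          # assumed module name in the blackbox exporter config
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  # Keep only Services whose scrape and tcp-probe annotations are both "true"
  # (source label values are joined with ";" before matching)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_service_annotation_prometheus_io_tcp_probe]
    action: keep
    regex: true;true
  # The discovered address (<service>.<namespace>.svc:<port>) becomes the probe target
  - source_labels: [__address__]
    target_label: __param_target
  # The scrape itself goes to the blackbox exporter (hypothetical Service address)
  - target_label: __address__
    replacement: blackbox-exporter.monitoring.svc:9115
  - source_labels: [__param_target]
    target_label: instance
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_name
```

The same pattern, with an HTTP module and a path taken from an annotation, would cover the HTTP probe jobs.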
For example, to monitor the /lottery path of a domain defined in an Ingress, add these annotations:
"annotations": {
  "kubernetes.io/ingress.class": "traefik",
  "prometheus.io/http-probe-path": "/lottery",
  "prometheus.io/scrape": "true"
}
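Services are opted in the same way. A minimal sketch for the kubernetes-service-tcp-probe job, assuming a hypothetical redis Service; per the table above, both annotations must be "true" (annotation values are strings, so the quotes are required).

```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis                      # hypothetical Service
  namespace: default
  annotations:
    prometheus.io/scrape: "true"     # -> __meta_kubernetes_service_annotation_prometheus_io_scrape
    prometheus.io/tcp-probe: "true"  # -> __meta_kubernetes_service_annotation_prometheus_io_tcp_probe
spec:
  selector:
    app: redis
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
```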
4 Key files explained
Alerting configuration: alertmanager.yaml
[root@k8s1 k8s-monitor]# cat alertmanager.yaml
route:
  group_by: ['alertname']
  receiver: 'wechat'
  repeat_interval: 60m
receivers:
- name: 'wechat'
  wechat_configs:
  - corp_id: 'your_corp_id'
    to_party: '2'
    agent_id: '1'
    api_secret: 'your_api_secret'
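In wechat_configs, fill in the values issued for the WeChat Work (enterprise WeChat) account you registered: the corp ID, the application's agent ID and secret, and the department ID (to_party) that should receive the alerts.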
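The file table notes that alertmanager.yaml must be turned into a Secret. With prometheus-operator, an Alertmanager resource reads its configuration from a Secret named alertmanager-<name> under the key alertmanager.yaml. A sketch, assuming the Alertmanager resource is named main (hypothetical; check the name used in alertmanager-deploy.yaml):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-main      # alertmanager-<name>; "main" is an assumed name
  namespace: monitoring
type: Opaque
stringData:                    # plain text; the API server stores it base64-encoded
  alertmanager.yaml: |
    route:
      group_by: ['alertname']
      receiver: 'wechat'
      repeat_interval: 60m
    receivers:
    - name: 'wechat'
      wechat_configs:
      - corp_id: 'your_corp_id'
        to_party: '2'
        agent_id: '1'
        api_secret: 'your_api_secret'
```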
Grafana deployment and configuration: grafana.yaml
Several default dashboards are bundled and mounted into the Grafana dashboard directory as ConfigMaps:
# excerpt from the volumes list of the Grafana pod spec
- configMap:
    name: grafana-dashboards
  name: grafana-dashboards
- configMap:
    name: grafana-dashboard-k8s-cluster-rsrc-use
  name: grafana-dashboard-k8s-cluster-rsrc-use
- configMap:
    name: grafana-dashboard-k8s-node-rsrc-use
  name: grafana-dashboard-k8s-node-rsrc-use
- configMap:
    name: grafana-dashboard-k8s-resources-cluster
  name: grafana-dashboard-k8s-resources-cluster
- configMap:
    name: grafana-dashboard-k8s-resources-namespace
  name: grafana-dashboard-k8s-resources-namespace
- configMap:
    name: grafana-dashboard-k8s-resources-pod
  name: grafana-dashboard-k8s-resources-pod
- configMap:
    name: grafana-dashboard-nodes
  name: grafana-dashboard-nodes
- configMap:
    name: grafana-dashboard-pods
  name: grafana-dashboard-pods
- configMap:
    name: grafana-dashboard-statefulset
  name: grafana-dashboard-statefulset
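Each of these volumes also needs a matching volumeMounts entry, and Grafana only loads the dashboard JSON if a file-based dashboard provider points at the mount directory. A sketch under assumptions: the /grafana-dashboard-definitions/0 path and the provider name '0' are hypothetical (they mirror a layout common in kube-prometheus-style manifests), only two mounts are shown, and the provider file is likely what the grafana-dashboards ConfigMap holds, mounted under /etc/grafana/provisioning/dashboards.

```yaml
# Container volumeMounts excerpt (hypothetical paths)
volumeMounts:
- name: grafana-dashboard-nodes
  mountPath: /grafana-dashboard-definitions/0/grafana-dashboard-nodes
- name: grafana-dashboard-pods
  mountPath: /grafana-dashboard-definitions/0/grafana-dashboard-pods
---
# Dashboard provider file, e.g. /etc/grafana/provisioning/dashboards/dashboards.yaml
apiVersion: 1
providers:
- name: '0'
  orgId: 1
  folder: ''
  type: file
  options:
    path: /grafana-dashboard-definitions/0   # Grafana scans this directory tree for *.json
```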
Front-end domain access: ingress.yaml
Configures the domain names used to access the Prometheus and Grafana web UIs.
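For reference, an Ingress of this kind might look like the sketch below. The hostnames, the Ingress name, and the backend Service names and ports (prometheus-k8s:9090, grafana:3000) are assumptions; they must match the Services actually created by the repository. It uses the networking.k8s.io/v1 API (Kubernetes 1.19+) and the traefik class from the annotation example earlier.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring-ui                   # hypothetical name
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - host: prometheus.example.com        # hypothetical domain
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s        # assumed Prometheus Service
            port:
              number: 9090
  - host: grafana.example.com           # hypothetical domain
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana               # assumed Grafana Service
            port:
              number: 3000
```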
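Additional Prometheus configuration: prometheus-additional.yaml
# prometheus-additional.yaml implements auto-discovery for blackbox monitoring: pods, ingresses, Services, etc. that meet the conditions are monitored automatically
# Convert the file into a Secret so it can be added to the Prometheus configuration
# (kubectl 1.18+ takes --dry-run=client; older versions accept only the bare --dry-run flag)
kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring --dry-run=client -o yaml > additional-scrape-configs.yaml
kubectl apply -f additional-scrape-configs.yaml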
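prometheus-operator only uses the Secret if the Prometheus resource references it through spec.additionalScrapeConfigs. A sketch of the relevant fields; the resource name k8s is an assumption (only the rule labels below suggest it), and the key is prometheus-additional.yaml because --from-file uses the file name as the key.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                                 # assumed name
  namespace: monitoring
spec:
  # Appended to the scrape configuration the operator generates
  additionalScrapeConfigs:
    name: additional-scrape-configs         # Secret created above
    key: prometheus-additional.yaml         # key derived from --from-file
```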
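Prometheus pod deployment and alerting rules: prometheus.yaml
# An excerpt of the rules follows; modify or extend them as needed. The full file already covers most cluster performance and availability alerts.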
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: prometheus-k8s-rules
  namespace: monitoring
spec:
  groups:
  - name: k8s.rules
    rules:
    - expr: |
        sum(rate(container_cpu_usage_seconds_total{job="kubelet", image!=""}[5m])) by (namespace)
      record: namespace:container_cpu_usage_seconds_total:sum_rate
..................
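The excerpt above is a recording rule. An alerting rule is added the same way; the sketch below is a hypothetical example, not one of the repository's rules. It assumes the Prometheus resource's ruleSelector matches the labels prometheus: k8s and role: alert-rules (as the existing rule's labels suggest) and that kube-state-metrics is scraped under job="kube-state-metrics"; the alert name and thresholds are illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alert-rules        # hypothetical name
  namespace: monitoring
  labels:
    prometheus: k8s                # same labels as prometheus-k8s-rules
    role: alert-rules
spec:
  groups:
  - name: example.alerts
    rules:
    - alert: ContainerRestartingOften          # hypothetical alert
      # kube-state-metrics restart counter; the job label value is an assumption
      expr: increase(kube_pod_container_status_restarts_total{job="kube-state-metrics"}[15m]) > 3
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "A container restarted more than 3 times in the last 15 minutes"
```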