Reference: https://blog.51cto.com/flyfish225/2554294
Reference: https://blog.csdn.net/qq_31555951/article/details/110666480

Prometheus deployment

wget https://github.com/prometheus/prometheus/releases/download/v2.23.0/prometheus-2.23.0.linux-amd64.tar.gz
tar xf prometheus-2.23.0.linux-amd64.tar.gz  -C /usr/local/
cd /usr/local
mv prometheus-2.23.0.linux-amd64/ prometheus
cd prometheus/
vim prometheus.yml
   global:
     scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
     evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
     # scrape_timeout is set to the global default (10s).
   alerting:
     alertmanagers:
     - static_configs:
       - targets:
         # - alertmanager:9093
   rule_files:
     # - "first_rules.yml"
     # - "second_rules.yml"
   scrape_configs:
     # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
     - job_name: 'prometheus'
       # metrics_path defaults to '/metrics'
       # scheme defaults to 'http'.
       static_configs:
       - targets: ['192.168.31.10:9090']
     - job_name: 'node'
       static_configs:
       - targets: ['192.168.31.10:9100']
         labels:
           app: master01
           nodename: k8s-master01
           role: master
       - targets: ['192.168.31.5:9100']
         labels:
           app: master01
           nodename: test1
           role: master
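Prometheus adds two labels to every series it scrapes from a target: job (taken from job_name) and instance (the target's host:port by default). It also adds the static labels listed under that target. This is why the queries further down can filter on nodename="test1". The Python sketch below shows that label merge. It assumes a simplified model that ignores relabeling and internal labels, and target_labels is a hypothetical helper, not a Prometheus API:

```python
def target_labels(job_name, target, static_labels):
    """Return the labels attached to every series scraped from one target.

    Simplified model: ignores relabel_configs and internal "__"-prefixed labels.
    """
    labels = {"job": job_name, "instance": target}  # job from job_name, instance from the address
    labels.update(static_labels)                      # static labels from the targets entry
    return labels


# The two targets of the 'node' job in prometheus.yml above.
NODE_TARGETS = {
    "192.168.31.10:9100": {"app": "master01", "nodename": "k8s-master01", "role": "master"},
    "192.168.31.5:9100": {"app": "master01", "nodename": "test1", "role": "master"},
}

for address, extra in NODE_TARGETS.items():
    print(target_labels("node", address, extra))
```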
Start (the web UI and API listen on port 9090 by default):
   nohup ./prometheus --config.file=prometheus.yml &
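Any of the PromQL expressions below can also be evaluated outside the web UI, through the HTTP API endpoint /api/v1/query. The sketch below uses only the Python standard library and assumes the server is reachable at http://192.168.31.10:9090, as configured above. build_query_url and instant_query are hypothetical helper names:

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://192.168.31.10:9090"  # assumed address, taken from prometheus.yml


def build_query_url(base, promql):
    """Build the URL for an instant query against /api/v1/query."""
    return base.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def instant_query(promql, base=PROMETHEUS, timeout=10):
    """Run an instant query and return data.result (needs a reachable server)."""
    with urllib.request.urlopen(build_query_url(base, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(body.get("error", "query failed"))
    return body["data"]["result"]


if __name__ == "__main__":
    # Only prints the URL; call instant_query(...) against a live server.
    print(build_query_url(PROMETHEUS, 'up{nodename="test1"}'))
```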

node_exporter deployment (on every host listed under the 'node' job, e.g. 192.168.31.10 and 192.168.31.5)

wget https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz
tar xf node_exporter-1.0.1.linux-amd64.tar.gz -C /usr/local/
cd /usr/local
mv node_exporter-1.0.1.linux-amd64 node_exporter
cd node_exporter/
Start (listens on port 9100 by default):
  nohup ./node_exporter &
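node_exporter serves its metrics as plain text at http://<host>:9100/metrics, in the Prometheus text exposition format. The rough Python sketch below pulls simple "name value" samples out of such a page. It assumes unlabeled samples only; real parsers such as the prometheus_client library also handle labels, escaping and timestamps. The sample page is hypothetical:

```python
def parse_unlabeled_samples(text):
    """Return {metric_name: value} for unlabeled samples in exposition text.

    Skips '# HELP' / '# TYPE' comment lines and any sample that has labels.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        name = parts[0]
        if "{" in name or len(parts) < 2:
            continue  # labeled sample or malformed line: out of scope for this sketch
        samples[name] = float(parts[1])
    return samples


# Hypothetical excerpt of what curl http://192.168.31.5:9100/metrics might return.
EXAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.52
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
node_filefd_allocated 3264
"""

print(parse_unlabeled_samples(EXAMPLE))
```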

Is the node up? (1 = the last scrape succeeded, 0 = the target is down or unreachable)

up{nodename="test1"}
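The same check can be scripted. An instant query for up returns one sample per target, and each sample's value arrives as the string "1" or "0". The Python sketch below lists the down targets from an already-decoded /api/v1/query result. down_nodes is a hypothetical helper, and the sample response is made up for illustration:

```python
def down_nodes(result):
    """Return the nodename (or instance) of every target whose up value is 0.

    `result` is data.result from a /api/v1/query response: a list of
    {"metric": {...labels...}, "value": [timestamp, "value-as-string"]}.
    """
    down = []
    for sample in result:
        value = float(sample["value"][1])  # sample values are returned as strings
        if value == 0:
            labels = sample["metric"]
            down.append(labels.get("nodename", labels.get("instance")))
    return down


# Hypothetical result of the query: up{job="node"}
EXAMPLE_RESULT = [
    {"metric": {"__name__": "up", "job": "node", "instance": "192.168.31.10:9100", "nodename": "k8s-master01"},
     "value": [1700000000.0, "1"]},
    {"metric": {"__name__": "up", "job": "node", "instance": "192.168.31.5:9100", "nodename": "test1"},
     "value": [1700000000.0, "0"]},
]

print(down_nodes(EXAMPLE_RESULT))
```

In PromQL itself, up{job="node"} == 0 returns only the targets that are down. That expression is the usual basis for a liveness alert.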

Disk usage formulas

Usage (%) of the / filesystem
round((1 - (node_filesystem_avail_bytes{fstype=~"ext3|ext4|xfs|nfs", nodename="test1",mountpoint="/"} / node_filesystem_size_bytes{fstype=~"ext3|ext4|xfs|nfs", nodename="test1",mountpoint="/"})) * 100) 
Usage (%) of every ext4, xfs, and NFS filesystem on the node
round((1 - (node_filesystem_avail_bytes{fstype=~"ext4|xfs|nfs", nodename="test1"} / node_filesystem_size_bytes{fstype=~"ext4|xfs|nfs", nodename="test1"})) * 100) 
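Both disk formulas compute (1 - available / size) * 100 and round the result. The Python sketch below applies the same arithmetic to two hypothetical byte counts. PromQL's round() resolves ties by rounding up, so the sketch uses floor(x + 0.5) instead of Python's round(), which rounds ties to even:

```python
import math


def promql_round(x):
    """Mimic PromQL round(): nearest integer, ties rounded up."""
    return math.floor(x + 0.5)


def disk_usage_percent(avail_bytes, size_bytes):
    """(1 - avail / size) * 100, rounded like the PromQL queries above."""
    return promql_round((1 - avail_bytes / size_bytes) * 100)


GiB = 1024 ** 3
# Hypothetical root filesystem: 50 GiB in total, 17.5 GiB still available.
print(disk_usage_percent(17.5 * GiB, 50 * GiB))
```

node_filesystem_avail_bytes counts only the space available to non-root users. As a result, blocks reserved for root are counted as used by this formula.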

CPU load (load average)

node_load1{nodename="test1"}   # 1-minute load average
node_load5{nodename="test1"}   # 5-minute load average
node_load15{nodename="test1"}  # 15-minute load average
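On Linux, node_exporter reads node_load1, node_load5 and node_load15 from the first three fields of /proc/loadavg. The small Python sketch below parses a line in that format. The sample line is hypothetical; on a live host, os.getloadavg() returns the same three numbers:

```python
def parse_loadavg(line):
    """Return (load1, load5, load15) from a /proc/loadavg line."""
    fields = line.split()
    # Fields 4 and 5 (running/total tasks, last PID) are not used by node_load*.
    return tuple(float(f) for f in fields[:3])


EXAMPLE = "0.52 0.58 0.59 1/467 12345"  # hypothetical /proc/loadavg contents
print(parse_loadavg(EXAMPLE))
```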

Memory usage (%)

ceil((1 - (node_memory_MemAvailable_bytes{nodename="test1"} / (node_memory_MemTotal_bytes{nodename="test1"})))* 100 ) 
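node_memory_MemTotal_bytes and node_memory_MemAvailable_bytes come from the MemTotal and MemAvailable lines of /proc/meminfo, with kB values converted to bytes. The Python sketch below applies the memory formula to a hypothetical meminfo excerpt. parse_meminfo and memory_usage_percent are illustrative names:

```python
import math


def parse_meminfo(text):
    """Return {key: bytes} from /proc/meminfo-style text (kB values * 1024)."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        number = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            number *= 1024
        values[key.strip()] = number
    return values


def memory_usage_percent(meminfo):
    """ceil((1 - MemAvailable / MemTotal) * 100), as in the query above."""
    return math.ceil((1 - meminfo["MemAvailable"] / meminfo["MemTotal"]) * 100)


# Hypothetical /proc/meminfo excerpt: 8 GiB total, 3 GiB available.
EXAMPLE = """\
MemTotal:        8388608 kB
MemFree:         1048576 kB
MemAvailable:    3145728 kB
"""

print(memory_usage_percent(parse_meminfo(EXAMPLE)))
```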

CPU usage (%) over the last 5 minutes

ceil(100 - sum(increase(node_cpu_seconds_total{nodename="test1",mode="idle"}[5m]))  by(instance) / sum(increase(node_cpu_seconds_total{nodename="test1"}[5m]))  by(instance)*100) 
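The CPU query takes the share of CPU time spent in idle mode over 5 minutes, summed across all cores, and subtracts it from 100. The Python sketch below performs the same arithmetic on two hypothetical snapshots of node_cpu_seconds_total. Unlike increase(), it uses plain differences, with no extrapolation and no counter-reset handling:

```python
import math


def cpu_usage_percent(before, after):
    """ceil(100 - idle_delta / total_delta * 100) across all CPUs.

    `before` and `after` map (cpu, mode) -> node_cpu_seconds_total value.
    Simplification: plain differences; increase() also extrapolates and
    handles counter resets.
    """
    deltas = {key: after[key] - before[key] for key in after}
    idle = sum(d for (cpu, mode), d in deltas.items() if mode == "idle")
    total = sum(deltas.values())
    return math.ceil(100 - idle / total * 100)


# Hypothetical counters for a 2-CPU host, 5 minutes (300 s per CPU) apart.
before = {("0", "idle"): 1000.0, ("0", "user"): 200.0, ("0", "system"): 100.0,
          ("1", "idle"): 1000.0, ("1", "user"): 200.0, ("1", "system"): 100.0}
after = {("0", "idle"): 1225.0, ("0", "user"): 250.0, ("0", "system"): 125.0,
         ("1", "idle"): 1225.0, ("1", "user"): 250.0, ("1", "system"): 125.0}

print(cpu_usage_percent(before, after))
```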

Number of open (allocated) file handles, system-wide

node_filefd_allocated{nodename="test1"}
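node_filefd_allocated comes from the first field of /proc/sys/fs/file-nr, and node_exporter exposes the third field as node_filefd_maximum. The Python sketch below parses that file's format. As an illustrative extra, it also computes allocated handles as a percentage of the maximum. The sample contents are hypothetical:

```python
def parse_file_nr(line):
    """Return (allocated, unused, maximum) from /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in line.split())
    return allocated, unused, maximum


EXAMPLE = "3264 0 1000000"  # hypothetical contents (three whitespace-separated numbers)
allocated, unused, maximum = parse_file_nr(EXAMPLE)
print(allocated)                  # the value node_filefd_allocated would report
print(100 * allocated / maximum)  # share of the system-wide limit, in percent
```

The PromQL equivalent of the last line would be node_filefd_allocated{nodename="test1"} / node_filefd_maximum{nodename="test1"} * 100.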

TCP connections in TIME_WAIT (waiting to close)

node_sockstat_TCP_tw
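node_sockstat_TCP_tw is read from the tw field of the "TCP:" line in /proc/net/sockstat. node_exporter exposes the other fields on that line, such as inuse and alloc, the same way. The Python sketch below parses that file's layout. parse_sockstat is a hypothetical helper, and the sample contents are made up for illustration:

```python
def parse_sockstat(text):
    """Return {"TCP": {"inuse": 5, "tw": 2, ...}, ...} from /proc/net/sockstat."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        proto, rest = line.split(":", 1)
        tokens = rest.split()
        # Remaining tokens alternate: name value name value ...
        stats[proto.strip()] = {tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens) - 1, 2)}
    return stats


EXAMPLE = """\
sockets: used 245
TCP: inuse 5 orphan 0 tw 2 alloc 7 mem 1
UDP: inuse 3 mem 2
"""

print(parse_sockstat(EXAMPLE)["TCP"]["tw"])
```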