Custom HPA

What is a custom HPA?

In day-to-day use, HPA scaling is usually driven by CPU or memory metrics, but sometimes CPU and memory alone cannot capture the business scenario, for example autoscaling on the QPS handled by a single replica.
This is where custom HPA comes in. The HPA API comes in two versions, v1 and v2: v1 only scales on CPU and memory, while v2 adds scaling on custom metrics. The metrics APIs that v2 relies on are not served natively by Kubernetes;
a dedicated component, such as Prometheus Adapter, must be installed to provide them.

What Prometheus Adapter does:

The metrics Prometheus scrapes cannot be consumed by Kubernetes directly, because the two data formats are incompatible. An extra component (Prometheus Adapter) is needed
to convert Prometheus metrics into a format the Kubernetes metrics APIs can recognize. Because Prometheus Adapter serves a custom APIService, it must also be registered with the main API server through the Kubernetes aggregator,
so that it can be reached under /apis/.
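Once the adapter and its APIService objects (shown later in this document) are in place, a quick way to confirm the registration took effect:

kubectl get apiservices | grep metrics.k8s.io
kubectl api-versions | grep metrics.k8s.io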

This document focuses on custom HPA.

Custom HPA is generally used in two scenarios; a minimal contrast of the two metric types follows this list.

  1. The workload itself exposes the metric. Prometheus scrapes it, the adapter converts it into a format Kubernetes can recognize, and an HPA of the Pods metric type
    binds the target Deployment to that metric for custom autoscaling.
  2. The metric is not exposed by the pods being scaled, for example scaling pods on a node's TCP connection count. The Pods type cannot be used here; the External type is needed:
    the exposed metric is registered through the external metrics APIService, and then bound to the HPA and its target resource for custom autoscaling.
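As a sketch, the two scenarios differ only in the HPA's metrics stanza; the metric names below are placeholders, and complete manifests appear later in this document:

# scenario 1: the scaled pods expose the metric themselves
metrics:
- type: Pods
  pods:
    metric:
      name: some_pod_metric              # placeholder
    target:
      type: AverageValue
      averageValue: "10"

# scenario 2: the metric comes from outside the scaled pods
metrics:
- type: External
  external:
    metric:
      name: some_external_metric         # placeholder
    target:
      type: AverageValue
      averageValue: "60"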

The adapter's conversion logic:

The adapter converts Prometheus metrics into a form Kubernetes can recognize. In the adapter's ConfigMap you typically declare the namespace, and map pod labels onto the corresponding Kubernetes resources.
The adapter exposes the converted metrics through an API for custom-metric HPA scaling: the Deployment bound to the HPA is matched against the labels defined in the adapter's rules, and the result is served through the registered APIService.
Use the raw API to verify that the conversion works:

# list external-type metrics exposed by the adapter
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/" | jq .
# list custom (resource-scoped) metrics exposed by the adapter
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/" | jq .
# query a custom metric using the labels mapped in the adapter config: namespace monitor, all pods, metric start_time_seconds

Note: pods exposing the metric must exist in the monitor namespace; if there are none, the query returns no results.

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/*/start_time_seconds" | jq .
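The same path also accepts a concrete pod name instead of the * wildcard; <pod-name> below is a placeholder:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/monitor/pods/<pod-name>/start_time_seconds" | jq .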

When the party exposing the metric is not the workload being scaled, its labels cannot be passed to the adapter directly; this is exactly the case the External metric type (scenario 2 above) is designed for.

Categories of HPA

# The Kubernetes apiserver exposes three APIs for monitoring-related operations
resource metrics API: designed to feed monitoring metrics to core Kubernetes components, e.g. kubectl top; requires a component such as metrics-server
custom metrics API: designed to feed custom metrics to the HPA controller
external metrics API: designed for scaling on metrics external to the cluster

# Prometheus-adapter supports all three of these APIs
1. resource metrics API
2. custom metrics API
3. external metrics API

- kubectl top node/pod reads the resource metrics API, so prometheus-adapter can replace metrics-server; see the check below.
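Assuming the v1beta1.metrics.k8s.io APIService points at prometheus-adapter and its resourceRules are configured (see the ConfigMap later in this document), these commands work without metrics-server installed:

kubectl top node
kubectl top pod -n monitoring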

API Aggregation

Kubernetes 1.7 introduced the aggregation layer, which lets third-party applications register themselves with kube-apiserver and still have their new APIs
accessed and operated on through the API server's normal HTTP URLs. To make this work, Kubernetes added an API Aggregation Layer to the kube-apiserver service,
which forwards requests for extension APIs to the user's own service.
When you access /apis/metrics.k8s.io/v1beta1, you are actually hitting the kube-aggregator proxy: kube-apiserver is one backend of that proxy, and metrics-server
is another. This makes it very convenient to extend the Kubernetes API.
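For reference, the aggregation layer itself is switched on through kube-apiserver request-header flags; the values below are the common kubeadm defaults and will differ on other installs:

--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key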

1> Resource metrics flow: hpa -> apiserver -> kube-aggregator -> metrics-server -> kubelet (cAdvisor)
2> Custom metrics flow: hpa -> apiserver -> kube-aggregator -> prometheus-adapter -> prometheus -> pods


How the different HPA types are used

Depending on the metric type, the HPA pulls metrics from the resource paths of the aggregated APIs (metrics.k8s.io, custom.metrics.k8s.io, external.metrics.k8s.io).

I. Scaling on CPU/MEM

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: app1
  labels:
    app: test-app1
spec:
  minReplicas: 1
  maxReplicas: 3
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-app1
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
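Apply the manifest and watch the HPA pick up the metrics; TARGETS switches from <unknown> to current/target utilization once metrics flow (the filename is illustrative):

kubectl apply -f hpa-cpu-mem.yaml
kubectl get hpa app1 -w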

II. Scaling on custom metrics

0. Create the custom metrics APIService and bind it to the adapter

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: prometheus-adapter
    namespace: monitoring
    port: 443
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 998
  versionPriority: 10
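Once applied, the APIService should report Available=True as soon as the adapter's Service answers:

kubectl get apiservice v1beta1.custom.metrics.k8s.io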

1. Adapter metric conversion

This assumes Prometheus is already scraping the metrics.

apiVersion: v1
kind: ConfigMap
metadata:
  name: adapter-config
  namespace: monitoring
data:
  config.yaml: |
    # rules convert metrics scraped by Prometheus into a form Kubernetes can recognize
    rules:
    # select Prometheus series whose name starts with container_, where the container label is not POD and the namespace and pod labels are non-empty
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters: []
      resources:
        overrides:
        # the namespace and pod labels in Prometheus map onto the Kubernetes namespace and pod resources; note that each label here
        # must correspond to a real Kubernetes resource, e.g. a pod name maps to the pod resource, so the metric must carry a label
        # naming a real resource that can be mapped onto Kubernetes
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        # rename the matched metric via as; the HPA can bind to the alias
        matches: ^container_(.*)_seconds_total$
        as: ""
        # computes the value returned by the custom metrics API; this value ultimately drives the HPA's scaling decision
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[1m])) by (<<.GroupBy>>)
    - seriesQuery: '{__name__=~"^container_.*",container!="POD",namespace!="",pod!=""}'
      seriesFilters:
      - isNot: ^container_.*_seconds_total$
      resources:
        overrides:
          namespace:
            resource: namespace
          pod:
            resource: pod
      name:
        matches: ^container_(.*)_total$
        as: ""
      metricsQuery: sum(rate(<<.Series>>{<<.LabelMatchers>>,container!="POD"}[1m])) by (<<.GroupBy>>)
    # resourceRules expose CPU and memory metrics through the adapter
    "resourceRules":
      "cpu":
        "containerLabel": "container"
        "containerQuery": |
          sum by (<<.GroupBy>>) (
            irate (
                container_cpu_usage_seconds_total{<<.LabelMatchers>>,pod!=""}[120s]
            )
          )
      "memory":
        "containerLabel": "container"
        "containerQuery": |
          sum by (<<.GroupBy>>) (
            container_memory_working_set_bytes{<<.LabelMatchers>>,pod!=""}
          )
        "nodeQuery": |
          sum by (<<.GroupBy>>) (
            node_memory_MemTotal_bytes{job="node-exporter",<<.LabelMatchers>>}
            -
            node_memory_MemAvailable_bytes{job="node-exporter",<<.LabelMatchers>>}
          )
          or sum by (<<.GroupBy>>) (
            windows_cs_physical_memory_bytes{job="windows-exporter",<<.LabelMatchers>>}
            -
            windows_memory_available_bytes{job="windows-exporter",<<.LabelMatchers>>}
          )
        "resources":
          "overrides":
            "node":
              "resource": "node"
            "namespace":
              "resource": "namespace"
            "pod":
              "resource": "pod"
      "window": "5m"

2. HPA scaling

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-hpa-test
  namespace: monitoring
spec:
  minReplicas: 1
  maxReplicas: 10
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-ingress-monitoring-ingress-nginx-controller
  metrics:
  - type: Pods
    pods:
      metric:
        name: container_cpu_system_seconds_total
      target:
        averageValue: '10'
        type: AverageValue
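kubectl describe shows the live metric value alongside scaling events, which is the quickest way to debug a Pods-type HPA:

kubectl -n monitoring describe hpa ingress-hpa-test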

III. Scaling on external metrics

Some metrics, such as node statistics collected by node-exporter, have no real relationship to the business pods being scaled, so the External type is needed to register the metric through api-resources.
For example, scaling MySQL on an nginx metric, or scaling nginx-ingress-controller on a node's TCP connection count.

0. Create the external metrics APIService, which external-type HPAs use to fetch metrics

apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  service:
    name: prometheus-adapter
    namespace: monitoring
  group: external.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100

1. Adapter metric conversion

Expose the metric node_sockstat_TCP_alloc through the adapter.
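The externalRule below matches on an ingress="node-hpa" label, which node_sockstat_TCP_alloc does not carry out of the box; one way to attach it is a static relabel at scrape time. A sketch for a plain Prometheus scrape config (kube-prometheus users would do the equivalent in a ServiceMonitor or additionalScrapeConfigs); the target address is illustrative:

scrape_configs:
- job_name: node-exporter-hpa
  static_configs:
  - targets: ['10.4.64.15:9100']
  metric_relabel_configs:
  - target_label: ingress        # attach the static label the adapter rule matches on
    replacement: node-hpa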

apiVersion: v1
data:
  config.yaml: |-
    externalRules:
    # select the Prometheus series node_sockstat_TCP_alloc carrying the label ingress="node-hpa"
    - seriesQuery: '{__name__="node_sockstat_TCP_alloc",ingress="node-hpa"}'
      # the query that computes the metric's value
      metricsQuery: node_sockstat_TCP_alloc{ingress="node-hpa",instance=~"10.+"}
      resources:
        overrides:
          # map onto the Kubernetes namespace resource
          namespace: { resource: "namespace" }
      name:
        matches: "node_sockstat_TCP_alloc"
        as: "node_sockstat_tcp_alloc"
kind: ConfigMap
metadata:
  labels:
    app.kubernetes.io/component: metrics-adapter
    app.kubernetes.io/name: prometheus-adapter
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.8.4
  name: adapter-config
  namespace: monitoring

2. HPA autoscaling

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-hpa-test
  namespace: monitoring
spec:
  minReplicas: 1
  maxReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-ingress-monitoring-ingress-nginx-controller
  metrics:
  - type: External
    external:
      metric:
        name: node_sockstat_tcp_alloc    # the metric name exposed by the adapter
        selector:
          matchLabels:
            job: "node-exporter-hpa"   
            ingress: "node-hpa"
      target:
        type: AverageValue
        averageValue: 60

3. Verify the adapter's metrics

root@management:/opt/kubernetes/prometheus/k8s-dev/manifests/configuration-files# kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/monitoring/node_sockstat_tcp_alloc"|jq .
{
  "kind": "ExternalMetricValueList",
  "apiVersion": "external.metrics.k8s.io/v1beta1",
  "metadata": {},
  "items": [
    {
      "metricName": "node_sockstat_tcp_alloc",
      "metricLabels": {
        "__name__": "node_sockstat_TCP_alloc",
        "beta_kubernetes_io_arch": "amd64",
        "beta_kubernetes_io_instance_type": "S4.LARGE8",
        "beta_kubernetes_io_os": "linux",
        "cloud_tencent_com_auto_scaling_group_id": "asg-9j80v1i5",
        "cloud_tencent_com_node_instance_id": "ins-7dznkner",
        "failure_domain_beta_kubernetes_io_region": "sh",
        "failure_domain_beta_kubernetes_io_zone": "200004",
        "ingress": "node-hpa",
        "instance": "10.4.64.15",
        "job": "node-exporter",
        "kubernetes_io_arch": "amd64",
        "kubernetes_io_hostname": "10.4.64.15",
        "kubernetes_io_os": "linux",
        "node": "10.4.64.15",
        "node_kubernetes_io_instance_type": "S4.LARGE8",
        "tke_cloud_tencent_com_nodepool_id": "np-p03bj711",
        "topology_kubernetes_io_region": "sh",
        "topology_kubernetes_io_zone": "200004"
      },
      "timestamp": "2022-11-18T10:09:38Z",
      "value": "96"
    },
    {
      "metricName": "node_sockstat_tcp_alloc",
      "metricLabels": {
        "__name__": "node_sockstat_TCP_alloc",
        "beta_kubernetes_io_arch": "amd64",
        "beta_kubernetes_io_instance_type": "S4.LARGE8",
        "beta_kubernetes_io_os": "linux",
        "cloud_tencent_com_auto_scaling_group_id": "asg-9j80v1i5",
        "cloud_tencent_com_node_instance_id": "ins-eqd7lizl",
        "failure_domain_beta_kubernetes_io_region": "sh",
        "failure_domain_beta_kubernetes_io_zone": "200005",
        "ingress": "node-hpa",
        "instance": "10.4.80.19",
        "job": "node-exporter",
        "kubernetes_io_arch": "amd64",
        "kubernetes_io_hostname": "10.4.80.19",
        "kubernetes_io_os": "linux",
        "node": "10.4.80.19",
        "node_kubernetes_io_instance_type": "S4.LARGE8",
        "tke_cloud_tencent_com_nodepool_id": "np-p03bj711",
        "topology_kubernetes_io_region": "sh",
        "topology_kubernetes_io_zone": "200005"
      },
      "timestamp": "2022-11-18T10:09:38Z",
      "value": "71"
    }
  ]
}

Pitfalls

  1. While using an external-type HPA with a custom metric, the value the HPA displayed looked abnormal. I first suspected the adapter,
    but querying the API directly returned normal values, so in the end it was just how the HPA renders the value: 84500m is 84.5 in
    Kubernetes quantity (milli) notation, which is used for fractional values, so it looks different from plain integers. (The displayed
    average is the sum of all returned series divided by the current replica count, e.g. (96 + 71) / 2 = 83.5 for the two nodes above.)

root@management:~# k8s-v6 get hpa -n monitoring

NAME               REFERENCE                                                      TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
ingress-hpa-test   Deployment/test-ingress-monitoring-ingress-nginx-controller   84500m/60 (avg)   1         2         2          22h
ingress-hpa-test   Deployment/test-ingress-monitoring-ingress-nginx-controller   83/60 (avg)       1         2         2          22h

  2. I had only ever tried the adapter's rules approach and had not read up on externalRules, which cost me quite a few missteps.
    On the upside, I learned a label-join trick: find a label two metrics share, and use it to copy useful labels from one metric
    onto the other, as in the query below.

# on() lists the label(s) the two metrics have in common
# group_left() copies the listed labels from the right-hand metric onto the left-hand result
kube_pod_info{} * on(pod) group_left(app,component) go_memstats_stack_sys_bytes{app!='',pod!='',component!=''}