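What Is an Operator?
Kubernetes (K8s) is a platform for deploying applications in Linux containers. It was originally developed at Google to deploy web applications on clusters of computers and is now open source. Kubernetes has allowed its API to be extended since very early releases, and today it can deploy much more than Linux containers: virtual machines with KubeVirt, FreeBSD jails, and even entire Kubernetes clusters with Cluster API.
Early on, the Kubernetes developers realized that extensibility was key to successful adoption. Early releases offered ThirdPartyResource for this purpose. Version 1.7 introduced CustomResourceDefinition (CRD) as its replacement, and ThirdPartyResource was removed in version 1.8.
Although Go is the main language of the Kubernetes ecosystem, nothing prevents you from writing components in other languages, as long as they implement the required API. For example, runc, written in Go, can be replaced with crun, written in C, because both implement the OCI container runtime specification. Likewise, the kubelet can be replaced with Krustlet, written in Rust, because it implements the kubelet API.
Kubernetes can schedule more than containers, and you can extend its API with your own custom resource definitions.
An Operator is a collection of domain-specific custom resources and controller programs that react to changes in the cluster or in those resources. For example, an operator can watch for certain annotations on a Pod or Deployment and, when it detects them, act on objects inside or outside the cluster. This is how cert-manager and ExternalDNS work. Specifically, when you add an annotation to an Ingress, a chain of actions is triggered inside and outside the cluster. With cert-manager, a certificate request is sent to Let's Encrypt; if validation succeeds, a new Secret containing the certificate is created and used to secure access to the Ingress over HTTPS. The key message is this: an operator observes built-in or custom Kubernetes objects and acts on objects, inside or outside the cluster, to bring them to the desired state (see Figure 1).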
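The observe-and-act pattern described above can be sketched in a few lines of plain Python. This is only an illustration: the `desired` and `actual` dictionaries and the `reconcile` function are hypothetical stand-ins for cluster state and are not part of any Kubernetes library.

```python
def reconcile(desired, actual):
    """Return the actions needed to move `actual` toward `desired`.

    Both arguments map an object name to its spec. The names and the
    action tuples are hypothetical and exist only for this sketch.
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


desired = {"tls-secret": {"host": "example.org"}}
actual = {"stale-secret": {"host": "old.example.org"}}
for action in reconcile(desired, actual):
    print(action)
```

A real controller would run such a comparison repeatedly and turn each action into an API call; the chaos controller built later in this article follows the same loop structure, only with destructive actions.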
Getting a Kubernetes Cluster Running with Minikube
To develop a Kubernetes Operator, you need access to a working cluster. If you don't have one, minikube makes it easy to create one:
$ minikube start -p entwickler.de --driver docker
😄 [entwickler.de] minikube v1.25.2 on Gentoo 2.8
✨ Using the docker driver based on user configuration
👍 Starting control plane node entwickler.de in cluster entwickler.de
🚜 Pulling base image ...
🔥 Creating docker container (CPUs=2, Memory=2848MB) ...
🐳 Preparing Kubernetes v1.23.3 on Docker 20.10.12 ...
▪ kubelet.housekeeping-interval=5m
▪ Generating certificates and keys ...
▪ Booting up control plane ...
▪ Configuring RBAC rules ...
🔎 Verifying Kubernetes components...
▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟 Enabled addons: storage-provisioner, default-storageclass
🏄 Done! kubectl is now configured to use "entwickler.de" cluster and "default" namespace by default
Now that your cluster is running, you can verify that it works by running:
$ kubectl get nodes
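Writing a Kubernetes Operator
To write a Kubernetes Operator in Python, we can use the official Python client, one of the alternative clients, or any Python library that can talk to the kube-apiserver over HTTP. This article uses pykube-ng, which describes itself as a lightweight client library for the Kubernetes API. I like it because it feels more Pythonic than the official Python client for Kubernetes.
We start by creating a CustomResourceDefinition: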
$ cat k8s/blackadder-v1alpha1.yml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: chaosagents.blackadder.io
spec:
  group: blackadder.io
  scope: Cluster # a CRD can also be Namespaced
  names:
    plural: chaosagents
    singular: chaosagent
    kind: ChaosAgent
    shortNames:
    - ca
  versions:
  - name: v1alpha1 # you can serve multiple versions, e.g. v1beta2 or v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              tantrumMode:
                type: boolean
              podTolerance:
                type: integer
    additionalPrinterColumns:
    - name: Tantrum
      type: boolean
      description: Kills Pods randomly
      jsonPath: .spec.tantrumMode
    - name: Tolerance
      type: integer
      description: Total number of Pods to tolerate before randomly killing Pods
      jsonPath: .spec.podTolerance
Apply this CRD manifest:
$ kubectl apply -f k8s/blackadder-v1alpha1.yml
customresourcedefinition.apiextensions.k8s.io/chaosagents.blackadder.io created
Create a sample chaos agent with the following manifest:
$ cat k8s/edmund.yml
apiVersion: blackadder.io/v1alpha1
kind: ChaosAgent
metadata:
  name: princeedmund
spec:
  tantrumMode: true
  podTolerance: 10
Apply this manifest and inspect the result:
$ kubectl apply -f k8s/edmund.yml
chaosagent.blackadder.io/princeedmund created
$ kubectl get chaosagents.blackadder.io
NAME           TANTRUM   TOLERANCE
princeedmund   true      10
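After adding more toggles, switches, and definitions to the CustomResourceDefinition, the resource looks like this:
$ kubectl get chaosagents.blackadder.io
NAME           TANTRUM   TOLERANCE   CANCER   IPSUM   EAGERNESS   PAUSE   EXCLUDED
princeedmund   true      10          false    true    20          30      ["kube-system"]
Note that a CustomResourceDefinition can validate input in various ways. For example, we can define eagerness as an integer between 1 and 100, so that an out-of-range value such as 200 is rejected: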
$ kubectl apply -f k8s/edmund-v1beta1.yml
The ChaosAgent "princeedmund" is invalid: spec.eagerness: Invalid value: 200: spec.eagerness in body should be less than or equal to 100
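The API server performs this check against the OpenAPI v3 schema in the CRD. As a rough illustration of what the `minimum` and `maximum` keywords do, the sketch below applies the same bounds in plain Python. The `EAGERNESS_SCHEMA` dictionary and the `validate_integer` helper are assumptions made for this example; the article's extended schema is not shown, so this is not a copy of it.

```python
# Hypothetical schema fragment for spec.eagerness, using the OpenAPI v3
# keywords `type`, `minimum`, and `maximum`.
EAGERNESS_SCHEMA = {"type": "integer", "minimum": 1, "maximum": 100}


def validate_integer(path, value, schema):
    """Return a list of error strings; an empty list means the value is valid."""
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(value, int) or isinstance(value, bool):
        return [f"{path}: expected an integer"]
    errors = []
    if "minimum" in schema and value < schema["minimum"]:
        errors.append(
            f"{path}: should be greater than or equal to {schema['minimum']}")
    if "maximum" in schema and value > schema["maximum"]:
        errors.append(
            f"{path}: should be less than or equal to {schema['maximum']}")
    return errors


print(validate_integer("spec.eagerness", 200, EAGERNESS_SCHEMA))
print(validate_integer("spec.eagerness", 20, EAGERNESS_SCHEMA))
```

Because the check runs in the API server, invalid objects never reach the controller, which can then rely on the value being in range.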
Adding the Controller Logic
We have now created a new resource that is stored in Kubernetes and served by the kube-apiserver, so we can write the controller logic. We start by drafting the algorithm in pseudocode:
client = connect_to_kubernetes()
# retrieve our agent configuration from the kube-apiserver
chaos_agent = client.get_chaos_agent()
exclude_namespaces = chaos_agent.excluded_namespaces
while True:
    pods = client.list_pods(exclude_namespaces)
    deployments = client.list_deployments(exclude_namespaces)
    configmaps = client.list_configmaps(exclude_namespaces)
    if chaos_agent.tantrum:
        randomly_kill_pods(pods, chaos_agent.tolerance, chaos_agent.eagerness)
    if chaos_agent.cancer:
        randomly_scale_deployments(deployments, chaos_agent.eagerness)
    if chaos_agent.ipsum:
        randomly_write_configmaps(configmaps, chaos_agent.eagerness)
    time.sleep(chaos_agent.pause)
The first thing we need is a Kubernetes client, so that we can talk to the kube-apiserver:
import pykube
# from_env() uses the service account token at
# /run/secrets/kubernetes.io/serviceaccount/token when running in a cluster
# and falls back to ~/.kube/config otherwise
config = pykube.KubeConfig.from_env()
api = pykube.HTTPClient(config)
With the client we just created, it is easy to list objects stored in the Kubernetes database. First, let's create a couple of Pods and a Deployment:
$ kubectl run --image docker.io/nginx test -n kube-public
$ kubectl run --image docker.io/nginx test -n default
$ kubectl create deployment my-dep --image=nginx --replicas=3
List them in the interactive Python console, which is very handy for exploring the API while prototyping:
$ python -m pykube
Pykube v22.7.0, loaded "/home/oznt/.kube/config" with context "entwickler.de".
Example commands:
[d.name for d in Deployment.objects(api)] # get names of deployments in default namespace
list(DaemonSet.objects(api, namespace='kube-system')) # list daemonsets in "kube-system"
Pod.objects(api).get_by_name('mypod').labels # labels of pod "mypod"
Use Ctrl-D to exit
>>> [f"{p.namespace}/{p.name}" for p in Pod.objects(api, namespace=pykube.all)
if p.namespace not in ["kube-system"]]
['default/my-dep-84885b44-29vg7', 'default/my-dep-84885b44-l5nkw',
'default/my-dep-84885b44-p2gcd', 'default/test', 'kube-public/test']
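pykube-ng has no predefined class for listing ChaosAgents. Its object factory, pykube.object_factory, can create such a class from the API group, version, and kind.
With that, we have everything we need to write a complete controller for ChaosAgent: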
import time
import munch
import pykube
from pykube import ConfigMap, Deployment, Pod

config = pykube.KubeConfig.from_env()
api = pykube.HTTPClient(config)
ChaosAgent = pykube.object_factory(api, "blackadder.io/v1beta1", "ChaosAgent")
# retrieve our agent configuration from the kube-apiserver (the first ChaosAgent found)
agent = list(ChaosAgent.objects(api, namespace=pykube.all))[0]
agent.config = munch.munchify(agent.obj["spec"])
exclude_namespaces = agent.config.excludedNamespaces

def list_objects(kind, exclude_namespaces):
    # list objects of `kind` in every namespace except the excluded ones
    selector = ",".join("metadata.namespace!=" + ns for ns in exclude_namespaces)
    return list(kind.objects(api).filter(namespace=pykube.all, field_selector=selector))

def randomly_kill_pods(pods, tolerance, eagerness):
    pass  # implemented below

def randomly_scale_deployments(deployments, eagerness):
    pass  # implemented below

def randomly_write_configmaps(configmaps, eagerness):
    pass  # implemented below

while True:
    pods = list_objects(Pod, exclude_namespaces)
    deployments = list_objects(Deployment, exclude_namespaces)
    configmaps = list_objects(ConfigMap, exclude_namespaces)
    if agent.config.tantrumMode:
        randomly_kill_pods(pods, agent.config.podTolerance, agent.config.eagerness)
    if agent.config.cancerMode:
        randomly_scale_deployments(deployments, agent.config.eagerness)
    if agent.config.ipsumMode:
        randomly_write_configmaps(configmaps, agent.config.eagerness)
    time.sleep(agent.config.pauseDuration)
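Deleting Pods
import random

def randomly_kill_pods(pods, tolerance, eagerness):
    # do nothing while fewer than `tolerance` Pods exist
    if len(pods) < tolerance:
        return
    for p in pods:
        # a higher eagerness makes deletion more likely
        if random.randint(0, 100) < eagerness:
            p.delete()
            print(f"Deleted {p.namespace}/{p.name}")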
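To see how `tolerance` and `eagerness` interact without touching a real cluster, here is a small simulation. `FakePod` is a hypothetical stand-in for a pykube Pod, and the injected `rng` parameter is an addition made only so that a run can be reproduced; neither appears in the controller itself.

```python
import random


class FakePod:
    """Hypothetical stand-in for a pykube Pod; records deletion instead of calling the API."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self.deleted = False

    def delete(self):
        self.deleted = True


def randomly_kill_pods(pods, tolerance, eagerness, rng=random):
    # Leave the cluster alone while fewer than `tolerance` Pods exist.
    if len(pods) < tolerance:
        return
    for p in pods:
        # randint(0, 100) is inclusive on both ends, so each Pod is deleted
        # with probability eagerness / 101 (for eagerness between 0 and 101).
        if rng.randint(0, 100) < eagerness:
            p.delete()


pods = [FakePod("default", f"web-{i}") for i in range(20)]
randomly_kill_pods(pods, tolerance=10, eagerness=20, rng=random.Random(42))
print(sum(p.deleted for p in pods), "of", len(pods), "pods deleted")
```

One consequence worth noting: because the upper bound of `randint` is inclusive, even an eagerness of 100 does not guarantee that every Pod is deleted.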
The function that scales Deployments
import random
import pykube
import requests

def randomly_scale_deployments(deployments, eagerness):
    for d in deployments:
        if random.randint(0, 100) < eagerness:
            # retry until the update is accepted
            while True:
                try:
                    # double the replica count, capped at 128
                    if d.replicas < 128:
                        d.replicas = min(d.replicas * 2, 128)
                    d.update()
                    print(f"scaled {d.namespace}/{d.name} to {d.replicas}")
                    break
                except (requests.exceptions.HTTPError, pykube.exceptions.HTTPError):
                    print(f"error scaling {d.namespace}/{d.name} to {d.replicas}")
                    # fetch the current state of the object before retrying
                    d.reload()
                    continue
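The reload-and-retry loop is a form of optimistic concurrency: if an update is rejected because the object changed in the meantime, the controller fetches the latest version and tries again. The sketch below shows the same pattern with a bounded number of attempts. `FakeDeployment`, `ConflictError`, and `max_attempts` are hypothetical names introduced for this illustration; the controller in this article retries without a limit.

```python
class ConflictError(Exception):
    """Hypothetical error raised when an update is rejected."""


class FakeDeployment:
    """Hypothetical stand-in for a Deployment whose first updates fail."""

    def __init__(self, replicas, failures=1):
        self.replicas = replicas
        self._server_replicas = replicas
        self._failures = failures

    def update(self):
        if self._failures > 0:
            self._failures -= 1
            raise ConflictError("object was modified")
        self._server_replicas = self.replicas

    def reload(self):
        # Discard local changes and take the server's view of the object.
        self.replicas = self._server_replicas


def scale_with_retry(deployment, limit=128, max_attempts=3):
    """Double the replica count up to `limit`, retrying on conflicts.

    Returns True if an update was accepted, False if every attempt failed.
    """
    for _ in range(max_attempts):
        try:
            if deployment.replicas < limit:
                deployment.replicas = min(deployment.replicas * 2, limit)
            deployment.update()
            return True
        except ConflictError:
            deployment.reload()
    return False


d = FakeDeployment(replicas=3)
print(scale_with_retry(d), d.replicas)
```

Bounding the attempts is a design choice: it keeps a persistent error, such as a missing permission, from trapping the controller in one Deployment forever.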
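The function that writes Lorem Ipsum snippets
import random
import lorem

def randomly_write_configmaps(configmaps, eagerness):
    for cm in configmaps:
        print(f"Checking {cm.namespace}/{cm.name}")
        # immutable ConfigMaps cannot be changed, so skip them
        if cm.obj.get("immutable"):
            continue
        if random.randint(0, 100) < eagerness:
            # some ConfigMaps have no data at all
            for k in cm.obj.get("data", {}):
                cm.obj["data"][k] = lorem.paragraph()
            # send the modified object to the kube-apiserver
            cm.update()
            print(f"Lorem Ipsum in {cm.namespace}/{cm.name}")
With that, the controller code is complete.
Complete Code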
import random
import time

import lorem
import munch
import pykube
import requests
from pykube import Pod, Deployment, ConfigMap


def list_objects(self, k8s_obj, exclude_namespaces):
    # build a field selector that excludes the given namespaces
    field_selector = ",".join("metadata.namespace!=" + ns
                              for ns in exclude_namespaces)
    return list(
        k8s_obj.objects(self).filter(namespace=pykube.all,
                                     field_selector=field_selector))


config = pykube.KubeConfig.from_env()
# attach list_objects to the client class so it can be called as api.list_objects(...)
pykube.HTTPClient.list_objects = list_objects
api = pykube.HTTPClient(config)
ChaosAgent = pykube.object_factory(api, "blackadder.io/v1beta1", "ChaosAgent")
# retrieve our agent configuration from the kube-apiserver
agent = list(ChaosAgent.objects(api, namespace=pykube.all))[0]
agent.config = munch.munchify(agent.obj["spec"])
exclude_namespaces = agent.config.excludedNamespaces


def randomly_kill_pods(pods, tolerance, eagerness):
    if len(pods) < tolerance:
        return
    for p in pods:
        if random.randint(0, 100) < eagerness:
            p.delete()
            print(f"Deleted {p.namespace}/{p.name}")


def randomly_scale_deployments(deployments, eagerness):
    for d in deployments:
        if random.randint(0, 100) < eagerness:
            while True:
                try:
                    if d.replicas < 128:
                        d.replicas = min(d.replicas * 2, 128)
                    d.update()
                    print(f"scaled {d.namespace}/{d.name} to {d.replicas}")
                    break
                except (requests.exceptions.HTTPError, pykube.exceptions.HTTPError):
                    print(f"error scaling {d.namespace}/{d.name} to {d.replicas}")
                    d.reload()
                    continue


def randomly_write_configmaps(configmaps, eagerness):
    for cm in configmaps:
        print(f"Checking {cm.namespace}/{cm.name}")
        if cm.obj.get("immutable"):
            continue
        if random.randint(0, 100) < eagerness:
            for k in cm.obj.get("data", {}):
                cm.obj["data"][k] = lorem.paragraph()
            # send the modified object to the kube-apiserver
            cm.update()
            print(f"Lorem Ipsum in {cm.namespace}/{cm.name}")


def main():
    while True:
        pods = api.list_objects(Pod, exclude_namespaces)
        deployments = api.list_objects(Deployment, exclude_namespaces)
        configmaps = api.list_objects(ConfigMap, exclude_namespaces)
        if agent.config.tantrumMode:
            randomly_kill_pods(pods,
                               agent.config.podTolerance,
                               agent.config.eagerness)
        if agent.config.cancerMode:
            randomly_scale_deployments(deployments,
                                       agent.config.eagerness)
        if agent.config.ipsumMode:
            randomly_write_configmaps(configmaps,
                                      agent.config.eagerness)
        time.sleep(agent.config.pauseDuration)


if __name__ == "__main__":
    print("This is the blackadder version 0.1.1")
    print("Ready to start a havoc in your cluster")
    main()
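The list_objects helper excludes namespaces by building a field selector string. The stand-alone sketch below shows what that string looks like; the function name `build_field_selector` is hypothetical and only isolates the `join` expression from the listing above.

```python
def build_field_selector(exclude_namespaces):
    """Join one `metadata.namespace!=<ns>` requirement per excluded namespace.

    Comma-separated requirements in a field selector are ANDed together,
    so an object must be outside every listed namespace to match.
    """
    return ",".join("metadata.namespace!=" + ns for ns in exclude_namespaces)


print(build_field_selector(["kube-system", "chaos-operator"]))
```

Filtering on the server side this way means the excluded objects are never transferred to the controller, rather than being fetched and then discarded in Python.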
Output
$ python controller.py
Deleted default/my-dep-84885b44-bjg4t
Deleted default/my-dep-84885b44-ljvdn
...
Lorem Ipsum in kube-node-lease/kube-root-ca.crt
...
scaled default/my-dep to 4
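This works from a local shell that uses the admin kubeconfig minikube created for you. When the controller is deployed into the cluster, you need to grant it permission to list, patch, and delete Pod, Deployment, and ConfigMap objects.
The Dockerfile uses a multi-stage build and pipenv to manage dependency installation: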
FROM docker.io/python:3.10 AS builder
RUN pip install --user pipenv
# Tell pipenv to create venv in the current directory
ENV PIPENV_VENV_IN_PROJECT=1
ADD Pipfile.lock Pipfile /usr/src/
WORKDIR /usr/src
RUN /root/.local/bin/pipenv sync
RUN /usr/src/.venv/bin/python3 -c "import pykube; print(pykube.__version__)"
FROM docker.io/python:3.10 AS runtime
RUN mkdir -v /usr/src/venv
COPY --from=builder /usr/src/.venv/ /usr/src/venv/
RUN /usr/src/venv/bin/python3 -c "import pykube; print(pykube.__version__)"
WORKDIR /usr/src/
COPY controller.py .
CMD ["./venv/bin/python", "-u", "controller.py"]
Build the image and push it to a public or private registry:
$ docker build -t oz123/blackadder:0.1 .
Sending build context to Docker daemon 166.4kB
Step 1/13 : FROM docker.io/python:3.10 AS builder
3.10: Pulling from library/python
1339eaac5b67: Pull complete
4c78fa1b9799: Pull complete
$ docker push oz123/blackadder:0.1
The push refers to repository [docker.io/oz123/blackadder]
2ce87cdce319: Pushed
645d7db6379e: Pushing [==================================================>] 17.26MB
3c924eba81b8: Pushed
...
Create a namespace in which to deploy the controller:
$ kubectl create namespace chaos-operator
namespace/chaos-operator created
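Before deploying the controller, add this namespace to the ChaosAgent's excludedNamespaces list so that it is excluded from the namespaces the controller watches.
Now we can create a Deployment for the chaos controller:
$ kubectl create deployment blackadder --image=oz123/blackadder:0.1 --replicas=1 \
  -n chaos-operator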
deployment.apps/blackadder created
When you look at the container's logs, you will see that it crashed:
$ kubectl logs -n chaos-operator blackadder-65bc54f7f9-v56bp
Traceback (most recent call last):
  File "/usr/src/controller.py", line 35, in <module>
    agent = list(ChaosAgent.objects(api, namespace=pykube.all))[0]
  File "/usr/src/venv/lib/python3.10/site-packages/pykube/query.py", line 195, in __iter__
    return iter(self.query_cache["objects"])
  File "/usr/src/venv/lib/python3.10/site-packages/pykube/query.py", line 185, in query_cache
    cache["response"] = self.execute().json()
  File "/usr/src/venv/lib/python3.10/site-packages/pykube/query.py", line 160, in execute
    r.raise_for_status()
  File "/usr/src/venv/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://10.96.0.1:443/apis/blackadder.io/v1beta1/chaosagents
This happens because the namespace's default service account is not allowed to list ChaosAgent objects.
To fix this, we need to define a ClusterRole and a ClusterRoleBinding that binds the role to the service account the controller runs as.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: blackadder
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "patch"]
- apiGroups: ["blackadder.io"]
  resources: ["chaosagents"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "delete"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "patch"]
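Each rule grants a set of verbs on a set of resources within an API group, and a request is allowed if any rule matches it. To make that matching concrete, the sketch below re-expresses the rules above as Python data and checks a few requests against them. The `is_allowed` function is a simplified illustration written for this article, not the real Kubernetes authorizer: it ignores wildcards, resource names, and subresources.

```python
# The ClusterRole rules from the manifest above, as plain Python data.
RULES = [
    {"apiGroups": ["apps"], "resources": ["deployments"],
     "verbs": ["get", "list", "patch"]},
    {"apiGroups": ["blackadder.io"], "resources": ["chaosagents"],
     "verbs": ["get", "list"]},
    {"apiGroups": [""], "resources": ["pods"],
     "verbs": ["get", "list", "delete"]},
    {"apiGroups": [""], "resources": ["configmaps"],
     "verbs": ["get", "list", "patch"]},
]


def is_allowed(rules, api_group, resource, verb):
    """Return True if at least one rule covers the group, resource, and verb."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )


# The request that failed with 403 before the role existed:
print(is_allowed(RULES, "blackadder.io", "chaosagents", "list"))
# Pods may be deleted but not patched under these rules:
print(is_allowed(RULES, "", "pods", "patch"))
```

The empty string in `apiGroups` denotes the core API group, which is where Pods and ConfigMaps live.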
The ClusterRoleBinding is defined by the following manifest:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: blackadder
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: blackadder
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: system:serviceaccount:chaos-operator:default
Apply these manifests:
$ kubectl apply -f k8s/clusterrole.yml
clusterrole.rbac.authorization.k8s.io/blackadder created
$ kubectl apply -f k8s/clusterrolebinding.yml
clusterrolebinding.rbac.authorization.k8s.io/blackadder created
After restarting the Pod, you will see that it is running.
Note that in the final version of the controller, the while True loop was moved into a main function, so the code looks like this:
# this is the Docker image tag oz123/blackadder:0.1.1
def main():
    while True:
        pods = api.list_objects(Pod, exclude_namespaces)
        deployments = api.list_objects(Deployment, exclude_namespaces)
        configmaps = api.list_objects(ConfigMap, exclude_namespaces)
        ...
if __name__ == "__main__":
    print("This is the blackadder version 0.1.1")
    print("Ready to start a havoc in your cluster")
    main()
The controller logs now show it running normally:
$ kubectl logs -n chaos-operator blackadder-7695b89559-8q4qp
This is the blackadder version 0.1.1
Ready to start a havoc in your cluster
Checking default/kube-root-ca.crt
Checking kube-node-lease/kube-root-ca.crt
Checking kube-public/cluster-info
...