Project preparation and build process

Typical CI/CD process - DevOps

Service Mesh - Istio in Action (Part 1)

GitOps continuous delivery process

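  • GitOps: a continuous delivery approach for cluster management and application delivery
  • The biggest difference between GitOps and typical CI/CD is that GitOps uses Git as the source of truth, storing declarative infrastructure and applications in it
  • Git is the center of the delivery pipeline: configuration files such as Kubernetes YAML manifests are stored and managed in Git
  • Developers deploy and operate applications purely through pull requests, without needing other CI/CD tools
  • Benefits: higher productivity, better developer experience, consistency and standardization, security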

Push vs. pull pipelines:

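Building and releasing applications with Flux

Flux's official description:

  • The GitOps operator for Kubernetes
  • An automated deployment tool (based on GitOps)
  • Features:
    • Automatic sync and automatic deployment
    • Declarative
    • Based on code (pull requests) rather than containers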

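Preparation

First, we need a Kubernetes cluster.

Istio must also be installed in the cluster.

As shown in the figure below, we will deploy a mesh made up of two services, plus a gateway and an external service; it is minimal yet complete: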

  • In the call chain, sleep acts as the client and httpbin as the server

Prepare a Git repository:

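Installing Flux

Official documentation:

First, install the fluxctl command-line tool: download the executable from the Flux GitHub repository's releases, move it to /usr/bin, and make it executable: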

[root@m1 /usr/local/src]# mv fluxctl_linux_amd64 /usr/bin/fluxctl
[root@m1 ~]# chmod a+x /usr/bin/fluxctl 
[root@m1 ~]# fluxctl version
1.21.0
[root@m1 ~]# 

Create a namespace for Flux, then deploy the Flux Operator to the Kubernetes cluster:

[root@m1 ~]# kubectl create ns flux
namespace/flux created
[root@m1 ~]# git clone https://github.com/fluxcd/flux.git
[root@m1 ~]# cd flux/

Before deploying Flux, change a few Git-related settings to match your repository: user name, email, URL, and so on:

[root@m1 ~/flux]# vim deploy/flux-deployment.yaml  # change the following settings
...
        # Replace the following URL to change the Git repository used by Flux.
        # HTTP basic auth credentials can be supplied using environment variables:
        # https://$(GIT_AUTHUSER):$(GIT_AUTHKEY)@github.com/user/repository.git
        - --git-url=git@github.com:fluxcd/flux-get-started
        - --git-branch=master
        # Include this if you want to restrict the manifests considered by flux
        # to those under the following relative paths in the git repository
        # - --git-path=subdir1,subdir2
        - --git-label=flux-sync
        - --git-user=Flux automation
        - --git-email=flux@example.com

After editing, deploy it:

[root@m1 ~/flux]# kubectl apply -f deploy
[root@m1 ~/flux]# kubectl get all -n flux
NAME                            READY   STATUS    RESTARTS   AGE
pod/flux-65479fb87-k5zxb        1/1     Running   0          7m20s
pod/memcached-c86cd995d-5gl5p   1/1     Running   0          44m

NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
service/memcached   ClusterIP   10.106.229.44   <none>        11211/TCP   44m

NAME                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/flux        1/1     1            1           44m
deployment.apps/memcached   1/1     1            1           44m

NAME                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/flux-65479fb87        1         1         1       7m20s
replicaset.apps/memcached-c86cd995d   1         1         1       44m
[root@m1 ~]# 

Alternatively, Flux can be installed from the command line:

fluxctl install \
--git-user=xxx \
--git-email=xxx@xxx \
--git-url=git@github.com:xxx/smdemo \
--namespace=flux | kubectl apply -f -

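Because we are using a private Git repository hosted on gitee.com, one extra step is needed: add the host's SSH key to ~/.ssh/known_hosts inside the Flux daemon container, as follows: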

[root@m1 ~]# kubectl exec -n flux flux-65479fb87-k5zxb -ti -- \
    env GITHOST="gitee.com" GITREPO="git@gitee.com:demo_focus/service-mesh-demo.git" PS1="container$ " /bin/sh
container$ ssh-keyscan $GITHOST >> ~/.ssh/known_hosts   # add the host key
container$ git clone $GITREPO   # make sure the repository can be cloned
Cloning into 'service-mesh-demo'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 10 (delta 2), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (10/10), done.
Resolving deltas: 100% (2/2), done.
container$ 
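Host keys added through kubectl exec only live in that container's filesystem, so they disappear whenever the Flux pod is recreated. A sketch of a more durable setup, assuming the Flux v1 convention of a ConfigMap named flux-ssh-config mounted at /root/.ssh (the stock deploy/flux-deployment.yaml carries a commented-out ssh-config volume and volumeMount for this; check your copy before relying on these names):

```bash
# Run on a machine you trust: fetch gitee.com's SSH host keys
ssh-keyscan gitee.com > /tmp/flux_known_hosts

# Store them under the known_hosts key of a ConfigMap in the flux namespace
kubectl -n flux create configmap flux-ssh-config \
  --from-file=known_hosts=/tmp/flux_known_hosts

# After uncommenting the ssh-config volume and its /root/.ssh mount in
# deploy/flux-deployment.yaml, re-apply the manifests from the flux checkout
cd ~/flux && kubectl apply -f deploy
```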

Once Flux is deployed, add the deploy key that Flux generated to the Git repository (with read/write access). Retrieve the deploy key with:

[root@m1 ~]# fluxctl identity --k8s-fwd-ns flux
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDsyfN+x4jen+Ikpff8LszXLFTwXSQviFxCrIx7uMy7LJM5uUEsDdFs/DZL1g9h/YnkfLJlFrxOCJ+tuqPrXuj3ceEFfal4T3YWiDwf1RsGJvJd6ED5APjsxyu5gkj9LvkOB8OlYwPlS8Pygv997n93gtH7rFbocK5EQpbhhBlue3Or2ufI/KBxDCx6xLaH9U/16EEi+BDVSsCetGIQI+TSRqqpN30+Y8paS6iCYajKTubKv7x44WaVFgSDT9Y/OycUq1LupJoVoD8/5Y2leUMaF9dhMbQgoc8zjh8q2HF2n97mAvgYWJosjeIcAKS82C0zPlPupPevNedAhhEb82svPWh7BI4N4XziA06ypAEmfEz3JuUTTeABpF2hEoV4UEagkSyS8T3xhfdjigVcKiBW5AqRsRyx+ffW4WREHjARSC8CKl0Oj00a9FOGoNsDKkFuTbJePMcGdgvjs61UlgUUjdQFfHoZz2UVo2OEynnCpY7hj5SrEudkujRon4HEhJE= root@flux-7f5f7776df-l65lx
[root@m1 ~]# 

Copy the key and add it in the Git repository's settings:

Deploying the application

Create a dedicated namespace for the application and label it with istio-injection=enabled so that Istio injects the sidecar proxy:

[root@m1 ~]# kubectl create ns demo
namespace/demo created
[root@m1 ~]# kubectl label namespace demo istio-injection=enabled
namespace/demo labeled
[root@m1 ~]# 

Clone the Git repository locally and create a config directory in it:

[root@m1 ~]# git clone git@gitee.com:demo_focus/service-mesh-demo.git
[root@m1 ~]# cd service-mesh-demo/
[root@m1 ~/service-mesh-demo]# mkdir config

Create the service manifests in that directory:

[root@m1 ~/service-mesh-demo]# vim config/httpbin.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: httpbin
  namespace: demo
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  namespace: demo
  labels:
    app: httpbin
spec:
  ports:
  - name: http
    port: 8000
    targetPort: 80
  selector:
    app: httpbin
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpbin
      version: v1
  template:
    metadata:
      labels:
        app: httpbin
        version: v1
    spec:
      serviceAccountName: httpbin
      containers:
      - image: docker.io/kennethreitz/httpbin
        imagePullPolicy: IfNotPresent
        name: httpbin
        ports:
        - containerPort: 80

[root@m1 ~/service-mesh-demo]# vim config/sleep.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
  namespace: demo
---
apiVersion: v1
kind: Service
metadata:
  name: sleep
  namespace: demo
  labels:
    app: sleep
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: sleep
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
  namespace: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      serviceAccountName: sleep
      containers:
      - name: sleep
        image: governmentpaas/curl-ssl
        command: ["/bin/sleep", "3650d"]
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /etc/sleep/tls
          name: secret-volume
      volumes:
      - name: secret-volume
        secret:
          secretName: sleep-secret
          optional: true

Commit the manifests and push them to the remote Git repository:

[root@m1 ~/service-mesh-demo]# git add .
[root@m1 ~/service-mesh-demo]# git commit -m "commit yaml"
[root@m1 ~/service-mesh-demo]# git push origin master

Run the following command to make Flux sync the repository changes and deploy them automatically:

[root@m1 ~]# fluxctl sync --k8s-fwd-ns flux
Synchronizing with ssh://git@gitee.com/demo_focus/service-mesh-demo
Revision of master to apply is 49bc37e
Waiting for 49bc37e to be applied ...
Done.
[root@m1 ~]# 
  • By default Flux syncs automatically every 5 minutes, so you don't need to trigger it manually
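If the default period doesn't suit you, the Flux v1 daemon takes it from two flags: --git-poll-interval (how often to check Git for new commits) and --sync-interval (how often to re-apply the repository state to the cluster), both 5m by default. A sketch that appends them with a JSON patch, assuming the flux container is the first container in the Deployment as in the stock manifest; the 1m values are arbitrary examples:

```bash
# Append two args to the flux container (index 0) of the flux Deployment
kubectl -n flux patch deployment flux --type=json -p='[
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--git-poll-interval=1m"},
  {"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--sync-interval=1m"}
]'
```

A later kubectl apply -f deploy would overwrite the patched args, so for a lasting change add the same two flags to the args list in deploy/flux-deployment.yaml instead.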

Listing the resources in the demo namespace shows that Flux has deployed all the services for us:

[root@m1 ~]# kubectl get all -n demo
NAME                           READY   STATUS    RESTARTS   AGE
pod/httpbin-74fb669cc6-v9lc5   2/2     Running   0          36s
pod/sleep-854565cb79-mcmnb     2/2     Running   0          40s

NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/httpbin   ClusterIP   10.105.17.57    <none>        8000/TCP   36s
service/sleep     ClusterIP   10.103.14.114   <none>        80/TCP     40s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/httpbin   1/1     1            1           36s
deployment.apps/sleep     1/1     1            1           40s

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/httpbin-74fb669cc6   1         1         1       36s
replicaset.apps/sleep-854565cb79     1         1         1       40s
[root@m1 ~]# 

Test connectivity between the services:

[root@m1 ~]# kubectl exec -it -n demo sleep-854565cb79-mcmnb -c sleep -- curl http://httpbin.demo:8000/ip
{
  "origin": "127.0.0.1"
}
[root@m1 ~]# 

Automating canary releases

The canary release process


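Automated canary releases - Flagger

A canary release is a rolling upgrade that migrates traffic a little at a time, so driving it by hand is inefficient and error-prone. That is why we use an automated canary release tool such as Flagger:

  • Flagger: an open-source automated canary release tool from Weaveworks
  • Supports multiple service mesh products: Istio, Linkerd, AWS App Mesh
  • Monitors the state of a canary release through metrics
  • Notifications (Slack, Microsoft Teams)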

Flagger workflow:

Installing Flagger

Official documentation:

Add Flagger's Helm repository:

[root@m1 ~]# helm repo add flagger https://flagger.app
"flagger" has been added to your repositories
[root@m1 ~]# 

Create the Flagger CRDs:

[root@m1 ~]# kubectl apply -f https://raw.githubusercontent.com/fluxcd/flagger/main/artifacts/flagger/crd.yaml
[root@m1 ~]# kubectl get crd |grep flagger
alertproviders.flagger.app                            2020-12-23T14:40:00Z
canaries.flagger.app                                  2020-12-23T14:40:00Z
metrictemplates.flagger.app                           2020-12-23T14:40:00Z
[root@m1 ~]# 

Deploy Flagger to the istio-system namespace with Helm:

[root@m1 ~]# helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--set crd.create=false \
--set meshProvider=istio \
--set metricsServer=http://prometheus.istio-system:9090

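Optionally, add a Slack webhook so that Flagger can post notifications to a Slack channel. When new --set values are passed, helm upgrade does not carry over the values of the previous release, so --reuse-values is added here to keep the meshProvider and metricsServer settings from the previous step:

[root@m1 ~]# helm upgrade -i flagger flagger/flagger \
--namespace=istio-system \
--reuse-values \
--set crd.create=false \
--set slack.url=https://hooks.slack.com/services/xxxxxx \
--set slack.channel=general \
--set slack.user=flagger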

Besides Slack, we can also deploy a Grafana instance for Flagger. It comes with a canary dashboard that makes it easy to follow the progress of a canary release:

[root@m1 ~]# helm upgrade -i flagger-grafana flagger/grafana \
--namespace=istio-system \
--set url=http://prometheus.istio-system:9090 \
--set user=admin \
--set password=admin

When this is done, check the Flagger deployment:

[root@m1 ~]# kubectl get pods -n istio-system 
NAME                                    READY   STATUS    RESTARTS   AGE
flagger-b68b578b-5f8bh                  1/1     Running   0          7m50s
flagger-grafana-77b8c8df65-7vv89        1/1     Running   0          71s
...

Create an ingress gateway for the mesh:

[root@m1 ~]# kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
EOF

We can also deploy a load-testing tool; this too is optional:

[root@m1 ~]# kubectl create ns test
namespace/test created
[root@m1 ~]# kubectl apply -k https://github.com/fluxcd/flagger/tree/main/kustomize/tester
[root@m1 ~]# kubectl get pods -n test
NAME                                  READY   STATUS    RESTARTS   AGE
flagger-loadtester-64695f854f-5hsmg   1/1     Running   0          114s
[root@m1 ~]# 

If that is slow, you can instead clone the repository and deploy the tester from the local copy:

[root@m1 ~]# cd /usr/local/src
[root@m1 /usr/local/src]# git clone https://github.com/fluxcd/flagger.git
[root@m1 /usr/local/src]# kubectl apply -k flagger/kustomize/tester/

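Canary release configuration

Configure an HPA for the httpbin service so that it can scale dynamically. This is optional, but usually recommended. Note that a CPU utilization target only works if the httpbin container declares a CPU request (the comment below assumes 100m), which config/httpbin.yaml above does not yet do: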

[root@m1 ~]# kubectl apply -n demo -f - <<EOF
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: httpbin
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: httpbin
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      # scale up if usage is above
      # 99% of the requested CPU (100m)
      targetAverageUtilization: 99
EOF

Create the metric used to verify the canary release; Flagger relies on it when deciding whether to keep shifting traffic:

[root@m1 ~]# kubectl apply -f - <<EOF
apiVersion: flagger.app/v1beta1
kind: MetricTemplate
metadata:
  name: latency
  namespace: istio-system
spec:
  provider:
    type: prometheus
    address: http://prometheus.istio-system:9090
  query: |
    histogram_quantile(
        0.99,
        sum(
            rate(
                istio_request_duration_milliseconds_bucket{
                    reporter="destination",
                    destination_workload_namespace="{{ namespace }}",
                    destination_workload=~"{{ target }}"
                }[{{ interval }}]
            )
        ) by (le)
    )
EOF
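Flagger fills in the {{ namespace }}, {{ target }} and {{ interval }} placeholders itself before sending the query to Prometheus. To see roughly what the final PromQL looks like for the httpbin Canary below (namespace demo, target httpbin, metric interval 30s), here is a small illustration that performs the same substitution with sed; it is a debugging stand-in, not how Flagger is implemented:

```bash
#!/usr/bin/env bash
# Replace the MetricTemplate placeholders in the text read from stdin.
# Illustration only: Flagger performs this substitution internally.
render_query() {
  local namespace=$1 target=$2 interval=$3
  sed -e "s/{{ namespace }}/${namespace}/g" \
      -e "s/{{ target }}/${target}/g" \
      -e "s/{{ interval }}/${interval}/g"
}

# The latency query from the MetricTemplate above, collapsed onto one line
render_query demo httpbin 30s <<'EOF'
histogram_quantile(0.99, sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination", destination_workload_namespace="{{ namespace }}", destination_workload=~"{{ target }}"}[{{ interval }}])) by (le))
EOF
```

Running the rendered query in the Prometheus UI is a quick way to check whether Istio is reporting request metrics for the workload at all.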

Create Flagger's Canary resource. All the settings for the canary release are defined in it:

apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: httpbin
  namespace: demo
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: httpbin
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  # HPA reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: httpbin
  service:
    # service port number
    port: 8000
    # container port number or name (optional)
    targetPort: 80
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
  analysis:
    # schedule interval (default 60s)
    interval: 30s
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 100
    # canary increment step
    # percentage (0-100)
    stepWeight: 20
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: latency
      templateRef:
        name: latency
        namespace: istio-system
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
    webhooks:
      - name: load-test
        url: http://flagger-loadtester.test/
        timeout: 5s
        metadata:
          cmd: "hey -z 1m -q 10 -c 2 http://httpbin-canary.demo:8000/headers"
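With interval: 30s, stepWeight: 20 and maxWeight: 100, the happy path (every metric check passing) takes five steps. The sketch below models that schedule, assuming the first step is taken as soon as the analysis starts, as the event log further down suggests; it ignores halted or failed checks and is not Flagger's own logic:

```bash
#!/usr/bin/env bash
# Print the canary/primary traffic split for each analysis step, assuming
# every check passes: the weight grows by stepWeight until it reaches maxWeight.
canary_schedule() {
  local step=$1 max=$2 interval=$3   # stepWeight, maxWeight, interval in seconds
  local weight=0 elapsed=0
  while [ "$weight" -lt "$max" ]; do
    weight=$((weight + step))
    if [ "$weight" -gt "$max" ]; then
      weight=$max                      # never exceed maxWeight
    fi
    echo "t=${elapsed}s canary=${weight}% primary=$((100 - weight))%"
    elapsed=$((elapsed + interval))
  done
}

canary_schedule 20 100 30
```

This prints five lines, from t=0s canary=20% primary=80% to t=120s canary=100% primary=0%, so a clean rollout with these settings needs about two minutes of analysis plus the load-test warm-up.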

Once the Canary is created, Flagger automatically creates several httpbin resources with a primary suffix, as well as a VirtualService whose routes point to the httpbin-primary and httpbin-canary services:

[root@m1 ~]# kubectl get pods -n demo
NAME                             READY   STATUS    RESTARTS   AGE
httpbin-74fb669cc6-6ztkg         2/2     Running   0          50s
httpbin-74fb669cc6-vfs4h         2/2     Running   0          38s
httpbin-primary-9cb49747-94s4z   2/2     Running   0          3m3s
httpbin-primary-9cb49747-xhpcg   2/2     Running   0          3m3s
sleep-854565cb79-mcmnb           2/2     Running   0          94m
[root@m1 ~]# kubectl get svc -n demo
NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
httpbin           ClusterIP   10.105.17.57    <none>        8000/TCP   86m
httpbin-canary    ClusterIP   10.99.206.196   <none>        8000/TCP   3m14s
httpbin-primary   ClusterIP   10.98.196.235   <none>        8000/TCP   3m14s
sleep             ClusterIP   10.103.14.114   <none>        80/TCP     95m
[root@m1 ~]# kubectl get vs -n demo
NAME      GATEWAYS                                            HOSTS         AGE
httpbin   ["public-gateway.istio-system.svc.cluster.local"]   ["httpbin"]   3m29s
[root@m1 ~]# 

Then trigger a canary release with the following command:

[root@m1 ~]# kubectl -n demo set image deployment/httpbin httpbin=httpbin-v2
deployment.apps/httpbin image updated
[root@m1 ~]# 
  • Tip: changes to the Deployment, ConfigMaps, or Secrets all trigger a canary release
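To follow the rollout without re-running describe, you can watch the Canary resource and tail Flagger's logs (the Deployment name flagger is inferred from the pod listed earlier in istio-system):

```bash
# Watch STATUS and WEIGHT change as Flagger advances the canary (Ctrl+C to stop)
kubectl -n demo get canary httpbin -w

# In another terminal: follow Flagger's own log output
kubectl -n istio-system logs deployment/flagger -f
```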

The canary's events show that the new revision has been detected:

[root@m1 ~]# kubectl describe canary httpbin -n demo
...
Events:
  Type     Reason  Age                  From     Message
  ----     ------  ----                 ----     -------
  ...
  Normal   Synced  2m57s                flagger  New revision detected! Scaling up httpbin.demo
  Warning  Synced  27s (x5 over 2m27s)  flagger  canary deployment httpbin.demo not ready: waiting for rollout to finish: 1 out of 2 new replicas have been updated

At this point the httpbin VirtualService shows that 20% of the traffic has been shifted to the canary version:

[root@m1 ~]# kubectl describe vs httpbin -n demo
...
Spec:
  Gateways:
    public-gateway.istio-system.svc.cluster.local
  Hosts:
    httpbin
  Http:
    Route:
      Destination:
        Host:  httpbin-primary
      Weight:  80
      Destination:
        Host:  httpbin-canary
      Weight:  20
Events:        <none>

Then exec into the sleep service and call the httpbin service in a loop:

[root@m1 ~]# kubectl exec -it -n demo sleep-854565cb79-mcmnb -c sleep -- sh
/ # while [ 1 ]; do curl http://httpbin.demo:8000/headers;sleep 2s; done

Checking the httpbin VirtualService again shows that 60% of the traffic has now been shifted to the canary version:

[root@m1 ~]# kubectl describe vs httpbin -n demo
...
Spec:
  Gateways:
    public-gateway.istio-system.svc.cluster.local
  Hosts:
    httpbin
  Http:
    Route:
      Destination:
        Host:  httpbin-primary
      Weight:  40
      Destination:
        Host:  httpbin-canary
      Weight:  60
Events:        <none>

We can open Flagger's Grafana:

[root@m1 ~]# kubectl -n istio-system port-forward svc/flagger-grafana 3000:80 --address 192.168.243.138
Forwarding from 192.168.243.138:3000 -> 3000

It comes with the following built-in dashboards:

The Istio Canary Dashboard shows the progress of the release:

Finally, 100% of the traffic is shifted to the canary version, which means the release is complete:

[root@m1 ~]# kubectl describe vs httpbin -n demo
...
Spec:
  Gateways:
    public-gateway.istio-system.svc.cluster.local
  Hosts:
    httpbin
  Http:
    Route:
      Destination:
        Host:  httpbin-primary
      Weight:  0
      Destination:
        Host:  httpbin-canary
      Weight:  100
Events:        <none>

The event log of the httpbin canary also shows how the traffic was migrated:

[root@m1 ~]# kubectl describe canary httpbin -n demo
  ...
  Normal   Synced  3m44s (x2 over 18m)  flagger  New revision detected! Restarting analysis for httpbin.demo
  Normal   Synced  3m14s (x2 over 18m)  flagger  Starting canary analysis for httpbin.demo
  Normal   Synced  3m14s (x2 over 18m)  flagger  Advance httpbin.demo canary weight 20
  Warning  Synced  2m44s (x2 over 17m)  flagger  Halt advancement no values found for istio metric request-success-rate probably httpbin.demo is not receiving traffic: running query failed: no values found
  Normal   Synced  2m14s                flagger  Advance httpbin.demo canary weight 40
  Normal   Synced  104s                 flagger  Advance httpbin.demo canary weight 60
  Normal   Synced  74s                  flagger  Advance httpbin.demo canary weight 80
  Normal   Synced  44s                  flagger  Advance httpbin.demo canary weight 100

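When the release finishes, the status of the httpbin canary changes to Succeeded. By then Flagger has promoted the new version to httpbin-primary and routed all traffic back to the primary, which is why WEIGHT shows 0: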

[root@m1 ~]# kubectl get canary -n demo
NAME      STATUS        WEIGHT   LASTTRANSITIONTIME
httpbin   Succeeded     0        2020-12-23T16:03:04Z
[root@m1 ~]# 

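Improving system resilience

Resilient design is popular in many fields. In landscape design, for example, resilience refers to the ability to recover after a disaster: once a disaster has struck, the landscape can quickly restore its structure and function. In product design, resilience usually means leaving some room when designing a product's form and features, so that it is easy to modify later.

In distributed systems, resilience generally means giving the system a degree of fault tolerance and the ability to cope with failures, so that it can recover quickly when they occur. In this section we add some resilience to the demo application deployed earlier.

Measuring system availability

Let's start with a concept: the Service Level Agreement (SLA). An SLA is a contract, agreed by both the service provider and its customer, that covers the quality, level, and performance of the service. For example, a service provider usually guarantees its customers a certain level of availability, the familiar "number of nines".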

Availability is calculated as follows:
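A commonly used form of this formula, whose notation may differ from the original figure, expresses availability A through the mean time between failures (MTBF) and the mean time to repair (MTTR):

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
```

For example, A = 99.9% over a 365-day year (8760 h) leaves (1 - 0.999) × 8760 h = 8.76 h of downtime.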

Common availability levels:
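Each additional nine cuts the allowed downtime by a factor of ten. A small sketch that computes the yearly budget from an availability percentage, using downtime = (1 - A) × 365 days and ignoring leap years:

```bash
#!/usr/bin/env bash
# Yearly downtime budget for an availability percentage such as 99.9.
# A 365-day year has 525600 minutes; awk does the floating-point math.
downtime_per_year() {
  awk -v a="$1" 'BEGIN {
    minutes = (100 - a) / 100 * 365 * 24 * 60
    printf "%.2f minutes (%.2f hours)\n", minutes, minutes / 60
  }'
}

for a in 99 99.9 99.99 99.999; do
  printf '%s%% -> ' "$a"
  downtime_per_year "$a"
done
```

For 99.9% this prints 525.60 minutes (8.76 hours); for 99.99%, 52.56 minutes (0.88 hours).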

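Resilience design

  • One way to deal with failures is to make the system fault-tolerant and adaptable
  • Prevent faults from turning into failures
  • Main techniques:
    • Fault tolerance: retries, idempotency
    • Scalability: horizontal autoscaling
    • Overload protection: timeouts, circuit breaking, degradation, rate limiting
    • Resilience testing: fault injection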

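Resilience features provided by Istio:

  • Timeouts
  • Retries
  • Circuit breaking
  • Fault injection

Adding resilience to the demo application

First, create a VirtualService for the demo application: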

[root@m1 ~]# kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: demo
spec:
  hosts:
  - "*"
  gateways:
  - httpbin-gateway
  http:
  - route:
    - destination:
        host: httpbin
        port:
          number: 8000
EOF
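The VirtualService above binds to a gateway named httpbin-gateway in the demo namespace, which has not been created so far (the gateway created earlier is public-gateway in istio-system). If you don't already have one, a minimal sketch modeled on public-gateway follows; alternatively, list istio-system/public-gateway under gateways instead:

```bash
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: httpbin-gateway
  namespace: demo
spec:
  selector:
    istio: ingressgateway   # binds to the default Istio ingress gateway pods
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
EOF
```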

Add the first resilience feature, a timeout:

[root@m1 ~]# kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: demo
spec:
  hosts:
  - "*"
  gateways:
  - httpbin-gateway
  http:
  - route:
    - destination:
        host: httpbin
        port:
          number: 8000
    timeout: 1s  # configure the timeout
EOF

Timeout rules:

  • When both timeout and retries.perTryTimeout are set
  • Effective timeout = min(timeout, retries.perTryTimeout * retries.attempts)
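The rule can be checked quickly with the numbers from the retry example below (timeout: 8s, perTryTimeout: 1s, attempts: 3). A sketch in whole seconds that applies the formula exactly as stated above, ignoring the short back-off Envoy adds between retries:

```bash
#!/usr/bin/env bash
# effective = min(timeout, perTryTimeout * attempts), all in whole seconds
effective_timeout() {
  local timeout_s=$1 per_try_s=$2 attempts=$3
  local retry_budget=$((per_try_s * attempts))
  if [ "$retry_budget" -lt "$timeout_s" ]; then
    echo "$retry_budget"
  else
    echo "$timeout_s"
  fi
}

effective_timeout 8 1 3   # retry example: 3 x 1s fits inside 8s, prints 3
effective_timeout 1 1 3   # a 1s overall timeout cuts the retries short, prints 1
```

In the retry example the 8s timeout is therefore not the binding limit; it only starts to matter if attempts or perTryTimeout are raised.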

On top of the timeout, we can also configure a retry policy:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: demo
spec:
  hosts:
  - "*"
  gateways:
  - httpbin-gateway
  http:
  - route:
    - destination:
        host: httpbin
        port:
          number: 8000
    retries:  # configure the retry policy
      attempts: 3
      perTryTimeout: 1s
    timeout: 8s

Retry settings:

  • x-envoy-retry-on: 5xx, gateway-error, reset, connect-failure…
  • x-envoy-retry-grpc-on: cancelled, deadline-exceeded, internal, unavailable…
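These conditions are what the retryOn field of an Istio retry policy accepts, as a comma-separated list. A sketch of the retry example above with an explicit retryOn; the chosen conditions are only an example:

```bash
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
  namespace: demo
spec:
  hosts:
  - "*"
  gateways:
  - httpbin-gateway
  http:
  - route:
    - destination:
        host: httpbin
        port:
          number: 8000
    retries:
      attempts: 3
      perTryTimeout: 1s
      # only retry on these conditions (same values as x-envoy-retry-on)
      retryOn: 5xx,gateway-error,reset,connect-failure
    timeout: 8s
EOF
```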

Configure circuit breaking:

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
  namespace: demo
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100

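What this circuit-breaker configuration does:

  • TCP and HTTP connection pools are limited to a size of 1
  • Only 1 consecutive error is tolerated before a pod is ejected
  • The ejection analysis runs once per second
  • All pods may be ejected from the load-balancing pool
  • An ejected pod stays out for at least 3 minutes before it can rejoin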
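To see the limits in action, a rough sketch that reuses the sleep pod and fires ten requests concurrently, assuming the DestinationRule above has been applied. With one connection and one pending request allowed, you would typically see a mix of 200 and 503 responses:

```bash
# Look up the sleep pod name (also usable as ${sleep_pod_name} in the commands below)
sleep_pod_name=$(kubectl get pod -n demo -l app=sleep -o jsonpath='{.items[0].metadata.name}')

# Fire 10 concurrent requests from inside the pod and print each HTTP status code
kubectl exec -n demo "${sleep_pod_name}" -c sleep -- sh -c '
  for i in $(seq 1 10); do
    curl -s -o /dev/null -w "%{http_code}\n" http://httpbin.demo:8000/get &
  done
  wait'
```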

Configuring security policies

Istio's security solution


Istio security architecture


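Hands-on

Create an authorization policy for a specific service (httpbin). Note that no rules are configured, which means all requests to this service are denied: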

[root@m1 ~]# kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: httpbin
  namespace: demo
spec:
  selector:
    matchLabels:
      app: httpbin
EOF

With this policy the service cannot be accessed at all. Let's verify:

# the request is denied
$ kubectl exec -it -n demo ${sleep_pod_name} -c sleep -- curl "http://httpbin.demo:8000/get"
RBAC: access denied  # response

# other versions can still be accessed
$ kubectl exec -it -n demo ${sleep_pod_name} -c sleep -- curl "http://httpbin-v2.demo:8000/get"

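The following configuration restricts where requests may come from. Here a request is allowed if it comes from the sleep service account in demo or from any workload in the demo namespace (the two sources are ORed):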

[root@m1 ~]# kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: demo
spec:
 action: ALLOW
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/demo/sa/sleep"]
   - source:
       namespaces: ["demo"]
EOF

Test:

# the request is allowed
$ kubectl exec -it -n demo ${sleep_pod_name} -c sleep -- curl "http://httpbin.demo:8000/get"

# the request is denied
$ kubectl exec -it -n ${other_namespace} ${sleep_pod_name} -c sleep -- curl "http://httpbin.demo:8000/get"

# passes after changing the service account to ${other_namespace}

Besides restricting the request source, we can also allow access to specific endpoints only:

[root@m1 ~]# kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: demo
spec:
 action: ALLOW
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/demo/sa/sleep"]
   - source:
       namespaces: ["demo"]
   to:
   - operation:
       methods: ["GET"]
       paths: ["/get"]
EOF

Test:

# the request is allowed
$ kubectl exec -it -n demo ${sleep_pod_name} -c sleep -- curl "http://httpbin.demo:8000/get"

# the request is denied
$ kubectl exec -it -n demo ${sleep_pod_name} -c sleep -- curl "http://httpbin.demo:8000/ip"

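We can also add other conditions, such as requiring a specific request header. This is typical when clients must carry a particular header to be allowed to call an endpoint: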

[root@m1 ~]# kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
 name: httpbin
 namespace: demo
spec:
 action: ALLOW
 rules:
 - from:
   - source:
       principals: ["cluster.local/ns/demo/sa/sleep"]
   - source:
       namespaces: ["demo"]
   to:
   - operation:
       methods: ["GET"]
       paths: ["/get"]
   when:
   - key: request.headers[x-rfma-token]
     values: ["test*"]
EOF

Test:

# the request is denied
$ kubectl exec -it -n demo ${sleep_pod_name} -c sleep -- curl "http://httpbin.demo:8000/get"

# allowed once the token header is added
$ kubectl exec -it -n demo ${sleep_pod_name} -c sleep -- curl "http://httpbin.demo:8000/get" -H x-rfma-token:test1
