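1. Wrong Container Image / Invalid Registry Permissions

The two most common problems are (a) specifying the wrong container image and (b) trying to use a private image without providing registry credentials. These are especially tricky when you are first getting started with Kubernetes or wiring up CI/CD for the first time.

Let's look at an example. First, we'll create a Deployment named fail that references a Docker image that doesn't exist. On kubectl 1.18 and later, kubectl run creates a bare Pod instead, so use kubectl create deployment fail --image=rosskukulinski/dne:v1.0.0 there to get a Deployment: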

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl run fail --image<span style="color:#f8f8f2">=</span>rosskukulinski/dne:v1.0.0
</code></span>

Then we can inspect our Pods and see that we have one Pod with a status of ErrImagePull or ImagePullBackOff:

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl get pods
NAME                    READY     STATUS             RESTARTS   AGE
fail-1036623984-hxoas   0/1       ImagePullBackOff   0          2m
</code></span>
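When you are scanning many Pods, a small filter can pick out the ones stuck on image pulls. The sketch below uses a hypothetical helper (image_pull_failures is not a kubectl feature). It assumes the default kubectl get pods column layout shown above, where STATUS is the third column, so it would need adjusting for --all-namespaces or -o wide output.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of Pods stuck on image pulls.
# Assumes the default `kubectl get pods` table on stdin, where STATUS is column 3.
image_pull_failures() {
  awk 'NR > 1 && ($3 == "ErrImagePull" || $3 == "ImagePullBackOff") { print $1 }'
}
```

A typical invocation would be kubectl get pods | image_pull_failures.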

For more detail, we can describe the failing Pod:

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl describe pod fail-1036623984-hxoas
</code></span>

If we look in the Events section of the describe output, we'll see something like the following:

<span style="color:#f8f8f2"><code class="language-bash">
Events:
  FirstSeen    LastSeen    Count   From                        SubObjectPath       Type        Reason      Message
  ---------    --------    -----   ----                        -------------       --------    ------      -------
  5m        5m      1   <span style="color:#f8f8f2">{</span>default-scheduler <span style="color:#f8f8f2">}</span>                            Normal      Scheduled   Successfully assigned fail-1036623984-hxoas to gke-nrhk-1-default-pool-a101b974-wfp7
  5m        2m      5   <span style="color:#f8f8f2">{</span>kubelet gke-nrhk-1-default-pool-a101b974-wfp7<span style="color:#f8f8f2">}</span> spec.containers<span style="color:#f8f8f2">{</span>fail<span style="color:#f8f8f2">}</span>   Normal      Pulling     pulling image <span style="color:#a6e22e">"rosskukulinski/dne:v1.0.0"</span>
  5m        2m      5   <span style="color:#f8f8f2">{</span>kubelet gke-nrhk-1-default-pool-a101b974-wfp7<span style="color:#f8f8f2">}</span> spec.containers<span style="color:#f8f8f2">{</span>fail<span style="color:#f8f8f2">}</span>   Warning     Failed      Failed to pull image <span style="color:#a6e22e">"rosskukulinski/dne:v1.0.0"</span><span style="color:#66d9ef">:</span> Error: image rosskukulinski/dne not found
  5m        2m      5   <span style="color:#f8f8f2">{</span>kubelet gke-nrhk-1-default-pool-a101b974-wfp7<span style="color:#f8f8f2">}</span>             Warning     FailedSync  Error syncing pod, skipping: failed to <span style="color:#a6e22e">"StartContainer"</span> <span style="color:#66d9ef">for</span> <span style="color:#a6e22e">"fail"</span> with ErrImagePull: <span style="color:#a6e22e">"Error: image rosskukulinski/dne not found"</span>

  5m    11s 19  <span style="color:#f8f8f2">{</span>kubelet gke-nrhk-1-default-pool-a101b974-wfp7<span style="color:#f8f8f2">}</span> spec.containers<span style="color:#f8f8f2">{</span>fail<span style="color:#f8f8f2">}</span>   Normal  BackOff     Back-off pulling image <span style="color:#a6e22e">"rosskukulinski/dne:v1.0.0"</span>
  5m    11s 19  <span style="color:#f8f8f2">{</span>kubelet gke-nrhk-1-default-pool-a101b974-wfp7<span style="color:#f8f8f2">}</span>             Warning FailedSync  Error syncing pod, skipping: failed to <span style="color:#a6e22e">"StartContainer"</span> <span style="color:#66d9ef">for</span> <span style="color:#a6e22e">"fail"</span> with ImagePullBackOff: <span style="color:#a6e22e">"Back-off pulling image \"rosskukulinski/dne:v1.0.0\""</span>
</code></span>

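The error string Failed to pull image "rosskukulinski/dne:v1.0.0": Error: image rosskukulinski/dne not found tells us that Kubernetes was not able to find the image rosskukulinski/dne:v1.0.0.

So then the question is: why couldn't Kubernetes pull the image?

Aside from network connectivity issues, there are three primary culprits: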

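  • The image tag is incorrect
  • The image doesn't exist (or is in a different registry)
  • Kubernetes doesn't have permission to pull that image

If you don't spot a typo in your image tag, then it's time to test using your local machine.

I usually start by running docker pull on my local development machine with the exact same image tag. In this case, I would run docker pull rosskukulinski/dne:v1.0.0.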

  • If this succeeds, then it probably means that Kubernetes doesn't have correct permissions to pull that image. Go read up on Image Pull Secrets to fix this issue.
  • If the exact image tag fails, then I will test without an explicit image tag - docker pull rosskukulinski/dne - which will attempt to pull the latest tag. If this succeeds, then the tag originally specified doesn't exist. This could be due to human error, a typo, or perhaps a misconfiguration of the CI/CD system.
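The triage steps above can be scripted roughly as follows. This is a sketch built on a few assumptions. diagnose_image and its PULL_CMD override are hypothetical names introduced here, and the override exists only so the logic can be exercised without Docker. An untagged pull also only tests the latest tag, so a repository that has no latest tag will look as if it is missing.

```bash
#!/usr/bin/env bash
# Hypothetical helper sketching the local "docker pull" triage for an image
# that Kubernetes cannot pull. PULL_CMD defaults to "docker pull"; it is
# overridable only for testing (left unquoted on purpose so it word-splits).
diagnose_image() {
  local image_ref="$1"
  local pull_cmd="${PULL_CMD:-docker pull}"
  local repo="$image_ref"
  local last="${image_ref##*/}"       # last path component, e.g. dne:v1.0.0
  if [[ "$last" == *:* ]]; then
    repo="${image_ref%:*}"            # strip the tag but keep any registry host:port
  fi

  if $pull_cmd "$image_ref" >/dev/null 2>&1; then
    echo "pull ok locally: check the cluster's registry credentials (imagePullSecrets)"
  elif [[ "$repo" != "$image_ref" ]] && $pull_cmd "$repo" >/dev/null 2>&1; then
    echo "tag not found: $image_ref (the untagged $repo, i.e. :latest, pulls fine)"
  else
    echo "image not found: $repo (check the name and the registry)"
  fi
}
```

The order of the checks mirrors the prose. A successful exact pull points at cluster permissions, a successful untagged pull points at the tag, and anything else points at the repository or registry.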

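If docker pull rosskukulinski/dne (without an exact tag) also fails, then we have a bigger problem: that image does not exist at all in our image registry. By default, Kubernetes uses the Dockerhub registry. If you're using Quay.io, AWS ECR, or Google Container Registry, you'll need to specify the registry URL in the image string. For example, on Quay, the image would be quay.io/rosskukulinski/dne:v1.0.0.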
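To check which registry a given image string actually points at, the sketch below applies a simplified version of the rule Docker uses to parse image references. The first path component counts as a registry host only if it contains a dot or a colon, or is localhost. Anything else resolves to Docker Hub. image_registry is a hypothetical helper name, and digest references are not treated specially.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which registry an image reference points at.
# Simplified form of the Docker reference rule: the first path component is a
# registry host only if it contains "." or ":" or is "localhost"; otherwise the
# image is pulled from Docker Hub (docker.io).
image_registry() {
  local ref="$1"
  local first="${ref%%/*}"   # text before the first "/"
  if [[ "$ref" == */* && ( "$first" == *.* || "$first" == *:* || "$first" == localhost ) ]]; then
    echo "$first"
  else
    echo "docker.io"
  fi
}
```

For example, image_registry rosskukulinski/dne:v1.0.0 prints docker.io, while image_registry quay.io/rosskukulinski/dne:v1.0.0 prints quay.io.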

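If you are using Dockerhub, then you should double-check the system that is publishing images to the registry. Make sure the name and tag match what your Deployment is trying to use.

Note: There is no observable difference in Pod status between a missing image and incorrect registry permissions. In either case, Kubernetes will report an ErrImagePull status for the Pods.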
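Suppose the local docker pull succeeds but the cluster still reports ErrImagePull. The usual fix in that case is an image pull secret. The minimal sketch below assumes your registry accepts username/password login. add_pull_secret is a hypothetical wrapper around the real kubectl create secret docker-registry and kubectl patch serviceaccount commands. The secret name, server, and credentials used in the test are placeholders. Pods pick up the secret through their ServiceAccount, or you can list it under imagePullSecrets in the Pod spec instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create a registry-credential Secret and attach it to a
# ServiceAccount (default: "default") so Pods using it can pull private images.
# KUBECTL defaults to kubectl; it is overridable only so the sketch can be tested.
add_pull_secret() {
  local secret="$1" server="$2" user="$3" password="$4" sa="${5:-default}"
  local k="${KUBECTL:-kubectl}"
  $k create secret docker-registry "$secret" \
    --docker-server="$server" \
    --docker-username="$user" \
    --docker-password="$password" || return 1
  # Reference the new Secret from the ServiceAccount's imagePullSecrets list
  $k patch serviceaccount "$sa" \
    -p "{\"imagePullSecrets\": [{\"name\": \"$secret\"}]}"
}
```

Passing the password as an argument leaves it in shell history and process listings, so treat this as an illustration rather than a recommended workflow.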

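2. Application Crashing after Launch

Whether you're launching a new application on Kubernetes or migrating an existing platform, it's common for the application to crash on startup.

Let's create a new Deployment with an application that crashes after 1 second: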

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl run crasher --image<span style="color:#f8f8f2">=</span>rosskukulinski/crashing-app
</code></span>

Then let's take a look at the status of our Pods:

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl get pods
NAME                       READY     STATUS             RESTARTS   AGE
crasher-2443551393-vuehs   0/1       CrashLoopBackOff   2          54s
</code></span>

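OK, so CrashLoopBackOff tells us that Kubernetes is trying to launch this Pod, but one or more of its containers is crashing or getting killed.

Let's describe the Pod to get more information: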

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl describe pod crasher-2443551393-vuehs
Name:        crasher-2443551393-vuehs
Namespace:    fail
Node:        gke-nrhk-1-default-pool-a101b974-wfp7/10.142.0.2
Start Time:    Fri, 10 Feb 2017 14:20:29 -0500
Labels:        pod-template-hash<span style="color:#f8f8f2">=</span>2443551393
        run<span style="color:#f8f8f2">=</span>crasher
Status:        Running
IP:        10.0.0.74
Controllers:    ReplicaSet/crasher-2443551393
Containers:
  crasher:
    Container ID:    docker://51c940ab32016e6d6b5ed28075357661fef3282cb3569117b0f815a199d01c60
    Image:        rosskukulinski/crashing-app
    Image ID:        docker://sha256:cf7452191b34d7797a07403d47a1ccf5254741d4bb356577b8a5de40864653a5
    Port:        
    State:        Terminated
      Reason:        Error
      Exit Code:    1
      Started:        Fri, 10 Feb 2017 14:22:24 -0500
      Finished:        Fri, 10 Feb 2017 14:22:26 -0500
    Last State:        Terminated
      Reason:        Error
      Exit Code:    1
      Started:        Fri, 10 Feb 2017 14:21:39 -0500
      Finished:        Fri, 10 Feb 2017 14:21:40 -0500
    Ready:        False
    Restart Count:    4
<span style="color:#f8f8f2">..</span>.
</code></span>

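Awesome! Kubernetes is telling us that this Pod's container was Terminated because the application inside it crashed. Specifically, we can see that the application's Exit Code was 1. We might also see an OOMKilled error, but we'll get to that later.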
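Exit codes can be read with a common Unix convention. Values above 128 usually mean the process was killed by signal number (code - 128). For instance, an OOMKilled container typically reports 137, which is 128 + 9 (SIGKILL). The same 137 appears in the liveness example later in this article. The helper below is a hypothetical sketch of that rule of thumb, not something kubectl provides.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough interpretation of a container "Exit Code".
# Convention: codes above 128 usually mean "terminated by signal (code - 128)".
explain_exit_code() {
  local code="$1" sig name
  if (( code == 0 )); then
    echo "exited cleanly"
  elif (( code > 128 )); then
    sig=$(( code - 128 ))
    # kill -l maps a signal number to its name (9 -> KILL); fall back if unknown
    name=$(kill -l "$sig" 2>/dev/null) || name="UNKNOWN"
    echo "killed by signal $sig (SIG$name)"
  else
    echo "application error (exit $code)"
  fi
}
```

With the output above, explain_exit_code 1 prints application error (exit 1), and explain_exit_code 137 prints killed by signal 9 (SIGKILL).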

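So our application is crashing... why?

The first thing we can do is check our application logs. Assuming you are sending your application logs to stdout (which you should be!), you can view them using kubectl logs: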

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl logs crasher-2443551393-vuehs
</code></span>

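Unfortunately, this Pod doesn't seem to have any log data. We may be looking at a freshly restarted instance of the application, so we should check the previous container: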

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl logs crasher-2443551393-vuehs --previous
</code></span>

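Rats! Our application still isn't giving us anything to work with. At this point, it's probably time to add more log messages on startup to help debug the issue. We might also want to try running the container locally to see if there are missing environment variables or mounted volumes.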
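The two kubectl logs invocations above are easy to forget in the middle of an incident. A tiny wrapper can print both. crash_logs is a hypothetical name, and the sketch assumes a single-container Pod; for multi-container Pods you would add -c CONTAINER. The KUBECTL override is there only so the sketch can be tested without a cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show logs from the current and the previous container of a Pod.
# KUBECTL defaults to kubectl; it is overridable only so the sketch can be tested.
crash_logs() {
  local pod="$1"
  local k="${KUBECTL:-kubectl}"
  echo "=== current: $pod ==="
  $k logs "$pod"
  echo "=== previous: $pod ==="
  # --previous fails if the container has never restarted, so don't treat that as fatal
  $k logs "$pod" --previous || echo "(no previous container logs)"
}
```

Usage would be crash_logs crasher-2443551393-vuehs.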

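3. Missing ConfigMap or Secret

Kubernetes best practices recommend passing application run-time configuration via ConfigMaps or Secrets. This data could include database credentials, API endpoints, or other configuration flags.

A common mistake I've seen developers make is to create Deployments that reference properties of ConfigMaps or Secrets that don't exist, or that reference ConfigMaps/Secrets that don't exist at all.

Let's see what that might look like.

Missing ConfigMap

For the first example, we're going to try to create a Pod that loads ConfigMap data as environment variables.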

<span style="color:#f8f8f2"><code class="language-yaml">
<span style="color:slategray"># configmap-pod.yaml</span>
<span style="color:#e6db74">apiVersion</span><span style="color:#f8f8f2">:</span> v1
<span style="color:#e6db74">kind</span><span style="color:#f8f8f2">:</span> Pod
<span style="color:#e6db74">metadata</span><span style="color:#f8f8f2">:</span>
  <span style="color:#e6db74">name</span><span style="color:#f8f8f2">:</span> configmap<span style="color:#f8f8f2">-</span>pod
<span style="color:#e6db74">spec</span><span style="color:#f8f8f2">:</span>
  <span style="color:#e6db74">containers</span><span style="color:#f8f8f2">:</span>
    <span style="color:#f8f8f2">-</span> <span style="color:#e6db74">name</span><span style="color:#f8f8f2">:</span> test<span style="color:#f8f8f2">-</span>container
      <span style="color:#e6db74">image</span><span style="color:#f8f8f2">:</span> gcr.io/google_containers/busybox
      <span style="color:#e6db74">command</span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">[</span> <span style="color:#a6e22e">"/bin/sh"</span><span style="color:#f8f8f2">,</span> <span style="color:#a6e22e">"-c"</span><span style="color:#f8f8f2">,</span> <span style="color:#a6e22e">"env"</span> <span style="color:#f8f8f2">]</span>
      <span style="color:#e6db74">env</span><span style="color:#f8f8f2">:</span>
        <span style="color:#f8f8f2">-</span> <span style="color:#e6db74">name</span><span style="color:#f8f8f2">:</span> SPECIAL_LEVEL_KEY
          <span style="color:#e6db74">valueFrom</span><span style="color:#f8f8f2">:</span>
            <span style="color:#e6db74">configMapKeyRef</span><span style="color:#f8f8f2">:</span>
              <span style="color:#e6db74">name</span><span style="color:#f8f8f2">:</span> special<span style="color:#f8f8f2">-</span>config
              <span style="color:#e6db74">key</span><span style="color:#f8f8f2">:</span> special.how

</code></span>

Let's create the Pod with kubectl create -f configmap-pod.yaml. After a short wait, we can take a peek at our Pods:

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl get pods
NAME            READY     STATUS              RESTARTS   AGE
configmap-pod   0/1       RunContainerError   0          3s
</code></span>

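Our Pod's status says RunContainerError. (Newer Kubernetes versions typically report a missing ConfigMap as CreateContainerConfigError instead.) We can use kubectl describe to learn more: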

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl describe pod configmap-pod
<span style="color:#f8f8f2">[</span><span style="color:#f8f8f2">..</span>.<span style="color:#f8f8f2">]</span>
Events:
  FirstSeen    LastSeen    Count   From                        SubObjectPath           Type        Reason      Message
  ---------    --------    -----   ----                        -------------           --------    ------      -------
  20s        20s     1   <span style="color:#f8f8f2">{</span>default-scheduler <span style="color:#f8f8f2">}</span>                                Normal      Scheduled   Successfully assigned configmap-pod to gke-ctm-1-sysdig2-35e99c16-tgfm
  19s        2s      3   <span style="color:#f8f8f2">{</span>kubelet gke-ctm-1-sysdig2-35e99c16-tgfm<span style="color:#f8f8f2">}</span>   spec.containers<span style="color:#f8f8f2">{</span>test-container<span style="color:#f8f8f2">}</span> Normal      Pulling     pulling image <span style="color:#a6e22e">"gcr.io/google_containers/busybox"</span>
  18s        2s      3   <span style="color:#f8f8f2">{</span>kubelet gke-ctm-1-sysdig2-35e99c16-tgfm<span style="color:#f8f8f2">}</span>   spec.containers<span style="color:#f8f8f2">{</span>test-container<span style="color:#f8f8f2">}</span> Normal      Pulled      Successfully pulled image <span style="color:#a6e22e">"gcr.io/google_containers/busybox"</span>
  18s        2s      3   <span style="color:#f8f8f2">{</span>kubelet gke-ctm-1-sysdig2-35e99c16-tgfm<span style="color:#f8f8f2">}</span>                   Warning     FailedSync  Error syncing pod, skipping: failed to <span style="color:#a6e22e">"StartContainer"</span> <span style="color:#66d9ef">for</span> <span style="color:#a6e22e">"test-container"</span> with RunContainerError: <span style="color:#a6e22e">"GenerateRunContainerOptions: configmaps \"special-config\" not found"</span>
</code></span>

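The last item in the Events section explains what went wrong. The Pod is attempting to access a ConfigMap named special-config, but it can't be found in this namespace. Once the ConfigMap is created, the Pod should restart and pull in the runtime data.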
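When the Events list is long, the missing object can be pulled out of the describe output mechanically. The sketch below is a hypothetical filter. It assumes the "... not found" wording shown above, and it accepts both plain quotes and the escaped \" quotes that appear in FailedSync messages.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl describe` output on stdin and print each
# missing ConfigMap/Secret as configmaps/NAME or secrets/NAME.
# Matches both plain quotes (secrets "x" not found) and the escaped quotes
# seen in FailedSync messages (configmaps \"x\" not found).
missing_config_refs() {
  sed -nE 's#.*(configmaps|secrets) \\?"([^"\\]+)\\?" not found.*#\1/\2#p' | sort -u
}
```

For the events above, kubectl describe pod configmap-pod | missing_config_refs would print configmaps/special-config.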

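Accessing Secrets as environment variables in your Pod spec produces errors similar to the ones we've seen here with ConfigMaps.

But what if you're accessing a Secret or a ConfigMap via a Volume?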

Missing Secret

Here's a Pod spec that references a Secret named myothersecret and attempts to mount it as a volume.

<span style="color:#f8f8f2"><code class="language-yaml">
<span style="color:slategray"># missing-secret.yaml</span>
<span style="color:#e6db74">apiVersion</span><span style="color:#f8f8f2">:</span> v1
<span style="color:#e6db74">kind</span><span style="color:#f8f8f2">:</span> Pod
<span style="color:#e6db74">metadata</span><span style="color:#f8f8f2">:</span>
  <span style="color:#e6db74">name</span><span style="color:#f8f8f2">:</span> secret<span style="color:#f8f8f2">-</span>pod
<span style="color:#e6db74">spec</span><span style="color:#f8f8f2">:</span>
  <span style="color:#e6db74">containers</span><span style="color:#f8f8f2">:</span>
    <span style="color:#f8f8f2">-</span> <span style="color:#e6db74">name</span><span style="color:#f8f8f2">:</span> test<span style="color:#f8f8f2">-</span>container
      <span style="color:#e6db74">image</span><span style="color:#f8f8f2">:</span> gcr.io/google_containers/busybox
      <span style="color:#e6db74">command</span><span style="color:#f8f8f2">:</span> <span style="color:#f8f8f2">[</span> <span style="color:#a6e22e">"/bin/sh"</span><span style="color:#f8f8f2">,</span> <span style="color:#a6e22e">"-c"</span><span style="color:#f8f8f2">,</span> <span style="color:#a6e22e">"env"</span> <span style="color:#f8f8f2">]</span>
      <span style="color:#e6db74">volumeMounts</span><span style="color:#f8f8f2">:</span>
        <span style="color:#f8f8f2">-</span> <span style="color:#e6db74">mountPath</span><span style="color:#f8f8f2">:</span> /etc/secret/
          <span style="color:#e6db74">name</span><span style="color:#f8f8f2">:</span> myothersecret
  <span style="color:#e6db74">restartPolicy</span><span style="color:#f8f8f2">:</span> Never
  <span style="color:#e6db74">volumes</span><span style="color:#f8f8f2">:</span>
    <span style="color:#f8f8f2">-</span> <span style="color:#e6db74">name</span><span style="color:#f8f8f2">:</span> myothersecret
      <span style="color:#e6db74">secret</span><span style="color:#f8f8f2">:</span>
        <span style="color:#e6db74">secretName</span><span style="color:#f8f8f2">:</span> myothersecret
</code></span>

Let's create this Pod with kubectl create -f missing-secret.yaml.

After a few minutes, when we get our Pods, we'll see that it's still stuck in the ContainerCreating state:

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl get pods
NAME            READY     STATUS              RESTARTS   AGE
secret-pod      0/1       ContainerCreating   0          4h
</code></span>

Odd... let's describe the Pod to see what's going on.

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl describe pod secret-pod
Name:        secret-pod
Namespace:    fail
Node:        gke-ctm-1-sysdig2-35e99c16-tgfm/10.128.0.2
Start Time:    Sat, 11 Feb 2017 14:07:13 -0500
Labels:        
Status:        Pending
IP:        
Controllers:    

<span style="color:#f8f8f2">[</span><span style="color:#f8f8f2">..</span>.<span style="color:#f8f8f2">]</span>

Events:
  FirstSeen    LastSeen    Count   From                        SubObjectPath   Type        Reason      Message
  ---------    --------    -----   ----                        -------------   --------    ------      -------
  18s        18s     1   <span style="color:#f8f8f2">{</span>default-scheduler <span style="color:#f8f8f2">}</span>                        Normal      Scheduled   Successfully assigned secret-pod to gke-ctm-1-sysdig2-35e99c16-tgfm
  18s        2s      6   <span style="color:#f8f8f2">{</span>kubelet gke-ctm-1-sysdig2-35e99c16-tgfm<span style="color:#f8f8f2">}</span>           Warning     FailedMount MountVolume.SetUp failed <span style="color:#66d9ef">for</span> volume <span style="color:#a6e22e">"kubernetes.io/secret/337281e7-f065-11e6-bd01-42010af0012c-myothersecret"</span> <span style="color:#f8f8f2">(</span>spec.Name: <span style="color:#a6e22e">"myothersecret"</span><span style="color:#f8f8f2">)</span> pod <span style="color:#a6e22e">"337281e7-f065-11e6-bd01-42010af0012c"</span> <span style="color:#f8f8f2">(</span>UID: <span style="color:#a6e22e">"337281e7-f065-11e6-bd01-42010af0012c"</span><span style="color:#f8f8f2">)</span> with: secrets <span style="color:#a6e22e">"myothersecret"</span> not found
</code></span>

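Once again, the Events section explains the problem. It's telling us that the Kubelet was unable to mount a volume from the Secret myothersecret. To fix this problem, create a Secret named myothersecret that contains the necessary secure credentials. Once myothersecret has been created, the container will start correctly.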
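To catch this class of mistake before deploying, you could run a pre-flight check for the ConfigMaps and Secrets a manifest references. The sketch below is hypothetical (missing_refs is not a kubectl command). It relies on kubectl get KIND NAME exiting non-zero when the object is absent. Any other error, such as missing RBAC access, is also reported as "missing".

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: which referenced ConfigMaps/Secrets are missing?
# Usage: missing_refs KIND NAME...   (KIND is e.g. configmap or secret)
# Relies on `kubectl get KIND NAME` exiting non-zero when the object is absent.
# KUBECTL defaults to kubectl; it is overridable only so the sketch can be tested.
missing_refs() {
  local kind="$1"; shift
  local k="${KUBECTL:-kubectl}"
  local name rc=0
  for name in "$@"; do
    if ! $k get "$kind" "$name" >/dev/null 2>&1; then
      echo "missing: $kind/$name"
      rc=1
    fi
  done
  return "$rc"
}
```

One possible use is missing_refs secret myothersecret || kubectl create secret generic myothersecret --from-literal=password=CHANGE_ME. In that command, the password key and its value are placeholders for whatever your application actually expects.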

4. Liveness/Readiness Probe Failure

An important lesson for developers to learn when working with containers and Kubernetes is that just because your application container is running doesn't mean that it's working.

Kubernetes provides two essential features called Liveness Probes and Readiness Probes. Both periodically perform an action (e.g. make an HTTP request, open a TCP connection, or run a command in your container) to confirm that your application is working as intended.

If the Liveness Probe fails, Kubernetes will kill your container and create a new one. If the Readiness Probe fails, that Pod will not be available as a Service endpoint, meaning no traffic will be sent to that Pod until it becomes Ready.

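If you attempt to deploy a change to your application that fails the Liveness/Readiness Probe, the rolling deploy will hang as it waits for all of your Pods to become Ready.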
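One way to keep a hung rolling update from blocking you indefinitely is to wait with a timeout and roll back if the timeout expires. The sketch below wraps the real kubectl rollout status --timeout and kubectl rollout undo commands. rollout_or_undo, its default timeout, the deployment name my-app used in the test, and the KUBECTL override are all assumptions for illustration. It applies to Deployments, not to bare Pods like the example that follows.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait for a Deployment rollout and roll back if it stalls
# (e.g. new Pods never pass their readiness probe within the timeout).
# KUBECTL defaults to kubectl; it is overridable only so the sketch can be tested.
rollout_or_undo() {
  local deploy="$1" timeout="${2:-120s}"
  local k="${KUBECTL:-kubectl}"
  if $k rollout status "deployment/$deploy" --timeout="$timeout"; then
    echo "rollout of $deploy succeeded"
  else
    echo "rollout of $deploy stalled; undoing" >&2
    $k rollout undo "deployment/$deploy"
    return 1
  fi
}
```

Rolling back automatically is a design choice. It gets traffic back onto the old ReplicaSet quickly, but you still need the probe events, as shown below, to find out why the new Pods never became Ready.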

So what does that look like? Here's a Pod spec that defines Liveness and Readiness Probes that both check for a healthy HTTP response from /healthz on port 8080.

<span style="color:#f8f8f2"><code class="language-yaml">
<span style="color:slategray"># liveness.yaml</span>
<span style="color:#e6db74">apiVersion</span><span style="color:#f8f8f2">:</span> v1
<span style="color:#e6db74">kind</span><span style="color:#f8f8f2">:</span> Pod
<span style="color:#e6db74">metadata</span><span style="color:#f8f8f2">:</span>
  <span style="color:#e6db74">name</span><span style="color:#f8f8f2">:</span> liveness<span style="color:#f8f8f2">-</span>pod
<span style="color:#e6db74">spec</span><span style="color:#f8f8f2">:</span>
  <span style="color:#e6db74">containers</span><span style="color:#f8f8f2">:</span>
    <span style="color:#f8f8f2">-</span> <span style="color:#e6db74">name</span><span style="color:#f8f8f2">:</span> test<span style="color:#f8f8f2">-</span>container
      <span style="color:#e6db74">image</span><span style="color:#f8f8f2">:</span> rosskukulinski/leaking<span style="color:#f8f8f2">-</span>app
      <span style="color:#e6db74">livenessProbe</span><span style="color:#f8f8f2">:</span>
        <span style="color:#e6db74">httpGet</span><span style="color:#f8f8f2">:</span>
          <span style="color:#e6db74">path</span><span style="color:#f8f8f2">:</span> /healthz
          <span style="color:#e6db74">port</span><span style="color:#f8f8f2">:</span> <span style="color:#ae81ff">8080</span>
        <span style="color:#e6db74">initialDelaySeconds</span><span style="color:#f8f8f2">:</span> <span style="color:#ae81ff">3</span>
        <span style="color:#e6db74">periodSeconds</span><span style="color:#f8f8f2">:</span> <span style="color:#ae81ff">3</span>
      <span style="color:#e6db74">readinessProbe</span><span style="color:#f8f8f2">:</span>
        <span style="color:#e6db74">httpGet</span><span style="color:#f8f8f2">:</span>
          <span style="color:#e6db74">path</span><span style="color:#f8f8f2">:</span> /healthz
          <span style="color:#e6db74">port</span><span style="color:#f8f8f2">:</span> <span style="color:#ae81ff">8080</span>
        <span style="color:#e6db74">initialDelaySeconds</span><span style="color:#f8f8f2">:</span> <span style="color:#ae81ff">3</span>
        <span style="color:#e6db74">periodSeconds</span><span style="color:#f8f8f2">:</span> <span style="color:#ae81ff">3</span>
</code></span>

Let's create this Pod with kubectl create -f liveness.yaml and see what happens after a few minutes:

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl get pods
NAME           READY     STATUS    RESTARTS   AGE
liveness-pod   0/1       Running   4          2m
</code></span>

After 2 minutes, we can see that our Pod is still not "Ready", and it has been restarted four times. Let's describe the Pod for more information.

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl describe pod liveness-pod
Name:        liveness-pod
Namespace:    fail
Node:        gke-ctm-1-sysdig2-35e99c16-tgfm/10.128.0.2
Start Time:    Sat, 11 Feb 2017 14:32:36 -0500
Labels:        
Status:        Running
IP:        10.108.88.40
Controllers:    
Containers:
  test-container:
    Container ID:    docker://8fa6f99e6fda6e56221683249bae322ed864d686965dc44acffda6f7cf186c7b
    Image:        rosskukulinski/leaking-app
    Image ID:        docker://sha256:7bba8c34dad4ea155420f856cd8de37ba9026048bd81f3a25d222fd1d53da8b7
    Port:        
    State:        Running
      Started:        Sat, 11 Feb 2017 14:40:34 -0500
    Last State:        Terminated
      Reason:        Error
      Exit Code:    137
      Started:        Sat, 11 Feb 2017 14:37:10 -0500
      Finished:        Sat, 11 Feb 2017 14:37:45 -0500
<span style="color:#f8f8f2">[</span><span style="color:#f8f8f2">..</span>.<span style="color:#f8f8f2">]</span>
Events:
  FirstSeen    LastSeen    Count   From                        SubObjectPath           Type        Reason      Message
  ---------    --------    -----   ----                        -------------           --------    ------      -------
  8m        8m      1   <span style="color:#f8f8f2">{</span>default-scheduler <span style="color:#f8f8f2">}</span>                                Normal      Scheduled   Successfully assigned liveness-pod to gke-ctm-1-sysdig2-35e99c16-tgfm
  8m        8m      1   <span style="color:#f8f8f2">{</span>kubelet gke-ctm-1-sysdig2-35e99c16-tgfm<span style="color:#f8f8f2">}</span>   spec.containers<span style="color:#f8f8f2">{</span>test-container<span style="color:#f8f8f2">}</span> Normal      Created     Created container with docker <span style="color:#e6db74">id</span> 0fb5f1a56ea0<span style="color:#f8f8f2">;</span> Security:<span style="color:#f8f8f2">[</span>seccomp<span style="color:#f8f8f2">=</span>unconfined<span style="color:#f8f8f2">]</span>
  8m        8m      1   <span style="color:#f8f8f2">{</span>kubelet gke-ctm-1-sysdig2-35e99c16-tgfm<span style="color:#f8f8f2">}</span>   spec.containers<span style="color:#f8f8f2">{</span>test-container<span style="color:#f8f8f2">}</span> Normal      Started     Started container with docker <span style="color:#e6db74">id</span> 0fb5f1a56ea0
  7m        7m      1   <span style="color:#f8f8f2">{</span>kubelet gke-ctm-1-sysdig2-35e99c16-tgfm<span style="color:#f8f8f2">}</span>   spec.containers<span style="color:#f8f8f2">{</span>test-container<span style="color:#f8f8f2">}</span> Normal      Created     Created container with docker <span style="color:#e6db74">id</span> 3f2392e9ead9<span style="color:#f8f8f2">;</span> Security:<span style="color:#f8f8f2">[</span>seccomp<span style="color:#f8f8f2">=</span>unconfined<span style="color:#f8f8f2">]</span>
  7m        7m      1   <span style="color:#f8f8f2">{</span>kubelet gke-ctm-1-sysdig2-35e99c16-tgfm<span style="color:#f8f8f2">}</span>   spec.containers<span style="color:#f8f8f2">{</span>test-container<span style="color:#f8f8f2">}</span> Normal      Killing     Killing container with docker <span style="color:#e6db74">id</span> 0fb5f1a56ea0: pod <span style="color:#a6e22e">"liveness-pod_fail(d75469d8-f090-11e6-bd01-42010af0012c)"</span> container <span style="color:#a6e22e">"test-container"</span> is unhealthy, it will be killed and re-created.
  8m    16s 10  <span style="color:#f8f8f2">{</span>kubelet gke-ctm-1-sysdig2-35e99c16-tgfm<span style="color:#f8f8f2">}</span>   spec.containers<span style="color:#f8f8f2">{</span>test-container<span style="color:#f8f8f2">}</span> Warning Unhealthy   Liveness probe failed: Get http://10.108.88.40:8080/healthz: dial tcp 10.108.88.40:8080: getsockopt: connection refused
  8m    1s  85  <span style="color:#f8f8f2">{</span>kubelet gke-ctm-1-sysdig2-35e99c16-tgfm<span style="color:#f8f8f2">}</span>   spec.containers<span style="color:#f8f8f2">{</span>test-container<span style="color:#f8f8f2">}</span> Warning Unhealthy   Readiness probe failed: Get http://10.108.88.40:8080/healthz: dial tcp 10.108.88.40:8080: getsockopt: connection refused
</code></span>

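Once again, the Events section comes to the rescue. We can see that the Readiness and Liveness probes are both failing. The key string to look for is container "test-container" is unhealthy, it will be killed and re-created. This tells us that Kubernetes is killing the container because the Liveness Probe has failed.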
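Those key strings can also be checked mechanically. The sketch below is a hypothetical filter over kubectl describe pod output. The kill message is worded differently in newer kubelets (for example "failed liveness probe, will be restarted"), so the sketch matches both wordings. The exact text may vary by version.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize probe trouble from `kubectl describe pod` output on stdin.
# The kill message differs between Kubernetes versions, so two wordings are matched.
probe_summary() {
  awk '
    /Liveness probe failed/  { live = 1 }
    /Readiness probe failed/ { ready = 1 }
    /is unhealthy, it will be killed and re-created|failed liveness probe, will be restarted/ { killed = 1 }
    END {
      if (live)   print "liveness probe failing"
      if (ready)  print "readiness probe failing"
      if (killed) print "container killed by kubelet after liveness failures"
    }'
}
```

Usage would be kubectl describe pod liveness-pod | probe_summary.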

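There are likely three possibilities:

  1. Your probes are now incorrect: has the health URL changed?
  2. Your probes are too sensitive: does your application take a while to start or respond?
  3. Your application is no longer responding correctly to the probe: is your database misconfigured?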
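For the second possibility, a rough calculation helps. This sketch assumes the documented probe semantics: the first probe runs initialDelaySeconds after the container starts, another runs every periodSeconds, and the kubelet restarts the container after failureThreshold consecutive failures (3 by default). Under those assumptions, a failing liveness probe restarts the container after roughly initialDelaySeconds + (failureThreshold - 1) * periodSeconds. liveness_restart_after is a hypothetical helper, and it ignores timeoutSeconds and scheduling jitter.

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate seconds from container start until a
# continuously failing liveness probe triggers a restart.
# Args: initialDelaySeconds periodSeconds [failureThreshold, default 3].
# Ignores timeoutSeconds and jitter, so treat the result as an estimate.
liveness_restart_after() {
  local initial_delay="$1" period="$2" failure_threshold="${3:-3}"
  echo $(( initial_delay + (failure_threshold - 1) * period ))
}
```

For the spec above (initialDelaySeconds: 3, periodSeconds: 3, default failureThreshold), liveness_restart_after 3 3 prints 9. An application that needs more than about nine seconds to start listening on port 8080 would therefore be killed before it ever becomes Ready. Raising initialDelaySeconds gives slow starters more room, as does a startupProbe on Kubernetes versions that support it.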

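Looking at the logs from your Pod is a good place to start debugging. Once you resolve the issue, a fresh deployment should succeed.

5. Exceeding CPU/Memory Limits

Kubernetes gives cluster administrators the ability to limit the amount of CPU or memory allocated to Pods and Containers. As an application developer, you might not know about these limits and then be surprised when your deployment fails.

Let's attempt to create this Deployment in a cluster with unknown CPU/memory request limits: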

<span style="color:#f8f8f2"><code class="language-yaml">
<span style="color:slategray"># gateway.yaml</span>
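<span style="color:#e6db74">apiVersion</span><span style="color:#f8f8f2">:</span> apps/v1
<span style="color:#e6db74">kind</span><span style="color:#f8f8f2">:</span> Deployment
<span style="color:#e6db74">metadata</span><span style="color:#f8f8f2">:</span>
  <span style="color:#e6db74">name</span><span style="color:#f8f8f2">:</span> gateway
<span style="color:#e6db74">spec</span><span style="color:#f8f8f2">:</span>
  <span style="color:slategray"># apps/v1 requires a selector matching the Pod template labels</span>
  <span style="color:#e6db74">selector</span><span style="color:#f8f8f2">:</span>
    <span style="color:#e6db74">matchLabels</span><span style="color:#f8f8f2">:</span>
      <span style="color:#e6db74">app</span><span style="color:#f8f8f2">:</span> gateway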
  <span style="color:#e6db74">template</span><span style="color:#f8f8f2">:</span>
    <span style="color:#e6db74">metadata</span><span style="color:#f8f8f2">:</span>
      <span style="color:#e6db74">labels</span><span style="color:#f8f8f2">:</span>
        <span style="color:#e6db74">app</span><span style="color:#f8f8f2">:</span> gateway
    <span style="color:#e6db74">spec</span><span style="color:#f8f8f2">:</span>
      <span style="color:#e6db74">containers</span><span style="color:#f8f8f2">:</span>
        <span style="color:#f8f8f2">-</span> <span style="color:#e6db74">name</span><span style="color:#f8f8f2">:</span> test<span style="color:#f8f8f2">-</span>container
          <span style="color:#e6db74">image</span><span style="color:#f8f8f2">:</span> nginx
          <span style="color:#e6db74">resources</span><span style="color:#f8f8f2">:</span>
            <span style="color:#e6db74">requests</span><span style="color:#f8f8f2">:</span>
              <span style="color:#e6db74">memory</span><span style="color:#f8f8f2">:</span> 5Gi
</code></span>

You'll notice that we're setting a memory request of 5Gi. Let's create the Deployment: kubectl create -f gateway.yaml.

Now let's look at our Pods:

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl get pods
No resources found.
</code></span>

?? Let's inspect our Deployment using describe:

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl describe deployment/gateway
Name:            gateway
Namespace:        fail
CreationTimestamp:    Sat, 11 Feb 2017 15:03:34 -0500
Labels:            app<span style="color:#f8f8f2">=</span>gateway
Selector:        app<span style="color:#f8f8f2">=</span>gateway
Replicas:        0 updated <span style="color:#f8f8f2">|</span> 1 total <span style="color:#f8f8f2">|</span> 0 available <span style="color:#f8f8f2">|</span> 1 unavailable
StrategyType:        RollingUpdate
MinReadySeconds:    0
RollingUpdateStrategy:    0 max unavailable, 1 max surge
OldReplicaSets:        
NewReplicaSet:        gateway-764140025 <span style="color:#f8f8f2">(</span>0/1 replicas created<span style="color:#f8f8f2">)</span>
Events:
  FirstSeen    LastSeen    Count   From                SubObjectPath   Type        Reason          Message
  ---------    --------    -----   ----                -------------   --------    ------          -------
  4m        4m      1   <span style="color:#f8f8f2">{</span>deployment-controller <span style="color:#f8f8f2">}</span>            Normal      ScalingReplicaSet   Scaled up replica <span style="color:#66d9ef">set</span> gateway-764140025 to 1
</code></span>

Based on that last line, our Deployment created a ReplicaSet (gateway-764140025) and scaled it up to 1. The ReplicaSet is the entity that manages the lifecycle of the Pods. We can describe the ReplicaSet:

<span style="color:#f8f8f2"><code class="language-bash">
$ kubectl describe rs/gateway-764140025
Name:        gateway-764140025
Namespace:    fail
Image<span style="color:#f8f8f2">(</span>s<span style="color:#f8f8f2">)</span>:    nginx
Selector:    app<span style="color:#f8f8f2">=</span>gateway,pod-template-hash<span style="color:#f8f8f2">=</span>764140025
Labels:        app<span style="color:#f8f8f2">=</span>gateway
        pod-template-hash<span style="color:#f8f8f2">=</span>764140025
Replicas:    0 current / 1 desired
Pods Status:    0 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
  FirstSeen    LastSeen    Count   From                SubObjectPath   Type        Reason      Message
  ---------    --------    -----   ----                -------------   --------    ------      -------
  6m        28s     15  <span style="color:#f8f8f2">{</span>replicaset-controller <span style="color:#f8f8f2">}</span>            Warning     FailedCreate    Error creating: pods <span style="color:#a6e22e">"gateway-764140025-"</span> is forbidden: <span style="color:#f8f8f2">[</span>maximum memory usage per Pod is 100Mi, but request is 5368709120., maximum memory usage per Container is 100Mi, but request is 5Gi.<span style="color:#f8f8f2">]</span>
</code></span>

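Ah ha! There we go. The cluster administrator has set a maximum memory usage per Pod of 100Mi (how stingy!). You can inspect the current namespace limits by running kubectl describe limitrange.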
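The event reports the request as 5368709120, which is simply 5Gi in bytes (5 * 1024^3), compared against a 100Mi cap. The sketch below reproduces that arithmetic and could serve as a quick check before deploying. to_bytes and fits_within are hypothetical helpers. They handle only integer quantities with the common binary (Ki, Mi, Gi, Ti) or decimal (k, M, G, T) suffixes, and they do not cover every quantity form Kubernetes accepts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a Kubernetes memory quantity such as 5Gi or 100Mi
# to bytes. Handles only integers with an optional binary (Ki, Mi, Gi, Ti) or
# decimal (k, M, G, T) suffix; forms like 0.5Gi or 1e3 are out of scope here.
to_bytes() {
  local re='^([0-9]+)(Ki|Mi|Gi|Ti|k|M|G|T)?$'
  if ! [[ "$1" =~ $re ]]; then
    echo "unsupported quantity: $1" >&2
    return 1
  fi
  local n=$(( 10#${BASH_REMATCH[1]} ))   # force base 10 so "08" is not octal
  case "${BASH_REMATCH[2]}" in
    Ki) echo $(( n * 1024 )) ;;
    Mi) echo $(( n * 1024 ** 2 )) ;;
    Gi) echo $(( n * 1024 ** 3 )) ;;
    Ti) echo $(( n * 1024 ** 4 )) ;;
    k)  echo $(( n * 1000 )) ;;
    M)  echo $(( n * 1000 ** 2 )) ;;
    G)  echo $(( n * 1000 ** 3 )) ;;
    T)  echo $(( n * 1000 ** 4 )) ;;
    *)  echo "$n" ;;
  esac
}

# Hypothetical helper: exit 0 if REQUEST fits within LIMIT, 1 if not, 2 on bad input.
fits_within() {
  local req lim
  req=$(to_bytes "$1") || return 2
  lim=$(to_bytes "$2") || return 2
  (( req <= lim ))
}
```

Here, to_bytes 5Gi prints 5368709120, which matches the FailedCreate event. fits_within 5Gi 100Mi fails, while a request such as 64Mi would fit.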

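You now have three choices:

  1. Ask your cluster administrator to increase the limits
  2. Reduce the Request or Limit settings for your Deployment
  3. Go rogue and edit the limits yourself (kubectl edit FTW!)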